Under the hood: Hyper-V shutdown registry settings

Poking around in the registry on Windows computers can be risky, and messing with the registry of a Windows Server system can bring your whole network to a stop. Still, there are times when it is useful to change the registry on a server directly, for example to adjust a setting that isn’t exposed in the GUI or to turn on a feature that is hidden by default. This article details a few interesting registry settings associated with Hyper-V shutdown on hosts running Windows Server 2012 and Windows Server 2012 R2. These settings came up in discussions with colleagues about real-life situations they encountered while administering their own Hyper-V environments or consulting for customers who use Hyper-V. Hopefully, some of these stories will help in your own situations involving Hyper-V. As usual, the protagonist in these two somewhat fictionalized stories is named Bob.

No clean shutdown of virtual machines

Bob was getting confused. He had configured his Generation 1 virtual machines with “Shut down the guest operating system” as the automatic stop action for each virtual machine, but for some reason, his virtual machines sometimes failed to shut down cleanly when he shut down the Windows Server 2012 R2 host they were running on. This became evident when he restarted the host and found that the virtual machines needed to restart from scratch.

Investigating this situation further, Bob discovered the following event in the System event log on the host:

“The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.”

Bob talked with a system engineer at Microsoft about this and learned that, under the hood, Hyper-V monitors virtual machines as they shut down. If a virtual machine takes too long to shut down properly, Hyper-V steps in and pulls the plug on it, that is, turns it off. Hyper-V does this so that a single slow virtual machine cannot keep the host itself from shutting down when it needs to or when the administrator asks it to.

How long does Hyper-V wait for a virtual machine to shut down properly before pulling the plug on it? The default is 120 seconds, and the value is stored in the following registry setting on the host:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\Virtualization\ShutdownTimeout

Bob ended up increasing ShutdownTimeout on his host from 120 to 240 seconds, which gave the virtual machines more time to shut down cleanly before Hyper-V turned them off. This was more of a workaround than a solution, however, because it didn’t explain why the virtual machines were taking so long to shut down when a shutdown command was issued to the host.

After some further investigation using Performance Monitor, Bob determined that the storage subsystem on the host was the bottleneck: it was overwhelmed by the write I/O generated when all of the virtual machines shut down concurrently. After upgrading the storage subsystem to provide better performance, Bob restored ShutdownTimeout to its original value, and the problem did not recur.

By the way, changes to the above registry setting will only take effect if you reboot the host.
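
If you want to check the current value on a host before and after a change like Bob’s, a short script can read it. The following is only a minimal sketch in Python using the standard winreg module. It assumes the last component of the path above is the value name and that the value is stored as a number of seconds; the helper names split_value_path and read_shutdown_timeout are made up for this illustration. It only reads the registry and returns None when run on a non-Windows machine or when the value is absent.

```python
import sys

# Path as given in the article (relative to HKLM): key path plus value name.
SHUTDOWN_TIMEOUT_PATH = (
    r"Software\Microsoft\Windows NT\CurrentVersion\Virtualization\ShutdownTimeout"
)


def split_value_path(path):
    """Split 'key\\subkey\\value' into (key path, value name)."""
    key_path, _, value_name = path.rpartition("\\")
    return key_path, value_name


def read_shutdown_timeout():
    """Return the ShutdownTimeout value, or None if it cannot be read.

    Assumption: the value holds the timeout in seconds, as the article states.
    """
    if sys.platform != "win32":
        return None  # winreg only exists on Windows
    import winreg

    key_path, value_name = split_value_path(SHUTDOWN_TIMEOUT_PATH)
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, value_name)
            return value
    except FileNotFoundError:
        return None  # key or value not present on this machine


if __name__ == "__main__":
    print("ShutdownTimeout:", read_shutdown_timeout())
```

Reading rather than writing is deliberate here: given the risks described at the start of this article, changes are better made with care (and followed by the reboot mentioned above) than scripted casually.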

TIP: Altaro has a helpful article on how to use Group Policy to manage the ShutdownTimeout on your Hyper-V hosts.

Hyper-V shutdown redux for clustered hosts

Sometime afterward, Bob faced a similar situation involving a Hyper-V host cluster in which all of the nodes were running Windows Server 2012 R2. He needed to shut down one of the cluster nodes for maintenance and was using Live Migration to move the virtual machines off the node before shutting it down. What he found was that the Live Migration wouldn’t complete before the node started shutting down.

Looking under the hood some more, Bob discovered the following registry setting on each node of the host cluster:

HKLM\Cluster\ShutdownTimeoutInMinutes

By default, the value of this setting was 25 minutes on each of his machines, which sounded like a lot of time but apparently wasn’t long enough for the Live Migration of the virtual machines on the node to finish. Bob wondered where this particular value of 25 minutes came from and found his answer in this MSDN article, which explains that the default value for the ShutdownTimeoutInMinutes setting is determined by dividing the physical RAM in GB by 64 and multiplying the result by 100.
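
The rule of thumb just described is easy to work through in a few lines of Python. This is only an illustration of the arithmetic as stated above; the function name is made up, and how Windows rounds the result for RAM sizes that don’t divide evenly is not covered here. Under this formula, a default of 25 minutes would correspond to a node with 16 GB of physical RAM.

```python
def default_shutdown_timeout_minutes(physical_ram_gb):
    """Default ShutdownTimeoutInMinutes per the rule above: RAM in GB / 64 * 100."""
    return physical_ram_gb / 64 * 100


# 16 GB of RAM gives the 25 minutes Bob saw on his nodes.
print(default_shutdown_timeout_minutes(16))   # 25.0
# A node with 128 GB of RAM would get a much longer default.
print(default_shutdown_timeout_minutes(128))  # 200.0
```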

Bob proceeded to double the value of ShutdownTimeoutInMinutes on each node to 50 minutes. After that, the Live Migrations completed without issues, and the node that needed maintenance could be cleanly shut down and serviced. Once again, the change to this registry setting took effect only after Bob had rebooted the nodes in the cluster.

As an aside to Bob’s story, one might ask why he didn’t save the state of the virtual machines instead of shutting them down. The answer, interestingly enough, is that saving the state of a virtual machine often takes longer than shutting it down. For example, if you have a host cluster in which each node has 128 GB of RAM and the virtual machine files are stored on a SAN, then saving the virtual machines means lots of file copy operations happening over your storage network, which can end up taking a significant amount of time. If you simply shut down all of the virtual machines, the shutdowns happen mostly in parallel, and the only information written to storage is the virtual machine state that hasn’t already been committed, which usually doesn’t take long. And, of course, there can be problems if you try to save the state of a virtual machine that is functioning as a domain controller, as this second article from Altaro explains.