Back in the days of Windows XP, Microsoft had ambitious plans for the technologies it wanted to build into its next desktop operating system, which turned out to be Windows Vista. Among the ideas Microsoft was considering was a brand-new file system in which all files would be encapsulated within a database-like structure. That feature never made it into Windows, but at the time I wrote several editorial pieces arguing that the new file system was a bad idea because the encapsulation layer could easily become a single point of failure. If the encapsulation layer were to become corrupted, the entire file structure could conceivably be lost.
Fast-forward a couple of decades, and encapsulation has proven to be reliable. SharePoint document libraries, for example, store files in SQL Server blob storage. Likewise, Hyper-V virtual machines store their files in virtual hard disks, which are essentially encapsulated file systems. Despite that encapsulation, Hyper-V virtual hard disks are reliable enough to be used on countless production systems. Even so, things can sometimes go wrong. I have seen several instances of virtual hard disk corruption over the years, especially on virtual machines that rely on checkpoint chains. When this happens, you can restore a backup if you have one, but sometimes it is just as easy to fix the corrupt virtual hard disk. Here’s how.
Why does corruption happen?
Before I show you how to fix a corrupt virtual hard disk, you may be wondering how corruption occurs in the first place. Sometimes it is caused by faulty hardware. I once had an external RAID appliance that corrupted quite a bit of data because of a bad cable. With Hyper-V virtual hard disks, however, corruption often takes the form of a broken disk chain, in which a checkpoint’s differencing disk can no longer be linked to its parent virtual hard disk.
If a virtual machine is not shielded and is not running, Windows will allow you to mount its virtual hard disk and access the contents from the host operating system. I use this technique all the time when setting up lab environments. If checkpoints exist for a virtual hard disk, however, mounting that virtual hard disk outside of Hyper-V will break the checkpoint chain, even if you do not make any changes to the virtual hard disk’s contents. The next time you try to start the virtual machine, you will receive an error like the one shown below.
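To make that mechanism concrete, here is a minimal Python sketch of a checkpoint chain. It is a toy model, not a Hyper-V or VHDX API: the `Disk` class, its `data_id` and `stored_parent_id` fields, and the helper functions are all hypothetical. It is loosely based on the idea that a differencing disk records an identifier for its parent when it is created, and it assumes (as the behavior described above suggests) that mounting the parent outside Hyper-V leaves the parent with a new identifier even when no files are changed.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Disk:
    """Toy model of a virtual hard disk (hypothetical; not a VHDX parser)."""
    path: str
    # Identifier of the disk's current contents, loosely modeled on a VHDX DataWriteGuid.
    data_id: uuid.UUID = field(default_factory=uuid.uuid4)
    parent: Optional["Disk"] = None
    # Parent identifier recorded when this differencing disk was created.
    stored_parent_id: Optional[uuid.UUID] = None


def create_checkpoint(parent: Disk, path: str) -> Disk:
    """Create a differencing disk that remembers its parent's current identifier."""
    return Disk(path=path, parent=parent, stored_parent_id=parent.data_id)


def mount_read_write(disk: Disk) -> None:
    """ASSUMPTION: a read-write mount outside Hyper-V gives the disk a new identifier,
    even if no file inside it is changed."""
    disk.data_id = uuid.uuid4()


def link_is_intact(child: Disk) -> bool:
    """True if the child's recorded parent ID still matches the parent's current ID."""
    return child.parent is not None and child.stored_parent_id == child.parent.data_id


root = Disk("Client.vhdx")
checkpoint = create_checkpoint(root, "Client_checkpoint.avhdx")  # hypothetical file name
print(link_is_intact(checkpoint))  # True
mount_read_write(root)             # e.g. mounting Client.vhdx from the host
print(link_is_intact(checkpoint))  # False
```

The sketch only models the bookkeeping; it does not read or modify real disk files.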
Fixing a corrupt virtual hard disk
When it comes to testing virtual hard disks, conventional wisdom often dictates using the Test-VHD cmdlet in PowerShell. In my own experience, however, the Test-VHD cmdlet isn’t all that helpful. The cmdlet returns True if the virtual hard disk is deemed healthy and False if it is not. The problem is that when a checkpoint chain has been broken by mounting a virtual hard disk that has checkpoints, Test-VHD still reports True even though the virtual hard disk chain is corrupt. To see what I mean, check out the figure below. The Test-VHD cmdlet is returning True even though the error message specifically says that “the chain of virtual hard disks is corrupt.”
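The following Python sketch does not describe how Test-VHD works internally; it only illustrates one plausible reason a health check can pass while the chain is broken: validating each file on its own is not the same as validating the links between files. The `Disk` model, the `structure_ok` flag, and the `file_check` and `chain_check` functions are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Disk:
    """Toy model of a disk file; the fields are illustrative, not the VHDX layout."""
    path: str
    data_id: uuid.UUID = field(default_factory=uuid.uuid4)
    parent: Optional["Disk"] = None
    stored_parent_id: Optional[uuid.UUID] = None
    structure_ok: bool = True  # stands in for "this file's own headers look valid"


def file_check(disk: Disk) -> bool:
    """Per-file check: looks only at this one file's own structure."""
    return disk.structure_ok


def chain_check(disk: Disk) -> bool:
    """Chain check: every file is well formed AND every child/parent ID pair matches."""
    current: Optional[Disk] = disk
    while current is not None:
        if not current.structure_ok:
            return False
        if current.parent is not None and current.stored_parent_id != current.parent.data_id:
            return False
        current = current.parent
    return True


root = Disk("Client.vhdx")
leaf = Disk("Client_checkpoint.avhdx", parent=root, stored_parent_id=root.data_id)
root.data_id = uuid.uuid4()  # simulate the parent's identifier changing after a mount

print(file_check(root), file_check(leaf))  # True True
print(chain_check(leaf))                   # False
```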
The key to fixing this problem appears toward the end of the error message shown in the figure above. The error states that the identifiers of the parent virtual hard disk and the differencing disk do not match. If you can resolve that mismatch, you may be able to relink the virtual hard disk chain and eliminate the corruption.
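The mismatch described in the error can be pictured as a walk from the newest differencing disk back toward the root, comparing the parent identifier each child recorded with the identifier its parent carries now. The Python sketch below does that over a hypothetical in-memory model (`Disk`, `Mismatch`, and `find_mismatches` are illustrative names); it does not parse actual VHDX or AVHDX files.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Disk:
    """Toy disk model (hypothetical names; not the VHDX on-disk layout)."""
    path: str
    data_id: uuid.UUID = field(default_factory=uuid.uuid4)
    parent: Optional["Disk"] = None
    stored_parent_id: Optional[uuid.UUID] = None


@dataclass
class Mismatch:
    child: str
    parent: str
    expected: Optional[uuid.UUID]  # parent ID the child recorded when it was created
    actual: uuid.UUID              # ID the parent carries now


def find_mismatches(leaf: Disk) -> List[Mismatch]:
    """Walk from the newest differencing disk toward the root and collect every broken link."""
    problems: List[Mismatch] = []
    child = leaf
    while child.parent is not None:
        parent = child.parent
        if child.stored_parent_id != parent.data_id:
            problems.append(Mismatch(child.path, parent.path, child.stored_parent_id, parent.data_id))
        child = parent
    return problems


root = Disk("Client.vhdx")
first = Disk("first.avhdx", parent=root, stored_parent_id=root.data_id)
second = Disk("second.avhdx", parent=first, stored_parent_id=first.data_id)
root.data_id = uuid.uuid4()  # the root's identifier changed, e.g. after a host-side mount

# Reports only the first.avhdx -> Client.vhdx link; the other link is still intact.
for m in find_mismatches(second):
    print(f"{m.child} -> {m.parent}: expected {m.expected}, found {m.actual}")
```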
The figure below shows the Hyper-V Manager with the problematic virtual machine selected. Even though the Hyper-V Manager still displays the virtual machine’s full checkpoint tree, the checkpoint chain is broken at the disk level.
In the figure above, you will also notice that the Actions pane contains an Inspect Disk option. Clicking Inspect Disk causes Windows to display a dialog box prompting you to select the virtual hard disk that you want to inspect. If I select the Client.VHDX file, the resulting window, which you can see below, makes it seem as though the virtual hard disk is completely healthy. Indeed, if I were to mount the virtual hard disk at the host OS level, I could browse its contents without issue.
In this case, the root virtual hard disk does seem to be healthy, so the problem clearly stems from the link between the root virtual hard disk and the first checkpoint. Because the root virtual hard disk seems to be unaware of the checkpoint tree, let’s see what happens if you click the Hyper-V Manager’s Inspect Disk option again and choose an AVHDX file (a differencing disk associated with a checkpoint) instead of the root VHDX file.
This time, the Hyper-V Manager displays an error stating that the virtual hard disk chain is broken. As you can see in the figure below, however, it also provides a Reconnect button that can be used to relink the disk chain.
Clicking the Reconnect button launches the Edit Virtual Hard Disk Wizard, which prompts you to select the parent virtual hard disk file. Depending on how the corruption occurred, you may have to select the Ignore ID Mismatch checkbox, shown in the figure below. In my case, the error message specifically mentioned an ID mismatch, so selecting the checkbox was necessary. Once I did, the virtual machine started without issue, and the corrupt virtual hard disk was fixed.
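Conceptually, reconnecting with Ignore ID Mismatch selected amounts to re-pointing the differencing disk at its parent and accepting the parent’s current identifier. The Python sketch below models that with a hypothetical `reconnect` function and `ignore_id_mismatch` flag; it is not how the wizard is implemented. Because a differencing disk stores only the blocks that differ from its parent, overriding the check is presumably safe only when the parent’s data really was left unchanged, as when the disk was mounted but nothing was modified. For scripting, PowerShell’s Set-VHD cmdlet offers `-ParentPath` and `-IgnoreIdMismatch` parameters that appear to be the counterpart of this wizard step; check the documentation for your version of Windows before relying on them.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Disk:
    """Toy disk model (hypothetical; not a real Hyper-V object)."""
    path: str
    data_id: uuid.UUID = field(default_factory=uuid.uuid4)
    parent: Optional["Disk"] = None
    stored_parent_id: Optional[uuid.UUID] = None


class IdMismatchError(Exception):
    """Raised when the chosen parent's ID differs from the ID the child recorded."""


def reconnect(child: Disk, new_parent: Disk, ignore_id_mismatch: bool = False) -> None:
    """Hypothetical analogue of the wizard's Reconnect step."""
    if child.stored_parent_id != new_parent.data_id and not ignore_id_mismatch:
        raise IdMismatchError(f"{child.path} expects a different parent ID than {new_parent.path}")
    child.parent = new_parent
    # Overriding the check means trusting that the parent's data blocks were not
    # really altered; the child only stores blocks that differ from the parent.
    child.stored_parent_id = new_parent.data_id


root = Disk("Client.vhdx")
checkpoint = Disk("Client_checkpoint.avhdx", parent=root, stored_parent_id=root.data_id)
root.data_id = uuid.uuid4()  # identifier changed by a host-side mount

try:
    reconnect(checkpoint, root)
except IdMismatchError as err:
    print("Refused:", err)

reconnect(checkpoint, root, ignore_id_mismatch=True)
print(checkpoint.stored_parent_id == root.data_id)  # True
```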
Obviously, it’s best to prevent disk corruption from occurring in Hyper-V environments in the first place, but if something does go wrong, you may be able to fix the corrupt virtual hard disk by using native tools. Either way, as a best practice, you should avoid directly mounting any virtual hard disk for which checkpoints exist.
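One way to enforce that practice in a lab script is to check, before mounting anything, whether the disk is part of a checkpoint chain. The Python sketch below does this over a hard-coded, hypothetical inventory (`DiskInfo`, `dependents`, and `check_safe_to_mount` are illustrative names). In practice, the parent of each file could be read from the ParentPath property that Hyper-V’s Get-VHD cmdlet reports, but that step is left out here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiskInfo:
    """Hypothetical record: a disk file and the parent path it points to (None for a root disk)."""
    path: str
    parent_path: Optional[str] = None


def dependents(target: str, disks: List[DiskInfo]) -> List[str]:
    """Return every differencing disk that names `target` as its parent."""
    return [d.path for d in disks if d.parent_path == target]


def check_safe_to_mount(target: str, disks: List[DiskInfo]) -> None:
    """Raise if `target` belongs to a checkpoint chain, either as a parent or as a differencing disk."""
    children = dependents(target, disks)
    if children:
        raise RuntimeError(f"{target} has checkpoints built on it: {', '.join(children)}")
    entry = next((d for d in disks if d.path == target), None)
    if entry is not None and entry.parent_path is not None:
        raise RuntimeError(f"{target} is a differencing disk whose parent is {entry.parent_path}")


inventory = [
    DiskInfo("Client.vhdx"),
    DiskInfo("Client_checkpoint.avhdx", parent_path="Client.vhdx"),
    DiskInfo("Lab.vhdx"),
]
check_safe_to_mount("Lab.vhdx", inventory)  # no checkpoints involved, so no exception
try:
    check_safe_to_mount("Client.vhdx", inventory)
except RuntimeError as err:
    print(err)
```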