Setting Up Failover Clustering for Hyper-V (Part 1)

Introduction

Over the last several years, server virtualization technology has completely transformed IT. In spite of all that this remarkable technology has accomplished, there has always been one major disadvantage to server virtualization. To put it bluntly, virtualized environments raise the stakes on hardware failure.

Think about it for a moment. In a non-virtualized environment, if a server experienced a catastrophic hardware failure, it would most likely be an inconvenience. Sure, it would be important to fix the problem promptly, but the failure of a single server probably wouldn’t bring an organization’s business to a grinding halt. Even if the failed server was running a mission-critical application, end users would still be able to access other network resources such as E-mail, file shares, and other applications while the problem was being fixed.

Virtual servers play by different rules though. Multiple virtual servers typically reside on a single host machine. As such, if the host were to experience a hardware failure then the end result would be equivalent to the failure of multiple machines in a non-virtualized environment.

Of course, the perils of hosting multiple VMs on a single server can come into play even when no hardware failure has occurred. For example, I recently heard of someone who was running twelve virtual machines on a single host server. The administrator decided that the host server needed more memory, but found it very difficult to schedule a window in which all twelve virtual machines could be taken offline at once so that the memory could be installed.

Thankfully, Microsoft has designed Windows Server 2008 R2 in a way that allows Hyper-V to reside within a failover cluster. Failover clustering for Hyper-V can help to address all of the issues that I have described so far.

Best of all, a new feature called Live Migration makes it possible to move running virtual machines between cluster nodes without taking them offline. That way, if you need to take a host server offline for maintenance, you can simply move all of its virtual machines to a different cluster node and perform the maintenance without having to schedule any downtime.

Even though Windows Server 2008 R2 has been around for a while, I have to confess that until recently I had never had the chance to experiment with failover clustering for Hyper-V or with the Live Migration feature. That changed when one of my clients asked me to implement failover clustering so that they could take advantage of Live Migration. In doing so, I discovered that even though the setup and configuration process is relatively straightforward, there are a few gotchas. I also found that several of the tutorials on the Internet are either inaccurate or incomplete. As such, I wanted to write this article series to give the IT community an easy-to-follow guide to the configuration process.

Hardware Planning

Before you can even begin building a failover cluster, you must ensure that you have the proper hardware in place. One thing that I need to point out up front is that Microsoft will not support Hyper-V failover clustering unless all of the hardware that you use has been certified for use with Windows Server 2008 R2. This doesn’t mean that non-certified hardware won’t work. When I first set out to build a Hyper-V failover cluster in a lab environment, I used low-budget commodity hardware that was most certainly not certified for use with Windows Server 2008 R2, and my cluster worked fine. However, I would never condone the use of such hardware in a production environment, because doing so would leave your cluster in an unsupported state.

Keep in mind that it isn’t enough to simply purchase servers that have been certified for use with Windows Server 2008 R2. The Microsoft Web site specifically says that “Microsoft supports a failover cluster solution only if all the hardware features are marked as Certified for Windows Server 2008 R2”. The site goes on to indicate that even the network adapters that you use must be certified. As such, you could inadvertently put your cluster into an unsupported state unless you are careful to purchase only certified hardware. Microsoft publishes its list of certified hardware in the Windows Server Catalog.
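
If you are tracking a purchase list for your cluster, the “every component must be certified” rule boils down to a simple check: a single uncertified part is enough to make the whole cluster unsupported. The following Python sketch is purely illustrative. The Component record, its fields, and the model names are hypothetical, and the certified_r2 flag is something you would fill in yourself after consulting Microsoft’s list; the code does not query any Microsoft service.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One hardware component in a planned cluster (hypothetical inventory record)."""
    node: str
    kind: str            # e.g. "server", "nic", "storage"
    model: str
    certified_r2: bool   # filled in by hand: listed as Certified for Windows Server 2008 R2?


def uncertified(components):
    """Return every component that would leave the whole cluster unsupported."""
    return [c for c in components if not c.certified_r2]


if __name__ == "__main__":
    # Hypothetical plan: the second node's NIC has not been certified.
    plan = [
        Component("node1", "server", "ExampleServer 100", True),
        Component("node1", "nic", "ExampleNIC A", True),
        Component("node2", "server", "ExampleServer 100", True),
        Component("node2", "nic", "ExampleNIC B", False),
    ]
    problems = uncertified(plan)
    for c in problems:
        print(f"Unsupported: {c.node} {c.kind} {c.model}")
    print("supported" if not problems else "cluster would be unsupported")
```

Note that the check deliberately covers network adapters and other parts, not just the servers themselves, since a certified server with an uncertified NIC still fails the requirement.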

The servers that you will use as cluster nodes don’t have to be exact matches, but they should have similar capabilities. At a minimum, your servers must have 64-bit processors and support hardware-assisted virtualization (Intel VT or AMD-V). Additionally, all of your cluster nodes should use processors from the same manufacturer (all Intel or all AMD). The servers must also support hardware Data Execution Prevention via either the Intel XD bit or the AMD NX bit.
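
The processor requirements fall into two groups: checks that apply to each node on its own (64-bit, hardware-assisted virtualization, hardware DEP) and one check that applies across the cluster (a single processor manufacturer). The Python sketch below expresses that split. It is a planning aid only, under the assumption that you have already gathered each node’s capabilities by hand; the NodeSpec record and its field names are hypothetical and do not correspond to any Windows API.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Processor capabilities of one planned cluster node (hypothetical record)."""
    name: str
    cpu_vendor: str          # "Intel" or "AMD"
    is_64bit: bool
    hw_virtualization: bool  # Intel VT or AMD-V present and enabled
    hw_dep: bool             # Intel XD or AMD NX bit present and enabled


def check_nodes(nodes):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for n in nodes:
        # Per-node requirements.
        if not n.is_64bit:
            problems.append(f"{n.name}: 64-bit processor required")
        if not n.hw_virtualization:
            problems.append(f"{n.name}: hardware-assisted virtualization required")
        if not n.hw_dep:
            problems.append(f"{n.name}: hardware DEP (XD/NX) required")
    # Cluster-wide requirement: one processor manufacturer across all nodes.
    vendors = {n.cpu_vendor for n in nodes}
    if len(vendors) > 1:
        problems.append("mixed processor vendors: " + ", ".join(sorted(vendors)))
    return problems


if __name__ == "__main__":
    nodes = [
        NodeSpec("node1", "Intel", True, True, True),
        NodeSpec("node2", "AMD", True, True, False),
    ]
    for problem in check_nodes(nodes):
        print(problem)
```

With the sample data, the sketch reports two problems for this plan: node2 lacks hardware DEP, and the nodes mix Intel and AMD processors.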

The network requirements for your cluster nodes will vary depending on the type of storage that you are using for your cluster. In this article series, I will be showing you how to use iSCSI storage, but you also have the option of connecting your cluster nodes to a Fibre Channel SAN.

My general recommendation would be to provision each cluster node with three NICs (or two NICs and a Fibre Channel card if you are using Fibre Channel-based storage). The first NIC handles communications between the server (and the virtual machines residing on it) and the rest of the network. The second NIC is reserved for heartbeat communications between the cluster nodes. The third NIC is dedicated to iSCSI communications with the shared storage volume.

In some cases, you may find that your servers simply cannot accommodate three NICs. For example, the hardware that I used for my lab deployment had one built-in NIC and one expansion slot, which I used to install a second NIC. This type of configuration is less than ideal, but if you are forced into using it, then I recommend using one NIC for network access and the other NIC for both heartbeat communications and iSCSI traffic.
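
The NIC recommendations from the last two paragraphs can be summarized as a small mapping from the number of available NICs to traffic roles. The Python sketch below is one way to write that down, assuming iSCSI storage by default. The function name, the role labels, and the NIC names are hypothetical, the “spare” label simply marks NICs that these recommendations do not assign, and refusing single-NIC nodes is an assumption of the sketch rather than something stated above.

```python
def assign_nic_roles(nics, fibre_channel=False):
    """Map NIC names to traffic roles following the article's recommendations (sketch).

    iSCSI storage: network, heartbeat and iSCSI each get a NIC; with only two
    NICs, heartbeat and iSCSI share the second one.
    Fibre Channel storage: storage traffic uses the FC card, so two NICs cover
    network and heartbeat.
    NICs beyond those needed are labelled "spare".
    """
    if len(nics) < 2:
        # Assumption of this sketch: single-NIC nodes are not handled.
        raise ValueError("this sketch expects at least two NICs per node")
    if fibre_channel:
        needed = ["network", "heartbeat"]
    else:
        needed = ["network", "heartbeat", "iscsi"]
    roles = {}
    if len(nics) >= len(needed):
        # One dedicated NIC per role; any extras are left as spares.
        for nic, role in zip(nics, needed):
            roles[nic] = [role]
        for nic in nics[len(needed):]:
            roles[nic] = ["spare"]
    else:
        # Only reached for iSCSI with exactly two NICs: the fallback layout.
        roles[nics[0]] = ["network"]
        roles[nics[1]] = ["heartbeat", "iscsi"]
    return roles


if __name__ == "__main__":
    print(assign_nic_roles(["NIC1", "NIC2"]))
    print(assign_nic_roles(["NIC1", "NIC2", "NIC3", "NIC4"]))
```

For a two-NIC iSCSI node like my lab hardware, the sketch keeps network access on its own NIC and puts heartbeat and iSCSI traffic together on the second one, which is the fallback layout described above.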

It is also possible that your servers have room for additional NICs, which can be useful in this type of environment. For instance, you might dedicate one of the additional NICs to managing the physical server, keeping virtual machine traffic on a separate NIC. Likewise, you may be able to dedicate a separate NIC to each virtual machine. Another approach is to use spare NICs to provide a degree of redundancy. I will not be covering the use of more than three NICs per server in this article, but you should at least be aware that you can use additional NICs if they are available to you.

Conclusion

Now that I have talked about some basic hardware requirements, it is time to start the construction of our failover cluster. In Part 2 of this article series, I will show you a method for setting up shared storage between the cluster nodes.

If you would like to read the other parts of this article series please go to:

Setting Up Failover Clustering for Hyper-V (Part 2)
Setting Up Failover Clustering for Hyper-V (Part 3)
Setting Up Failover Clustering for Hyper-V (Part 4)
Setting Up Failover Clustering for Hyper-V (Part 5)
Setting Up Failover Clustering for Hyper-V (Part 6)
Setting Up Failover Clustering for Hyper-V (Part 7)
Setting Up Failover Clustering for Hyper-V (Part 8)
Setting Up Failover Clustering for Hyper-V (Part 9)