An Enhanced vMotion Compatibility Primer
You may have noticed that, every so often, Intel and AMD release new processors. Each new generation of processors differs from earlier ones through the inclusion of architectural changes and new processor features. As operating systems are deployed to servers that use these new processors, those operating systems make use of the various processor features. In the traditional all-physical server world, this isn’t that critical. You install your operating system and move on. If, however, you’re attempting to mix and match VMware ESX/ESXi host servers with processors spanning generations in a single cluster, problems can arise.
For example, suppose you’ve got a cluster of three Intel Penryn-based VMware hosts and you add a server with an Intel Nehalem-based processor. Unless you take special steps, you’ll be unable to use vMotion to move virtual machines to hosts that have incompatible features. This can be a major hindrance when it comes to ensuring high availability, since some virtual machines cannot be migrated to some hosts.
In order to enable a mixed-processor VMware cluster to support vMotion to any and all hosts, the hosts must be configured to some kind of least common denominator. This is the job of Enhanced vMotion Compatibility (EVC), which allows you to migrate running virtual machines between different generations of processors. Once you’ve enabled the EVC feature in an ESX cluster, each host is configured to present only the CPU features of the selected processor type. By enabling EVC, you ensure CPU compatibility for vMotion even if the actual processor generations differ from host to host. Every virtual machine running in the cluster sees an identical set of CPU features no matter which host it runs on, and that consistency is what makes vMotion possible across the whole cluster.
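To make the "least common denominator" idea concrete, here is a minimal Python sketch. It is only an illustration: the host names and the short feature lists are made up, and real EVC picks from a fixed menu of predefined baselines rather than computing an arbitrary intersection.

```python
# Hypothetical, heavily simplified feature sets; real CPUs expose far more flags.
HOST_FEATURES = {
    "penryn-1": {"sse3", "ssse3", "sse4.1"},
    "penryn-2": {"sse3", "ssse3", "sse4.1"},
    "nehalem-1": {"sse3", "ssse3", "sse4.1", "sse4.2", "popcnt"},
}


def common_baseline(host_features):
    """Return the features that every host supports (the least common denominator)."""
    if not host_features:
        return set()
    return set.intersection(*(set(f) for f in host_features.values()))


def hidden_features(host_features):
    """Map each host to the features that would have to be hidden from its VMs."""
    baseline = common_baseline(host_features)
    return {host: set(feats) - baseline for host, feats in host_features.items()}


baseline = common_baseline(HOST_FEATURES)
masked = hidden_features(HOST_FEATURES)
```

In this toy cluster the baseline is the Penryn feature set, and only the Nehalem host has anything to hide (`sse4.2` and `popcnt`).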
Why is this so important? Let’s get a little more specific. Imagine this scenario: You have a three-host cluster made up of all Westmere-based CPUs. All virtual machines running in this cluster see that the CPU is capable of using the CPU’s AES feature which, according to Wikipedia, has the following function:
“Advanced Encryption Standard (AES) Instruction Set is an extension to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008. The purpose of the instruction set is to improve the speed of applications performing encryption and decryption using the Advanced Encryption Standard (AES), similar to the PadLock engine found in current processors from VIA Technologies.”
Now, suppose you decide to add an additional server you have on hand to the cluster, but this new server has Penryn-based processors, which lack the AES instructions. If you were allowed to vMotion a VM running on the Westmere systems to the Penryn system, the running virtual machine would see a feature simply disappear. From a stability perspective, this would not do good things: software that had already detected AES support would go on issuing instructions the new host cannot execute. As such, there needs to be some mechanism to level the playing field – EVC.
By using EVC, you can phase in new hosts rather than having to replace them all at once in order to maintain architectural consistency. Throughout the process, you can maintain your existing availability mechanisms, such as vMotion, since processor features appear to virtual machines to be consistent between all hosts.
One thing that EVC does not do is enable you to vMotion virtual machines between AMD and Intel processors. At the very least, you must make sure that all of your ESX hosts are using CPUs from the same manufacturer. In any case, vCenter won’t let you mix and match processor vendors in an EVC cluster.
Let’s take a graphical look at what EVC can accomplish for you. Figure 1 below shows three stages of a rolling hardware upgrade:
- In the top picture is a three-server cluster with two Intel Core2-based servers and one Core i7-based server. Because the lowest common denominators are the Xeon Core2 systems, this cluster operates in Xeon Core2 mode so that vMotion will work between all three hosts. EVC hides the Core i7-only features from the virtual machines running in the cluster.
- In the middle picture, one of the two Core2 servers has been replaced with a Core i7 unit. However, because there is still a Xeon Core2 server in the cluster, the cluster still cannot make use of the more advanced i7 processor features if you want vMotion compatibility across all three hosts.
- In the third picture, the last remaining Core2 host has been replaced with an i7-based host, so the cluster’s EVC mode can now be raised to Core i7, since that is the new lowest common denominator. However, you must first power off and then power on each of the virtual machines in the cluster before they will be able to see any new CPU features made available by raising the EVC mode. A reboot of the virtual machine is not sufficient since CPU features are determined at the time a virtual machine is powered on.
Figure 1: How EVC operates
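The three stages in Figure 1, along with the power-cycle requirement, can be modeled in a few lines of Python. This is a toy model rather than anything vCenter actually runs; the `VirtualMachine` class, the `cluster_mode` helper and the VM name are all invented for illustration.

```python
from dataclasses import dataclass

# Ordered oldest to newest. Only the two generations pictured in Figure 1.
MODES = ["Intel Xeon Core2", "Intel Xeon Core i7"]


def cluster_mode(host_generations):
    """Return the newest EVC mode that every host in the list can support."""
    return MODES[min(MODES.index(gen) for gen in host_generations)]


@dataclass
class VirtualMachine:
    name: str
    visible_mode: str  # feature level fixed when the VM was powered on

    def reboot(self, current_cluster_mode):
        # A guest reboot keeps the same virtual hardware, so nothing changes.
        return self.visible_mode

    def power_cycle(self, current_cluster_mode):
        # Power off, then power on: the VM picks up the cluster's current mode.
        self.visible_mode = current_cluster_mode
        return self.visible_mode


CORE2, COREI7 = MODES
STAGES = {
    "top": [CORE2, CORE2, COREI7],
    "middle": [CORE2, COREI7, COREI7],
    "bottom": [COREI7, COREI7, COREI7],
}

vm = VirtualMachine("vm01", visible_mode=cluster_mode(STAGES["top"]))
raised = cluster_mode(STAGES["bottom"])
after_reboot = vm.reboot(raised)            # still sees Core2 features
after_power_cycle = vm.power_cycle(raised)  # now sees Core i7 features
```

The top and middle stages both resolve to Core2 mode; only the bottom stage allows Core i7 mode, and the VM only notices after a full power cycle.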
To access the EVC mode of your cluster, open vCenter and right-click your cluster. From the shortcut menu, choose Edit Settings. This opens the window that you see in Figure 2 below. You’ll note that this cluster is already configured for Intel Xeon 45nm Core 2 EVC mode, which admits any host whose processors provide at least that feature set, including Penryn, Nehalem and Westmere processors.
Figure 2: This cluster is configured for Core 2 mode
By clicking the Change EVC Mode button, you can opt to change the baseline processor feature set that is allowed admission into the cluster. Or, you can choose to disable EVC altogether, the results of which are shown in Figure 3. Note that re-enabling EVC after disabling it might require that you restart virtual machines.
Figure 3: Disabling EVC
On the main EVC page, you also saw a button labeled Current CPUID Details. If you click that button, you get a screen like the one in Figure 4. If you’d like to learn more about the CPUID, here’s a good article on the subject. There was a time in VMware-land when you had to create processor masks by hand to hide features from virtual machines, and you had to understand the CPUID to do it. EVC makes this process a whole lot easier.
Figure 4: CPUID details
Now, getting back to changing the EVC mode: in vSphere 4.1, you’ll have four Intel options from which to choose if your hosts are running Intel processors. Remember, you have to choose the lowest common denominator in order to allow all hosts in the cluster to interoperate. From VMware documentation, here is a look at the features available for each kind of processor:
Intel EVC modes expose the following features:
- Intel Xeon Core2. All features of Intel Core2 CPUs.
- Intel Xeon 45nm Core2. All features of Intel Core2 CPUs and additional CPU features including SSE4.1.
- Intel Xeon Core i7. All features of Intel Core2 CPUs and additional CPU features including SSE4.2 and POPCOUNT.
- Intel Xeon 32nm Core i7. Applies baseline feature set of Intel Xeon 32nm Corei7 (Westmere) processors to all hosts in the cluster. Compared to the Intel Xeon Corei7 mode, this EVC mode exposes additional CPU features including AES and PCLMULQDQ. Intel i3/i5 Xeon Clarkdale Series processors that do not support AESNI and PCLMULQDQ cannot be admitted to EVC modes higher than the Intel Xeon Corei7 mode.
Figure 5: Intel EVC options
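Because each Intel mode builds on the one before it, the list above can be treated as a stack of feature increments. The following Python sketch encodes just the features named in the list and works out the highest mode a given processor could join. The `core2-base` token is a placeholder standing in for "all features of Intel Core2 CPUs," and the function names are my own, not anything from VMware.

```python
# Incremental features per mode, taken from the list above, oldest mode first.
# "core2-base" is a placeholder for "all features of Intel Core2 CPUs".
MODE_INCREMENTS = [
    ("Intel Xeon Core2", {"core2-base"}),
    ("Intel Xeon 45nm Core2", {"SSE4.1"}),
    ("Intel Xeon Core i7", {"SSE4.2", "POPCOUNT"}),
    ("Intel Xeon 32nm Core i7", {"AES", "PCLMULQDQ"}),
]


def cumulative_modes(increments):
    """Return [(mode, full feature set)], each mode including all earlier ones."""
    result, seen = [], set()
    for name, extra in increments:
        seen = seen | extra
        result.append((name, frozenset(seen)))
    return result


def highest_admissible_mode(cpu_features):
    """Return the highest mode whose full feature set the CPU has, or None."""
    cpu = set(cpu_features)
    best = None
    for name, required in cumulative_modes(MODE_INCREMENTS):
        if not required <= cpu:
            break
        best = name
    return best


# A Clarkdale-style part without AES/PCLMULQDQ, as described in the last bullet.
clarkdale_no_aes = {"core2-base", "SSE4.1", "SSE4.2", "POPCOUNT"}
westmere = clarkdale_no_aes | {"AES", "PCLMULQDQ"}
```

With these inputs, the Clarkdale-style processor tops out at Intel Xeon Core i7 mode, while the full Westmere feature set qualifies for Intel Xeon 32nm Core i7 mode.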
If you try to choose an EVC option that won’t work for some of the hosts in the cluster, you’ll get an error message. In Figure 6, you’ll see that my attempt to set the EVC mode to Core i7 status results in three hosts being reported as not being capable of supporting this EVC mode – esx4, esx5, and esx6.
Figure 6: Three hosts don't meet the requirements
Likewise, if I attempt to choose an EVC mode that is too low, I can also have problems with virtual machines that are currently in operation. For example, in Figure 7, you’ll note that, when I attempt to set the EVC mode to Core 2 (Merom) – as opposed to 45 nm Core 2 (Penryn) – I get errors indicating that some running virtual machines are using features from the higher processor level. In order to set the cluster to this lower level, I’d need to shut down any affected virtual machines.
In fact, when I initially established EVC at Westminster, I had to shut down the virtual machines running on the Core i7 (Nehalem) host in order to be able to set that host’s EVC level to something lower. Once that was complete, I set the EVC level, restarted those virtual machines and carried on. Now, instead of a four-host cluster split into a set of three hosts between which vMotion worked and one odd host out, all four servers participate as equals in the cluster and all support vMotion.
Figure 7: Some machines are using advanced processor features
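The two failure cases shown in Figures 6 and 7 boil down to one check in each direction. Here is a rough Python sketch of that logic. It is an assumption-laden model, not vCenter’s actual validation: I’m assuming esx4, esx5 and esx6 are Penryn-class hosts, and the fourth host name (`esx7`) and the VM names are invented.

```python
# EVC modes ordered oldest to newest.
MODES = [
    "Intel Xeon Core2",
    "Intel Xeon 45nm Core2",
    "Intel Xeon Core i7",
    "Intel Xeon 32nm Core i7",
]


def check_mode_change(requested, host_max_mode, running_vm_mode):
    """Return (hosts that cannot support the mode, running VMs that block it).

    host_max_mode:   host name -> newest mode the host's hardware supports
    running_vm_mode: VM name -> mode whose features the VM was powered on with
    """
    level = MODES.index(requested)
    hosts_too_old = sorted(
        host for host, mode in host_max_mode.items() if MODES.index(mode) < level
    )
    vms_to_power_off = sorted(
        vm for vm, mode in running_vm_mode.items() if MODES.index(mode) > level
    )
    return hosts_too_old, vms_to_power_off


# Assumed inventory: three Penryn-class hosts plus one hypothetical Nehalem host.
HOST_MAX_MODE = {
    "esx4": "Intel Xeon 45nm Core2",
    "esx5": "Intel Xeon 45nm Core2",
    "esx6": "Intel Xeon 45nm Core2",
    "esx7": "Intel Xeon Core i7",  # hypothetical name
}
# Hypothetical VMs powered on while the cluster was in 45nm Core2 mode.
RUNNING_VM_MODE = {
    "vm-a": "Intel Xeon 45nm Core2",
    "vm-b": "Intel Xeon 45nm Core2",
}

raise_result = check_mode_change("Intel Xeon Core i7", HOST_MAX_MODE, RUNNING_VM_MODE)
lower_result = check_mode_change("Intel Xeon Core2", HOST_MAX_MODE, RUNNING_VM_MODE)
```

Raising the mode is blocked by the three older hosts; lowering it is blocked by the running virtual machines, which would have to be shut down first.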
For completeness, I also wanted to show you the full list of AMD Opteron-based EVC options. Note in Figure 8 that you’re told that these options are not available since the processors on the systems running in the cluster are Intel processors – not AMD.
AMD EVC modes expose the following features:
- AMD Opteron Generation 1. All features of AMD Opteron Rev. E CPUs.
- AMD Opteron Generation 2. All features of AMD Opteron Generation 1 and additional CPU features including CMPXCHG16B and RDTSCP.
- AMD Opteron Generation 3. All features of AMD Opteron Generation 2 and additional CPU features including SSE4A, MisAlignSSE, POPCOUNT, ABM (LZCNT).
- AMD Opteron Generation 3 (no 3DNow!). Applies baseline feature set of AMD Opteron Generation 3 (Greyhound) processors, with 3DNow! support removed, to all hosts in the cluster. This mode allows you to prepare clusters containing AMD hosts to accept AMD processors without 3DNow! support.
Figure 8: AMD EVC features
There’s more than just CPUID that goes into determining whether or not a vMotion operation will succeed. For example, vCenter also performs a number of other checks, such as determining whether or not the source and destination hosts have shared storage available. If they don’t, the vMotion will fail. EVC is but one component in the process.
I should also point out that EVC will not work 100% of the time. It depends on applications behaving the way that they’re supposed to by using the CPUID instruction, rather than some other mechanism, to determine which CPU features are available. If an application happens to use some other method and is able to get past EVC’s feature-blocking mechanisms, all bets are off: it may try to use a feature that is missing on the destination host and misbehave or crash after a vMotion.
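As a rough illustration of what "behaving the way they’re supposed to" looks like from inside a guest, here is a Python sketch that checks the reported feature flags before choosing a code path. It assumes a Linux guest, where the `flags` line of `/proc/cpuinfo` is populated by the kernel from CPUID; the sample text and function names are made up for the example.

```python
# Trimmed, made-up sample in the style of /proc/cpuinfo on an x86 Linux guest.
SAMPLE_CPUINFO = """\
processor\t: 0
model name\t: Example CPU
flags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt
"""


def parse_cpu_flags(cpuinfo_text):
    """Extract the feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_feature(flag, cpuinfo_path="/proc/cpuinfo"):
    """Check a flag on the running system (Linux only)."""
    with open(cpuinfo_path) as handle:
        return flag in parse_cpu_flags(handle.read())


def pick_aes_backend(flags):
    """Use hardware AES only when the CPU reports it; otherwise fall back."""
    return "hardware-aes" if "aes" in flags else "software-aes"


sample_flags = parse_cpu_flags(SAMPLE_CPUINFO)
backend = pick_aes_backend(sample_flags)
```

An application written this way only ever uses what the (EVC-filtered) CPUID reports, which is exactly what lets EVC keep it safe across hosts.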