Over the past couple of days, there's been a really interesting exchange (no pun intended) going on between the Microsoft Exchange team and a few folks from VMware. The topic: Exchange's supportability and stability on VMware installations that include VMware High Availability (HA).
The timeline goes roughly like this:
- VMware releases a document entitled Microsoft Exchange 2010 on VMware Availability and Recovery Options, which includes some guidance that runs counter to published Microsoft support requirements.
- The Microsoft Exchange Team, on their blog, responds by calling VMware out for "unnecessarily increasing storage and maintenance costs and putting customers at risk." The team further indicates that "[i]t is simply reckless for VMware to recommend customers deploy this configuration, while ignoring important Microsoft system requirements and unsupported scenarios."
- On the VMware Virtual Reality blog, Alex Fontana and Scott Salyer, the original authors of the Exchange on VMware Best Practices and availability docs, seemingly point out that the good folks at Microsoft appear to have no clue what vSphere's HA feature actually does behind the scenes.
Let's start at the beginning
Page 10 of the VMware document includes guidance with the title "4.2.2 Example: VMware HA Used with DAG Clustering for Faster Recovery". They even include a nice diagram depicting the exact process that takes place when a DAG member fails and HA takes over and brings the workload up on another ESX host.
There's a catch, though. On Microsoft's Exchange 2010 System Requirements page you'll find the following information under the heading "Microsoft supports Exchange 2010 in production on hardware virtualization software only when all the following conditions are true":
"Microsoft doesn't support combining Exchange high availability solutions (database availability groups (DAGs)) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers. DAGs are supported in hardware virtualization environments provided that the virtualization environment doesn't employ clustered root servers, or the clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server."
Apparently, this guidance from VMware was cause for concern for someone on the Exchange team, who felt it put customers "at risk" and called it "reckless" advice. The Exchange team goes on to say that it is thrilled that customers are virtualizing Exchange in large numbers and that the company is committed to supporting these use cases.
Not thrilled at being called out over what it believes is a misinterpretation of the ESX HA feature's role, VMware used its blog post to explain that HA is really a pretty simplistic "move and reboot" technology that is no different from an administrator performing the migrate and reboot steps manually.
VMware does include the following support caveat on page 18 of the document:
"VMotion, HA, and DRS are not currently supported for MSCS nodes. We've seen lab results and real-world customers combine the functionality successfully, just understand that there are support implications. You can disable HA and DRS in the properties of the affected virtual machines."
So, they know it's unsupported and even warn us about it.
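To make the disagreement concrete, here is a minimal sketch of the Microsoft rule quoted above, applied to a made-up inventory. Everything in it is an assumption for illustration: the `MailboxVM` record, its field names, and the sample VM names are hypothetical and are not part of any VMware or Microsoft API. It simply flags DAG members that the hypervisor is still allowed to fail over or move automatically.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MailboxVM:
    """Hypothetical inventory record; not tied to any real VMware or Exchange API."""
    name: str
    dag_member: bool   # True if the VM is a mailbox server in a DAG
    ha_enabled: bool   # True if VMware HA may restart it on another host
    drs_enabled: bool  # True if DRS may migrate it automatically


def unsupported_reasons(vm: MailboxVM) -> List[str]:
    """Return the reasons a VM falls outside the quoted Microsoft policy."""
    reasons = []
    if vm.dag_member and vm.ha_enabled:
        reasons.append("HA can fail a DAG member over to another host")
    if vm.dag_member and vm.drs_enabled:
        reasons.append("DRS can automatically move a DAG member")
    return reasons


def audit(inventory: List[MailboxVM]) -> Dict[str, List[str]]:
    """Map each offending VM name to its list of reasons."""
    report = {}
    for vm in inventory:
        reasons = unsupported_reasons(vm)
        if reasons:
            report[vm.name] = reasons
    return report


# Made-up example inventory (assumed names, for illustration only).
inventory = [
    MailboxVM("mbx1", dag_member=True, ha_enabled=True, drs_enabled=False),
    MailboxVM("mbx2", dag_member=True, ha_enabled=False, drs_enabled=False),
    MailboxVM("hub1", dag_member=False, ha_enabled=True, drs_enabled=True),
]
report = audit(inventory)
```

In this sketch only `mbx1` is flagged: `mbx2` has HA and DRS disabled, which is the workaround VMware's caveat describes, and `hub1` is not a DAG member, so the quoted policy does not apply to it.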
So, who's right?
Frankly, I think that both sides are right and wrong on different points:
- VMware is probably in the wrong for actively recommending a configuration that Microsoft considers unsupported. The solution will almost certainly work from a technical perspective (and it has been tested), but Microsoft is known for strict support policies around virtualization, and running an unsupported configuration could leave customers unable to get help when they need it. That said, VMware does go out of its way to let customers know that there could be support implications.
- Microsoft is right to reiterate its support policies in order to clear up possible customer confusion around the topic. However, c'mon: HA isn't supported? Really? Why? Rather than simply saying "it's not supported," I'd love for Microsoft to come out and clearly explain the technical reason behind this lack of support. The Exchange team claims that VMware's HA adds cost and complexity, but given how the HA feature actually works, exactly how does it add cost and complexity to the equation?
As for who is more wrong, I think Microsoft is clearly "wronger" here, which is hard for me to say, especially because I really like Exchange 2010 and other Microsoft products in general. So, in this battle, I declare VMware victorious!
Microsoft: If you have solid technical reasons for not supporting HA (not DRS) for DAG members, complete with the test results and numbers to back up the "cost and complexity" claim, we want to hear them! If there is something to the claim, additional information will help customers make much better decisions regarding availability and will go a long way toward helping us understand this stance.