Options for VMware Virtual Center / vCenter Redundancy
VMware Virtual Center (now called vCenter) is a critical piece of VMware’s Virtual Infrastructure Suite. If you use the VMware Virtual Infrastructure Suite, it is likely that you use Virtual Center / vCenter for ANYTHING related to management of your virtual infrastructure. Many of us have Virtual Center, SQL Server, and the VMware License server all running on a single server. We use VC to administer our guest VMs, check performance, configure high availability and load balancing, and so much more. But what if Virtual Center went down? What would happen to your Virtual Infrastructure and all of your critical guest VMs? Let us find out what would happen and then what you can do to keep VC as highly available as possible.
What happens if Virtual Center goes down?
If you ask VMware (especially a sales rep) what happens if Virtual Center goes down, the short answer you will get is “nothing”. Technically, this is true and is the answer that most Admins are hoping to hear. However, there is a lot more to it than that, as you can imagine. I mean, you use VC for everything related to VI administration, right? Well, if it is no longer there, you certainly cannot do the “stuff” you did before. So, in reality, “something” happens when VC goes down.
Besides being unable to perform all of your common VC tasks when VC is down, what most VMware Admins are concerned about is whether their high availability and load balanced clusters (using VMHA and DRS) will function if VC goes down. I can tell you that both VMHA and DRS will continue to function if VC goes down, although DRS does so only in a limited sense, as explained below.
With VMHA, an agent is placed on each of the ESX hosts that are part of the cluster. This is done so that each host in the cluster can communicate with the other hosts in order to maintain state information and to know when one of the hosts in the ESX cluster fails. If the VC Server goes down, HA will continue to function as normal. The only caveat is that the resource availability information used to determine which ESX host to start guest VMs on, in the event of an ESX host failure, will be based on the information that was available before the VC failure.
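To make that caveat concrete, here is a minimal sketch in Python. It is NOT VMware's actual HA placement logic. The host names, the free-memory figures, and the "most free memory wins" rule are all assumptions made purely for illustration. The point is only that the decision is made from the last snapshot taken before VC went down, not from live data.

```python
def pick_restart_host(last_known_free_mb, failed_host, vm_memory_mb):
    """Pick a host for a VM restart using only the cached (pre-failure) snapshot.

    last_known_free_mb: dict of host name -> free memory in MB, as recorded
    before VC went down (hypothetical data, not read from any VMware API).
    """
    candidates = {
        host: free
        for host, free in last_known_free_mb.items()
        if host != failed_host and free >= vm_memory_mb
    }
    if not candidates:
        return None  # no surviving host had enough room in the snapshot
    # Illustrative rule only: choose the host with the most free memory.
    return max(candidates, key=candidates.get)


# Snapshot taken before the VC failure (made-up numbers).
last_known = {"esx01": 4096, "esx02": 8192, "esx03": 2048}
print(pick_restart_host(last_known, failed_host="esx02", vm_memory_mb=3072))
```

If the real load on esx01 changed after the snapshot was taken, the choice above could be less than ideal, which is exactly the caveat described.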
With DRS and a VC failure, all guest VMs running in the DRS enabled cluster will continue to function as normal. However, no more recommendations for guest reallocation or automatic reallocation of the guest VMs will happen.
Thus, Virtual Center is not a single point of failure for advanced features of VMware’s Virtual Infrastructure.
Assuming you have your License server installed on the same host as the Virtual Center server and that host is down, you cannot issue any new licenses for new hosts or new features. However, all Virtual Center licensed features continue to operate, including advanced features like VMotion and DRS. All ESX Server licensed features will operate for 14 days (even if the host is rebooted). After the grace period, certain ESX Server features, like powering on a guest VM, no longer function.
Still, what is most important is that even without a license server and Virtual Center, all ESX hosts and guests continue to function for 14 days.
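If you want to know exactly how long you have to get the license server back, the arithmetic is simple. The sketch below assumes only the 14-day grace period mentioned above. The function names and the example dates are made up for illustration.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=14)  # grace period described above


def grace_deadline(license_server_lost_at):
    """Moment at which the 14-day grace period runs out."""
    return license_server_lost_at + GRACE_PERIOD


def grace_remaining(license_server_lost_at, now):
    """Time left in the grace period, never negative."""
    remaining = grace_deadline(license_server_lost_at) - now
    return max(remaining, timedelta(0))


# Example dates are hypothetical.
lost = datetime(2009, 3, 1, 8, 0)
print(grace_deadline(lost))                                      # 2009-03-15 08:00:00
print(grace_remaining(lost, datetime(2009, 3, 10, 8, 0)).days)   # 5
```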
What are the options for VMware Virtual Center Redundancy?
Let us say that you are concerned about what would happen if your Virtual Center server goes down. What are your options to improve its reliability and, perhaps, offer some kind of redundancy / fault-tolerance?
1. Install a second Virtual Center Server and use as a “hot-standby”
Most importantly, you cannot run TWO Virtual Center servers at the same time against the same SQL or Oracle backend database. Virtual Center is not cluster aware (at least not in version 3.5, although cluster awareness is rumored for an upcoming release).
What I am proposing here is to:
Have your SQL or Oracle data available by either copying/replicating it to another database server, using Clustering, or just having your database on a different server than the VC server (a server that hasn’t failed).
Install a second Virtual Center server on a different host (perhaps even a virtual machine), create a new system DSN, and point it to the existing database.
Power off that second Virtual Center server.
If the primary server fails, you can power on the secondary VC server.
Another variation of this would be to just do a P2V conversion of the primary VC Server then power off that cloned VC server. In the event that the primary physical VC server fails, you could power on the cloned & virtualized version of it.
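With a hot-standby, someone (or something) has to notice that the primary VC server is down before the secondary is powered on. Below is a minimal sketch of such a check in Python, using only the standard library. The host name vc-primary.example.local is hypothetical, and port 443 is an assumption about where your VC server accepts client connections. Actually powering on the standby is left as a manual step here, because how you do that depends on your environment.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(check_results, threshold=3):
    """True only when the last `threshold` checks all failed.

    Requiring several failures in a row avoids powering on the standby
    because of one dropped packet. Remember that two VC servers must
    never run against the same database at the same time.
    """
    recent = check_results[-threshold:]
    return len(recent) == threshold and not any(recent)


if __name__ == "__main__":
    history = []
    while not should_fail_over(history):
        # Hypothetical host name; port 443 is an assumption.
        history.append(port_is_open("vc-primary.example.local", 443))
        time.sleep(30)
    print("Primary VC looks down: confirm it is really off, then power on the standby.")
```

The threshold of three checks and the 30-second interval are arbitrary choices. Tune them to how quickly you need VC back versus how much you fear a false alarm.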
2. Run Virtual Center inside a Guest VM and put the VM in a VMHA Cluster
Assuming you have at least 2 ESX hosts running in a VMHA cluster already, why not just run Virtual Center in the cluster as a VM? You would install SQL and the license server locally on the Virtual Center VM. If the ESX host that the VC VM is running on fails, VMHA will power the VC VM up on the other ESX host and VC will continue to function.
To me, this seems like a much better option than option 1 because you don’t have to have two hosts (and one of them off) nor do you have to worry about copying/replicating your VC data. If you are already using VMHA for your critical production machines, running Virtual Center in it should be a no-brainer.
3. NeverFail for VirtualCenter
If you are looking for a true high-availability option for Virtual Center, the only option dedicated to the task is NeverFail for VMware Virtual Center. With this third-party product, your Virtual Center software, the SQL database on the VC Server, and the License server will all be replicated to another server. If your primary VC Server fails, NeverFail for VirtualCenter will bring the backup server immediately online. Besides automated failover, NeverFail for VC provides automated failback.
What’s my recommendation?
While no single answer fits the needs of every company, for those who want higher availability for their Virtual Center server, I would first recommend option #2, above. Because VC runs as a guest VM in a VMHA cluster, it will be automatically restarted if the ESX host it is running on fails. For very high availability of VC, I recommend option #3, NeverFail for VMware Virtual Center. However, now that you know that HA, DRS, and licensed features continue to function if VC goes down, and that no highly critical pieces of VMware cease to function, perhaps option #3 is overkill.