Where Does Guest VM HA Fit In Versus Other HA Software? Part 1: About Guest VM HA
By Adam Hawley on Mar 20, 2009
Customers often ask us how Oracle VM’s guest HA / auto-restart functionality is best used alongside other HA technologies available “up the stack,” so I thought it would be useful to share that discussion here over a couple of blog entries. This first entry provides some context and then summarizes how the Guest VM HA / auto-restart feature works in Oracle VM.
Oracle Real Application Clusters (if you are using a database), Oracle Clusterware, and Oracle VM Guest HA are all available to users implementing a highly available environment, so how should you think about the best way to use them in a production enterprise? This is an especially important topic because some vendors, VMware among them, offer only one part of this picture (say, only the guest VM restart features, or only the HA clusterware) and are anxious to position their solution as THE complete solution when that is just not the case. As with many things in IT, it depends on what you are doing.
For context, and for those of you not familiar with these products and with Oracle VM’s Guest HA features, there are some whitepapers on the Oracle Technology Network (OTN) page for Oracle VM: look for the Guest VM HA paper, but also the papers on using Oracle RAC on Oracle VM and on using Oracle Clusterware to make Oracle VM Manager highly available. I won’t go into all the details here, except to summarize that Oracle VM has embedded portions of the OCFS2 clusterware stack into Oracle VM Manager, which in effect turns server pools into HA clusters that automatically restart VMs after a server or VM failure. Since this is sophisticated clusterware and not just the ICMP-based “pingware” that many other virtualization products offer, Oracle VM does an excellent, very deterministic job of detecting true failures and restarting VMs accurately and cleanly, without a lot of guessing as to the status of the VM.
For example, we perform not only network heartbeating but also disk-based heartbeating to enable more robust failure detection. We then use distributed lock management on the storage to make sure there is no chance of data corruption when restarting a VM that has been declared failed. This is more sophisticated than the vast majority of guest VM HA solutions out there, which don’t run a heartbeat on the disk and may perform only basic reserve/release on the storage. Beyond that, the nice thing about the implementation is that it is super easy to make a VM highly available: just check a box. Truly. Yes, the clusterware is there under the covers, but the user creating the VM is not exposed to it, so there are no agents to install and no services to register. Just check the Enable HA box when you create the VM and you are done.
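To make the idea of combining a network heartbeat, a disk heartbeat, and a storage lock a little more concrete, here is a minimal Python sketch. It is purely illustrative and is not how Oracle VM or OCFS2 is actually implemented: the names `Heartbeats`, `classify`, and `try_restart`, the 30-second timeout, and the simple ownership dictionary standing in for a distributed lock manager are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Heartbeats:
    """Last time (in seconds) each kind of heartbeat was seen from a server."""
    last_network: float
    last_disk: float


def classify(hb: Heartbeats, now: float, timeout: float = 30.0) -> str:
    """Classify a server using both heartbeats (hypothetical policy)."""
    network_ok = (now - hb.last_network) <= timeout
    disk_ok = (now - hb.last_disk) <= timeout
    if network_ok and disk_ok:
        return "healthy"
    if disk_ok:
        # Unreachable on the network but still writing to shared storage:
        # a ping-only monitor would wrongly call this server dead.
        return "network-isolated"
    if network_ok:
        return "storage-isolated"
    return "failed"


def try_restart(vm, target, owners, heartbeats, now, timeout=30.0) -> bool:
    """Restart `vm` on `target` only if its current owner is truly failed.

    `owners` maps VM name -> owning server and is a toy stand-in for a
    distributed lock on the VM's storage (an assumption of this sketch).
    """
    owner = owners.get(vm)
    if owner is not None and classify(heartbeats[owner], now, timeout) != "failed":
        return False  # owner may still be writing; restarting risks corruption
    owners[vm] = target  # take ownership, then the VM can be started safely
    return True
```

In this sketch, a server that has dropped off the network but is still heartbeating to disk is not treated as failed, so its VMs are not restarted elsewhere while it may still be writing to their storage. That is the kind of guesswork a ping-only approach cannot avoid.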
Over the course of a couple of additional blog entries, we’ll walk through some considerations to help you decide which techniques provide the best total solution in your environment. Luckily, the picture is pretty clear, since each product comes with its own distinct set of considerations. Yes, we are Oracle, so of course we’ll speak to some points that are specific to the database, but most of this applies generically to any workload. The next entry in this HA series will be about RAC and Guest VM HA and should come out in the next few days, so keep an eye out for that.