This is part 3 of my series on HA techniques up and down the stack and how they relate to the use of Oracle VM’s Guest VM HA features. This installment talks about HA cluster software, or “clusterware”. For databases, of course, Oracle RAC is the ultimate, but what about HA for database or other workloads where you might be using clusterware? Even with databases, you may not require continuous availability, or you may not need to support a workload that is greater than the capacity of a single physical server. In that case, a fail-over based model, where there may be a very short outage before service resumes automatically, is probably acceptable, and clusterware is probably a good solution. But wait, doesn’t it seem like HA implemented at the guest VM level would work in this situation as well? The answer is yes, but the two are not identical in capabilities, so let’s examine the differences a bit.
HA clusterware runs inside the guest itself, while guest VM HA solutions from the major vendors, including Oracle, execute outside the VM. Because it runs inside the guest, HA clusterware is generally application- or application-service-aware: it knows what is running, maybe even down to the process level, and can monitor each individual registered service. As a result, it has the advantage that it can do things like selectively restart specific services without requiring a restart of the node. It is in a better position to detect hangs quickly and to resolve issues at a finer-grained level: why reboot the whole machine if, say, the OS is healthy but the web server is hung for some reason? Much better to just restart the web server. From the outside, it is very hard to detect even an OS hang consistently (a node may appear to be “running” when, in reality, it has ceased performing productive work and needs to be restarted). And it is essentially impossible to detect the hang of one individual service or application without some specific, intrusive integration. Clusterware like Oracle’s, by contrast, generally has sophisticated hang detection capability that permits a rapid restoration of service(s). This finer-grained “application/service awareness” is the key strength of clusterware over what guest VM HA restarts from the virtualization layer can provide.
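To make the “restart the web server, not the machine” idea concrete, here is a minimal sketch of a service-aware monitor in Python. It is purely illustrative and is not how Oracle Clusterware is implemented: the `Service` and `Monitor` classes, the `check` and `restart` callbacks, and the `max_restarts` threshold are all hypothetical names I am assuming for the example. The point is only that a monitor running inside the guest can probe each registered service on its own, restart just the one that is hung, and escalate to a fail-over only when restarts keep failing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Service:
    """A registered service (hypothetical model, not a real clusterware API)."""
    name: str
    check: Callable[[], bool]     # returns True when the service responds
    restart: Callable[[], None]   # restarts only this service, not the node
    failures: int = 0             # consecutive failed health checks


@dataclass
class Monitor:
    services: List[Service]
    max_restarts: int = 3         # assumed threshold before giving up locally
    log: List[str] = field(default_factory=list)

    def poll_once(self) -> str:
        """Check every service once; return 'ok', 'restarted' or 'failover'."""
        outcome = "ok"
        for svc in self.services:
            if svc.check():
                svc.failures = 0
                continue
            svc.failures += 1
            if svc.failures > self.max_restarts:
                # Local restarts did not help: escalate to a node fail-over.
                self.log.append(f"failover: {svc.name}")
                return "failover"
            # Fine-grained recovery: restart just the unhealthy service.
            svc.restart()
            self.log.append(f"restart: {svc.name}")
            outcome = "restarted"
        return outcome


# Usage: a hung web server is restarted while the database is left alone.
web_state = {"hung": True}
web = Service("web", lambda: not web_state["hung"],
              lambda: web_state.update(hung=False))
db = Service("db", lambda: True, lambda: None)
monitor = Monitor([web, db])
first = monitor.poll_once()    # web is restarted, db untouched
second = monitor.poll_once()   # everything healthy again
```

A monitor sitting outside the VM has no equivalent of `check` for an individual service, which is exactly the limitation described above.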
At this point in this series of blog entries, you may be wondering about the value of implementing HA at the virtualization layer if I’m saying that it does not provide continuous availability (like RAC) and is not explicitly application aware (like clusterware). The case for guest VM HA is actually quite strong. The reality is that, today anyway, the products available up the stack (from any vendor) typically provide their benefits at the cost of configuration complexity and likely licensing fees* (*not always...see below) beyond the cost of the virtualization layer, and thus those benefits need to be weighed against the costs.
Many scenarios absolutely justify the cost and effort of implementing these powerful HA solutions for mission-critical applications. Equally, there are likely a large number of server instances of all types where it is definitely desirable to automatically restart the server/VM instance should it fail (especially at 3am on a Sunday morning after a night out!), but where you do not want to incur the costs of implementing HA software up the stack (costs of all types: learning/training, configuration/maintenance complexity, support costs, licensing costs, etc.). For these scenarios, HA implemented in the virtualization layer is ideal because it will automatically restart failed nodes with absolutely minimal complexity for the admin creating the virtual machine: typically you just check a box to enable the HA functionality for that VM and you are done. No adding HA agents, setting up HA services, or registering applications. It just works.
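For contrast, the virtualization-layer model can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions and not the Oracle VM API: the `VM` class, the `ha_enabled` flag (standing in for the checkbox), and the `recover` function with its simple round-robin placement are hypothetical. A real product would also account for things like available capacity on the surviving servers. Notice that nothing here knows what is running inside the VM; the only per-VM setting is the flag.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VM:
    """A guest VM as seen from outside (hypothetical model)."""
    name: str
    ha_enabled: bool   # the "checkbox": restart this VM automatically?
    host: str          # physical server the VM currently runs on


def recover(vms: List[VM], failed_host: str,
            healthy_hosts: List[str]) -> Dict[str, str]:
    """Restart HA-enabled VMs from a failed server on the remaining servers.

    Placement is a plain round-robin, assumed here for simplicity.
    Returns a mapping of VM name to the server it was restarted on.
    """
    placements: Dict[str, str] = {}
    if not healthy_hosts:
        return placements
    next_slot = 0
    for vm in vms:
        if vm.host == failed_host and vm.ha_enabled:
            target = healthy_hosts[next_slot % len(healthy_hosts)]
            vm.host = target
            placements[vm.name] = target
            next_slot += 1
    return placements


# Usage: server "s1" fails; only the VM with HA enabled is restarted elsewhere.
vms = [
    VM("web", ha_enabled=True, host="s1"),
    VM("scratch", ha_enabled=False, host="s1"),
    VM("db", ha_enabled=True, host="s2"),
]
placements = recover(vms, failed_host="s1", healthy_hosts=["s2", "s3"])
```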
*One final note on the more commercial aspects of this for Oracle customers: licensing expense. Oracle is unique amongst virtualization vendors in that we offer enterprise class software not only at the virtualization layer but also as a portfolio of software that runs inside the VMs, including key infrastructure like Clusterware. Oracle VM, including Oracle VM Manager, is free: there is no license expense, so you only pay for annual support. Similarly, Oracle Clusterware is included in the support fee when you purchase an Unbreakable Support subscription for Enterprise Linux or if you are using the Clusterware to support an Oracle database. This is powerful, enterprise class HA at a bargain price. Not only is there no license fee, but if you compare Oracle’s support pricing for these products with the support pricing for equivalent products from other vendors, you will find this to be incredibly affordable.
The conclusion of this series is that all of these techniques have a vital role to play and that no one of them eliminates the need for the others, despite what other vendors would try to have you believe. In fact, these are solidly complementary techniques that can work very well together to further improve the availability of your stack from top to bottom. And an advantage of working with Oracle is that we can work with you across all these options to tailor the best solution for you.