Friday Apr 17, 2009

Oracle VM Blog: 64bit RAC is Now Certified!

I'm sure many of you are interested to know that 64-bit Oracle RAC is now certified on Oracle VM, in addition to the previously certified 32-bit RAC.

As a result, the RAC-on-Oracle VM whitepaper has been updated to reflect this as well as generally updated to reflect the latest in best practices and configuration guidelines. The updated paper can be found here:

Note that you can also navigate to it from the Oracle VM OTN whitepapers page here...

Some of the new content in the updated whitepaper:

  • Support for 64-bit Oracle RAC on Oracle VM

  • Support for CPU over-committing

  • Oracle RAC versus the OVM HA feature

  • Detailed VCPU allocation rules and examples

  • More info on supported storage for RAC in an OVM environment

  • Setting diagwait and how to prevent clock skewing

  • OCFS2 in the guest domain example

    RAC on Oracle VM is of high interest amongst our customers as they see it as a way to improve development and testing (create a multi-node RAC on a single physical server for development and functional verification), but also as a way to insulate the RAC nodes from the specifics of the underlying hardware to allow more flexibility for that hardware to change over time without requiring a change to the RAC configuration.

    Friday Apr 03, 2009

    Oracle VM: Part 3 - Where Does Guest VM HA Fit-in Versus Other HA Software: HA Cluster Software and Guest VM HA

    This is part 3 of my series on HA techniques up-and-down the stack and how they relate to use of Oracle VM’s Guest VM HA features. This installment talks about HA cluster software or “clusterware”. For databases, of course, Oracle RAC is the ultimate, but what about HA for database or other workloads where you might be using clusterware? Even with databases, you may not require continuous availability or you may not require the ability to support a workload that is greater than the capacity of a single physical server. In that case, a fail-over based model where there may be some very short outage before automatic resumption of service is probably acceptable, and clusterware is probably a good solution. But, wait, doesn’t it seem like HA implemented at the guest VM level would work in this situation as well? The answer is yes, but the two are not identical in capabilities, so let’s examine the differences a bit.

    HA clusterware software runs inside the guest itself while guest VM HA solutions from the major vendors, including Oracle, execute outside the VM itself. In other words, the HA clusterware is generally application- or application-service aware: it knows what is running, maybe even down to the process level, and can monitor each individual registered service. As a result, it has the advantage that it can do things like selectively restart specific services without requiring a restart of the node. It is in a better position to do hang detection more quickly and to potentially resolve issues at a finer-grained level: why reboot the whole machine if, say, the OS is healthy, but the web server is hung for some reason? Much better to just restart the web server. From the outside, it is very hard to detect even an OS hang consistently (a node may appear to be “running” when, in reality, it has ceased performing productive work and would need to be restarted). And it is essentially impossible to detect the hang of one individual service or application without some specific, intrusive integration. But clusterware like Oracle’s generally has sophisticated hang detection capability to permit a rapid restoration of service(s). It is this finer-grained “application/service awareness” that is a key strength of clusterware above what guest VM HA restarts from the virtualization layer can provide.
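To make the contrast concrete, here is a minimal sketch in Python of the "narrowest restart" idea described above. It is purely illustrative: the `ServiceStatus` type and the `choose_recovery` function are hypothetical names invented for this example, not part of Oracle Clusterware or Oracle VM, and real clusterware uses far more sophisticated checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceStatus:
    """Hypothetical health record for one registered service."""
    name: str
    responding: bool


def choose_recovery(os_healthy: bool, services: List[ServiceStatus]) -> List[str]:
    """Return recovery actions, preferring the narrowest possible restart."""
    # If the OS itself is unhealthy, only a whole-node restart can help.
    if not os_healthy:
        return ["restart node"]
    # Otherwise restart only the services that have stopped responding.
    return [f"restart service {s.name}" for s in services if not s.responding]
```

With a healthy OS and a hung web server, this sketch restarts only the web server. A monitor sitting outside the VM has no view of individual services, so at best it can make the first, node-level decision.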

    At this point in this series of blog entries, you may be wondering about the value of implementing HA at the virtualization layer if I’m saying that it does not provide continuous availability (like RAC) and it is not explicitly application aware (like clusterware), but the case for guest VM HA is actually quite strong. The reality is that, today anyway, the products available up the stack (from any vendor) typically provide their benefits at the cost of configuration complexity and likely licensing fees* (*not always...see below) beyond the cost of the virtualization layer, and thus those benefits need to be weighed against the costs.

    Many scenarios absolutely justify the cost and effort for implementing these powerful HA solutions for mission-critical applications, but, equally, there are likely a large number of server instances of all types where it is definitely desirable to automatically restart the server/VM instance should it fail (especially at 3am Sunday morning after a night out!), but where you do not want to incur the costs of implementing HA software up the stack (costs of all types…learning/training, configuration/maintenance complexity, support costs, licensing costs, etc). For these scenarios, HA implemented in the virtualization layer is ideal because it will automatically restart failed nodes but has absolutely minimal complexity for the admin creating the virtual machine: typically just checking a box to enable the HA functionality for that VM and you are done. No adding HA agents or setting up HA services or registering applications. It just works.

    *One final note on the more commercial aspects of this for Oracle customers: licensing expense. Oracle is unique amongst virtualization vendors in that we offer enterprise class software not only at the virtualization layer but also a portfolio of software that runs inside the VMs, including key infrastructure like Clusterware. Oracle VM, including Oracle VM Manager, is free: no license expense, so you only pay for annual support. Similarly, Oracle Clusterware is also included in the support fee when you purchase an Unbreakable Support subscription for Enterprise Linux or if you are using the Clusterware to support an Oracle database. This is powerful, enterprise class HA at a bargain price. Not only is there no license fee, but even comparing Oracle’s support pricing for these products with the support pricing for equivalent products from other vendors, you would find this to be incredibly affordable.

    The conclusion in this series is that all of these techniques have a vital role to play and that no one of them eliminates the need for the other despite what other vendors would try to have you believe. In fact, these are solidly complementary techniques that can work very well together to further improve the availability of your stack from top-to-bottom. And an advantage of working with Oracle is that we can work with you across all these options to tailor the best solution for you.

    Thursday Mar 26, 2009

    Part 2: Where Does Guest VM HA Fit-in Versus Other HA Software: RAC and Guest VM HA

    As part 2 in this series of blog entries about HA techniques in a virtualized environment, I’m going to focus on considerations when thinking about using Oracle Real Application Clusters (RAC) with (or without) Oracle VM HA. For databases, Oracle RAC is the ultimate: continuous availability and the ability to handle workloads that are larger than what a single physical server can handle. If you need continuous uptime for your database applications, then you need RAC. Others are claiming that a single instance database with their flavor of clusterware or with their flavor of virtualization is equivalent to RAC. Ummm…No. Sorry: That is a terrible blurring of lines to force-fit the wrong solution. Again, the question is, do you need continuous availability? No one’s guest VM HA or clusterware solution provides continuous availability in a way that is practical for production databases today except RAC. Period*. (I’ll get a comment or two for that statement but let me get to that in a minute). Even if you don’t need continuous availability, do you need to support database workloads greater than a single physical (or virtual) machine could support? If so, again, you need RAC. No one’s clusterware or virtualization-based HA can do that today. So, if faced with these claims just ask two very simple questions:
    1. Will that give me continuous availability?
    2. Will that allow workloads that will scale larger than a single physical node?
    If the answer to either one is “no”, then it is not equivalent to RAC and they are trying to fool you.

    *On that continuous availability thing…OK…so there are some projects and some vendors starting to claim they have a “fault tolerant” or “continuous availability” mode for their virtualization product here or coming soon, but these are nowhere near ready for production use. These appear to be uniprocessor-only, very slow, and unproven in the reliability of their methodology for assuring there will be no corruption…these are simply not ready to host things like databases and not yet even worth talking about for use in the “real world”. Someday, the technology may get there, and when that changes, I’ll blog about that. Also, no flames/comments from the proprietary hardware crowd on how they’ve been doing fault tolerant stuff since before I was born and before wood was invented or whatever. That stuff was/is great, but this blog is about virtualization on industry standard components that are about the only stuff mere mortals can afford these days.

    Now let’s get back to positioning HA techniques in the stack. What about using RAC and HA at the virtualization layer, i.e. with guest VM HA / auto-restart enabled? There is definitely some value here. Virtualization, by definition, abstracts you from many of the constraints of the underlying hardware, meaning it allows you to do things like automatically restarting a node on different, healthy hardware when the original hardware fails. In this scenario, if you have a physical server failure that takes down a node, the RAC database will continue service based on the surviving nodes, while, in parallel, Oracle VM Guest HA will restart the VM (RAC node) hosted on the failed physical server on another, healthy server in the pool. The node can then quickly and automatically rejoin the RAC cluster, bringing it to full capacity faster than any manual process. One thing to keep in mind is that you should specify what is known as “Preferred Server Policies” for your RAC nodes (VMs) in such a way as to prevent two RAC VMs from the same RAC cluster being hosted on a single physical server at any time, to assure the best performance and service levels.
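The placement rule in that last point (never put two VMs from the same RAC cluster on one physical server) can be sketched in a few lines of Python. This is only an illustration of the constraint, not how Oracle VM implements Preferred Server Policies; the function name, the server names, and the `hosted` mapping are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def pick_restart_server(
    cluster: str,
    healthy_servers: List[str],
    hosted: Dict[str, List[str]],
) -> Optional[str]:
    """Pick a healthy server that does not already host a VM of `cluster`.

    `hosted` maps a server name to the cluster names of the VMs it runs.
    Returns None when every healthy server already hosts a node of the cluster.
    """
    for server in healthy_servers:
        if cluster not in hosted.get(server, []):
            return server
    return None
```

For example, if a hypothetical `srv2` already runs a node of the cluster `racdb`, a restart of another `racdb` node would skip `srv2` and land on the next healthy server that has none.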

    So that’s Oracle VM Guest HA and Oracle RAC. In the next installment in this series on HA, we’ll discuss the more general (i.e. non-database specific) case of HA cluster software and how it compares and complements Oracle VM Guest HA.

    And, by the way, if you want to read a whitepaper on RAC and Oracle VM, go here and find it in the "Whitepapers" section.

    Friday Mar 20, 2009

    Where Does Guest VM HA Fit-in Versus Other HA Software? Part 1: About Guest VM HA

    Customers ask us about Oracle VM and its guest HA / auto-restart functionality in the context of how it is best used in relation to other HA technologies available “up the stack” and I thought that it might be useful to share the discussion here over a couple of blog entries. This is the first entry in that series. This installment is about providing some context and then a summary of how the Guest VM HA / auto-restart feature works in Oracle VM.

    Oracle Real Application Clusters (if you are using a database), Oracle Clusterware, and Oracle VM Guest HA are all available choices for users in implementing a highly available environment so how should they think about the best way to leverage these in their production enterprise? This is an especially important topic because some vendors like VMware or others that have only one part of this…say, only the guest VM restart features, or only the HA clusterware…are anxious to position their solution as THE complete solution when that is just not the case. As with many things in IT, it depends on what you are doing.

    For context, and for those of you not familiar with the details on these products/features and Oracle VM’s Guest HA features, there are some whitepapers to look at on the Oracle Technology Network (OTN) page for Oracle VM (look for the Guest VM HA paper but also the paper on using Oracle RAC on Oracle VM and on using Oracle Clusterware to make Oracle VM Manager highly available…). I won’t go into all the details here except to summarize that Oracle VM has embedded portions of the OCFS2 clusterware stack into Oracle VM Manager to basically make the server pools into HA clusters and automatically restart VMs after a server or VM failure. Since this is sophisticated clusterware and not just the ICMP-based “pingware” that many other virtualization products offer, Oracle VM does an excellent, very deterministic job of detecting true failures and restarting accurately and cleanly without a lot of guessing as to the status of the VM.

    For example, we perform not only network heartbeating but also disk-based heartbeating to enable more robust failure detection. And then we do distributed lock management on the storage to make sure there is no chance of data corruption in restarting a VM after declaring it failed. So aside from the fact that this is more sophisticated than the vast majority of guest VM HA solutions out there that don’t run a heartbeat on the disk, and that maybe only perform basic reserve release on the storage, the nice thing about the implementation is that it is super easy to make a VM highly available: just check a box. Truly. Yes, the clusterware is there under the covers, but the user creating the VM is not exposed to that so no agents to install, no services to register…just check the Enable HA box when you create the VM and you are done.
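Why two heartbeat channels help can be shown with a small Python sketch. It is a deliberate simplification and an assumption on my part, not the actual OCFS2 decision logic: the function name, the single shared timeout, and the three states are invented for illustration.

```python
def classify_node(now: float, last_network_beat: float,
                  last_disk_beat: float, timeout: float) -> str:
    """Classify a node from the age of its two most recent heartbeats."""
    network_ok = (now - last_network_beat) <= timeout
    disk_ok = (now - last_disk_beat) <= timeout
    if network_ok and disk_ok:
        return "alive"
    if not network_ok and not disk_ok:
        # Both independent channels are silent: treat the node as failed.
        return "failed"
    # Only one channel is silent: this could be a network or storage path
    # problem rather than a dead node, so it is not declared failed outright.
    return "suspect"
```

A ping-only monitor sees just the first signal, so it cannot tell a dead node from one that merely dropped off the network. The second channel is what removes much of the guessing.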

    Over the course of a couple of additional blog entries, we’ll walk through some considerations to help you decide which techniques provide the best total solution in your environment. Luckily, the trade-offs are pretty clear, with each product having a distinct set of strengths. Yes, we are Oracle, so of course we’ll speak to some considerations that are specific to the database, but most of this applies generically to any workload. The next blog entry in this HA series will be about RAC and Guest VM HA and should come out in the next few days, so keep an eye out for that.

