Tuesday Apr 21, 2009

Oracle VM High Availability - Hands-on Guide to Implementing Guest VM HA

We just released a new White Paper: Oracle VM High Availability – Hands-on Guide to Implementing Guest VM HA. Guest VM HA functionality provides a powerful, easy-to-manage solution for maximizing uptime for virtually any guest VM workload, without requiring any tailoring inside the VM, making it simple to set up, use, and maintain.

This white paper focuses on best practices for designing and implementing Oracle VM Guest VM High Availability (HA). It complements the previous White Paper: Oracle VM – Creating & Maintaining a Highly Available Environment for Guest VMs, and serves as a practical guide to help customers design the HA environment and experience the benefits of Oracle VM. It provides a step-by-step guide to planning and setting up the Oracle VM environment so you can implement the guest VM HA feature to assure predictable, reliable, and accurate restarting of failed VMs and servers.

To implement HA, you must create a cluster of Virtual Machine Servers in a server pool and have them managed by Oracle VM Manager or Oracle Enterprise Manager Grid Control. Some basic steps include:

1. Installing Oracle VM Server and Manager
2. Creating Shared Storage for the Server Pool
3. Enabling HA for the Server Pool
4. Adding a new Server to the Server Pool
5. Enabling HA for the Virtual Machines

The most important part is to create shared storage for the server pool. You can set up shared storage for the server pool in the following configurations:

* OCFS2 (Oracle Cluster File System) using the iSCSI (Internet SCSI) network protocol
* OCFS2 using SAN (Storage Area Network)
* NFS (Network File System)

The procedures for creating shared storage for HA are essentially the same as those described in the Oracle VM Server User Guide for creating a shared virtual disk for live migration using the storage configurations above. But you have fewer steps to go through when creating shared storage for HA. For example, you don't need to manually modify /etc/fstab to enable HA, since the configuration files are handled automatically by the Oracle VM Server agent when you run the /usr/lib/ovs/ovs-makerepo utility. In addition, the startup of the related cluster services (o2cb) is handled when you run the /usr/lib/ovs/ovs-cluster-configure utility.

A common mistake is a network that is not configured properly, in which case cluster configuration files such as /etc/ocfs2/cluster.conf won't be propagated correctly to each server in the server pool. For example, the loopback address (127.0.0.1) may show up in /etc/ocfs2/cluster.conf on some servers. You should verify your network settings (DNS, routing table, etc.), replace the loopback address with the public IP address for each server, and make sure that the ocfs2 cluster configuration file (/etc/ocfs2/cluster.conf) is the same across all the servers within the same pool.
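As a rough illustration of that check, here is a minimal Python sketch. It assumes you have already collected the text of /etc/ocfs2/cluster.conf from each server by whatever means you prefer, and that node addresses appear on lines of the form `ip_address = ...`. The function names and the sample server names are made up for this example and are not part of Oracle VM.

```python
import hashlib
import ipaddress
import re

# Matches lines such as "ip_address = 192.168.1.10" inside a node stanza.
IP_LINE = re.compile(r"^\s*ip_address\s*=\s*(\S+)\s*$", re.MULTILINE)


def loopback_addresses(conf_text):
    """Return every ip_address value in the text that is a loopback address."""
    found = []
    for value in IP_LINE.findall(conf_text):
        try:
            if ipaddress.ip_address(value).is_loopback:
                found.append(value)
        except ValueError:
            # Not a literal IP address; leave it for manual review.
            continue
    return found


def fingerprint(conf_text):
    """Hash the file contents so copies from different servers can be compared."""
    return hashlib.sha256(conf_text.encode("utf-8")).hexdigest()


def check_pool(configs):
    """configs maps a (hypothetical) server name to its cluster.conf text.

    Returns (servers_with_loopback, all_identical).
    """
    bad = {}
    for name, text in configs.items():
        addrs = loopback_addresses(text)
        if addrs:
            bad[name] = addrs
    identical = len({fingerprint(text) for text in configs.values()}) <= 1
    return bad, identical
```

Calling `check_pool({"vmsrv1": text1, "vmsrv2": text2})` returns the servers whose file contains a loopback address and a flag saying whether all copies match byte for byte. It only reports problems; fixing the network settings and the file is still up to you.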

In summary, Oracle VM Guest VM HA functionality provides the following benefits:

* Auto-restart unexpectedly failed individual VMs on other servers in the server pool;
* Auto-restart all the guest VMs on another server in the server pool when an unexpected physical server failure occurs;
* Powerful cluster-based network and storage heartbeat algorithms quickly and deterministically identify failed and/or isolated servers in the server pool to ensure rapid, accurate recovery;
* Sophisticated distributed lock management functionality for SAN, NFS, NAS, and iSCSI storage ensures VMs or entire servers can be rapidly restarted with no risk of data corruption.

For more information about Oracle VM and how customers are deploying it, please visit
http://oracle.com/virtualization.

Friday Apr 03, 2009

Oracle VM: Part 3 - Where Does Guest VM HA Fit-in Versus Other HA Software: HA Cluster Software and Guest VM HA

This is part 3 of my series on HA techniques up and down the stack and how they relate to the use of Oracle VM’s Guest VM HA features. This installment talks about HA cluster software or “clusterware”. For databases, of course, Oracle RAC is the ultimate, but what about HA for database or other workloads where you might be using clusterware? Even with databases, you may not require continuous availability or the ability to support a workload that is greater than the capacity of a single physical server. In that case, a fail-over based model, where there may be a very short outage before automatic resumption of service, is probably acceptable, and clusterware is probably a good solution. But, wait, doesn’t it seem like HA implemented at the guest VM level would work in this situation as well? The answer is yes, but the two are not totally identical in capabilities, so let’s examine those a bit.

HA clusterware runs inside the guest itself, while guest VM HA solutions from the major vendors, including Oracle, execute outside the VM. In other words, the HA clusterware is generally application- or application-service aware: it knows what is running, maybe even down to the process level, and can monitor each individual registered service. As a result, it has the advantage that it can do things like selectively restart specific services without requiring a restart of the node. It is in a better position to do hang detection more quickly and to potentially resolve issues at a finer-grained level: why reboot the whole machine if, say, the OS is healthy, but the web server is hung for some reason? Much better to just restart the web server. From the outside, it is very hard to detect even an OS hang consistently (a node may appear to be “running” when, in reality, it has ceased performing productive work and would need to be restarted). And it is essentially impossible to detect the hang of one individual service or application without some specific, intrusive integration. But clusterware like Oracle’s generally has sophisticated hang detection capability to permit a rapid restoration of service(s). It is this finer-grained “application/service awareness” that is a key strength of clusterware above what guest VM HA restarts from the virtualization layer can provide.
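To make the granularity difference concrete, here is a toy Python sketch of the choice an in-guest monitor can make. It is purely illustrative and is not how Oracle Clusterware is implemented; the function name, the action labels, and the service names are all hypothetical.

```python
def recovery_action(os_healthy, service_ok):
    """Pick the narrowest recovery step that fits the observed state.

    os_healthy: whether the guest OS itself is responsive.
    service_ok: maps a registered service name to True if it is responding.
    Returns (action, affected_services).
    """
    if not os_healthy:
        # Nothing inside the guest can be trusted, so restart the whole node.
        return ("restart_node", [])
    hung = sorted(name for name, ok in service_ok.items() if not ok)
    if hung:
        # The OS is fine, so restart only the services that stopped responding.
        return ("restart_services", hung)
    return ("none", [])
```

With a healthy OS and a hung web server, `recovery_action(True, {"web": False, "db": True})` picks a service-level restart of `web` only. A monitor outside the VM has no per-service view like `service_ok`, so a whole-VM restart is the only step it can take.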

At this point in this series of blog entries, you may be wondering about the value of implementing HA at the virtualization layer if I’m saying that it does not provide continuous availability (like RAC) and is not explicitly application aware (like clusterware). But the case for guest VM HA is actually quite strong. The reality is that, today anyway, the products available up the stack (from any vendor) typically provide their benefits at the cost of configuration complexity and likely licensing fees* (*not always...see below) beyond the cost of the virtualization layer, and thus those benefits need to be weighed against the costs.

Many scenarios absolutely justify the cost and effort for implementing these powerful HA solutions for mission-critical applications, but, equally, there are likely a large number of server instances of all types where it is definitely desirable to automatically restart the server/VM instance should it fail (especially at 3am Sunday morning after a night out!), but where you do not want to incur the costs of implementing HA software up the stack (costs of all types…learning/training, configuration/maintenance complexity, support costs, licensing costs, etc). For these scenarios, HA implemented in the virtualization layer is ideal because it will automatically restart failed nodes but has absolutely minimal complexity for the admin creating the virtual machine: typically just checking a box to enable the HA functionality for that VM and you are done. No adding HA agents or setting up HA services or registering applications. It just works.

*One final note on the more commercial aspects of this for Oracle customers: licensing expense. Oracle is unique amongst virtualization vendors in that we offer enterprise class software not only at the virtualization layer but also a portfolio of software that runs inside the VMs, including key infrastructure like Clusterware. Oracle VM, including Oracle VM Manager, is free: there is no license expense, so you only pay for annual support. Similarly, Oracle Clusterware is also included in the support fee when you purchase an Unbreakable Support subscription for Enterprise Linux or if you are using the Clusterware to support an Oracle database. This is powerful, enterprise class HA at a bargain price. Not only is there no license fee, but even comparing Oracle’s support pricing for these products with the support pricing for equivalent products from other vendors, you would find this to be incredibly affordable.

The conclusion in this series is that all of these techniques have a vital role to play and that no one of them eliminates the need for the other despite what other vendors would try to have you believe. In fact, these are solidly complementary techniques that can work very well together to further improve the availability of your stack from top-to-bottom. And an advantage of working with Oracle is that we can work with you across all these options to tailor the best solution for you.

Friday Mar 20, 2009

Where Does Guest VM HA Fit-in Versus Other HA Software? Part 1: About Guest VM HA

Customers ask us about Oracle VM and its guest HA / auto-restart functionality in the context of how it is best used in relation to other HA technologies available “up the stack” and I thought that it might be useful to share the discussion here over a couple of blog entries. This is the first entry in that series. This installment is about providing some context and then a summary of how the Guest VM HA / auto-restart feature works in Oracle VM.

Oracle Real Application Clusters (if you are using a database), Oracle Clusterware, and Oracle VM Guest HA are all available choices for users in implementing a highly available environment so how should they think about the best way to leverage these in their production enterprise? This is an especially important topic because some vendors like VMware or others that have only one part of this…say, only the guest VM restart features, or only the HA clusterware…are anxious to position their solution as THE complete solution when that is just not the case. As with many things in IT, it depends on what you are doing.

For context, and for those of you not familiar with the details on these products/features and Oracle VM’s Guest HA features, there are some whitepapers to look at on the Oracle Technology Network (OTN) page for Oracle VM (look for the Guest VM HA paper but also the paper on using Oracle RAC on Oracle VM and on using Oracle Clusterware to make Oracle VM Manager highly available…). I won’t go into all the details here except to summarize that Oracle VM has embedded portions of the OCFS2 clusterware stack into Oracle VM Manager to basically make the server pools into HA clusters and automatically restart VMs after a server or VM failure. Since this is sophisticated clusterware and not just the ICMP-based “pingware” that many other virtualization products offer, Oracle VM does an excellent, very deterministic job of detecting true failures and restarting accurately and cleanly without a lot of guessing as to the status of the VM.

For example, we perform not only network heartbeating but also disk-based heartbeating to enable more robust failure detection. And then we do distributed lock management on the storage to make sure there is no chance of data corruption in restarting a VM after declaring it failed. So aside from the fact that this is more sophisticated than the vast majority of guest VM HA solutions out there that don’t run a heartbeat on the disk, and that maybe only perform basic reserve release on the storage, the nice thing about the implementation is that it is super easy to make a VM highly available: just check a box. Truly. Yes, the clusterware is there under the covers, but the user creating the VM is not exposed to that so no agents to install, no services to register…just check the Enable HA box when you create the VM and you are done.
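Here is a small Python sketch of why a second, disk-based heartbeat helps. It is a toy model, not Oracle VM's actual algorithm: the state names, the `Heartbeats` class, and the 30-second timeout are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Heartbeats:
    last_network: float  # time (seconds) the last network heartbeat was seen
    last_disk: float     # time (seconds) the last heartbeat write was seen on shared storage


def classify(hb, now, timeout=30.0):
    """Classify a server from the age of its two heartbeats (hypothetical states)."""
    net_ok = now - hb.last_network <= timeout
    disk_ok = now - hb.last_disk <= timeout
    if net_ok and disk_ok:
        return "healthy"
    if disk_ok:
        # Silent on the network but still writing to shared storage.
        return "network-isolated"
    if net_ok:
        # Reachable on the network but no longer writing to shared storage.
        return "storage-isolated"
    return "failed"
```

A ping-only check would see the "network-isolated" and "failed" cases as the same thing, which is where the guessing comes from. With the disk heartbeat you can tell that an unreachable server is still touching shared storage, which is exactly the case where restarting its VMs elsewhere without proper locking could corrupt data.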

Over the course of a couple additional blog entries, we’ll walk through some considerations to help you decide which techniques provide the best total solution in your environment. Luckily, the considerations are pretty clear, with each product having a distinct set of considerations. Yes, we are Oracle, so of course we’ll speak to some considerations that are specific to the database, but most of this applies generically to any workload. The next blog entry in this HA series will be about RAC and Guest VM HA and should come out in the next few days so keep an eye out for that.
