Thursday Jun 25, 2009

Single-node clusters, for disaster recovery and more...

At first sight, a single-node cluster may seem to be a pointless thing. After all, what sort of high availability can you get from one node? :)

That might be a valid point if HA were the only consideration, but there are quite a few other ways in which single-node clusters can be useful. Two of the most useful are:

  • As part of a Geographic Edition (SCGE) Disaster Recovery (DR) configuration.
  • For development and test.

Disaster Recovery with SCGE

SCGE allows two clusters, separated by enough distance that a disaster at one site will not affect the other, to be managed together. Several data replication products (AVS, SRDF, Oracle DataGuard, etc.) can be managed within this two-cluster partnership to ensure that the DR site has up-to-date information, ready to take over service.

Obviously this configuration requires two clusters, but if we assume that the DR site will be needed only in the (hopefully rare) instance of a disaster, and probably occasionally during maintenance of the primary, there is no need for it to be an exact copy of the primary site. In fact, it can be a single node. All that is required is that it be running Solaris Cluster software, i.e. it can be a Single-Node Cluster.

Carrying this idea further, it is also fully supported to have single-node clusters at both primary and secondary sites.

As I mentioned at the start, this won't give very much in the way of High Availability in the event of a local primary-site server failure, but you may not need that. Strange though it might seem at first glance, HA isn't a prerequisite for DR; whether you need it depends entirely on your business continuity needs (and that's a subject for a future blog entry).

No special tricks or configurations are required for this; it just works “out of the box”. With two single-node clusters and AVS (SNDR) replication between them, you have a fully-supported Disaster Recovery configuration, implemented with no special additional hardware. Larger sites with external storage arrays and replication also work just as well with a single-node cluster as with a multi-node configuration.

Development

Another place where a single-node cluster can be really useful is when developing cluster-based software, especially cluster agents. With the support for Solaris Containers (aka zones) that was added in Solaris Cluster 3.2, this has become even easier.

Fully testing a cluster agent requires that you simulate failures, such as system crashes or disconnections, and ensure that the agent reacts correctly. This is also true when testing that a given application operates correctly in a cluster environment. It's not something that you'd normally want to do on your desktop. However, providing an extra pair of systems in a cluster as lab test equipment for each developer is costly, and takes up valuable lab space and energy.

The solution? A single-node cluster, with some zones configured. With Solaris Cluster 3.2 you can specify zones (in the format nodename:zonename) in the nodelist of an application resource group; see the clrg(1CL) man page. The cluster software, running in the global zone, manages those applications just as if they were on separate physical nodes. You can request that the resource groups be switched between zones, or even crash or halt zones to test that automatic recovery is performed correctly, all without leaving your desk or rebooting your development system.
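To make the zone-based test loop a little more concrete, here is a toy Python model. It is not Solaris Cluster code: the class `ToyResourceGroup`, its methods, and the "fail over to the next running nodelist entry" policy are all hypothetical simplifications, and the real Resource Group Manager's placement rules are richer (see clrg(1CL)). It only shows the shape of the test: zones named as nodename:zonename act as separate failover targets on one physical host.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


def parse_node(entry: str) -> tuple[str, str]:
    """Split a 'nodename:zonename' nodelist entry; a bare nodename means the global zone."""
    node, _, zone = entry.partition(":")
    return node, zone or "global"


@dataclass
class ToyResourceGroup:
    """Hypothetical stand-in for a resource group whose nodelist names zones on one host."""

    name: str
    nodelist: list[str]
    running: set[str] = field(init=False)         # zones currently booted
    online_on: Optional[str] = field(init=False)  # zone hosting the group, if any

    def __post_init__(self) -> None:
        # Assume every listed zone is booted and the group starts on the first entry.
        self.running = set(self.nodelist)
        self.online_on = self.nodelist[0] if self.nodelist else None

    def switch(self, target: str) -> None:
        """Mimic a manual switchover request to a named zone."""
        if target not in self.nodelist:
            raise ValueError(f"{target} is not in the nodelist of {self.name}")
        if target not in self.running:
            raise RuntimeError(f"{target} is not running")
        self.online_on = target

    def halt_zone(self, zone: str) -> None:
        """Simulate halting a zone; if it hosted the group, fail over in nodelist order."""
        self.running.discard(zone)
        if self.online_on == zone:
            survivors = [z for z in self.nodelist if z in self.running]
            self.online_on = survivors[0] if survivors else None
```

For example, with the made-up host and zone names `ToyResourceGroup("app-rg", ["devhost:zoneA", "devhost:zoneB"])`, calling `switch("devhost:zoneB")` and then `halt_zone("devhost:zoneB")` leaves the group online on `"devhost:zoneA"`, mirroring the switch-then-crash test described above.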

And for my next trick...

I hope that's given a brief idea of what can be done today with a single node. What might the future hold? Well, people will jump on me if I promise anything, but I really like some of the ideas that the Open HA Cluster guys have been demonstrating. Take a look at Thorsten's whitepaper if you want to try clustering VirtualBox systems - on your laptop!

As always, join us at http://www.opensolaris.org/os/community/ha-clusters/ to discuss this or any other cluster topics.


Steve McKinty
SCGE Architect


Wednesday Feb 25, 2009

Disaster Recovery Protection Options for Oracle with Sun Cluster Geographic Edition 01/09

With the announcement of Solaris Cluster 3.2 01/09 comes a new version of Sun Cluster Geographic Edition (SCGE). Among the features delivered in this release is support for replication of Oracle RAC databases using Oracle Data Guard. So it seems like a good opportunity to summarise the ways you can protect your Oracle infrastructure against disaster using the replication support provided by Sun Cluster Geographic Edition.

I'll start by breaking the discussion into two halves: the first covering deployments that use highly available (HA) Oracle, the second covering Oracle Real Application Clusters (RAC). Additionally, I'll reiterate the replication technologies that SCGE supports, namely: EMC Symmetrix Remote Data Facility (SRDF), Hitachi TrueCopy, Sun StorageTek Availability Suite (AVS) and, last but not least, Oracle Data Guard (ODG). One final point: SCGE supports SRDF/A for takeover operations only.

HA Oracle Deployments

HA-Oracle deployments are found in environments where a cost/benefit analysis has shown that you are prepared to accept the longer outages involved in switching over or failing over an Oracle database, rather than pay the additional licensing costs of the near-continuous service that Oracle RAC can offer.

Deployments of HA-Oracle can be on a file system, UFS or VxFS (stay tuned for ZFS support), or on raw disk with or without a volume manager: Solaris Volume Manager (SVM) or Veritas Volume Manager (VxVM). Why not Oracle Automatic Storage Management (ASM), you might ask? Well, while ASM is indeed supported on Oracle RAC, it poses problems when employed in a failover environment: either the ASM instance or just the disk groups used by the dependent databases must be failed over. These requirements currently preclude ASM from being supportable. Are we working on this? Of course we are!

So this gives us a set of storage deployment options that must be married with the replication options that SCGE supports and with any restrictions that may come into play when deploying HA-Oracle in a Solaris Container (a.k.a. zone).

Coverage is extensive: Oracle 9i, 10g and 11g are supported on file systems (UFS or VxFS) or raw devices, with or without containers and with or without VxVM, using either AVS, SRDF or TrueCopy. In contrast, SVM restricts the replication technology support to AVS only.

Why isn't Oracle Data Guard supported here, especially given that it's one of the new replication modules in SCGE 3.2 01/09? The answer lies in the use of the Oracle Data Guard broker as the interface for controlling replication. Unfortunately, the ODG broker stores a physical host name in its configuration files, and after a failover this no longer matches the name of the new host, which invalidates the configuration. Consequently, Oracle does not support the ODG broker on 'cold failover' database implementations, even where this host name change could be avoided, say by putting the database into a Solaris Container.
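The HA-Oracle rules above are easier to scan as a lookup. The following Python function is only a sketch of the combinations as this post describes them, not an official support matrix: the string keys ("ufs", "truecopy", "srdf/a" and so on) are invented labels, and a real deployment should be checked against the current Sun support documentation.

```python
# Hypothetical encoding of the HA-Oracle + SCGE rules described in this post.
# Labels such as "ufs" or "truecopy" are invented keys, not product identifiers.
HA_ORACLE_VERSIONS = {"9i", "10g", "11g"}
HA_ORACLE_STORAGE = {"ufs", "vxfs", "raw"}   # no ZFS or ASM at the time of writing
VOLUME_MANAGERS = {None, "svm", "vxvm"}      # None means no volume manager
HA_ORACLE_REPLICATION = {"avs", "srdf", "srdf/a", "truecopy"}


def check_ha_oracle(version, storage, volume_manager, replication):
    """Return (supported, reason). Running in a Solaris Container does not change the answer."""
    if version not in HA_ORACLE_VERSIONS:
        return False, f"Oracle {version} is not covered"
    if storage not in HA_ORACLE_STORAGE:
        return False, f"{storage} is not a supported HA-Oracle storage option"
    if volume_manager not in VOLUME_MANAGERS:
        return False, f"unknown volume manager {volume_manager}"
    if replication == "odg":
        return False, "the ODG broker is not supported for cold-failover databases"
    if replication not in HA_ORACLE_REPLICATION:
        return False, f"unknown replication type {replication}"
    if volume_manager == "svm" and replication != "avs":
        return False, "SVM restricts replication to AVS"
    return True, "supported"


def allowed_operations(replication):
    """Per this post, SCGE supports SRDF/A for takeover only; assume others allow both."""
    if replication == "srdf/a":
        return {"takeover"}
    return {"switchover", "takeover"}
```

For instance, `check_ha_oracle("10g", "vxfs", "vxvm", "srdf")` returns `(True, "supported")`, while the same database on SVM with TrueCopy is rejected because SVM limits replication to AVS.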

Oracle RAC Deployments

With HA-Oracle options covered I'll now turn to Oracle RAC. As you will no doubt know from reading "Solaris Cluster 3.2 Software: Making Oracle Database 10G R2 and 11G RAC Even More Unbreakable", Solaris Cluster brings a number of additional benefits to Oracle RAC deployments including support for shared QFS as a means of storing the Oracle data files. So now you'll need to know what deployment options exist when you include SCGE in your architecture.

As we're still working on adding Solaris Container Cluster support to SCGE there is currently no support for Oracle RAC, or indeed any other data service, using this virtualisation technique.

Furthermore, I should remind you that AVS is not an option for any Oracle RAC deployment simply because it cannot intercept the writes coming from more than one node simultaneously.

On the positive side, storage replication products such as SRDF and TrueCopy are an option as they intercept writes at a storage array level rather than at the kernel level. These replication technologies are restricted to configurations using raw disk on hardware RAID or raw VxVM/CVM volumes. These storage options can then be used with Oracle 9i, 10g or 11g RAC. For a write-up of just such a configuration, please read EMC's white paper on our joint demonstration at Oracle Open World 2008.
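Why the interception point matters can be shown with a toy model. This Python sketch deliberately simplifies things: it assumes a kernel-level replicator on one host sees only the writes issued by that host, and it ignores write ordering, caching and everything else a real replicator handles. The node names and block numbers are made up.

```python
# Toy model: two RAC nodes write to the same shared volume.
# Node names and block numbers are made up for illustration.
WRITES = [("node1", 10), ("node2", 11), ("node1", 12), ("node2", 13)]


def host_based_view(writes, host):
    """Blocks a kernel-level replicator running on `host` would capture."""
    return {block for node, block in writes if node == host}


def array_based_view(writes):
    """Blocks an array-level replicator captures: every write that reaches the array."""
    return {block for _, block in writes}


def missed_by_host(writes, host):
    """Writes the standby would never receive if only `host` were replicating."""
    return array_based_view(writes) - host_based_view(writes, host)
```

Here `missed_by_host(WRITES, "node1")` is `{11, 13}`: everything node2 wrote never reaches the standby, which is the gap that array-based replication avoids.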

Configurations that use shared QFS or ASM are currently precluded because of the additional steps that must be performed before an SCGE switchover or takeover can take place. Are we looking to address this? Absolutely!

If you want unfettered choice of storage options on Solaris Cluster when replicating Oracle 10g and 11g RAC data, then the new Oracle Data Guard module for SCGE is the answer. You are free to choose whatever combination of raw disk, ASM and shared QFS makes sense to you. You can configure a physical standby partner in either single instance to single instance or dual instance to single instance combinations, i.e. primary site to standby site configurations. All ODG protection modes are supported: maximum performance, maximum availability and maximum protection. Although SCGE can control logical standby configurations, Sun have not yet announced formal support for this feature.
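The Oracle RAC rules can be sketched the same way. Again this is a hypothetical Python encoding of what this post says, not an official matrix; the storage labels are invented, and treating both raw layouts as "raw disk" for ODG is an assumption.

```python
# Hypothetical encoding of the Oracle RAC + SCGE rules described in this post.
# Storage labels are invented keys.
ARRAY_REPLICATION = {"srdf", "truecopy"}
ARRAY_REPLICATED_STORAGE = {"raw-hw-raid", "raw-vxvm-cvm"}
# Assumption: "raw disk" for ODG covers both raw layouts above.
ODG_STORAGE = {"raw-hw-raid", "raw-vxvm-cvm", "asm", "shared-qfs"}


def check_rac(version, storage, replication, container_cluster=False, standby="physical"):
    """Return (supported, reason) for an Oracle RAC + SCGE combination."""
    if container_cluster:
        return False, "SCGE does not yet support Solaris Container Clusters"
    if replication == "avs":
        return False, "AVS cannot intercept writes from more than one node"
    if replication in ARRAY_REPLICATION:
        if version not in {"9i", "10g", "11g"}:
            return False, f"Oracle {version} RAC is not covered"
        if storage not in ARRAY_REPLICATED_STORAGE:
            return False, "array replication needs raw hardware RAID disk or raw VxVM/CVM volumes"
        return True, "supported"
    if replication == "odg":
        if version not in {"10g", "11g"}:
            return False, "the ODG module requires Oracle 10g or 11g RAC"
        if storage not in ODG_STORAGE:
            return False, f"{storage} is not a listed storage option"
        if standby != "physical":
            return False, "logical standby is not yet formally supported"
        return True, "supported"
    return False, f"unknown replication type {replication}"
```

With these rules, 11g RAC on shared QFS is accepted with ODG but rejected with TrueCopy, which is the trade-off the paragraph above describes.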

You've Read The Blog, Now See The Movie....

I hope that gives you a clear picture of how you can use a combination of Solaris Cluster and the various replication technologies that Sun Cluster Geographic Edition supports to create disaster recovery solutions for your Oracle databases. If you would like to see demonstrations of some of these capabilities, please watch the video of an ODG setup and an SRDF configuration on Sun Learning Exchange.

Tim Read
Staff Engineer
Solaris Cluster Engineering

Friday Jun 13, 2008

Oracle Data Guard Replication Support in Sun Cluster Express Geographic Edition

If you read the "Solaris Cluster Express 6/08 available for download" blog posting, you will have noticed that it mentioned that SCX Geographic Edition had been enhanced to support Oracle Data Guard replication. If you run Oracle 10g or 11g RAC in your data centers and need to integrate it into a disaster recovery framework that also supports Sun StorageTek Availability Suite (AVS), Hitachi TrueCopy and EMC SRDF, then read on.

SCX Geographic Edition (SCXGE) is a binary distribution of the open source version of Sun Cluster Geographic Edition (SCGE) that runs on Solaris Cluster Express (SCX) on top of Solaris Express Community Edition (SXCE) build 86. [We love our acronyms and abbreviations]. Trivia: did you know that SUN itself is an acronym that originally stood for Stanford University Networks?

Back to the plot... Services made highly available by SCX resource groups can be grouped together in "protection groups". SCXGE then uses the robust start and stop methods of SCX to ensure that a protection group is migrated in a controlled fashion from a primary location to a standby site. This migration includes performing all the tasks needed to reverse the direction of data flow in the underlying replication technology. The advantage is that SCXGE hides the complexity of the individual replication methods behind a common command line interface. Thus switching a service from site A to site B is reduced to:

# geopg switchover -m newhost_cluster my-protection-pg

regardless of whether the replication was AVS, TC or SRDF based.
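One way to picture how a single `geopg switchover` can hide per-technology differences is a dispatch table of replication "modules" behind one entry point. This Python sketch is hypothetical: the class and function names are invented, the step strings are illustrative descriptions only, and nothing here drives real AVS, TrueCopy or SRDF commands. `newhost_cluster` and `my-protection-pg` come from the command above; `oldhost_cluster` is made up.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


class ReplicationModule(ABC):
    """Hypothetical plug-in interface: one subclass per replication technology."""

    @abstractmethod
    def reverse(self, new_primary: str) -> str:
        """Describe how this technology would reverse the direction of data flow."""


class AVSModule(ReplicationModule):
    def reverse(self, new_primary: str) -> str:
        return f"AVS: reverse the SNDR sets so {new_primary} becomes the source"


class TrueCopyModule(ReplicationModule):
    def reverse(self, new_primary: str) -> str:
        return f"TrueCopy: swap pair roles so {new_primary} holds the primary volumes"


class SRDFModule(ReplicationModule):
    def reverse(self, new_primary: str) -> str:
        return f"SRDF: swap R1/R2 roles so {new_primary} holds the R1 devices"


MODULES = {"avs": AVSModule(), "truecopy": TrueCopyModule(), "srdf": SRDFModule()}


@dataclass
class ProtectionGroup:
    name: str
    replication: str  # key into MODULES
    primary: str      # cluster currently hosting the service
    standby: str


def switchover(pg: ProtectionGroup, new_primary: str) -> list[str]:
    """Same call for every replication type; only the middle step differs."""
    if new_primary == pg.primary:
        return []  # already primary there: nothing to do
    if new_primary != pg.standby:
        raise ValueError(f"{new_primary} is not a partner of {pg.name}")
    steps = [
        f"stop {pg.name} resource groups on {pg.primary}",
        MODULES[pg.replication].reverse(new_primary),
        f"start {pg.name} resource groups on {new_primary}",
    ]
    pg.primary, pg.standby = new_primary, pg.primary
    return steps
```

Calling `switchover(ProtectionGroup("my-protection-pg", "srdf", "oldhost_cluster", "newhost_cluster"), "newhost_cluster")` yields a stop step, an SRDF-specific reversal step and a start step; changing `"srdf"` to `"avs"` changes only the middle step, which is the point of a common interface.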

Support for Oracle Data Guard (ODG) extends SCXGE's capability beyond just software and storage based replication technologies. Internally, the ODG module uses the Oracle Data Guard broker interface which restricts its use to Oracle 10g and 11g RAC configurations only, i.e. no support for HA-Oracle.

So once you have set up your Oracle RAC databases, configured ODG and created your Data Guard broker configuration, you can add it to your SCXGE configuration and have it monitored and controlled just as you do for your other, non-ODG protection groups. The module allows you to have multiple ODG protection groups, each containing one or more ODG broker configurations. Remember that each protection group spans a single SCXGE partnership (a pair of clusters), so each of these broker configurations can contain only one primary and one standby database.
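The one-primary, one-standby constraint can be expressed as a simple validation rule. In this hypothetical Python sketch, a protection group belongs to one partnership of two clusters and holds one or more broker configurations; the classes, fields and the check that the two databases sit on the two partner clusters are assumptions for illustration, not the ODG module's actual logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Database:
    name: str
    role: str     # "primary" or "physical_standby" (hypothetical labels)
    cluster: str  # cluster hosting this database


@dataclass
class BrokerConfig:
    name: str
    databases: list[Database]


@dataclass
class ODGProtectionGroup:
    name: str
    partnership: tuple[str, str]  # the two partner clusters
    configs: list[BrokerConfig] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the group looks consistent."""
        problems = []
        if not self.configs:
            problems.append(f"{self.name}: needs at least one broker configuration")
        for cfg in self.configs:
            primaries = [d for d in cfg.databases if d.role == "primary"]
            standbys = [d for d in cfg.databases if d.role != "primary"]
            if len(primaries) != 1 or len(standbys) != 1:
                problems.append(
                    f"{cfg.name}: expected one primary and one standby, "
                    f"found {len(primaries)} and {len(standbys)}"
                )
                continue
            if {primaries[0].cluster, standbys[0].cluster} != set(self.partnership):
                problems.append(f"{cfg.name}: databases must sit on the partner clusters")
        return problems
```

A broker configuration with one primary on one partner cluster and one physical standby on the other validates cleanly; adding a second standby produces exactly one problem report for that configuration.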

If this has got you interested, feel free to download and install the SCX software bundle with the new ODG module and try it out. Just remember that this is the first build. I'm working on fixing all the bugs that my colleagues in QA discover, so keep an eye on this blog and the Open HA Cluster community for news of any updates.

Although we don't offer support for this product, you can post questions in any of the usual forums (or is it fora :-) ).

Tim Read

Friday May 02, 2008

Solaris Cluster, Sun Cluster Geographic Edition and Virtualization: The Art of the Possible

Virtualization is a hot topic right now. You only need look at what Sun is doing with our CoolThreads servers, our storage, our software, our Open High Availability Cluster (OHAC) HA-xVM and HA Container agent development work ... deep breath ... and our OEM agreements and I think you'll agree - it's everywhere. Consequently, it's generating lots of questions on our external cluster forum and resulting in several posts to the Solaris Cluster blog, including this one.

So does virtualization mean that the need for clustering goes away? Far from it! Now that you've consolidated your multiple, independent servers into a single virtualized platform, you've effectively put more of your 'eggs' in a single basket. If the services are mission or business critical, you are still going to need to protect them. And how are you going to do that? Solaris Cluster, of course. Oh, and what about disaster recovery? Critical services need protection against fire, flood and other natural disasters. That's where Sun Cluster Geographic Edition comes in. Is your head hurting yet?

As with any rapidly evolving technology, it's important to understand the ramifications of what you are doing, as well as be able to navigate the what-works-with-what matrix. Failure in either department could result in an unsupported configuration or an implementation that doesn't achieve the levels of service you would expect.

When I first suggested to my colleagues that I would author something on this topic, I had no idea what I was letting myself in for. I can usually talk to one expert, or consult one document and get all the information I need. But as you'll have realised from the preamble, the combination of virtualization and clustering spans many boundaries. So, I am indebted to the many colleagues who provided information and reviews for the Blueprint that resulted from my rash suggestion.

I hope you find "Using Solaris Cluster and Sun Cluster Geographic Edition with Virtualization Technologies" saves you a lot of head scratching when your colleague, manager, or professor asks you, "And how do Solaris Cluster and virtualization technologies work together?".

Enjoy. 

Tim

Tuesday Dec 04, 2007

Sun Cluster Geographic Edition is now Open-Source

The source code for the Sun Cluster Geographic Edition product is now available in the HA Clusters community on OpenSolaris.org! In addition to browsing the Open High Availability Cluster Geographic Edition source code, you can download it and build it with either the Sun Studio or the gcc compiler.

This source code release represents the second phase of the complete Sun Cluster open-sourcing roadmap. The first phase, the Sun Cluster Agents, occurred last June, and the third and final phase, the Sun Cluster core gate, will happen sometime next year.

I'm particularly pleased that, in addition to product code, this release of the Geographic Edition source includes test code, man pages, and globalization source.

Nick Solter

HA Clusters community facilitator and
Sun Cluster developer

About

mkb
