Tuesday Oct 20, 2009

Solaris Cluster: Presentations and Demonstrations at Oracle Open World 2009

If you couldn't make it to Oracle Open World in San Francisco (12th - 15th October), I've put together a quick summary of the presentations and demonstrations that my colleagues and I delivered.

The Solaris Cluster group was given two speaking sessions for the conference:

  1. "Bulletproof Your Oracle E-Business Suite Deployment with Solaris Cluster" - Neil Garthwaite and Pedro Lay
  2. "Delivering High Availability to Oracle Applications with Solaris Cluster" - Dr Ellard Roush and Tim Read

Neil and Pedro's presentation explained how you can combine Solaris Cluster and Sun Cluster Geographic Edition to create a highly available Oracle E-Business Suite implementation that also has disaster recovery protection.

The underlying 10g RAC database runs in the global zone and is replicated by Oracle Data Guard to a secondary cluster. The application tier, comprising the concurrent manager and the services managed by Oracle Process Manager and Notification Server (OPMN) - oafm, forms, oacore and the HTTP server - runs in a separate, isolated Solaris Containers cluster, also known as a zone cluster. Using the features of Solaris Cluster and Geographic Edition, both local failover and disaster recovery can be achieved without the need to re-run autoconfig.
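For readers who haven't built a zone cluster before, the application-tier container cluster described above is created with the clzonecluster command. The following is a minimal, hypothetical sketch: all cluster, host, path and address values are illustrative, and a real configuration needs additional properties (sysid settings, resource controls, and so on).

```shell
# Hypothetical sketch: configure a two-node zone cluster for the
# application tier. All names and addresses are illustrative.
clzonecluster configure ebs-zc <<'EOF'
create
set zonepath=/zones/ebs-zc
add node
set physical-host=phys-host-1
set hostname=ebs-zc-1
add net
set address=192.168.10.21
set physical=e1000g0
end
end
add node
set physical-host=phys-host-2
set hostname=ebs-zc-2
add net
set address=192.168.10.22
set physical=e1000g0
end
end
commit
exit
EOF

# Install the zone cluster software and boot it on all nodes.
clzonecluster install ebs-zc
clzonecluster boot ebs-zc
```

Because the zone cluster is a separate virtual cluster, the E-Business Suite application-tier resources can be managed and isolated independently of the RAC database running in the global zone.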

Oracle Open World E-Business suite demonstration configuration

The picture above shows the configuration demonstrated at Oracle Open World with the exception that shared QFS was used in place of ASM. 

The presentation was backed up by a real implementation using two physically separate clusters. During the show we performed numerous demonstrations, including local failovers, triggered by forcing zone or process failures, and full site switchovers to test the disaster recovery plan. Whereas a local failover took only a few seconds, a site switchover took around 15 minutes to complete, albeit on a somewhat untuned system. On the final day we initiated a full disaster scenario by powering off both of the primary cluster nodes. The takeover by the standby site completed successfully in around 10 minutes. And just to re-iterate: at no point did we need to re-run autoconfig, whether we performed a failover, a switchover or a site takeover.
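For context, each of the site operations above maps onto a single Geographic Edition command, run from the cluster that is to become the new primary. A sketch, with hypothetical cluster and protection group names:

```shell
# Planned switchover: migrate the protection group to the standby
# cluster while both sites are healthy (names are illustrative).
geopg switchover -m standby-cluster ebs-pg

# Unplanned takeover: run on the standby cluster after the primary
# site has been lost, e.g. our powered-off-both-nodes scenario.
geopg takeover ebs-pg
```

The replication reversal, resource group migration and application restart are all driven from these commands, which is what makes the no-autoconfig result possible.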

Throughout these demonstrations we used LoadRunner to place a background load on the system and give an indication of interactive performance.

Keep an eye out for videos of this setup on the Sun Learning Exchange.

The second presentation given by Ellard and me came in two parts. The first covered how Solaris Containers clusters can be used to consolidate workloads onto a Solaris Cluster with both security isolation and resource constraints. The second part described how a new resource type, or agent, could be written to support an application for which Sun doesn't currently have an offering. The example used in the presentation was Oracle Business Intelligence Enterprise Edition.

Other Solaris Cluster demonstrations at the Sun booth included: HA-ASM, a new agent for managing Oracle ASM, running in a zone cluster; and Oracle 11g RAC running over RDS on InfiniBand on SPARC servers, using Sun's QDR PCIe HCAs and the 36-port switch.

Now if you were at Oracle Open World but didn't sign up to attend our presentations (look what you missed!), you can still access copies of the slides and listen to the recordings of the above sessions.

Maybe we'll see you at OOW next year!

Tim Read
Solaris Cluster Engineering

Sunday Jun 14, 2009

Oracle on Solaris Cluster: Stuff you might have missed

If, like me, you only have a limited amount of time to browse the multitude of web sites, blogs, journals and news feeds out there, you may have missed some interesting Oracle collateral that my colleagues at Sun have produced. So to help reduce your information overload, I thought I'd highlight them in this post.

As Oracle 11g is growing in importance, let's start there. Both Solaris Cluster and Sun ISV engineering have been working with Oracle 11g for a considerable time now. We've invested a lot of resources in integration and stress testing to make sure it works seamlessly with Solaris Cluster. The results of these efforts are captured in the "Sun Reference Architecture for Oracle 11g Grid" whitepaper. From a personal standpoint, I find the performance characterization of the various networking and file system options the most interesting material, as it really demonstrates the breadth of choice Solaris Cluster gives you: 1 GbE, 10 GbE, InfiniBand, ASM, shared QFS, etc.

If you've not kept up with the progress in Solaris Cluster's support for virtualisation, then you may not have seen Dr Ellard Roush's excellent "Zone Clusters — How to Deploy Virtual Clusters and Why" Blueprint. If you want to consolidate your Oracle RAC workloads to maximise system utilisation, then this Blueprint, together with "Deploying Oracle Real Application Clusters (RAC) on Solaris Zone Clusters", co-authored with Gia-Khanh Nguyen, is required reading.

Following a similar theme, Alexandre Chartre, Daniel Dibbets, and Roman Ivanov have written "Running Oracle Real Application Clusters (RAC) on Sun Logical Domains". This Blueprint describes best practices for setting up Oracle Clusterware on LDoms. Having this perform reliably is a prerequisite to gaining Oracle support for the same configuration running under Solaris Cluster, with all the well-known benefits that Solaris Cluster brings.

Finally, if there was ever any doubt that SPARC/Solaris is the most scalable and performant platform for Oracle, the "Performance and Scalability Benchmark: Oracle Communications Billing and Revenue Management on Sun SPARC Enterprise T5220 and M8000 Servers Running the Solaris 10 OS" Blueprint describes how you can bring together the best of Sun's technologies - ZFS, Solaris Containers and Dynamic System Domains - to provide a highly scalable Communications Billing and Revenue Management solution.

Hopefully, one or more of these will be of use or interest to you. If you have any questions, please feel free to contact us through the blog or via the Solaris Cluster forum. We're always happy to help.

Tim Read
Solaris Cluster Engineering

Wednesday Feb 25, 2009

Disaster Recovery Protection Options for Oracle with Sun Cluster Geographic Edition 01/09

With the announcement of Solaris Cluster 3.2 01/09 comes a new version of Sun Cluster Geographic Edition (SCGE). Among the features delivered in this release is support for replication of Oracle RAC databases using Oracle Data Guard. So it seems like a good opportunity to summarise the ways you can protect your Oracle infrastructure against disaster using the replication support provided by Sun Cluster Geographic Edition.

I'll start by breaking the discussion into two halves: the first covers deployments using a highly available (HA) Oracle implementation, the second those using Oracle Real Application Clusters (RAC). Additionally, I'll reiterate the replication technologies that SCGE supports, namely: EMC Symmetrix Remote Data Facility (SRDF), Hitachi TrueCopy, Sun StorageTek Availability Suite (AVS) and last, but not least, Oracle Data Guard (ODG). One final point to make is that SCGE support for SRDF/A is limited to takeover operations only.

HA Oracle Deployments

HA-Oracle deployments are found in environments where the cost/benefit analysis determines that you are prepared to trade the longer outages involved in switching or failing over an Oracle database, compared with the near-continuous service that Oracle RAC can offer, against the additional licensing costs that RAC involves.

Deployments of HA-Oracle can be on a file system (UFS or VxFS - stay posted for ZFS support) or on raw disk, with or without a volume manager: Solaris Volume Manager (SVM) or Veritas Volume Manager (VxVM). Why not Oracle Automatic Storage Management (ASM), you might ask? Well, while ASM is indeed supported with Oracle RAC, it poses problems when employed in a failover environment: there is a need to fail over either the ASM instance itself or just the disk groups that support the dependent databases. These requirements currently preclude ASM from being supportable. Are we working on this? Of course we are!

So this gives us a set of storage deployment options that must be married with the replication options that SCGE supports, and with any restrictions that come into play when deploying HA-Oracle in a Solaris Container (a.k.a. zone).

Coverage is extensive: Oracle 9i, 10g and 11g are supported on file systems (UFS or VxFS) or raw devices, with or without containers and with or without VxVM, using either AVS, SRDF or TrueCopy. In contrast, SVM restricts the replication technology support to AVS only.

Why isn't Oracle Data Guard supported here, especially given that it's one of the new replication modules in SCGE 3.2 01/09? The answer lies in the use of the Oracle Data Guard broker as the interface that controls the replication. Unfortunately, the ODG broker stores a physical host name in its configuration files, and after a failover that name no longer matches the new host, thus invalidating the configuration. Consequently, Oracle does not support the ODG broker on 'cold failover' database implementations, even if this host name change could be avoided, say by putting the database into a Solaris Container.

Oracle RAC Deployments

With HA-Oracle options covered I'll now turn to Oracle RAC. As you will no doubt know from reading "Solaris Cluster 3.2 Software: Making Oracle Database 10G R2 and 11G RAC Even More Unbreakable", Solaris Cluster brings a number of additional benefits to Oracle RAC deployments including support for shared QFS as a means of storing the Oracle data files. So now you'll need to know what deployment options exist when you include SCGE in your architecture.

As we're still working on adding Solaris Container Cluster support to SCGE there is currently no support for Oracle RAC, or indeed any other data service, using this virtualisation technique.

Furthermore, I should remind you that AVS is not an option for any Oracle RAC deployment simply because it cannot intercept the writes coming from more than one node simultaneously.

On the positive side, storage replication products such as SRDF and TrueCopy are an option, as they intercept writes at the storage array level rather than at the kernel level. These replication technologies are restricted to configurations using raw disk on hardware RAID or raw VxVM/CVM volumes. These storage options can then be used with Oracle 9i, 10g or 11g RAC. For a write-up of just such a configuration, please read EMC's white paper on our joint demonstration at Oracle Open World 2008.

Combinations wishing to use shared QFS or ASM are currently precluded because of the additional steps that must be interposed prior to an SCGE switchover or takeover being effected. Are we looking to address this? Absolutely!

If you want unfettered choice of storage options on Solaris Cluster when replicating Oracle 10g and 11g RAC data, then the new Oracle Data Guard module for SCGE is the answer. You are free to choose any combination of raw disk, ASM and shared QFS deployment that makes sense to you. You can configure a physical standby in either single-instance to single-instance or dual-instance to single-instance combinations, i.e. primary site to standby site configurations. All ODG replication modes are supported: maximum performance, maximum availability and maximum protection. Although SCGE can control logical standby configurations, Sun has not yet announced formal support for this feature.

You've Read The Blog, Now See The Movie....

I hope that gives you a clear picture of how you can use a combination of Solaris Cluster and the various replication technologies that Sun Cluster Geographic Edition supports to create disaster recovery solutions for your Oracle databases. If you would like to see demonstrations of some of these capabilities, please watch the video of an ODG setup and an SRDF configuration on Sun Learning Exchange.

Tim Read
Staff Engineer
Solaris Cluster Engineering

Thursday Sep 18, 2008

Automating Oracle RAC installations on Solaris Cluster with the N1-SPS Oracle 10g R2 & 11g R1 RAC plug-in

Installing software is tedious, really tedious. It is not something you would want to spend your whole day doing if you could avoid it. The more complex the software installation the more likely that you might make a mistake that results in the software being incorrectly installed. Consequently any group or organisation that takes testing seriously, the Sun Cluster QA team for example, is going to end up installing, re-installing and flexing their software installations on a regular basis. If this labour intensive job is automated, staff can concentrate on the more valuable work of actually testing.

Cue the N1-SPS Oracle 10g R2 & 11g R1 RAC plug-in. This addition to Sun's N1 Service Provisioning System automates each of the steps needed to turn a freshly installed Solaris Cluster into one running the Oracle 10g or 11g RAC software stack. The software has been designed to install, flex and de-install all the major software components, and even to cope with changes that system administrators or database administrators make outside the plug-in.

Features of the plug-in include:

  • Installation of Sun Cluster RAC agent, Oracle UDLM and shared QFS packages.
  • Creation, expansion/contraction and removal of:
    • The RAC framework resource group and resources
    • Shared QFS file systems and associated Sun Cluster resource groups and resources
    • Solaris Cluster 3.2 RAC manageability resource groups and resources
    • Oracle CRS framework
    • Oracle RAC software
    • An Oracle ASM database
    • Multiple Oracle databases on QFS or ASM
  • Creation and removal of multiple Oracle databases on raw disk

The plug-in requires an N1-SPS 5.2, or above, master server and can deploy to the following target configurations:

  • Solaris 9 or Solaris 10
  • SPARC or x64
  • Sun Cluster 3.1 or Solaris Cluster 3.2
  • Oracle 10g R2 & 11g R1

Having moved onto a project integrating Oracle Data Guard with Sun Cluster Geographic Edition I certainly find the plug-in extremely useful as I build and rebuild my clusters, quickly and repeatably.

The software is being made available as a free, unsupported download here. Documentation for the software can be found here.

Please note the documentation was written before the addition of support for 11gR1, so there are minor discrepancies in the component parameters.

I hope you find the software useful. I certainly did. I used it to help build our Oracle Open World Data Guard demonstration.

Tim Read

Solaris Cluster Engineering

Friday Jun 13, 2008

Oracle Data Guard Replication Support in Sun Cluster Express Geographic Edition

If you read the Solaris Cluster Express 6/08 available for download blog posting, you will have noticed that it mentioned that SCX Geographic Edition has been enhanced to support Oracle Data Guard replication. Now if you run Oracle 10g or 11g RAC in your data centers and need to integrate them into a disaster recovery framework that also supports Sun StorageTek Availability Suite (AVS), Hitachi TrueCopy and EMC SRDF, then read on.

SCX Geographic Edition (SCXGE) is a binary distribution of the open-source version of Sun Cluster Geographic Edition (SCGE) that runs on Solaris Cluster Express (SCX) on top of Solaris Express Community Edition (SXCE) build 86. [We love our acronyms and abbreviations.] Trivia: did you know that SUN itself is an acronym that originally stood for Stanford University Networks?

Back to the plot... Services made highly available by SCX resource groups can be grouped together in "protection groups". SCXGE then uses the robust start and stop methods of SCX to ensure that the protection group is migrated in a controlled fashion from a primary location to a standby site. This migration includes performing all the tasks needed to change the direction of data flow provided by the underlying replication technology. The advantage is that SCXGE hides the complexity of the individual replication methods behind a common command line interface. Thus switching a service from site A to site B is reduced to:

# geopg switchover -m newhost_cluster my-protection-pg

regardless of whether the replication is AVS-, TrueCopy- or SRDF-based.

Support for Oracle Data Guard (ODG) extends SCXGE's capability beyond just software and storage based replication technologies. Internally, the ODG module uses the Oracle Data Guard broker interface which restricts its use to Oracle 10g and 11g RAC configurations only, i.e. no support for HA-Oracle.

So once you have set up your Oracle RAC databases, configured ODG and created your Data Guard broker configuration, you can add that to your SCXGE configuration and have it monitored and controlled, just as you do for your other non-ODG protection groups. The module allows you to have multiple ODG protection groups, each containing one or more ODG broker configurations. Remember that each protection group spans a single SCXGE partnership pair, so you can only have one primary and one standby database in each of these broker configurations.
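As an illustration, creating and starting an ODG protection group looks something like the sketch below. The partnership, protection group and cluster names are hypothetical, and the Data Guard broker configuration itself must already exist and is attached to the group with additional geopg sub-commands and properties that I've omitted here; check the SCXGE documentation for the exact syntax of your build.

```shell
# Hypothetical sketch: create an ODG-based protection group on the
# current primary cluster, within an existing partnership.
geopg create -s paris-newyork-ps -o primary -d odg sales-pg

# Activate the protection group on both clusters of the partnership.
geopg start -e global sales-pg

# Later, migrate the database service to the partner cluster.
geopg switchover -m newyork sales-pg
```

Note how this is the same switchover command shown earlier for the storage-based modules; only the `-d odg` replication type at creation time differs.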

If this has got you interested, feel free to download and install the SCX software bundle with the new ODG module and try it out. Just remember that this is the first build. I'm working on fixing all the bugs that my colleagues in QA discover, so keep an eye on this blog and the Open HA Cluster site for news of any updates.

Although we don't offer support for this product, you can post questions in any of the usual forums (or is it fora :-) ).

Tim Read


Oracle Solaris Cluster Engineering Blog

