
Disaster Recovery Orchestration

Synchronizing the switchover of multiple Oracle Solaris Cluster Geographic Edition Protection Groups

Geographic Edition Overview

Geographic Edition is a management framework which adds Disaster Recovery (DR) functionality to the High Availability that Oracle Solaris Cluster already provides.  It enables two clusters to be linked in a Partnership, within which many application protection groups (PGs) can be configured.

Each PG combines control of one or more Oracle Solaris Cluster resource groups (RGs) with control of a data replication mechanism.  Switching a PG from one partner cluster to the other results in a controlled shutdown of the application(s) managed by the RGs, a reversal of the replication direction, and a restart of the application(s) at the other partner cluster (the Disaster Recovery site).  If the primary cluster has become unavailable due to a disaster, the PG can simply be activated at the Disaster Recovery site to bring the service online there.

The need to go further

With the ever-increasing use of virtualization and cloud-based services, a service might no longer be confined to a single cluster.  The larger Engineered Systems are a good example of this, where a single service may have multiple tiers, each running in a separate logical domain or zone cluster.  An administrator will want to be able to switch multiple tiers and/or services as a unit.  In this situation it is desirable to respect the relationship between the tiers.  For example, if an application relies on some middleware which attaches to a database, you'll want to shut down the stack in a "top down" way prior to switchover, and to restart it "bottom up" at the Disaster Recovery site.  This is exactly what we can now do with the new Disaster Recovery Orchestration feature.

How does Disaster Recovery Orchestration work?

We've introduced two new concepts:

  • The site.  A site is a grouping of clusters which are treated as one for Disaster Recovery purposes.  It could be, for example, a complete Oracle SuperCluster, or a group of physical clusters which are all in the same data center.  You can mix physical and virtual (zone) clusters in one site.  Sites are managed through the new geosite command, for example geosite create.

  • The multigroup (MG).  A multigroup is a collection of related PGs, from multiple clusters at a site, which can be switched as a unit.  Multigroups are manipulated through the new geomg command.  Operations like start, stop, switchover, and takeover can be performed on an MG, and they will be applied to all PGs within the MG.

The PGs can be given as a simple list, so that they are all shut down together and then restarted together at the Disaster Recovery site without the need to issue separate commands for each PG.  This operation is synchronous by default: all PGs are stopped before any are switched across.  If the PGs are completely unrelated, the switchover of all of them can be initiated together and the switchovers allowed to complete asynchronously.  Alternatively, if there are dependencies between them, the PG list can be specified in the form of a sequence, called a dependency chain.

For example, if an MG contains a PG list of the form berlin:app_pg/paris:db_pg, then the application PG app_pg on cluster berlin will be shut down before the DB PG db_pg on cluster paris, and on the Disaster Recovery site they will be restarted in the reverse order. This will ensure that the application tier isn't online unless the database tier is ready.
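
To make the ordering concrete, here is a minimal Python sketch of how a dependency chain of that form implies a shutdown order and the reverse restart order.  It is purely an illustration of the idea, not how geomg is implemented: the ProtectionGroup type and the parse_chain, shutdown_order and restart_order functions are hypothetical names made up for this example, and the only syntax it assumes is the cluster:pg/cluster:pg form shown above.

```python
from typing import List, NamedTuple


class ProtectionGroup(NamedTuple):
    """Hypothetical record: a PG name plus the cluster that hosts it."""
    cluster: str
    name: str


def parse_chain(spec: str) -> List[ProtectionGroup]:
    """Split a 'cluster:pg/cluster:pg' chain into ordered entries."""
    chain = []
    for item in spec.split("/"):
        cluster, sep, name = item.strip().partition(":")
        if not sep or not cluster or not name:
            raise ValueError(f"expected cluster:pg, got {item!r}")
        chain.append(ProtectionGroup(cluster, name))
    return chain


def shutdown_order(chain: List[ProtectionGroup]) -> List[ProtectionGroup]:
    """PGs are stopped in the order given (top of the stack first)."""
    return list(chain)


def restart_order(chain: List[ProtectionGroup]) -> List[ProtectionGroup]:
    """PGs are restarted in the reverse order (bottom of the stack first)."""
    return list(reversed(chain))


chain = parse_chain("berlin:app_pg/paris:db_pg")
for pg in shutdown_order(chain):
    print(f"stop  {pg.name} on cluster {pg.cluster}")
for pg in restart_order(chain):
    print(f"start {pg.name} at the Disaster Recovery site")
```

Running the sketch lists app_pg before db_pg for the shutdown and db_pg before app_pg for the restart, which is the "top down" then "bottom up" behaviour described earlier.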

For secure administrative control, we also distinguish between clusters which are controllers for a site, and those which are simple members of the site.  An MG control operation such as geomg switchover can only be issued from a controller, and each member cluster of the site must be configured to accept the authority of a controller.  You use the geosite command to configure this.

What happens if you upgrade one cluster in a partnership to 4.2?

All of this is new in Oracle Solaris Cluster 4.2, but updating to 4.2 will have no functional impact on existing configurations.  Clusters running 4.2 can interoperate with 4.1 clusters as expected, but of course the 4.1 clusters cannot participate in Sites or MGs.

Finding more information

It's all explained in the documentation, which you can find at http://docs.oracle.com/cd/E39579_01/html/E39666/config-sites.html and http://docs.oracle.com/cd/E39579_01/html/E39667/index.html, and of course in the manual pages for geosite(1M) (http://docs.oracle.com/cd/E39579_01/html/E39677/geosite-1m.html#scrolltoc) and geomg(1M) (http://docs.oracle.com/cd/E39579_01/html/E39677/geomg-1m.html#scrolltoc).


Geographic Edition team
Oracle Solaris Cluster Engineering
