Wednesday Dec 03, 2008

Open HA Cluster with HA-xVM demo

Following our recent CLARC inception review of the Open HA Cluster (OHAC) agent for xVM Hypervisor guest domains, I thought a short five-minute demo would be of interest.

To stream the demo, use the download link on the HA-xVM project page.

The demo is based on Solaris Express Community Edition build 86 and Solaris Cluster Express 2/08, matching the cheat sheet linked above. The demo shows the following:

  • An OHAC resource group managing an xVM guest domain via an OHAC resource can switch that guest domain from one node to another using live migration. Not shown in the demo: the OHAC private interconnects carry the live-migration traffic.
  • An OHAC-managed xVM guest domain can survive a node crash.

Specifically, in the demo the resource group RG1 manages the xVM guest domain domu1. RG1 is switched from node podio2 to node podio1, which live-migrates domu1 between the two nodes. While RG1/domu1 is online on podio1, that node is crashed via "uadmin 2 1". OHAC automatically detects the failure and restarts domu1 on podio2.
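The switchover shown in the demo can be reproduced with the standard OHAC command-line tools. The resource-group and node names below are the ones from the demo; the exact command sequence is a hedged sketch, not a transcript of the demo itself.

```shell
# Check where RG1 (and therefore domu1) is currently online.
clresourcegroup status RG1

# Switch RG1 from podio2 to podio1; the xVM guest-domain resource
# live-migrates domu1 to the target node as part of the switchover.
clresourcegroup switch -n podio1 RG1

# To reproduce the crash test, panic the node currently hosting RG1
# (run this on podio1 itself; the node goes down immediately):
uadmin 2 1

# OHAC detects the node failure and restarts domu1 on podio2.
# Verify with:
clresourcegroup status RG1
```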

Neil Garthwaite
Solaris Cluster Engineering

Wednesday Sep 24, 2008

Announcing Availability of SCX 09/08

We are pleased to announce the availability of the latest Solaris Cluster Express (SCX)!  You can download the software here.

This release reaches some major milestones: it incorporates the first contribution from a community member, and coming from a student, that is doubly delightful! It also provides early access to some of the new and exciting features being developed by the team.

What is new?

\* This release runs on SXCE build 97. The versions of Sun Management Center and other shared components are upgraded to be compatible with that Solaris Express Community Edition (SXCE) version.

\* The fencing mechanisms have been enhanced with the introduction of optional fencing. This gives the administrator a way to change the fencing mechanism either globally or at the individual-disk level.

\* This release also has a new feature called zone clusters. This feature makes it possible to form a virtual cluster based on the zones of a physical cluster, enabled by a new brand of zone called "cluster". Needless to say, most of the code is available under the CDDL license like the rest of the software. This feature is sure to make you reconsider your views about Open HA Cluster and Solaris Cluster! Please refer to the clzonecluster(1CL) man page for more details. You can find a cheat sheet for configuring a zone cluster here.
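To give a feel for the new clzonecluster(1CL) interface, here is a hedged sketch of configuring a small two-node zone cluster. The zone-cluster name "zc1", the zonepath, and the host names are hypothetical examples; consult the man page and the cheat sheet above for the authoritative syntax.

```shell
# Write the zone-cluster configuration to a command file
# (all names below are made-up examples).
cat > /var/tmp/zc1.cfg <<'EOF'
create
set zonepath=/zones/zc1
add node
set physical-host=phys-host-1
set hostname=zc1-host-1
end
add node
set physical-host=phys-host-2
set hostname=zc1-host-2
end
commit
EOF

# Create the zone cluster from the command file, then install and
# boot its zones on all configured nodes.
clzonecluster configure -f /var/tmp/zc1.cfg zc1
clzonecluster install zc1
clzonecluster boot zc1

# Check the state of the zone-cluster zones on each node.
clzonecluster status zc1
```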

\* Use of a Loopback File Driver (lofi) device for the global-devices namespace is introduced with this release. A dedicated partition for the exclusive use of the global-devices namespace (i.e., /globaldevices) is no longer required.

\* As usual, there are the mandatory bug fixes; you can find them in the change log.

This release is a major milestone in the open source journey. For the list of all the exciting projects that the community is working on, please visit the Open HA Cluster community. Note that this release of Solaris Cluster Express (SCX) will not work on the OpenSolaris binary distribution (OpenSolaris 2008.05). For the planned move to the OpenSolaris binary distribution, visit Project Colorado.

Munish Ahuja

Madhan Kumar B

Jonathan Mellors

Venugopal N.S

Tuesday Sep 09, 2008

Project Colorado: Running Open HA Cluster on OpenSolaris

If you have ever asked yourself why Solaris Cluster Express runs on Solaris Express Community Edition and not yet on the OpenSolaris binary distribution, you might be interested in Project Colorado. This project is endorsed by the HA Clusters community group, and its goal is to provide a minimal, extensible binary distribution of Open HA Cluster that runs on the OpenSolaris binary distribution.

As always, the devil is in the details. The following table summarizes some of the reasons why this isn't just a "recompile and run" experience:

| Work Area | Solaris Express Community Edition | OpenSolaris Binary Distribution |
| --- | --- | --- |
| Packaging system | System V packages | Image Packaging System (IPS) |
| Zones support | native and lx brand types | ipkg brand type |
| KSH version | KSH88, KSH93 | KSH93 |
| Web-based system management for applications | Sun Java Web Console (webconsole) | not provided |
| Encumbered Solaris code | CDE, Motif, ToolTalk | not contained, by design |
| Supported platforms | SPARC, i386/x64 | i386/x64 only (to date) |
| Installer | network (JumpStart), text, and graphical installation | graphical installation only (to date) |
| Preferred compiler | Studio 11, soon Studio 12 | Studio Express |

You can read more details about each point in the table here.

Besides solving the above challenges, we also want to offer some new possibilities within Colorado. You can read the details within the umbrella requirement specification. There are separate requirement specifications to outline specific details for the planned private-interconnect changes, cluster infrastructure changes involving the weaker membership, enhancements to make the proxy file system (PxFS) optional, and changes to use iSCSI with ZFS for non-shared storage configurations.

You can provide feedback on those documents to the ha-clusters-discuss mailing list. There is a review scheduled with the Cluster Architecture Review Committee on 18 September 2008, where you are invited to participate by phone if you are interested.

Thorsten Früauf
Solaris Cluster Engineering

Wednesday Jun 04, 2008

Solaris Cluster Express 6/08 available for download

Solaris Cluster Express 6/08 is now available for download! You can download the DVD image here.

What is new in this release?

\* This release runs on OpenSolaris Nevada build 86. The version of Sun Management Center is now 3.1.

\* The HA agent for Solaris Containers has been enhanced to support Solaris 9 Branded Zones on the SPARC platform. This is very useful for customers who still need to run some applications on Solaris 9 while taking advantage of the new features of Solaris 10 and above.

\* The HA agent for the PostgreSQL database has been enhanced to support WAL (write-ahead log) shipping. This feature greatly improves PostgreSQL deployments in enterprise environments.

\* Support for Solaris Containers configured with exclusive IP is included in this release.

\* The SCX Geographic Edition is enhanced to support Oracle Data Guard based replication.

\* This release also contains the mandatory bug fixes and other minor enhancements not mentioned above.

Stay tuned for more milestones along the open source journey!

Munish Ahuja
Madhan Kumar B.
Jonathan Mellors
Arun Kurse
Venugopal N.S.

Thursday May 29, 2008

HA For Grid Engine at osgc2008

Last week I presented Open HA Cluster at the Open Source Grid & Cluster Conference in Oakland, California. The conference had three tracks, dedicated to Globus (GlobusWorld), Grid Engine (Grid Engine Workshop), and Rocks (Rocks Cluster Workshop).
My presentation about making Sun Grid Engine highly available using Open HA Cluster (OHAC) was part of the Grid Engine Workshop.

I noticed that the term "cluster" was a bit overused at this conference, with different products and technologies using it in slightly different ways. So I started by clarifying that "HA Cluster" refers to the technology OHAC brings to the arena, which is about keeping services highly available in spite of failures. A quick show of hands revealed that about 25% of the participants were aware of the concept of HA clusters in general, and about 15% were aware of OHAC itself. Given that, I spent a larger portion of my talk on the concepts of single points of failure, redundancy, and failover, and on how OHAC recovers from system failures. Towards the end, I also talked about using OHAC to make Sun Grid Engine highly available and about the key advantages of the OHAC-based HA solution. These points and the slides are courtesy of Thorsten Frueauf. The key ways in which OHAC helps improve the availability of Sun Grid Engine are noted in this blog entry.

The presentation generated a couple of questions from the audience. One question I remember was about how OHAC takes care of the MAC address change when it fails over an HA IP address from one node to another. I explained that OHAC uses gratuitous ARPs to update the ARP caches of any routers on the network, and that this works with all but a very few exceptions. Another question was about data recovery during disk/mirror failures and whether the application needs to worry about it. I explained that this recovery is typically performed by a volume manager, and the application is blissfully unaware of it; the OHAC framework makes sure the data is available when and where (on the node where the app runs) the application is started. A third question was about the speed of failover: how fast is recovery from various failures? I turned that question into an advantage, explaining how OHAC is tightly integrated with Solaris and can therefore detect and recover from failures quickly. I then invited folks to view the failover demo on my laptop the next day, in the "Grill the Gurus" portion of the conference.
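The gratuitous-ARP behavior mentioned above is easy to observe for yourself during a failover. A minimal sketch, assuming the public network is on an interface named e1000g0 (the interface name is just an example):

```shell
# Watch ARP traffic on the public interface while a failover happens.
# When the HA IP address is plumbed on the new node, that node
# broadcasts a gratuitous ARP (sender IP equals target IP), which
# prompts routers and peers to update their ARP caches with the new
# node's MAC address.
snoop -d e1000g0 arp
```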

I was also curious about the audience mix: was the larger share from the academic community or the commercial community? A quick show of hands revealed that commercial users were well represented, in roughly the same numbers as academic/research users. After the talk, I spoke with a variety of people during the coffee and lunch breaks. Here are some of the folks I remember:

  • A sysadmin at a European oil company interested in using Grid Engine to minimize application licenses for the commercial geological-data-analysis software he uses.
  • An IT manager for a medical-software startup based in San Francisco, interested in open source software as a way to minimize costs.
  • A deployment architect for an IT consultancy, interested in geographic data replication and content-based routing of incoming jobs.
  • A lab manager from an Ivy League university who wanted an easy way for his students to be effective at managing his compute-lab environment.
  • An IT admin for a storage manufacturer, interested in techniques for efficiently monitoring workloads.

For the demo the next day, I had Sun Grid Engine configured as an HA service across two zones on my laptop. I was able to demo the very quick restart of the Grid Engine qmaster and scheduler daemons. People were interested in how that happens, which led me to explain how the process-monitoring implementation in OHAC uses Solaris Contracts, enabling quick detection of and recovery from application failures. Most people were simply interested in chatting about the general concept of clusters and discussing their own "grid and cluster" scenarios.
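If you want to poke at the Solaris Contracts mechanism mentioned above, the stock Solaris observability tools show it directly. A hedged sketch (daemon names are examples):

```shell
# Show all process contracts on the system, with member PIDs and the
# events (exit, core, signal, ...) each contract delivers. Process
# contracts are what let a monitor such as OHAC's learn immediately
# when a monitored daemon dies, instead of polling for it.
ctstat -a -t process -v

# ptree -c prints the process tree annotated with contract IDs, so
# you can see which contract a given daemon (e.g. sge_qmaster)
# belongs to.
ptree -c
```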

If you are interested in the actual slides I used for the talk, you can check them out here. If you missed this conference, you will have another opportunity to learn about Open HA Cluster and OpenSolaris at the upcoming LinuxTag conference in Berlin, Germany, May 28th to 31st, 2008.

The picture at the top was taken during a coffee break at the conference. Check out this link for other photos I took there. Also, Deirdré Straughan made a video of my talk, complete with neat fading in and out of the presentation slides. Click in the embedded window below to watch the presentation in Flash.

If you'd like, you can also get the video in iPod format and watch it on your video iPod. Beware that the file is rather big, though.

This conference was a nice opportunity for me to talk to a lot of people and make them aware of Open HA Cluster, and also to learn about what is going on in other open source communities such as Grid. I hope you found some of the things in this blog useful and interesting.

Ashutosh Tripathi
Solaris Cluster Engineering


Oracle Solaris Cluster Engineering Blog

