Project Colorado: Running Open HA Cluster on OpenSolaris

If you have ever asked yourself why Solaris Cluster Express (the Open HA Cluster binary distribution) is running on Solaris Express Community Edition and not yet on the OpenSolaris binary distribution, then you might be interested in project Colorado. This project is endorsed by the HA Clusters community group and has as its goal to provide a minimal and extensible binary distribution of Open HA Cluster that runs on the OpenSolaris binary distribution.

As always, the devil is in the details. Here are some of the reasons why this isn't just a "recompile and run" experience:

  • The packaging system changed:
    One of the big changes with the OpenSolaris binary distribution is the switch to the Image Packaging System (IPS). As Stephen Hahn explains in some of his blogs, it is a key design criterion of IPS not to include anything like the rich scripting hooks found in the System V packaging system.
    Have a look at our initial analysis of how and where those scripting hooks are currently used. Note that installation is one aspect and uninstallation another. In the more modular world of a network-based packaging system, dependencies on other packages need to be explicit and fine-grained. And within the cluster world, they need to be in agreement across multiple nodes. Since we also deliver kernel modules and plan to deliver a set of optional functionality, the challenge is where to put the configuration and uninstallation tasks that used to live in the scripting hooks.
    Further, if you know the details of the Open HA Cluster build steps (which are similar to the ON build steps), you know that building the packages via the pkgdefs hierarchy is an important step for later assembling the deliverables for the DVD image. Since we do not just want a 1:1 conversion of our existing packages into IPS packages, and since there is no DVD image going forward, we need to come up with a mechanism to deliver the IPS packages into a network repository as part of the build.
  • Zones behaviour changed:
    The native zones brand type introduced in Solaris 10 inherited the packages to install from the global zone. For Solaris Cluster this meant that cluster-related packages were automatically installed and configured when a zone was created. On OpenSolaris this behaviour changed: currently only the ipkg brand type is available. This means that, out of the box, none of the ways Solaris Cluster integrates with zones works without changes.
  • KSH93 vs. KSH88:
    OpenSolaris switched to KSH93 for /bin/sh and /usr/bin/ksh; the previous KSH88 shell is no longer available. Again, the devil is in the details. For example, KSH93 handles local vs. global variables differently. Some scripts required for building (like /opt/scbld/bin/nbuild) or for installing Solaris Cluster (like /usr/cluster/bin/scinstall) break with KSH93. The full set of impacted scripts still needs to be determined.
  • The Sun Java Web Console (webconsole) is not part of OpenSolaris:
    The Java Web Console provides a common location for users to access web-based system management applications. The single entry point that the web console provides eliminates the need to learn URLs for multiple applications. In addition, the single entry point provides user authentication and authorization for all applications that are registered with the web console. All web console-based applications conform to the same user interface guidelines, which enhances ease of use.
    Those are all reasons why Solaris Cluster chose to deliver its browser user interface (BUI), named Sun Cluster Manager, on top of the Sun Java Web Console framework. In addition, it uses the Web Application Framework (JATO) and the Lockhart Common Components.
    Since those components are not available for the OpenSolaris binary distribution, the question is which management framework to use (and develop against) instead. Of course, a substitution is not trivial and can be quite time-consuming, and it is not certain that existing code can be reused.
  • Dependencies on encumbered Solaris code:
    Besides the components that the OpenSolaris binary distribution chose not to deliver anymore, there is the goal of creating a freely redistributable binary distribution. This means OpenSolaris also does not deliver the Common Desktop Environment (CDE), which includes the Motif libraries. The adminconsole delivered with Solaris Cluster uses Motif and ToolTalk.
    The adminconsole tools need to be redesigned to use libraries available within OpenSolaris.
  • No SPARC support for OpenSolaris yet:
    The OpenSolaris binary distribution is currently only available for the i386 platform, whereas Solaris Express Community Edition also provides a distribution for SPARC. While this is not a strong inhibitor to running on OpenSolaris, it is nonetheless a reason why providing Solaris Cluster Express is still a requirement.
    The good news is that there are plans to provide SPARC support for OpenSolaris in future releases.
  • The OpenSolaris installer does not support network installations yet:
    While this is not a direct problem, it becomes one if you consider that the developers of Open HA Cluster are distributed around the world and most engineers only have access to remote systems, without the possibility of performing an installation that requires a keyboard and monitor.
    Again, the good news is that there are plans to add support for automated network installations in future OpenSolaris releases.
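
As a starting point for the KSH93 item above, where the full set of impacted scripts still has to be determined, here is a minimal sketch of how candidate scripts could be listed. It is not part of the Colorado tooling; the function names and the AFFECTED set are made up for illustration. It only checks the "#!" line for sh or ksh, so it finds candidates to review, not scripts that are known to break, and it misses files that are sourced without an interpreter line.

```python
import os

# Assumption taken from the text: scripts run by /bin/sh or /usr/bin/ksh
# now get KSH93 and are therefore worth reviewing.
AFFECTED = {"sh", "ksh"}


def shebang_interpreter(path):
    """Return the interpreter basename from a '#!' line, or None."""
    try:
        with open(path, "rb") as f:
            first = f.readline(256)
    except OSError:
        return None
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].decode("ascii", "replace").split()
    if not parts:
        return None
    name = os.path.basename(parts[0])
    # Handle the '#!/usr/bin/env ksh' form.
    if name == "env" and len(parts) > 1:
        name = os.path.basename(parts[1])
    return name


def find_candidate_scripts(root):
    """Walk 'root' and return a sorted list of files with an sh/ksh shebang."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(dirpath, filename)
            if os.path.isfile(path) and shebang_interpreter(path) in AFFECTED:
                hits.append(path)
    return sorted(hits)
```

Pointing find_candidate_scripts() at a source workspace or at an installed tree such as /usr/cluster/bin would give a first review list; each hit still has to be checked by hand for constructs that behave differently under KSH93.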

Besides solving the above challenges, we also want to offer some new possibilities within Colorado. You can read the details within the umbrella requirement specification. There are separate requirement specifications to outline specific details for the planned private-interconnect changes, cluster infrastructure changes involving the weaker membership, enhancements to make the proxy file system (PxFS) optional, and changes to use iSCSI with ZFS for non-shared storage configurations.

You can provide feedback on those documents to the ha-clusters-discuss mailing list. There is a review scheduled with the Cluster Architecture Review Committee on 18 September 2008, where you are invited to participate by phone if you are interested.
