Monday Aug 24, 2009

High availability with a minimal cluster

I had the opportunity to give my talk "Hochverfügbarkeit mit minimalem Cluster" (high availability with a minimal cluster) at the following events:

Here is the current version as a PDF.

The talk consists of a theoretical part and a live demo run from my laptop. I have described the configuration used in a white paper.

My personal feeling is that the talk went best in Berlin. At the Solaris iX Day I stepped in spontaneously after a scheduled speaker had cancelled at short notice - given that, I think it was OK. At FrOSCon I had some trouble finding my flow and ordering my thoughts, so I was a bit off my game. I hope the talk was nevertheless useful and interesting for some of you :-)

Friday Jun 19, 2009

Running Open HA Cluster with VirtualBox

My presentation at the Open HA Cluster Summit 2009 in San Francisco explained how to set up an OpenSolaris system serving as a host for at least two VirtualBox OpenSolaris guests, and how to build a two-node Open HA Cluster with them using technologies like Crossbow and COMSTAR. Such a system can be used to build, develop and test Open HA Cluster, which was demonstrated live during the session.

The video recording for this presentation is now available (thanks to Deirdré Straughan):

You can also download the slides in order to follow the video better. Additionally, I created a white paper with the following abstract:

For system administrators, it is often critical to have a test system on which to try things out and learn about new features. Of course, the system needs to be low cost and portable to wherever it is needed.

HA clusters are often perceived as complex to set up and resource hungry in terms of hardware requirements.

This white paper explains step-by-step how to set up a single x86-based system (such as a Toshiba M10 laptop) with OpenSolaris, configure a build environment for Open HA Cluster, and use VirtualBox to set up a two-node cluster.

OpenSolaris technologies like Crossbow (to create virtual network adapters), COMSTAR (to export non-shared storage as iSCSI targets and use it through iSCSI initiators), ZFS (to mirror the iSCSI targets), Clearview (the new architecture for IPMP), and IPsec (to secure the cluster private interconnect traffic) are used on the host system and in the VirtualBox guests to configure Open HA Cluster. The image packaging system (IPS) is used to deploy the build packages into the guests. Open HA Cluster technologies like weak membership (which removes the need for an extra quorum device) and the integration with OpenSolaris technologies are leveraged to set up three typical FOSS applications: HA MySQL, HA Tomcat and a scalable Apache web server.
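
The host-side building blocks from this abstract can be sketched with a few commands. This is a hypothetical outline, not the white paper's actual steps; all link, volume, GUID and address values are placeholders:

```shell
# Crossbow: create a virtual NIC on top of a physical link
dladm create-vnic -l e1000g0 vnic1

# COMSTAR: back a logical unit with a ZFS volume and export it over iSCSI
zfs create -V 10g rpool/iscsi/lun0
sbdadm create-lu /dev/zvol/rdsk/rpool/iscsi/lun0
stmfadm add-view 600144f0...        # GUID as printed by sbdadm create-lu
itadm create-target

# Initiator side (inside a guest): discover the targets, then let ZFS
# mirror two such LUNs
iscsiadm modify discovery --sendtargets enable
iscsiadm add discovery-address 192.168.56.1
zpool create hapool mirror c2t600144F0...d0 c3t600144F0...d0
```

The white paper itself covers the concrete device names plus the Clearview/IPMP and IPsec pieces that this sketch leaves out.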

Enjoy watching, reading and trying it out!

Monday Jun 08, 2009

Best kept secret - or - taking availability for granted

Dr. David Cheriton made a very good point in his keynote at the Open HA Cluster Summit last week in San Francisco: we take availability for granted and only notice its absence.

It seems the announcement for OpenSolaris 2009.06 took that to heart. Fortunately, there is the "What's New in OpenSolaris 2009.06" document, which gives a hint about Open HA Cluster 2009.06. Its features section has a nice description of the availability options with Open HA Cluster on OpenSolaris.

Finally, I recommend having a look at the interview about the next-generation high availability cluster.

If you are keen to try out Open HA Cluster on OpenSolaris, there is a white paper available, describing step-by-step how to set up a two-node cluster using VirtualBox on a single x86 system, such as a laptop.

The official release notes and installation guide are available on the HA Clusters community web page.

I want to thank the whole team behind Open HA Cluster 2009.06 for their hard work and a great release!

Thursday May 14, 2009

Second Blueprint: Deploying Oracle Real Application Clusters (RAC) on Solaris Zone Clusters

Some time ago I blogged about the blueprint that explains how zone clusters work in general. Dr. Ellard Roush and Gia-Khanh Nguyen have now created a second blueprint that specifically explains how to deploy Oracle RAC in zone clusters.

This paper addresses the following topics:

  • "Zone cluster overview" provides a general overview of zone clusters.
  • "Oracle RAC in zone clusters" describes how zone clusters work with Oracle RAC.
  • "Example: Zone clusters hosting Oracle RAC" steps through an example configuring Oracle RAC on a zone cluster.
  • "Oracle RAC configurations" provides details on the various Oracle RAC configurations supported on zone clusters.
Note that you need to log in with your Sun Online Account in order to access it.
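
To give a flavor of what the example chapter covers, the basic shape of a zone cluster configuration can be sketched with the clzonecluster command from Solaris Cluster. This is a hypothetical outline only - all names, paths and hosts are placeholders, and the exact subcommand sequence may differ by release; the blueprint has the authoritative steps:

```shell
# Create a command file for clzonecluster (all values are placeholders)
cat > /tmp/zc-rac.cfg <<'EOF'
create
set zonepath=/zones/zc-rac
add node
set physical-host=phys-node1
set hostname=zc-node1
end
add node
set physical-host=phys-node2
set hostname=zc-node2
end
commit
EOF

# Configure, install and boot the zone cluster across the physical nodes
clzonecluster configure -f /tmp/zc-rac.cfg zc-rac
clzonecluster install zc-rac
clzonecluster boot zc-rac
```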

Monday Mar 16, 2009

Open HA Cluster on OpenSolaris - a milestone to look at

You might have noticed the brief message to the ha-clusters-discuss mailing list that a new Colorado source drop is available. If you have read my blog about why getting Open HA Cluster running on OpenSolaris is not just a "re-compile and run" experience, then it is worth mentioning that the team working on project Colorado has made some great progress:

  • The whole source tree can be compiled with the latest SunStudioExpress compiler on OpenSolaris.
  • Logic has been implemented to create the IPS manifest content within the source gate (usr/src/ipsdefs/) and to send the IPS packages to a configurable repository as part of the build process. That way, one can easily install one's own set of IPS packages on any OpenSolaris system that can reach that repository.
  • IPS package dependencies have been analysed and are now defined explicitly at a finer granularity.
  • scinstall has been enhanced to perform, at initial configuration time, all the required steps that were previously done within individual postinstall/preremove SVR4 package scripts.
  • Shell scripts have either been verified to work with KSH93 or changed to use /usr/xpg4/bin/sh.
  • While the framework gate still has a build dependency on JATO (which is part of webconsole), all run-time dependencies have been removed.
  • pconsole has been made available for OpenSolaris; it can be used instead of the Solaris Cluster adminconsole (which uses Motif).
  • Changes have been made to work with the new networking features introduced by the projects Crossbow, Clearview and Volo. This especially means that VNICs can be used to set up a minimal cluster on systems that have just one physical network interface.
  • Changes have been made to improve the DID layer to work with non-shared storage exported as Solaris iSCSI targets, which in turn can be used by configuring the Solaris iSCSI initiator on both nodes. This can be combined with ZFS to achieve failover for the corresponding zpool. Details can be found in the iSCSI design document.
  • Changes have been made to implement a new feature called weak membership. This allows a two-node cluster to be formed without requiring a quorum device (neither a quorum disk nor a quorum server). To better understand this new functionality, read the weak membership design document.
  • Changes have been made to the HA Containers agent to work with the ipkg brand type for non-global zones on OpenSolaris.
The source has been verified to compile on OpenSolaris 2009.06 build 108 on x86. Instructions on how to compile the framework and agent source code should enable you to try out the various new possibilities on your OpenSolaris system. Give it a try and report back your experience! Of course, be aware that this is still work in progress.

Monday Feb 16, 2009

Participate: Colorado Phase 1 Open Design review

Colorado is the OpenSolaris project to get Open HA Cluster running on the OpenSolaris binary distribution. In a past blog post I explained why this is not just a "compile and run" experience. In September and October 2008, two CLARC (CLuster Architecture Review Committee) slots were assigned to review the Colorado requirements specification.

This week, on Thursday, 19 February 2009, you can participate in an Open CLARC review from 9am to 11am PT (Pacific Time). Free dial-in information can be found on the Open CLARC wiki page.

Prior to the call, it makes sense to first read the design documents:

  • The main document explains how generic requirements are implemented: necessary (ksh) script changes, modifications to some agents, specifically support for the ipkg brand type within the HA Containers agent, details of the IPS support implementation, build changes to support compilation with Sun Studio Express, etc.
  • The document about weak membership explains the modifications needed to achieve a two-node cluster without requiring an extra quorum device. Instead, a new mechanism is implemented which makes use of health checks. The changed behavior relative to the already existing strong membership model is explained, as well as the modifications to the command line interface to switch between the two membership models.
  • The networking design document describes the changes necessary to work with the OpenSolaris projects Clearview and Crossbow. Especially the latter requires changes within the cluster CLI to support the configuration of virtual network interfaces (VNICs).
  • The iSCSI design document describes the changes required within the DID (device ID) driver and the interaction with the Solaris iSCSI subsystem.

You can also send your comments to the ha-clusters-discuss-at-opensolaris-dot-org mailing list.

Don't miss this interesting opportunity to participate and learn a lot about the Open HA Cluster internals. See you at the CLARC slot or on the mailing list :-)

Tuesday Jan 20, 2009

Interview about Open HA Cluster and MySQL

Together with Detlef Ulherr, I had the pleasure of being interviewed by Lenz Grimmer, a member of the MySQL Community Relations team at Sun Microsystems.

You can read the full interview at the MySQL Developer Zone. I like the graphics at the top of the page :-)

Monday Jan 19, 2009

Unexpected ksh93 behaviour with built-in commands sent to the background

While working on modifying the HA Containers agent to support the ipkg zone brand type from OpenSolaris, I stumbled over a difference in behavior between ksh88 and ksh93 to be aware of, since it will at least impact the GDS based agents.

The difference affects commands that exist both as user-land binaries and as shell built-ins, like sleep, and how they then appear in the process table.

Example: Consider the following script called "/var/tmp/mytesting.ksh":

    # send one sleep into background - it will run longer than the script itself
    sleep 100 &

    # another sleep, this time not in the background
    sleep 20
    echo "work done"
    exit 0

If you then invoke the script on OpenSolaris and, while the second sleep runs, invoke "ps -ef | grep mytest" in a different window, it will show something like:

    root  8185  8159   0 06:48:13 pts/4       0:00 /bin/ksh /var/tmp/mytesting.ksh
    root  8248  8178   0 06:48:32 pts/5       0:00 grep mytesting
    root  8186  8185   0 06:48:13 pts/4       0:00 /bin/ksh /var/tmp/mytesting.ksh

Note the two processes with the same name.

After the second sleep has finished, you will see that the script has terminated, printing "work done".

However, "ps -ef | grep mytest" will show:

    root  8262  8178   0 06:48:37 pts/5       0:00 grep mytesting
    root  8186     1   0 06:48:13 pts/4       0:00 /bin/ksh /var/tmp/mytesting.ksh

until the first sleep has also finished.

What is interesting is that the sleep sent into the background carries the script name in the process table.

On a system where /bin/ksh is ksh88 based, you would see a process called "sleep 100", and "/bin/ksh /var/tmp/mytesting.ksh" only once.

If you instead create the following script, called "/var/tmp/mytesting2.ksh":

    # send one sleep into background - it will run longer than the script itself
    /usr/bin/sleep 100 &

    # another sleep, this time not in the background
    /usr/bin/sleep 20
    echo "work done"
    exit 0

and repeat the test above, you will see:

# ps -ef | grep mytesting
    root  8276  8159   0 06:57:31 pts/4       0:00 /bin/ksh /var/tmp/mytesting2.ksh
    root  8292  8178   0 06:57:36 pts/5       0:00 grep mytesting

I.e., the script appears just once. And you can see:

# ps -ef | grep sleep
    root  8278  8276   0 06:57:31 pts/4       0:00 /usr/bin/sleep 20
    root  8296  8178   0 06:57:43 pts/5       0:00 grep sleep
    root  8277  8276   0 06:57:31 pts/4       0:00 /usr/bin/sleep 100

while the second sleep is still running and the script has not yet finished.

Once it has finished, you just see:

# ps -ef | grep sleep
    root  8306  8178   0 06:57:55 pts/5       0:00 grep sleep
    root  8277     1   0 06:57:31 pts/4       0:00 /usr/bin/sleep 100

and no longer the /var/tmp/mytesting2.ksh process.

This makes a difference in our logic within the GDS based agents, where we disable the PMF action script. Before doing that, we invoke a sleep of START_TIMEOUT length to ensure that at least one process remains under the PMF tag until the action script is disabled.

And in our probe script, the wait_for_online mechanism checks whether the start_command is still running. If it is, the probe returns 100 to indicate that the resource is not online yet.

So far, much of our code invokes sleep instead of /usr/bin/sleep - and in combination with wait_for_online as described above, this causes our start commands to always run into a timeout, although the script itself terminates and everything works fine. Manual testing will not show anything obvious either.

The fix is to always invoke /usr/bin/sleep, not the shell built-in sleep.

It took me a while to understand this, and I write it down here so others don't scratch their heads as I did ;-)

Wednesday Dec 10, 2008

GDS template - browse source online or checkout subversion repository

It is often convenient to refer directly to some code when you discuss a certain portion of it, or try to explain it.

Therefore, I put the GDS coding template (which we had so far published as a compressed tar file) into a subversion repository under the HA Cluster Utilities project page. You can browse it online using OpenGrok.

The repository should be available via

Find instructions on how to use subversion here.

If you want to enhance or improve the GDS template, feel free to send code changes for review to the ha-cluster-discuss mailing list.

In order to receive commit rights to this repository, your userid needs to be registered as a "project observer" on the HA Utilities project page. Only then can I select the username and grant write access.

Thursday Oct 16, 2008

Recordings of the SourceTalk 2008 talks

Some of the talks at the SourceTalk Days 2008 were recorded with the TeleTeachingTool. They have recently been made available on the event's website, among them my experience report "Portierung von Open HA Cluster auf OpenSolaris" (porting Open HA Cluster to OpenSolaris): without video (12MB) and with video (53MB).

My blog entry from last year explains how to watch the recordings on Solaris as well. I have updated the ttt.ksh script so that the MP3 audio now plays back correctly.

You are also welcome to download my presentation as a PDF.

Monday Sep 29, 2008

Building Open HA Cluster on OpenSolaris

The first source code tarball for project Colorado, to compile the Open HA Cluster core framework on the OpenSolaris binary distribution, is available. It also contains a set of updated scbld tools needed to run with KSH93 on OpenSolaris.

Have a look at the detailed instructions on how to get it compiled on OpenSolaris 2008.05.

The following parts have been disabled in order to get the build to compile:
  • dsconfig
  • agent_snmp_event_mib/mib
  • spm
  • adminconsole
and are thus not part of the created SVR4 packages, since the source code for those components relies on headers/libraries that are not available for the OpenSolaris binary distribution.

Next steps: design how to create and publish IPS packages instead of building the SVR4 packages. Any help is welcome. Have a look at the wiki page for the current plans.

Friday Sep 19, 2008

Invitation to SourceTalk 2008

From 23 to 25 September, the SourceTalk Days 2008 take place for the fourth time at the Mathematical Institute of the University of Göttingen. On Wednesday, 24 September, a series of talks is dedicated to OpenSolaris, in which I will give a talk at 11:15 titled "Erfahrungsbericht - Portierung von Open HA Cluster auf OpenSolaris" (experience report - porting Open HA Cluster to OpenSolaris).

The experience report covers the activities so far in the OpenSolaris project Colorado. A preview can be found in my related blog entry (in English).

Together with my colleagues I will also be available around 17:00 for a "Meet the Experts" session, for questions of all kinds around Solaris, OpenSolaris and Open HA Cluster.

See you in Göttingen!

Monday Sep 08, 2008

Project Colorado: Running Open HA Cluster on OpenSolaris

If you have ever asked yourself why Solaris Cluster Express (the Open HA Cluster binary distribution) runs on Solaris Express Community Edition and not yet on the OpenSolaris binary distribution, then you might be interested in project Colorado. This project is endorsed by the HA Clusters community group, and its goal is to provide a minimal and extensible binary distribution of Open HA Cluster that runs on the OpenSolaris binary distribution.

As always, the devil is in the details. Here are some of the reasons why this isn't just a "recompile and run" experience:

  • The package system has changed:
    One of the big changes with the OpenSolaris binary distribution is the switch to the image packaging system (IPS). As Stephen Hahn explains in some of his blogs, it is a key design criterion of IPS not to include anything like the rich scripting hooks found in the System V packaging system.
    Have a look at our initial analysis of how and where those scripting hooks are currently used. Note that installation is one aspect, uninstallation another. In the more modular world of a network-based package system, dependencies on other packages need to be explicit and fine-grained. And in the cluster world, they need to be in agreement across multiple nodes. Since we also deliver kernel modules and plan to deliver a set of optional functionality, this raises the challenge of where to put the tasks that used to live in the scripting hooks for configuration and uninstallation.
    Further, if you know the details of the Open HA Cluster build steps (which are similar to the ON build steps), you know that building the packages via the pkgdefs hierarchy is an important element in later assembling the deliverables for the DVD image. Since we do not want just a 1:1 conversion of our existing packages into IPS packages, we need to come up with a mechanism to deliver the IPS packages into a network repository as part of the build step, since there will be no DVD image going forward.
  • Zones behaviour has changed:
    The native zones brand type introduced in Solaris 10 inherited the packages to install from the global zone. For Solaris Cluster this meant that cluster-related packages were automatically installed and configured when a zone was created. On OpenSolaris this behaviour has changed: currently only the ipkg brand type is available. This means that, out of the box, none of the ways Solaris Cluster integrates with zones works without various changes.
  • KSH93 vs KSH88:
    OpenSolaris switched to KSH93 for /bin/sh and /usr/bin/ksh; the previous KSH88 shell is no longer available. Again, the devil is in the details. KSH93 deals differently with, for example, local vs. global variables. Some scripts required for building (like /opt/scbld/bin/nbuild) or for installing Solaris Cluster (like /usr/cluster/bin/scinstall) break with KSH93. The full set of impacted scripts still needs to be determined.
  • The Sun Java Web Console (webconsole) is not part of OpenSolaris:
    The Java Web Console provides a common location for users to access web-based system management applications. The single entry point it provides eliminates the need to learn URLs for multiple applications. In addition, it provides user authentication and authorization for all applications that are registered with the web console. All web console-based applications conform to the same user interface guidelines, which enhances ease of use.
    These are the reasons why Solaris Cluster chose to deliver its browser user interface (BUI), named Sun Cluster Manager, on top of the Sun Java Web Console framework. In addition, it uses the Web Application Framework (JATO) and the Lockhart Common Components.
    Since those components are not available for the OpenSolaris binary distribution, this raises the question of which management framework to use (and develop against) instead. Of course, a substitution is not trivial and can be quite time consuming, and it is not certain that existing code can be reused.
  • Dependencies on encumbered Solaris code:
    Besides components that the OpenSolaris binary distribution chose not to deliver anymore, there is the goal of creating a freely redistributable binary distribution. This means OpenSolaris also does not deliver the Common Desktop Environment (CDE), which includes the Motif libraries. The adminconsole delivered with Solaris Cluster uses Motif and ToolTalk.
    The adminconsole tools need to be redesigned to use libraries available within OpenSolaris.
  • No SPARC support for OpenSolaris yet:
    The OpenSolaris binary distribution is currently only available for the i386 platform, while Solaris Express Community Edition also provides a distribution for SPARC. While this is not a strong inhibitor to running on OpenSolaris, it is nonetheless a reason why providing Solaris Cluster Express is still a requirement.
    The good news is that there are plans to provide SPARC support for OpenSolaris in future releases.
  • The OpenSolaris installer does not support network installations yet:
    While this is not a direct problem, it becomes one if you consider that the developers of Open HA Cluster are distributed around the world and most engineers only have access to remote systems, without the possibility of performing an installation that requires a keyboard and monitor.
    Again, the good news is that there are plans to add support for automated network installations in future OpenSolaris releases.
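
To make the packaging point above concrete: in IPS, the dependencies and contents that SVR4 scripting hooks used to handle implicitly are declared explicitly in a package manifest and published to a network repository with pkgsend. The following fragment is a generic, made-up illustration - package names and paths are hypothetical, not actual Colorado packages:

```
set name=pkg.fmri value=pkg:/ha-cluster/framework@0.1
set name=pkg.description value="Example Open HA Cluster framework package"
# dependencies are explicit and fine-grained, one per depended-on package
depend type=require fmri=pkg:/SUNWcsl
depend type=require fmri=pkg:/SUNWzfs
# contents are listed as actions; there are no postinstall scripting hooks
file path=usr/cluster/bin/scinstall mode=0555 owner=root group=bin
```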

Besides solving the above challenges, we also want to offer some new possibilities within Colorado. You can read the details in the umbrella requirements specification. There are separate requirements specifications outlining specific details for the planned private-interconnect changes, cluster infrastructure changes involving the weaker membership model, enhancements to make the proxy file system (PxFS) optional, and changes to use iSCSI with ZFS for non-shared storage configurations.

You can provide feedback on those documents via the ha-clusters-discuss mailing list. There is a review scheduled with the Cluster Architecture Review Committee on 18 September 2008, in which you are invited to participate by phone if you are interested.

Tuesday Jun 03, 2008

OpenSolaris User Group in Berlin: Open HA Cluster talk

Fortunately, I was able to combine my presence in Berlin for LinuxTag 2008 with a talk at the local OpenSolaris User Group. The talk was scheduled for Wednesday, 28 May 2008 at 19:30. However, a few misunderstandings meant that, despite my punctual arrival at Tucholskystr. 48 around 19:00, the talk started a quarter of an hour late :-( Not my style at all!

I was relieved to find that the attendees were patient - and even let me present until about 21:30! An hour longer than planned - the interest shown was really great and led to interesting questions and discussions.

Afterwards I joined the traditional dinner at the Indian restaurant around the corner, which led to a continuation of the conversations and an exchange of anecdotes around Open HA Cluster, OpenSolaris and the wonders of IT until about midnight.

All in all, I can only recommend visiting the OpenSolaris User Group in Berlin, whether as an interested attendee or as a speaker - nice and interested people, from beginners to old hands :-)

Attached is my talk for download. My thanks go to Franz Timmer and Detlef Drewanz for this opportunity and to the attendees for their lively interest!

LinuxTag 2008: High availability with Open HA Cluster

Sun Microsystems had a booth as exhibitor and sponsor at LinuxTag 2008 in Berlin. Among other things, there was a workstation dedicated to Open High Availability Cluster, where Hartmut Streppel (Wed/Thu), Heiko Stein (Fri/Sat), Eve Kleinknecht (Thu/Fri/Sat) and I (Wed/Thu/Fri/Sat) were available to discuss all questions around high availability and to show live demonstrations. Many thanks to my colleagues for the great and competent support!

On Saturday, as part of the OpenSolaris track, I gave the talk on high availability with Open HA Cluster, including a live demonstration of a service switchover of HA PostgreSQL (database, IP, zpool (on a USB stick)) between two Solaris zones, configured on my laptop as a single-node cluster (Solaris Express Community Edition 01/08 and Solaris Cluster Express 02/08). A great environment to get familiar with the technology or to do agent development.

The same configuration can, by the way, also be installed within VirtualBox. We demonstrated this option at the booth as well.

Attached is the presentation for download.

On Saturday there was also the keynote by Ian Murdock, in which, among other things, the third open source phase for Open HA Cluster was announced: about 2 million lines of source code of the Solaris Cluster core framework! From now on, the complete source code of Open HA Cluster is available!

There are two short videos by Terri Molini with impressions from LinuxTag and from the LinuxNacht and keynote.

All in all, LinuxTag 2008 was worthwhile for me. There were many new contacts and interesting conversations. Not least, I hope we managed to present Open HA Cluster as useful and relevant.


This blog is about my work at Availability Engineering: Wine, Cluster and Song :-) The views expressed on this blog are my own and do not necessarily reflect the views of Sun and/or Oracle.

