New White Paper: Practicing Solaris Cluster using VirtualBox

For developers, it is often convenient to have all the tools necessary for their work in one place, ideally on a laptop for maximum mobility.

For system administrators, it is often critical to have a test system on which to try things out and learn about new features. Of course, the system needs to be low cost and transportable to wherever they need to be.

HA clusters are often perceived as complex to set up and resource-hungry in terms of hardware requirements.

This white paper explains how to set up a single x86 based system (like a laptop) with OpenSolaris, configure a training and development environment for Solaris 10 / Solaris Cluster 3.2, and use VirtualBox to set up a two-node cluster. The configuration can then be used to practice various technologies:

The following OpenSolaris technologies are used on the host system and in the VirtualBox guests to configure Solaris 10 / Solaris Cluster 3.2:

- Crossbow, to create the virtual network adapters
- COMSTAR, to export iSCSI targets from the host, which the Solaris Cluster nodes use as iSCSI initiators for shared storage and the quorum device
- ZFS, to provide the volumes exported as iSCSI targets and to serve as a failover file system within the cluster
- IPsec, to secure the traffic on the cluster private interconnect
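
To give a first idea of what the host-side setup looks like, here is a minimal command sketch for the Crossbow and COMSTAR/ZFS pieces. The device and dataset names (etherstub0, vnic1, rpool/iscsi/quorum) are placeholders for illustration, not necessarily the ones used in the white paper:

    # Crossbow: a virtual switch plus VNICs for the cluster private interconnect
    dladm create-etherstub etherstub0
    dladm create-vnic -l etherstub0 vnic1
    dladm create-vnic -l etherstub0 vnic2

    # ZFS + COMSTAR: back an iSCSI target with a ZFS volume
    zfs create -V 512m rpool/iscsi/quorum
    svcadm enable stmf
    sbdadm create-lu /dev/zvol/rdsk/rpool/iscsi/quorum
    stmfadm add-view <GUID printed by sbdadm create-lu>
    svcadm enable -r svc:/network/iscsi/target:default
    itadm create-target

The Solaris 10 guests then access the exported LUs with the standard iSCSI initiator (iscsiadm).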

Solaris Cluster technologies like software quorum and zone clusters are used to set up HA MySQL and HA Tomcat as failover services running in one virtual cluster. A second virtual cluster is used to show how to set up Apache as a scalable service.
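
For readers unfamiliar with zone clusters: they are configured from the global cluster with the clzonecluster command. The fragment below is only a hedged sketch with made-up names (zc1, /zones/zc1, the physical host names) and omits details such as network addresses; the white paper contains the exact configuration:

    # sketch: configure, install and boot a two-node zone cluster named zc1
    clzonecluster configure zc1
      create
      set zonepath=/zones/zc1
      add node
        set physical-host=phys-node1
        set hostname=zc1-node1
      end
      add node
        set physical-host=phys-node2
        set hostname=zc1-node2
      end
      commit
      exit
    clzonecluster install zc1
    clzonecluster boot zc1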

The instructions can be used as a step-by-step guide on any 64-bit x86 system capable of running OpenSolaris. A CPU that supports hardware virtualization is recommended, as well as at least 3GB of main memory. To find out whether your system works, simply boot the OpenSolaris live CD and confirm with the Device Driver Utility (DDU) that all required components are supported. The hardware compatibility list can be found at http://www.sun.com/bigadmin/hcl/. The reference system for the white paper is a Toshiba Tecra M10 with 4GB of main memory.
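
If you just want a quick sanity check on a machine that already runs Solaris or OpenSolaris, two standard commands give a first hint (this is only a convenience check; booting the live CD and running the DDU remains the authoritative test):

    isainfo -b              # prints 64 on a 64-bit capable system
    prtconf | grep Memory   # shows the installed main memory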

If you have ever missed an opportunity to just try things out with Solaris 10 and Solaris Cluster 3.2 and explore new features - this is your chance :-)

Comments:

I had other plans for the weekend but this looks so cool... ;)

Posted by Thommy M. Malmström on October 16, 2009 at 10:57 AM CEST #

Hi Thorsten,

This looks fantastic. Can I use Sun Cluster 3.2 Update 3 with this configuration rather than Update 2?

Kind regards

Rumi

Posted by Rumi on December 28, 2009 at 10:22 AM CET #

With your post, now is a fantastic day to "experiment"

Posted by Renato Morano on January 05, 2010 at 01:18 AM CET #

Nice writing, thanks.
I experimented with a similar configuration, but using Linux (Gentoo) as the host OS with VirtualBox. (And I tried VMware Server and Workstation on top of Linux and Win XP, too.)
All of this works fine except for one thing: sometimes the pm_tick delay gets too high, and one of the cluster nodes panics. Your syslog.conf modification does not help to prevent this.
Google gives some clues on how to solve this, but I found no good solution. The best clue I found was tuning the cluster transport timings:
scconf -w heartbeat_timeout=60000
scconf -w heartbeat_quantum=10000
This notably reduces the likelihood of the panic, but does not eliminate it. (Further tuning does not bring further improvement.)

Could you suggest something to prevent these pm_tick delay panics?

Posted by Gabor Simon on January 08, 2010 at 11:03 AM CET #

Thanks for the positive comments and sorry for the delay, I was on vacation.

@Rumi: Yes, you can use SC 3.2 11/09 (Update 3) as well. But I would recommend using Solaris 10 05/09 (Update 7) because of CR 6888193, as mentioned in the white paper. Once a fix is available you can use S10 Update 8 as well, but until then the iSCSI initiator part will not work as described in the paper.

@Gabor: I don't have a better recommendation for preventing the pm_tick delays. Without my recommended change to syslog.conf, the messages flood syslog; that was the only thing that made my system panic or become unresponsive. Of course, you have to restart syslog after the change. But I guess it also depends on the host system and on how busy the guests keep the CPU (which adds delays for the guests). In the M10 case it is a dual-core CPU with the hardware virtualization feature turned on in the BIOS.
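
For reference, syslogd on Solaris 10 runs under SMF, so the restart after editing /etc/syslog.conf is:

    svcadm restart svc:/system/system-log:default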

Posted by Thorsten Frueauf on January 18, 2010 at 05:44 AM CET #
