Using SDM Cloud Adapter to Manage Solaris Zones

How the SDM Cloud Adapter Works

The Cloud Service Adapter can be seen as an extended spare pool that works exclusively on cloud resources. It provides cloud resources on demand and collects unused ones. In addition, it can start up and shut down cloud resources depending on the SDM system's needs (cloud resources are hosts provided by a cloud service like EC2).

To communicate with the cloud the Cloud Adapter uses three scripting hooks:

  • showCloudHosts – This scripting hook is used to discover the cloud. It returns a list of hosts (= resources) that are part of the cloud. The Cloud Adapter calls this hook without parameters.

  • startupCloudHosts – This scripting hook is used to startup one or more hosts in the cloud. The Cloud Adapter passes the number of needed hosts as the first parameter to this script.

  • shutdownCloudHost – This scripting hook is used to shutdown one host. The targeted host name is the first parameter for this script.

All script hooks report errors via the exit code. Exit code 0 means that the script was executed successfully. Any other exit code is treated as an error. If an error occurs, then the Cloud Adapter stores the output of the scripting hook in a protocol file in the local spool directory.

On success the scripts return their result in XML format. The Cloud Adapter parses the output and extracts the provided information.
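To illustrate, here is a minimal sketch of what a showCloudHosts-style hook could look like. The exact XML schema SDM expects is not shown in this article, so the <cloudHosts>/<host> elements and the helper names below are my own assumptions:

```shell
#!/bin/sh
# Minimal sketch of a showCloudHosts-style hook for zones.
# NOTE: the <cloudHosts>/<host> XML elements and helper names are
# illustrative assumptions, not the schema SDM actually requires.

# Pure helper: wrap one host name in an XML element.
emit_host_xml() {
    printf '  <host name="%s"/>\n' "$1"
}

# Discover running zones and print them as an XML host list.
show_cloud_hosts() {
    echo '<cloudHosts>'
    # "zoneadm list" prints one running zone per line; skip the global zone.
    zoneadm list | while read -r zone; do
        [ "$zone" = global ] || emit_host_xml "$zone"
    done
    echo '</cloudHosts>'
}

# Exit code 0 signals success to the Cloud Adapter.
if command -v zoneadm >/dev/null 2>&1; then
    show_cloud_hosts
fi
```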

The Cloud Adapter itself does not know much about the cloud; all information is gathered via the three scripting hooks. The Service Domain Manager module of Sun Grid Engine 6.2u3 comes with scripts for Amazon EC2, and it is not too complicated to adapt them to different cloud solutions.

Using the Cloud Adapter for Managing Solaris Zones

In this blog entry I describe a set of sample scripts that manage Solaris Zones as cloud resources. The scripts can be found in the attachment of this article. They are prototypes and should not be used for production purposes.

What the scripts do:

  • startupCloudHosts – This scripting hook creates one or more new zones. It uses the zoneadm command to find out which zones are already installed but not running. If there are any, they are started. If no installed zones are available, it creates new ones by cloning a spare zone. In addition, this script sets up a virtual NIC for the zone with Crossbow.

  • shutdownCloudHost – This script simply stops the zone by calling zoneadm -z <zonename> halt.

  • showCloudHosts – This script discovers all running zones with the zoneadm list command.
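As a concrete sketch, the shutdown logic described above could look like the following. Only the zoneadm -z <zonename> halt call is taken from the article; the helper names and the error reporting are my own illustration:

```shell
#!/bin/sh
# Sketch of a shutdownCloudHost-style hook (helper names are illustrative).

ZONE_NAME_PREFIX=z   # matches the naming schema used by the sample scripts

# Pure helper: map a zone number to a zone name (1 -> z1, 2 -> z2, ...).
zone_name() {
    echo "${ZONE_NAME_PREFIX}$1"
}

# Halt one zone; the Cloud Adapter treats any non-zero exit code as an
# error and stores the script output in a protocol file.
shutdown_cloud_host() {
    zone="$1"
    if zoneadm -z "$zone" halt; then
        exit 0
    else
        echo "failed to halt zone $zone" >&2
        exit 1
    fi
}

# The Cloud Adapter passes the target host name as the first parameter:
# shutdown_cloud_host "$1"
```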

The following sections describe the actions necessary to set up SDM with a Cloud Adapter that manages zones. The following software is required:


  • OpenSolaris 2009.06 (with Crossbow)

  • Sun Grid Engine 6.2u3

  • Service Domain Manager 1.0u3 (SDM)

All commands must be executed as root.

Preparing the file system

  • We need one shared directory. All zones will have access to this directory (/gridware):

# zfs create -o mountpoint=/gridware rpool/gridware
  • We need a ZFS file system where the file systems for the zones are stored:

# zfs create -o mountpoint=/zones rpool/zones
  • The Service Domain Manager stores the definition of the locally installed SDM system in /etc/sdm. I create a ZFS file system for this directory:

# zfs create -o mountpoint=/etc/sdm rpool/etc_sdm
  • For the local spool directory of SDM I also create a ZFS file system:

# zfs create -o mountpoint=/var/spool/sdm rpool/var_sdm
  • Unpack the zone scripts in /gridware:

# cd /gridware
# tar xzf <path to scripts>/zones-scripts-0.1.tar.gz
  • Install the SGE 6.2u3 packages
    You can download the OpenSolaris packages for SDM from Sun's download pages. Please do not install the packages that are available via the IPS extras repository: they install only SGE 6.2u2 and do not include the SDM Cloud Adapter component. You will need the following packages:
    Sun Grid Engine 6.2 Update 3, Solaris x64 (required for 64-bit boxes), tar.gz format
    Sun Grid Engine 6.2 Update 3, Solaris x86 (required for 32-bit boxes), tar.gz format
    Sun Grid Engine 6.2 Update 3, Service Domain Manager (SDM) module, tar.gz format
    Download the packages and store them in your download folder.

  • Unpack all packages in the download directory (this will create a subdirectory sge6_2u3):

# cd <download_dir>
# unzip
  inflating: sge6_2u3/sge-6_2u3-bin-solaris-x64.tar.gz
  inflating: sge6_2u3/sge-6_2u3-common.tar.gz
# unzip
  inflating: sge6_2u3/sge-6_2u3-inspect.tar.gz
# unzip
  inflating: sge6_2u3/sdm-1_0u3-core.tar.gz
  • Unpack the SGE packages in /gridware/sge. I did not install SGE in /opt/sge because the /opt directory has a special meaning for zones: it is mapped read-only into the zone's file system, but Grid Engine needs write access. So please do not install Grid Engine in /opt:

# mkdir /gridware/sge
# cd /gridware/sge
# tar xzf <download_dir>/sge6_2u3/sge-6_2u3-common.tar.gz
# tar xzf <download_dir>/sge6_2u3/sge-6_2u3-bin-solaris-x64.tar.gz
# tar xzf <download_dir>/sge6_2u3/sge-6_2u3-inspect.tar.gz
  • Unpack the SDM package in /gridware/sdm:

# mkdir /gridware/sdm
# cd /gridware/sdm
# tar xzf /tmp/sge6_2u3/sdm-1_0u3-core.tar.gz
  • Add the /gridware/sdm/bin directory to your PATH:

# export PATH=$PATH:/gridware/sdm/bin

Check qmaster hostname

For the qmaster installation it is important that the hostname of the local host is resolvable. If your system gets its IP address via DHCP, you must add the hostname to the hosts file; on Solaris this is /etc/inet/hosts. The hostname of my Solaris box is lappy:

# vim /etc/inet/hosts
…   lappy

Please ensure that the hostname of your host is not resolved to the loopback interface. The following entry in the hosts file will cause problems:   lappy localhost

Remove the hostname from this line, leaving: localhost

You can test the resolvability with the Grid Engine utility:

# /gridware/sge/utilbin/sol-amd64/gethostname -name

Install Sun Grid Engine

You can now continue with the Grid Engine qmaster installation. The easiest way to install Sun Grid Engine is the graphical installer; it is described in detail in "Installing Sun Grid Engine with the graphical installer". You must enable the JMX server inside qmaster (it is enabled by default since 6.2u3). Installing execution daemons is not necessary; SDM will install them automatically on each zone.

For the graphical installation, start the start_gui_installer script in the SGE_ROOT directory:

# cd /gridware/sge
# ./start_gui_installer

I chose the following settings for the installation:

  • Install only qmaster (Default is: qmaster and execd)

  • Express Installation (Default value)

  • Admin user: root (Default value)

  • Grid Engine root directory: /gridware/sge (Default value)

  • Cell name: default (Default value)

  • Qmaster port: 6444 (Default value)

  • Execd port: 6445 (Default value)

  • JMX port: 6446 (Default value)

  • Administrator mail: none (Default value)

  • Automatically start service(s) at machine boot: no (non-default: please unselect the check box)

If the installation finishes successfully, qmaster should be running on the localhost. You can check this with the qhost command:

# source /gridware/sge/default/common/
# qhost

Install SDM Master Host

The master host installation is described in the section "How to Install SDM on the Master Host". Here is an example:

# sdmadm -s sdm1 -p system install_master_host \
          -ca_admin_mail "" \
          -ca_state "fooLand" \
          -ca_country "FO" \
          -ca_location "fooLocation" \
          -ca_org_unit "fooUnit" \
          -ca_org "fooOrg" \
          -au root \
          -cs_port 31006 \
          -l /var/spool/sdm/sdm1 \
          -sge_root /gridware/sge

The example installs an SDM system named sdm1 in simple mode. Only one Java Virtual Machine (JVM) will run on your host. Yes, SDM is implemented in Java; the processes of SDM are JVMs.

The single JVM will bind port 31006. The directory /var/spool/sdm/sdm1 is used for spooling.

Most of the parameters of the install_master_host command are used for the Certificate Authority (all parameters starting with -ca...). Use values reasonable for your environment. The CA parameters are only stored in the root certificate of the SDM system; they are not used for any other purpose.

In the next step you can start up the JVM. If you do not want to explicitly provide the global parameters "-s sdm1 -p system" with every sdmadm call, you can make the SDM system sdm1 the default system on your host:

# sdmadm -s sdm1 -p system sedbc
Default system set to 'sdm1'

From now on the sdmadm command addresses the system sdm1 unless something else is specified.

OK, we can now start up the SDM master host:

# sdmadm suj
jvm   host  result  message
cs_vm lappy STARTED

The SDM master host has been started successfully.

Adjusting settings in /gridware/scripts/

The settings file defines all parameters for the zones scripts. All scripts source this file. Adjust the values in this file to your environment. If you have used the same commands and parameters that are described in this document, most default settings should already be correct. However, you should check at least the following parameters:

  • MAX_ZONE_NR: Defines the maximum number of zones used by the Cloud Adapter.
    The zones Cloud Adapter uses the following schema for the zone names: $ZONE_NAME_PREFIX<zone_nr>. The variable $ZONE_NAME_PREFIX is also defined in the settings file. The zone_nr is a number in the range from 1 to 253 (limited by the class C network; 254 is reserved for the IP address of the global zone, and 1 is the number of the spare zone, which is used as the image for new zones).

  • PHYSICAL_NIC: Defines the name of the physical network interface.

If you have strictly followed the instructions in this article, no other parameters need to be adjusted. Otherwise, please also check the remaining parameters; they are documented in the settings file.
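For orientation, the relevant entries in the settings file might look like this. The variable names are the documented ones, while the values shown are examples for my environment:

```shell
# Example values for the parameters discussed above (values are examples).

ZONE_NAME_PREFIX=z     # zones are named z1, z2, ..., z$MAX_ZONE_NR
MAX_ZONE_NR=5          # upper limit for zone numbers (valid range: 1-253)
PHYSICAL_NIC=e1000g0   # physical interface the Crossbow VNICs are created on
```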

Setting up DNS

To make the zones resolvable, all zone names will be added to the /etc/inet/hosts file. This file is copied to every zone once it is started. For your convenience the zones scripts contain a script that creates the hosts file. Please make a backup of your hosts file before running it:

# cp /etc/inet/hosts /etc/inet/hosts.sav
# /gridware/scripts/ >> /etc/inet/hosts
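Such a generator can be sketched as follows. The NETWORK_PREFIX value and the function name are my own assumptions, derived from the class C addressing schema described above (the zone number becomes the last octet of the address):

```shell
#!/bin/sh
# Sketch of a hosts-file generator for the zones (names and values are
# illustrative, not taken from the shipped script).

ZONE_NAME_PREFIX=z
MAX_ZONE_NR=3
NETWORK_PREFIX=192.168.100   # example class C network for the zones

# Print one hosts-file line per zone: <prefix>.<nr>  <zone name>
generate_zone_hosts() {
    nr=1
    while [ "$nr" -le "$MAX_ZONE_NR" ]; do
        printf '%s.%s\t%s%s\n' "$NETWORK_PREFIX" "$nr" "$ZONE_NAME_PREFIX" "$nr"
        nr=$((nr + 1))
    done
}

generate_zone_hosts
```

Appending its output to /etc/inet/hosts gives every zone a stable address, which is what the shipped script achieves.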

Setting up the First Zone

All zones except the first will be set up automatically by the zones scripts. The first zone is used as the blueprint for the additional zones (new zones are cloned from it). Create it with the following command:

# /gridware/scripts/ -debug 1

Depending on the internet connection of your machine, the startup of the first zone may take a while: it must download packages from the IPS repository (approx. 100 MB). If the zone has been created successfully, you will see the following output:

   zone not yet available, retry in 5 seconds
   zone not yet available, retry in 5 seconds
   zone not yet available, retry in 5 seconds
  zone started
Creating copy of /etc/passwd and /etc/shadow on zone 

Cloning a zone is only possible if the source zone is stopped. Shut down the first zone:

# zoneadm -z z1 halt
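Behind the scenes, cloning the halted blueprint can be sketched roughly like this. The zonepath rewriting and helper names are my own illustration (a real script must also adjust the clone's network configuration); /zones matches the file system created earlier:

```shell
#!/bin/sh
# Rough sketch of cloning the halted blueprint zone into a new zone.
# Assumption: zones live under /zones/<name>, as set up earlier in this article.

SPARE_ZONE=z1

# Pure helper: zonepath of a zone under the /zones file system.
zone_path() {
    echo "/zones/$1"
}

clone_zone() {
    new="$1"
    cfg="/tmp/${new}.cfg"
    # Export the blueprint's configuration and rewrite the zonepath for the clone.
    zonecfg -z "$SPARE_ZONE" export \
        | sed "s|$(zone_path "$SPARE_ZONE")|$(zone_path "$new")|" > "$cfg"
    zonecfg -z "$new" -f "$cfg"
    # A ZFS-backed clone is much faster than a full install.
    zoneadm -z "$new" clone "$SPARE_ZONE" && zoneadm -z "$new" boot
}

# A real hook would now call, e.g.: clone_zone z2
```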

Adding the Zones Cloud Service to SDM

Inside the SDM system, the zones Cloud Service will start up and shut down the zones. To add such a service, use the following command:

# sdmadm add_cloud_service -s zones -j cs_vm -h localhost -f /gridware/scripts/zones.xml
service_name hostname   jvm_name message
zones        lappy      cs_vm    ADDED  

The -s switch defines the name of the service. The -j switch defines that the service runs in cs_vm (the only JVM in a simple installation). The -h switch defines that the service runs on the localhost. The -f switch defines the configuration file for the zones service. Please adjust the configuration file /gridware/scripts/zones.xml to your environment (if you strictly followed the instructions, this is not necessary).

In the next step the zones service can be started with the startup_component command (shortcut is suc):

# sdmadm startup_component -c zones -h localhost
host       service    cstate  sstate 
lappy      spare_pool STARTED RUNNING
           zones      STARTED RUNNING

The zones service is now active. With the default configuration from /gridware/scripts/zones.xml it will immediately create and start up a new zone. This process runs in the background; you can see some hints in the log file of the JVM where the zones service is running:

# tail -f /var/spool/sdm/sdm1/log/cs_vm-0.log
07/01/2009 12:46:12|18|.cloud.CloudServiceAdapterImpl.doStartService|I|Service zones:Started cloud service adapter.
07/01/2009 12:46:14|19|.cloud.CloudScriptInterface.startupCloudHosts|I|Service zones: Starting up 1 additional cloud host(s)
07/01/2009 12:48:34|19|.cloud.CloudScriptInterface.startupCloudHosts|I|Service zones: Started up 1 cloud hosts: [[hostname: z2, instanceId: i-z2, launchTime: 2009-07-01T12:48:33.000Z] ].

The zone z2 has been started, the SDM resource is now available:

# sdmadm show_resource
service id state    type flags usage annotation
zones   z2 ASSIGNED host       2

Adding a Sun Grid Engine Adapter to the System

Having a cloud service that brings zones as host resources into the SDM system is nice. However, what can you do with these resources? You need a service that consumes them. The SDM system currently supports two consuming services.

  • The Spare_pool
    It does not really consume resources; its only task is to hold idle resources that are not used by other services. A cloud service can also be seen as a spare pool, but with additional functionality (startup/shutdown of resources).

  • The Sun Grid Engine Service
    This service really consumes resources. If the SDM system assigns a resource to a Grid Engine service, the Grid Engine Service Adapter installs the Grid Engine execution daemon on the host.

To demonstrate the flexibility of a cloud adapter (with zones), we now bring a Grid Engine service into the SDM system. The zones scripts already contain a ready-to-use configuration for a Grid Engine service (if you strictly followed the instructions in this document, no changes are necessary).

The Grid Engine Service Adapter communicates with the qmaster via JMX. For passwordless authentication it needs a keystore containing the private key and certificate of a Grid Engine admin user. On our system the only admin user is root.

  • As a first step we create the keystore for root:

# source /gridware/sge/default/common/
# $SGE_ROOT/util/sgeCA/sge_ca -userks

The sge_ca -userks command creates the keystores of all Grid Engine admin users. They are created under /var/sgeCA:

# ls -l /var/sgeCA/port6444/default/userkeys/root/keystore 
-rw------- 1 root root 2939 2009-07-01 12:59 /var/sgeCA/port6444/default/userkeys/root/keystore

This keystore is referenced in the configuration of the Sun Grid Engine Service.

  • As next step we can add the Grid Engine Service to the SDM system:

# sdmadm ags -s sge -j cs_vm -h localhost -f /gridware/scripts/sge.xml 
Grid Engine service sge added
  • Start up the sge service

# sdmadm startup_component -c sge -h localhost
comp host       message          
sge  lappy      startup triggered

The configuration in /gridware/scripts/sge.xml already defines a MaxPendingJobsSLO for the Grid Engine service. If you now submit jobs into the Grid Engine cluster, the SDM system will automatically move resources into Grid Engine:

# qhost
global                  -               -     -       -       -       -       -
# qsub -t 1-100 -b y -o /dev/null -j y sleep 60
Unable to run job: warning: root your job is not allowed to run in any queue
Your job-array 1.1-100:1 ("sleep") has been submitted.

Grid Engine prints a warning because the cluster currently has no host where the jobs can run. However, the job is accepted. The MaxPendingJobsSLO will detect the pending jobs and soon create a need for new resources:

# sdmadm show_slo
service    slo                 quantity urgency request                                           
sge        fixed_usage         0        0       SLO has no needs                                  
           maxPendingJobs      100      60      type = "host"                                     
spare_pool PermanentRequestSLO 10       1       type = "host"                                     
zones      PermanentRequestSLO 10       2       type = "host" & owner = "zones"

You can see that the maxPendingJobs SLO has a need for 100 resources; the urgency of the need is 60. It needs resources of type host.

# sdmadm sr
service id state     type flags usage annotation      
zones   z2 ASSIGNED  host       2     

The zones service has a matching resource (z2). This resource has a lower usage (= 2), so the SDM system will move it into the Grid Engine service:

# sdmadm sr
service id state     type flags usage annotation      
sge     z2 ASSIGNING host       60    Installing execd

# sdmadm sr
service id state    type flags usage annotation            
sge     z2 ASSIGNED host       60    Got execd update event

The zones cloud adapter now has no more resources, so it starts up a new zone to fill the gap.

These steps are repeated until all jobs in Grid Engine are finished. The MaxPendingJobsSLO will then no longer assign usage to the resources. The PermanentRequestSLO of the zones service will have a higher urgency than the usage of the resources in the SGE service, so the SDM system will move the resources back to the zones service.

The zones service has a limit of at most one resource, so it will automatically shut down the additional resources (i.e. shut down the zones).


I wrote this article to demonstrate how to manage zones with the SDM Cloud Adapter. The Cloud Adapter itself does not know anything about the cloud; it uses only scripting hooks to communicate with it.

The usage of the Cloud Adapter is not limited to the EC2 cloud. With small shell scripts it can be adapted to different cloud solutions. As a first step I implemented a solution for managing a cloud of zones; others may follow (VirtualBox, xVM Ops Center).

