Tips on deploying and managing Oracle Solaris, especially in clouds

Building an OpenStack Cloud for Solaris Engineering, Part 2

Dave Miner
Sr. Principal Software Engineer

Continuing from where I left off with part 1 of this series, in this posting I'll discuss the elements that we put in place to deploy the OpenStack cloud infrastructure, also known as the undercloud.

The general philosophy here is to automate everything, both because it's
a best practice and because this cloud doesn't have any dedicated staff
to manage it; we're doing it ourselves in order to get first-hand
operational experience that we can apply to improve both Solaris and
OpenStack.  As I said in part 1, we don't have an HA requirement at this
point, but we'd like to keep any outages, both scheduled and
unscheduled, to less than a half hour, so redeploying a failed node
should take no more than 20 minutes.  The pieces that we're using are:

  • Automated Installation services and manifests to deploy Solaris
  • SMF profiles to configure system services
  • IPS site packages installed as part of the AI deployment to automate some first-boot configuration
  • A Puppet master to provide initial and ongoing configuration automation

I'll elaborate on the first two below, and discuss Puppet in the next posting.  The IPS site packages we are using are specific to Oracle's environment so I won't be covering those in detail.

Sanitized versions of the manifests and profiles discussed below are available for download as a tar file.

Automated Installation

Building the undercloud nodes means we're doing bare-metal provisioning, so we'll be using the Automated Installation (AI) feature in Solaris 11.  Most of the OpenStack services could run in kernel zones, or even non-global zones, but we're planning for larger scale and want to have some extra horsepower.  Therefore we opted not to go in that direction for now, but it may well be an option we use later for some services.

I already had an existing AI server in this cloud's lab, and it provides services to systems that aren't part of this cloud; a new service is generated on it for each development build of a Solaris 11 update or Solaris 12 that we release. The pace of evolution of this cloud is likely to differ from those other systems as well, so I created two new AI services specifically for the cloud. We can make these services aliases of existing services so we don't need to bother replicating the boot image, thus the commands look like (output elided):

# installadm create-service -n cloud-i386 --aliasof solaris11_2-i386
# installadm create-service -n cloud-sparc --aliasof solaris11_2-sparc

The next step is setting up the manifests that specify the installation.  For this, I've taken the default derived manifest that we install for services and modified it to:

  1. Specify a custom package list
  2. Lay out all of the storage
  3. Select the package repository based on Solaris release
  4. Install a Unified Archive rather than a package set based on a boot argument

You can download the complete manifest; I'll discuss the various customizations here.
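For orientation, a derived manifest is just a ksh script that AI runs at install time: it loads a base manifest and then edits it with aimanifest calls. A minimal skeleton looks roughly like the following (a sketch only; the real havana.ksh is considerably more elaborate):

```shell
#!/usr/bin/ksh93
# Derived manifest skeleton: load the stock default manifest that ships
# with the AI boot image, then apply per-node edits with aimanifest.
SCRIPT_SUCCESS=0

/usr/bin/aimanifest load /usr/share/auto_install/manifest/default.xml

# ... aimanifest set/add/delete calls go here, typically keyed off
# client attributes that AI exposes in the environment, such as
# $SI_ARCH, $SI_MEMSIZE, and the $SI_DISKNAME_* variables ...

exit $SCRIPT_SUCCESS
```

The script's exit status tells AI whether manifest derivation succeeded, so every code path should end in an explicit exit.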

The packages we're explicitly installing are described below; there are of course a number of other packages pulled in as dependencies, so this expands out to just over 500 packages installed (perhaps not surprisingly, about 35% are Python libraries).
We start with solaris-minimal-server in order to build an effectively minimized environment.  We've chosen to install the same package set on all nodes so that any of them can easily be repurposed to a different role in the cloud if needed, so the openstack group package is used rather than the packages for the individual OpenStack services.  The rest of the list breaks down as follows:

  • The MySQL client package: we'll be using MySQL as the database
  • snoop, for network diagnostics (yes, we should use tshark instead, but I'm old-school :-)
  • Some Python packages that we need to support OpenStack
  • RabbitMQ, our message broker
  • The LDAP client and nss-utilities: we use LDAP for authentication, and nss-utilities is needed for some of its configuration
  • rsync, which I find convenient for caching crash dumps off to other systems for examination
  • ssh, for remote access
  • NTP: OpenStack needs consistent time
  • Kerberos, the automounter, and the NFS client: we use Kerberos for some NFS access
  • SMTP, since we want mail notifications for any fault management events
  • The utilities to manage Oracle hardware, which may come in handy
  • Puppet, which will provide ongoing configuration management
  • rad-evs-controller, to back-end our Neutron driver
  • bpf, listed only because of a missing dependency in another package that causes runaway console messages from the DLMP daemon; that's being fixed
  • The NIS package, which provides some things that LDAP needs
  • The kernel zones brand, since kernel zones are our primary OpenStack guest type
  • The doctools package, which provides the man command; you don't want to be caught without a man page when you need it!
  • less, because it's better than more
  • Finally, a couple of site packages: one does some general customizations, the other delivers the base certificate needed for TLS access to our LDAP servers

The storage layout we standardized on for the undercloud is to have a two-way mirror for the root pool, formed out of two of the smallest disks (usually 300 GB on the systems we're using), with any remaining disks in a separate pool, called tank on all of the systems, that can be used for other purposes.  On the Cinder node, it's where we put all the ZFS iSCSI targets; in the case of Glance it's where we store the images.  We're also planning to use it for Swift services on various nodes, but we haven't deployed Swift yet.  The tank pool gets built with varying amounts of redundancy based on the number of disks.  This logic is all in the last 60 lines of the manifest script.  It's an interesting example of using the derived manifest features to do some reasonably complex customization for individual nodes.
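The shape of that logic is a mapping from the number of leftover disks to a pool layout. The thresholds below are invented for illustration (the actual cut-offs live in the manifest script, which also emits the corresponding aimanifest calls), but a simplified sketch in plain shell conveys the idea:

```shell
# Pick a layout for the tank pool from the number of disks left over
# after the two root-pool mirror disks.  Thresholds are illustrative,
# not the actual values from the havana.ksh manifest.
tank_layout() {
    case $1 in
        0)   echo none ;;     # no spare disks: skip the tank pool
        1)   echo stripe ;;   # one disk: no redundancy possible
        2|3) echo mirror ;;   # a few disks: mirror them
        *)   echo raidz ;;    # four or more: raidz for capacity
    esac
}

tank_layout 5   # prints: raidz
```

The real script additionally has to name the actual disks, which is where the derived-manifest environment variables come in.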

We internally have separate repositories for Solaris 11 and Solaris 12, so the manifest defaults to Solaris 12 and if it determines we've booted Solaris 11 to install, then it uses a different repository:

if [[ $(uname -r) = 5.11 ]]; then
    aimanifest set \
        /auto_install/ai_instance/software[@type="IPS"]/source/publisher[@name="solaris"]/origin@name \
        http://example.com/solaris11
fi

The last trick I added was the ability to select a Unified Archive to install instead of the packages.  We'll be using archives as the backup/recovery mechanism for the infrastructure, so this provides a faster way to deploy nodes when we already have the desired archive available.  On a SPARC system you'd select this using a boot command like:

ok boot net:dhcp - install archive_uri=http://example.com/openstack_archive.uar

On an x86 system you'd add this as -B archive_uri=<uri> to the $multiboot line in grub.cfg.
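For illustration, the modified line might look roughly like this (the rest of the menu entry and any other boot arguments are elided; the archive_uri value is appended to whatever -B arguments are already present):

```shell
$multiboot ... -B install=true,archive_uri=http://example.com/openstack_archive.uar
```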

The code for this in the script looks like:

if [[ ${SI_ARCH} = sparc ]]; then
    # SPARC: extract archive_uri from the OBP boot arguments
    ARCHIVE_URI=$(prtconf -vp | nawk \
        '/bootargs.*archive_uri=/{n=split($0,a,"archive_uri=");split(a[2],b);split(b[1],c,"'\''");print c[1]}')
else
    # x86: boot arguments are exposed as device properties
    ARCHIVE_URI=$(devprop -s archive_uri)
fi
if [[ -n "$ARCHIVE_URI" ]]; then
    # Replace package software section with archive
    aimanifest delete software
    swpath=$(aimanifest add -r /auto_install/ai_instance/software@type ARCHIVE)
    aimanifest add $swpath/source/file@uri $ARCHIVE_URI
    inspath=$(aimanifest add -r $swpath/software_data@action install)
    aimanifest add $inspath/name global
fi


Once we have the manifest, it's a simple matter to make it the default manifest for both of the cloud services:

# installadm create-manifest -n cloud-i386 -d -f havana.ksh
# installadm create-manifest -n cloud-sparc -d -f havana.ksh

Each of the systems we're including in the cloud infrastructure is assigned to the appropriate AI service with a command such as:

# installadm create-client -n cloud-sparc -e <mac address>

SMF Configuration Profiles

Before we go on to installing the systems, we also want to provide SMF (Service Management Facility) configuration profiles to automate the initial system configuration; otherwise, we'll be faced with running the interactive sysconfig tool during the initial boot.  For this deployment, we have a somewhat unusual twist, in that there is configuration we'd like to share between the infrastructure nodes and guests since they are ultimately all nodes on the Oracle internal network.  Also, for maximum flexibility and reuse, the configuration is expressed by multiple profiles, with each designed to configure only some aspects of the system.  In our case, we have a directory structure on the AI server that looks like:


infrastructure.xml
puppet.xml
users.xml
common/
    basic.xml
    dns.xml
    ldap.xml

The first three are specific to the infrastructure nodes.  The infrastructure.xml profile provides the fixed network configuration, along with coreadm setup and fault management notifications; we use SMTP notifications to alert us to any faults from the system.  The puppet.xml profile configures the puppet agents with the name of the master node.  The users.xml profile configures the root account as a role and sets its password, and also sets up a local system administrator account that's meant to be used in case of networking issues that prevent our administrators from using their normal user accounts.
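As a concrete illustration, a minimal puppet.xml might look like the following. This is a sketch assuming the stock svc:/application/puppet:agent service delivered with the Solaris Puppet package; the master's hostname is a placeholder:

```xml
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type="profile" name="puppet">
  <service name="application/puppet" version="1" type="service">
    <instance name="agent" enabled="true">
      <property_group name="config" type="application">
        <!-- Point the agent at the Puppet master (placeholder hostname) -->
        <propval name="server" type="astring" value="puppet-master.example.com"/>
      </property_group>
    </instance>
  </service>
</service_bundle>
```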

The three profiles under the common directory are also used to configure guest instances in our cloud.  I'll show how that's done later in this series, but it's important that they be under a separate directory.  basic.xml configures the system's timezone, default locale, keyboard layout, and console terminal type.  dns.xml configures the DNS resolver, and ldap.xml configures the LDAP client.

We load each of these into the AI services with the command:

# installadm create-profile -n cloud-sparc -f <file name>

The important aspect of the above command is that no criteria are specified for the profiles, which means that they are applied to all clients of the service.  This also means that they must be disjoint; no two profiles can attempt to configure the same property on the same service, otherwise SMF will not apply the profiles that conflict.

Once all that's done, we can see the results:

# installadm list -p -m -n cloud-sparc
Service Name Manifest Name Type    Status  Criteria
------------ ------------- ----    ------  --------
cloud-sparc  havana.ksh    derived default none

Service Name Profile Name       Criteria
------------ ------------       --------
cloud-sparc  basic.xml          none
             dns.xml            none
             infrastructure.xml none
             ldap.xml           none
             puppet.xml         none
             users.xml          none

At this point we've got enough infrastructure implemented to install the OpenStack undercloud systems.  In the next posting I'll cover the Puppet manifests we're using; after that we'll get into configuring OpenStack itself.
