Tuesday Oct 28, 2014

Heating Up Your OpenStack Cloud

As part of the support updates to Solaris 11.2, we recently added the Heat orchestration engine to our OpenStack distribution.  If you aren't familiar with Heat, I highly recommend getting to know it, as you'll find it invaluable in deploying complex application topologies within an OpenStack cloud.  I've updated the script tarball from my recent series on building the Solaris engineering cloud to include configuration of Heat, so if you download that and update your cloud controller to the latest SRU, you can run havana_setup.py heat to turn it on.

OK, once you've done that, what can you do with Heat?  Well, I've added a script and a Heat template that it uses to the tarball to give you at least one idea.  The script, create_image, is similar to a script that we run to create custom Unified Archive images internally for the Solaris cloud.  The basic idea is to deploy an OpenStack instance using the standard archive that release engineering constructs for the product build, add some things we need to it, then save an image of that for the users of the cloud to use as a base deployment image.  I'd originally written a script to do this using the nova CLI, but using a Heat template simplified it.  The simple.hot file in the tarball is the template that it uses; that template is a simpler version of a two-node template from the heat-templates repository.  It's fairly self-explanatory so I'm not going to walk through it here.

As for create_image itself, the standard Solaris archive contains the packages in the solaris-minimal-server group, a pretty small package set that really isn't too useful for anything itself, but makes a nice base to build images that include the specific things you need.  In our case, I've defined a group package that pulls in a bunch of things we typically use in Solaris development work: ssh client, LDAP, NTP, Kerberos, NFS client and automounter, the man command, and less.  Here's what the main part of the package manifest looks like:

depend fmri=/network/ssh type=group
depend fmri=group/system/solaris-minimal-server type=group
depend fmri=ldapcert type=group
depend fmri=naming/ldap type=group
depend fmri=security/nss-utilities type=group
depend fmri=service/network/ntp type=group
depend fmri=service/security/kerberos-5 type=group
depend fmri=system/file-system/autofs type=group
depend fmri=system/file-system/nfs type=group
depend fmri=system/network/nis type=group
depend fmri=text/doctools type=group
depend fmri=text/less type=group

In our case we bundle the package in a package archive file that we copy into the image using scp and then install the group package.  Doing this saves our developers a few minutes in getting what they need deployed, and that's one easy way we can show them value in using the cloud rather than our older lab infrastructure.  It's certainly possible to do much more interesting customizations than this, so experiment and share your ideas; we're looking to make Heat much more useful on Solaris OpenStack as we move ahead.  You can also talk to us at the OpenStack summit in Paris next week; a number of us will be manning the booth at various times when we're not in sessions at the design summit or the conference itself.

Oh, and for those who are interested, the Solaris development cloud is now up past 100 users and has 5 compute nodes deployed.  Still not large by any measure, but it's growing quickly and we're learning more about running OpenStack every day.

Friday Sep 19, 2014

Building an OpenStack Cloud for Solaris Engineering, Part 4

The prior parts of this series discussed the design and deployment of the undercloud nodes on which our cloud is implemented.  Now it's time to configure OpenStack and turn the cloud on.  Over on OTN, my colleague David Comay has published a general getting started guide that does a manual setup based on the OpenStack all-in-one Unified Archive.  I recommend at least browsing through that for background that will come in handy as you deal with the inevitable issues that occur in running software with the complexity of OpenStack.  It's even better to run through that single-node setup to get some experience before moving on to trying to build a multi-node cloud.

For our purposes, I needed to script the configuration of a multi-node cloud, and that makes everything more complex, not the least of the problems being that you can't just use the loopback IP address (127.0.0.1) as the endpoint for every service.  We had (courtesy of my colleague Drew Fisher) a script for single-system configuration already, so I started with that and hacked away to build something that could configure each component correctly in a multi-node cloud.  That Python script, called havana_setup.py, and some associated scripts are available for download.  Here, I'll walk through the design and key pieces.


Before the proper OpenStack configuration process, you'll need to run the gen_keys.py script to create some SSH keys.  These are used to secure the Solaris RAD (Remote Administration Daemon) transport that the Solaris Elastic Virtual Switch (EVS) controller uses to manage the networking between the Nova compute nodes and the Neutron controller node.  The script creates evsuser, neutron, and root sub-directories in whatever location you run it, and this location will be referenced later in configuring the Neutron and Nova compute nodes, so you want to put it in a directory that's easily shared via NFS.  You can (and probably should) unshare it after the nodes are configured, though.
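To give a feel for what a key-generation step like this involves, here is a minimal sketch.  It is not the actual gen_keys.py; the id_rsa file name, the RSA key type, and the empty passphrase are my assumptions for illustration.  The run parameter is there only so the function can be exercised without calling ssh-keygen.

```python
import os
import subprocess

# Sub-directories named in the text above
KEY_OWNERS = ("evsuser", "neutron", "root")

def generate_keys(base_dir, run=subprocess.check_call):
    """Create one passphrase-less RSA key pair per account under base_dir."""
    key_paths = []
    for owner in KEY_OWNERS:
        key_dir = os.path.join(base_dir, owner)
        os.makedirs(key_dir, exist_ok=True)
        # Assumed file name; the real script may use a different one
        key_path = os.path.join(key_dir, "id_rsa")
        if not os.path.exists(key_path):
            # ssh-keygen writes key_path and key_path + ".pub"
            run(["ssh-keygen", "-q", "-t", "rsa", "-N", "", "-f", key_path])
        key_paths.append(key_path)
    return key_paths
```

Running generate_keys("/export/cloudkeys") (a hypothetical path) would leave three sub-directories that can then be shared over NFS as described above.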

Global Configuration

The first part of havana_setup.py is a whole series of global declarations that parameterize the services deployed on various nodes.  You'll note that the PRODUCTION variable can be set to control the layout used; if its value is False, you'll end up with a single-node deployment.  I have a couple of extra systems that I use for staging and this makes it easy to replicate the configuration well enough to do some basic sanity testing before deploying changes.

import platform
import socket

MY_NAME = platform.node()
MY_IP = socket.gethostbyname(MY_NAME)

# When set to False, you end up with a single-node deployment
PRODUCTION = True

if PRODUCTION:
    CONTROLLER_NODE = "controller.example.com"
    GLANCE_NODE = "glance.example.com"
    CINDER_NODE = "cinder.example.com"
    NEUTRON_NODE = "neutron.example.com"
else:
    # Assumed single-node layout: every service runs on this host
    CONTROLLER_NODE = MY_NAME
    GLANCE_NODE = MY_NAME
    CINDER_NODE = MY_NAME
    NEUTRON_NODE = MY_NAME

Next, we configure the main security elements: the root password for MySQL plus passwords and access tokens for Keystone, along with the URLs that we'll need to configure into the other services to connect them to Keystone.

MYSQL_ROOTPW = "mysqlroot"
ADMIN_PASSWORD = "adminpw"
SERVICE_PASSWORD = "servicepw"

AUTH_URL = "http://%s:5000/v2.0/" % KEYSTONE_NODE
IDENTITY_URL = "http://%s:35357" % KEYSTONE_NODE

The remainder of this section configures specifics of Glance, Cinder, Neutron, and Horizon.  For Glance and Cinder, we provide the name of the base ZFS dataset that each will be using.  For Neutron, we set the NIC, VLAN tag, and external network addresses, as well as the subnets for each of the two tenants we are providing in our cloud.  We chose to have one tenant for developers in the organization that is funding this cloud, and a second tenant for other Oracle employees who want to experiment with OpenStack on Solaris; this gives us a way to coarsely allocate resources between the two, and of course most go to the tenant paying the bill.  The last element of each tuple in the tenant network list is the number of floating IP addresses to set as the quota for the tenant.  For Horizon, the paths to a server certificate and key must be configured, but only if you're using TLS, and that's only the case if the script is run with PRODUCTION = True.  The SSH_KEYDIR should be set to the location where you ran the gen_keys.py script, above.

GLANCE_DATASET = "tank/glance"
CINDER_DATASET = "tank/cinder"

UPLINK_PORT = "aggr0"
if PRODUCTION:
    VXLAN_RANGE = "500-600"
    TENANT_NET_LIST = [("tenant1", "", 10),
                       ("tenant2", "", 60)]
else:
    VXLAN_RANGE = "400-499"
    TENANT_NET_LIST = [("tenant1", "", 5),
                       ("tenant2", "", 5)]

SERVER_CERT = "/path/to/horizon.crt"
SERVER_KEY = "/path/to/horizon.key"

SSH_KEYDIR = "/path/to/generated/keys"
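Since a typo in the tenant network list only shows up much later when Neutron is configured, a quick sanity check of the tuples can save a rollback.  The sketch below is not part of havana_setup.py; the subnets are hypothetical stand-ins for your site's values, and the function name is my own.

```python
import ipaddress

# Hypothetical subnets; substitute the ones for your site
EXAMPLE_TENANT_NET_LIST = [("tenant1", "192.168.10.0/24", 10),
                           ("tenant2", "192.168.20.0/24", 60)]

def check_tenant_nets(tenant_nets):
    """Validate (tenant, subnet, floating IP quota) tuples.

    Returns a dict mapping tenant name to its floating IP quota.
    """
    quotas = {}
    seen = []
    for name, cidr, floating_ips in tenant_nets:
        net = ipaddress.ip_network(cidr)  # raises ValueError on a bad subnet
        for other in seen:
            if net.overlaps(other):
                raise ValueError("%s overlaps %s" % (net, other))
        if floating_ips < 0:
            raise ValueError("negative floating IP quota for %s" % name)
        seen.append(net)
        quotas[name] = floating_ips
    return quotas
```

Calling check_tenant_nets(TENANT_NET_LIST) near the top of the script would fail fast on a malformed or overlapping subnet.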

Configuring the Nodes

The remainder of havana_setup.py is a series of functions that configure each element of the cloud.  You select which element(s) to configure by specifying command-line arguments.  Valid values are mysql, keystone, glance, cinder, nova-controller, neutron, nova-compute, and horizon.  I'll briefly explain what each does below.  One thing to note is that each function first creates a backup boot environment so that if something goes wrong, you can easily revert to the state of the system prior to running the script.  This is a practice you should always use in Solaris administration before making any system configuration changes.  It also saved me a ton of time in development of the cloud, since I could reset within a minute or so every time I had a serious bug.  Even our best re-deployment times with AI and archives are about 10 times that when you have to cycle through network booting.
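The overall shape of that dispatch, with a backup boot environment taken before each step, can be sketched roughly as follows.  This is an illustration rather than the real script: the function bodies are placeholders, the boot environment naming scheme is my own invention, and the run parameter exists only so the logic can be tried without calling beadm.

```python
import subprocess
import time

def backup_be(label, run=subprocess.check_call):
    """Create a backup boot environment before changing anything."""
    # Assumed naming scheme: pre-<component>-<timestamp>
    name = "pre-%s-%s" % (label, time.strftime("%Y%m%d-%H%M%S"))
    run(["beadm", "create", name])
    return name

# Placeholder bodies; the real functions do the actual configuration
def mysql():
    return "mysql"

def keystone():
    return "keystone"

COMPONENTS = {"mysql": mysql, "keystone": keystone}

def main(args, run=subprocess.check_call):
    """Configure each requested component, taking a backup BE first."""
    unknown = [a for a in args if a not in COMPONENTS]
    if unknown:
        raise SystemExit("unknown component(s): " + ", ".join(unknown))
    done = []
    for arg in args:
        backup_be(arg, run)
        done.append(COMPONENTS[arg]())
    return done
```

If a step goes wrong, activating the matching boot environment with beadm activate and rebooting gets you back to the prior state.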


mysql

MySQL must be the first piece configured, since all of the OpenStack services use databases to store at least some of their objects.  This function sets the root password and removes some insecure aspects of the default MySQL configuration.  One key piece is that it removes remote root access; that forces us to create all of the databases in this module, rather than creating each component's database in its associated module.  There may be a better way to do this, but since I'm not a MySQL expert in any way, that was the easiest path here.  On review it seems like enabling the mysql SMF service should really be moved over into the Puppet manifest from part 3.


keystone

The keystone function does some basic configuration, then calls the /usr/demo/openstack/keystone/sample_data.sh script to configure users, tenants, and endpoints.  In our deployment I've customized this script a bit to create the two tenants rather than just one, so you may need to make some adjustments for your site; I have not included that customization in the downloaded files.


glance

The glance function configures and starts the various glance services, and also creates the base dataset for ZFS storage; we turn compression on to save on storage for all the images we'll have here.  If you're rolling back and re-running for some reason, this module isn't quite idempotent as written because it doesn't deal with the case where the dataset already exists, so you'd need to use zfs destroy to delete the glance dataset.
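One way to close that gap would be to check for the dataset before creating it.  The following is a sketch of that idea, not code from havana_setup.py; the function name is my own, and the run parameter is only there so the logic can be tested without ZFS.

```python
import subprocess

def ensure_dataset(dataset, run=subprocess.call, props=("compression=on",)):
    """Create a ZFS dataset only if it is missing; return True if created."""
    # 'zfs list' exits non-zero when the dataset does not exist
    if run(["zfs", "list", "-H", "-o", "name", dataset]) == 0:
        return False
    cmd = ["zfs", "create"]
    for prop in props:
        cmd += ["-o", prop]
    cmd.append(dataset)
    if run(cmd) != 0:
        raise RuntimeError("failed to create %s" % dataset)
    return True
```

With something like ensure_dataset(GLANCE_DATASET), a re-run after a rollback would simply reuse the existing dataset instead of failing.  Whether reusing leftover contents is what you want is a separate decision.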


cinder

Beyond just basic configuration of the cinder services, the cinder function also creates the base ZFS dataset under which all of the volumes will be created.  We create this as an encrypted dataset so that all of the volumes will be encrypted, which Darren Moffat covers in more detail in OpenStack Cinder Volume encryption with ZFS.  Here we use pktool to generate the wrapping key and store it in root's home directory.  One piece of work we haven't yet had time to take on is adding our ZFS Storage Appliance as an additional back-end for Cinder.  I'll post an update to cover that once we get it done.  Like the glance function, this function doesn't deal properly with the dataset already existing, so any rollback also needs to destroy the base dataset by hand.

nova_controller & nova_compute

Since our deployment runs the nova controller services separate from the compute nodes, the nova_controller function is run on the controller node to set up the API, scheduler, and conductor services.  If you combine the compute and controller nodes you would run this and then later run the nova_compute function.  The nova_compute function also makes use of a couple of helper functions to set up the ssh configuration for EVS.  For these functions to work properly you must run the neutron function on its designated node before running nova_compute on the compute nodes.


neutron

The neutron setup function is by far the most complex, as we not only configure the neutron services, including the underlying EVS and RAD functions, but also configure the external network and the tenant networks.  The external network is configured as a tagged VLAN, while the tenant networks are configured as VxLANs; you can certainly use VLANs or VxLANs for all of them, but this configuration was the most convenient for our environment.


horizon

For the production case, the horizon function just copies into place an Apache config file that configures TLS support for the Horizon dashboard and the server's certificate and key files.  If you're using self-signed certificates, then the Apache SSL/TLS Strong Encryption: FAQ is a good reference on how to create them.  For the non-production case, this function just comments out the pieces of the dashboard's local settings that enable SSL/TLS support.

Getting Started

Once you've run through all of the above functions from havana_setup.py, you have a cloud, and pointing your web browser at http://<your server>/horizon should display the login page, where you can log in as the admin user with the password you configured in the global settings of havana_setup.py.

Assuming that works, your next step should be to upload an image.  The easiest way to start is by downloading the Solaris 11.2 Unified Archives.  Once you have an archive the upload can be done from the Horizon dashboard, but you'll find it easier to use the upload_image script that I've included in the download.  You'll need to edit the environment variables it sets first, but it takes care of setting several properties on the image that are required by the Solaris Zones driver for Nova to properly handle deploying instances.  Failure to set them is the single most common mistake that I and others have made in the early Solaris OpenStack deployments; when you forget and attempt to launch an instance, you'll get an immediate error, and the details from nova show will include the error:

| fault                                | {"message": "No valid host was 
found. ", "code": 500, "details": "  File 
 line 107, in schedule_run_instance |

When you snapshot a deployed instance with Horizon or nova image-create the archive properties will be set properly, so it's only manual uploads in Horizon or with the glance command that need care.

There's one more preparation task to do: upload an ssh public key that'll be used to access your instances. Select Access & Security from the list in the left panel of the Horizon Dashboard, then select the Keypairs tab, and click Import Keypair.  You'll want to paste the contents of your ~/.ssh/id_rsa.pub into the Public Key field, and probably name your keypair the same as your username.

Finally, you are ready to launch instances.  Select Instances in the Horizon Dashboard's left panel list, then click the Launch Instance button.  Enter a name for the instance, select the Flavor, select Boot from image as the Instance Boot Source, and select the image to use in deploying the VM.  The image will determine whether you get a SPARC or x86 VM and what software it includes, while the flavor determines whether it is a kernel zone or non-global zone, as well as the number of virtual CPUs and amount of memory.  The Access & Security tab should default to selecting your uploaded keypair.  You must go to the Networking tab and select a network for the instance.  Then click Launch and the VM will be installed; you can follow progress by clicking on the instance name to see details and selecting the Log tab.  It'll take a few minutes at present; in the meantime you can Associate a Floating IP in the Actions field.  Pick any address from the list offered.  Your instance will not be reachable until you've done this.

Once the instance has finished installing and reached the Active status, you can log in to it.  To do so, use ssh root@<floating-ip-address>, which will log in to the zone as root using the key you uploaded above.  If that all works, congratulations, you have a functioning OpenStack cloud on Solaris!

In future posts I'll cover additional tips and tricks we've learned in operating our cloud.  At this writing we're over 60 users and growing steadily, and it's been totally reliable over 3 months, with only outages for updates to the infrastructure.



Tuesday Sep 16, 2014

Building an OpenStack Cloud for Solaris Engineering, Part 3

At the end of Part 2, we built the infrastructure needed to deploy the undercloud systems into our network environment.  However, there's more configuration needed on these systems than we can completely express via Automated Installation, and there's also the issue of how to effectively maintain the undercloud systems.  We're only running a half dozen initially, but expect to add many more as we grow, and even at this scale it's still too much work, with too high a probability of mistakes, to do things by hand on each system.  That's where a configuration management system such as Puppet shows its value, providing us the ability to define a desired state for many aspects of many systems and have Puppet ensure that state is maintained.  My team did a lot of work to include Puppet in Solaris 11.2 and extend it to manage most of the major subsystems in Solaris, so the OpenStack cloud deployment was a great opportunity to start working with another shiny new toy.

Configuring the Puppet Master

One feature of the Puppet integration with Solaris is that the Puppet configuration is expressed in SMF, and then translated by the new SMF Stencils feature to settings in the usual /etc/puppet/puppet.conf file.  This makes it possible to configure Puppet using SMF profiles at deployment time, and the examples in Part 2 showed this for the clients.  For the master, we apply the profile below:

<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- This profile configures the Puppet master -->
<service_bundle type="profile" name="puppet">
  <service version="1" type="service" name="application/puppet">
    <instance enabled="true" name="master">
      <property_group type="application" name="config">
        <propval name="server" value="puppetmaster.example.com"/>
        <propval name="autosign" value="/etc/puppet/autosign.conf"/>
      </property_group>
    </instance>
  </service>
</service_bundle>

The interesting setting is the autosign configuration, which allows new clients to have their certificates automatically signed and accepted by the Puppet master.  This isn't strictly necessary, but makes operation a little easier when you have a reasonably secure network and you're not handing out any sensitive configuration via Puppet.  We use an autosign.conf that looks something like:

*.example.com


This means that we're accepting any system that identifies as being in the example.com domain.  The main pain with autosigning is that if you reinstall any of the systems and you're using self-generated certificates on the clients, you need to clean out the old certificate before the new one will be accepted; this means issuing a command on the master like:

# puppet cert clean client.example.com
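To make the autosign matching described above a bit more concrete, here is a rough approximation of how a domain glob in autosign.conf gets compared against a client's certificate name.  This is my own simplified sketch, not Puppet's code, and it only handles a single leading wildcard.

```python
def autosign_matches(pattern, certname):
    """Approximate autosign.conf matching for one pattern.

    A leading '*.' matches any host under that domain; anything else
    must match the certificate name exactly (case-insensitively).
    """
    pattern = pattern.lower()
    certname = certname.lower()
    if pattern.startswith("*."):
        # Keep the dot so 'badexample.com' cannot match '*.example.com'
        return certname.endswith(pattern[1:])
    return certname == pattern
```

Under this approximation, compute1.example.com matches *.example.com, while the bare example.com and anything in another domain do not.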

There are lots of options in Puppet related to certificates and more sophisticated ways to manage them, but this is what we're doing for now.  We have filed some enhancement requests to implement ways of integrating Puppet client certificate delivery and signing with Automated Installation, which would make using the two together much more convenient.

Writing the Puppet Manifests

Next, we implemented a small Mercurial source repository to store the Puppet manifests and modules.  Using a source control system with Puppet is a highly recommended practice, and Mercurial happens to be the one we use for Solaris development, so it's natural for us in this case.  We configure /etc/puppet on the Puppet master as a child repository of the main Mercurial repository, so when we have new configuration to apply it's first checked into the main repository and then pulled into Puppet via hg pull -u, then automatically applied as each client polls the master.  Our repository presently contains the following:


An example tar file with all of the above is available for download.

The site manifest starts with:

include ntp
include nameservice

The ntp module is the canonical example of Puppet, and is really important for the OpenStack undercloud, as it's necessary for the various nodes to have a consistent view of time in order for the security certificates issued by Keystone to be validated properly.  I'll describe the nameservice module a little later in this post.

Since most of our nodes are configured identically, we can use a default node definition to configure them.  The main piece is configuring Datalink Multipathing (DLMP), which provides us additional bandwidth and higher availability than a single link.  We can't yet configure this using SMF, so the Puppet manifest:

  • Figures out the IP address the system is using with some embedded Ruby
  • Removes the net0 link and creates a link aggregation from net0 and net1
  • Enables active probing on the link aggregation, so that it can detect upstream failures on the switches that don't affect link state signaling (which is also used, and is the only means unless probing is enabled)
  • Configures an IP interface and the same address on the new aggregation link
  • Restricts Ethernet autonegotiation to 1 Gb to work around issues we have with these systems and the switches/cabling we're using in the lab; without this, we get 100 Mb speeds negotiated about 50% of the time, and that kills performance.

You'll note several uses of the require and before statements to ensure the rules are applied in the proper order, as we need to tear down the net0 IP interface before it can be moved into the aggregation, and the aggregation needs to be configured before the IP objects on top of it.

node default {
    $myip = inline_template("<% _erbout.concat(Resolv::DNS.open.getaddress('$fqdn').to_s) %>")

    # Force link speed negotiation to be at least 1 Gb
    link_properties { "net0":
        ensure => present,
        properties => { en_100fdx_cap => "0" },
    }
    link_properties { "net1":
        ensure => present,
        properties => { en_100fdx_cap => "0" },
    }

    link_aggregation { "aggr0" :
        ensure => present,
        lower_links => [ 'net0', 'net1' ],
        mode => "dlmp",
    }
    link_properties { "aggr0":
        ensure => present,
        require => Link_aggregation['aggr0'],
        properties => { probe-ip => "+" },
    }
    ip_interface { "aggr0" :
        ensure => present,
        require => Link_aggregation['aggr0'],
    }
    ip_interface { "net0":
        ensure => absent,
        before => Link_aggregation['aggr0'],
    }
    address_object { "net0":
        ensure => absent,
        before => Ip_interface['net0'],
    }
    address_object { 'aggr0/v4':
        require => Ip_interface['aggr0'],
        ensure => present,
        address => "${myip}/24",
        address_type => "static",
        enable => "true",
    }
}
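If the effect of the require and before statements is hard to picture, it can help to write the relationships out as a dependency graph and sort it.  The sketch below does that in Python (3.9 or later, for graphlib); it only models the ordering and is in no way how Puppet itself evaluates the manifest.  Each before has been turned around into the equivalent "depends on" entry.

```python
from graphlib import TopologicalSorter

# resource -> resources that must be applied first
DEPENDS_ON = {
    "Link_properties[aggr0]": {"Link_aggregation[aggr0]"},    # require
    "Ip_interface[aggr0]": {"Link_aggregation[aggr0]"},       # require
    "Link_aggregation[aggr0]": {"Ip_interface[net0]"},        # from before
    "Ip_interface[net0]": {"Address_object[net0]"},           # from before
    "Address_object[aggr0/v4]": {"Ip_interface[aggr0]"},      # require
}

def apply_order(depends_on):
    """Return one valid order in which the resources could be applied."""
    return list(TopologicalSorter(depends_on).static_order())

if __name__ == "__main__":
    for resource in apply_order(DEPENDS_ON):
        print(resource)
```

Any order it produces removes the net0 address and interface first, then builds the aggregation, and only then layers the IP interface and address on top.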

The controller node declaration includes all of the above functionality, but also adds these elements to keep rabbitmq running and install the mysql database.

    service { "application/rabbitmq" :
        ensure => running,
    }
    package { "database/mysql-55":
        ensure => installed,
    }

The database installation could have been part of the AI derived manifest as well, but it works just as well here and it's convenient to do it this way when I'm setting up staging systems to test builds before we upgrade.

The nameservice Puppet module is shown below.  It's handling both nameservice and RBAC (Role-based Access Control) configuration:

class nameservice {

    dns { "openstack_dns":
        search => [ 'example.com' ],
        nameserver => [ '', '' ],
    }

    service { "dns/client":
        ensure => running,
    }

    svccfg { "domainname":
        ensure => present,
        fmri => "svc:/network/nis/domain",
        property => "config/domainname",
        type => "hostname",
        value => "example.com",
    }

    # nameservice switch
    nsswitch { "dns + ldap":
        default => "files",
        host =>  "files dns",
        password => "files ldap",
        group => "files ldap",
        automount => "files ldap",
        netgroup => "ldap",
    }

    # Set user_attr for administrative accounts
    file { "user_attr" :
        path => "/etc/user_attr.d/site-openstack",
        owner => "root",
        group => "sys",
        mode => 644,
        source => "puppet:///modules/nameservice/user_attr",
    }

    # Configure zlogin access
    file { "site-zlogin" :
        path => "/etc/security/prof_attr.d/site-zlogin",
        owner => "root",
        group => "sys",
        mode => 644,
        source => "puppet:///modules/nameservice/prof_attr-zlogin",
    }

    file { "zlogin-exec" :
        path => "/etc/security/exec_attr.d/site-zlogin",
        owner => "root",
        group => "sys",
        mode => 644,
        source => "puppet:///modules/nameservice/exec_attr-zlogin",
    }

    file { "policy.conf" :
        path => "/etc/security/policy.conf",
        owner => "root",
        group => "sys",
        mode => 644,
        source => "puppet:///modules/nameservice/policy.conf",
    }
}

You may notice that the nameservice configuration here is exactly the same as what we provided in the SMF profile in part 2.  We include it here because it's configuration we anticipate changing someday and we won't want to re-deploy the nodes.  There are ways we could prevent the duplication, but we didn't have time to spend on it right now and it also demonstrates that you could use a completely different configuration in operation than at deployment/staging time.

What's with the RBAC configuration?

The RBAC configuration is doing two things, the first being configuring the user accounts of the cloud administrators for administrative access on the cloud nodes.  The user_attr file we're distributing confers the System Administrator and OpenStack Management profiles, as well as access to the root role (oadmin is just an example account in this case):

oadmin::::profiles=System Administrator,OpenStack Management;roles=root

As we add administrators, I just need to add entries for them to the above file and they get the required access to all of the nodes.  Note that this doesn't directly provide administrative access to OpenStack's CLIs or its Dashboard; that's configured within OpenStack.

A limitation of the OpenStack software we include in Solaris 11.2 is that we don't provide the ability to connect to the guest instance consoles, an important feature that's being worked on.  The zlogin User profile is something I created to work around this problem and allow our cloud users to get access to the consoles, as this is often needed in Solaris development and testing.  First, the profile is defined by a prof_attr file with the entry:

zlogin User:::Use zlogin:auths=solaris.zone.manage

We also need an exec_attr file to ensure that zlogin is run with the needed uid and privileges:

zlogin User:solaris:cmd:RO::/usr/sbin/zlogin:euid=0;privs=ALL

Finally, we modify the RBAC policy file so that all users are assigned to the zlogin User profile:

PROFS_GRANTED=zlogin User,Basic Solaris User

The result of all this is that a user can obtain access to their specific OpenStack guest instance by logging in to the compute node on which the guest is running, and running a command such as:

$ pfexec zlogin -C instance-0000abcd

At this point we have the undercloud nodes fully configured to support our OpenStack deployment.  In part 4, we'll look at the scripts used to configure OpenStack itself.

Tuesday Sep 02, 2014

Building an OpenStack Cloud for Solaris Engineering, Part 2


Continuing from where I left off with part 1 of this series, in this posting I'll discuss the elements that we put in place to deploy the OpenStack cloud infrastructure, also known as the undercloud.

The general philosophy here is to automate everything, both because it's a best practice and because this cloud doesn't have any dedicated staff to manage it; we're doing it ourselves in order to get first-hand operational experience that we can apply to improve both Solaris and OpenStack.  As I said in part 1, we don't have an HA requirement at this point, but we'd like to keep any outages, both scheduled and unscheduled, to less than a half hour, so redeploying a failed node should take no more than 20 minutes.  The pieces that we're using are:

  • Automated Installation services and manifests to deploy Solaris
  • SMF profiles to configure system services
  • IPS site packages installed as part of the AI deployment to automate some first-boot configuration
  • A Puppet master to provide initial and ongoing configuration automation

I'll elaborate on the first two below, and discuss Puppet in the next posting.  The IPS site packages we are using are specific to Oracle's environment so I won't be covering those in detail.

Sanitized versions of the manifests and profiles discussed below are available for download as a tar file.

Automated Installation

Building the undercloud nodes means we're doing bare-metal provisioning, so we'll be using the Automated Installation (AI) feature in Solaris 11.  Most of the OpenStack services could run in kernel zones, or even non-global zones, but we're planning for larger scale and want to have some extra horsepower.  Therefore we opted not to go in that direction for now, but it may well be an option we use later for some services.

I already had an existing AI server in this cloud's lab, and it provides services to systems that aren't part of this cloud.  As we release each development build of a Solaris 11 update or Solaris 12 there's a new service generated on it.  The pace of evolution of this cloud is likely to be different from those other systems as well, so that led me to create two new AI services specifically for the cloud; we can make these services aliases of existing services so we don't need to bother replicating the boot image, thus the commands look like (output elided):

# installadm create-service -n cloud-i386 --aliasof solaris11_2-i386
# installadm create-service -n cloud-sparc --aliasof solaris11_2-sparc 

The next step is setting up the manifests that specify the installation.  For this, I've taken the default derived manifest that we install for services and modified it to:

  1. Specify a custom package list
  2. Lay out all of the storage
  3. Select the package repository based on Solaris release
  4. Install a Unified Archive rather than a package set based on a boot argument

You can download the complete manifest; I'll discuss the various customizations here.

The package list we're explicitly installing is below.  There are of course a number of other packages pulled in as dependencies, so this expands out to just over 500 packages installed (perhaps not surprisingly, about 35% are Python libraries):


We start with solaris-minimal-server in order to build an effectively minimized environment.  We've chosen to install the same package set on all nodes so that any of them can be easily repurposed to a different role in the cloud if needed, so the openstack group package is used rather than the packages for the various OpenStack services.  We'll be using MySQL as the database, so we need its client package.  snoop is there for network diagnostics (yes, we should use tshark instead but I'm old-school :-).  We also include some Python packages that we need to support OpenStack, as well as RabbitMQ, since that's our message broker.  We use LDAP for authentication so that's included.  I find rsync convenient for caching crash dumps off to other systems for examination.  ssh is needed for remote access.  nss-utilities are needed for some LDAP configuration.  OpenStack needs consistent time, so NTP is required.  We use Kerberos for some NFS access so that's included, along with the automounter and NFS client.  We want to use SMTP notifications for any fault management events, so we include that.  The utilities to manage Oracle hardware may come in handy, so we include them.  Puppet is going to provide ongoing configuration management, so it's included.  We need rad-evs-controller to back-end our Neutron driver.  bpf is listed only because of a missing dependency in another package that causes runaway console messages from the DLMP daemon; that's being fixed.  The NIS package provides some things that LDAP needs.  We're using kernel zones as the primary OpenStack guest, so we need that zone brand installed.  The doctools package provides the man command; you don't want to be caught without a man page when you need it!  less is there because it's better than more.  Finally, we install a couple of site packages, one that does some general customizations, another that delivers the base certificate needed for TLS access to our LDAP servers.

The storage layout we standardized on for the undercloud is to have a two-way mirror for the root pool, formed out of two of the smallest disks (usually 300 GB on the systems we're using), with any remaining disks in a separate pool, called tank on all of the systems, that can be used for other purposes.  On the Cinder node, it's where we put all the ZFS iSCSI targets; in the case of Glance it's where we store the images.  We're also planning to use it for Swift services on various nodes, but we haven't deployed Swift yet.  The tank pool gets built with varying amounts of redundancy based on the number of disks.  This logic is all in the last 60 lines of the manifest script.  It's an interesting example of using the derived manifest features to do some reasonably complex customization for individual nodes.
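As a rough illustration of the kind of disk-count logic described above, here is a minimal shell sketch.  The function name tank_vdev_type and the thresholds are assumptions made up for this example, not the logic in the actual manifest script; it only maps a count of leftover disks to a ZFS vdev type.

```shell
#!/bin/sh
# Hypothetical helper: choose a ZFS redundancy level for the "tank" pool
# from the number of disks left over after the root mirror is built.
# The thresholds are illustrative assumptions, not the real manifest logic.
tank_vdev_type() {
    ndisks=$1
    if [ "$ndisks" -le 0 ]; then
        echo none          # no spare disks, so no tank pool at all
    elif [ "$ndisks" -eq 1 ]; then
        echo stripe        # a single disk cannot be made redundant
    elif [ "$ndisks" -eq 2 ]; then
        echo mirror
    elif [ "$ndisks" -le 5 ]; then
        echo raidz
    else
        echo raidz2
    fi
}

# Example: report the layout for a node with four spare disks.
echo "4 disks -> $(tank_vdev_type 4)"
```

In a derived manifest, the result of a function like this would feed the aimanifest calls that describe the pool, so each node gets a layout suited to the disks it actually has.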

We internally have separate repositories for Solaris 11 and Solaris 12, so the manifest defaults to Solaris 12; if it determines we've booted Solaris 11 to install, it uses a different repository:

if [[ $(uname -r) = 5.11 ]]; then
        aimanifest set /auto_install/ai_instance/software[@type="IPS"]/source/publisher[@name="solaris"]/origin@name http://example.com/solaris11
fi

The last trick I added was the ability to select a Unified Archive to install instead of the packages.  We'll be using archives as the backup/recovery mechanism for the infrastructure, so this provides a faster way to deploy nodes when we already have the desired archive available.  On a SPARC system you'd select this using a boot command like:

ok boot net:dhcp - install archive_uri=http://example.com/openstack_archive.uar

On an x86 system you'd add this as -B archive_uri=<uri> to the $multiboot line in grub.cfg.

The code for this in the script looks like:

if [[ ${SI_ARCH} = sparc ]]; then
    # SPARC: pull archive_uri out of the OBP boot arguments
    ARCHIVE_URI=$(prtconf -vp | nawk \
        '/bootargs.*archive_uri=/{n=split($0,a,"archive_uri=");split(a[2],b);split(b[1],c,"'\''");print c[1]}')
else
    # x86: the -B option makes archive_uri available as a boot property
    ARCHIVE_URI=$(devprop -s archive_uri)
fi

if [[ -n "$ARCHIVE_URI" ]]; then
    # Replace package software section with archive
    aimanifest delete software
    swpath=$(aimanifest add -r /auto_install/ai_instance/software@type ARCHIVE)
    aimanifest add $swpath/source/file@uri $ARCHIVE_URI
    inspath=$(aimanifest add -r $swpath/software_data@action install)
    aimanifest add $inspath/name global
fi
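
To see what the nawk expression in the SPARC branch is doing, here is a small stand-alone sketch that runs the same three-step split against a sample line.  The function name extract_archive_uri and the sample bootargs line are assumptions for illustration (the exact prtconf -vp output format may differ), and it uses plain awk with the octal escape \047 for the single quote rather than the shell quoting trick in the script.

```shell
#!/bin/sh
# Extract archive_uri from a bootargs line in the style of "prtconf -vp".
# The sample line fed to it below is an assumed format for illustration.
extract_archive_uri() {
    awk '/bootargs.*archive_uri=/ {
        split($0, a, "archive_uri=")   # a[2] starts with the URI
        split(a[2], b)                 # b[1] is the URI up to the next blank
        split(b[1], c, "\047")         # drop a trailing single quote, if any
        print c[1]
    }'
}

sample="    bootargs:  'install archive_uri=http://example.com/openstack_archive.uar'"
printf '%s\n' "$sample" | extract_archive_uri
```

If no archive_uri argument is present, nothing is printed, which is why the script can simply test whether ARCHIVE_URI is non-empty.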


Once we have the manifest, it's a simple matter to make it the default manifest for both of the cloud services:

# installadm create-manifest -n cloud-i386 -d -f havana.ksh
# installadm create-manifest -n cloud-sparc -d -f havana.ksh

Each of the systems we're including in the cloud infrastructure is assigned to the appropriate AI service with a command such as:

# installadm create-client -n cloud-sparc -e <mac address>

SMF Configuration Profiles

But before we go on to installing the systems, we also want to provide SMF (Service Management Facility) configuration profiles to automate the initial system configuration; otherwise, we'll be faced with running the interactive sysconfig tool during the initial boot.  For this deployment, we have a somewhat unusual twist, in that there is configuration we'd like to share between the infrastructure nodes and guests since they are ultimately all nodes on the Oracle internal network.  Also, for maximum flexibility and reuse, the configuration is expressed by multiple profiles, with each designed to configure only some aspects of the system.  In our case, we have a directory structure on the AI server that looks like:


The first three are specific to the infrastructure nodes.  The infrastructure.xml profile provides the fixed network configuration, along with coreadm setup and fault management notifications; we use SMTP notifications to alert us to any faults from the system.  The puppet.xml profile configures the puppet agents with the name of the master node.  The users.xml profile configures the root account as a role and sets its password, and also sets up a local system administrator account that's meant to be used in case of networking issues that prevent our administrators from using their normal user accounts.

The three profiles under the common directory are also used to configure guest instances in our cloud.  I'll show how that's done later in this series, but it's important that they be under a separate directory.  basic.xml configures the system's timezone, default locale, keyboard layout, and console terminal type.  dns.xml configures the DNS resolver, and ldap.xml configures the LDAP client.

We load each of these into the AI services with the command:

# installadm create-profile -n cloud-sparc -f <file name>

The important aspect of the above command is that no criteria are specified for the profiles, which means that they are applied to all clients of the service.  This also means that they must be disjoint; no two profiles can attempt to configure the same property on the same service, otherwise SMF will not apply the profiles that conflict.
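
Since the same six profiles go into both services, the loading step is easy to script.  The following is a minimal sketch, not the procedure we actually ran: the relative file paths, the load_profiles function, and the RUN switch are assumptions for illustration.  By default it only prints the installadm commands so they can be reviewed first.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) one create-profile command per profile
# for each AI service.  The file layout and the RUN switch are assumptions.
SERVICES="cloud-i386 cloud-sparc"
PROFILES="infrastructure.xml puppet.xml users.xml common/basic.xml common/dns.xml common/ldap.xml"

load_profiles() {
    for svc in $SERVICES; do
        for prof in $PROFILES; do
            if [ "${RUN:-0}" = 1 ]; then
                installadm create-profile -n "$svc" -f "$prof"
            else
                echo "installadm create-profile -n $svc -f $prof"
            fi
        done
    done
}

load_profiles
```

Because no criteria are passed, every profile loaded this way applies to every client of the service, so the disjointness rule above still has to be checked by hand.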

Once all that's done, we can see the results:

# installadm list -p -m -n cloud-sparc
Service Name Manifest Name Type    Status   Criteria
------------ ------------- ----    ------   --------
cloud-sparc  havana.ksh    derived default  none    

Service Name Profile Name       Criteria
------------ ------------       --------
cloud-sparc  basic.xml          none    
             dns.xml            none    
             infrastructure.xml none    
             ldap.xml           none    
             puppet.xml         none    
             users.xml          none    

At this point we've got enough infrastructure implemented to install the OpenStack undercloud systems.  In the next posting I'll cover the Puppet manifests we're using; after that we'll get into configuring OpenStack itself.

Friday Aug 22, 2014

Building an OpenStack Cloud for Solaris Engineering, Part 1

One of the signature features of the recently-released Solaris 11.2 is the OpenStack cloud computing platform.  Over on the Solaris OpenStack blog the development team is publishing lots of details about our version of OpenStack Havana as well as some tips on specific features, and I highly recommend reading those to get a feel for how we've leveraged Solaris's features to build a top-notch cloud platform.  In this and some subsequent posts I'm going to look at it from a different perspective, which is that of the enterprise administrator deploying an OpenStack cloud.  But this won't be just a theoretical perspective: I've spent the past several months putting together a deployment of OpenStack for use by the Solaris engineering organization, and now that it's in production we'll share how we built it and what we've learned so far.

In the Solaris engineering organization we've long had dedicated lab systems dispersed among our various sites and a home-grown reservation tool for developers to reserve those systems; various teams also have private systems for specific testing purposes.  But as a developer, it can still be difficult to find systems you need, especially since most Solaris changes require testing on both SPARC and x86 systems before they can be integrated.  We've added virtual resources over the years as well in the form of LDOMs and zones (both traditional non-global zones and the new kernel zones).  Fundamentally, though, these were all still deployed in the same model: our overworked lab administrators set up pre-configured resources and we then reserve them.  Sounds like pretty much every traditional IT shop, right?  Which means that there's a lot of opportunity for efficiencies from greater use of virtualization and the self-service style of cloud computing.  As we were well into development of OpenStack on Solaris, I was recruited to figure out how we could deploy it to both provide more (and more efficient) development and test resources for the organization as well as a test environment for Solaris OpenStack.

At this point, let's acknowledge one fact: deploying OpenStack is hard.  It's a very complex piece of software that makes use of sophisticated networking features and runs as a ton of service daemons with myriad configuration files.  The web UI, Horizon, doesn't often do a good job of providing detailed errors.  Even the command-line clients are not as transparent as you'd like, though at least you can turn on verbose and debug messaging and often get some clues as to what to look for; it helps if you're good at reading JSON structure dumps.  I'd already learned all of this in doing a single-system Grizzly-on-Linux deployment for the development team to reference when they were getting started, so I at least came to this job with some appreciation for what I was taking on.  The good news is that both we and the community have done a lot to make deployment much easier in the last year; probably the easiest approach is to download the OpenStack Unified Archive from OTN to get your hands on a single-system demonstration environment.  I highly recommend getting started with something like it to get some understanding of OpenStack before you embark on a more complex deployment.  For some situations, it may in fact be all you ever need.  If so, you don't need to read the rest of this series of posts!

In the Solaris engineering case, we need a lot more horsepower than a single-system cloud can provide.  We need to support both SPARC and x86 VM's, and we have hundreds of developers so we want to be able to scale to support thousands of VM's, though we're going to build to that scale over time, not immediately.  We also want to be able to test both Solaris 11 updates and a release such as Solaris 12 that's under development so that we can work out any upgrade issues before release.  One thing we don't have is a requirement for extremely high availability, at least at this point.  We surely don't want a lot of down time, but we can tolerate scheduled outages and brief (as in an hour or so) unscheduled ones.  Thus I didn't need to spend effort on trying to get high availability everywhere.

The diagram below shows our initial deployment design.  We're using six systems, most of which are x86 because we had more of those immediately available.  All of those systems reside on a management VLAN and are connected with a two-way link aggregation of 1 Gb links (we don't yet have 10 Gb switching infrastructure in place, but we'll get there).  A separate VLAN provides "public" (as in connected to the rest of Oracle's internal network) addresses, while we use VxLANs for the tenant networks.

Solaris cloud diagram

One system is more or less the control node, providing the MySQL database, RabbitMQ, Keystone, and the Nova API and scheduler as well as the Horizon console.  We're curious how this will perform and I anticipate eventually splitting at least the database off to another node to help simplify upgrades, but at our present scale this works.

I had a couple of systems with lots of disk space, one of which was already configured as the Automated Installation server for the lab, so it's just providing the Glance image repository for OpenStack.  The other node with lots of disks provides Cinder block storage service; we also have a ZFS Storage Appliance that will help back-end Cinder in the near future, but I just haven't had time to get it configured yet.

There's a separate system for Neutron, which is our Elastic Virtual Switch controller and handles the routing and NAT for the guests.  We don't have any need for firewalling in this deployment so we're not doing so.  We presently have only two tenants defined, one for the Solaris organization that's funding this cloud, and a separate tenant for other Oracle organizations that would like to try out OpenStack on Solaris.  Each tenant has one VxLAN defined initially, but we can of course add more.  Right now we have just a single /24 network for the floating IP's; once demand grows to where we need more, we'll add them.

Finally, we have started with just two compute nodes; one is an x86 system, the other is an LDOM on a SPARC T5-2.  We'll be adding more when demand reaches the level where we need them, but as we're still ramping up the user base it's less work to manage fewer nodes until then.

My next post will delve into the details of building this OpenStack cloud's infrastructure, including how we're using various Solaris features such as Automated Installation, IPS packaging, SMF, and Puppet to deploy and manage the nodes.  After that we'll get into the specifics of configuring and running OpenStack itself.

Tuesday Jan 31, 2012

Detroit Solaris 11 Forum, February 8

I'm just posting this quick note to help publicize the Oracle Solaris 11 Technology Forum we're holding in the Detroit area next week.  There's still time to register and come get a half-day overview of the great new stuff in Solaris 11.  The "special treat" that's not mentioned in the link is that I'll be joining Jeff Victor as a speaker.  Looking forward to being back in my home state for a quick visit, and hope I'll see some old friends there!

Tuesday Nov 22, 2011

Solaris at LISA 2011

As is our custom, the Solaris team will be out in force at the USENIX LISA conference; this year it's in Boston so it's sort of a home game for me for a change.  The big event we'll have is Tuesday, December 6, the Oracle Solaris 11 Summit Day.  We'll be covering deployment, ZFS, Networking, Virtualization, Security, Clustering, and how Oracle apps run best on Solaris 11.  We've done this the past couple of years and it's always a very full day.

On Wednesday, December 7, we've got a couple of BOF sessions scheduled back-to-back.  At 7:30 we'll have the ever-popular engineering panel, with all of us who are speaking at Tuesday's summit day there for a free-flowing discussion of all things Solaris.  Following that, Bart & I are hosting a second BOF at 9:30 to talk more about deployment for clouds and traditional data centers.

Also, on Wednesday and Thursday we'll have a booth at the exhibition where there'll be demos and just a general chance to talk with various Solaris staff from engineering and product management.

The conference program looks great and I look forward to seeing you there!

Thursday Nov 17, 2011

Virtually the fastest way to try Solaris 11 (and Solaris 10 zones)

If you're looking to try out Solaris 11, there are the standard ISO and USB image downloads on the main page.  Those are great if you're looking to install Solaris 11 on hardware, and we hope you will.  But if you take the time to look down the page, you'll find a link off to the Oracle Solaris 11 Virtual Machine downloads.  There are two downloads there:

  1. A pre-built Solaris 10 zone
  2. A pre-built Solaris 11 VM for use with VirtualBox

If you're looking to try Solaris 11 on x86, the second one is what you want.  Of course, this assumes you have VirtualBox already (and if you don't, now's the time to try it, it's a terrific free desktop virtualization product).  Once you complete the 1.8 GB download, it's a simple matter of unzipping the archive and a few quick clicks in VirtualBox to get a Solaris 11 desktop booted.  While it's booting, you'll get to run through the new system configuration tool (that'll be the subject of a future posting here) to configure networking, a user account, and so on.

So what about that pre-built Solaris 10 zone download?  It's a really simple way to get yourself acquainted with the Solaris 10 zones feature, which you may well find indispensable in transitioning an existing Solaris 10 infrastructure to Solaris 11.  Once you've downloaded the file, it's a self-extracting executable that'll configure the zone for you; all you have to supply is an IP address for the zone.  It's really quite slick!

I expect we'll do a lot more pre-built VM's and zones going forward, as that's a big part of being a cloud OS; if there's one that would be really useful for you, let us know.

Tuesday Nov 15, 2011

Solaris 11 Technology Forums, NYC and Boston

By now you're certainly aware that we released Solaris 11; I was on vacation during the launch so haven't had time to write any material related to the Solaris 11 installers, but will get to that soon.  Following onto the release, we're scheduling events in various locations around the world to talk about some of the key new features in Solaris 11 in more depth.  In the northeast US, we've scheduled technology forums in New York City on November 29, and Burlington, MA on November 30.  Click on those links to go to the detailed info and registration.  I'll be one of the speakers at both of them, so hope to see you there!

Monday Mar 28, 2011

Solaris Online Forum on April 14

For all of you that are interested in what's happening with Solaris 11, we've scheduled a half-day of online forums on Thursday, April 14.  I'll be on for 45 minutes with my pal Bart to talk about deployment; other colleagues will be discussing the Solaris strategy, virtualization, and other features of Solaris 11.  We'll also have a live on-line chat where you can get one of us to answer your questions.  For the full details, see the registration page.  Hope to see you there!

Tuesday Nov 16, 2010

Solaris 11 Express Interactive Installation

One thing I didn't note in my previous entry on the Solaris 11 Express 2010.11 release is that there are some new developments in installation since the last available builds of OpenSolaris.  This post just discusses the interactive installation options, while a subsequent entry will discuss the Automated Installer.

Before digging into the details, it's probably useful to explain the philosophy of the interactive installers a bit for those encountering them for the first time, as it is somewhat of a departure from Solaris 10 and prior.  Our basic guiding principle is probably best summarized as, "Get the system installed and get out of the way."  To elaborate a bit, the idea is to collect a minimal amount of configuration required to make the installed system functional, execute the install quickly, and let the user get on with using the system.  That means that a lot of the configuration you might have been asked about in past Solaris releases, such as Kerberos or NFS domains, or installing additional, layered software, is just not present.  You're asked only to select a disk, partition it a bit if you want, provide timezone and locale, and create a user account.  You're also not prompted to interactively select the software to be installed.  Instead, the software that's present on the media is what's installed, providing a useful starting point at first boot.  From there, you can use tools like the pkg CLI or the Package Manager GUI to customize software to your heart's content, all installed from the convenience of a software repository on the network.

There are several reasons why we think this shift is appropriate.  First, many of the configuration settings that were prompted for in the past were of interest to only small minorities of users.  That means we were making it harder for the majority, which is almost always a bad choice.  Second, we've put in a concerted effort over the past 5+ years to make Solaris configured more correctly to start with, and more capable of self-configuring, so that more users get the best results, not just those who can figure out the right knobs to twist.  The end results should be better for all of us in the Solaris ecosystem, as behavior will be more consistent and predictable.  Finally, in terms of software selection, we've reached the point where the commonly-available media format (DVD) just isn't large enough to incorporate all the software we want to provide as part of the product - we've just plain outstripped the rate of improvement in software compression technology.  It's well past time that we oriented Solaris towards a network-centric software delivery paradigm.

Text Installer

The most obvious difference to OpenSolaris users is the addition of the Text Installer, a curses-based interactive UI designed to run comfortably on all those servers out there that have only serial consoles.  Those that were following the OpenSolaris development train did see a late preview of this from the project team back around build 134, but S11 Express is the first release that includes this installer.  This now means that there is an interactive install option for SPARC users, as the GUI install is offered only on the x86 live CD.

Philosophically, this UI shares a fair amount with the GUI: it's a fairly streamlined experience that doesn't allow customization of the software payload, but does allow a little more freedom in disk configuration (most notably, the ability to preserve existing VTOC slices).  Like the GUI, the installation is a direct copy of the media contents, so what is included on the media defines the installation.

Initially, we've opted to include this installer only on a new, separate ISO download, identified as Text Install on the downloads page.  This image might be more accurately called "Server Install", as that's what it really is meant to be: a generic server installation that includes most, if not all, of the Solaris server elements, but omits the GNOME desktop and related applications.  If this is the image you downloaded and installed but you really wanted the GNOME desktop (easy to do since it's the first image on the page), then the easy solution is to install the package set that appears on the live CD media; you can accomplish that with the command pkg install slim_install, slim_install being the IPS group package that we use to define the live CD contents.  Incidentally, the group package that defines the text install media contents is the server_install package.

One thing that server administrators will undoubtedly find missing is the ability to directly configure the network as part of the install; right now it defaults to the automatic configuration technology we call Network Auto-Magic (or NWAM).  We do plan to extend the text installer to also provide static network configuration, so you'll be able to supply an IP address and nameservice configuration directly, rather than having to do this post-installation.

GUI Installer

The GUI installer has undergone some small changes from the versions provided with OpenSolaris.  If the last time you used it was with OpenSolaris 2009.06, the biggest difference is that it provides support for extended partitions, which provides a little more flexibility in dealing with the limitations of the x86 partitioning scheme and eases co-existence with other OS's in multi-boot configurations.  The other change here, more subtle, is that the UI no longer separately prompts for the root password.  Instead, the password for the root role is set to the same password as the initial user account (which is now required, where it was optional during OpenSolaris releases).  The root password is created as expired, however, so the first time you su to root, you'll be prompted to change the password.  Finally, the initial user account is no longer assigned the Primary Administrator profile to enable administrative access.  Instead, the user account retains access to the root role, and is also given all access to sudo.  The text installer does allow independent setting of the root password at this release, but we expect to align it with the GUI in a future build.

Monday Nov 15, 2010

Oracle Solaris 11 Express 2010.11 is released

Today marks the release of Oracle Solaris 11 Express 2010.11, beginning the rollout of our long-gestating successor to Solaris 10.  The summary and links to most everything are available on the OTN Oracle Solaris 11 Overview.  Probably the biggest thing to emphasize is that this is a supported release, not a "beta" or preview; see the link for the support options.  That said, feature development continues in anticipation of a Solaris 11 release in 2011, as was outlined at OpenWorld back in September.

For those who used the OpenSolaris distribution releases, you'll find this release quite familiar, as it's the continuing evolution of the technology we introduced in those releases: the installers from the Caiman project, the IPS packaging system, and all the other great things that my colleagues in Solaris engineering have been developing for the past several years in networking, storage, security and so on.  The biggest visible differences are a different package repository, license terms, and of course Oracle branding.

For those of you who weren't users of OpenSolaris, well, now is the time to really start getting your feet wet, evaluating Solaris 11 and planning its deployment in your environment.  We hope you'll like it!

Tuesday Oct 19, 2010

Oracle Solaris Summit and BoFs @ LISA 2010

As the release of Oracle Solaris 11 Express is getting closer, we're having a bunch of information sessions at the upcoming USENIX LISA conference in San Jose: an all-day summit on Tuesday, November 9, and evening BoF sessions on Tuesday and Wednesday.  It's important to note that you do not have to register for LISA to come to the summit on Tuesday, but you do need to register for the summit itself.  See my colleague Terri Wischmann's blog post, Oracle Solaris Summit at LISA conference 2010, for the full lineup of events and links to registration.  We hope to see you there!

Wednesday Sep 15, 2010

See You at Oracle OpenWorld 2010?

Blowing the dust off the ol' blog to note that I'll be at Oracle's OpenWorld/JavaOne/Develop extravaganza in San Francisco next week.  Two things I am scheduled to be doing:
  1. A webcast with OTN's Rick Ramsey at 10:30 AM (Pacific) on Tuesday
  2. A session at the "Unconference" at Hotel Parc 55 at 3 PM on Tuesday
Both of these sessions are on Solaris 11 deployment.  Webcast will probably be pretty high-level, hopefully a bunch of Q&A.  At the unconference session I'm planning to dive into the Automated Installer a bit, especially the changes since the OpenSolaris releases.

I'm planning to be generally hanging around the conference on Monday.  @dave_miner on Twitter will be one way to find me if you like.

If your main interest is Solaris, I should note that my friend Deirdre has kindly posted a schedule of the Solaris Unconference Sessions at Oracle Open World 2010.

Friday Dec 18, 2009

Big 2009 Finish for OpenSolaris Installation

As the end of 2009 approaches, there are a bunch of recent developments in the OpenSolaris installation software that I want to highlight.  All of the below will appear in OpenSolaris development build 130, due in the next few days.

First up is the addition of iSCSI support to Automated Installation (or AI).  You can now specify an iSCSI target for installation in the AI manifest.  It'll work on both SPARC and x86, provided you have firmware that can support iSCSI boot; on SPARC you'll need a very recent OBP patch to enable this support.  Official docs are in the works, but the design document should have enough info to piece it together if you're interested.

Next is the bootable AI image, which allows use of AI in a number of additional scenarios.  Probably the most generally interesting one is that you can now install OpenSolaris on SPARC without setting up an AI server first, by using the default AI manifest that's included on the ISO image.  One caveat is that the default manifest installs from the release repository; due to ZFS version changes between 2009.06 and present, this results in an installation that won't boot.  You'll want to make a copy of the default manifest and change the main url for the ai_pkg_repo_default_authority element to point to http://pkg.opensolaris.org/dev and put it at a URL that you can supply to the AI client once it boots.  Alok's mail and blog entry have more details.

Building on bootable AI, we've extended the Distribution Constructor (or DC) with a project known as Virtual Machine Constructor (VMC).  Succinctly, it extends DC to construct a pre-built virtual machine image that can be imported into hypervisors that support OVF 1.0, such as VirtualBox or VMware.  Glenn's mail notes a few limitations that will be addressed in the next few builds.  Anyone interested in building virtualization-heavy infrastructures should find this quite useful.

Finally, one more barrier to adding OpenSolaris on x86 to a system that's multi-booted with other OS's has fallen with the addition of extended partition support to both the live CD GUI installer and Automated Installation.  You can now install OpenSolaris into a logical partition carved from the extended partition.  Jean's mail has some brief notes on how to use this new feature.  I should also note at this point that a couple of builds ago the parted command and GParted GUI were added to the live CD, so the more complex preparations sometimes needed to free up space for OpenSolaris can now be done directly from the CD.

I'd like to thank my team for all the hard work that went into all of the above; they accomplished all of it with precious little help from me, as I spent most of the past three months either traveling around talking to literally hundreds of customers or working on architecture and design tasks.  Speaking of those, the review of the installer architecture is open, and I've also just this week posted the first draft design for AI service management improvements.

What's next?  Well, that will be the topic of my next post in early January.  It's time for a vacation!


I'm the architect for Solaris deployment and system management, with a lot of background in networking on the side. I am co-author of the OpenSolaris Bible (Wiley, 2009). I also play a lot of golf.

