
Tips on deploying and managing Oracle Solaris, especially in clouds

Building an OpenStack Cloud for Solaris Engineering, Part 4

Dave Miner
Sr. Principal Software Engineer

The prior parts of this series discussed the design and deployment of the undercloud nodes on which our cloud is implemented.  Now it's time to configure OpenStack and turn the cloud on.  Over on OTN, my colleague David Comay has published a general getting started guide that does a manual setup based on the OpenStack all-in-one Unified Archive; I recommend at least browsing through it for background that will come in handy as you deal with the inevitable issues that occur in running software as complex as OpenStack.  It's even better to run through that single-node setup to get some experience before trying to build a multi-node cloud.

For our purposes, I needed to script the configuration of a multi-node cloud, and that makes everything more complex; not the least of the problems is that you can't just use the loopback IP address (127.0.0.1) as the endpoint for every service.  We had (compliments of my colleague Drew Fisher) a script for single-system configuration already, so I started with that and hacked away to build something that could configure each component correctly in a multi-node cloud.  That Python script, called havana_setup.py, and some associated scripts are available for download.  Here, I'll walk through the design and key pieces.

Pre-work

Before the OpenStack configuration proper, you'll need to run the gen_keys.py script to create some SSH keys.  These are used to secure the Solaris RAD (Remote Administration Daemon) transport that the Solaris Elastic Virtual Switch (EVS) controller uses to manage the networking between the Nova compute nodes and the Neutron controller node.  The script creates evsuser, neutron, and root sub-directories in whatever location you run it from, and this location will be referenced later in configuring the Neutron and Nova compute nodes, so you'll want to put it in a directory that's easily shared via NFS.  You can (and probably should) unshare it after the nodes are configured, though.
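In outline, gen_keys.py does something like the following; the per-account directory names match the description above, while the key file names and ssh-keygen options are my assumptions:

import os
import subprocess

# Sketch: generate a passphrase-less RSA keypair for each of the accounts the
# EVS/RAD transport uses.  The subdirectories match the text; the key file
# names and ssh-keygen options are assumptions.
for account in ("evsuser", "neutron", "root"):
    if not os.path.isdir(account):
        os.makedirs(account, 0o700)
    subprocess.check_call(["ssh-keygen", "-t", "rsa", "-N", "",
                           "-f", os.path.join(account, "id_rsa")])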

Global Configuration

The first part of havana_setup.py is a series of global declarations that parameterize the services deployed on the various nodes.  You'll note that the PRODUCTION variable controls the layout used; if its value is False, you end up with a single-node deployment.  I have a couple of extra systems that I use for staging, and this makes it easy to replicate the configuration well enough to do some basic sanity testing before deploying changes.

import platform
import socket

MY_NAME = platform.node()
MY_IP = socket.gethostbyname(MY_NAME)
# When set to False, you end up with a single-node deployment
PRODUCTION = True
CONTROLLER_NODE = MY_NAME
if PRODUCTION:
    CONTROLLER_NODE = "controller.example.com"
DB_NODE = CONTROLLER_NODE
KEYSTONE_NODE = CONTROLLER_NODE
GLANCE_NODE = CONTROLLER_NODE
CINDER_NODE = CONTROLLER_NODE
NEUTRON_NODE = CONTROLLER_NODE
RABBIT_NODE = CONTROLLER_NODE
HEAT_NODE = CONTROLLER_NODE
if PRODUCTION:
    GLANCE_NODE = "glance.example.com"
    CINDER_NODE = "cinder.example.com"
    NEUTRON_NODE = "neutron.example.com"

Next, we configure the main security elements: the root password for MySQL, plus the passwords and access token for Keystone, along with the URLs that we'll need to configure into the other services to connect them to Keystone.

SERVICE_TOKEN = "TOKEN"
MYSQL_ROOTPW = "mysqlroot"
ADMIN_PASSWORD = "adminpw"
SERVICE_PASSWORD = "servicepw"
AUTH_URL = "http://%s:5000/v2.0/" % KEYSTONE_NODE
IDENTITY_URL = "http://%s:35357" % KEYSTONE_NODE

The remainder of this section configures specifics of Glance, Cinder, Neutron, and Horizon.  For Glance and Cinder, we provide the name of the base ZFS dataset that each will use.  For Neutron, we specify the NIC, VLAN tag, and external network addresses, as well as the subnets for each of the two tenants we provide in our cloud.  We chose to have one tenant for developers in the organization that is funding this cloud, and a second tenant for other Oracle employees who want to experiment with OpenStack on Solaris; this gives us a way to coarsely allocate resources between the two, and of course most go to the tenant paying the bill.  The last element of each tuple in the tenant network list is the number of floating IP addresses to set as the quota for the tenant.  For Horizon, the paths to a server certificate and key must be configured, but only if you're using TLS, which is only the case when the script is run with PRODUCTION = True.  The SSH_KEYDIR should be set to the location where you ran the gen_keys.py script, above.

GLANCE_DATASET = "tank/glance"
CINDER_DATASET = "tank/cinder"
UPLINK_PORT = "aggr0"
if PRODUCTION:
    VXLAN_RANGE = "500-600"
    TENANT_NET_LIST = [("tenant1", "192.168.66.0/24", 10),
                       ("tenant2", "192.168.67.0/24", 60)]
else:
    VXLAN_RANGE = "400-499"
    TENANT_NET_LIST = [("tenant1", "192.168.70.0/24", 5),
                       ("tenant2", "192.168.71.0/24", 5)]
EXTERNAL_GATEWAY = "10.134.12.1"
EXTERNAL_NETWORK_ADDR = "10.134.12.0/24"
EXTERNAL_NETWORK_VLAN_TAG = "12"
EXTERNAL_NETWORK_NAME = "external"

SERVER_CERT = "/path/to/horizon.crt"
SERVER_KEY = "/path/to/horizon.key"

SSH_KEYDIR = "/path/to/generated/keys"

Configuring the Nodes

The remainder of havana_setup.py is a series of functions that configure each element of the cloud.  You select which element(s) to configure by specifying command-line arguments.  Valid values are mysql, keystone, glance, cinder, nova-controller, neutron, nova-compute, and horizon.  I'll briefly explain what each does below.  One thing to note is that each function first creates a backup boot environment so that if something goes wrong, you can easily revert to the state of the system prior to running the script.  This is a practice you should always use in Solaris administration before making any system configuration changes.  It also saved me a ton of time in development of the cloud, since I could reset within a minute or so every time I had a serious bug.  Even our best re-deployment times with AI and archives are about 10 times that when you have to cycle through network booting.
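Each function begins with roughly the following; beadm is the standard Solaris tool for this, and the boot environment naming scheme here is just an example:

import subprocess
import time

def create_backup_be(tag):
    # Snapshot the current boot environment so a bad run can be rolled back
    # with beadm activate and a reboot; the naming scheme is arbitrary.
    name = "%s-backup-%s" % (tag, time.strftime("%Y%m%d%H%M%S"))
    subprocess.check_call(["/usr/sbin/beadm", "create", name])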

mysql

MySQL must be the first piece configured, since all of the OpenStack services use databases to store at least some of their objects.  This function sets the root password and removes some insecure aspects of the default MySQL configuration.  One key piece is that it removes remote root access; that forces us to create all of the databases in this module, rather than creating each component's database in its associated module.  There may be a better way to do this, but since I'm not a MySQL expert in any way, that was the easiest path here.  On review, it seems like enabling the mysql SMF service should really be moved over into the Puppet manifest from Part 3.
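Because remote root access is gone once this function completes, the databases for every service get created up front.  Here's a sketch of the kind of SQL the function feeds to the mysql client; the database and account names follow common OpenStack conventions rather than necessarily matching the script:

import subprocess

# Build all of the CREATE DATABASE/GRANT statements in one place, since only
# local root access remains after this function runs.
sql = ""
for svc in ("keystone", "glance", "cinder", "nova", "neutron", "heat"):
    sql += "CREATE DATABASE IF NOT EXISTS %s;\n" % svc
    sql += ("GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n"
            % (svc, svc, SERVICE_PASSWORD))
proc = subprocess.Popen(["mysql", "-u", "root", "-p" + MYSQL_ROOTPW],
                        stdin=subprocess.PIPE)
proc.communicate(sql)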

keystone

The keystone function does some basic configuration, then calls the /usr/demo/openstack/keystone/sample_data.sh script to configure users, tenants, and endpoints.  In our deployment I've customized this script a bit to create the two tenants rather than just one, so you may need to make some adjustments for your site; I have not included that customization in the downloaded files.
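In outline, the keystone function hands the global settings to the sample data script through the environment; the variable names below are my reading of the upstream sample script, so verify them against your copy:

import os
import subprocess

# Pass the passwords and token from the global settings to sample_data.sh;
# the environment variable names are assumptions based on the upstream script.
env = dict(os.environ)
env.update({"SERVICE_TOKEN": SERVICE_TOKEN,
            "SERVICE_PASSWORD": SERVICE_PASSWORD,
            "ADMIN_PASSWORD": ADMIN_PASSWORD,
            "CONTROLLER_PUBLIC_ADDRESS": KEYSTONE_NODE})
subprocess.check_call(["/usr/demo/openstack/keystone/sample_data.sh"], env=env)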

glance

The glance function configures and starts the various glance services, and also creates the base dataset for ZFS storage; we turn compression on to save on storage for all the images we'll have here.  If you're rolling back and re-running for some reason, this module isn't quite idempotent as written because it doesn't deal with the case where the dataset already exists, so you'd need to use zfs destroy to delete the glance dataset.
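A slightly more defensive version of the dataset creation would look like this; checking for the dataset first is what would make the function safe to re-run:

import subprocess

def ensure_dataset(name):
    # Create the dataset only if it doesn't already exist, so a re-run after
    # a rollback doesn't fail; compression saves space on the stored images.
    if subprocess.call(["/usr/sbin/zfs", "list", name]) != 0:
        subprocess.check_call(["/usr/sbin/zfs", "create", "-p",
                               "-o", "compression=on", name])

ensure_dataset(GLANCE_DATASET)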

cinder

Beyond just basic configuration of the cinder services, the cinder function also creates the base ZFS dataset under which all of the volumes will be created.  We create this as an encrypted dataset so that all of the volumes will be encrypted, which Darren Moffat covers at more length in OpenStack Cinder Volume encryption with ZFS.  Here we use pktool to generate the wrapping key and store it in root's home directory.  One piece of work we haven't yet had time to take on is adding our ZFS Storage Appliance as an additional back-end for Cinder; I'll post an update to cover that once we get it done.  Like the glance function, this function doesn't deal properly with the dataset already existing, so any rollback also needs to destroy the base dataset by hand.
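Here's a sketch of the encrypted-dataset setup along the lines Darren describes; the key file location, key length, and dataset options are assumptions rather than the script's exact choices:

import os
import subprocess

KEYFILE = "/root/cinder_wrapping_key"   # hypothetical key location

if not os.path.exists(KEYFILE):
    # Generate a raw AES wrapping key with pktool(1)
    subprocess.check_call(["/usr/bin/pktool", "genkey", "keystore=file",
                           "outkey=" + KEYFILE, "keytype=aes", "keylen=256"])
if subprocess.call(["/usr/sbin/zfs", "list", CINDER_DATASET]) != 0:
    # Every volume dataset created underneath inherits the encryption settings
    subprocess.check_call(["/usr/sbin/zfs", "create", "-p",
                           "-o", "encryption=aes-256-ccm",
                           "-o", "keysource=raw,file://" + KEYFILE,
                           CINDER_DATASET])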

nova_controller & nova_compute

Since our deployment runs the nova controller services separate from the compute nodes, the nova_controller function is run on the controller node to set up the API, scheduler, and conductor services.  If you combine the compute and controller nodes you would run this and then later run the nova_compute function.  The nova_compute function also makes use of a couple of helper functions to set up the ssh configuration for EVS.  For these functions to work properly you must run the neutron function on its designated node before running nova_compute on the compute nodes.
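On a compute node, the helper functions boil down to installing the pre-generated root key from the shared key directory and pointing the local EVS client at the controller; the key's destination path below is an assumption:

import shutil
import subprocess

# Install root's pre-generated key from the NFS-shared directory so the
# compute node can reach the EVS controller over ssh as evsuser; the
# destination path is an assumption.
shutil.copy(SSH_KEYDIR + "/root/id_rsa", "/root/.ssh/id_rsa")
# Point the local EVS client at the controller node
subprocess.check_call(["/usr/sbin/evsadm", "set-prop", "-p",
                       "controller=ssh://evsuser@" + NEUTRON_NODE])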

neutron

The neutron setup function is by far the most complex, as it not only configures the neutron services, including the underlying EVS and RAD functions, but also configures the external network and the tenant networks.  The external network is configured as a tagged VLAN, while the tenant networks are configured as VxLANs; you can certainly use VLANs or VxLANs for all of them, but this configuration was the most convenient for our environment.
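In rough outline, that means setting the EVS controller properties and then creating the networks with the neutron client; the property and option names below follow the Solaris EVS and Neutron documentation, but treat this as a sketch rather than the script's exact contents:

import subprocess

# EVS control properties: which datalink carries the VLANs and what VXLAN
# range the tenant networks may use.
subprocess.check_call(["/usr/sbin/evsadm", "set-controlprop",
                       "-p", "uplink-port=" + UPLINK_PORT])
subprocess.check_call(["/usr/sbin/evsadm", "set-controlprop",
                       "-p", "vxlan-range=" + VXLAN_RANGE])

# External network on the tagged VLAN, plus a subnet that supplies the
# floating IP addresses; DHCP stays off since the gateway is outside the cloud.
subprocess.check_call(["neutron", "net-create", "--router:external=True",
                       "--provider:network_type=vlan",
                       "--provider:segmentation_id=" + EXTERNAL_NETWORK_VLAN_TAG,
                       EXTERNAL_NETWORK_NAME])
subprocess.check_call(["neutron", "subnet-create", "--disable-dhcp",
                       "--gateway", EXTERNAL_GATEWAY,
                       EXTERNAL_NETWORK_NAME, EXTERNAL_NETWORK_ADDR])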

horizon

For the production case, the horizon function just copies into place an Apache config file that configures TLS support for the Horizon dashboard and the server's certificate and key files.  If you're using self-signed certificates, then the Apache SSL/TLS Strong Encryption: FAQ is a good reference on how to create them.  For the non-production case, this function just comments out the pieces of the dashboard's local settings that enable SSL/TLS support.
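For the non-production branch, the edit amounts to something like this; the settings file path and the specific option names are assumptions on my part:

import fileinput
import sys

# Non-production case: comment out the settings that force SSL/TLS.  The
# settings path and option names are assumptions; check your local_settings.
SETTINGS = "/etc/openstack_dashboard/local_settings.py"
for line in fileinput.input(SETTINGS, inplace=1):
    if "CSRF_COOKIE_SECURE" in line or "SESSION_COOKIE_SECURE" in line:
        line = "# " + line
    sys.stdout.write(line)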

Getting Started

Once you've run through all of the above functions from havana_setup.py, you have a cloud, and pointing your web browser at http://<your server>/horizon should display the login page, where you can log in as the admin user with the password you configured in the global settings of havana_setup.py.

Assuming that works, your next step should be to upload an image.  The easiest way to start is by downloading the Solaris 11.2 Unified Archives.  Once you have an archive, the upload can be done from the Horizon dashboard, but you'll find it easier to use the upload_image script that I've included in the download.  You'll need to edit the environment variables it sets first, but it takes care of setting several properties on the image that are required by the Solaris Zones driver for Nova to properly handle deploying instances.  Failure to set them is the single most common mistake that I and others have made in the early Solaris OpenStack deployments; when you forget and attempt to launch an instance, you'll get an immediate error, and the details from nova show will include an error like this:

| fault | {"message": "No valid host was found. ", "code": 500, "details": "  File
  \"/usr/lib/python2.6/vendor-packages/nova/scheduler/filter_scheduler.py\",
  line 107, in schedule_run_instance |

When you snapshot a deployed instance with Horizon or nova image-create, the archive properties will be set properly, so it's only manual uploads in Horizon or with the glance command that need care.
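The essence of upload_image is a glance image-create call along these lines; the image name and archive path are placeholders, and the property names are my reading of what the Solaris Zones driver expects, so verify them against the getting started guide:

import subprocess

ARCHIVE = "/path/to/solaris.uar"     # placeholder archive path
IMAGE_NAME = "Solaris 11.2 x86"      # placeholder image name

# The zones driver needs to know the architecture and that the image deploys
# into a Solaris zone; property names here are assumptions to double-check.
subprocess.check_call([
    "glance", "image-create",
    "--name", IMAGE_NAME,
    "--container-format", "bare",
    "--disk-format", "raw",
    "--property", "architecture=x86_64",          # or sparc64 for SPARC archives
    "--property", "hypervisor_type=solariszones",
    "--property", "vm_mode=solariszones",
    "--file", ARCHIVE])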

There's one more preparation task to do: upload an ssh public key that'll be used to access your instances. Select Access & Security from the list in the left panel of the Horizon Dashboard, then select the Keypairs tab, and click Import Keypair.  You'll want to paste the contents of your ~/.ssh/id_rsa.pub into the Public Key field, and probably name your keypair the same as your username.

Finally, you are ready to launch instances.  Select Instances in the Horizon Dashboard's left panel, then click the Launch Instance button.  Enter a name for the instance, select the Flavor, select Boot from image as the Instance Boot Source, and select the image to use in deploying the VM.  The image determines whether you get a SPARC or x86 VM and what software it includes, while the flavor determines whether it is a kernel zone or non-global zone, as well as the number of virtual CPUs and amount of memory.  The Access & Security tab should default to selecting your uploaded keypair.  You must go to the Networking tab and select a network for the instance.  Then click Launch and the VM will be installed; you can follow progress by clicking on the instance name to see details and selecting the Log tab.  It'll take a few minutes at present; in the meantime you can Associate a Floating IP in the Actions field.  Pick any address from the list offered.  Your instance will not be reachable until you've done this.
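If you prefer the command line, the dashboard steps above correspond roughly to the following; the image, flavor, keypair, and network values are placeholders you'd substitute from your own environment:

import subprocess

NET_ID = "<tenant-network-uuid>"      # placeholder: from 'neutron net-list'
FLOATING_IP = "<address-from-pool>"   # placeholder: from 'nova floating-ip-create'

# Boot from the uploaded image on a tenant network, then attach a floating IP.
subprocess.check_call(["nova", "boot", "--image", "Solaris 11.2 x86",
                       "--flavor", "1", "--key-name", "myuser",
                       "--nic", "net-id=" + NET_ID, "myinstance"])
subprocess.check_call(["nova", "add-floating-ip", "myinstance", FLOATING_IP])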

Once the instance has finished installing and reached the Active status, you can log in to it.  To do so, use ssh root@<floating-ip-address>, which will log you in to the zone as root using the key you uploaded above.  If that all works, congratulations, you have a functioning OpenStack cloud on Solaris!

In future posts I'll cover additional tips and tricks we've learned in operating our cloud.  At this writing we have over 60 users and are growing steadily, and the cloud has been totally reliable over three months, with outages only for updates to the infrastructure.
