Wednesday Jul 09, 2014

Oracle Solaris and OpenStack Workshops

This is the summer of Oracle Solaris 11.2 and OpenStack workshops.  I'm on the road covering some of them, along with my teammates Pavel Anni and Bob Netherton.

The Solaris workshops are full-day, hands-on workshops that will give you not only an introduction to Solaris 11, but also a view into the new features and capabilities in Solaris 11.2.  Primarily, these will use Pavel Anni's fantastic hands-on lab and VirtualBox.

The OpenStack workshops are shorter, 2-3 hour events that will let you see what we are up to with OpenStack in Solaris and how OpenStack can help you move into a world of modern cloud computing.

These are only the events that I am participating in.  Take a look at to see the rest of the events.  Also take a look at Bob's blog for more info on where he will be.

Here's my schedule:

Solaris Workshops

OpenStack Workshops

 Hope to see you at one of these events!

Saturday Nov 05, 2011

Solaris 11 Launch - Wish I could be there

Wish I could be in NYC this week!  After being a part of the Solaris team for a long, long time, it's great to see this finally almost here.  

But, if you can be in NYC or catch the webcast, I hope you will.

Join Oracle executives Mark Hurd and John Fowler, along with key Oracle Solaris engineers and executives, at the Oracle Solaris 11 launch event at Gotham Hall on Broadway in New York on November 9th, and learn how you can build your infrastructure with Oracle Solaris 11 to:

    * Accelerate internal, public, and hybrid cloud applications
    * Optimize application deployment with built-in virtualization
    * Achieve top performance and cost advantages with Oracle Solaris 11–based engineered systems

The launch event will also feature exclusive content for our in-person audience including a session led by the VP of core Solaris development and his leads on Solaris 11 and a customer insights panel during lunch. We will also have a technology showcase featuring our latest systems and Solaris technologies. The Solaris executive team will also be there throughout the day to answer questions and give insights into future developments in Solaris.

Don't miss the Oracle Solaris 11 launch in New York on November 9.

Oracle OpenWorld Hands On Lab - Solaris 10 Patching with Live Upgrade

Finally getting around to posting this.  We had a great turnout for our Solaris 10 Patching with Live Upgrade hands-on lab at OpenWorld a few weeks back.  

The lab was a self-guided tour through the basics of using Live Upgrade combined with ZFS root to seriously speed up the system maintenance process.  You can find the lab document here. Since this comes after the actual live lab, the document also describes how to create a basic Solaris instance to use for practicing patching.  Nothing special - just a basic Solaris 10 VM with ZFS for the root. 

The document also refers to applying patches that we supplied in the lab.  This was just a trimmed-down set of the Recommended Patchset, trimmed only so the lab would fit within our allotted time, not for any functional reason.  So, grab the latest Recommended Patchset from MOS and just use that.

So, take a look at the lab guide.  Move toward using ZFS for the root and Live Upgrade for patching.  This is a huge benefit and a way that you can cut hours out of your planned maintenance time.

Friday Jul 29, 2011

How much does Live Upgrade patching save you? Lots!

For a long time, I have advocated that Solaris users adopt ZFS for root, storing the operating system in ZFS.  I've also strongly advocated for using Live Upgrade as a patching tool in this case.  The benefits are intuitive and striking, but are they actual and quantifiable?


You can find a number of bloggers on BOC talking about the hows and whys of ZFS root.  Suffice it to say that ZFS has a number of great qualities that make management of the operating system simpler, especially when combined with other tools like Live Upgrade.  ZFS allows for the immediate creation of as many snapshots as you might want, simply by preserving the view of the filesystem metadata and taking advantage of the fact that all writes in ZFS use copy-on-write, completely writing the new data before releasing the old.  This gives us snapshots for free.
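
To make that concrete, here is a minimal sketch of snapshotting a ZFS root boot environment by hand (the dataset name is just an example):

# zfs snapshot rpool/ROOT/s10be@before-patching
# zfs list -t snapshot

The snapshot returns immediately and initially consumes no additional space, which is exactly the property Live Upgrade leans on when it clones a boot environment.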

Like chocolate and peanut butter, ZFS and Live Upgrade are two great tastes that taste great together.  Live Upgrade was traditionally used just to upgrade systems from one release of Solaris (or update release) to another.  In Solaris 10, however, it has become a tremendous tool for patching.  With Live Upgrade (LU), the operating system is replicated in an Alternate Boot Environment (ABE), and all of the changes (patches, upgrades, whatever) are made to that copy of the OS while the OS is running, rather than taking the system down to apply maintenance.  Then, when the time is right, during a maintenance window, the new changes are activated by rebooting into the ABE. 

With this approach, downtime is minimized since changes are applied while the system is running.  Moreover, there is a fall-back procedure since the original boot environment is still there.  Rebooting again into the original environment effectively cancels out the changes exactly.

The Problem with Patching

Patching, generally speaking, is something that everyone knows they need to do, but few are really happy with how they do it.  It's not much fun.  It takes time.  It keeps coming around like clockwork.  You have to do it to every system.  You have to work around the schedules of the applications on the system.  But, it needs to be done.  Sort of like mowing the grass every week in the summer.

Live Upgrade can take a lot of the pain out of patching, since the actual application of the patches no longer has to be done while the system is shut down.  A typical non-LU approach for patches is to shut the system down to single-user prior to applying the patches.  In this way, you are certain that nothing else is going on on the system and you can change anything that might need to be updated safely.  But, the application is also down during this entire period.  And that is the crux of the problem.  Patching takes too long; we expect systems to be always available these days.

How long does patching take?  That all depends.  It depends on the number of changes and patches being applied to the system.  If you have not patched in a very long time, then a large number of patches are required to bring the system current.  The more patches you apply, the longer it takes.

It depends on the complexity of the system.  If, for example, there are Solaris zones on the system for virtualization, patches applied to the system are automatically applied to each of the zones as well.  This takes extra time.  If patches are being applied to a shut-down system, that just extends the outage. 

It's hard to get outage windows in any case, and long outage windows are especially hard to schedule when the perceived benefit is small.  Patches are like a flu shot.  They can vaccinate you against problems that have been found, but they won't help if this year has a new strain of flu that we've not seen before.  So, long outages across lots of systems are hard to justify.

So, How Long Does It Really Take?

I have long heard people talk about how patching takes too long, but I've not measured it in some time.  So, I decided to do a bit of an experiment.  Using a couple of different systems, neither one very fast nor very new, I applied the Solaris 10 Recommended patch set from mid-July 2011.  I applied the patches to systems running different update releases of Solaris 10.  This gives different numbers of patches that have to be applied to bring the system current.  As far as procedure goes, for each test, I shut the system down to single-user (init S), applied the patches, and rebooted.  The times listed are just the time for the patching, although the actual maintenance window in real life would include time to shut down, time to reboot, and time to validate system operation.  The two systems I used for my tests were an X4100 server with 2 dual-core Opteron processors and 16GB of memory and a Sun Fire V480 server with 4 UltraSPARC III+ processors.  Clearly, these are not new systems, but they will show what we need to see.

 System   Operating System    Patches Applied   Elapsed Time (hh:mm:ss)
 X4100    Solaris 10 9/10
 X4100    Solaris 10 10/09
 X4100    Solaris 10 10/08
          Solaris 10 9/10     99                00:47:29

For each of these tests, the server is installed with root on ZFS and patches are applied from the Recommended Patchset via the command "./installpatchset -d --<pw>" for whatever password this patchset has.  All of this is done while the system is in single-user rather than while running multi-user.

It appears that clock speed is important when applying patches.  The older V480 took three times as long as the X4100 for the same patchset.

And this is the crux of the problem.  Even to apply patches to a pretty current system requires an extended outage.  This does not even take into account the time required for whatever internal validation of the work done, reboot time, application restart time, etc.  How can we make this better?  Let's make it worse first.

More Complicated Systems Take Longer to Patch

Nearly a quarter of all production systems running Solaris 10 are deployed using Solaris Zones.  Many more non-production systems may also use zones.  Zones reduce administrative overhead by letting me patch only the global zone rather than each virtualized environment separately.  But, when applying patches to the global zone, patches are automatically applied to each zone in turn.  So, the time to patch a system can be significantly increased by having multiple zones.  Let's first see how much longer this might take, and then we will show two solutions.

 System   Operating System    Number of Zones   Patches Applied   Elapsed Time (hh:mm:ss)
          Solaris 10 9/10      2                105
 X4100    Solaris 10 9/10     20
 X4100    Solaris 10 10/09     2
 X4100    Solaris 10 10/08     2
          Solaris 10 9/10      2

Again, all of these patches were applied to systems in single-user in the same way as the previous set.  Just having two (sparse-root) zones defined took nearly three times as long as just the global zone alone.  Having 20 zones installed took the patch time from 17 minutes to over three hours for even the smallest tested patchset.

How Can We Improve This?  Live Upgrade is Your Friend

There are two main ways that this patch time can be improved.  One applies to systems with or without zones, while the second improves on the first for systems with zones installed.

I mentioned before that Live Upgrade is very much your friend.  Rather than go into all the details of LU, I would refer you to the many other blogs and documents on LU.  Check out especially Bob Netherton's Blog for lots of LU articles.

When we use LU, rather than taking the system down to single-user, we are able to create a new alternate boot environment, using ZFS snapshot and clone capability, while the system is up and running in production.  Then, we apply the patches to that new boot environment, still using the installpatchset command.  For example, "./installpatchset -d -B NewABE --<pw>" applies the patches into NewABE rather than the current boot environment.  When we use this approach, the patch times that we saw before don't change very much, since the same work is being done.  However, all of this is now time that the system is not out of service.  The outage is only the time required to reboot into the new boot environment.
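
Putting the whole flow together, here is a minimal sketch, assuming a ZFS root and using the same boot environment name as above:

# lucreate -n NewABE
# ./installpatchset -d -B NewABE --<pw>
# luactivate NewABE
# init 6

The lucreate is nearly instantaneous on ZFS because it is just a snapshot and clone, the patching happens while production carries on in the current boot environment, and the only downtime is the final reboot.  If something goes wrong, activating the original boot environment and rebooting again puts you right back where you started.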

So, Live Upgrade saves us all of that outage time.  Customers who have older servers and are fairly out of date on patches say that applying a patch bundle can take more than four or five hours, an outage window that is completely unworkable.  With Live Upgrade, the outage is reduced to the time for a reboot, scheduled when it can be most convenient.

Live Upgrade Plus Parallel Patching

Recently, another enhancement was made to patching so that multiple zones are patched in parallel.  Check out Jeff Victor's blog where he explains how this all works.  As it turns out, this parallel patching works whether you are patching zones in single-user or via Live Upgrade.  So, just to get an idea of how this might help I tried to do some simple measurement with 2 and 20 sparse-root zones created on a system running Solaris 10 9/10.

 System   Operating System    Number of Zones   Patches Applied   num_proc   Elapsed Time (hh:mm:ss)
          Solaris 10 9/10
          Solaris 10 9/10      2
 X4100    Solaris 10 9/10     20                105               1
 X4100    Solaris 10 9/10     20                105               2
 X4100    Solaris 10 9/10     20                105               4

num_procs is used as a guide for the number of threads to be engaged in parallel patching.  Jeff Victor's blog (above) and the man page for pdo.conf talk about how this relates to the actual number of processes that are used for patching.
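
The setting lives in /etc/patch/pdo.conf.  As a hedged sketch (the variable name and its allowed maximum are worth confirming against the pdo.conf man page for your patch utilities level), telling the patch tools to work on four zones at a time is just one line in that file:

# cat /etc/patch/pdo.conf
num_proc=4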

With only two zones, doubling the number of threads has an effect, but not a huge effect, since the amount of parallelism is limited.  However, with 20 zones on a system, boosting the number of zones patched in parallel can significantly reduce the time taken for patching.

Recall that all of this is done within the application of patches with Live Upgrade.  Used alone, outside of Live Upgrade, this can help reduce the time required to patch a system during a maintenance window.  Used with Live Upgrade, it reduces the time required to apply patches to the alternate boot environment.

So, what should you do to speed up patching and reduce the outage required for patching? 

Use ZFS root and Live Upgrade so that you can apply your patches to an alternate boot environment while the system is up and running.  Then, use parallel patching to reduce the time required to apply the patches to the alternate boot environment where you have zones deployed.

Monday Dec 20, 2010

Finish Scripts and First-Boot Services with Bootable AI

In my last couple of blogs, I have talked about using the Automated Installer, specifically using Bootable AI.  We talked about creating a manifest to guide the installation and we talked about creating a system configuration manifest to configure the system identity.  The piece that has not yet been addressed, from the perspective of a long-time Jumpstart user, is how to do the final customization via finish scripts.

In general, there is no notion of an install script that is bundled into a package when using the Image Package System in Solaris 11 Express as there is with SVR4 packages from Solaris 10.  However, a package can install a service that can carry out the same function.  Its operation is a bit more controlled since it has to have its dependencies satisfied and can run in a reduced privilege environment if needed.  Additionally, many of the actions typically scripted into installation scripts, such as creation of users, groups, links, directories, are all built-in actions of the packaging system.

So, the question arises of how to use the IPS packaging system to add our own packages to a system, whether at installation time or later, and how to perform the necessary first-boot customizations to a system we are installing.  The requirement to create our own packages comes from the fact that there is no other way to deliver content to the system being installed during the installation process except through the AI manifest - and that means IPS packages.  In Jumpstart, there was a variable set at installation that pointed to a particular NFS-mounted directory where install scripts could reside.  This was all well and good so long as you could mount that directory.  When it was not available, you were left again with the notion of creating and delivering SVR4 packages via the Jumpstart profile.  So, the situation is not far different than in Solaris 10 and earlier.  There's just a little different syntax and a different network protocol in use to deliver the payload.

Why Use a Local Repository

There are two main reasons to create a local repository for use by AI and IPS.  First, you might choose to replicate the Solaris repository rather than make a home-run through your corporate network firewall to Oracle for every package installation on every server.  Performance and control are clearly going to be better and more deterministic by locating the data closer to where you plan to use it.  The second reason to create a local repository would be to host your own locally provided packages - whether developed locally or provided by an ISV.

The question then arises whether to combine both of these into the same repository.  My personal opinion is that it is better to keep them separate.  Just as it is a good practice to keep the binaries and files of the OS separate from those locally created or provided by applications on the disk, it seems a good idea to keep the repositories separate.  That does not mean that multiple systems are needed to host the data, however.  Multiple repository services can be hosted on the same system on different network ports, pointing to different directories on the disk.  Think of it like hosting multiple web sites with different ports and different htdocs directories.
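
One hedged way to do that is to create additional instances of the application/pkg/server service, each with its own inst_root and port.  The instance name, path, and port below are placeholders, and the exact property setup may vary slightly between releases, so treat this as a sketch:

# svccfg -s application/pkg/server add local
# svccfg -s application/pkg/server:local addpg pkg application
# svccfg -s application/pkg/server:local setprop pkg/inst_root = astring: /var/repos/local
# svccfg -s application/pkg/server:local setprop pkg/port = count: 10000
# svcadm refresh application/pkg/server:local
# svcadm enable application/pkg/server:local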

Rather than go through all the details of creating a local mirror of the Solaris repository, I will refer you to Brian Leonard's blog on that topic.  Here, he talks about creating a local mirror from the repository ISO.  He also shows how you can create a file-based repository for local use.

For much of the rest of this exercise, I am relying on an article in the OpenSolaris Migration Hub that talks about creating a First Boot Service, which in turn references a page on creating a local package repository.  It is well worth it to read through these two pages.

Setting Up A Local Repository

So far, we have avoided the use of any sort of install server.  However, to have a local repository available during installation, this becomes a necessity.  So, pick a server to be used as a package repository.

Rather than clone the Solaris repository, we will create a new, empty repository to fill with our own packages.  We will configure the necessary SMF properties to enable the repository.  And then we will fill it with the packages that we need to deploy.

On the server that will host the repository, select a location in its filesystem and set it aside for this function.  A good practice is to create a dedicated ZFS filesystem for the repository.  That way, you can enable compression on the repository and easily control its size through quotas and reservations.  In this example, we will just create a ZFS filesystem within the root pool.  Often you will have a separate pool for this sort of function.

# zfs create -p -o mountpoint=/var/repos -o compression=on rpool/repos

Next, we will need to set the SMF properties to support the repository.  The service application/pkg/server is responsible for managing the actual package depot process.  As such, it refers to properties to locate the repository on disk, establish what port to use, etc.

The property pkg/inst_root specifies where on the repository server's disk the repo resides.  

# svccfg -s application/pkg/server setprop pkg/inst_root=/rpool/repo0906/repo

pkg/readonly specifies whether or not the repository can be updated.  Typically, for a cloned Solaris repository that you never publish into, this will be set to true.  Since we will be publishing our own packages into this repository with pkgsend, we leave it set to false for now; it is a good practice to flip it back to true once the repository should no longer be updated.

# svccfg -s application/pkg/server setprop pkg/readonly=false

pkg/prefix specifies the name that the repository will take when specified as a publisher by a client system.  pkg/port specifies the port where the repository will answer.

# svccfg -s application/pkg/server setprop pkg/prefix=local-pkgs
# svccfg -s application/pkg/server setprop pkg/port=9000

Once the properties are set, refresh and enable the service.

# svcadm refresh application/pkg/server
# svcadm enable application/pkg/server
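
A quick sanity check that the depot service came up cleanly and is listening on the port we chose:

# svcs application/pkg/server
# netstat -an | grep 9000

You can also just point a web browser at port 9000 on the repository server and browse the (currently empty) catalog.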

Creating a First-Boot Package

Now that the repository has been created, we need to create a package to go into that repository.  Since there are no post-install scripts with the Image Packaging System, we will create an SMF service that will be automatically enabled so that it will run when the system boots.  One technique used with Jumpstart was to install a script into /etc/rc3.d that would run late in the boot sequence and would then remove itself so that it would only run on the first boot.  We will take a similar path with our first-boot service.  We will have it disable itself so that it doesn't continue to run on each boot.  There are two parts to creating this simple package.  First, we have to create the manifest for the service, and second, we have to create the script that will be used as the start method within the service.  This area is covered in more depth in the OpenSolaris Migration Hub paper on Creating a First Boot Service.  In fact, we will use the manifest from that paper.

Creating the Manifest

We will use the manifest from Creating a First Boot Service directly and call it finish-script-manifest.xml.  The main points to see here are
  • the service is enabled automatically when we import the manifest
  • the service is dependent on svc:/milestone/multi-user so that it won't run until the system is in the Solaris 11 Express equivalent of run level 3.
  • the script that our package delivers into /usr/bin (shown below) is run as the start method when the service starts.


<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='Finish:finish-script'>

  <service name='finish-script' type='service' version='1'>

    <create_default_instance enabled='true' />

    <single_instance />

    <dependency name='autofs' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/system/filesystem/autofs:default' />
    </dependency>

    <dependency name='multi-user' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/milestone/multi-user:default' />
    </dependency>

    <!-- start method runs the finish script delivered by our package;
         the exact path here is assumed for illustration -->
    <exec_method type='method' name='start' exec='/usr/bin/finish-script' timeout_seconds='0' />
    <exec_method type='method' name='stop' exec=':true' timeout_seconds='0' />

    <property_group name='startd' type='framework'>
        <propval name='duration' type='astring' value='transient' />
    </property_group>

  </service>
</service_bundle>

Creating the Script

In this example, we will create a trivial finish script.  All it will do is log that it has run and then disable itself.  You could go so far as to have the finish script uninstall itself.  However, rather than do that we will just disable the service.  Certainly, you could have a much more expansive finish script, with multiple files and multiple functions.  Our script is short and simple:

#!/bin/sh
svcadm disable finish-script
#pkg uninstall pkg:/finish
echo "Completed Finish Script" > /var/tmp/finish_log.$$
exit 0

Adding Packages to Repository

Now that we have created the finish script and the manifest for the first-boot service, we have to insert these into the package repository that we created earlier.  Take a look at the pkgsend man page for a lot more details about how all of this works.  It's possible with pkgsend to add SVR4 package bundles into the repository, as well as tar-balls and directories full of files.  Since our package is simple, we will just insert each file individually.

When we open the package in the repository, we have to specify the version number.  Take a good look at the pkg(5) man page to understand the version numbering and the various actions that could be part of the package.  Since I have been working on this script, I have decided that it is version 0.3.  We start by opening the package for insertion.  Then, we add each file to the package in the repository, specifying file ownership, permissions, and path.  Once all the pieces have been added, we close the package and the FMRI for the package is returned.

# eval `pkgsend -s http://localhost:9000/ open finish@0.3`
# pkgsend -s http://localhost:9000/ add file finish-script-manifest.xml mode=0555 owner=root group=bin path=/var/svc/manifest/system/finish-script-manifest.xml restart_fmri=svc:/system/manifest-import:default
# pkgsend -s http://localhost:9000/ add file mode=0555 owner=root group=bin path=/usr/bin/
# pkgsend -s http://localhost:9000/ close

Updating the AI Manifest

Now that the repository is created, we can add the new repository as a publisher to verify that it has the contents we expect.  On the package server itself, we can add this publisher.  Remember that we set the prefix for the publisher to local-pkgs and that we specified it should run on port 9000.  This name could be anything that makes sense for your enterprise - perhaps the company domain, or something that identifies the packages as local rather than part of Solaris.

# pkg set-publisher -p http://localhost:9000 local-pkgs
pkg set-publisher:
  Added publisher(s): local-pkgs
# pkg list -n "pkg://local-pkgs/\*"
NAME (PUBLISHER)                                    VERSION         STATE      UFOXI
finish (local-pkgs)                                 0.3             known      -----

Now that we know the repository is on-line and contains the packages that we want to deploy, we have to update the AI manifest to reflect both the new publisher and the new packages to install.  First, to update the publisher, beneath the section that specifies the default publisher, add a second source for this new publisher:

        <publisher name="solaris">
          <origin name=""/>
         <publisher name="local-pkgs">
           <origin name="http://my-local-repository-server:9000/"/>

Then, add the packages to the list of packages to install.  Depending on how you named your package, you may not need to specify the repository for installation.  IPS will search the path of repositories to find each package.  However, it's not a bad idea to be specific.

      <software_data action="install" type="IPS">

Now, when you install the system using this manifest, in addition to the regular Solaris bits being installed, and the system configuration services executing, the finish script included in your package will be run on the first boot.  Since the script called by the service turns itself off, it will not continue to run on subsequent boots.  You can do whatever sort of additional configuration that you might need to do.  But, before you spend a long time converting all of your old Jumpstart scripts into a first-boot service, take a look at the built-in capabilities of AI and of Solaris 11 Express in general.  It may be that much of the work you had to code yourself before is no longer required.  For example, IP interface configuration is simple and persistent with ipadm.  Other new functions in Solaris 11 Express remove the need to write custom code, too.  But for the cases where you need custom code, this sort of first-boot service gives you a hook so that you can do what you need.

Booting and Installing with Bootable AI

My last couple of blogs have been about creating a manifest to be used when installing a system with the Solaris 11 Express Automated Installer.  Now that we have this basic manifest constructed, let's install a system.

To review, the Automated Installer is the facility in Solaris 11 Express that supports network-based installation.  The manifest used with AI determines what is installed and how the system is customized during installation.  

Typically, one would set up a network install service and fetch the manifest from there.  However, sometimes this is not desired or practical.  An option in these cases is to use bootable AI.  In this case, you boot the system to be installed from the AI ISO image.  During the boot process, you are prompted for a URL that points to a valid AI manifest.  This manifest is just fetched using HTTP (wget is actually used).  So, as long as you can get to the manifest, you are good to go.  Once fetched, the manifest is validated and acted upon to complete the installation.

In this installment, we will go through the boot process.  In particular, I will show what this looks like on an x86 host.  For details on SPARC, see my previous blog.

So, to start, boot your system from the Automated Install (AI) ISO.  When presented with the Grub menu, the default selection is to use a custom manifest.  This selection is the one we want and will prompt us for the URL of the manifest.  The other options allow you to use the default manifest built into the ISO and to perform the installation across a serial connection.

After the system has booted the small Solaris image on the AI ISO, you will be prompted for the URL of the manifest.

AI fetches the manifest and begins the installation process.  The installation goes on in the background rather than on the console.  Like the LiveCD installation, Solaris is running at this point.  If you want to monitor the progress of the installation, log in as the default admin user included in the ISO image.  The default login name is jack, with the password jack.  Once you have logged in, you can monitor the progress of the installation by tailing the installation log, found in /tmp/install_log.

Since Solaris is up and running, it is possible to enable network logins to the system while it is still installing.  To do this, su to root and enable the ssh service  (svcadm enable ssh).  Once ssh is enabled, you can ssh into the system as jack.  I have sometimes found this to be a useful tool when installing a virtual machine over a slow and unreliable network, where VNC is unable to sustain its required bandwidth.
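
In other words, once the installation is running, a console session looks something like this (jack is the default user mentioned above):

$ tail -f /tmp/install_log
$ su -
# svcadm enable ssh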

Once the installation completes, you can reboot the system.  Of course, the regular first-boot SMF import will happen.  And the services that we configured in the last section will be activated to configure the networks and system identity. Once all of this is complete, the system is ready for use.  The piece that you might have noticed is missing is any sort of finish-script customization.  Stay tuned for future installments to cover this.

Using the bootable AI, it is simple to provide a manifest via a simple URL and perform a near-hands-free, customized installation of the system.  It is important to note that DHCP is still used to fetch an address for the system, along with routing and DNS information.  For a truly hands-free installation, a network-based install server and install service would have to be created.

Using System Configuration Manifests with Bootable AI

In my last blog, I talked about how to configure a manifest for a bootable AI installation.  The main thing there was how to select which packages to install.  This time we are going to talk about how to handle AI's version of sysidcfg and configuring system identity at install-time.

In a Jumpstart world, many of the things that make up a system's identity - hostname, network configuration, timezone, naming services, etc. - can be configured at installation time by providing a sysidcfg file.  Alternately, an interactive dialog starts and prompts the installer for this sort of information.

The System Configuration manifest provides this same sort of information in an Automated Installer world.  The documentation for AI shows how to create either a separate or an embedded SC manifest to be served by an AI server.  When using Bootable AI, the SC manifest needs to be embedded within the AI manifest.  The SC manifest, whether embedded or not, is basically an XML document that is providing a bunch of properties for SMF services that are going to run on the first system boot to help complete the system configuration.  Some of the main tasks that can be completed in the SC manifest are:

  • Identify and configure the administrative "first user" created at install time.
  • Specify a root password and whether root is a role or a standard user
  • Configure timezone, hostname, keyboard maps, terminal type
  • Specify whether the network should be configured automatically or manually.
  • Configure network settings, including DNS, for manually configured networks

But, in the end, all of this is just setting SMF properties, so it's pretty straightforward.  It appears as a large service_bundle with properties for multiple SMF services.

As far as including the SC manifest information in the bootable AI manifest, the SC manifest is essentially embedded into the AI manifest as a large comment.  Don't be put off by the comment notation.  This whole section is passed on to SMF to assign the necessary service properties.

In order to explain the various sections of properties, I will just annotate an updated SC manifest.  In this manifest, I will specify some of the more common configuration settings you might use. 

The whole SC embedded manifest is identified within the AI manifest with the tag sc_embedded_manifest.  See the Automated Installer Guide for more details on the rest of the options to this tag.  The two lines following the sc_embedded_manifest tag are just the top part of the stand-alone SC manifest XML document.  Look in the default AI manifest for exact placement of this section.

    <sc_embedded_manifest name="AI">
      <!-- <?xml version='1.0'?>
      <!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">    

The rest of the SC manifest sets up service properties for a service bundle named "system configuration."

      <service_bundle type="profile" name="system configuration">

The service system/install/config is responsible for doing some of the basic configuration actions at install-time, such as setting up the first, or admin, user, setting a root password, and giving the system a name.  The property group "user_account" specifies how the first user, used for administration, should be configured.  You can specify here the username (name="login"), an encrypted password, the GECOS field information (name="description"), as well as the UID and GID for the account.  Note that the default password supplied for the first user (by default, named jack) in the default manifest is "jack".

Special note should be made of the property "roles".  Recall that in Solaris 11 Express, root is no longer a regular login user, but becomes a role.  Therefore, in order to be able to assume the root role for administrative functions, this first user needs to be given the root role.  Other roles can also be specified here as needed.  Also notice that the profile "Primary Administrator" is no longer assigned to this first user, as was done in OpenSolaris.  Additional properties around roles, profiles, authorizations, etc. may be assigned.  See the Automated Installer Guide for details.

        <service name="system/install/config" version="1" type="service">
          <instance name="default" enabled="true">
            <property_group name="user_account" type="application">
              <propval name="login" type="astring" value="sewr"/>
              <propval name="password" type="astring" value="9Nd/cwBcNWFZg"/>
              <propval name="description" type="astring" value="default_user"/>
              <propval name="shell" type="astring" value="/usr/bin/bash"/>
              <propval name="uid" type='count' value='27589'/>
              <propval name="gid" type='count' value='10'/>
              <propval name="type" type="astring" value="normal"/>
              <propval name="roles" type="astring" value="root"/>

As with Jumpstart, it is possible to specify a root password at install time.  The encrypted string for the root password is given here as the password property.  If no new password is supplied, the default root password at install-time is "solaris".  Also note here that root is created as a role rather than a regular login user.

            <property_group name="root_account" type="application">
                <propval name="password" type="astring" value="$5$dnRfcZs$Hx4aBQ161Uvn9ZxJFKMdRiy8tCf4gMT2s2rtkFba2y4"/>
                <propval name="type" type="astring" value="role"/>

A few other housekeeping properties can also be set here for the system/install/config service.  These include the local timezone and the hostname (/etc/nodename) for the system.

            <property_group name="other_sc_params" type="application">
              <propval name="timezone" type="astring" value="US/Eastern"/>
              <propval name="hostname" type="astring" value="myfavoritehostname"/>

The system/console-login service establishes the login service for the console. Here you can specify the terminal type to be used for the console.

        <service name="system/console-login" version="1" type="service">
          <property_group name="ttymon" type="application">
            <propval name="terminal_type" type="astring" value="xterms"/>

The service system/keymap establishes what sort of keyboard input is to be expected on the system.

        <service name='system/keymap' version='1' type='service'>
          <instance name='default' enabled='true'>
            <property_group name='keymap' type='system'>
              <propval name='layout' type='astring' value='US-English'/>

By default, Solaris 11 Express enables NWAM (NetWork Automagic) to automatically configure the primary network interface.  NWAM, by default, activates a primary network for the system, whether wired or wireless, monitors its availability, and tries to restore network connectivity if it should go away.  Most people would say that its behavior is best suited for mobile or desktop systems, and it functions well in that space.  It includes the ability to have profiles that guide its behavior in a variety of networked environments.  NWAM relies on DHCP to get an available IP address and other data needed to configure the network.

In the default AI profile, the network/physical:nwam service instance is enabled and the network/physical:default service instance is disabled.  In most server configurations, static addressing and configuration might be more desirable.  In that case, you can do as we have below and switch which service instance is enabled and which is disabled by default.

        <service name="network/physical" version="1" type="service">
          <instance name="nwam" enabled="false"/>
          <instance name="default" enabled="true"/>

In the case where we are doing static network configuration, we will rely on the network/install service to set up our networks.  The properties and values used here correspond to arguments to the ipadm command, new in Solaris 11 Express.  ipadm is used to configure and tune IP interfaces.  See its man page for details on syntax.

In this case, we are setting up a single IPv4 network interface (xnf0), giving it a static IP address and netmask, and specifying a default route.

        <service name="network/install" version="1" type="service">
          <instance name="default" enabled="true">
            <property_group name="install_ipv4_interface" type="application">
              <propval name="name" type="astring" value="xnf0/v4"/>
              <propval name="address_type" type="astring" value="static"/>
              <propval name="static_address" type="net_address_v4" value=""/>
              <propval name="default_route" type="net_address_v4" value=""/>

As with Jumpstart using sysidcfg, it is possible to set up DNS information at install-time.  Note that only DNS and not NIS or LDAP naming services can be set up this way.  The System Administration Guide: Naming and Directory Services manual discusses how to configure these naming services.  NIS+ is no longer supported in Solaris 11 Express.

The network/dns/install service is used to set up DNS at install-time.  For this, we specify the regular sorts of data that will populate the /etc/resolv.conf file: nameservers, domain, and a domain name search path.  Some of these data items take multiple values, so lists of values are used, as shown below.

        <service name="network/dns/install" version="1" type="service">
          <instance name="default" enabled="true">
            <property_group name="install_props" type="application">
              <property name="nameserver" type="net_address">
                  <value_node value=""/>
                  <value_node value=""/>
                  <value_node value=""/>
              <propval name="domain" type="astring" value=""/>
              <property name="search" type="astring">
                  <value_node value=""/>
                  <value_node value=""/>
                  <value_node value=""/>

And we close out the service bundle and the embedded SC manifest.


So, by building a custom AI manifest with its embedded SC manifest, you can accomplish the same sorts of install-time configuration of a system as you could with Jumpstart and sysidcfg, without having to build any sort of complex finish scripts or do any kind of extra coding.  This approach makes it possible to have a repeatable methodology for creating the administrative user, with known, standard credentials, and for configuring the base system networks and naming services.

Sunday Dec 19, 2010

Configuring Bootable AI in Solaris 11 Express

This is the first of several blogs around bootable AI, the ability with the Solaris 11 Express automated installer to boot directly from the AI ISO, fetch an installation manifest, and act on it, without having to set up an AI install server. Most of this focuses on the manifest, so it applies to AI booted from the network and from the ISO. However, I do not plan to go into creating and configuring the AI network services, at least not right now.  I think that other folks have talked about this already.

Solaris 11 Express includes the Automated Installer as the tool for performing automated network system installations. Using AI, it is possible to install a customized load of Solaris 11 Express, across the network, without manual intervention. AI allows you to specify which repositories to use, which packages to install from the repositories, and how to handle the initial system configuration (like sysidcfg) for the system.

Generally speaking, AI relies on a network automated install service to answer requests from a client trying to install itself. This is sort of like the Jumpstart approach. The client starts to boot, looks around to see who on the network can help out, fetches what it needs from that network server, and goes ahead with the installation.

But, getting the base install information from the network isn't always feasible or even the most expeditious path. So, AI has a feature called "Bootable AI". I've blogged about this before, when it first came out in OpenSolaris.

The idea of bootable AI is that rather than rely on a network service to fetch the needed information to boot, you boot from the AI media instead. During the boot process, you can be prompted for a URL where an installation manifest can be found. The client fetches this manifest and carries on, just as it would from the AI service. The upside to this is that no AI service has to be installed on the network. The downside is that it does require at least one interaction if you want to specify a non-default manifest.

The manifest is the XML specification of how and what to install on the system. There is quite a lot that can be done in terms of selecting which disks to use, how to partition them, etc. Check the Oracle Solaris 11 Express Automated Installer Guide for detailed information on this.

In this note, I am going to show you how to put together some of the key parts of the manifest. My main goal is to then use this with bootable AI, but the same manifest can be installed with an install service in an AI server.

Locating a default manifest

How to get started? The simplest way to make a manifest is to start with the default manifest, delivered in the AI iso, and modify it to suit your needs. A default manifest is located in auto_install/default.xml in the AI ISO. Copy this and modify it as needed.
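
On an existing OpenSolaris or Solaris 11 Express machine, grabbing that default manifest is a quick lofi mount away (the ISO file name and copy destination are just examples):

$ pfexec lofiadm -a `pwd`/osol-ai.iso
$ pfexec mount -o ro -F hsfs /dev/lofi/1 /mnt
$ cp /mnt/auto_install/default.xml ~/my-ai-manifest.xml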

Selecting "Server Install" packages

When Solaris 11 Express is installed via the LiveCD or via the default manifest in AI, a full, desktop version of Solaris is installed. Often, however, when you use AI, you would prefer to have the smaller server installation provided by the text installer. Since the manifest specifies which top-level packages to install, this is easily accomplished.

In the default manifest, look for the software_data section with the install action. This section specifies what packages are to be installed. The two packages listed here are group packages, sort of like package clusters in Solaris 10. entire and babel_install are the packages that, when installed, provide the environment installed from the LiveCD. In order to get a reduced installation like that from the text installer, replace babel_install with server_install. If there are other packages that you want to add to the installation (for example the iSCSI packages referenced in the comments), you can add them here.

Change this section:

<software_data action="install" type="IPS">

to this section:

<software_data action="install" type="IPS">

Uninstalling the appropriate packages

The server_install package bundle has dependencies of the packages that make up the reduced server installation. By installing it, we get all of the other packages that come with it. That's part of the coolness of IPS. However, we also want to preserve the ability to uninstall or modify individual components of that overall bundle. So, we finish out our installation by uninstalling the server_install wrapper. This does not affect the dependent packages; it just unwraps them so we can modify them directly. So, to do this, update the uninstall section as below.

Additionally, even with the reduced server installation, there may still be packages that we want to remove. For example, there are still over 700MB of system locales installed that you may not need and might choose to remove. You can add any other packages that you want to remove in this section as well. Note that this really does first install the package and then remove it. Seems sort of redundant, but I have not yet found a way to cause IPS to build a plan that would note the uninstalled packages and just mark them to be skipped during installation.

Change this:

      <!--
        babel_install and slim_install are group packages used to
        define the default installation.  They are removed here so
        that they do not inhibit removal of other packages on the
        installed system.
      -->
      <software_data action="uninstall" type="IPS">
        <name>pkg:/babel_install</name>
        <name>pkg:/slim_install</name>
      </software_data>

to this:

<software_data action="uninstall" type="IPS">
        <!-- ... -->

Any other packages that you want to uninstall can be listed here, too.

So, you see how easy it is to build a manifest for AI that specifies which packages you want to include or exclude and how to create a smaller, server installation for Solaris 11 Express.

Wednesday Mar 03, 2010

Installing OpenSolaris in Oracle VM for x86

Yes, Virginia, you can install OpenSolaris as a guest in an Oracle VM for x86 environment!  Here is a little step-by-step guide on how to do it.  I am going to assume that you already have downloaded and installed Oracle VM Server.  Further, I am going to assume that you have already created an Oracle VM Manager system to manage your OVM farm.  All of that is more than I can tackle today.  But, to get you started, you can fetch the OVM and VM Manager bits from .  I found the simplest thing to do was to create an Oracle Enterprise Linux management system and install the VM Manager there.

Once you have the OVM environment established, you need to get some installation media to install the guests.  For OpenSolaris, here's the magic: Bootable AI.  Check out Alok's blog for more details on exactly what the Bootable AI project is about.  In a nutshell, it makes it so that you can install OpenSolaris as if you were using a network install server, but while you are booted from installation media.  This gets around the difficulty of trying to do an installation using a LiveCD in a tiny VNC window and the difficulty of trying to get a network, PXE-based installation working.  This is a quick and easy way to go.

Fetch the OpenSolaris AI iso rather than the regular LiveCD iso.  Install this into the VM Manager resource repository. (Remember, I assume you know how to do this, or can figure it out pretty easily.  I did.)

Now, create a VM just as you always would for Oracle Enterprise Linux, Solaris, Windows, or whatever.  Select Install from installation media, and use the iso that you just added to the repository.  When you specify what operating system this VM will run, select "Other" since it isn't one of the pre-defined choices.  Start the creation and away you go.

As you have already figured out from Alok's blog, this is only half of the story.  You still must create an AI manifest.  The manifest details which packages to install and from where, along with the details for the first user created, root password, etc.  Check out the Automated Installation Project page for details on this.  The docs are pretty good and the minimum manifest needed for bootable AI is pretty basic.  Alok talks about how to specify booting from the development repository.  That was the only change to the default manifest that I made.

Put this manifest somewhere accessible via http from the VM you want to create.  The VM you created is sitting, waiting for you to tell it where to fetch its manifest so it can boot.  You really don't want to keep it waiting much longer.
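
For example, on a Solaris or OpenSolaris web server with the bundled Apache, something like this is all it takes (the file name is just an example):

# cp my-ai-manifest.xml /etc/apache2/2.2/htdocs/
# svcadm enable apache22

Any web server that the new VM can reach over HTTP will do.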

Connect to the VM using VNC. You can use the built-in VNC client in VM Manager or whatever VNC client you like best. I tend to use vncviewer because it seems to manage the screen resolutions better than the Java client. When the installer prompts you for the manifest, enter the URL for the manifest you just made. The installer will fetch it, validate it, and then go on with its usual installation using that manifest. This is so simple and so cool!

Installation proceeds like it would with an install server.  You can log in on the console of the system being installed and monitor its progress.  Then, when it's done, reboot and you are done.

One note:  I have run into difficulties with OpenSolaris b133 and this approach.  When I used the b133 iso, even though I never got an error, the resulting VM was not bootable.  (No, I haven't gotten around to filing a bug on this.  Was going to wait until b134.)  However, if I used the b131 iso and a manifest that referenced installing entire@0.5.11-0.133, things worked out just fine.  So, give that a shot.

Once you have created a VM that you like within Oracle VM, you can do all of the cool Oracle VM things - convert it to a template and stamp out lots of copies, move it from server to server, etc.  But that's for another day.  Or that's something to look for in Honglin Su's blog.

Thursday Feb 18, 2010

ATLOSUG February Slides Posted

Slides from the February meeting of the Atlanta OpenSolaris User Group are posted at . The topic for this last meeting was "Oracle RAC on Solaris Logical Domains - Part 1". 

Don't miss part 2 of this presentation on March 9.

Tuesday Feb 16, 2010

It is indeed possible to install Oracle 11.2 on OpenSolaris

Found this cool post while checking my Google updates today.  Had not read this blog before, but I think I will give it a look. 

It is indeed possible to install Oracle 11.2 on OpenSolaris

Wednesday Feb 03, 2010

Oracle Virtual Machine Manager and OpenSolaris

This is pretty cool!  With a whole new set of products to become familiar with, I am jumping into Oracle VM Server and Oracle VM Manager.  So far, I've got 

  • Oracle VM Server installed on a pair of X8420 blades and have them set up as an HA pool
  • Oracle VM Manager installed on a v20z as my management interface
  • Oracle Enterprise Linux VMs - both PVM and HVM, installed from media and from pre-built templates
  • Nevada b130 as an HVM guest
  • Cloned VMs, migrated VMs on the fly between the blades
  • And now, I am installing OpenSolaris b131 using Bootable AI in a VM
Looks to me like the future is bright with lots of cool new tools that we can combine to get even more out of the technologies that both Sun and Oracle have created.

Tuesday Jan 26, 2010

Unnatural Acts with AI

I'm pretty sure this is not what the AI team had in mind when they gave us bootable AI.  But in my quest to see what is the oldest piece of gear I can run OpenSolaris on, here's a fun one:

jack@opensolaris:~$ uname -a
SunOS opensolaris 5.11 snv_130 sun4u sparc SUNW,UltraAX-i2 Solaris
jack@opensolaris:~$ cat /etc/release
                      OpenSolaris Development snv_130 SPARC
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 18 December 2009
telnet> send brk
Type  'go' to resume
ok banner
Sun Fire V100 (UltraSPARC-IIe 500MHz), No Keyboard
OpenBoot 4.0, 1024 MB memory installed, Serial #51701117.
Ethernet address 0:3:ba:14:e5:7d, Host ID: 8314e57d.
ok go
jack@opensolaris:~$ prtdiag
System Configuration:  Sun Microsystems  sun4u Sun Fire V100 (UltraSPARC-IIe 500MHz)
System clock frequency: 100 MHz
Memory size: 1024 Megabytes

Pretty much took forever to install, but it works like a champ.  More news as it occurs!

Friday Jan 15, 2010

Bootable AI ISO is way cool

Alok Aggarwal posted, just before Christmas, a blog mentioning that the ISO images for the Auto Installer in OpenSolaris are now bootable.  Not just for x86 but also for SPARC.

This is huge!  While it does not provide a LiveCD desktop environment for SPARC, it does give us a way to easily install OpenSolaris on  SPARC gear.  Previously, it was necessary to set up an AI install server (running on an x86 platform since that was the only thing you could install natively) and use WAN Boot to install OpenSolaris on the SPARC boxes.  Well, that was a tough hurdle for some of us to get over.

Now, you can burn the AI ISO to a CD and boot it directly.  The default manifest on the disk will install a default system from the  release repository.   Or, better yet, build a simple AI manifest that changes the release repository to the dev repo and put it somewhere you can fetch via http.  When you boot up, you will be prompted for the URL of the manifest.  AI will fetch it and use it to install the system.

{2} ok boot cdrom - install prompt
Resetting ...

Sun Fire 480R, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.10.8, 16384 MB memory installed, Serial #57154911.
Ethernet address 0:3:ba:68:1d:5f, Host ID: 83681d5f.

Rebooting with command: boot cdrom - install prompt
Boot device: /pci@8,700000/ide@6/cdrom@0,0:f  File and args: - install prompt
SunOS Release 5.11 Version snv_130 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: opensolaris
Remounting root read/write
Probing for device nodes ...
Preparing automated install image for use
Done mounting automated install image
Configuring devices.
Enter the URL for the AI manifest [HTTP, default]: http://<my web server>/bootable.xml

See!  This is really easy and gives new life to really old gear.  In this case, the manifest is super simple, too.  I just grabbed the default manifest from an AI image and changed the repository and package to install.

$ pfexec lofiadm -a `pwd`/osol-dev-130-ai-x86.iso
$ pfexec mount -o ro -F hsfs /dev/lofi/1 /mnt
$ cp /mnt/auto_install/default.xml /etc/apache2/2.2/htdocs/bootable.xml

Edit this file and change

<main url="" publisher=""/>

to

<main url="" publisher=""/>

Or, as a speedup, add the mirror:

<main url="" publisher=""/>
<mirror url=""/>

And change

<pkg name="entire"/>

to

<pkg name="entire@0.5.11-0.130"/>

You can add a mirror site for the repo in this manifest.  Or you can list other packages that you want to be installed as the system is installed.  The docs for the AutoInstaller talk about how to create and modify a manifest.

Some caveats that I found:  First, NWAM and DHCP might take longer than you think.  If you quickly try to type in the URL for the manifest, you may find that you have no network yet and become concerned.  I spent the better part of a day on this.  Then, I let it sit for a couple of minutes before trying the manifest URL and life was good.  My DHCP server is particularly slow on my network.

Second, without using the mirror, a slow system took a really long time to install.  I have not diagnosed whether it was network download time or processing time.  I think some of both, since things like the installation phase of babel_install took nearly an hour on one system.

Third, there must be a lower bound on what sort of system will work.  T2000 works just fine.  SF480R has worked fine.  My SF280R is busted - as soon as it's fixed, I'll try it.  Not so great on E220 and E420 systems.  They appear to work, but at the very end it says it failed.  The only failure message I can see this time is due to the installer finding a former non-global zone environment on the disk. But so far, my experience on UltraSPARC-II systems is that once the installation completes, it hangs on the first reboot or fails to boot at all.  I am not surprised that systems that are no longer supported are not supported by AI.  I think I saw in Alok's notes that OBP 4.17 was the minimum supported.  That means my USII boxes are right out, and  I think even the SF280.  I hate doing firmware updates, so I have not updated the SF480.

Fourth, when I tried to install on a system that previously had the root disk mirrored with SVM, zpool create for the root pool failed.  I had to delete the metadbs and the metadevices before I could proceed.
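If you run into the same thing, clearing out the old SVM configuration before the install is straightforward.  Here is a minimal sketch, with hypothetical metadevice and replica-slice names: metaclear -r recursively tears down the mirror and its submirrors, and metadb -d -f removes the state database replicas.

# metaclear -r d0
# metadb -d -f c0t0d0s7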

But, I am very impressed!  Bootable AI media is way cool.  Keep your eyes and ears open, though, for more developments in the AutoInstaller in the coming months.

Wednesday Jan 13, 2010

ATLOSUG January Slides Posted

Slides from the January meeting of ATLOSUG - the Atlanta OpenSolaris User Group - are posted at

Next meeting will be February 9, 2010.  Check our web site for details.

Time to move to OpenSolaris completely

The last build of SXCE, the Solaris Express Community Edition, has been released: Build 130.  So, what?

Well, it means that all of us laggards, who have been basking in the glow of new features and capabilities given to us by the Solaris developers but who have not been willing to take the plunge into OpenSolaris completely, need to get off the fence and move straight away to OpenSolaris.

I made that move over the holidays.  Got a new laptop.  Perfect time to make the move.  I used to run SXCE natively on my laptop and run OpenSolaris in VirtualBox.  My rationale was that I do a lot of demonstrations for customers, and I wanted my laptop to look as much like production Solaris 10 as possible while still getting the cool new stuff.

Now, I run OpenSolaris native on the laptop and run Solaris 10 in a VirtualBox when I need it.

Turns out the migration has been remarkably painless.  My only hassle was actually moving my own data from one laptop to the other.

I guess that in the eggs and bacon breakfast of OSes, I have moved from being the chicken (involved in the process) to being the pig (fully committed).  And this is some tasty, thick sliced, smoked bacon!  Mmmm.

Monday Dec 14, 2009

Silly ZFS Dedup Experiment

Just for grins, I thought it would be fun to do some "extreme" deduping.  I started out by creating a pool from a pair of mirrored drives on a system running OpenSolaris build 129.  We'll call the pool p1.  Notice that everyone agrees on the size when we first create it: zpool list, zfs list, and df -h all show 134G available, more or less.  Notice also that when we created the pool, we turned deduplication on from the very start.

# zpool create -O dedup=on p1 mirror c0t2d0 c0t3d0
# zfs list p1
NAME   USED  AVAIL  REFER  MOUNTPOINT
p1      72K   134G    21K  /p1
# zpool list p1
NAME   SIZE   USED  AVAIL    CAP  DEDUP  HEALTH  ALTROOT
p1     136G   126K   136G     0%  1.00x  ONLINE  -
# df -h /p1
Filesystem             size   used  avail capacity  Mounted on
p1                     134G    21K   134G     1%    /p1

So, what if we start copying a file over and over?  Well, we would expect that to dedup pretty well.  Let's get some data to play with.  We will create a set of 8 files, each one being made up of 128K of random data.  Then we will cat these together over and over and over and over and see what we get.

Why choose 128K for my file size?  Remember that we are trying to deduplicate as much as possible within this dataset.  As it turns out, the default recordsize for ZFS is 128K.  ZFS deduplication works at the ZFS block level.  By selecting a file size of 128K, each of the files I create fits exactly into a single ZFS block.  What if we picked a file size that was different from the ZFS block size? The blocks across the boundaries, where each file was cat-ed to another, would create some blocks that were not exactly the same as the other boundary blocks and would not deduplicate as well.

Here's an example.  Assume we have a file A whose contents are "aaaaaaaa", a file B containing "bbbbbbbb", and a file C containing "cccccccc".  If our blocksize is 6, while our files all have length 8, then each file spans more than 1 block.

# cat A B C > f1
# cat f1
aaaaaaaabbbbbbbbcccccccc
# cat B A C > f2
# cat f2
bbbbbbbbaaaaaaaacccccccc

The combined contents of the three files span across 4 blocks.  Notice that the only block in this example that is replicated is block 4 of f1 and block 4 of f2.  The other blocks all end up being different, even though the files were the same.  Think about how this would play out as the number of files grew.
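Laid out block by block with our blocksize of 6, the example looks like this; only the final block is common to the two files, so it is the only one dedup can share:

f1:  aaaaaa  aabbbb  bbbbcc  cccccc
f2:  bbbbbb  bbaaaa  aaaacc  cccccc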

So, if we want to make an example where things are guaranteed to dedup as well as possible, our files need to always line up on block boundaries (remember, we're not trying to be real-world here - we're trying to get silly dedup ratios).  So, let's create a set of files that each match the ZFS blocksize.  We'll just create files b1-b8, each filled with 128K of data from /dev/random.

# zfs get recordsize p1
p1    recordsize  128K     default
# dd if=/dev/random bs=1024 count=128 of=/p1/b1
  (and likewise for b2 through b8, so each file holds its own 128K of random data)

# ls -ls b1 b2 b3 b4 b5 b6 b7 b8
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b1
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b2
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b3
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b4
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b5
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b6
 257 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b7
 205 -rw-r--r--   1 root     root      131072 Dec 14 15:28 b8

Now, let's make some big files out of these.

# cat b1 b2 b3 b4 b5 b6 b7 b8 > f1
# cat f1 f1 f1 f1 f1 f1 f1 f1 > f2
# cat f2 f2 f2 f2 f2 f2 f2 f2 > f3
# cat f3 f3 f3 f3 f3 f3 f3 f3 > f4
# cat f4 f4 f4 f4 f4 f4 f4 f4 > f5
# cat f5 f5 f5 f5 f5 f5 f5 f5 > f6
# cat f6 f6 f6 f6 f6 f6 f6 f6 > f7

# ls -lh
total 614027307
-rw-r--r--   1 root     root        128K Dec 14 15:28 b1
-rw-r--r--   1 root     root        128K Dec 14 15:28 b2
-rw-r--r--   1 root     root        128K Dec 14 15:28 b3
-rw-r--r--   1 root     root        128K Dec 14 15:28 b4
-rw-r--r--   1 root     root        128K Dec 14 15:28 b5
-rw-r--r--   1 root     root        128K Dec 14 15:28 b6
-rw-r--r--   1 root     root        128K Dec 14 15:28 b7
-rw-r--r--   1 root     root        128K Dec 14 15:28 b8
-rw-r--r--   1 root     root        1.0M Dec 14 15:28 f1
-rw-r--r--   1 root     root        8.0M Dec 14 15:28 f2
-rw-r--r--   1 root     root         64M Dec 14 15:28 f3
-rw-r--r--   1 root     root        512M Dec 14 15:28 f4
-rw-r--r--   1 root     root        4.0G Dec 14 15:28 f5
-rw-r--r--   1 root     root         32G Dec 14 15:30 f6
-rw-r--r--   1 root     root        256G Dec 14 15:49 f7

This looks pretty weird.  Remember our pool is only 134GB big.  Already the file f7 is 256G and we are not using any sort of compression.  What does df tell us?

# df -h /p1
Filesystem             size   used  avail capacity  Mounted on
p1                     422G   293G   129G    70%    /p1

Somehow, df now believes that the pool is 422GB instead of 134GB.  Why is that?  Well, rather than reporting the size by subtracting used from a fixed capacity, df now calculates its size dynamically as the sum of the space used plus the space available - and sure enough, 293G used plus 129G available gives the 422G reported above.  We have lots of space available since we have many, many duplicate references to the same blocks.

# zfs list p1
NAME   USED  AVAIL  REFER  MOUNTPOINT
p1     293G   129G   293G  /p1
# zpool list p1
NAME   SIZE   USED  AVAIL    CAP       DEDUP  HEALTH  ALTROOT
p1     136G   225M   136G     0%  299594.00x  ONLINE  -

zpool list tells us the actual size of the pool, along with the amount of space that it views as being allocated and the amount free.  So, the pool really has not changed size.  But the pool says that 225M are in use.  Metadata and pointer blocks, I presume.

Notice that the dedupratio is 299594!  That means that on average, there are almost 300,000 references to each actual block on the disk.

One last bit of interesting output comes from zdb.  Try zdb -DD on the pool.  This will give you a histogram of how many blocks are referenced how many times.  Not for the faint of heart, zdb will give you lots of ugly internal info on the pool and datasets. 

# zdb -DD p1
DDT-sha256-zap-duplicate: 8 entries, size 768 on disk, 1024 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced         
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
  256K        8      1M      1M      1M    2.29M    293G    293G    293G
 Total        8      1M      1M      1M    2.29M    293G    293G    293G

dedup = 299594.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 299594.00

So, what's my point?  I guess the point is that dedup really does work.  For data that has a commonality, it can save space.  For data that has a lot of commonality, it can save a lot of space.  With that come some surprises in terms of how some commands have had to adjust to changing sizes (or perceived sizes) of the storage they are reporting.

My suggestion?  Take a look at zfs dedup.  Think about where it might be helpful.  And then give it a try!
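If you want to experiment on an existing pool, turning dedup on is a one-liner.  A minimal sketch - tank/data is just a placeholder dataset name, and note that dedup only applies to data written after the property is set:

# zfs set dedup=on tank/data
# zpool get dedupratio tank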

Friday Dec 11, 2009

ATLOSUG December Meeting slides posted

We had a great meeting of ATLOSUG, the Atlanta OpenSolaris User Group, this past Tuesday.  20+ people attended our first meeting with our new host, GCA Technology Services, at their training facility in Atlanta.  A big "Thank You" to Dawn and GCA for hosting our group.

Our topic this time was "What's New In ZFS" and we talked about some of the new features that have gone into ZFS recently, especially DeDupe.  George Wilson of the ZFS team was kind enough to share some slides that he had been working on and they are posted here.

Our next meeting will be Tuesday, January 12 at GCA.  Details and info can be found on the ATLOSUG website at

Monday Aug 17, 2009

ATLOSUG COMSTAR slides posted

Slides from last week's meeting of the Atlanta OpenSolaris User Group (ATLOSUG) are posted now on the group website -

We had a good group of about 16 people in attendance and a great discussion around how and why to use COMSTAR. 

The next meeting will be held on Sept. 8.  The topic will be how COMSTAR and other OpenSolaris technologies fit together in the Sun Unified Storage family of products.  Hope to see you there!

Monday Jun 29, 2009

Quick Review of Pro OpenSolaris

Pro OpenSolaris - Harry Foxwell and Christine Tran

Several (too many) weeks ago, I said that I was going to read and review Harry & Christine's new book, Pro OpenSolaris. Finally, I am getting around to doing this.

Overall, I was pleased with Pro OpenSolaris.  It does a good job at what it tries to do.  The key is to recognize when it is the right text and when others might be the right text.  Right in the Introduction, the authors are clear that this is an orientation tour.  They say "We assume that you are a professional system administrator ... and that your learning style needs only an orientation and an indication of what should be learned first in order to take advantage of OpenSolaris."  That's a good summary of the main direction of the book.  And at this, it does a very nice job!

This means that Pro OpenSolaris is not an exhaustive reference manual on all of the features and nuances of OpenSolaris.  Instead, it's a broad overview of what OpenSolaris is, how it got to be what it is, what its key features and differentiators are, and why you might choose to use OpenSolaris instead of some other system.  That's important to realize from the outset.  If you are looking for the thousand-page reference guide, this is not the one.  If you have heard about OpenSolaris and want to explore a bit more deeply, to decide whether or not OpenSolaris is something that might help your business or might be a tool you can use, this is a great place to start.

Pro OpenSolaris spends a good bit of time on the preliminaries.  There is an extensive section on the philosophical differences between the approaches and requirements of different open source licenses and styles of licenses.  Pro OpenSolaris explains clearly why OpenSolaris uses the CDDL license as opposed to other licenses and how this fits in with the overall goal of the OpenSolaris project.

Pro OpenSolaris helps you get started, with a lengthy discussion of how to go about installing OpenSolaris either on  bare metal or in a virtual machine.

Compare this to the OpenSolaris Bible (Solter, Jelinek, & Miner), which really does aspire to be the thousand-page reference guide.  In the OpenSolaris Bible, licensing and installation are given only a short discussion, since they are not central to the book's focus.  Instead, the reader is directed to other places for that discussion.

But that's why it's important to have both books.  Pro OpenSolaris gives the tour of the important parts of the OpenSolaris operating system, how and why I might use them, and why they are important, but it does not go deeply into the details.  That's probably wise for an operating system that is still growing and changing substantially with each new release.

One thing that particularly interested me in Pro OpenSolaris is its large section on the OpenSolaris Webstack, which includes IPS-packaged versions of the commonly used pieces of an AMP stack - notably Apache, MySQL, PHP, lighttpd, nginx, Ruby, and Rails - all compiled and optimized for OpenSolaris and including key add-ons such as DTrace providers where applicable.  Pro OpenSolaris also has a nice, long chapter on NetBeans and its role as a part of an overall OpenSolaris development environment.

What's my take overall?  Pro OpenSolaris is a quick read that will give you a good understanding of what OpenSolaris is and why you would want to use it; what its key features are and why they are important; and how you can use these to your best advantage.  There are lots of examples and technical details so that you can see that what Harry & Christine talk about is for real.  I would recommend this as part of your library.  But I would also recommend the OpenSolaris Bible.  The two complement each other nicely to complete the picture.

Saturday Jun 13, 2009

June Atlanta OpenSolaris User Group meeting

Had a great Atlanta OpenSolaris User Group meeting this month.  We did an installfest, an update from CommunityOne, and a recap of what's new in OpenSolaris 2009.06.  About twenty folks showed up and about half loaded their laptops with the new build while we were there.

We got some great feedback for upcoming topics and are pushing forward with that.  We also decided to move back to monthly meetings starting in August.  Our next meeting is August 11 when we will talk about COMSTAR.  We are also considering a change in venue back to the Sun office in Alpharetta.  Matrix Resources has been very gracious in allowing us to use their facility, but I always feel bad that they have to have someone stick around until late at night to babysit us.

We're going to try an experiment to see if we can't get the word out a little better about our merry band via social networks.  We've started by creating a Meetup group at  Hopefully this might generate more traffic to our meetings and help us find folks in the area.

Tuesday Jun 09, 2009

OpenSolaris User Group Leaders Bootcamp

The keepers of the OpenSolaris Community took advantage of having a number of the User Group leaders at the CommunityOne conference this last week to set aside a day for a User Group Leaders' Bootcamp.

What a great opportunity to get together in the same room with folks working to create and sustain OpenSolaris user groups around the world! We had folks from every continent - from Atlanta and Argentina, from Dallas and Serbia, from China and London, and on and on. Something like twenty-five to thirty of the OpenSolaris User Groups were represented.

The whole day was a great experience. It was great to see that as different as each group was, there were a lot of common themes for both successes and for challenges. And a lot of great ideas were shared as to how to boost participation, to improve meetings, and to improve the success of the groups overall. It will be exciting to hear a report back next year on how these ideas have played out.

Be sure to check out Jim Grisanzio's photos to see some of these characters and what all went on at CommunityOne and in the OSUG Bootcamp.

Jeff Jackson, Sr. VP for Solaris Engineering, started the day off with a greeting and charge to get the most out of this opportunity to meet with each other and with the OpenSolaris and Solaris headquarters teams.

Since the thing that brought this group together was a common focus on OpenSolaris User Groups and not the fact that we knew each other, we began the day with a bit of team-building exercise, courtesy of The Go Game. This is a cross between a scavenger hunt and an improvisational acting class. Teams criss-crossed downtown San Francisco trying to find and photograph places hinted at by clues on web pages. At some venues, the teams had to act out and film various tasks. For example, on the Yerba Buena lawn, the team had to engage in an impromptu Tai Chi exercise in order to find their long-lost phys ed teacher, Ms. Karpanski, who then led the team in creating a new exercise video. Once we all returned, all of our submissions were voted on by the team and a winning team chosen. Supposedly, we can see all these photos and videos. Haven't yet found out how. Perhaps, that's for the best!

In order for us to get to know each other's groups, each User Group prepared a poster describing the group, where we were located, what we do, what sort of members make up the group, and what makes us special. Many of these posters were really well done! We had a bit of a scavenger hunt for answers to questions found by careful reading of all of the posters. It was really cool to see what sorts of projects some of the groups had undertaken and how they were working with various university or other organizations.

But the main part of the day was spent in a big brainstorming session. We all identified our successes, our failures, our challenges, and ideas for the future. We put all of these on several hundred post-it notes and placed them on large posters. We grouped them by topic and then went through all of these. Even though this only had an hour on the agenda, it ended up taking the bulk of the day. Since this was the most important thing for us, we decided to rearrange the day to accommodate it.

From these sticky-notes, we found out that some of our groups were mostly focused on administrators but others had a large developer population. We all have some sort of issues around meeting locations - whether it's a matter of access in the evening, finding a convenient location, or providing network access and power. For most groups, having some sort of refreshments was important, though some groups felt like good refreshments attracted too many folks who just show up for the food.

There were a lot of good ideas around using a registration site to get access to the facility and order food, creating and using Facebook, LinkedIn, and Twitter, using IRC, interacting with the Sun Campus Ambassadors, using MeetUp to find new members. Many folks found it useful to video and make available presentations given at their meetings. Some groups (for example in Japan) have special sub-groups for beginners. Other groups are doing large-scale development projects, such as the Belenix project in Bangalore.

For me and the Atlanta OpenSolaris User Group, I have a lot of new ideas that I want to put out to our membership and our leaders - move back to monthly meetings, use a registration site, set up a presence on various social networks.

Many people said that folks come to the user groups in order to network and expand their circle of business acquaintances. In light of the current economic situation, with so many smart people out of work, I am thinking of promoting our group with some of the job networking groups around Atlanta. For example, my church, Roswell United Methodist Church, has one of the largest job networking groups in the Atlanta area. Every two weeks, nearly 500 people meet to network and help each other in their job search. Perhaps the many IT folks in this group might find this a way to get current and stay current in a whole new area.

At any rate, I am inspired to get things cranking at ATLOSUG!

After spending the afternoon working through our hundreds of sticky notes, the OpenSolaris Governing Board had a bit of a roundtable with us to talk about what they do and how we can work better together. It was really helpful for me to hear from them and to get to put faces to some of the names for the folks I did not already know.

We finished out the evening with a great dinner at the Crab House at Pier 39. From what I have seen, many of the photos from dinner and the meeting are already on Facebook and Flickr. Jim Grisanzio, OpenSolaris Chief Photographer, was out in force with his camera!

Thanks so much to Teresa Giacomini, Lynn Rohrer, Dierdre Straughan, Jim Grisanzio, Tina Hartshorn, Wendy Ames, Kris Hake and everyone else who had a hand in organizing this event. Thanks to Jeff Jackson, Bill Franklin, Chris Armes, Dan Roberts and all the other HQ folks who took the time to come and listen and interact with the leaders of these groups. I know that I got a lot out of the meeting and am more eager than ever to promote and push forward with our user group.

CommunityOne Recap

Last week, I had the opportunity to attend CommunityOne West in San Francisco, along with a number of the other leaders of OpenSolaris User Groups. (I head up the Atlanta OpenSolaris User Group.) What a great meeting! Three days of OpenSolaris.

First off, I am sure that Teresa and the OpenSolaris team selected the Hotel Mosser because they knew it was a Solaris focused venue. As Dave Barry would say, I am not making this up! Even the toilet paper was Solaris-based. Bob Netherton and I were speculating that perhaps this was an example of Solaris Roll-Based Dump Management, new in OpenSolaris 2009.06.

CommunityOne Day One

Day One was a full day of OpenSolaris and related talks. The OpenSolaris teams maintained tracks around deploying OpenSolaris 2009.06 in the datacenter and around developing applications on OpenSolaris 2009.06. For the most part, I stuck with the operations-focused sessions, though I did step out into a few others. Some of the highlights included:

  • Peter Dennis and Brian Leonard's fun survey of what's new and exciting in OpenSolaris 2009.06. ATLOSUG folks should look for a reprise of this at our meeting on Tuesday.
  • Jerry Jelinek's discussion of the various virtualization techniques built into and onto OpenSolaris. This is a sort of talk that I give a lot. It was really helpful to hear how the folks in engineering approach this topic.
  • Scott Tracy & Dan Maslowski's COMSTAR discussion and demo. COMSTAR has been significantly expanded in recent builds, with more coolness still to come. I had not paid a lot of attention to this lately and this was a really helpful talk, especially since Teresa Giacomini had asked me to present this demo for the user group leaders on Wednesday. In any case, I have reproduced the iSCSI demo that Scott did using just VirtualBox, rather than requiring a server. Of course, the VB version is not something I would run my main storage server on. But it certainly is a great tool to understand the technology. I hope to have Ryan Matteson (Ryan, you volunteered!) give a talk at the ATLOSUG sometime soon.
  • I branched out of the main OpenSolaris path to see a few other things on Day One, as well. Ken Pepple, Scott Mattoon, and John Stanford gave a good talk on Practical Cloud Patterns. They talked about some of the typical ways that people do provisioning, application deployment, and monitoring within the cloud.
  • Karsten Wade, "Community Gardener" at Red Hat, gave a talk called Participate or Die. This was about the importance of participating in the Open Source projects that are important to your business. He talked about understanding the difference in participating (perhaps, using open source code) and influencing (helping to guide the project). By paying more attention to those who actively participate, active members of the community enhance their status and become influencers of the direction for a project. And it is important that this happen - in successful projects, the roadmap is driven by the participants rather than handed down on high with the hope that people will line up behind it. Really, I think, his key message was that it is important not to just passively stand by when you care about or depend upon something, leaving its future in the hands of others.
  • Kevin Nilson and Michael Van Riper gave a great talk about building and maintaining a successful user group. This was built on their experiences with the Silicon Valley Java User Group and with the Google Technology User Group. They took a great approach by collecting videos from the leaders, hosts, and participants in these and other groups around the country. It was really helpful to hear people's perspectives on why they attend a group, why companies host group meetings, and why and how people continue to lead user groups. While a lot of what they had to say, and the successes that they have had, are a product of being in a very "target-rich environment" in Silicon Valley, it was interesting to see that some things are universal: a good location makes a lot of difference; having food matters. I got a lot of ideas from this and from the OpenSolaris User Group Bootcamp that I hope to get going in ATLOSUG.
  • OpenSolaris 2009.06 Launch Party finished out the evening. Dodgeball and the Extra Action Marching Band. I thought these folks were the hit of the evening. You get the best of marching bands, big drums, loud brass, but add to that folks flaying around, throwing themselves at the dodgeball court nets. Much more exciting than your regular marching band, even some of the cool ones around Atlanta in the Battle of the Bands!

CommunityOne Day Two

Day Two was filled with OpenSolaris Deep Dives. These were very helpful, not just in content, but in helping me to hone my own OpenSolaris presentations. For this day, I stuck close to the Deploying OpenSolaris track, having learned in graduate school that I am not a developer. This track included:

  • Chris Armes kicked off the day with a talk on deploying OpenSolaris in your Data Centre (as he spells it).
  • Becoming a ZFS Ninja, presented by Ben Rockwood. Ben is an early adopter and a production user of ZFS. This was a two-hour, fairly in-depth talk about ZFS and its capabilities.
  • Nick Solter, co-author of the OpenSolaris Bible, talked about OpenHA Cluster, newly released and available for OpenSolaris. With OpenHA, enterprise-level availability is not just available, but also supported. He talked about how the cluster works and about extensions to the OpenHA cluster beyond the capabilities of Solaris Cluster, based on OpenSolaris technologies. Some of these include the use of Crossbow VNICs for private interconnects. I am still thinking about the availability implications of this and am not sure it's an answer for all configurations. But it's cool that it's there!
  • Jerry Jelinek rounded out the day talking about Resource Management with Containers, a topic near and dear to my heart and one I end up presenting a lot.

We finished out Day Two with a reunion dinner of some of the old team at Bucca di Beppo. Around the table, we had Vasu Karunanithi, Dawit Bereket, Matt Ingenthron, Scott Dickson (me), Bob Netherton, Isaac Rosenfeld, and Kimberly Chang. It was great to get at least part of the old gang together and catch up.

Day Three was the OpenSolaris User Group Leaders Bootcamp. But that's for another post....

Monday May 25, 2009

EBC Goes On the Road

Sun's Executive Briefing Center is on the road this week.  We are visiting with customers in Cleveland, Columbus, and Detroit.  It looks like a busy schedule and I am looking forward to the trip.  I was asked to fill in as the Solaris Virtualization speaker for this trip.

We fly to Cleveland and fly home from Detroit.  Kate has arranged a bus to get us from Cleveland to Columbus to Detroit.  My wife calls it Geeks on a Bus and thought it sounded too scary to contemplate!

We'll be talking about Sun's Vision, Systems, Software, OpenStorage, Solaris, Virtualization of Systems, Desktop Virtualization, and Services to support all of these.  Hope to see many of you there.

Saturday May 09, 2009

Reminder of Jumpstart Survey

Last week, I blogged about a Jumpstart Survey.  I've gotten good comments and some responses to the survey.  It's been a week, but I want to collect some more responses before posting an analysis.  Take a look at my previous blog and fill out the survey or comment on the blog.  I will summarize and report in another week or so.

DTrace and Performance Tools Seminar This Week - Atlanta, Ft. Lauderdale, Tampa

I'm doing briefings on DTrace and Solaris Performance Tools this week in Atlanta, Ft. Lauderdale, and Tampa.  Click the links below to register if this is of interest and you can attend.  These are pretty much a 2 1/2 to 3 hour briefing that stays pretty technical with lots of examples.  

From the flyer:

Join us for our next Solaris 10 Technology Brief featuring DTrace.  DTrace, Solaris 10's powerful new framework for system observability, helps system administrators, capacity planners, and application developers improve performance and problem resolution. 

DATE: May 12, 2009
LOCATION: Classroom Resource Group, Atlanta
TIME: 8:30 AM Registration, 9:00 am - 12:00 pm Session

HOLLYWOOD, FL - May 13, 2009
LOCATION: Seminole Hardrock Hotel
TIME: 8:30 AM Registration, 9:00 am - 12:00 pm Session

TAMPA, FL - May 14, 2009
LOCATION:  University of South Florida
TIME: 8:30 AM Registration, 9:00 am - 12:00 pm Session

What You'll Learn?
You can't improve what you can't see and DTrace provides safe, production-quality, top to bottom observability - from the PHP application scripts down to the device drivers - without modifying applications or the system.  This seminar will introduce DTrace and the DTrace Toolkit as key parts of an overall Solaris performance and observability toolkit. 

8:30 AM To 9:00 AM      Check In, Continental Breakfast
9:00 AM To 9:10 AM      Welcome
9:10 AM To 10:15 AM     Dtrace
10:15 AM To 10:30 AM    BREAK
10:30 AM To 11:30 AM    Dtrace Continued
11:30 AM To 12:00 PM    Wrap Up, Q&A, Evaluations

We look forward to seeing you at one of these upcoming Solaris 10 Dtrace sessions! 

Wednesday Apr 29, 2009

How do you use Jumpstart?

Jumpstart is the technology within Solaris that allows a system to be remotely installed across a network. This feature has been in the OS for a long, long time, dating to the start of Solaris 2.0, I believe. With Jumpstart, the system to be installed, the Jumpstart client, contacts a Jumpstart server to be installed across the network. This is a huge simplification, since there are nuances to how to set all of this up. Your best bet is to check the Solaris 10 Installation Guide: Network Based Installations and the Solaris 10 Installation Guide: Custom Jumpstart and Advanced Installations.

Jumpstart makes use of rules to decide how to install a particular system, based on its architecture, network connectivity, hostname, disk and memory capacity, or any of a number of other parameters. The rules select a profile that determines what will be installed on that system and where it will come from. Scripts can be inserted before and after the installation for further customization. To help manage the profiles and post-installation customization, Mike Ramchand has produced a fabulous tool, the Jumpstart Enterprise Toolkit (JET).
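To make that concrete, here is a minimal sketch of a rules entry and the profile it points to.  The profile name, finish script name, and package cluster are hypothetical; the syntax is standard Jumpstart.  A rules entry matching any SPARC client, with no begin script and a setup.fin finish script:

arch sparc   -   basic_prof   setup.fin

And a bare-bones basic_prof profile:

install_type    initial_install
system_type     standalone
cluster         SUNWCreq
partitioning    default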

My Questions for You

As a long time Solaris admin, I have been a fan of Jumpstart for years and years. As an SE visiting many cool companies, I have seen people do really interesting things with Jumpstart. I want to capture how people use Jumpstart in the real world - not just the world of those who create the product. I know that people come up with new and unique ways of using the tools that we create, ways we would never have imagined.

For example, I once installed 600 systems with SunOS 4.1.4 in less than a week using Jumpstart - remember that Jumpstart never supported SunOS 4.1.4.

But, I am not just looking for the weird stories. I want to know what Jumpstart features you use. I'll follow this up with extra, detailed questions around Jumpstart Flash, WAN Boot, DHCP vs. RARP. But I want to start with just some basics about Jumpstart.

Lacking a polling mechanism here at, you can just enter your responses as a comment. Or you can answer these questions at SurveyMonkey here. Or drop me a note at scott.dickson at

  1. How do you install Solaris systems in your environment?
    1. I use Jumpstart
    2. I use DVD or CD media
    3. I do something else - please tell me about it
  2. Do you have a system for automating your jumpstart configurations?
    1. Yes, we have written our own
    2. Yes, we use JET
    3. Yes, we use xVM OpCenter
    4. No, we do interactive installations via Jumpstart. We just use Jumpstart to get the bits to the client.
  3. What system architectures do you support with Jumpstart?
    1. SPARC
    2. x86
  4. Do you use a sysidcfg file to answer the system identification questions - hostname, network, IP address, naming service, etc?
    1. No, I answer these interactively
    2. Yes, I hand-craft a sysidcfg file
    3. Yes, but it is created via the Jumpstart automation tools
  5. Do you use WANboot? I'll follow up with more questions on this at a later time.
    1. What's Wanboot?
    2. I have heard of it, but have never used it
    3. We rely on Wanboot
  6. Do you use Jumpstart Flash? More questions on this later, too
    1. Never heard of it
    2. We sometimes use Flash
    3. We live and breathe Flash
  7. What sort of rules do you include in your rules file?
    1. We do interactive installations and don't use a rules file
    2. We use the rules files generated by our automation tools, like JET
    3. We have a common rules file for all Jumpstarts based on hostname
    4. We use not only hostnames but also other parameters to determine which rule to use for installation
  8. Do you use begin scripts?
    1. No
    2. We use them to create derived profiles for installation
    3. We use them some other way
  9. Do you use finish scripts
    1. No
    2. We use the finish scripts created by our automation
    3. We use finish scripts to do some minor cleanup
    4. We do extensive post-installation customization via finish scripts. If so, please tell me about it.
  10. Do you customize the list of packages to be installed via Jumpstart?
    1. No
    2. Somewhat
    3. Not only do we customize the list of packages, but we create custom packages for our installation

Monday Apr 27, 2009

Just got my copy of Pro OpenSolaris

Just got my copy of Pro OpenSolaris by Harry Foxwell and Christine Tran in the mail today!  Can't wait to get a good look and post a review.  I wonder if I can get the authors to inscribe it to me!  

Also got a copy of OpenSolaris Bible by Nick Solter, Gerry Jelinek, and Dave Miner.  Looking forward to cracking into it as well.

Will post reviews shortly.

Tuesday Dec 23, 2008

A Much Better Way to use Flash and ZFS Boot

A Different Approach

A week or so ago, I wrote about a way to get around the current limitation of mixing flash and ZFS root in Solaris 10 10/08. Well, here's a much better approach.

I was visiting with a customer last week and they were very excited to move forward quickly with ZFS boot in their Solaris 10 environment, even to the point of using this as a reason to encourage people to upgrade. However, when they realized that it was impossible to use Flash with Jumpstart and ZFS boot, they were disappointed. Their entire deployment infrastructure is built around using not just Flash, but Secure WANboot. This means that they have no alternative to Flash; the images deployed via Secure WANBoot are always flash archives. So, what to do?

It occurred to me that in general, the upgrade procedure from a pre-10/08 update of Solaris 10 to Solaris 10 10/08 with a ZFS root disk is a two-step process. First, you have to upgrade to Solaris 10 10/08 on UFS and then use lucreate to copy that environment to a new ZFS ABE. Why not use this approach in Jumpstart?
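(For reference, that manual two-step conversion, once the system is on Solaris 10 10/08 with a UFS root, boils down to something like the following sketch; the pool device and boot environment names are just examples, matching the ones used later in this post.)

# zpool create -f rpool c0t0d0s0
# lucreate -c s10u6-ufs -n s10u6 -p rpool
# luactivate -n s10u6
# init 6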

Turns out that it works quite nicely. This is a framework for how to do that. You likely will want to expand on it, since one thing this does not do is give you any indication of progress once it starts the conversion. Here's the general approach:

  • Create your flash archive for Solaris 10 10/08 as you usually would. Make sure you include all the appropriate LiveUpgrade patches in the flash archive.
  • Use Jumpstart to deploy this flash archive to one disk in the target system.
  • Use a finish script to add a conversion program to run when the system reboots for the first time. It is necessary to make this script run once the system has rebooted so that the LU commands run within the context of the fully built new system.

Details of this approach

Our goal when complete is to have the flash archive installed as it always has been, but to have it running from a ZFS root pool, preferably a mirrored ZFS pool. The conversion script requires two phases to complete this conversion. The first phase creates the ZFS boot environment and the second phase mirrors the root pool. In the following example, our flash archive is called s10u6s.flar. We will install the initial flash archive onto the disk c0t1d0 and build our initial root pool on c0t0d0.

Here is the Jumpstart profile used in this example:

install_type    flash_install
archive_location nfs nfsserver:/export/solaris/Solaris10/flash/s10u6s.flar
partitioning    explicit
filesys         c0t1d0s1        1024    swap
filesys         c0t1d0s0        free    /

We specify a simple finish script for this system to copy our conversion script into place:

cp ${SI_CONFIG_DIR}/S99xlu-phase1 /a/etc/rc2.d/S99xlu-phase1

You see what we have done: We put a new script into place to run at the end of rc2 during the first boot. We name the script so that it is the last thing to run. The x in the name makes sure that this will run after other S99 scripts that might be in place. As it turns out, the luactivate that we will do puts its own S99 script in place, and we want to come after that. Naming ours S99x makes it happen later in the boot sequence.

So, what does this magic conversion script do? Let me outline it for you:

  • Create a new ZFS pool that will become our root pool
  • Create a new boot environment in that pool using lucreate
  • Activate the new boot environment
  • Add the script to be run during the second phase of the conversion
  • Clean up a bit and reboot

That's Phase 1. Phase 2 has its own script to be run at the same time that finishes the mirroring of the root pool. If you are satisfied with a non-mirrored pool, you can stop here and leave phase 2 out. Or you might prefer to make this step a manual process once the system is built. But, here's what happens in Phase 2:

  • Delete the old boot environment
  • Add a boot block to the disk we just freed. This example is SPARC, so use installboot. For x86, you would do something similar with installgrub.
  • Attach the disk we freed from the old boot environment as a mirror of the device used to build the new root zpool.
  • Clean up and reboot.

I have been thinking it might be worthwhile to add a third phase to start a zpool scrub, which will force the newly attached drive to be resilvered when it reboots. The first time something goes to use this drive, it will notice that it has not been synced to the master drive and will resilver it, so this is sort of optional.
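If you wanted that third phase, it would follow the same pattern as the other two.  A minimal sketch of what the phase-3 script itself might contain (phase 2 would need to drop it into place, just as phase 1 installs the phase-2 script):

zpool scrub rpool
rm /etc/rc2.d/S99xlu-phase3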

The reason we add bootability explicitly to this drive is because currently, when a mirror is attached to a root zpool, a boot block is not automatically installed. If the master drive were to fail and you were left with only the mirror, this would leave the system unbootable. By adding a boot block to it, you can boot from either drive.
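On x86, the equivalent of the installboot step in the phase-2 script below would be installgrub against the freed disk; a sketch using the same slice as this example:

# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0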

So, here's my simple little script that got installed as /etc/rc2.d/S99xlu-phase1. Just to make the code a little easier for me to follow, I first create the script for phase 2, then do the work of phase 1.

# First, write out the phase-2 script that will run at the end of the next boot
cat > /etc/rc2.d/S99xlu-phase2 << EOF
ludelete -n s10u6-ufs
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0
zpool attach -f rpool c0t0d0s0 c0t1d0s0
rm /etc/rc2.d/S99xlu-phase2
init 6
EOF

# Now the phase-1 work: build the root pool, create and activate the new ZFS BE
dumpadm -d swap
zpool create -f rpool c0t0d0s0
lucreate -c s10u6-ufs -n s10u6 -p rpool
luactivate -n s10u6
rm /etc/rc2.d/S99xlu-phase1
init 6

I think that this is a much better approach than the one I offered before, using ZFS send.  This approach uses standard tools to create the new environment and it allows you to continue to use Flash as a way to deploy archives.  The dependency is that you must have two drives on the target system.  I think that's not going to be a hardship, since most folks will use two drives anyway.  You will have to keep them as separate drives rather than using hardware mirroring.  The underlying assumption is that you previously used SVM or VxVM to mirror those drives.

So, what do you think? Better? Is this helpful? Hopefully, this is a little Christmas present for someone! Merry Christmas and Happy New Year!

Friday Dec 05, 2008

Flashless System Cloning with ZFS

Ancient History

Gather round, kiddies, and let Grandpa tell you a tale of how we used to clone systems before we had Jumpstart and Flash, back when we had to carry water in leaky buckets 3 miles through snow up to our knees, uphill both ways.

Long ago, a customer of mine needed to deploy 600(!) SPARCstation 5 desktops all running SunOS 4.1.4. Even then, this was an old operating system, since Solaris 2.6 had recently been released. But it was what their application required. And we only had a few days to build and deploy these systems.

Remember that Jumpstart did not exist for SunOS 4.1.4, and Flash did not exist for Solaris 2.6.  So, our approach was to build a system, a golden image, the way we wanted it deployed, and then use ufsdump to save the contents of the filesystems.  Then, we were able to use Jumpstart from a Solaris 2.6 server to boot each of these workstations.  Instead of having a Jumpstart profile, we only used a finish script that partitioned the disks and restored the ufsdump images.  So Jumpstart just provided us a clean way to boot these systems and apply the scripts we wanted to them.

Solaris 10 10/08, ZFS, Jumpstart and Flash

Now, we have a bit of a similar situation. Solaris 10 10/08 introduces ZFS boot to Solaris, something that many of my customers have been anxiously awaiting for some time. A system can be deployed using Jumpstart and the ZFS boot environment created as a part of the Jumpstart process.

But. There's always a but, isn't there.

But, at present, Flash archives are not supported (and in fact do not work) as a way to install into a ZFS boot environment, either via Jumpstart or via Live Upgrade. Turns out, they use the same mechanism under the covers for this. This is CR 6690473.

So, how can I continue to use Jumpstart to deploy systems, and continue to use something akin to Flash archives to speed and simplify the process?

Turns out the lessons we learned years ago can be used, more or less. Combine the idea of the ufsdump with some of the ideas that Bob Netherton recently blogged about (Solaris and OpenSolaris coexistence in the same root zpool), and you can get to a workaround that might be useful enough to get you through until Flash really is supported with ZFS root.

Build a "Golden Image" System

The first step, as with Flash, is to construct a system that you want to replicate. The caveat here is that you use ZFS for the root of this system. For this example, I have left /var as part of the root filesystem rather than a separate dataset, though this process could certainly be tweaked to accommodate a separate /var.

Once the system to be cloned has been built, you save an image of the system. Rather than using flarcreate, you will create a ZFS send stream and capture this in a file. Then move that file to the jumpstart server, just as you would with a flash archive.

In this example, the ZFS bootfs has the default name - rpool/ROOT/s10s_u6wos_07.

golden# zfs snapshot rpool/ROOT/s10s_u6wos_07@flar
golden# zfs send -v rpool/ROOT/s10s_u6wos_07@flar > s10s_u6wos_07_flar.zfs
golden# scp s10s_u6wos_07_flar.zfs js-server:/flashdirectory

How do I get this on my new server?

Now, we have to figure out how to have this ZFS send stream restored on the new clone systems. We would like to take advantage of the fact that Jumpstart will create the root pool for us, along with the dump and swap volumes, and will set up all of the needed bits for the booting from ZFS. So, let's install the minimum Solaris set of packages just to get these side effects.

Then, we will use Jumpstart finish scripts to create a fresh ZFS dataset and restore our saved image into it. Since this new dataset will contain the old identity of the original system, we have to reset our system identity. But once we do that, we are good to go.

So, set up the cloned system as you would for a hands-free jumpstart. Be sure to specify the sysid_config and install_config bits in the /etc/bootparams. The manual Solaris 10 10/08 Installation Guide: Custom JumpStart and Advanced Installations covers how to do this. We add to the rules file a finish script (I called mine loadzfs in this case) that will do the heavy lifting. Once Jumpstart installs Solaris according to the profile provided, it then runs the finish script to finish up the installation.
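The rules entry that ties the profile and the finish script together might look something like this; the profile file name here is hypothetical, while loadzfs is the finish script shown below:

any   -   -   zfsclone_profile   loadzfs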

Here is the Jumpstart profile I used. This is a basic profile that installs the base, required Solaris packages into a ZFS pool mirrored across two drives.

install_type    initial_install
cluster         SUNWCreq
system_type     standalone
pool            rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv         installbe bename s10u6_req

The finish script is a little more interesting since it has to create the new ZFS dataset, set the right properties, fill it up, reset the identity, etc. Below is the finish script that I used.

#!/bin/sh -x

# TBOOTFS is a temporary dataset used to receive the stream;
# NBOOTFS is the final name for the new ZFS dataset.
# The assignments below are examples only (the originals were not shown in this
# post) - adjust the dataset names, NFS location, stream file, and mount point.
TBOOTFS=rpool/ROOT/s10u6_recv
NBOOTFS=rpool/ROOT/s10u6_golden
MNT=/tmp/flar
NFS=js-server:/flashdirectory
FLAR=s10s_u6wos_07_flar.zfs

# Mount directory where archive (send stream) exists
mkdir ${MNT}
mount -o ro -F nfs ${NFS} ${MNT}

# Create file system to receive ZFS send stream &
# receive it.  This creates a new ZFS snapshot that
# needs to be promoted into a new filesystem
zfs create ${TBOOTFS}
zfs set canmount=noauto ${TBOOTFS}
zfs set compression=on ${TBOOTFS}
zfs receive -vF ${TBOOTFS} < ${MNT}/${FLAR}

# Create a writeable filesystem from the received snapshot
zfs clone ${TBOOTFS}@flar ${NBOOTFS}

# Make the new filesystem the top of the stack so it is not dependent
# on other filesystems or snapshots
zfs promote ${NBOOTFS}

# Don't automatically mount this new dataset, but allow it to be mounted
# so we can finalize our changes.
zfs set canmount=noauto ${NBOOTFS}
zfs set mountpoint=${MNT} ${NBOOTFS}

# Mount newly created replica filesystem and set up for
# sysidtool.  Remove old identity and provide new identity
umount ${MNT}
zfs mount ${NBOOTFS}

# This section essentially forces sysidtool to reset system identity at
# the next boot.
touch /a/${MNT}/reconfigure
touch /a/${MNT}/etc/.UNCONFIGURED
rm /a/${MNT}/etc/nodename
rm /a/${MNT}/etc/.sysIDtool.state
cp ${SI_CONFIG_DIR}/sysidcfg /a/${MNT}/etc/sysidcfg

# Now that we have finished tweaking things, unmount the new filesystem
# and make it ready to become the new root.
zfs umount ${NBOOTFS}
zfs set mountpoint=/ ${NBOOTFS}
zpool set bootfs=${NBOOTFS} rpool

# Get rid of the leftovers
zfs destroy ${TBOOTFS}
zfs destroy ${NBOOTFS}@flar

When we jumpstart the system, Solaris is installed, but it really isn't used. Then, we load from the send stream a whole new OS dataset, make it bootable, set our identity in it, and use it. When the system is booted, Jumpstart still takes care of updating the boot archives in the new bootfs.

On the whole, this is a lot more work than Flash, and is really not as flexible or as complete. But hopefully, until Flash is supported with a ZFS root and Jumpstart, this might at least give you an idea of how you can replicate systems and do installations that do not have to revert back to package-based installation.

Many people use Flash as a form of disaster recovery.  I think that this same approach might be used there as well.  Still not as clean or complete as Flash, but it might work in a pinch.

So, what do you think? I would love to hear comments on this as a stop-gap approach.

