Sunday May 11, 2008

A field guide to Zones in OpenSolaris 2008.05

I have had a busy couple of months. After wrapping up work on Solaris 8 Containers (my teammate Steve ran the Solaris 9 Containers effort), I turned my attention to helping the Image Packaging team (rogues' gallery) with their efforts to get OpenSolaris 2008.05 out the door.

Among other things, I have been working hard to provide a basic level of zones functionality for OpenSolaris 2008.05. I wish I could have gotten more done, but today I want to cover what does and does not work. I want to be clear that Zones support in OpenSolaris 2008.05 and beyond will evolve substantially. To start, here's an example of configuring a zone on 2008.05:

# zonecfg -z donutshop
donutshop: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:donutshop> create
zonecfg:donutshop> set zonepath=/zones/donutshop
zonecfg:donutshop> add net
zonecfg:donutshop:net> set physical=e1000g0
zonecfg:donutshop:net> set address=
zonecfg:donutshop:net> end
zonecfg:donutshop> add capped-cpu
zonecfg:donutshop:capped-cpu> set ncpus=1.5
zonecfg:donutshop:capped-cpu> end
zonecfg:donutshop> commit
zonecfg:donutshop> exit
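
The same configuration can also be fed to zonecfg non-interactively with its -f option. Here is a minimal sketch, not something shipped with 2008.05: the helper name zone_cfg_script and its argument order are made up for illustration, and the network address is left as a parameter because it isn't shown in the session above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSolaris): print a zonecfg command
# file equivalent to the interactive session above.
# Arguments: ZONEPATH NIC ADDRESS NCPUS
zone_cfg_script() {
    zonepath=$1
    nic=$2
    address=$3
    ncpus=$4
    cat <<EOF
create
set zonepath=$zonepath
add net
set physical=$nic
set address=$address
end
add capped-cpu
set ncpus=$ncpus
end
commit
EOF
}
```

To use it, redirect the output to a file (supplying your own address as the third argument) and run zonecfg -z donutshop -f on that file.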

# zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              native   shared
   - donutshop        configured /zones/donutshop               ipkg     shared

If you're familiar with deploying zones, you can see that there is a lot that is familiar here.  But you can also see that donutshop isn't, as you would normally expect, using the native brand. Here we're using the ipkg brand. The reason is that commands like zoneadm and zonecfg have some special behaviors for native zones which presume that you're using a System V packaging-based OS. In the future, we'll make native less magical, and the zones you install will be branded native as you would expect. Jerry is actually working on that right now. Note also that I used the relatively new CPU Caps resource management feature to put some resource limits on the zone-- it's easy to do! Now let's install the zone:

# zoneadm -z donutshop install
A ZFS file system has been created for this zone.

      Image: Preparing at /zones/donutshop/root ... done.
    Catalog: Retrieving from ... done.
 Installing: (output follows)
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                  49/49   7634/7634 206.85/206.85 

PHASE                                        ACTIONS
Install Phase                            12602/12602 

       Note: Man pages can be obtained by installing SUNWman
Postinstall: Copying SMF seed repository ... done.
Postinstall: Working around
Postinstall: Working around
       Done: Installation completed in 208.535 seconds.

 Next Steps: Boot the zone, then log into the zone console
             (zlogin -C) to complete the configuration process

There are a couple of things to notice, both in the configuration and in the install:

Non-global zones are not sparse, for now
Zones are said to be sparse if /usr, /lib, /platform, /sbin and optionally /opt are looped back, read-only, from the global zone. This allows substantial disk space savings in the traditional zones model (which is that the zones have the same software installed as the global zone).

Whether we will ultimately choose to implement sparse zones, or not, is an open question. I plan to bring this question to the Zones community, and to some key customers, in the near future.

Zones are installed from a network repository
Unlike with traditional zones, which are sourced by copying bits from the global zone, here we simply spool the contents from the network repository. The upside is that this was easy to implement; the downside is that you must be connected to the network to deploy a zone. Getting the bits from the global zone is still desirable, but we don't have that implemented yet.

By default, zones are installed using the system's preferred authority (use pkg authority to see what that is set to). The preferred authority is then propagated into the zone. If you want to override that, you can specify a different repository using the new -a argument to zoneadm install:

# zoneadm -z donutshop install -a ipkg=http://ipkg.eng:80

Non-global zones are small
Traditionally, zones are installed with all of the same software that the global zone contains. In the case of "whole root" zones (the opposite of sparse), this means that non-global zones are about the same size as global zones-- easily at least a gigabyte in size.

Since we're not supporting sparse zones, I decided to pare down the install as much as I could, within reason: the default zone installation is just 206MB, and has a decent set of basic tools. But you have to add other stuff you might need. And we can do even more: some package refactoring should yield another 30-40MB of savings, as packages like Tcl and Tk should not be needed by default. For example, Tk (5MB) gets dragged in as a dependency of python (the packaging system is written in python); Tcl (another 5MB) is dragged in by Tk. Tk then pulls in parts of X11. Smallness yields speed: when connected to a fast package repository server, I can install a zone in just 24 seconds!

I'm really curious to know what reaction people will have to such minimalist environments. What do you think?

Once you start thinking about such small environments, some new concerns surface: vim (which in 2008.05 we're using as our vi implementation) is 17MB, or almost 9% of the disk space used by the zone!

Non-global zones are independent of the global zone
Because ipkg zones are branded, they exist independently of the global zone. This means that if you do an image-update of the global zone, you'll also need to update each of your zones, and ensure that they are kept in sync. For now this is a manual process-- in the future we'll make it less so.
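
One way to keep track of the manual step is to generate the update commands from the zone list. This is only a sketch under several assumptions: that zoneadm list -cp prints colon-separated fields in the order zoneid:zonename:state:zonepath:uuid:brand:ip-type, that zone paths contain no colons, that a halted zone's image is reachable at its zonepath plus /root (as in the install output above), and that pointing pkg at that image with -R is an acceptable way to update it. The function name zone_update_cmds is made up, and nothing is executed; it only prints.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read `zoneadm list -cp` output on stdin and
# print one image-update command per halted (installed) ipkg zone.
# Assumed field order: zoneid:zonename:state:zonepath:uuid:brand:ip-type
zone_update_cmds() {
    awk -F: '
        # Ignore the global zone and anything that is not an ipkg zone.
        $2 == "global" || $6 != "ipkg" { next }
        # Halted zones: print the command an administrator would review and run.
        $3 == "installed" { printf "pkg -R %s/root image-update\n", $4; next }
        # Any other state is only reported, never acted on.
        { printf "# skipping %s (state: %s)\n", $2, $3 }
    '
}
```

Piping zoneadm list -cp into zone_update_cmds after an image-update of the global zone gives a checklist of commands to review before running any of them by hand.
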

ZFS support notes
OpenSolaris 2008.05 makes extensive use of ZFS, and enforces ZFS as the root filesystem. Additional filesystems are created for /export, /export/home and /opt. Non-global zones don't yet follow this convention. Additionally, I have sometimes seen our auto-zfs file system creation fail to work (you can see it working properly in the example above). We haven't yet tracked down that problem-- my suspicion is that there is a bad interaction with the 2008.05 filesystem layout's use of ZFS legacy mounts.

As a result of this (and for other reasons too, probably), zones don't participate in the boot-environment subsystem. This means that you won't get an automatic snapshot when you image-update your zone or install packages. That means no automatic rollback for zones. Again, this is something we will endeavor to fix.

Beware of bug 6684810
You may see a message like the following when you boot your zone:
zoneadm: zone 'donutshop': Unable to set route for interface lo0 to éÞùÞ$
zoneadm: zone 'donutshop': 
This is a known bug (6684810); fortunately the message is harmless.

In the next month, I hope to take a vacation, launch a discussion with our community about sparse root zones, and make a solid plan for the overall support of zones on OpenSolaris. I've got a lot to do, but that's easily balanced by the fact that I've been having a blast working on this project...

Wednesday Apr 09, 2008

Solaris 8 Containers, Solaris 9 Containers

In the flurry of today's launch event, we've launched Solaris 8 Containers (which was previously called Solaris 8 Migration Assistant, or Project Etude).  Here is the datasheet about the product.  Even better: We've also announced that Solaris 9 Containers will be available soon!  Jerry and Steve on the containers team have been toiling away like mad to make this possible.

Why the rename?  Well, for one thing, it's easier to say :)  It also signals a shift in the way Sun will offer this technology to customers:

  • Professional Services Engagement: No longer required, now recommended.  It's also simpler to order a SunPS engagement for this product.
  • Partners: (Some of) Sun's partners are now ready to deliver this solution to customers.  Talk to your partner for more information.
  • Right to Use: Previously, we provided a 90 day evaluation RTU.  Now, the RTU is unlimited.  However, you must still pay for support.

I invite you to download Solaris 8 Containers and give it a try! And as always, talk to your local SE or Sales Rep if you're interested in obtaining support licenses for (or any kind of help with) your Solaris 8 (or 9) containers.

Here's Joost, our fearless marketing leader, with an informative talk about the why and how of Solaris 8 Containers. 

Tuesday Jan 29, 2008

The joy of 'zpool scrub'

Some days, when it's cold and you're not feeling very motivated (like me, today), it's nice to do a zpool scrub on the machines you manage, and then once it's done:

$ zpool status
  pool: aux
 state: ONLINE
 scrub: scrub completed with 0 errors on Tue Jan 29 15:52:38 2008

        NAME          STATE     READ WRITE CKSUM
        aux           ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s7  ONLINE       0     0     0
            c1t1d0s7  ONLINE       0     0     0

errors: No known data errors

And then relax, knowing that your data is safe.
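
If you'd rather have a script do the looking, the two lines that matter in that output are "state:" and "errors:". Here is a small sketch that reads zpool status text on stdin and succeeds only if every pool is ONLINE with no known data errors. The function name pool_is_clean is made up, and the check assumes the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if `zpool status` text on stdin shows every
# pool ONLINE with "No known data errors", and exit 1 otherwise
# (including when no state or errors line is seen at all).
pool_is_clean() {
    awk '
        /^ *state: / { seen_state = 1; if ($2 != "ONLINE") bad = 1 }
        /^errors: /  { seen_err = 1; if ($0 != "errors: No known data errors") bad = 1 }
        END { exit ((bad || !seen_state || !seen_err) ? 1 : 0) }
    '
}
```

For example, zpool status | pool_is_clean && echo relax. For a quick look without any scripting, zpool status -x reports only the pools that have problems.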

Tuesday Dec 04, 2007

Podcast on Etude!

Hal Stern invited Joost and me to sit down for a chat about the Solaris 8 Migration Assistant for an episode of his podcast Innovating@Sun.  This was my first podcast experience and it was a lot more fun than I had expected.   Check it out here, or you can download the MP3 audio directly.

Monday Oct 22, 2007

Solaris 8 Migration Assistant 1.0 (Project Etude) Ships!

I'm very happy to announce that the Solaris 8 Migration Assistant 1.0 (also known as Project Etude) has shipped!  The product is now officially available from Sun.

In a nutshell, the product provides a migration solution from Solaris 8 to Solaris 10 by creating a bridge between the two operating systems.  You can perform P2V (physical-to-virtual) conversions of existing Solaris 8 systems, and drop those into Solaris 8 containers running on your Solaris 10 host.

Above all, I want to take another chance to thank the many people who worked extremely hard for the past eight months to make this project a reality.  It was a sprint from start to finish, which is certainly tough on everyone involved.  But I was amazed and pleased that almost universally, people helped us out with dedication and a good sense of humor.  Thank you very much.

Wednesday Sep 26, 2007

Etude Progress Update

The whole Etude engineering team (except, unfortunately, for Penny) gathered in the Bay Area last week.  We made a ton of progress and we're on our final round of bug fixes.  It has been fun (if a bit nerve wracking) to watch the bug counts drop day by day.

One problem which has been challenging with this project is that we have a number of differently delivered software parts, all of which must undergo some change:

  • The solaris8 brand itself is a piece of software which bridges the gap between Solaris 8 and Solaris 10.  This is something you must add to a Solaris 10 8/07 (also called S10U4) system.  When it is finished, it will be delivered as two packages, SUNWs8brandu and SUNWs8brandr, plus an optional "demo" package.
  • We've got a very few Solaris 8 patches which we require (currently I think there are 6 of them).  Some of these (for example, a fix to the linker, and a fix to ptree(1)) we had to engineer ourselves. And some have been available for a long time and are probably already patched onto most S8 systems out there.  All of these are (or will be) available via the normal patch mechanisms.
  • We needed to add some enhancements to Solaris 10 in order for BrandZ to work solidly on SPARC systems.  While these changes will be automatically included in the next update of Solaris 10, for Solaris 10 8/07 you need to add a patch-- the kernel jumbo patch, in fact (as I mentioned before, this patch is not out just yet).

For our Beta release, we were able to supply workarounds for some of these issues, but for our official release, we need every 't' crossed, and every 'i' dotted. So last week we finally had all of those pieces available (internally) in at least a preliminary form.  Everyone has been busy testing the whole works.  For example, I've done a dry run on a T1000, which looks like this:

  • Install S10 8/07 onto the system (or into an LDOM (logical domain) on the system)
  • Bring system to single user mode
  • Add kernel patch using patchadd
  • Reboot system (or LDOM)
  • Add SUNWs8brandr and SUNWs8brandu packages to the system
  • Configure a Solaris 8 container and install it from an existing system archive
    • This will auto-apply any required Solaris 8 patches to the system
  • Boot S8 zone, and enjoy!

It's nice to see the pieces coming together...

Wednesday Sep 05, 2007

Project Etude, Revealed

Well, it has been a while since I last wrote anything here.  I have been working on a team developing a new project, which we named Etude.  Marc Hamilton, VP of Solaris Marketing, has written about it here.

[As an aside, it's weird to have a code name you selected cross the lips of senior executives several months later] 

In a nutshell, we've built a Solaris Container (or Zone) which is capable of running the Solaris 8 user environment.  We have also created a capability to perform P2V (or Physical-to-Virtual) transformation of existing Solaris 8 systems into containers running on a Solaris 10 host.  This is an enabler for rapid migration of legacy Solaris 8 environments onto modern, environmentally friendly, cost effective hardware.  And onto Solaris 10.  The idea is to break up the upgrade tasks into chunks, allowing the hardware and OS to be upgraded, while continuing to run legacy environments.  Next, the legacy environments can be used until they are retired, or redeployed into Solaris 10 containers, or into logical domains.

I will write more about the why and the how of this project in a future blog entry.  (And Marc does a good job of explaining some of the why in his blog, so go there for more background).  But for now, I want to expand upon the what.  The notion of a Solaris 8 container was not a new one when we began to look at this problem early this year.  But with the completion of the BrandZ project, we had the tools in hand to make a serious attempt to realize this idea.  BrandZ was originally developed so that we could have a "Linux Zone" but the core "brand" framework is really very flexible (and is now more so, thanks to our work).  It allows the development of a variety of OS personalities atop Solaris 10.

So who built Etude? I'm very lucky to be leading a group of seasoned and incredibly talented engineers: Bill, Ed, Jerry, and Steve (no blog), and Penny on documentation.  We worked at something of a breakneck pace to assemble a prototype, which we had in hand by late April. Since then, we have worked to convert the prototype into a production quality offering.  I'll try to say more about that later.  Along the way, we've had a lot of help from many corners of the company-- people too numerous to list.  I especially want to thank Bill Franklin, Jerri-Ann, Joost, Susan, Allan, Richard, Tim and Liane.

Ok, so I'm anxious to show off our creation. First, let's archive a Solaris 8 system using the Flash Archiving tools. If your S8 system is patched up to date, you already have these tools installed. Alternatively, you could use CPIO or ufsdump or some other tool to create a suitable archive:

s8-system # uname -a
SunOS s8-system 5.8 Generic_108528-29 sun4u sparc SUNW,UltraAX-i2
s8-system # flarcreate -S -n s8-system /net/s10system/export/s8-system.flar
Determining which filesystems will be included in the archive...
Creating the archive...
Archive creation complete.
s8-system #

[Note to our beta customers!  In the beta version of Etude, support for Flash Archives is not present, so if you use this blog entry as a substitute for reading the documentation, you will be disappointed.  Please ensure that you carefully read the supplied instructions] 

In the above example, I've simply used NFS to place the flash archive onto my Solaris 10 system, but you could use any method for moving files around. Now, we'll head over to the Solaris 10 system and create a container suitable for use as a Solaris 8 environment...

s10-system # zonecfg -z s8-system
s8-system: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:s8-system> create -t SUNWsolaris8
zonecfg:s8-system> set zonepath=/aux/zones/s8-system
zonecfg:s8-system> add net
zonecfg:s8-system:net> set address=
zonecfg:s8-system:net> set physical=e1000g0
zonecfg:s8-system:net> end
zonecfg:s8-system> add dedicated-cpu
zonecfg:s8-system:dedicated-cpu> set ncpus=2
zonecfg:s8-system:dedicated-cpu> end
zonecfg:s8-system> commit
zonecfg:s8-system> exit
s10-system #
s10-system # zoneadm list -vc
  ID NAME             STATUS     PATH                           BRAND    IP
   0 global           running    /                              native   shared
   - s8-system        configured /aux/zones/s8-system           solaris8 shared

The last part (setting ncpus=2) is not strictly necessary, but is a good example of how easy it is to allocate CPU resources to containers starting in Solaris 10 8/2007. Jerry (who led that work) and Jeff Victor have written more about this capability here and here.  (By the way, Solaris 10, 8/07 was launched yesterday, and includes BrandZ and many other enhancements to the containers infrastructure.)

Now that we've configured the zone, we need to install it.  We'll use the flash archive we made to do so:

s10-system # zoneadm -z s8-system install -a /aux/flar/s8-system.flar
      Log File: /var/tmp/s8-system.install.104490.log
        Source: /aux/flar/s8-system.flar
    Installing: This may take several minutes...
Postprocessing: This may take several minutes...

        Result: Installation completed successfully.
      Log File: /aux/zones/s8-system/root/var/log/s8-system.install.104490.log
s10-system # 

That's it! We've now installed a Solaris 8 container! Let's boot it up, and log onto the console, and see what happens:

s10-system # zoneadm -z s8-system boot; zlogin -C s8-system
[Connected to zone 's8-system' console]

SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc. All rights reserved

Hostname: s8-system
The system is coming up. Please wait.
NIS domainname is
starting rpc services: rpcbind keyserv ypbind done.
syslog service starting.
Print services started.
The system is ready.

s8-system console login: root
Sep 5 17:55:52 s8-system login: ROOT LOGIN /dev/console
Last login: Wed Sep 5 13:11:37 on console
Sun Microsystems Inc. SunOS 5.8 Generic Patch February 2004
s8-system # uname -a
SunOS s8-system 5.8 Generic_Virtual sun4v sparc SUNW,Sun-Fire-T200
s8-system # psrinfo
0       on-line   since 08/29/07 13:32:21
1       on-line   since 08/29/07 13:32:23

We could have also performed a sys-unconfig(1m) on the Solaris 8 image during the container installation (by passing -u to the installer). In that case, we would have been asked to answer the usual system identity questions. This zone can be cloned, moved around, renamed, attached/detached and manipulated like any other. You can even install it atop a ZFS filesystem, and from the global zone, use DTrace against the applications running inside of it.

Most importantly, you can run real workloads. Building on the Solaris Application Binary Compatibility Guarantee, we have done the difficult work to make sure that your applications will work successfully inside of these containers. This includes software such as databases, application servers, Java programs, web servers, and more. We've also utilized the amazing array of test suites we have available in-house.  You can even do software development inside of Solaris 8 Containers, building binaries which will run on any Solaris 8 (and 9, and 10) system.

You can also patch these containers using the same tools you use to patch Solaris 8 (and using the same patches). We've even pulled down hundreds of Blastwave packages to test GNOME, KDE, and lots of other applications available there. You can run your favorite ancient desktop environment: Solaris 8 was the last version of Solaris to include OpenWindows. So here's the obligatory screenshot from the wayback machine: Netscape 4.x, Java 1.3 and some other miscellaneous stuff all running atop the OpenWindows environment.  Running on emulated Solaris 8, on Solaris 10, on a T2000. A weird blast from the past mixed up with the present day.  (It's interesting to think that today's whizzy desktop applications will in a few years look just as antiquated as OpenWindows and Netscape 4.x do today).


That's all I have time for tonight.

If you're interested in hearing more about this technology, talk to your local Sun Representative. Or if that fails for any reason, send me an email (my-firstname dot my-lastname will do the trick) and I'll try to hook you up with someone helpful (please put "Etude" in the subject line).


Kernel Gardening.

