Wednesday Mar 30, 2011

Solaris Online Forum

I'll be participating in an online forum on April 14th, where we'll be talking about Solaris 11. If you're interested in listening in, there's more info at the registration site.

Tuesday Oct 13, 2009

testing ON changes with OpenSolaris installers

Installers are funny things. They boot a reduced version of the OS in order to install it, usually in some bizarre context like netboot, or with a live CD. They have all sorts of restrictions -- not being able to write to certain parts of the filesystem, or needing to be bootstrapped from tftp or other ancient crufty protocols. Furthermore, the OS bootstraps itself with less information about the system's hardware and configuration than it will have in normal operation -- it needs to put together enough information on the fly and prepare the system for normal operation. But, fundamentally, the initial installer for most OSes boots essentially the same version of the OS it's about to install.

For a developer in any core part of the system that the installer uses, that means certain types of changes must be built and tested in install context. smf(5) is certainly one of those subsystems where a subtle change may have unintended consequences on the gymnastics the OS is doing in install context. Testing is key. But assembling an install image for testing on Solaris 10 and earlier used to be a horribly complicated and arcane process. And since most changes don't require testing in install context, people wouldn't do 'just in case' testing. They'd only test install if they really, really needed to.

Recently, I needed to figure out how to do testing of nightly ON bits in install context. I started Saturday with my ON nightly repository in hand, memories of the Solaris 10 procedure fresh in my mind, and a sense of dread. But this isn't Solaris 10 anymore, and the Install team has done a great job of changing a painful procedure into something downright delightful. I decided I wanted to test a live CD, so I...

$ pfexec pkg install distribution-constructor
$ cp /usr/share/distro_const/slim_cd/slim_cd_x86.xml ~
$ pfexec zfs create rpool/dc

I modified ~/slim_cd_x86.xml to look in my ON repository first, and to fetch any packages it couldn't find there from a repository containing the previous build of OpenSolaris. I also removed the entire package from the package list, as I'm using ON development bits (don't do this on a supported system). Then I ran:

# /usr/bin/distro_const build ~lianep/slim_cd_x86.xml

About an hour later, I had a LiveCD image and a USB image, ready to test. So much for my sense of dread. One quick note: Distribution Constructor currently requires that you first image-update the system where you're running DC to the same build you're attempting to build. The DC team knows it's an issue, and the workaround is quite straightforward: just "pkg image-update" to your repositories.

And that's pretty much it. The diffs for the DC manifest look like:

--- /usr/share/distro_const/slim_cd/slim_cd_x86.xml  Fri Oct  9 16:29:26 2009
+++ slim_cd_x86.xml  Sat Oct 10 16:53:39 2009
@@ -57,8 +57,8 @@
-				url=""
-				authname=""/>
+				url="http://ipkg.sfbay/on-nightly"
+				authname="on-nightly"/>
 			     If you want to use one or more  mirrors that are
 			     setup for the authority, specify the urls here.
@@ -75,15 +75,12 @@
 		     If you want to use one or more  mirrors that are
 		     setup for the authority, specify the urls here.
-		<!--
-		     Uncomment before using.
-				url=""
-				authname=""/>
+				url="http://ipkg.sfbay/dev"
+				authname=""/>
 			<mirror url="" />
-		-->
-                       <pkg name="entire"/>

For cleanliness, I also updated the post_install_repo_default_authority and post_install_repo_addl_authority to point at the same URLs.

The install team does recommend running DC on a system that's already been updated to that package level. For example, to build a build 124 LiveCD, they suggest first image-updating the system to 124, then running distro_const(1M). That shouldn't always be necessary, but it's also easy to do, since DC already requires an input repository that you can simply image-update from.

It's pretty nice that the install team has fundamentally converted a task that was challenging for ON developers before into something that's really quite easy. If you're interested in more, there's plenty, including documentation at the Caiman Project over on

Sunday Oct 11, 2009

building an ON IPS repository

I've been working, with the gracious help of Mark, on making the ON consolidation create a pkg(5) repository as part of the build process. If you ever build the ON consolidation from source, this is probably interesting to you.

Our changes are destined to be integrated into the main ON gate, which should happen in November 2009 sometime (though that's subject to change and doesn't constitute a promise). We've tried to make it easy for folks to build their own ON IPS repositories for testing in advance of integration of our changes. You can access the latest instructions for building your own ON repository in the README which lives in our development mercurial repository.

If you do want to try this out, I strongly recommend subscribing to, as that's where we're answering questions, giving heads-ups about important changes, and having development conversations. We've got some sizable changes coming over the next few weeks, including a protocmp which works on IPS manifests.

I'm really enjoying using the same tools as we expect our customers to use. I now use pkg image-update to update my ON development bits rather than the development-only tool bfu. pkg image-update is at least as fast as bfu, if not faster, especially over a slower link. That's because pkg(5) downloads and updates only the bits which have actually changed between versions. Nice. And it's nice that our normal upgrade experience is now as blindingly fast as ON developers have come to expect.
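To make the "only changed bits" point concrete, here's a minimal C sketch under a deliberately simplified model. IPS manifests record a content hash for each file they deliver; the sketch assumes an update only needs to fetch a file whose path is new or whose hash differs from the installed copy. The struct, the function names, and any paths or hash strings you feed it are hypothetical, and pkg(5)'s real update planning handles far more (directories, links, removals, and so on).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified file action: delivered path plus content hash. */
struct file_action {
	const char *path;
	const char *hash;
};

/* Return the hash the installed set delivers for path, or NULL if absent. */
static const char *
lookup_hash(const struct file_action *installed, size_t ninstalled,
    const char *path)
{
	for (size_t i = 0; i < ninstalled; i++)
		if (strcmp(installed[i].path, path) == 0)
			return (installed[i].hash);
	return (NULL);
}

/*
 * Print and count the files in update that must be downloaded: those
 * whose path is new or whose hash differs. Unchanged files are skipped.
 */
static size_t
files_to_fetch(const struct file_action *installed, size_t ninstalled,
    const struct file_action *update, size_t nupdate)
{
	size_t n = 0;

	for (size_t i = 0; i < nupdate; i++) {
		const char *h = lookup_hash(installed, ninstalled,
		    update[i].path);

		if (h == NULL || strcmp(h, update[i].hash) != 0) {
			(void) printf("fetch %s\n", update[i].path);
			n++;
		}
	}
	return (n);
}
```

In this model, a file rebuilt with identical contents keeps its hash and costs nothing to "update", which is the property that keeps the download small.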

Friday Apr 24, 2009

simplifying building ON on an OpenSolaris system

Most of my code lives in the OS/Net consolidation of OpenSolaris (also known as ON). That's true for a lot of other OpenSolaris developers as well, since the majority of the kernel, drivers, and core system software all lives in that consolidation. (Desktop software like Gnome, X, and other pieces of the operating system live in different consolidations.)

It only makes sense to be able to compile the software I primarily work on using my OpenSolaris systems rather than relying on an SXCE build machine. A few folks, primarily Ed and Sherry, had assembled early instructions last year. The early instructions required copying some bits from SXCE systems and installing extra packages. But since then, all of the required packages have been made available in IPS form, and some new requirements have cropped up. I updated the webpage that was put together from Ed's instructions, but the list of packages to install had grown fairly large: 28 at current count. Rather than forcing people to cut and paste the entire list, I created an IPS package which contains all the required dependencies. That package integrated into OpenSolaris build 111a, now available at the standard development build repository. Once you're running 111a (and have added the extra repository to be able to add a few unfortunately non-redistributable packages), you can pkg install osnet. The package is fully specified as pkg:/developer/opensolaris/osnet.

I'll keep updating developer/opensolaris/osnet with new additions and changes, and any special instructions will be maintained on Indiana's building ON page, which was the subject of an ON heads-up, and is linked to from every place that I could think of, including the Indiana project, the ON community, and a few other spots. But I'll link again here for emphasis, since it will be the best place to get up-to-date instructions for a while:

Next up, I'm working on getting ON to spit out an IPS repo from the binaries it generates... more news on that here and in the ON community as it develops.

Wednesday Apr 08, 2009

nothing like a small crisis to motivate an upgrade to OpenSolaris

Like many other folks, we've got a small server at home which runs nameservice for our personal domain along with mail and even a webserver from time to time. To my shame (or just as a testament to my laziness/ability to leave things alone which Just Work), it had been happily running FreeBSD 2.2.mumble for over a decade. Yes. The 2.2 branch. Before modern conveniences like ELF as the default binary format, and an MT kernel. Taking a moment to pause and laugh at the antiquity of this part of my home infrastructure is entirely appropriate.

In my defense, the NFS server has been running Solaris for a very long time. Laptops run OpenSolaris, of course. We had also bought a small shuttle box quite a while ago with the intent to retire the FreeBSD box in favour of Solaris. Just never quite got around to the migration.

Yesterday morning, I received a phone call from a friend saying that my email had been bouncing for days. Sure enough, the nameserver wasn't serving, and the machine wasn't pinging. Not much could be done about the problem from work, so once we got home around 8:30pm, I trundled down the basement stairs to the console and found the system wedged. Hard. There was no response to the keyboard, and the console was full of bizarre messages from various drivers. All attempts to soft-reset the system failed, so I power cycled it. Unsurprisingly, the filesystems needed an fsck. / and user data responded favourably to fsck -y, but /var was totally scrogged. Rather than spend time trying to nurse it back to health, we figured we'd see how quickly we could switch to the new hardware and OpenSolaris.

Fortunately, by 1am the system was installed with OpenSolaris build 111 from, configured, nameservice configuration was migrated, and I learned enough about postfix to get it up and running rather than trying to migrate ancient sendmail configuration.

Details were fairly straightforward. pkg search -l in.named revealed that SUNWbind was already installed. No big deal to migrate that config, and svcadm enable dns/server.

There's no IPS package for postfix yet (and I really hope that Ceri will have the time to integrate mailwrapper, as described in PSARC 2008/759). System V packages for third-party software work great on OpenSolaris, so I would have used the lovely postfix package distributed by Ihsan Dogan, but it's compiled against a different version of libssl than we have in OpenSolaris. Downloaded the postfix source, pkg uninstall SUNWsndm; svcadm disable sendmail; svcadm disable sendmail-client, pkg install SUNWgcc, and the compile was quick and clean. To keep the software manageable, though, I did use his makePostfixPkg script to create my own package which I could install and remove at will. After pkgadd of the new package, configured, svcadm enable postfix, and mail was back online too.

It's really nice to be on OpenSolaris and a sensible upgrade path, with planned biweekly pkg image-update. But right now, I'm mostly reveling in the safety of a mirrored ZFS root. Should have upgraded a long time ago.

Monday Mar 16, 2009

faster imports

Steve Peng has done something really cool. He's taken the slowest part of the first-boot experience and made it many times faster than it was. This happened quite a while ago in Nevada/OpenSolaris, so lots of folks have probably already noticed, but I wanted to take a few moments and call it out explicitly.

In onnv_84 (in February, 2008), Steve integrated the fix for

   6351623 Initial manifest-import is slow

During the first boot of Solaris, SMF doesn't know yet what manifests have been populated on the system, so it imports everything below /var/svc/manifest. On systems with slow disks, this can take a long while, but generally folks are used to (and frustrated about!) seeing the slow progress of:

   Loading smf(5) service descriptions: 36/189

During that time, SMF is carefully taking all the manifests from /var/svc/manifest and committing the data to persistent storage. That takes a while because we weren't doing a bulk import: every single property update was a separate sqlite transaction, and each transaction had to commit to disk, which can be very slow.

Steve changed things to instead do the import into tmpfs, where commits are fast because they go to memory rather than all the way to disk, and then switch the repository back to using persistent storage after the imports have completed.
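The per-transaction cost is easy to underestimate, so here's a minimal C sketch of it. It uses plain POSIX file I/O as a stand-in for the sqlite repository (an assumption; this is not svc.configd code) and counts how many times data is forced to disk with fsync(2) when every property update is committed separately, versus when the whole import is committed once. It models only the batching side of the problem; Steve's actual fix was to import into tmpfs and then switch back to persistent storage. The function name and scratch-file path are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical model of a manifest import: write nprops property records
 * to a scratch file. With commit_each set, every record is forced to
 * stable storage with fsync(), like one transaction per property;
 * otherwise a single fsync() covers the whole batch.
 * Returns the number of fsync() calls made, or -1 on error.
 */
static int
import_props(int nprops, int commit_each)
{
	char path[] = "/tmp/import-sketch-XXXXXX";
	int fd = mkstemp(path);
	int syncs = 0;

	if (fd == -1)
		return (-1);
	(void) unlink(path);	/* the file disappears once fd is closed */

	for (int i = 0; i < nprops; i++) {
		char rec[64];
		int len = snprintf(rec, sizeof (rec), "property %d\n", i);

		if (write(fd, rec, (size_t)len) != len)
			goto fail;
		if (commit_each) {
			if (fsync(fd) == -1)	/* wait for the disk, every time */
				goto fail;
			syncs++;
		}
	}
	if (!commit_each) {
		if (fsync(fd) == -1)		/* one wait for the whole batch */
			goto fail;
		syncs++;
	}
	(void) close(fd);
	return (syncs);
fail:
	(void) close(fd);
	return (-1);
}
```

Timing the two modes on a slow disk is an easy way to see the gap for yourself. On tmpfs, where fsync() has no disk to wait for, the difference should shrink dramatically, which is the effect the real fix takes advantage of.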

This is a huge deal for performance of the manifest-import service. It significantly improves performance for the first boot after install, for the first boot of virtualized (zones, XVM, VirtualBox, etc.) deployments, and on system upgrades.

In a quick and dirty test on a bare-metal Solaris install, I found manifest-import was 3.15 times faster at importing the ~189 manifests currently included with SXDE. Steve's seen even better performance on many machines -- more like 6.6 times faster!

For those deploying diskless clients, the change in first-boot performance should be nothing short of remarkable.

Monday Mar 10, 2008

VirtualBox on Indiana

I've taken the leap and converted my desktop to run Indiana Developer Preview 2. It's looking pretty nice so far, but as the Preview title suggests, running this as a primary development machine is still a bit on the bleeding edge. So, I'm filing or updating bugs at as they pop up.

Getting VirtualBox running was one of my top priorities so that it remains easy to test in-development bits on Nevada too. So, I downloaded and installed VirtualBox for my amd64 system, and got this:

  $ /usr/bin/VirtualBox
  VirtualBox: fatal: open failed: No such file or directory

So, I found (well, David pointed me at) Indiana bug 512, which describes the missing symlink. Since it's just a missing link, I was easily able to work around it.

  # cd /usr/lib/amd64
  # ln -s ../../X11/lib/64/

Now I can launch /usr/bin/VirtualBox, and am installing Nevada right now.

Tuesday Jul 31, 2007

playa programming -- OpenSolaris at Burning Man

For my eighth year at Burning Man, the two-by-four of inspiration whacked me upside the head and stole away my summer free time. On evenings and weekends I've been working feverishly on an art installation called The Belligerent Blooms. With lots of help and indulgence from Jan and Bart, and encouragement from the rest of my campmates, the project is starting to take shape. There's been plenty of programming, hacking, drilling, sawing, soldering, and even a little bit of welding. The whole project is centered around an OpenSolaris-based system driving audio for this embedded application. There's no electricity out at Burning Man, and generators are noisy -- so with a bit of begging, we've managed to borrow two solar panels to charge the battery which runs the whole installation. For those playing along at home, that means we'll be running Sun on the Sun.

This post will (hopefully) be the first in a series about how the software and hardware of this art project were put together. The series will likely come in fits and starts, as time before the event is precious -- we depart on August 26, so posts may have to wait until we return. But before I begin the series, there's some more begging to do...

The Belligerent Blooms need your audio contributions!

The Belligerent Blooms are a garden of cranky electro-mechanical flowers, accosting passers-by with their deeply rooted beliefs. We need your unique voices to contribute to the cacophony.

Consider what you'd say if you were a belligerent bloom -- share your experiences as part of nature, opinions about humanity in general or Burning Man participants in particular, or any other flowery invective you have to offer. Try to keep the comments short and pithy. Get a giggle, challenge world views, add your voice! Above all, be belligerent!

Contribute your flowery invective by sending mail to and including .wav (preferred, and easily generated by audiorecord(1)) or other audio files. Putting your quips in separate files will greatly please the gardeners. Contributions made by August 19, 2007 will be included.

If you or someone you know attends Burning Man, the Belligerent Blooms make their home at the 'Blacklight Aquarium' theme camp, at Habitat and 7:30; just look for the big spinning fish sign in the sky.

Thursday Jul 26, 2007

zone out and speed up your development cycles

The other day, as an aside in a conversation about how often developers use certain OS features, I was asked how often I use Solaris Zones. At least weekly, and daily if I'm lucky enough to be spending my time on code.

Surprised? Userland developers on Solaris shouldn't be.

I spend a great deal of time modifying libraries and daemons started really early in the Solaris boot process. While SMF tries to dump you at an sulogin prompt if you've introduced a bug, it's still a bit of a pain to recover from some nasty failure or hang you've coded into init(1M), svc.startd(1M), or libscf(3LIB). Zones make the deploy and reboot cycle go really really fast, and recovery from late-night programming errors is a breeze!

Here's what I do:

  • Create a whole root zone. Takes up more disk space, but allows me to replace any system binary:

            # zonecfg -z test
            test: No such zone configured
            Use 'create' to begin configuring a new zone.
            zonecfg:test> create -b
            zonecfg:test> set zonepath=/test-zone
            zonecfg:test> commit
            zonecfg:test> exit
            # zoneadm -z test install
            # zoneadm -z test boot
            # zlogin -C test
               ... answer sysid questions, and log in 
  • Make sure the bits I'm compiling are relatively close to the bits installed on my desktop. Live Upgrade and I are close friends so that I can keep my desktop as up-to-date as my workspaces.

  • Create a script to dump my modified bits into the zone root. Something like this fragment:

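            # Refuse to copy anything if the zone root is missing.
            if [ ! -d "$zone" ]; then
                echo "no zone root here: $zone"
                exit 1
            fi
            # Drop the freshly built 32- and 64-bit libscf into the zone.
            cp  ${gate}/lib/libscf/i386/ ${zone}/lib
            cp  ${gate}/lib/libscf/amd64/ ${zone}/lib/amd64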
  • Compile, run script, test, debug, fix, repeat.

After my code is all basically working, then I move on to testing on the bare metal. But, the fast reboot times of zones and the easy ability to replace a broken library with a library broken in a new and different way is invaluable to making very rapid progress. I've rebooted my zone at least 15 times today. Compiling the library takes longer than the zone deployment and reboot! Every few months I mismatch libraries and commands from different workspaces and foul up my zone badly enough that it needs to be re-installed. But, zoneadm -z test uninstall; zoneadm -z test install provides a convenient excuse to go get coffee and then I'm back in business.

Tuesday Jun 27, 2006

starting ssh early in boot

It's nice that as of Solaris 9, ssh is included in Solaris. We've always started it late, though. That's a conscious decision by the ssh team: its dependencies mean that it won't start until all the local and network services any user might need, including remote filesystems, are available.

This is great for interactive environments with many home directories stored on the network. Your users don't end up with spurious and surprising "home directory not available" messages if they try to log in too early during the boot process. But if you've got a server stranded in a co-lo many miles away, this might not be what you want. You may want ssh to start as soon as the root filesystem and basic networking are available, and to be available if you boot to single-user mode. Here's how for Solaris 10 and similar releases (no guarantee of fitness for more recent bits):

  • Remove the ssh dependencies on services you don't need. The full dependency list for ssh is:

       $ svcs -d ssh
       STATE          STIME    FMRI
       online         Dec_19   svc:/network/loopback:default
       online         Dec_19   svc:/network/loopback:default
       online         Dec_19   svc:/network/physical:default
       online         Dec_19   svc:/system/filesystem/usr:default
       online         Dec_19   svc:/system/cryptosvc:default
       online         Dec_19   svc:/system/filesystem/local:default
       online         Dec_19   svc:/system/utmp:default
       online         Dec_19   svc:/system/filesystem/autofs:default

    Since I said I wanted just the network and the root filesystem to be available (oh, yeah, and the crypto services we need to do proper encryption), I'll delete everything else. That means I first look at the dependency property group names:

       # svccfg -s ssh listpg
       config_data     dependency
       cryptosvc       dependency
       net             dependency
       fs-local        dependency
       fs-autofs       dependency
       net-loopback    dependency
       net-physical    dependency
       utmp            dependency

    The dependency property groups are helpfully named and have type dependency, so I don't need to look much further to pick out the ones to delete. Deleting the dependency on utmpd is a little risky, but I'm willing to live with the risk to commands like last(1). So, I go ahead and delete the property groups which configure those dependencies.

       # svccfg -s ssh delpg fs-local
       # svccfg -s ssh delpg fs-autofs
       # svccfg -s ssh delpg utmp
       # svcadm refresh ssh
       # svcs -d ssh
       STATE          STIME    FMRI
       online         Dec_19   svc:/network/loopback:default
       online         Dec_19   svc:/network/loopback:default
       online         Dec_19   svc:/network/physical:default
       online         Dec_19   svc:/system/cryptosvc:default
       online         Dec_19   svc:/system/filesystem/local:default
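Why deleting a property group changes when ssh starts can be modelled in a few lines of C. This sketch is hypothetical and simplified: it treats every dependency as if it used smf(5)'s require_all grouping (real manifests also use other groupings, such as optional_all) and represents service state as a plain list of online FMRIs. Under that model, a service may start only when every dependency it still has is online.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one dependency property group. */
struct dep {
	const char *pg;		/* property group name, e.g. "fs-autofs" */
	const char *fmri;	/* service the dependency points at */
	bool deleted;		/* removed with svccfg delpg? */
};

/* Stand-in for service state: is fmri in the list of online services? */
static bool
is_online(const char *fmri, const char *const *online, size_t nonline)
{
	for (size_t i = 0; i < nonline; i++)
		if (strcmp(fmri, online[i]) == 0)
			return (true);
	return (false);
}

/*
 * require_all-style check: the service may start only when every
 * remaining dependency is online. Prints the first blocker it finds.
 */
static bool
can_start(const struct dep *deps, size_t ndeps,
    const char *const *online, size_t nonline)
{
	for (size_t i = 0; i < ndeps; i++) {
		if (deps[i].deleted)
			continue;
		if (!is_online(deps[i].fmri, online, nonline)) {
			(void) printf("waiting on %s (%s)\n",
			    deps[i].fmri, deps[i].pg);
			return (false);
		}
	}
	return (true);
}
```

In this model, deleting fs-autofs is exactly what lets ssh start before autofs is online; nothing else about ssh changes.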

None of this is endorsement to go around deleting dependencies willy-nilly on your system. All of them are there for a reason, and deleting them without understanding their purpose is guaranteed to lead to pain. If you have any problems with a service where you've deleted dependencies, the first step is to put them back! There are no places in which we've added dependencies to services for fun. Some of the dependencies you might see that seem superfluous are the ones that we added only after finding some very subtle breakage without them.

Tuesday May 16, 2006

Towards a SANE SMF

SANE 2006

Thanks to a very kind invitation from the conference organizers, I'm at SANE 2006 and finished up a half-day SMF tutorial earlier today. The room was pretty full, and there were plenty of good questions to keep me on my toes. I promised to post the slides on my blog, so here they are. The conference provided Certificates of Completion for anyone who stayed in my session until the end, so I'd also like to present an honorary and virtual Certificate of Completion for David Bustos and Bob Netherton, from whom I borrowed some of the slide material. :)

As this is my first time in the Netherlands, it seems necessary to include a tourist snapshot. There's a lovely old church (indeed, Oude Kerk) visible out my window; no reference to Vermeer was originally intended, but given that he's interred at Delft's Oude Kerk perhaps it was inevitable (though inappropriate for such a poorly lit photograph). The charming half-hourly bells would be more so if they weren't conspiring with my current state of jet lag to limit sleep to nap-length increments.

Tuesday Oct 04, 2005

Manifest collecting

In case you're not reading Stephen's blog, I wanted to mention that we're collecting smf(5) manifests for all sorts of software over on the OpenSolaris smf(5) community. We're getting through our backlog of manifests to include slowly but surely, so if you've written a manifest you want to share, or even just know of an interesting manifest or two out on the web, come on over and give us a pointer.

Thursday Aug 25, 2005

How does :kill work?

The :kill method token is kind of cool. Rather than maintaining a pid file or using some grungy invocation of pkill (which almost always is incorrect for services that may run on a system with zones) to find all the processes required to stop your service, you can use the :kill token in your method to simply say "kill off all the processes in this service". Works great with contract type services in startd (but not so much with transient services). You can see smf_method(5) and the developer intro for more details about :kill from the service developer's point of view.

I found out recently that I didn't talk about how the :kill method token works when I talked about general fault isolation in smf(5). It's just so simple that I forgot to mention it. But still, you need to know the key...

Since contracts already take care of grouping a service's processes for us, and we extended the appropriate kernel interfaces to allow operations on contracts, we can send a signal to every process in a contract with a single sigsend(2) call. So the contract kill function in svc.startd(1M) is mind-bogglingly simple. All you need is a contract id.

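int
contract_kill(ctid_t ctid, int sig, const char *fmri)
{
        /* Signal every process in the contract; ESRCH means none remain. */
        if (sigsend(P_CTID, ctid, sig) == -1 && errno != ESRCH) {
                log_error(LOG_WARNING,
                    "%s: Could not signal all contract members: %s\n", fmri,
                    strerror(errno));
                return (-1);
        }
        return (0);
}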

Nice, huh? Having contracts as a well-supported kernel feature makes some previously impossible things now possible, and generally makes the life of the restarter author easier. This is, to me, one of the truly significant benefits of having a kernel that evolves in concert with its userland tools.
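Contracts and sigsend(2) with P_CTID are Solaris-specific. As a rough, portable analogue, here's a minimal C sketch that uses a POSIX process group as a stand-in for a contract and killpg() in place of sigsend(2). The function names and the FMRI string are hypothetical. The analogy is imperfect, and the gap is instructive: process-group membership is voluntary (any process can leave its group with setpgid()), while contract membership is managed by the kernel, which is part of why contracts make a more reliable basis for :kill.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Portable stand-in for contract_kill(): signal every member of a
 * process group. ESRCH (nobody left to signal) is not an error.
 */
static int
group_kill(pid_t pgid, int sig, const char *fmri)
{
	if (killpg(pgid, sig) == -1 && errno != ESRCH) {
		(void) fprintf(stderr,
		    "%s: Could not signal all group members: %s\n", fmri,
		    strerror(errno));
		return (-1);
	}
	return (0);
}

/*
 * Hypothetical "service": fork nprocs idle children into one new process
 * group, stop them all with a single group_kill(), and return how many
 * were terminated by sig (or -1 on error).
 */
static int
stop_service_demo(int nprocs, int sig)
{
	pid_t pgid = 0;
	int killed = 0;

	if (nprocs <= 0)	/* killpg(0, sig) would hit our own group */
		return (0);

	for (int i = 0; i < nprocs; i++) {
		pid_t pid = fork();

		if (pid == -1) {
			if (pgid != 0)
				(void) killpg(pgid, SIGKILL);
			return (-1);
		}
		if (pid == 0) {
			/* Child: join the group (the first child founds it). */
			(void) setpgid(0, pgid);
			for (;;)
				(void) pause();
		}
		if (pgid == 0)
			pgid = pid;
		(void) setpgid(pid, pgid);	/* parent sets it too: no race */
	}

	if (group_kill(pgid, sig, "svc:/example/demo:default") == -1)
		return (-1);

	for (int i = 0; i < nprocs; i++) {
		int status;

		if (waitpid(-pgid, &status, 0) == -1)
			return (-1);
		if (WIFSIGNALED(status) && WTERMSIG(status) == sig)
			killed++;
	}
	return (killed);
}
```

Like contract_kill(), the interesting work is a single call; the rest is bookkeeping to build a group and reap it.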

Friday Aug 19, 2005

A few OSCON pics

Well, I've passed the expiration date for a useful trip report of OSCON. I'll keep this short and sweet and get a few pictures posted. It was my first time at OSCON, and I'm glad I went. It was great to go and talk about OpenSolaris and Solaris in general; among the fun was our rockstar-style suite (complete with getting admonished by the hotel for talking about OpenSolaris too loud on Wednesday night), the booth, a number of good talks by Bryan and Keith, and a small but fun BoF.

From an smf(5) point of view, I got to chat with one of Apple's launchd guys, which was definitely interesting. launchd had a very different design center and problem definition than the smf(5) team was working with, but its solution ended up with strong similarities to smf(5). The two also have some diverging functionality, but the similarities often generate comment. And to dispel any myths... launchd wasn't yet out when we were releasing smf(5) in Solaris 10, and the launchd guys say they didn't see our stuff until they were nearly complete. Chalk it up to "great minds think alike", if you're feeling charitable. :)

Jon Masters, Devon O'Dell, and I showed our stripes and shared a drink at the OpenSolaris suite.

Sara Dornsife, Patrick Finch, and Claire Giordano from OpenSolaris marketing.

Some dorky-looking geek with Teresa Giacomini of the OpenSolaris team and Casper Dik: CAB member, security guru, and Solaris expert.

Monday Aug 01, 2005

Off to OSCON

As folks like Keith and Bryan have already noted, there are plenty of OpenSolaris happenings at OSCON this year. I'll be at the OpenSolaris BOF at 8:30pm Wednesday evening, so stop by if you're around.

I'll also be knocking about the conference/Portland in general Tuesday night through Friday. Leave a message for me at my hotel (503-222-0001) if you'd like to meet up for a beer and talk about OpenSolaris, Solaris, or smf(5). I'll even be happy to help you write an smf(5) manifest for any open source or commercial service; conditional only on your willingness to publish the manifest for others to use. :)


Liane Praza-Oracle

