Tuesday Oct 13, 2009

testing ON changes with OpenSolaris installers

Installers are funny things. They boot a reduced version of the OS in order to install it, usually in some bizarre context like netboot, or with a live CD. They have all sorts of restrictions -- not being able to write to certain parts of the filesystem, or needing to be bootstrapped from tftp or other ancient crufty protocols. Furthermore, the OS bootstraps itself with less information about the system's hardware and configuration than it will have in normal operation -- it needs to put together enough information on the fly and prepare the system for normal operation. But, fundamentally, the initial installer for most OSes boots essentially the same version of the OS it's about to install.

For a developer in any core part of the system that the installer uses, that means certain types of changes must be built and tested in install context. smf(5) is certainly one of those subsystems where a subtle change may have unintended consequences on the gymnastics the OS is doing in install context. Testing is key. But assembling an install image to test for Solaris 10 and earlier used to be a horribly complicated and arcane process. And most changes don't tend to require testing in install context, so people wouldn't do 'just in case' testing. They'd only test install if they really really needed to.

Recently, I needed to figure out how to do testing of nightly ON bits in install context. I started Saturday with my ON nightly repository in hand, memories of the Solaris 10 procedure fresh in my mind, and a sense of dread. But this isn't Solaris 10 anymore, and the Install team has done a great job of changing a painful procedure into something downright delightful. I decided I wanted to test a live CD, so I...

$ pfexec pkg install distribution-constructor
$ cp /usr/share/distro_const/slim_cd/slim_cd_x86.xml ~
$ pfexec zfs create rpool/dc

I modified ~/slim_cd_x86.xml to look in my ON repository first, and then to fetch any packages it couldn't find there from a repository containing the previous build of OpenSolaris. I also removed the entire package from the manifest, as I'm using ON development bits (don't do this on a supported system). And then ran

# /usr/bin/distro_const build ~lianep/slim_cd_x86.xml

About an hour later, I had a LiveCD image and a USB image, ready to test. So much for my sense of dread. One quick note: Distribution Constructor currently requires that you first image-update the system where you're running DC to the same build as the one you're attempting to construct. The DC team knows it's an issue, and the workaround is quite straightforward as it's just "pkg image-update" to your repositories.

And that's pretty much it. The diffs for the DC manifest look like:

--- /usr/share/distro_const/slim_cd/slim_cd_x86.xml  Fri Oct  9 16:29:26 2009
+++ slim_cd_x86.xml  Sat Oct 10 16:53:39 2009
@@ -57,8 +57,8 @@
-				url="http://pkg.opensolaris.org/release"
-				authname="opensolaris.org"/>
+				url="http://ipkg.sfbay/on-nightly"
+				authname="on-nightly"/>
 			     If you want to use one or more  mirrors that are
 			     setup for the authority, specify the urls here.
@@ -75,15 +75,12 @@
 		     If you want to use one or more  mirrors that are
 		     setup for the authority, specify the urls here.
-		<!--
-		     Uncomment before using.
-				url=""
-				authname=""/>
+				url="http://ipkg.sfbay/dev"
+				authname="opensolaris.org"/>
 			<mirror url="" />
-		-->
-                       <pkg name="entire"/>

For cleanliness, I also updated the post_install_repo_default_authority and post_install_repo_addl_authority to point at the same URLs.

The install team does recommend running DC on a system that's already been updated to that package level. e.g. to build a 124 LiveCD, they suggest first image-updating the system to 124, then running distro_const(1M). That shouldn't always be necessary, but it's also easy to do given that DC requires a repository as input that you can just image-update from.

It's pretty nice that the install team has fundamentally converted a task that was challenging for ON developers before into something that's really quite easy. If you're interested in more, there's plenty, including documentation at the Caiman Project over on opensolaris.org.

Sunday Oct 11, 2009

building an ON IPS repository

I've been working with the gracious help of Mark on making the ON consolidation create a pkg(5) repository as part of the build process. If you ever build the ON consolidation from source, this is probably interesting to you.

Our changes are destined to be integrated into the main ON gate, which should happen in November 2009 sometime (though that's subject to change and doesn't constitute a promise). We've tried to make it easy for folks to build their own ON IPS repositories for testing in advance of integration of our changes. You can access the latest instructions for building your own ON repository in the README which lives in our development mercurial repository.

If you do want to try this out, I strongly recommend subscribing to on-ips-dev@opensolaris.org, as that's where we're answering questions, giving heads-ups about important changes, and having development conversations. We've got some sizable changes coming over the next few weeks, including a protocmp which works on IPS manifests.

I'm really enjoying using the same tools as we expect our customers to use. It's now pkg image-update to update my ON development bits rather than the development-only tool bfu. pkg image-update is at least as fast as bfu, if not faster, especially over a slower link. That's because pkg(5) downloads and updates only the bits which have actually changed between versions. Nice. And nice that our normal upgrade experience is now as blindingly fast as ON developers have come to expect.
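
As a rough illustration of why that matters (a sketch of the general idea only, not how pkg(5) is implemented): given two listings of "path hash" pairs, one per build, only the entries that are new or whose hash changed would need to be fetched. The changed_files name and the listing format are made up for this example.

```shell
#!/bin/sh
# Illustration only: not pkg(5) internals.
# Each listing is a text file with one "path hash" pair per line
# (a hypothetical format invented for this sketch).

# changed_files OLD NEW
# Print the paths in NEW that are absent from OLD or whose hash differs.
changed_files() {
    awk -v oldfile="$1" '
        BEGIN {
            # Load the old listing into a path -> hash table.
            while ((getline line < oldfile) > 0) {
                split(line, f, " ")
                old[f[1]] = f[2]
            }
            close(oldfile)
        }
        # Report a path when it is new or its hash has changed.
        # Appending "" forces a string comparison of the hashes.
        NF >= 2 && (!($1 in old) || (old[$1] "") != ($2 "")) { print $1 }
    ' "$2"
}
```

Called as changed_files build_old.list build_new.list (both file names are placeholders), it prints one path per line. Files that are unchanged between the two listings never show up, which is the property that keeps an update small over a slow link.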

Friday Apr 24, 2009

simplifying building ON on an OpenSolaris system

Most of my code lives in the OS/Net consolidation of OpenSolaris (also known as ON). That's true for a lot of other OpenSolaris developers as well, since the majority of the kernel, drivers, and core system software all lives in that consolidation. (Desktop software like Gnome, X, and other pieces of the operating system live in different consolidations.)

It only makes sense to be able to compile the software I primarily work on using my OpenSolaris systems rather than relying on an SXCE build machine. A few folks, primarily Ed and Sherry, had assembled early instructions last year. The early instructions required copying some bits from SXCE systems, and installing extra packages. But, since then all of the required packages have been made available in IPS form, and some new requirements have cropped up. I updated the webpage that was put together from Ed's instructions, but the list of packages to install had grown to a fairly large number -- 28 at current count. Rather than forcing people to cut and paste the entire list, I created an IPS package which contains all the dependencies required. That package was integrated into OpenSolaris build 111a, now available at the standard http://pkg.opensolaris.org/dev development build repository. Once you're running 111a (and have added the extra repository to be able to add a few unfortunately non-redistributable packages), you can pkg install osnet. The package is fully specified as pkg:/developer/opensolaris/osnet.

I'll keep updating developer/opensolaris/osnet with new additions and changes, and any special instructions will be maintained on Indiana's building ON page, which was the subject of an ON heads-up, and is linked to from every place that I could think of, including the Indiana project, the ON community, and a few other spots. But I'll link again here for emphasis, since it will be the best place to get up-to-date instructions for a while: http://opensolaris.org/os/project/indiana/building_on/.

Next up, I'm working on getting ON to spit out an IPS repo from the binaries it generates... more news on that here and in the ON community as it develops.

Wednesday Apr 08, 2009

nothing like a small crisis to motivate an upgrade to OpenSolaris

Like many other folks, we've got a small server at home which runs nameservice for our personal domain along with mail and even a webserver from time to time. To my shame (or just as a testament to my laziness/ability to leave things alone which Just Work), it had been happily running FreeBSD 2.2.mumble for over a decade. Yes. The 2.2 branch. Before modern conveniences like ELF as the default binary format, and an MT kernel. Taking a moment to pause and laugh at the antiquity of this part of my home infrastructure is entirely appropriate.

In my defense, the NFS server has been running Solaris for a very long time. Laptops run OpenSolaris, of course. We had also bought a small shuttle box quite a while ago with the intent to retire the FreeBSD box in favour of Solaris. Just never quite got around to the migration.

Yesterday morning, I received a phonecall from a friend saying that my email had been bouncing for days. Sure enough, the nameserver wasn't serving, and the machine wasn't pinging. Not much could be done about the problem from work, so after we're home around 8:30pm, I trundle down the basement stairs to the console and find the system wedged. Hard. There was no response to the keyboard, many bizarre messages from various drivers on console. All attempts to soft-reset the system failed, and I power cycled. Unsurprisingly, a fsck of the filesystems was required. / and user data responded favourably to fsck -y, but /var was totally scrogged. Rather than spend time trying to nurse it back to health, figured we'd see how quickly we could switch to the new hardware and OpenSolaris.

Fortunately, by 1am the system was installed with OpenSolaris build 111 from http://pkg.opensolaris.org/dev, configured, nameservice configuration was migrated, and I learned enough about postfix to get it up and running rather than trying to migrate ancient sendmail configuration.

Details were fairly straightforward. pkg search -l in.named revealed that SUNWbind was already installed. No big deal to migrate that config, and svcadm enable dns/server.

There's no IPS package for postfix yet (and I really hope that Ceri will have the time to integrate mailwrapper, as described in PSARC 2008/759). System V packages for third-party software work great on OpenSolaris, so I would have used the lovely postfix package distributed by Ihsan Dogan, but it's compiled against a different version of libssl than we have in OpenSolaris. Downloaded the postfix source, pkg uninstall SUNWsndm; svcadm disable sendmail; svcadm disable sendmail-client, pkg install SUNWgcc, and the compile was quick and clean. To keep the software manageable, though, I did use his makePostfixPkg script to create my own package which I could install and remove at will. After pkgadd of the new package, configured main.cf, svcadm enable postfix, and mail was back online too.

It's really nice to be on OpenSolaris and a sensible upgrade path, with planned biweekly pkg image-update. But right now, I'm mostly reveling in the safety of a mirrored ZFS root. Should have upgraded a long time ago.

Monday Mar 16, 2009

faster imports

Steve Peng has done something really cool. He's taken the slowest part of the first-boot experience and made it many *times* faster than it was. This happened quite a while ago in Nevada/OpenSolaris, so lots of folks have probably already noticed, but I wanted to take a few moments and call it out explicitly.

In onnv_84 (in February, 2008), Steve integrated the fix for

   6351623 Initial manifest-import is slow

During the first boot of Solaris, SMF doesn't know yet what manifests have been populated on the system, so it imports everything below /var/svc/manifest. On systems with slow disks, this can take a long while, but generally folks are used to (and frustrated about!) seeing the slow progress of:

   Loading smf(5) service descriptions: 36/189

During that time, SMF is carefully taking all the manifests from /var/svc/manifest and committing the data to persistent storage. That took a while because we weren't doing a bulk import; instead, every single property update was a separate sqlite transaction, and each one needed to commit to disk, which can be very slow.

Steve changed things to instead do the import into tmpfs, where commits are fast because they go to memory rather than all the way to disk, and then switch the repository back to using persistent storage after the imports have completed.
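
The shape of that change can be sketched in shell, assuming nothing about the actual SMF code: copy the repository to a memory-backed directory, apply the many small updates there, and move the result back to persistent storage in one step. The staged_import name and the one-line-per-update "repository" are hypothetical stand-ins; the real repository is an sqlite database.

```shell
#!/bin/sh
# Sketch of "do the small commits on fast storage, then persist once".
# The repository here is just a text file, and none of these names
# come from the SMF source.

# staged_import REPO FASTDIR UPDATE...
# REPO    - file on persistent storage
# FASTDIR - directory on fast storage (on Solaris, /tmp is normally tmpfs)
staged_import() {
    repo=$1
    fastdir=$2
    shift 2

    staged="$fastdir/repo.$$"
    # Start from the current contents, if there are any.
    if [ -f "$repo" ]; then
        cp "$repo" "$staged" || return 1
    else
        : > "$staged" || return 1
    fi

    # Each update is its own small write, but it only touches fast storage.
    for update in "$@"; do
        printf '%s\n' "$update" >> "$staged" || return 1
    done

    # One copy back to persistent storage, then swap it into place.
    cp "$staged" "$repo.new" && mv "$repo.new" "$repo" && rm -f "$staged"
}
```

The persistent file is written once at the end rather than once per update, which is where the time goes on a slow disk.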

This is a huge deal for performance of the manifest-import service. It significantly improves performance for the first boot after install, for the first boot of virtualized (zones, XVM, VirtualBox, etc.) deployments, and on system upgrades.

In a quick and dirty test on a bare-metal Solaris install, I found manifest-import was 3.15 times faster at importing the ~189 manifests currently included with SXDE. Steve's seen even better performance on many machines -- more like 6.6 times faster!

For those deploying diskless clients, the change in first-boot performance should be nothing short of remarkable.

Monday Mar 10, 2008

VirtualBox on Indiana

I've taken the leap and converted my desktop to run Indiana Developer Preview 2. It's looking pretty nice so far, but as the Preview title suggests, running this as a primary development machine is still a bit on the bleeding edge. So, I'm filing or updating bugs at http://defect.opensolaris.org as they pop up.

Getting VirtualBox running was one of my top priorities so that it remains easy to test in-development bits on Nevada too. So, I downloaded and installed VirtualBox for my amd64 system, and got this:

$ /usr/bin/VirtualBox
ld.so.1: VirtualBox: fatal: libX11.so.4: open failed: No such file or directory

So, I found (well, David pointed me at) Indiana bug 512, which describes the missing symlink. Since it's just a missing link, I was easily able to work around it.

  # cd /usr/lib/amd64
  # ln -s ../../X11/lib/64/libX11.so.6 libX11.so.4

Now I can launch /usr/bin/VirtualBox, and am installing Nevada right now.
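
If the workaround ends up in a setup script, it may be worth guarding it so that it doesn't touch a real library once the bug is fixed. A minimal sketch for a POSIX shell, where ensure_link is a hypothetical helper name:

```shell
#!/bin/sh
# ensure_link TARGET LINK
# Create LINK -> TARGET only when LINK does not already exist.
# (ensure_link is a made-up helper, not part of any Solaris tool.)
ensure_link() {
    target=$1
    link=$2

    # -h catches symlinks (even dangling ones); -e catches real files.
    if [ -h "$link" ] || [ -e "$link" ]; then
        echo "leaving existing $link alone"
        return 0
    fi
    ln -s "$target" "$link"
}
```

Run as root from /usr/lib/amd64, the call matching the commands above would be ensure_link ../../X11/lib/64/libX11.so.6 libX11.so.4.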

Tuesday Jul 31, 2007

playa programming -- OpenSolaris at Burning Man

For my eighth year at Burning Man, the two-by-four of inspiration whacked me upside the head and stole away my summer free time. On evenings and weekends I've been working feverishly on an art installation called The Belligerent Blooms. With lots of help and indulgence from Jan, Bart, and encouragement from the rest of my campmates, the project is starting to take shape. There's been plenty of programming, hacking, drilling, sawing, soldering, and even a little bit of welding. The whole project is centered around an OpenSolaris-based system driving audio for this embedded application. There's no electricity out at Burning Man, and generators are noisy -- so with a bit of begging, we've managed to borrow 2 solar panels to charge the battery which runs the whole installation. For those playing along at home, that means we'll be running Sun on the Sun.

This post will (hopefully) be the first in a series about how the software and hardware of this art project was put together. The series will likely come in fits and starts as time before the event is precious -- we depart on August 26, so blogs may have to wait until we return. But, before I begin the series, there's some more begging to do...

The Belligerent Blooms need your audio contributions!

The Belligerent Blooms are a garden of cranky electro-mechanical flowers, accosting passers-by with their deeply rooted beliefs. We need your unique voices to contribute to the cacophony.

Consider what you'd say if you were a belligerent bloom -- share your experiences as part of nature, opinions about humanity in general or Burning Man participants in particular, or any other flowery invective you have to offer. Try to keep the comments short and pithy. Get a giggle, challenge world views, add your voice! Above all, be belligerent!

Contribute your flowery invective by sending mail to belligerent.blooms@gmail.com and including .wav (preferred, and easily generated by audiorecord(1)) or other audio files. Putting your quips in separate files will greatly please the gardeners. Contributions made by August 19, 2007 will be included.

If you, or someone you know, will be at Burning Man, the Belligerent Blooms make their home at the 'Blacklight Aquarium' theme camp. Habitat and 7:30; just look for the big spinning fish sign in the sky.


Thursday Jul 26, 2007

zone out and speed up your development cycles

The other day, as an aside in a conversation about how often developers use certain OS features, I was asked how often I use Solaris Zones. At least weekly, and daily if I'm lucky enough to be spending my time on code.

Surprised? Userland developers on Solaris shouldn't be.

I spend a great deal of time modifying libraries and daemons started really early in the Solaris boot process. While SMF tries to dump you at an sulogin prompt if you've introduced a bug, it's still a bit of a pain to recover from some nasty failure or hang you've coded into init(1M), svc.startd(1M), or libscf(3LIB). Zones make the deploy and reboot cycle go really really fast, and recovery from late-night programming errors is a breeze!

Here's what I do:

  • Create a whole root zone. Takes up more disk space, but allows me to replace any system binary:

            # zonecfg -z test
            test: No such zone configured
            Use 'create' to begin configuring a new zone.
            zonecfg:test> create -b
            zonecfg:test> set zonepath=/test-zone
            zonecfg:test> commit
            zonecfg:test> exit
            # zoneadm -z test install
            # zoneadm -z test boot
            # zlogin -C test
               ... answer sysid questions, and log in 
  • Make sure the bits I'm compiling are relatively close to the bits installed on my desktop. Live Upgrade and I are close friends so that I can keep my desktop as up-to-date as my workspaces.

  • Create a script to dump my modified bits into the zone root. Something like this fragment:

            if [ ! -d "$zone" ]; then
                echo "no zone root here: $zone"
                exit 1
            fi
            cp  ${gate}/lib/libscf/i386/libscf.so.1 ${zone}/lib
            cp  ${gate}/lib/libscf/amd64/libscf.so.1 ${zone}/lib/amd64
  • Compile, run script, test, debug, fix, repeat.
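
A slightly fuller version of that fragment, written as a function so the check and the copies stay together. The deploy_bits name and its two arguments are placeholders; the library paths mirror the fragment above, and a real script would list whatever files are being modified.

```shell
#!/bin/sh
# deploy_bits GATE ZONE
# GATE - root of the built bits (placeholder; wherever the build puts them)
# ZONE - root of the test zone's filesystem
deploy_bits() {
    gate=$1
    zone=$2

    if [ ! -d "$zone" ]; then
        echo "no zone root here: $zone" >&2
        return 1
    fi

    # Copy the 32-bit and 64-bit libscf; return non-zero on the first
    # failure so a failed copy is reported rather than ignored.
    cp "$gate/lib/libscf/i386/libscf.so.1" "$zone/lib" || return 1
    cp "$gate/lib/libscf/amd64/libscf.so.1" "$zone/lib/amd64" || return 1
}
```

Checking the return value before rebooting the zone avoids testing against a stale library by accident.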

After my code is all basically working, then I move on to testing on the bare metal. But, the fast reboot times of zones and the easy ability to replace a broken library with a library broken in a new and different way is invaluable to making very rapid progress. I've rebooted my zone at least 15 times today. Compiling the library takes longer than the zone deployment and reboot! Every few months I mismatch libraries and commands from different workspaces and foul up my zone badly enough that it needs to be re-installed. But, zoneadm -z test uninstall; zoneadm -z test install provides a convenient excuse to go get coffee and then I'm back in business.



Liane Praza

