Thursday Jun 04, 2009

Getting with the program in roughly an hour.

Note: This is mostly useful to developers with existing nevada-based build machines.

This morning I came in to an e-mail describing a couple of ways in which live-upgrade was blowing up on our (nevada-based) primary x86 build machine. This wasn't really news to me, but the machine was now stuck on bits that pre-dated some build machine flag days and really did need to be moved forward. We could either work around all the lu issues and stay on the pre-IPS nevada train, or get the same bits via IPS by moving to opensolaris (the distro; we're already running the OS, and none of this changes the source base).

So, since the system was idle, I formed a quick plan to move to opensolaris and less than an hour and 15 minutes later it was successfully producing full nightly builds. This included a couple of trips up and down three flights of stairs and is dramatically less time than a live-upgrade would have consumed.

Here's what I did:

I knew that the system had graphical console redirection, so I could just use the live-cd. I didn't want to deal with the virtual CD/DVD drive over the 100Mbit link to my office if I could help it, so I kicked off a DVD burner to produce a 2009.06 111b CD from an image I had downloaded a while back.

While it was burning I halted the system and ran downstairs to check if it had a physical CD/DVD drive. It did. I also swapped one of the (zfs mirrored) root drives for a blank one so I could fall back to it or reference it if worst came to worst. I also disconnected the JBOD with the build workspaces, mostly to make things probe faster as I'd be rebooting the system a couple of times, but also to keep my data safely away from my brain, which might get one of the controller numbers mixed up.

Back in my office the DVD was burned (it's CD sized, but I'm using DVD+RW media). I ran downstairs with it, put it in the system, and went back upstairs to my office, where I grabbed the console redirection. Once the live-cd was on-line I pointed the installer at one of the disks and got it started. I then paused it to mirror and enable compression on the root pool, more or less:

pstop `pgrep cpio`
zfs set compression=on rpool/ROOT/opensolaris
# used format to verify that the disks were the same size and
# that they were using whole disk Solaris2 fdisk partitions with
# s0 spanning all but the first
zpool attach -f rpool <existing-disk>s0 <new-disk>s0
# run zpool status and wait roughly a minute for resilver to finish
prun `pgrep cpio`
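
The "run zpool status and wait" step can also be scripted instead of watched by hand. What follows is only a minimal sketch, assuming that zpool status prints the phrase "resilver in progress" while the mirror is still syncing (worth checking against the output on your own build). The names wait_for_resilver and STATUS_CMD are made up for illustration and are not part of any ZFS tool.

```shell
# Succeeds (exit 0) while the text on stdin reports an active resilver.
# Assumption: the status text contains "resilver in progress".
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Poll a pool until it no longer reports a resilver in progress.
#   $1 = pool name, $2 = seconds between polls (default 10)
# STATUS_CMD is a hypothetical override so the loop can be tried
# without a real pool; by default it runs "zpool status".
wait_for_resilver() {
    pool=$1
    interval=${2:-10}
    while ${STATUS_CMD:-zpool status} "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
}
```

With the functions defined, running wait_for_resilver rpool between the zpool attach and the prun would hold the installer paused until the mirror has caught up.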

Then rather than rebooting the system at the end, I halted it. I ran back downstairs and removed the CD and re-connected the JBOD. Back in my office I booted -s and ran sys-unconfig. Note: I didn't think of it at the time, but before running sys-unconfig might have been the right time to edit /rpool/boot/grub/menu.lst to remove all the splashimage loads and console=graphics and set console to ttya as it would have saved a later reboot.
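
The menu.lst edit mentioned in that note could be done with sed rather than by hand. This is a rough sketch only: it assumes the stock menu has splashimage lines and a kernel$ line carrying console=graphics in its -B properties, and it reads "set console to ttya" as swapping that one property. The function name to_serial_console is invented for the example, and it writes to stdout so the real file is never touched until you have looked at the result.

```shell
# Print a copy of a GRUB menu.lst with the graphical bits removed.
#   $1 = path to menu.lst
# Assumptions: splashimage directives sit on their own lines, and the
# console is selected with a "console=graphics" boot property.
to_serial_console() {
    sed -e '/^[[:space:]]*splashimage[[:space:]]/d' \
        -e 's/console=graphics/console=ttya/' \
        "$1"
}
```

For example, to_serial_console /rpool/boot/grub/menu.lst > /tmp/menu.lst.new leaves a candidate file to diff against the original before copying it into place.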

The sys-unconfig rebooted the system, which came up in sysid. I ran through it, in essence setting things to english/C, PST, static IP w/ipv6, nisdomain and NFSv4 domain, and the system then rebooted again.

Now I could log in remotely. To get the bits needed to run a build, I connected the system to the internal dev (not required, but desirable) and extras repositories by doing the following:

pkg set-publisher -P -O http://ipkg.sfbay/dev/
pkg set-publisher -O http://ipkg.sfbay/extra/ extra
pkg install osnet

I also had to:

cd /opt
ln -s /ws/onnv-tools/teamware
ln -s /ws/onnv-tools/SUNWspro/SS12 SUNWspro

To get the build workspaces back on-line from the zpool on the JBOD, I simply ran:

zpool import -f builds

Since things like whether or not a filesystem is shared are set as fs properties with zfs, that configuration was all preserved. Neat.
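
One way to confirm that the sharing configuration really did come along with the pool is to list the sharenfs property after the import. A small sketch, assuming the tab-separated output of zfs get -H -o name,value; the helper name shared_filesystems is made up for this example.

```shell
# Read "name<TAB>value" lines (as printed by
#   zfs get -r -H -o name,value sharenfs <pool>
# ) on stdin and print the datasets that are actually shared,
# i.e. whose sharenfs value is neither "off" nor "-" (not applicable).
shared_filesystems() {
    awk -F '\t' '$2 != "off" && $2 != "-" { print $1 }'
}
```

Something like zfs get -r -H -o name,value sharenfs builds | shared_filesystems would then print just the workspaces that are still exported.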

Finally I edited the GRUB menu as described above and ran a pkg image-update to get any bits that weren't up to date on the CD (not much at this point) and did a final reboot.



