A not-quite-isochronous upgrade
By sch on Aug 09, 2006
In the midst of many pieces of work, a new machine showed up on my doorstep. It's pretty sweet, and I wanted to get it running the latest Solaris Nevada bits. So I repartitioned the disk to make room for an additional boot environment (BE), updated the Live Upgrade support, upgraded (to Build 44), and got back to work.
But this week I saw something pretty strange. I was working on making some software more portable, updating some Makefiles as well as changing some of the source. On a rebuild, my gcc compilation failed with
collect2: ld terminated with signal 9 [Killed]
ld.so.1: ld: fatal: libld.so.4: version `SUNWprivate_4.2' not found (required by file /usr/ccs/bin/ld)
ld.so.1: ld: fatal: libld.so.4: open failed: No such file or directory
ld.so.1: ld: fatal: relocation error: file /usr/ccs/bin/ld: symbol ld32_main: referenced symbol not found
After checking with a few neighbours about any recent linker fixes, and reviewing package installation times, we finally started looking inside the binaries. The version from the install server showed Build 44:
$ mcs -p reloc/usr/ccs/bin/ld
reloc/usr/ccs/bin/ld:
@(#)SunOS 5.11 snv_44 October 2007
But the version installed on the machine was from the future:
$ mcs -p /usr/ccs/bin/ld
/usr/ccs/bin/ld:
@(#)SunOS 5.11 snv_45 October 2007
What were we running on this box?
$ cat /etc/motd
Sun Microsystems Inc. SunOS 5.11 snv_44 October 2007
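The comparison made by hand above (the build tag in a binary's comment section versus the build tag the system reports) can be sketched in a few lines of Python. This is only an illustration: the function names build_tag and mismatched are hypothetical, and it assumes the comment and banner strings have already been collected, for example from mcs -p and /etc/motd, and that build tags always look like snv_NN.

```python
import re

# Assumption: Nevada build tags always have the form "snv_<digits>".
BUILD_TAG = re.compile(r"\bsnv_\d+\b")


def build_tag(text):
    """Return the first build tag found in text, or None if there is none."""
    match = BUILD_TAG.search(text)
    return match.group(0) if match else None


def mismatched(system_banner, file_comments):
    """Map each path whose build tag differs from the banner's to its tag.

    file_comments maps a path to the comment string recorded in that file.
    """
    expected = build_tag(system_banner)
    return {
        path: build_tag(comment)
        for path, comment in file_comments.items()
        if build_tag(comment) != expected
    }


if __name__ == "__main__":
    banner = "Sun Microsystems Inc. SunOS 5.11 snv_44 October 2007"
    comments = {"/usr/ccs/bin/ld": "@(#)SunOS 5.11 snv_45 October 2007"}
    print(mismatched(banner, comments))  # {'/usr/ccs/bin/ld': 'snv_45'}
```

Run against the strings shown above, the sketch flags /usr/ccs/bin/ld as coming from a different build than the one the system claims to be running.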
I suppose if this were an Encyclopedia Brown mystery, you would have to flip to the end of the book to discover the solution. Unfortunately, I haven't given you quite enough information--you need to know how our install servers share new images.
As you might expect, there is a large set of install servers available
across Sun. Some are run by IT operations, some are in development
labs—but all pull their images across at different times.
It turns out that the most recent build is linked to a directory called
"latest", so that one can reinstall a system every two weeks and
have it running the most recently assembled version of Solaris.
Generally, the action to get a copy of the newest image and update the
links happens automatically, and outside of business hours.
But sometimes in a development lab, someone wants an image
early, perhaps to upgrade a collection of test machines for verification
purposes. And, it turns out,
luupgrade(1M) works just fine across
such a rename, if it's not specifically accessing the filesystem during
the image changeover. So my Live Upgrade installed a swathe of packages
from 44 followed by a smattering from 45.
One more Live Upgrade (using the numbered path, rather than
"latest") to Build 45, and it's back to porting.
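Why the numbered path avoids the problem can be shown with a small, self-contained simulation. It is a sketch under assumptions, not a description of how the install servers or luupgrade actually work: it assumes "latest" is a symbolic link to a per-build directory, and the names pin_image and simulate_changeover are hypothetical. The idea is that resolving the link once, before any files are read, fixes the directory for the rest of the operation, whereas every later lookup through "latest" follows wherever the link points at that moment.

```python
import os
import tempfile


def pin_image(path):
    """Resolve symbolic links once, yielding a path that no longer moves."""
    return os.path.realpath(path)


def simulate_changeover():
    """Return (build seen via the pinned path, build seen via 'latest')."""
    with tempfile.TemporaryDirectory() as root:
        # Two image directories, standing in for Build 44 and Build 45.
        for build in ("snv_44", "snv_45"):
            os.mkdir(os.path.join(root, build))

        # "latest" initially points at Build 44.
        latest = os.path.join(root, "latest")
        os.symlink("snv_44", latest)

        # An upgrade that pins its source resolves the link up front.
        pinned = pin_image(latest)

        # Mid-operation, the link is switched to Build 45.
        replacement = os.path.join(root, "latest.new")
        os.symlink("snv_45", replacement)
        os.replace(replacement, latest)

        # The pinned path still names Build 44; "latest" now names Build 45.
        via_pinned = os.path.basename(pinned)
        via_latest = os.path.basename(os.path.realpath(latest))
        return via_pinned, via_latest


if __name__ == "__main__":
    print(simulate_changeover())  # ('snv_44', 'snv_45')
```

An operation that keeps reading through "latest" after the switch would pick up files from the second directory, which matches the mix of Build 44 and Build 45 packages described above.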
This problem isn't going to be seen very often—particularly if you don't have a multi-version install server setup like we do—but it seems appropriate to document the underlying issue: Live Upgrade doesn't detect an image change during the operation. I'm not sure it should, but it's pretty clear that the underlying package operations are incomplete with respect to versioned dependencies.