Live Upgrade, /var/tmp and the Ever Growing Boot Environments

Even if you are a veteran Live Upgrade user, you might be caught by surprise when your new ZFS root pool starts filling up, and you have no idea where the space is going. I tripped over this one while installing different versions of StarOffice and OpenOffice and forgot that they left a rather large parcel behind in /var/tmp. When recently helping a customer through some Live Upgrade issues, I noticed that they were downloading patch clusters into /var/tmp and then I remembered that I used to do that too.

And then stopped. This is why. What follows has been added to the list of Common Live Upgrade Problems, as Number 3.

Let's start with a clean installation of Solaris 10 10/09 (u8).

# df -k /
Filesystem                       kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10x_u8wos_08a      20514816 4277560 13089687    25%    /

So far, so good. Solaris is just a bit over 4GB. Another 3GB is used by the swap and dump devices. That should leave plenty of room for half a dozen or so patch cycles (assuming 1GB each) and an upgrade to the next release.

Now, let's put on the latest recommended patch cluster. Note that I am following the suggestions in my Live Upgrade Survival Guide, installing the prerequisite patches and the LU patch before actually installing the patch cluster.

# cd /var/tmp
# wget patchserver:/export/patches/ .
# unzip -qq

# wget patchserver:/export/patches/
# unzip 121431-69

# cd 10x_Recommended
# ./installcluster --apply-prereq --passcode (you can find this in README)

# patchadd -M /var/tmp 121431-69

# lucreate -n s10u8-2012-01-05
# ./installcluster -d -B s10u8-2012-01-05 --passcode

# luactivate s10u8-2012-01-05
# init 0

After the new boot environment is activated, let's upgrade to the latest release of Solaris 10. In this case, it will be Solaris 10 8/11 (u10).

Yes, this does seem like an awful lot is happening in a short period of time. I'm trying to demonstrate a situation that really does happen when you forget something as simple as a patch cluster clogging up /var/tmp. Think of this as one of those time lapse video sequences you might see in a nature documentary.

# pkgrm SUNWluu SUNWlur SUNWlucfg
# pkgadd -d /cdrom/sol_10_811_x86  SUNWluu SUNWlur SUNWlucfg
# patchadd -M /var/tmp 121431-69

# lucreate -n s10u10-baseline'
# echo "autoreg=disable" > /var/tmp/no-autoreg
# luupgrade -u -s /cdrom/sol_10_811_x86 -k /var/tmp/no-autoreg -n s10u10-baseline
# luactivate s10u10-baseline
# init 0
As before, everything went exactly as expected. Or I thought so, until I logged in the first time and checked the free space in the root pool.
# df -k /
Filesystem                       kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline     20514816 10795038 2432308    82%    /
Where did all of the space go ? Back of the napkin calculations of 4.5GB (s10u8) + 4.5GB (s10u10) + 1GB (patch set) + 3GB (swap and dump) = 13GB. 20GB pool - 13GB used = 7GB free. But there's only 2.4GB free ?

This is about the time that I smack myself on the forehead and realize that I put the patch cluster in the /var/tmp. Old habits die hard. This is not a problem, I can just delete it, right ?

Not so fast.

# du -sh /var/tmp
 5.4G   /var/tmp

# du -sh /var/tmp/10*
 3.8G   /var/tmp/10_x86_Recommended
 1.5G   /var/tmp/

# rm -rf /var/tmp/10*

# du -sh /var/tmp
 3.4M   /var/tmp

Imagine the look on my face when I check the pool free space, expecting to see 7GB.
# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5074262 2424603    68%    /

We are getting closer. At least my root filesystem size is reasonable (5GB vs 11GB). But the free space hasn't changed at all.

Once again, I smack myself on the forehead. The patch cluster is also in the other two boot environments. All I have to do is get rid them too, and I'll get my free space back.

# lumount s10u8-2012-01-05 /mnt
# rm -rf /mnt/var/tmp/10_x86_Recommended*
# luumount s10u8-2012-01-05

# lumount s10x_u8wos_08a /mnt
# rm -rf /mnt/var/tmp/10_x86_Recommended*
# luumount s10x_u8wos_08a
Surely, that will get my free space reclaimed, right ?
# df -k /
Filesystem                    kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline  20514816 5074265 2429261    68%    /

This is when I smack myself on the forehead for the third time in one afternoon. Just getting rid of them in the boot environments is not sufficient. It would be if I were using UFS as a root filesystem, but lucreate will use the ZFS snapshot and cloning features when used on a ZFS root. So the patch cluster is in the snapshot, and the oldest one at that.

Let's try this all over again, but this time I will put the patches somewhere else that is not part of a boot environment. If you are thinking of using root's home directory, think again - it is part of the boot environment. If you are running out of ideas, let me suggest that /export/patches might be a good place to put them.

Doing the exercise again, with the patches in /export/patches, I get similar results (to be expected), but with one significant different.This time the patches are in a shared ZFS dataset (/export) and can be deleted.

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10x_u8wos_08a             yes      no     no        yes    -         
s10u8-2012-01-05           yes      no     no        yes    -         
s10u10-baseline            yes      yes    yes       no     -         

# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5184578 2445140    68%    /

# df -k /export
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/export                  20514816 5606384 2445142    70%    /export

This time, when I delete them, the disk space will be reclaimed.
# rm -rf /export/patches/10_x86_Recommended*

# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5184578 8048050    40%    /

Now, that's more like it. With this free space, I can continue to patch and maintain my system as I had originally planned - estimating a few hundred MB to 1.5GB per patch set.

The moral to the story is that even if you follow all of the best practices and recommendations, you can still be tripped up by old habits when you don't consider their consequences. And when you do, don't feel bad. Many best practices come from exercises just like this one.

Technocrati Tags:


I have been there, Bob. =-) Nothing goes away, so long as you're using snapshots (for LU or just ad-hoc). My coping mechanisms include:

o ludelete older BE's more aggressively, where possible.
I really don't need that BE from two updates ago, even if it looks cool to have a dozen ABE's on the list.

o Use lzjb compression under rpool/ROOT
Some folks don't use this in root, but I've found it to be helpful.
CPU usage is very reasonable, and I typically see 1.7x or 1.8x.
In some cases, you may actually improve performance via fewer IOPS.

o Keep patches in a centralized location (hello pca-proxy.cgi =-)

Things I used to do, but don't need any longer:

o Aggressively delete obsolete.Z backout files - not much help

o Keep all backout files somewhere other than /var/sadm/pkg
Now this one is a bit more controversial, but those JRE undo.Z files can be hundreds of megabytes.

o Don't keep backout files at all
If you're using LU, then did you really need them anyways? =-)

Thx... -c

Posted by Craig S. Bell on January 13, 2012 at 12:49 PM CST #

/var/tmp really should be named /var/landfill, because stuff goes there and never leaves. Thanks for writing this, it's something that you really have to manage differently in a multiple-BE with ZFS world.

Posted by Dave Miner on January 14, 2012 at 01:49 AM CST #

Very useful post. I've been there too. With ZFS we need to retrain ourselves to think about snapshots very early on when a filesystem fills - especially when it isn't obvious what's eating up all of the space. Usually I list the snapshots and they remind me of those boot environments I've got laying around - often gathering cyber-dust.

Posted by guest on January 17, 2012 at 07:00 AM CST #

Post a Comment:
  • HTML Syntax: NOT allowed

Bob Netherton is a Principal Sales Consultant for the North American Commercial Hardware group, specializing in Solaris, Virtualization and Engineered Systems. Bob is also a contributing author of Solaris 10 Virtualization Essentials.

This blog will contain information about all three, but primarily focused on topics for Solaris system administrators.

Please follow me on Twitter Facebook or send me email


« April 2014