Getting Rid of Pesky Live Upgrade Boot Environments
By user12611829 on May 21, 2009
Oh where oh where did that file system go ?

One thing you can do to stop Live Upgrade in its tracks is to remove a file system that it thinks another boot environment needs. This does fall into the category of user error, but you are more likely to run into this in a ZFS world where file systems can be created and destroyed with great ease. You will also run into a variant of this if you change your zone configurations without recreating your boot environment, but I'll save that for a later day.
Here is our simple test case:
- Create a ZFS file system.
- Create a new boot environment.
- Delete the ZFS file system.
- Watch Live Upgrade fail.
    # zfs create arrakis/temp

    # lucreate -n test
    Checking GRUB menu...
    System has findroot enabled GRUB
    Analyzing system configuration.
    Comparing source boot environment <s10u7-baseline> file systems with the file system(s) you specified for the new boot environment.
    Determining which file systems should be in the new boot environment.
    Updating boot environment description database on all BEs.
    Updating system configuration files.
    Creating configuration for boot environment <test>.
    Source boot environment is <s10u7-baseline>.
    Creating boot environment <test>.
    Cloning file systems from boot environment <s10u7-baseline> to create boot environment <test>.
    Creating snapshot for <rpool/ROOT/s10u7-baseline> on <rpool/ROOT/s10u7-baseline@test>.
    Creating clone for <rpool/ROOT/s10u7-baseline@test> on <rpool/ROOT/test>.
    Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/test>.
    Saving existing file </boot/grub/menu.lst> in top level dataset for BE <s10u6_baseline> as <mount-point>//boot/grub/menu.lst.prev.
    Saving existing file </boot/grub/menu.lst> in top level dataset for BE <test> as <mount-point>//boot/grub/menu.lst.prev.
    Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv114> as <mount-point>//boot/grub/menu.lst.prev.
    Saving existing file </boot/grub/menu.lst> in top level dataset for BE <route66> as <mount-point>//boot/grub/menu.lst.prev.
    Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv95> as <mount-point>//boot/grub/menu.lst.prev.
    File </boot/grub/menu.lst> propagation successful
    Copied GRUB menu from PBE to ABE
    No entry for BE <test> in GRUB menu
    Population of boot environment <test> successful.
    Creation of boot environment <test> successful.

    # zfs destroy arrakis/temp

    # luupgrade -t -s /export/patches/10_x86_Recommended-2009-05-14 -O "-d" -n test
    System has findroot enabled GRUB
    No entry for BE <test> in GRUB menu
    Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
    The media contains 143 software patches that can be added.
    All 143 patches will be added because you did not specify any specific patches to add.
    Mounting the BE <test>.
    ERROR: Read-only file system: cannot create mount point </.alt.tmp.b-59c.mnt/arrakis/temp>
    ERROR: failed to create mount point </.alt.tmp.b-59c.mnt/arrakis/temp> for file system </arrakis/temp>
    ERROR: unmounting partially mounted boot environment file systems
    ERROR: cannot mount boot environment by icf file </etc/lu/ICF.5>
    ERROR: Unable to mount ABE <test>: cannot complete lumk_iconf
    Adding patches to the BE <test>.
    Validating patches...
    Loading patches installed on the system...
    Cannot check name /a/var/sadm/pkg.
    Unmounting the BE <test>.
    The patch add to the BE <test> failed (with result code <1>).

The proper Live Upgrade solution to this problem would be to destroy and recreate the boot environment, or just recreate the missing file system (I'm sure that most of you have figured the latter part out on your own). The rationale is that the alternate boot environment no longer matches the storage configuration of its source. This was fine in a UFS world, but perhaps a bit constraining when ZFS rules the landscape. What if you really wanted the file system to be gone forever?
With a little more understanding of the internals of Live Upgrade, we can fix this rather easily.
Important note: We are about to modify undocumented Live Upgrade configuration files. The formats, names, and contents are subject to change without notice and any errors made while doing this can render your Live Upgrade configuration unusable.
The file system configurations for each boot environment are kept in a set of Internal Configuration Files (ICF) in /etc/lu named ICF.n, where n is the boot environment number. From the error message above we see that /etc/lu/ICF.5 is the one that is causing the problem. Let's take a look.
    # cat /etc/lu/ICF.5
    test:-:/dev/dsk/c5d0s1:swap:4225095
    test:-:/dev/zvol/dsk/rpool/swap:swap:8435712
    test:/:rpool/ROOT/test:zfs:0
    test:/archives:/dev/dsk/c1t0d0s2:ufs:327645675
    test:/arrakis:arrakis:zfs:0
    test:/arrakis/misc:arrakis/misc:zfs:0
    test:/arrakis/misc2:arrakis/misc2:zfs:0
    test:/arrakis/stuff:arrakis/stuff:zfs:0
    test:/arrakis/temp:arrakis/temp:zfs:0
    test:/audio:arrakis/audio:zfs:0
    test:/backups:arrakis/backups:zfs:0
    test:/export:arrakis/export:zfs:0
    test:/export/home:arrakis/home:zfs:0
    test:/export/iso:arrakis/iso:zfs:0
    test:/export/linux:arrakis/linux:zfs:0
    test:/rpool:rpool:zfs:0
    test:/rpool/ROOT:rpool/ROOT:zfs:0
    test:/usr/local:arrakis/local:zfs:0
    test:/vbox:arrakis/vbox:zfs:0
    test:/vbox/fedora8:arrakis/vbox/fedora8:zfs:0
    test:/video:arrakis/video:zfs:0
    test:/workshop:arrakis/workshop:zfs:0
    test:/xp:/dev/dsk/c2d0s7:ufs:70396830
    test:/xvm:arrakis/xvm:zfs:0
    test:/xvm/fedora8:arrakis/xvm/fedora8:zfs:0
    test:/xvm/newfs:arrakis/xvm/newfs:zfs:0
    test:/xvm/nv113:arrakis/xvm/nv113:zfs:0
    test:/xvm/opensolaris:arrakis/xvm/opensolaris:zfs:0
    test:/xvm/s10u5:arrakis/xvm/s10u5:zfs:0
    test:/xvm/ub710:arrakis/xvm/ub710:zfs:0

The first step is to clean up the mess left by the failing luupgrade attempt. At the very least we will need to unmount the alternate boot environment root. It is also very likely that we will have to unmount a few temporary directories, such as /tmp and /var/run. Since this is ZFS we will also have to remove the directories created when these file systems were mounted.
    # df -k | tail -3
    rpool/ROOT/test      49545216 6879597 7546183    48%    /.alt.tmp.b-Fx.mnt
    swap                  4695136       0 4695136     0%    /a/var/run
    swap                  4695136       0 4695136     0%    /a/tmp

    # luumount test
    # umount /a/var/run
    # umount /a/tmp
    # rmdir /a/var/run /a/var /a/tmp

Next we need to remove the missing file system entry from the current copy of the ICF file. Use whatever method you prefer (vi, perl, grep). Once we have corrected our local copy of the ICF file we must propagate it to the alternate boot environment we are about to patch. You can skip the propagation if you are going to delete the boot environment without doing any other maintenance activities. The normal Live Upgrade operations will take care of propagating the ICF files to the other boot environments, so we should not have to worry about them at this time.
    # mv /etc/lu/ICF.5 /tmp/ICF.5
    # grep -v arrakis/temp /tmp/ICF.5 > /etc/lu/ICF.5
    # cp /etc/lu/ICF.5 `lumount test`/etc/lu/ICF.5
    # luumount test

At this point we should be good to go. Let's try the luupgrade again.
    # luupgrade -t -n test -O "-d" -s /export/patches/10_x86_Recommended-2009-05-14
    System has findroot enabled GRUB
    No entry for BE <test> in GRUB menu
    Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
    The media contains 143 software patches that can be added.
    All 143 patches will be added because you did not specify any specific patches to add.
    Mounting the BE <test>.
    Adding patches to the BE <test>.
    Validating patches...
    Loading patches installed on the system...
    Done!
    Loading patches requested to install.
    Approved patches will be installed in this order:
    118668-19 118669-19 119214-19 123591-10 123896-10 125556-03 139100-02
    Checking installed patches...
    Verifying sufficient filesystem capacity (dry run method)...
    Installing patch packages...
    Patch 118668-19 has been successfully installed.
    Patch 118669-19 has been successfully installed.
    Patch 119214-19 has been successfully installed.
    Patch 123591-10 has been successfully installed.
    Patch 123896-10 has been successfully installed.
    Patch 125556-03 has been successfully installed.
    Patch 139100-02 has been successfully installed.
    Unmounting the BE <test>.
    The patch add to the BE <test> completed.

Now that the alternate boot environment has been patched, we can activate it at our convenience.
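For anyone who would rather not hand-edit the ICF file, here is a minimal sketch of the same repair. It assumes the colon-separated layout seen in the listing earlier (BE name, mount point, device or dataset, file system type, size), which is inferred from that output and is not a documented interface. The function name icf_prune and the scratch file names are made up for this illustration. The list of existing datasets is passed in as a file so the sketch can be tried on copies before going anywhere near /etc/lu.

```shell
#!/bin/sh
# Sketch: drop ICF entries whose ZFS dataset no longer exists.
# Assumed field layout, inferred from the listing in this article:
#   BE name : mount point : device or dataset : fs type : size

# icf_prune ICF_FILE DATASET_LIST   (hypothetical helper)
#   DATASET_LIST has one existing dataset name per line.
#   The pruned ICF contents go to stdout; nothing is edited in place.
icf_prune() {
    awk -F: -v list="$2" '
        BEGIN {
            # Load the names of the datasets that still exist.
            while ((getline name < list) > 0) exists[name] = 1
            close(list)
        }
        # Skip ZFS entries whose dataset (field 3) is not in the list.
        $4 == "zfs" && !($3 in exists) { next }
        { print }
    ' "$1"
}

# Demonstration on scratch copies, so no real configuration is touched.
tmp=$(mktemp -d)
cat > "$tmp/ICF.5" <<'EOF'
test:-:/dev/zvol/dsk/rpool/swap:swap:8435712
test:/:rpool/ROOT/test:zfs:0
test:/arrakis:arrakis:zfs:0
test:/arrakis/temp:arrakis/temp:zfs:0
test:/xp:/dev/dsk/c2d0s7:ufs:70396830
EOF
printf '%s\n' rpool/ROOT/test arrakis > "$tmp/datasets"

# Prints every sample line except the arrakis/temp entry.
icf_prune "$tmp/ICF.5" "$tmp/datasets"
rm -rf "$tmp"
```

On a live system the dataset list could come from zfs list -H -o name. Matching the whole third field is a little safer than a plain grep -v, which would also throw away entries such as arrakis/temp2 if they existed.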
I keep deleting and deleting and still can't get rid of those pesky boot environments

This is an interesting corner case where the Live Upgrade configuration files get so scrambled that even simple tasks like deleting a boot environment are not possible. Every time I have gotten myself into this situation I can trace it back to some ill-advised shortcut that seemed harmless at the time, but I won't rule out bugs and environment as possible causes.
Here is our simple test case: turn our boot environment from the previous example into a zombie - something that is neither alive nor dead but just takes up space and causes a mild annoyance.
Important note: Don't try this on a production system. This is for demonstration purposes only.
    # dd if=/dev/random of=/etc/lu/ICF.5 bs=2048 count=2
    0+2 records in
    0+2 records out

    # ludelete -f test
    System has findroot enabled GRUB
    No entry for BE <test> in GRUB menu
    ERROR: The mount point </.alt.tmp.b-fxc.mnt> is not a valid ABE mount point (no /etc directory found).
    ERROR: The mount point </.alt.tmp.b-fxc.mnt> provided by the <-m> option is not a valid ABE mount point.
    Usage: lurootspec [-l error_log] [-o outfile] [-m mntpt]
    ERROR: Cannot determine root specification for BE <test>.
    ERROR: boot environment <test> is not mounted
    Unable to delete boot environment.

Our first task is to make sure that any partially mounted boot environment is cleaned up. A df should help us here.
    # df -k | tail -5
    arrakis/xvm/opensolaris 350945280      19 17448377     1%    /xvm/opensolaris
    arrakis/xvm/s10u5       350945280      19 17448377     1%    /xvm/s10u5
    arrakis/xvm/ub710       350945280      19 17448377     1%    /xvm/ub710
    swap                      4549680       0  4549680     0%    /.alt.tmp.b-fxc.mnt/var/run
    swap                      4549680       0  4549680     0%    /.alt.tmp.b-fxc.mnt/tmp

    # umount /.alt.tmp.b-fxc.mnt/tmp
    # umount /.alt.tmp.b-fxc.mnt/var/run

Ordinarily you would use lufslist(1M) to try to determine which file systems are in use by the boot environment you are trying to delete. In this worst case scenario that is not possible. A bit of forensic investigation and a bit more courage will help us figure this out.
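Before moving on, a quick aside on the df step. Picking leftover mounts out of df output by eye gets tedious on a system with many file systems, so here is a small sketch that does the filtering. The helper name stale_mounts is made up for this illustration. It assumes that the mount point is the last column of the df -k output and that mount points contain no spaces. It only prints candidates, deepest first; the umount itself stays a manual step.

```shell
#!/bin/sh
# Sketch: list mount points left under a temporary BE mount directory,
# deepest first, so they can be unmounted in a safe order.

# stale_mounts PREFIX   (hypothetical helper)
#   Reads `df -k` style output on stdin (mount point in the last column)
#   and prints the mount points that start with PREFIX.
stale_mounts() {
    # A parent path is a prefix of its children, so a reverse sort in the
    # C locale lists children before their parents.
    awk -v p="$1" 'index($NF, p) == 1 { print $NF }' | LC_ALL=C sort -r
}

# Demonstration with canned df output taken from the example above.
cat <<'EOF' | stale_mounts /.alt.tmp.b-fxc.mnt
arrakis/xvm/ub710 350945280 19 17448377 1% /xvm/ub710
swap 4549680 0 4549680 0% /.alt.tmp.b-fxc.mnt/var/run
swap 4549680 0 4549680 0% /.alt.tmp.b-fxc.mnt/tmp
EOF
```

On a real system the input would be df -k piped into the function, with the prefix taken from the error message that ludelete printed.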
The first place we will look is /etc/lutab. This is the configuration file that lists all boot environments known to Live Upgrade. There is a man page for this in section 4, so it is somewhat of a public interface, but please take note of the warning:

    The lutab file must not be edited by hand. Any user modification to this file will result in the incorrect operation of the Live Upgrade feature.

This is very good advice and failing to follow it has led to some of my most spectacular Live Upgrade meltdowns. But in this case Live Upgrade is already broken and it may be possible to undo the damage and restore proper operation. So let's see what we can find out.
    # cat /etc/lutab
    # DO NOT EDIT THIS FILE BY HAND. This file is not a public interface.
    # The format and contents of this file are subject to change.
    # Any user modification to this file may result in the incorrect
    # operation of Live Upgrade.
    3:s10u5_baseline:C:0
    3:/:/dev/dsk/c2d0s0:1
    3:boot-device:/dev/dsk/c2d0s0:2
    1:s10u5_lu:C:0
    1:/:/dev/dsk/c5d0s0:1
    1:boot-device:/dev/dsk/c5d0s0:2
    2:s10u6_ufs:C:0
    2:/:/dev/dsk/c4d0s0:1
    2:boot-device:/dev/dsk/c4d0s0:2
    4:s10u6_baseline:C:0
    4:/:rpool/ROOT/s10u6_baseline:1
    4:boot-device:/dev/dsk/c4d0s3:2
    10:route66:C:0
    10:/:rpool/ROOT/route66:1
    10:boot-device:/dev/dsk/c4d0s3:2
    11:nv95:C:0
    11:/:rpool/ROOT/nv95:1
    11:boot-device:/dev/dsk/c4d0s3:2
    6:s10u7-baseline:C:0
    6:/:rpool/ROOT/s10u7-baseline:1
    6:boot-device:/dev/dsk/c4d0s3:2
    7:nv114:C:0
    7:/:rpool/ROOT/nv114:1
    7:boot-device:/dev/dsk/c4d0s3:2
    5:test:C:0
    5:/:rpool/ROOT/test:1
    5:boot-device:/dev/dsk/c4d0s3:2

We can see that the boot environment named test is (still) BE #5 and has its root file system at rpool/ROOT/test. This is the default dataset name and indicates that the boot environment has not been renamed. Consider the following example for a more complicated configuration.
    # lucreate -n scooby
    # lufslist scooby | grep ROOT
    rpool/ROOT/scooby       zfs            241152 /                   -
    rpool/ROOT              zfs       39284664832 /rpool/ROOT         -

    # lurename -e scooby -n doo
    # lufslist doo | grep ROOT
    rpool/ROOT/scooby       zfs            241152 /                   -
    rpool/ROOT              zfs       39284664832 /rpool/ROOT         -

Here the boot environment is now called doo, but its root dataset is still rpool/ROOT/scooby. The point is that we have to trust the contents of /etc/lutab, but it does not hurt to do a bit of sanity checking before we start deleting ZFS datasets. To remove boot environment test from the view of Live Upgrade, delete the three lines in /etc/lutab starting with 5 (in this example). We should also remove its Internal Configuration File (ICF), /etc/lu/ICF.5.
    # mv -f /etc/lutab /etc/lutab.old
    # grep -v \^5: /etc/lutab.old > /etc/lutab
    # rm -f /etc/lu/ICF.5

    # lustatus
    Boot Environment           Is       Active Active    Can    Copy
    Name                       Complete Now    On Reboot Delete Status
    -------------------------- -------- ------ --------- ------ ----------
    s10u5_baseline             yes      no     no        yes    -
    s10u5_lu                   yes      no     no        yes    -
    s10u6_ufs                  yes      no     no        yes    -
    s10u6_baseline             yes      no     no        yes    -
    route66                    yes      no     no        yes    -
    nv95                       yes      yes    yes       no     -
    s10u7-baseline             yes      no     no        yes    -
    nv114                      yes      no     no        yes    -

If the boot environment being deleted is in UFS then we are done. Well, not exactly, but pretty close. We still need to propagate the updated configuration files to the remaining boot environments. This will be done during the next Live Upgrade operation (lucreate, lumake, ludelete, luactivate) and I would recommend that you let Live Upgrade handle this part. The exception to this will be if you boot directly into another boot environment without activating it first. This isn't a recommended practice and has been the source of some of my most frustrating mistakes.
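If you only know the boot environment by name, the lutab edit can be sketched as a lookup followed by a filter. This assumes the layout seen in the listing earlier, where the line ending in :0 carries the BE name in its second field and all lines for one BE share the same leading number. That reading is inferred from the example output, not from any documentation. The function name lutab_drop is made up for this illustration, and it writes to stdout so it can be tried against a copy first.

```shell
#!/bin/sh
# Sketch: print a lutab without the entries that belong to one BE.
# Assumed layout, inferred from the listing in this article:
#   id:name:C:0          (name line)
#   id:/:root device:1
#   id:boot-device:dev:2

# lutab_drop LUTAB BE_NAME   (hypothetical helper)
lutab_drop() {
    # Pass 1: find the numeric id on the name line of the requested BE.
    be_id=$(awk -F: -v be="$2" \
        '!/^#/ && $2 == be && $4 == "0" { print $1; exit }' "$1")
    if [ -z "$be_id" ]; then
        echo "no boot environment named $2 in $1" >&2
        return 1
    fi
    # Pass 2: keep comments and every line that belongs to another id.
    awk -F: -v id="$be_id" '/^#/ || $1 != id' "$1"
}

# Demonstration on a scratch copy, not on /etc/lutab.
tmp=$(mktemp -d)
cat > "$tmp/lutab" <<'EOF'
# DO NOT EDIT THIS FILE BY HAND. This file is not a public interface.
11:nv95:C:0
11:/:rpool/ROOT/nv95:1
11:boot-device:/dev/dsk/c4d0s3:2
5:test:C:0
5:/:rpool/ROOT/test:1
5:boot-device:/dev/dsk/c4d0s3:2
EOF

# Prints the comment line and the three nv95 lines.
lutab_drop "$tmp/lutab" test
rm -rf "$tmp"
```

Looking the number up by name avoids deleting the wrong BE by typo, and the non-zero return for an unknown name means a redirect into a new lutab can be skipped when nothing matched.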
If the exorcised boot environment is in ZFS then we still have a little bit of work to do. We need to delete the old root datasets and any snapshots that they may have been cloned from. In our example the root dataset was rpool/ROOT/test. We need to look for any children as well as the originating snapshot, if present.
    # zfs list -r rpool/ROOT/test
    NAME                  USED  AVAIL  REFER  MOUNTPOINT
    rpool/ROOT/test       234K  6.47G  8.79G  /.alt.test
    rpool/ROOT/test/var    18K  6.47G    18K  /.alt.test/var

    # zfs get -r origin rpool/ROOT/test
    NAME                 PROPERTY  VALUE                     SOURCE
    rpool/ROOT/test      origin    rpool/ROOT/nv95@test      -
    rpool/ROOT/test/var  origin    rpool/ROOT/nv95/var@test  -

    # zfs destroy rpool/ROOT/test/var
    # zfs destroy rpool/ROOT/nv95/var@test
    # zfs destroy rpool/ROOT/test
    # zfs destroy rpool/ROOT/nv95@test

Important note: luactivate will promote the newly activated root dataset so that snapshots used to create alternate boot environments should be easy to delete. If you are switching between boot environments without activating them first (which I have already warned you about doing), you may have to manually promote a different dataset so that the snapshots can be deleted.
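The order of the destroy commands above matters: each clone has to go before the snapshot it came from, and children before their parents. Here is a small sketch that turns the origin listing into that order. It only prints the commands so they can be reviewed first. The function name destroy_plan is made up for this illustration, and the input is assumed to be tab-separated name and origin pairs, the form that zfs get -r -H -o name,value origin produces.

```shell
#!/bin/sh
# Sketch: turn "dataset<TAB>origin" pairs into an ordered list of
# zfs destroy commands. Nothing is destroyed; the plan is only printed.

# destroy_plan   (hypothetical helper, reads the pairs on stdin)
destroy_plan() {
    tab=$(printf '\t')
    # A reverse sort in the C locale puts child datasets before parents.
    LC_ALL=C sort -r | while IFS="$tab" read -r name origin; do
        echo "zfs destroy $name"
        # An origin of "-" means the dataset is not a clone.
        if [ "$origin" != "-" ]; then
            echo "zfs destroy $origin"
        fi
    done
}

# Demonstration with the values from the example above.
printf '%s\t%s\n' \
    rpool/ROOT/test     rpool/ROOT/nv95@test \
    rpool/ROOT/test/var rpool/ROOT/nv95/var@test | destroy_plan
```

For the sample input this reproduces the four commands in the order used above. If a snapshot is also the origin of some other clone, zfs destroy will refuse to remove it, which is the safe outcome and a hint that a promotion is needed first.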
To BE or not to BE - how about no BE ?

You may find yourself in a situation where you have things so scrambled up that you want to start all over again. We can use what we have just learned to unwind Live Upgrade and start from a clean configuration. Specifically we want to delete /etc/lutab, the ICF and related files, all of the temporary files in /etc/lu/tmp and a few files that hold environment variables for some of the lu scripts. And if using ZFS we will also have to delete any datasets and snapshots that are no longer needed.
    # rm -f /etc/lutab
    # rm -f /etc/lu/ICF.* /etc/lu/INODE.* /etc/lu/vtoc.*
    # rm -f /etc/lu/.??*
    # rm -f /etc/lu/tmp/*

    # lustatus
    ERROR: No boot environments are configured on this system
    ERROR: cannot determine list of all boot environment names

    # lucreate -c scooby -n doo
    Checking GRUB menu...
    Analyzing system configuration.
    No name for current boot environment.
    Current boot environment is named <scooby>.
    Creating initial configuration for primary boot environment <scooby>.
    The device </dev/dsk/c4d0s3> is not a root device for any boot environment; cannot get BE ID.
    PBE configuration successful: PBE name <scooby> PBE Boot Device </dev/dsk/c4d0s3>.
    Comparing source boot environment <scooby> file systems with the file system(s) you specified for the new boot environment.
    Determining which file systems should be in the new boot environment.
    Updating boot environment description database on all BEs.
    Updating system configuration files.
    Creating configuration for boot environment <doo>.
    Source boot environment is <scooby>.
    Creating boot environment <doo>.
    Cloning file systems from boot environment <scooby> to create boot environment <doo>.
    Creating snapshot for <rpool/ROOT/scooby> on <rpool/ROOT/scooby@doo>.
    Creating clone for <rpool/ROOT/scooby@doo> on <rpool/ROOT/doo>.
    Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/doo>.
    Saving existing file </boot/grub/menu.lst> in top level dataset for BE <doo> as <mount-point>//boot/grub/menu.lst.prev.
    File </boot/grub/menu.lst> propagation successful
    Copied GRUB menu from PBE to ABE
    No entry for BE <doo> in GRUB menu
    Population of boot environment <doo> successful.
    Creation of boot environment <doo> successful.

    # luactivate doo
    System has findroot enabled GRUB
    Generating boot-sign, partition and slice information for PBE
    File deletion successful
    File deletion successful
    File deletion successful
    Activation of boot environment successful.

    # lustatus
    Boot Environment           Is       Active Active    Can    Copy
    Name                       Complete Now    On Reboot Delete Status
    -------------------------- -------- ------ --------- ------ ----------
    scooby                     yes      yes    no        no     -
    doo                        yes      no     yes       no     -

Pretty cool, eh ?
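A slightly more cautious variation on the reset is to move the state files aside instead of deleting them, so there is something to look at later if the fresh start goes wrong. The sketch below is not part of Live Upgrade; the function name lu_archive_state, the backup layout and the sample file names are all made up for this illustration. It takes the root of the tree as an argument so it can be tried on a scratch directory.

```shell
#!/bin/sh
# Sketch: move the Live Upgrade state files listed above into a backup
# directory instead of removing them.

# lu_archive_state ROOT BACKUP_DIR   (hypothetical helper)
#   ROOT is the directory that contains etc/ (a scratch tree for testing).
lu_archive_state() {
    root=$1
    backup=$2
    mkdir -p "$backup/lu/tmp" || return 1
    if [ -f "$root/etc/lutab" ]; then
        mv "$root/etc/lutab" "$backup/lutab"
    fi
    # Same patterns as the rm commands above; unmatched globs are skipped.
    for f in "$root"/etc/lu/ICF.* "$root"/etc/lu/INODE.* \
             "$root"/etc/lu/vtoc.* "$root"/etc/lu/.??*; do
        if [ -f "$f" ]; then mv "$f" "$backup/lu/"; fi
    done
    for f in "$root"/etc/lu/tmp/*; do
        if [ -f "$f" ]; then mv "$f" "$backup/lu/tmp/"; fi
    done
    return 0
}

# Demonstration on a scratch tree with placeholder files.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/lu/tmp"
touch "$scratch/etc/lutab" "$scratch/etc/lu/ICF.1" \
      "$scratch/etc/lu/.example_env" "$scratch/etc/lu/tmp/junk"
lu_archive_state "$scratch" "$scratch/backup"
find "$scratch/backup" -type f | LC_ALL=C sort
rm -rf "$scratch"
```

Any ZFS datasets and snapshots that belonged to the old boot environments still have to be cleaned up separately, as described earlier.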
There are still a few more interesting corner cases, but we will deal with those in one of the next articles. In the meantime, please remember to:
- Check Infodoc 206844 for Live Upgrade patch requirements
- Keep your patching and package utilities updated
- Use luactivate to switch between boot environments