Getting Rid of Pesky Live Upgrade Boot Environments

As we discussed earlier, Live Upgrade can solve most of the problems associated with patching and upgrading your Solaris system. I'm not quite ready to post the next installment in the LU series, but based on some of the comments and email I have received, there are two problems that I would like to help you work around.

Oh where, oh where did that file system go?

One thing you can do to stop Live Upgrade in its tracks is to remove a file system that it thinks another boot environment needs. This does fall into the category of user error, but you are more likely to run into it in a ZFS world, where file systems can be created and destroyed with great ease. You will also run into a variant of this if you change your zone configurations without recreating your boot environment, but I'll save that for a later day.

Here is our simple test case:
  1. Create a ZFS file system.
  2. Create a new boot environment.
  3. Delete the ZFS file system.
  4. Watch Live Upgrade fail.

# zfs create arrakis/temp

# lucreate -n test
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment <s10u7-baseline> file systems with the
file system(s) you specified for the new boot environment. Determining
which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <test>.
Source boot environment is <s10u7-baseline>.
Creating boot environment <test>.
Cloning file systems from boot environment <s10u7-baseline> to create boot environment <test>.
Creating snapshot for <rpool/ROOT/s10u7-baseline> on <rpool/ROOT/s10u7-baseline@test>.
Creating clone for <rpool/ROOT/s10u7-baseline@test> on <rpool/ROOT/test>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/test>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <s10u6_baseline> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <test> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv114> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <route66> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv95> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <test> in GRUB menu
Population of boot environment <test> successful.
Creation of boot environment <test> successful.

# zfs destroy arrakis/temp

# luupgrade -t -s /export/patches/10_x86_Recommended-2009-05-14  -O "-d" -n test
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
The media contains 143 software patches that can be added.
All 143 patches will be added because you did not specify any specific patches to add.
Mounting the BE <test>.
ERROR: Read-only file system: cannot create mount point </.alt.tmp.b-59c.mnt/arrakis/temp>
ERROR: failed to create mount point </.alt.tmp.b-59c.mnt/arrakis/temp> for file system </arrakis/temp>
ERROR: unmounting partially mounted boot environment file systems
ERROR: cannot mount boot environment by icf file </etc/lu/ICF.5>
ERROR: Unable to mount ABE <test>: cannot complete lumk_iconf
Adding patches to the BE <test>.
Validating patches...

Loading patches installed on the system...

Cannot check name /a/var/sadm/pkg.
Unmounting the BE <test>.
The patch add to the BE <test> failed (with result code <1>).
The proper Live Upgrade solution to this problem would be to destroy and recreate the boot environment, or simply recreate the missing file system (I'm sure most of you have figured that second part out on your own). The rationale is that the alternate boot environment no longer matches the storage configuration of its source. That was fine in a UFS world, but it feels a bit constraining when ZFS rules the landscape. What if you really wanted the file system to be gone forever?
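If you just want to get moving again, the supported fix is quick. Using this example's names, either put the file system back

# zfs create arrakis/temp

or rebuild the boot environment from scratch so that it matches the current storage configuration.

# ludelete -f test
# lucreate -n test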

With a little more understanding of the internals of Live Upgrade, we can fix this rather easily.

Important note: We are about to modify undocumented Live Upgrade configuration files. The formats, names, and contents are subject to change without notice and any errors made while doing this can render your Live Upgrade configuration unusable.

The file system configurations for each boot environment are kept in a set of Internal Configuration Files (ICF) in /etc/lu named ICF.n, where n is the boot environment number. From the error message above we see that /etc/lu/ICF.5 is the one that is causing the problem. Let's take a look.
# cat /etc/lu/ICF.5
test:-:/dev/dsk/c5d0s1:swap:4225095
test:-:/dev/zvol/dsk/rpool/swap:swap:8435712
test:/:rpool/ROOT/test:zfs:0
test:/archives:/dev/dsk/c1t0d0s2:ufs:327645675
test:/arrakis:arrakis:zfs:0
test:/arrakis/misc:arrakis/misc:zfs:0
test:/arrakis/misc2:arrakis/misc2:zfs:0
test:/arrakis/stuff:arrakis/stuff:zfs:0
test:/arrakis/temp:arrakis/temp:zfs:0
test:/audio:arrakis/audio:zfs:0
test:/backups:arrakis/backups:zfs:0
test:/export:arrakis/export:zfs:0
test:/export/home:arrakis/home:zfs:0
test:/export/iso:arrakis/iso:zfs:0
test:/export/linux:arrakis/linux:zfs:0
test:/rpool:rpool:zfs:0
test:/rpool/ROOT:rpool/ROOT:zfs:0
test:/usr/local:arrakis/local:zfs:0
test:/vbox:arrakis/vbox:zfs:0
test:/vbox/fedora8:arrakis/vbox/fedora8:zfs:0
test:/video:arrakis/video:zfs:0
test:/workshop:arrakis/workshop:zfs:0
test:/xp:/dev/dsk/c2d0s7:ufs:70396830
test:/xvm:arrakis/xvm:zfs:0
test:/xvm/fedora8:arrakis/xvm/fedora8:zfs:0
test:/xvm/newfs:arrakis/xvm/newfs:zfs:0
test:/xvm/nv113:arrakis/xvm/nv113:zfs:0
test:/xvm/opensolaris:arrakis/xvm/opensolaris:zfs:0
test:/xvm/s10u5:arrakis/xvm/s10u5:zfs:0
test:/xvm/ub710:arrakis/xvm/ub710:zfs:0
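As an aside, each ICF entry appears to follow a simple colon-delimited pattern: BE name, mount point, device or dataset, file system type, and size. Since the BE name is the first field, a quick grep (a sketch, using the test BE from this example) will locate the ICF file for a given boot environment if an error message has not already told you which one it is:

# grep -l '^test:' /etc/lu/ICF.*

On this system that should point right back at /etc/lu/ICF.5.
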
The first step is to clean up the mess left by the failed luupgrade attempt. At the very least we will need to unmount the alternate boot environment root, and it is very likely that we will also have to unmount a few temporary directories, such as /tmp and /var/run. Since this is ZFS, we will also have to remove the directories created when these file systems were mounted.
# df -k | tail -3
rpool/ROOT/test      49545216 6879597 7546183    48%    /.alt.tmp.b-Fx.mnt
swap                 4695136       0 4695136     0%    /a/var/run
swap                 4695136       0 4695136     0%    /a/tmp

# luumount test
# umount /a/var/run
# umount /a/tmp
# rmdir /a/var/run /a/var /a/tmp

Next we need to remove the missing file system entry from the current copy of the ICF file. Use whatever method you prefer (vi, perl, grep). Once we have corrected our local copy of the ICF file we must propagate it to the alternate boot environment we are about to patch. You can skip the propagation if you are going to delete the boot environment without doing any other maintenance activities. The normal Live Upgrade operations will take care of propagating the ICF files to the other boot environments, so we should not have to worry about them at this time.
# mv /etc/lu/ICF.5 /tmp/ICF.5
# grep -v arrakis/temp /tmp/ICF.5 > /etc/lu/ICF.5 
# cp /etc/lu/ICF.5 `lumount test`/etc/lu/ICF.5
# luumount test
At this point we should be good to go. Let's try the luupgrade again.
# luupgrade -t -n test -O "-d" -s /export/patches/10_x86_Recommended-2009-05-14
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
The media contains 143 software patches that can be added.
All 143 patches will be added because you did not specify any specific patches to add.
Mounting the BE <test>.
Adding patches to the BE <test>.
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Approved patches will be installed in this order:

118668-19 118669-19 119214-19 123591-10 123896-10 125556-03 139100-02


Checking installed patches...
Verifying sufficient filesystem capacity (dry run method)...
Installing patch packages...

Patch 118668-19 has been successfully installed.
Patch 118669-19 has been successfully installed.
Patch 119214-19 has been successfully installed.
Patch 123591-10 has been successfully installed.
Patch 123896-10 has been successfully installed.
Patch 125556-03 has been successfully installed.
Patch 139100-02 has been successfully installed.

Unmounting the BE <test>.
The patch add to the BE <test> completed.
Now that the alternate boot environment has been patched, we can activate it at our convenience.
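When that time comes, it only takes a couple of commands (a sketch, using this example's BE name). Remember that after a luactivate you should use init or shutdown rather than reboot so that the activation completes properly.

# luactivate test
# init 6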

I keep deleting and deleting and still can't get rid of those pesky boot environments

This is an interesting corner case where the Live Upgrade configuration files get so scrambled that even simple tasks like deleting a boot environment are not possible. Every time I have gotten myself into this situation I can trace it back to some ill-advised shortcut that seemed harmless at the time, but I won't rule out bugs and environmental factors as possible causes.

Here is our simple test case: turn our boot environment from the previous example into a zombie - something that is neither alive nor dead but just takes up space and causes a mild annoyance.

Important note: Don't try this on a production system. This is for demonstration purposes only.
# dd if=/dev/random of=/etc/lu/ICF.5 bs=2048 count=2
0+2 records in
0+2 records out

# ludelete -f test
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
ERROR: The mount point </.alt.tmp.b-fxc.mnt> is not a valid ABE mount point (no /etc directory found).
ERROR: The mount point </.alt.tmp.b-fxc.mnt> provided by the <-m> option is not a valid ABE mount point.
Usage: lurootspec [-l error_log] [-o outfile] [-m mntpt]
ERROR: Cannot determine root specification for BE <test>.
ERROR: boot environment <test> is not mounted
Unable to delete boot environment.
Our first task is to make sure that any partially mounted boot environment is cleaned up. A df should help us here.
# df -k | tail -5
arrakis/xvm/opensolaris 350945280      19 17448377     1%    /xvm/opensolaris
arrakis/xvm/s10u5    350945280      19 17448377     1%    /xvm/s10u5
arrakis/xvm/ub710    350945280      19 17448377     1%    /xvm/ub710
swap                 4549680       0 4549680     0%    /.alt.tmp.b-fxc.mnt/var/run
swap                 4549680       0 4549680     0%    /.alt.tmp.b-fxc.mnt/tmp


# umount /.alt.tmp.b-fxc.mnt/tmp
# umount /.alt.tmp.b-fxc.mnt/var/run
Ordinarily you would use lufslist(1M) to try to determine which file systems are in use by the boot environment you are trying to delete. In this worst-case scenario that is not possible. A bit of forensic investigation and a bit more courage will help us figure this out.

The first place we will look is /etc/lutab. This is the configuration file that lists all boot environments known to Live Upgrade. There is a man page for it in section 4, so it is somewhat of a public interface, but please take note of the warning:

        The lutab file must not be edited by hand. Any user modification
        to this file will result in the incorrect operation of the Live
        Upgrade feature.
This is very good advice, and failing to follow it has led to some of my most spectacular Live Upgrade meltdowns. But in this case Live Upgrade is already broken, and it may be possible to undo the damage and restore proper operation. So let's see what we can find out.
# cat /etc/lutab
# DO NOT EDIT THIS FILE BY HAND. This file is not a public interface.
# The format and contents of this file are subject to change.
# Any user modification to this file may result in the incorrect
# operation of Live Upgrade.
3:s10u5_baseline:C:0
3:/:/dev/dsk/c2d0s0:1
3:boot-device:/dev/dsk/c2d0s0:2
1:s10u5_lu:C:0
1:/:/dev/dsk/c5d0s0:1
1:boot-device:/dev/dsk/c5d0s0:2
2:s10u6_ufs:C:0
2:/:/dev/dsk/c4d0s0:1
2:boot-device:/dev/dsk/c4d0s0:2
4:s10u6_baseline:C:0
4:/:rpool/ROOT/s10u6_baseline:1
4:boot-device:/dev/dsk/c4d0s3:2
10:route66:C:0
10:/:rpool/ROOT/route66:1
10:boot-device:/dev/dsk/c4d0s3:2
11:nv95:C:0
11:/:rpool/ROOT/nv95:1
11:boot-device:/dev/dsk/c4d0s3:2
6:s10u7-baseline:C:0
6:/:rpool/ROOT/s10u7-baseline:1
6:boot-device:/dev/dsk/c4d0s3:2
7:nv114:C:0
7:/:rpool/ROOT/nv114:1
7:boot-device:/dev/dsk/c4d0s3:2
5:test:C:0
5:/:rpool/ROOT/test:1
5:boot-device:/dev/dsk/c4d0s3:2
We can see that the boot environment named test is (still) BE #5 and has its root file system at rpool/ROOT/test. This is the default dataset name and indicates that the boot environment has not been renamed. Consider the following example of a more complicated configuration.
# lucreate -n scooby
# lufslist scooby | grep ROOT
rpool/ROOT/scooby       zfs            241152 /                   -
rpool/ROOT              zfs       39284664832 /rpool/ROOT         -

# lurename -e scooby -n doo
# lufslist doo | grep ROOT
rpool/ROOT/scooby       zfs            241152 /                   -
rpool/ROOT              zfs       39284664832 /rpool/ROOT         -
The point is that we have to trust the contents of /etc/lutab, but it does not hurt to do a bit of sanity checking before we start deleting ZFS datasets. To remove boot environment test from the view of Live Upgrade, delete the three lines in /etc/lutab that start with 5 (in this example). We should also remove its Internal Configuration File (ICF), /etc/lu/ICF.5.
# mv -f /etc/lutab /etc/lutab.old
# grep -v \^5: /etc/lutab.old > /etc/lutab
# rm -f /etc/lu/ICF.5

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10u5_baseline             yes      no     no        yes    -         
s10u5_lu                   yes      no     no        yes    -         
s10u6_ufs                  yes      no     no        yes    -         
s10u6_baseline             yes      no     no        yes    -         
route66                    yes      no     no        yes    -         
nv95                       yes      yes    yes       no     -         
s10u7-baseline             yes      no     no        yes    -         
nv114                      yes      no     no        yes    -         
If the boot environment being deleted is in UFS then we are done. Well, not exactly - but pretty close. We still need to propagate the updated configuration files to the remaining boot environments. This will be done during the next live upgrade operation (lucreate, lumake, ludelete, luactivate) and I would recommend that you let Live Upgrade handle this part. The exception to this will be if you boot directly into another boot environment without activating it first. This isn't a recommended practice and has been the source of some of my most frustrating mistakes.

If the exorcised boot environment is in ZFS then we still have a little bit of work to do. We need to delete the old root datasets and any snapshots that they may have been cloned from. In our example the root dataset was rpool/ROOT/test. We need to look for any children as well as the originating snapshot, if present.
# zfs list -r rpool/ROOT/test
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/test       234K  6.47G  8.79G  /.alt.test
rpool/ROOT/test/var    18K  6.47G    18K  /.alt.test/var

# zfs get -r origin rpool/ROOT/test
NAME             PROPERTY  VALUE                 SOURCE
rpool/ROOT/test  origin    rpool/ROOT/nv95@test  -
rpool/ROOT/test/var  origin    rpool/ROOT/nv95/var@test  -
# zfs destroy rpool/ROOT/test/var
# zfs destroy rpool/ROOT/nv95/var@test
# zfs destroy rpool/ROOT/test
# zfs destroy rpool/ROOT/nv95@test
Important note: luactivate will promote the newly activated root dataset, so the snapshots used to create alternate boot environments should be easy to delete. If you are switching between boot environments without activating them first (which I have already warned you about), you may have to manually promote a different dataset so that the snapshots can be deleted.
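For instance, if you had been running out of rpool/ROOT/test without ever activating it and wanted to remove rpool/ROOT/nv95 and its snapshots, a manual promotion along these lines (a sketch, using this article's dataset names) moves the origin snapshots over to the clone you are keeping:

# zfs promote rpool/ROOT/test
# zfs promote rpool/ROOT/test/var

After the promotion, zfs get origin should show the dependency reversed, and the datasets and snapshots you no longer need can be destroyed in the usual way.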

To BE or not to BE - how about no BE?

You may find yourself in a situation where you have things so scrambled up that you want to start all over again. We can use what we have just learned to unwind Live Upgrade and start from a clean configuration. Specifically, we want to delete /etc/lutab, the ICF and related files, all of the temporary files in /etc/lu/tmp, and a few files that hold environment variables for some of the lu scripts. If you are using ZFS, you will also have to delete any datasets and snapshots that are no longer needed.
 
# rm -f /etc/lutab 
# rm -f /etc/lu/ICF.* /etc/lu/INODE.* /etc/lu/vtoc.*
# rm -f /etc/lu/.??*
# rm -f /etc/lu/tmp/* 

# lustatus
ERROR: No boot environments are configured on this system
ERROR: cannot determine list of all boot environment names

# lucreate -c scooby -n doo
Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <scooby>.
Creating initial configuration for primary boot environment <scooby>.
The device </dev/dsk/c4d0s3> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <scooby> PBE Boot Device </dev/dsk/c4d0s3>.
Comparing source boot environment <scooby> file systems with the file 
system(s) you specified for the new boot environment. Determining which 
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <doo>.
Source boot environment is <scooby>.
Creating boot environment <doo>.
Cloning file systems from boot environment <scooby> to create boot environment <doo>.
Creating snapshot for <rpool/ROOT/scooby> on <rpool/ROOT/scooby@doo>.
Creating clone for <rpool/ROOT/scooby@doo> on <rpool/ROOT/doo>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/doo>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <doo> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <doo> in GRUB menu
Population of boot environment <doo> successful.
Creation of boot environment <doo> successful.

# luactivate doo
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE 

File  deletion successful
File  deletion successful
File  deletion successful
Activation of boot environment <doo> successful.

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
scooby                     yes      yes    no        no     -         
doo                        yes      no     yes       no     -        
Pretty cool, eh?

There are still a few more interesting corner cases, but we will deal with those in one of the next articles. In the meantime, please remember to
  • Check Infodoc 206844 for Live Upgrade patch requirements
  • Keep your patching and package utilities updated (see the quick check after this list)
  • Use luactivate to switch between boot environments
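On those last points, one way to see at a glance which Live Upgrade packages are installed (a sketch; SUNWlucfg, SUNWlur and SUNWluu are the standard Solaris 10 Live Upgrade packages, and the Infodoc lists the revisions they should be at):

# pkginfo -l SUNWlucfg SUNWlur SUNWluu | egrep 'PKGINST|VERSION'

If the versions do not match what the Infodoc calls for, the usual advice is to bring Live Upgrade up to date on the running boot environment before creating or upgrading any new ones.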


Comments:

Bob - great info. Got me out of a jam where I had zfs-destroy'ed some file systems I was cleaning up which made ludelete unhappy. I guess the rule of thumb is:
Do ludelete's before deleting any zfs file systems.

Else, do the lu clean up you outline above. Thoughts?

Thanks!
Mark

Posted by Mark on May 23, 2009 at 02:12 PM CDT #

Thanks Mark.

The rule of thumb is to ludelete any boot environments that depend on something you are about to delete (or change). I think that's a lot easier in the old UFS model, but with ZFS things can happen much more quickly and you can get a few more oops scenarios - which have happened to me more often than I want to think about. Usually in a room full of customers :-)

If you don't know what is in play, lufslist(1M) is your friend, or as you now know you can troll through /etc/lu/ICF.*. lufslist is more accurate since it uses the ICF copy local to the alternate boot environment.

You are right, ludelete before zfs deletes, and if you forget you now know how to fix things after the fact.

Posted by Robert Netherton on May 24, 2009 at 04:06 AM CDT #

Bob, thanks for posting this! I changed my filesystem layout with ZFS a little bit (replaced a directory with a filesystem) and subsequently discovered that I had made a total hash of my Live Upgrade configuration by doing so; I couldn't mount the old BEs or fix them. Blowing away the LU files and starting over turned out to be the answer, and it worked like a charm.

Posted by Clayton Wheeler on June 01, 2009 at 09:08 AM CDT #

Bob, you are awesome. Thanks a million for posting this! Really saved my ass when I had to kill -9 an lucreate task due to a miscalculation of the disk space that would be taken up. Had to do some cleanup afterwards and your tips set things straight again. Thanks so much!

Posted by Alfred Fazio on July 03, 2009 at 03:10 PM CDT #

Do you know how to delete ABE:
bash-3.00# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
sol106                     yes      yes    yes       no     -
sol709                     no       no     no        no     ACTIVE
bash-3.00# ludelete sol709
ERROR: The mount point </.alt.tmp.b-pK.mnt> is not a valid ABE mount point (no /etc directory found).
ERROR: The mount point </.alt.tmp.b-pK.mnt> provided by the <-m> option is not a valid ABE mount point.
Usage: lurootspec [-l error_log] [-o outfile] [-m mntpt]
ERROR: Cannot determine root specification for BE <sol709>.
Unable to delete boot environment.

Posted by Juan Carrascosa on July 08, 2009 at 07:59 PM CDT #

Great post - many thanks. I always wondered why blowing away /etc/lutab was never enough.

Posted by Richard on November 13, 2009 at 09:06 PM CST #

Thanks Richard. Much appreciated.

Posted by Bob Netherton on November 16, 2009 at 12:38 AM CST #

Thank you. The article helped me a lot.

Posted by Subhashith on February 01, 2011 at 07:27 PM CST #

Bob,

Thanks for this. We had a development machine which we had completely confused as far as Live Upgrade goes.

This allowed me to clean it up and start again.

many thanks for taking the time to post this infomation

Joy

Posted by Joy Young on March 23, 2011 at 12:45 AM CDT #

Hi Bob, just some polish for the minor typos or copy/paste errors... First section: zfs destroy arrakis/temp (not "test") Last section: Remove the backslashes in the local "rm" commands :-)

Posted by guest on June 05, 2011 at 10:21 PM CDT #

Great stuff. Just used it to clean up my BE mess.

Posted by guest on June 09, 2011 at 10:19 AM CDT #

Thanks for the comments, Henry. I've looked at this blog a gazillion times through the writing and editing, and I missed that zfs destroy error.

I'm going to blame the extraneous backslashes on the blogs.sun.com to blogs.oracle.com conversion :-)

Thanks again for pointing these errors out. They have all been fixed!

Posted by Bob Netherton on June 28, 2011 at 04:47 AM CDT #

Good document which is really helpful

Posted by krishna kumar on October 05, 2011 at 08:53 PM CDT #

Have you ever run into issues where you try to destroy a dataset and the OS throws an error saying cannot destroy 'XXXXXX': dataset already exists?

Posted by guest on July 31, 2012 at 04:40 PM CDT #
