Thursday May 21, 2009

Getting Rid of Pesky Live Upgrade Boot Environments

As we discussed earlier, Live Upgrade can solve most of the problems associated with patching and upgrading your Solaris system. I'm not quite ready to post the next installment in the LU series, but based on some of the comments and email I have received, there are two problems that I would like to help you work around.

Oh where oh where did that file system go ?

One thing you can do to stop Live Upgrade in its tracks is to remove a file system that it thinks another boot environment needs. This does fall into the category of user error, but you are more likely to run into it in a ZFS world, where file systems can be created and destroyed with great ease. You will also run into a variant of this if you change your zone configurations without recreating your boot environment, but I'll save that for a later day.

Here is our simple test case:
  1. Create a ZFS file system.
  2. Create a new boot environment.
  3. Delete the ZFS file system.
  4. Watch Live Upgrade fail.

# zfs create arrakis/temp

# lucreate -n test
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment <s10u7-baseline> file systems with the
file system(s) you specified for the new boot environment. Determining
which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <test>.
Source boot environment is <s10u7-baseline>.
Creating boot environment <test>.
Cloning file systems from boot environment <s10u7-baseline> to create boot environment <test>.
Creating snapshot for <rpool/ROOT/s10u7-baseline> on <rpool/ROOT/s10u7-baseline@test>.
Creating clone for <rpool/ROOT/s10u7-baseline@test> on <rpool/ROOT/test>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/test>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <s10u6_baseline> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <test> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv114> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <route66> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv95> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <test> in GRUB menu
Population of boot environment <test> successful.
Creation of boot environment <test> successful.

# zfs destroy arrakis/temp

# luupgrade -t -s /export/patches/10_x86_Recommended-2009-05-14  -O "-d" -n test
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
The media contains 143 software patches that can be added.
All 143 patches will be added because you did not specify any specific patches to add.
Mounting the BE <test>.
ERROR: Read-only file system: cannot create mount point </.alt.tmp.b-59c.mnt/arrakis/temp>
ERROR: failed to create mount point </.alt.tmp.b-59c.mnt/arrakis/temp> for file system </arrakis/temp>
ERROR: unmounting partially mounted boot environment file systems
ERROR: cannot mount boot environment by icf file </etc/lu/ICF.5>
ERROR: Unable to mount ABE <test>: cannot complete lumk_iconf
Adding patches to the BE <test>.
Validating patches...

Loading patches installed on the system...

Cannot check name /a/var/sadm/pkg.
Unmounting the BE <test>.
The patch add to the BE <test> failed (with result code <1>).
The proper Live Upgrade solution to this problem would be to destroy and recreate the boot environment, or just recreate the missing file system (I'm sure that most of you have figured the latter part out on your own). The rationale is that the alternate boot environment no longer matches the storage configuration of its source. That was fine in a UFS world, but it is perhaps a bit constraining when ZFS rules the landscape. What if you really want the file system to be gone forever ?

With a little more understanding of the internals of Live Upgrade, we can fix this rather easily.

Important note: We are about to modify undocumented Live Upgrade configuration files. The formats, names, and contents are subject to change without notice and any errors made while doing this can render your Live Upgrade configuration unusable.

The file system configurations for each boot environment are kept in a set of Internal Configuration Files (ICF) in /etc/lu named ICF.n, where n is the boot environment number. From the error message above we see that /etc/lu/ICF.5 is the one that is causing the problem. Let's take a look.
# cat /etc/lu/ICF.5
test:-:/dev/dsk/c5d0s1:swap:4225095
test:-:/dev/zvol/dsk/rpool/swap:swap:8435712
test:/:rpool/ROOT/test:zfs:0
test:/archives:/dev/dsk/c1t0d0s2:ufs:327645675
test:/arrakis:arrakis:zfs:0
test:/arrakis/misc:arrakis/misc:zfs:0
test:/arrakis/misc2:arrakis/misc2:zfs:0
test:/arrakis/stuff:arrakis/stuff:zfs:0

test:/arrakis/temp:arrakis/temp:zfs:0

test:/audio:arrakis/audio:zfs:0
test:/backups:arrakis/backups:zfs:0
test:/export:arrakis/export:zfs:0
test:/export/home:arrakis/home:zfs:0
test:/export/iso:arrakis/iso:zfs:0
test:/export/linux:arrakis/linux:zfs:0
test:/rpool:rpool:zfs:0
test:/rpool/ROOT:rpool/ROOT:zfs:0
test:/usr/local:arrakis/local:zfs:0
test:/vbox:arrakis/vbox:zfs:0
test:/vbox/fedora8:arrakis/vbox/fedora8:zfs:0
test:/video:arrakis/video:zfs:0
test:/workshop:arrakis/workshop:zfs:0
test:/xp:/dev/dsk/c2d0s7:ufs:70396830
test:/xvm:arrakis/xvm:zfs:0
test:/xvm/fedora8:arrakis/xvm/fedora8:zfs:0
test:/xvm/newfs:arrakis/xvm/newfs:zfs:0
test:/xvm/nv113:arrakis/xvm/nv113:zfs:0
test:/xvm/opensolaris:arrakis/xvm/opensolaris:zfs:0
test:/xvm/s10u5:arrakis/xvm/s10u5:zfs:0
test:/xvm/ub710:arrakis/xvm/ub710:zfs:0
The first step is to clean up the mess left by the failed luupgrade attempt. At the very least we will need to unmount the alternate boot environment root. It is also very likely that we will have to unmount a few temporary directories, such as /tmp and /var/run. Since this is ZFS we will also have to remove the directories created when these file systems were mounted.
# df -k | tail -3
rpool/ROOT/test      49545216 6879597 7546183    48%    /.alt.tmp.b-Fx.mnt
swap                 4695136       0 4695136     0%    /a/var/run
swap                 4695136       0 4695136     0%    /a/tmp

# luumount test
# umount /a/var/run
# umount /a/tmp
# rmdir /a/var/run /a/var /a/tmp

Next we need to remove the missing file system entry from the current copy of the ICF file. Use whatever method you prefer (vi, perl, grep). Once we have corrected our local copy of the ICF file we must propagate it to the alternate boot environment we are about to patch. You can skip the propagation if you are going to delete the boot environment without doing any other maintenance activities. The normal Live Upgrade operations will take care of propagating the ICF files to the other boot environments, so we should not have to worry about them at this time.
# mv /etc/lu/ICF.5 /tmp/ICF.5
# grep -v arrakis/temp /tmp/ICF.5 > /etc/lu/ICF.5 
# cp /etc/lu/ICF.5 `lumount test`/etc/lu/ICF.5
# luumount test
At this point we should be good to go. Let's try the luupgrade again.
# luupgrade -t -n test -O "-d" -s /export/patches/10_x86_Recommended-2009-05-14
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
The media contains 143 software patches that can be added.
All 143 patches will be added because you did not specify any specific patches to add.
Mounting the BE <test>.
Adding patches to the BE <test>.
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Approved patches will be installed in this order:

118668-19 118669-19 119214-19 123591-10 123896-10 125556-03 139100-02


Checking installed patches...
Verifying sufficient filesystem capacity (dry run method)...
Installing patch packages...

Patch 118668-19 has been successfully installed.
Patch 118669-19 has been successfully installed.
Patch 119214-19 has been successfully installed.
Patch 123591-10 has been successfully installed.
Patch 123896-10 has been successfully installed.
Patch 125556-03 has been successfully installed.
Patch 139100-02 has been successfully installed.

Unmounting the BE <test>.
The patch add to the BE <test> completed.
Now that the alternate boot environment has been patched, we can activate it at our convenience.

I keep deleting and deleting and still can't get rid of those pesky boot environments

This is an interesting corner case where the Live Upgrade configuration files get so scrambled that even simple tasks like deleting a boot environment are not possible. Every time I have gotten myself into this situation I can trace it back to some ill-advised shortcut that seemed harmless at the time, but I won't rule out bugs and environmental factors as possible causes.

Here is our simple test case: turn our boot environment from the previous example into a zombie - something that is neither alive nor dead but just takes up space and causes a mild annoyance.

Important note: Don't try this on a production system. This is for demonstration purposes only.
# dd if=/dev/random of=/etc/lu/ICF.5 bs=2048 count=2
0+2 records in
0+2 records out

# ludelete -f test
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
ERROR: The mount point </.alt.tmp.b-fxc.mnt> is not a valid ABE mount point (no /etc directory found).
ERROR: The mount point </.alt.tmp.b-fxc.mnt> provided by the <-m> option is not a valid ABE mount point.
Usage: lurootspec [-l error_log] [-o outfile] [-m mntpt]
ERROR: Cannot determine root specification for BE <test>.
ERROR: boot environment <test> is not mounted
Unable to delete boot environment.
Our first task is to make sure that any partially mounted boot environment is cleaned up. A df should help us here.
# df -k | tail -5
arrakis/xvm/opensolaris 350945280      19 17448377     1%    /xvm/opensolaris
arrakis/xvm/s10u5    350945280      19 17448377     1%    /xvm/s10u5
arrakis/xvm/ub710    350945280      19 17448377     1%    /xvm/ub710
swap                 4549680       0 4549680     0%    /.alt.tmp.b-fxc.mnt/var/run
swap                 4549680       0 4549680     0%    /.alt.tmp.b-fxc.mnt/tmp


# umount /.alt.tmp.b-fxc.mnt/tmp
# umount /.alt.tmp.b-fxc.mnt/var/run
Ordinarily you would use lufslist(1M) to try to determine which file systems are in use by the boot environment you are trying to delete. In this worst-case scenario that is not possible. A bit of forensic investigation and a bit more courage will help us figure this out.

The first place we will look is /etc/lutab. This is the configuration file that lists all boot environments known to Live Upgrade. There is a man page for this in section 4, so it is somewhat of a public interface, but please take note of the warning:
 
        The lutab file must not be edited by hand. Any user  modifi-
        cation  to  this file will result in the incorrect operation
        of the Live Upgrade feature.
This is very good advice, and failing to follow it has led to some of my most spectacular Live Upgrade meltdowns. But in this case Live Upgrade is already broken and it may be possible to undo the damage and restore proper operation. So let's see what we can find out.
# cat /etc/lutab
# DO NOT EDIT THIS FILE BY HAND. This file is not a public interface.
# The format and contents of this file are subject to change.
# Any user modification to this file may result in the incorrect
# operation of Live Upgrade.
3:s10u5_baseline:C:0
3:/:/dev/dsk/c2d0s0:1
3:boot-device:/dev/dsk/c2d0s0:2
1:s10u5_lu:C:0
1:/:/dev/dsk/c5d0s0:1
1:boot-device:/dev/dsk/c5d0s0:2
2:s10u6_ufs:C:0
2:/:/dev/dsk/c4d0s0:1
2:boot-device:/dev/dsk/c4d0s0:2
4:s10u6_baseline:C:0
4:/:rpool/ROOT/s10u6_baseline:1
4:boot-device:/dev/dsk/c4d0s3:2
10:route66:C:0
10:/:rpool/ROOT/route66:1
10:boot-device:/dev/dsk/c4d0s3:2
11:nv95:C:0
11:/:rpool/ROOT/nv95:1
11:boot-device:/dev/dsk/c4d0s3:2
6:s10u7-baseline:C:0
6:/:rpool/ROOT/s10u7-baseline:1
6:boot-device:/dev/dsk/c4d0s3:2
7:nv114:C:0
7:/:rpool/ROOT/nv114:1
7:boot-device:/dev/dsk/c4d0s3:2
5:test:C:0
5:/:rpool/ROOT/test:1
5:boot-device:/dev/dsk/c4d0s3:2
We can see that the boot environment named test is (still) BE #5 and has its root file system at rpool/ROOT/test. This is the default dataset name and indicates that the boot environment has not been renamed. Consider the following example for a more complicated configuration.
# lucreate -n scooby
# lufslist scooby | grep ROOT
rpool/ROOT/scooby       zfs            241152 /                   -
rpool/ROOT              zfs       39284664832 /rpool/ROOT         -

# lurename -e scooby -n doo
# lufslist doo | grep ROOT
rpool/ROOT/scooby       zfs            241152 /                   -
rpool/ROOT              zfs       39284664832 /rpool/ROOT         -
The point is that we have to trust the contents of /etc/lutab, but it does not hurt to do a bit of sanity checking before we start deleting ZFS datasets. To remove boot environment test from the view of Live Upgrade, delete the three lines in /etc/lutab starting with 5 (in this example). We should also remove its Internal Configuration File (ICF), /etc/lu/ICF.5.
# mv -f /etc/lutab /etc/lutab.old
# grep -v \^5: /etc/lutab.old > /etc/lutab
# rm -f /etc/lu/ICF.5

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10u5_baseline             yes      no     no        yes    -         
s10u5_lu                   yes      no     no        yes    -         
s10u6_ufs                  yes      no     no        yes    -         
s10u6_baseline             yes      no     no        yes    -         
route66                    yes      no     no        yes    -         
nv95                       yes      yes    yes       no     -         
s10u7-baseline             yes      no     no        yes    -         
nv114                      yes      no     no        yes    -         
If the boot environment being deleted is in UFS then we are done. Well, not exactly - but pretty close. We still need to propagate the updated configuration files to the remaining boot environments. This will be done during the next live upgrade operation (lucreate, lumake, ludelete, luactivate) and I would recommend that you let Live Upgrade handle this part. The exception to this will be if you boot directly into another boot environment without activating it first. This isn't a recommended practice and has been the source of some of my most frustrating mistakes.

If the exorcised boot environment is in ZFS then we still have a little bit of work to do. We need to delete the old root datasets and any snapshots that they may have been cloned from. In our example the root dataset was rpool/ROOT/test. We need to look for any children as well as the originating snapshot, if present.
# zfs list -r rpool/ROOT/test
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/test       234K  6.47G  8.79G  /.alt.test
rpool/ROOT/test/var    18K  6.47G    18K  /.alt.test/var

# zfs get -r origin rpool/ROOT/test
NAME             PROPERTY  VALUE                 SOURCE
rpool/ROOT/test  origin    rpool/ROOT/nv95@test  -
rpool/ROOT/test/var  origin    rpool/ROOT/nv95/var@test  -
# zfs destroy rpool/ROOT/test/var
# zfs destroy rpool/ROOT/nv95/var@test
# zfs destroy rpool/ROOT/test
# zfs destroy rpool/ROOT/nv95@test
Important note: luactivate will promote the newly activated root dataset, so the snapshots used to create alternate boot environments should be easy to delete. If you are switching between boot environments without activating them first (which I have already warned you about doing), you may have to manually promote a different dataset so that the snapshots can be deleted.

To BE or not to BE - how about no BE ?

You may find yourself in a situation where you have things so scrambled up that you want to start all over again. We can use what we have just learned to unwind Live Upgrade and start from a clean configuration. Specifically we want to delete /etc/lutab, the ICF and related files, all of the temporary files in /etc/lu/tmp and a few files that hold environment variables for some of the lu scripts. And if using ZFS we will also have to delete any datasets and snapshots that are no longer needed.
 
# rm -f /etc/lutab 
# rm -f /etc/lu/ICF.* /etc/lu/INODE.* /etc/lu/vtoc.*
# rm -f /etc/lu/.??*
# rm -f /etc/lu/tmp/* 

# lustatus
ERROR: No boot environments are configured on this system
ERROR: cannot determine list of all boot environment names

# lucreate -c scooby -n doo
Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <scooby>.
Creating initial configuration for primary boot environment <scooby>.
The device </dev/dsk/c4d0s3> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <scooby> PBE Boot Device </dev/dsk/c4d0s3>.
Comparing source boot environment <scooby> file systems with the file 
system(s) you specified for the new boot environment. Determining which 
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <doo>.
Source boot environment is <scooby>.
Creating boot environment <doo>.
Cloning file systems from boot environment <scooby> to create boot environment <doo>.
Creating snapshot for <rpool/ROOT/scooby> on <rpool/ROOT/scooby@doo>.
Creating clone for <rpool/ROOT/scooby@doo> on <rpool/ROOT/doo>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/doo>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <doo> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <doo> in GRUB menu
Population of boot environment <doo> successful.
Creation of boot environment <doo> successful.

# luactivate doo
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <scooby>

File  deletion successful
File  deletion successful
File  deletion successful
Activation of boot environment <doo> successful.

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
scooby                     yes      yes    no        no     -         
doo                        yes      no     yes       no     -        
Pretty cool, eh ?

There are still a few more interesting corner cases, but we will deal with those in one of the next articles. In the meantime, please remember to
  • Check Infodoc 206844 for Live Upgrade patch requirements
  • Keep your patching and package utilities updated
  • Use luactivate to switch between boot environments


Technocrati Tags:

Tuesday Mar 24, 2009

Adobe releases an x86 version of Acroread 9.1 for Solaris

Great Googly Moogly!!! Our friends at Adobe have finally released a new x86 version of Acroread for Solaris. Download Acroread 9.1 from Adobe.com and say goodbye to evince, xpdf, and the especially interesting Acroread out of the Linux branded zone trick.

Sunday Mar 22, 2009

Dr. Live Upgrade - Or How I Learned to Stop Worrying and Love Solaris Patching

Who loves to patch or upgrade a system ?

That's right, nobody. Or if you do, perhaps we should start a local support group to help you come to terms with this unusual fascination. Patching, and to a lesser extent upgrades (which can be thought of as patches delivered more efficiently through package replacement), is the most common complaint that I hear when meeting with system administrators and their management.

Most of the difficulties seem to fit into one of the following categories.
  • Analysis: What patches need to be applied to my system ?
  • Effort: What do I have to do to perform the required maintenance ?
  • Outage: How long will the system be down to perform the maintenance ?
  • Recovery: What happens when something goes wrong ?
And if a single system gives you a headache, adding a few containers into the mix will bring on a full migraine. And without some relief you may be left with the impression that containers aren't worth the effort. That's unfortunate because containers don't have to be troublesome and patching doesn't have to be hard. But it does take getting to know one of the most important and sadly least used features in Solaris: Live Upgrade.

Before we look at Live Upgrade, let's start with a definition. A boot environment is the set of all file systems and devices that are unique to an instance of Solaris on a system. If you have several boot environments then some data will be shared (non-SVR4-packaged applications, data, local home directories) and some will be exclusive to one boot environment. Without making this more complicated than it needs to be, a boot environment is generally your root (including /usr and /etc), /var (frequently split out on a separate file system), and /opt. Swap may or may not be a part of a boot environment - it is your choice. I prefer to share swap, but there are some operational situations where this may not be feasible. There may be additional items, but generally everything else is shared. Network mounted file systems and removable media are assumed to be shared.

With this definition behind us, let's proceed.

Analysis: What patches need to be applied to my system ?

For all of the assistance that Live Upgrade offers, it doesn't do anything to help with the analysis phase. Fortunately there are plenty of tools that can help with this phase. Some of them work nicely with Live Upgrade, others take a bit more effort.

smpatch(1M) has an analyze capability that can determine which patches need to be applied to your system. It will get a list of patches from an update server, most likely one at Sun, and match up the dependencies and requirements with your system. smpatch can be used to download these patches for future application or it can apply them for you. smpatch works nicely with Live Upgrade, so from a single command you can upgrade an alternate boot environment. With containers!
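
To make the hand-off concrete, here is a minimal sketch; the boot environment name and patch ID are placeholders, and I am assuming the default smpatch download directory of /var/sadm/spool (and that the downloaded patches are in a form luupgrade can consume - unpack them first if not).
# smpatch analyze
# smpatch download -i 118668-19
# luupgrade -t -n patchbe -s /var/sadm/spool 118668-19
Running smpatch update instead would apply the patches to the running boot environment, which is exactly what we are trying to avoid.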

The Sun Update Manager is a simple to use graphical front end for smpatch. It gives you a little more flexibility during the inspection phase by allowing you to look at individual patch README files. It is also much easier to see what collection a patch belongs to (recommended, security, none) and if the application of that patch will require a reboot. For all of that additional flexibility you lose the integration with Live Upgrade. Not for lack of trying, but I have not found a good way to make Update Manager and Live Upgrade play together.

Sun xVM Ops Center has a much more sophisticated patch analysis system that uses additional knowledge engines beyond those used by smpatch and Update Manager. The result is a higher quality patch bundle tailored for each individual system, automated deployment of the patch bundle, detailed auditing of what was done, and simple backout should problems occur. And it basically does the same for Windows and Linux. It is this last feature that makes things interesting. Neither Windows nor Linux has anything like Live Upgrade, and the least common denominator approach of Ops Center in its current state means that it doesn't work with Live Upgrade. Fortunately this will change in the not too distant future, and when it does I will be shouting about this feature from rooftops (OK, what I really mean is I'll post a blog and a tweet about it). If I can coax Ops Center into doing the analysis and download pieces then I can manually bolt it onto Live Upgrade for a best of both worlds solution.

These are our offerings and there are others. Some of them are quite good and in use in many places. Patch Check Advanced (PCA) is one of the more common tools in use. It operates on a patch dependency cross reference file and does a good job with the dependency analysis (this is obsoleted by that, etc). It can be used to maintain an alternate boot environment and in simple cases that would be fine. If the alternate boot environment contains any containers then I would use Live Upgrade's luupgrade instead of PCA's patchadd -R approach. If I was familiar with PCA then I would still use it for the analysis and download feature. Just let luupgrade apply the patches. You might have to uncompress the patches downloaded by PCA before handing them over to luupgrade, but that is a minor implementation detail.
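
As a rough illustration of that division of labor (the pca option spellings are from memory, so check the pca documentation; the directory is just an example, and I am assuming pca downloads into the current directory):
# mkdir -p /export/patches/queue ; cd /export/patches/queue
# pca -l missing
# pca -d missing
# unzip -q '*.zip'
# luupgrade -t -n patchbe -s /export/patches/queue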

In summary, use an analysis tool appropriate to the task (based on familiarity, budget and complexity) to figure out what patches are needed. Then use Live Upgrade (luupgrade) to deploy the desired patches.

Effort: What does it take to perform the required maintenance ?

This is a big topic and I could write pages on the subject. Even if I use an analysis tool like smpatch or pca to save me hours of trolling through READMEs drawing dependency graphs, there is still a lot of work to do in order to survive the ordeal of applying patches. Some of the more common techniques include ....
Backing up your boot environment.
I should not have to mention this, but there are some operational considerations unique to system maintenance. Even though the chance is tiny, you are more likely to render your system non-bootable during system maintenance than during any other operational task. Even with mature processes, human factors can come into play and bad things can happen (oops - that was my fallback boot environment that I just ran newfs(1M) on).

This is why automation and time tested scripting become so important. Should you do the unthinkable and render a system nonfunctional, rapid restoration of the boot environment is important. And getting it back to the last known good state is just as important. A fresh backup that can be restored by utilities from install media or a jumpstart miniroot is a very good idea. Flash archives (see flarcreate(1M)) are even better, although complications with containers make this less interesting now than in previous releases of Solaris. How many of you take a backup before applying patches ? Probably about the same number as replace batteries in your RAID controllers or change out your UPS systems after their expiration date.
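
For the record, the flash archive itself is a one liner; the archive name and destination here are just examples.
# flarcreate -n pre-patch-baseline -c /export/flash/pre-patch-baseline.flar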

Split Mirrors
One interesting technique is to split mirrors instead of backups. Of course this only works if you mirror your boot environment (a recommended practice for those systems with adequate disk space). Break your mirror, apply patches to the non-running half, cut over the updated boot environment during the next maintenance window and see how this goes. At first glance this seems like a good idea, but there are two catches.
  1. Do you synchronize dynamic boot environment elements ? Things like /etc/passwd, /etc/shadow, /var/adm/messages, print and mail queues are constantly changing. It is possible that these have changed between the mirror split and subsequent activation.
  2. How long are you willing to run without your boot environment being mirrored ? This may cause you to certify the new boot environment too quickly. You want to reestablish your mirror, but if that is your fallback in case of trouble you have a conundrum. And if you are the sort that seems to have a black cloud following you through life, you will discover a problem shortly after you start the mirror resync.
Pez disks ?
OK, the mirror split thing can be solved by swinging in another disk. Operationally a bit more complex and you have at least one disk that you can't use for other purposes (like hosting a few containers), but it can be done. I wouldn't do it (mainly because I know where this story is heading) but many of you do.
Better living through Live Upgrade
Everything we do to try to make it better adds complexity, or another hundred lines of scripting. It doesn't need to be this way, and if you become one with the LU commands it won't be for you either. Live Upgrade will take care of building and updating multiple boot environments. It will check to make sure the disks being used are bootable and not part of another boot environment. It works with the Solaris Volume Manager, Veritas encapsulated root devices, and, starting with Solaris 10 10/08 (update 6), ZFS. It also takes care of the synchronization problem. Starting with Solaris 10 8/07 (update 4), Live Upgrade also works with containers, both native and branded (and with Solaris 10 10/08 your zone roots can be in a ZFS pool).
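
The detailed examples will come in the next installment, but the basic rhythm is short enough to show here; the boot environment name is arbitrary and the patch directory is just an example.
# lucreate -n patchbe
# luupgrade -t -n patchbe -s /export/patches/10_x86_Recommended-2009-05-14
# luactivate patchbe
# init 6
Note the init 6 - after a luactivate you want the shutdown scripts to run, so switch boot environments with init or shutdown rather than reboot or halt.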

Outage: How long will my system be down for the maintenance?

Or perhaps more to the point, how long will my applications be unavailable ? The proper reply is that it depends on how big the patch bundle is and how many containers you have. And if a kernel patch is involved, double or triple your estimate. This can be a big problem and cause you to take shortcuts like installing only some patches now and others later when it is more convenient. Our good friend Bart Smaalders has a nice discussion on the implications of this approach and what we are doing in OpenSolaris to solve this. That solution will eventually work its way into the Next Solaris, but in the meantime we have a problem to solve.

There is a large set (not really large, but more than one) of patches that require a quiescent system to be properly applied. An example would be a kernel patch that causes a change to libc. It is sort of hard to rip out libc on a running system (new processes get the new libc but may have issues with the running kernel, old processes get the old libc and tend to be fine, until they do a fork(2) and exec(2)). So we developed a brilliant solution to this problem - deferred activation patching. If you apply one of these troublesome patches then we will throw it in a queue to be applied the next time the system is quiesced (a fancy term for the next time we're in single user mode). This solves the current system stability concerns but may make the next reboot take a bit longer. And if you forgot you have deferred patches in your queue, don't get anxious and interrupt the shutdown or next boot. Grab a noncaffeinated beverage and put some Bobby McFerrin on your iPod. Don't Worry, Be Happy.

So deferred activation patching seems like a good way to deal with the situation where everything goes well. And some brilliant engineers are working on applying patches in parallel (where applicable) which will make this even better. But what happens when things go wrong ? This is when you realize that patchrm(1M) is not your friend. It has never been your friend, nor will it ever be. I have an almost paralyzing fear of dentists, but would rather visit one than start down a path where patchrm is involved. Well tested tools and some automation can reduce this to simple anxiety, but if I could eliminate patchrm altogether I would be much happier.

For all that Live Upgrade can do to ease system maintenance, it is the areas of outage and recovery that make it special. And when speaking about Solaris, either in training or evangelism events, this is why I urge attendees to drop whatever they are doing and adopt Live Upgrade immediately.

Since Live Upgrade (lucreate, lumake, luupgrade) operates on an alternate boot environment, the currently running set of applications are not affected. The system stays up, applications stay running and nothing is changing underneath them so there is no cause for concern. The only impact is some additional load by the live upgrade operations. If that is a concern then run live upgrade in a project and cap resource consumption to that project.
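
As a sketch of that capping - the project name is made up, and project.cpu-cap assumes a recent enough Solaris 10 update - a cap value of 100 is roughly one CPU's worth of cycles:
# projadd -K 'project.cpu-cap=(privileged,100,deny)' lupatch
# newtask -p lupatch lucreate -n patchbe
Raise or lower the cap to match how much headroom your applications can spare.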

An interesting implication of Live Upgrade is that the operational sanity of each step is no longer required. All that matters is the end state. This gives us more freedom to apply patches in a more efficient fashion than would be possible on a running boot environment. This is especially noticeable on a system with containers. The time that the upgrade runs is significantly reduced, and all the while applications are running. No more deferred activation patches, no more single user mode patching. And if all goes poorly after activating the new boot environment you still have your old one to fall back on. Cue Bobby McFerrin for another round of "Don't Worry, Be Happy".

This brings up another feature of Live Upgrade - the synchronization of system files in flight between boot environments. After a boot environment is activated, a synchronization process is queued as a K0 script to be run during shutdown. Live Upgrade will catch a lot of private files that we know about and the obvious public ones (/etc/passwd, /etc/shadow, /var/adm/messages, mail queues). It also provides a place (/etc/lu/synclist) for you to include things we might not have thought about or are unique to your applications.
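
Entries in synclist are simply a pathname and an action (OVERWRITE, APPEND or PREPEND) per line. A hypothetical addition for an application configuration file would look like this:
/etc/opt/myapp/app.conf         OVERWRITE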

When using Live Upgrade applications are only unavailable for the amount of time it takes to shut down the system (the synchronization process) and boot the new boot environment. This may include some minor SMF manifest importing but that should not add much to the new boot time. You only have to complete the restart during a maintenance window, not the entire upgrade. While vampires are all the rage for teenagers these days, system administrators can now come out into the light and work regular hours.

Recovery: What happens when something goes wrong?

This is when you will fully appreciate Live Upgrade. After activation of a new boot environment, now called the Primary Boot Environment (PBE), your old boot environment, now called an Alternate Boot Environment (ABE), can still be called upon in case of trouble. Just activate it and shut down the system. Applications will be down for a short period (the K0 sync and subsequent start up), but there will be no more wringing of the hands, reaching for beverages with too much caffeine and vitamin B12, trying to remember where you kept your bottle of Tums. Cue Bobby McFerrin one more time and "Don't Worry, Be Happy". You will be back to your previous operational state in a matter of a few minutes (longer if you have a large server with many disks). Then you can mount up your ABE and troll through the logs trying to determine what went wrong. If you have a service contract then we will troll through the logs with you.

I neglected to mention earlier that disks that comprise boot environments can be mirrored, so there is no rush to certification. Everything can be mirrored, at all times. Which is a very good thing. You still need to back up your boot environments, but you will find yourself reaching for the backup media much less often when using Live Upgrade.

All that is left are a few simple examples of how to use Live Upgrade. I'll save that for next time.

Technocrati Tags:

Tuesday Mar 17, 2009

Time-slider saves the day (or at least a lot of frustration)

As I was tidying up my Live Upgrade boot environments yesterday, I did something that I thought was terribly clever but had some pretty wicked side effects. While linking up all of my application configuration directories (firefox, mozilla, thunderbird, [g]xine, staroffice) I got blindsided by the GNOME message client: pidgin, or more specifically one of our migration assistants from GAIM to pidgin.

As a quick background, Solaris, Solaris Express Community Edition (SXCE), and OpenSolaris all have different versions of the GNOME desktop. Since some of the configuration settings are incompatible across releases the easy solution is to keep separate home directories for each version of GNOME you might use. Which is fine until you grow weary of setting your message filters for Thunderbird again or forget which Firefox has that cached password for the local recreation center that you only use once a year. Pretty quickly you come up with the idea of a common directory for all shared configuration files (dot directories, collections of pictures, video, audio, presentations, scripts).

For one boot environment you do something like
$ mkdir /export/home/me
$ for dotdir in .thunderbird .purple .mozilla .firefox .gxine .xine .staroffice .wine .staroffice\* .openoffice\* .VirtualBox .evolution bin lib misc presentations 
> do
> mv $dotdir /export/home/me
> ln -s /export/home/me/$dotdir   $dotdir
> done
And for the other GNOME home directories you do something like
$ for dotdir in .thunderbird .purple .mozilla .firefox .gxine .xine .staroffice .wine .staroffice\* .openoffice\* .VirtualBox .evolution bin lib misc presentations 
> do
> mv $dotdir ${dotdir}.old
> ln -s /export/home/me/$dotdir   $dotdir
> done
And all is well. Until......

Booted into Solaris 10 and fired up pidgin, thinking I would get all of my accounts activated and the default chatrooms started. Instead I was met by a rather nasty note saying that I had incompatible GAIM entries and that it would try to convert them for me. What it did was wipe out all of my pidgin settings. And sure enough, when I looked into the shared directory, .purple contained all new and quite empty configuration settings.

This is where I am hoping to get some sympathy, since we have all done things like this. But then I remembered I had started time-slider earlier in the day (from the OpenSolaris side of things).
$ time-slider-setup
And there were my .purple files from 15 minutes ago, right before the GAIM conversion tools made a mess of them.
$ cd /export/home/.zfs/snapshot
$ ls
zfs-auto-snap:daily-2009-03-16-22:47
zfs-auto-snap:daily-2009-03-17-00:00
zfs-auto-snap:frequent-2009-03-17-11:45
zfs-auto-snap:frequent-2009-03-17-12:00
zfs-auto-snap:frequent-2009-03-17-12:15
zfs-auto-snap:frequent-2009-03-17-12:30
zfs-auto-snap:hourly-2009-03-16-22:47
zfs-auto-snap:hourly-2009-03-16-23:00
zfs-auto-snap:hourly-2009-03-17-00:00
zfs-auto-snap:hourly-2009-03-17-01:00
zfs-auto-snap:hourly-2009-03-17-02:00
zfs-auto-snap:hourly-2009-03-17-03:00
zfs-auto-snap:hourly-2009-03-17-04:00
zfs-auto-snap:hourly-2009-03-17-05:00
zfs-auto-snap:hourly-2009-03-17-06:00
zfs-auto-snap:hourly-2009-03-17-07:00
zfs-auto-snap:hourly-2009-03-17-08:00
zfs-auto-snap:hourly-2009-03-17-09:00
zfs-auto-snap:hourly-2009-03-17-10:00
zfs-auto-snap:hourly-2009-03-17-11:00
zfs-auto-snap:hourly-2009-03-17-12:00
zfs-auto-snap:monthly-2009-03-16-11:38
zfs-auto-snap:weekly-2009-03-16-22:47

$ cd zfs-auto-snap:frequent-2009-03-17-12:15/me/.purple
$ rm -rf /export/home/me/.purple/*
$ cp -r * /export/home/me/.purple

(and this is really, really important)
$ mv $HOME/.gaim $HOME/.gaim-never-to-be-heard-from-again

Log out and back in to refresh the GNOME configuration settings and everything is as it should be. OpenSolaris time-slider is just one more reason that I'm glad that it is my daily driver.

Technocrati Tags:

Monday Mar 02, 2009

Alaska and Oregon Solaris Boot Camps

A big thanks to all who attended the Solaris Boot Camps in Juneau, Fairbanks, Portland and Salem. I hope that you found the information useful. And thanks for all of the good questions and discussion.

Here are the materials that were used during the bootcamp.

Please send me email if you have any questions or want to follow up on any of the discussions.

Thanks again for your attendance and continued support for Solaris.

Technocrati Tags:

Tuesday Nov 11, 2008

OpenSolaris 2008.11 Release Candidate 1B (nv101a) is now available for testing

The initial release candidate (rc1b) for OpenSolaris 2008.11 (based on nv101a) is now available for download and testing. Additional (larger) images are available for non-English locales as well as USB images for faster installs. If you have not played with a USB image you will be dazzled at the speed of the installation. Amazing what happens when you eliminate all those slow seeks.

The new release candidate has quite a few interesting features and updates. The items that caught my attention were
  • IPS Package Manager
  • Automatically cloning root file system (beadm clone) during image update
  • GNOME 2.24
  • Evolution 2.24 for those of us that are stubborn enough to continue using it
  • OpenOffice 3.0
  • Songbird - an iTunes-like media player. Still needs lots of codecs (like the free Fluendo MP3 decoder) to be really useful
  • Brasero - a Nero-like media burner
Our own Dan Roberts has more to say on the subject in this video podcast.

Using the graphical package manager it only took a few minutes to set up the installation plan for a nice web based development system including Netbeans, a web stack (including Glassfish), and a Xen based virtualization system.

OpenSolaris 2008.11 is shaping up to be quite a nice release. Now that I have figured out how to make it play nicely in a root zpool with other Solaris releases, I will be spending a lot more time with it as the daily driver.

Download it, play with it, and please remember to file bugs when you run into things that don't work.

Technocrati Tags:

Tuesday Nov 04, 2008

Solaris and OpenSolaris coexistence in the same root zpool

Some time ago, my buddy Jeff Victor gave us FrankenZone. An idea that is disturbingly brilliant. It has taken me a while, but I offer for your consideration VirtualBox as a V2P platform for OpenSolaris. Nowhere near as brilliant, but at least as unusual. And you know that you have to try this out at home.

Note: This is totally a science experiment. I fully expect to see the two guys from Myth Busters showing up at any moment. It also requires at least build 100 of OpenSolaris on both the host and guest operating system to work around the hostid difficulties.

With the caveats out of the way, let me set the back story to explain how I got here.

Until virtualization technologies become ubiquitous and nothing more than BIOS extensions, multi-boot configurations will continue to be an important capability. And for those working with [Open]Solaris there are several limitations that complicate this unnecessarily. Rather than lamenting these, the possibility of leveraging ZFS root pools, now in Solaris 10 10/08, should offer up some interesting solutions.

What I want to do is simple - have a single Solaris fdisk partition that can have multiple versions of Solaris all bootable with access to all of my data. This doesn't seem like much of a request, but as of yet this has been nearly impossible to accomplish in anything close to a supportable configuration. As it turns out the essential limitation is in the installer - all other issues can be handled if we can figure out how to install OpenSolaris into an existing pool.

What we will do is use our friend VirtualBox to work around the installer issues. After installing OpenSolaris in a virtual machine we take a ZFS snapshot, send it to the bare metal Solaris host and restore it in the root pool. Finally we fix up a few configuration files to make everything work and we will be left with a single root pool that can boot Solaris 10, Solaris Express Community Edition (nevada), and OpenSolaris.

How cool is that :-) Yeah, it is that cool. Let's proceed.

Prepare the host system

The host system is running a fresh install of Solaris 10 10/08 with a single large root zpool. In this example the root zpool is named panroot. There is also a separate zpool that contains data that needs to be preserved in case a re-installation of Solaris is required. That zpool is named pandora, but it doesn't matter - it will be automatically imported in our new OpenSolaris installation if all goes well.
# lustatus 
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10u6_baseline             yes      no     no        yes    -         
s10u6                      yes      no     no        yes    -         
nv95                       yes      yes    yes       no     -         
nv101a                     yes      no     no        yes    -    

     
# zpool list
NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
pandora  64.5G  56.9G  7.61G    88%  ONLINE  -
panroot    40G  26.7G  13.3G    66%  ONLINE  -
One challenge that came up was the less than stellar performance of ssh over the VirtualBox NAT interface. So rather than fight this I set up a shared NFS file system in the root pool to stage the ZFS backup file. This made the process go much faster.

In the host Solaris system
# zfs create -o sharenfs=rw,anon=0 -o mountpoint=/share panroot/share

Prepare the OpenSolaris virtual machine

If you have not already done so, get a copy of VirtualBox, install it and set up a virtual machine for OpenSolaris.

Important note: Do not install the VirtualBox guest additions. This will install some SMF services that will fail when booted on bare metal.

Send a ZFS snapshot to the host OS root zpool

Let's take a look around the freshly installed OpenSolaris system to see what we want to send.

Inside the OpenSolaris virtual machine
bash-3.2$ zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
rpool                   6.13G  9.50G    46K  /rpool
rpool/ROOT              2.56G  9.50G    18K  legacy
rpool/ROOT/opensolaris  2.56G  9.50G  2.49G  /
rpool/dump               511M  9.50G   511M  -
rpool/export            2.57G  9.50G  2.57G  /export
rpool/export/home        604K  9.50G    19K  /export/home
rpool/export/home/bob    585K  9.50G   585K  /export/home/bob
rpool/swap               512M  9.82G   176M  -
My host system root zpool (panroot) already has swap and dump, so these won't be needed. And it also has an /export hierarchy for home directories. I will recreate my OpenSolaris Primary System Administrator user once on bare metal, so it appears the only thing I need to bring over is the root dataset itself.

Inside the OpenSolaris virtual machine
bash-3.2$ pfexec zfs snapshot rpool/ROOT/opensolaris@scooby
bash-3.2$ pfexec zfs send rpool/ROOT/opensolaris@scooby > /net/10.0.2.2/share/scooby.zfs
We are now done with the virtual machine. It can be shut down and the storage reclaimed for other purposes.

Restore the ZFS dataset in the host system root pool

In addition to restoring the OpenSolaris root dataset, the canmount property should be set to noauto. I also destroy the NFS-shared dataset since it will no longer be needed.
# zfs receive panroot/ROOT/scooby < /share/scooby.zfs
# zfs set canmount=noauto panroot/ROOT/scooby
# zfs destroy panroot/share
Now mount the new OpenSolaris root filesystem and fix up a few configuration files. Specifically
  • /etc/zfs/zpool.cache so that all boot environments have the same view of available ZFS pools
  • /etc/hostid to keep all of the boot environments using the same hostid. This is extremely important and failure to do this will leave some of your boot environments unbootable - which isn't very useful. /etc/hostid is new to build 100 and later.
Rebuild the OpenSolaris boot archive and we will be done with that filesystem.
# zfs set canmount=noauto panroot/ROOT/scooby
# zfs set mountpoint=/mnt panroot/ROOT/scooby
# zfs mount panroot/ROOT/scooby

# cp /etc/zfs/zpool.cache /mnt/etc/zfs
# cp /etc/hostid /mnt/etc/hostid

# bootadm update-archive -f -R /mnt
Creating boot_archive for /mnt
updating /mnt/platform/i86pc/amd64/boot_archive
updating /mnt/platform/i86pc/boot_archive

# umount /mnt
Make a home directory for your OpenSolaris administrator user (in this example the user is named admin). Also add a GRUB stanza so that OpenSolaris can be booted.
# mkdir -p /export/home/admin
# chown admin:admin /export/home/admin
# cat >> /panroot/boot/grub/menu.lst   <<DOO
title Scooby
root (hd0,3,a)
bootfs panroot/ROOT/scooby
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive
DOO
At this point we are done. Reboot the system and you should see a new GRUB stanza for our new OpenSolaris installation (scooby). Cue large audience applause track.

Live Upgrade and OpenSolaris Boot Environment Administration

One interesting side effect, on the positive side, is the healthy interaction of Live Upgrade and beadm(1M). For your Solaris and nevada based installations you can continue to use lucreate(1M), luupgrade(1M), and luactivate(1M). On the OpenSolaris side you can see all of your Live Upgrade boot environments as well as your OpenSolaris boot environments. Note that we can create and activate new boot environments as needed.

When in OpenSolaris
# beadm list
BE                           Active Mountpoint Space   Policy Created          
--                           ------ ---------- -----   ------ -------          
nv101a                       -      -          18.17G  static 2008-11-04 00:03 
nv95                         -      -          122.07M static 2008-11-03 12:47 
opensolaris                  -      -          2.83G   static 2008-11-03 16:23 
opensolaris-2008.11-baseline R      -          2.49G   static 2008-11-04 11:16 
s10u6                        -      -          97.22M  static 2008-11-03 12:03 
s10x_u6wos_07b               -      -          205.48M static 2008-11-01 20:51 
scooby                       N      /          2.61G   static 2008-11-04 10:29 

# beadm create doo
# beadm activate doo
# beadm list
BE                           Active Mountpoint Space   Policy Created          
--                           ------ ---------- -----   ------ -------          
doo                          R      -          5.37G   static 2008-11-04 16:23 
nv101a                       -      -          18.17G  static 2008-11-04 00:03 
nv95                         -      -          122.07M static 2008-11-03 12:47 
opensolaris                  -      -          25.5K   static 2008-11-03 16:23 
opensolaris-2008.11-baseline -      -          105.0K  static 2008-11-04 11:16 
s10u6                        -      -          97.22M  static 2008-11-03 12:03 
s10x_u6wos_07b               -      -          205.48M static 2008-11-01 20:51 
scooby                       N      /          2.61G   static 2008-11-04 10:29 

For the first time I have a single Solaris disk environment that can boot Solaris 10, Solaris Express Community Edition (nevada) or OpenSolaris and have access to all of my data. I did have to add a mount for my shared FAT32 file system (I have an iPhone and several iPods - so Windows do occasionally get opened), but that is about it. Now off to the repository to start playing with all of the new OpenSolaris goodies like Songbird, Brasero, Bluefish and the Xen bits.

Technocrati Tags:

Tuesday Sep 23, 2008

LDOMs or Containers, that is the question....

An often asked question, do I put my application in a container (zone) or an LDOM ? My question in reply is why the or ? The two technologies are not mutually exclusive, and in practice their combination can yield some very interesting results. So if it is not an or, under what circumstances would I apply each of the technologies ? And does it matter if I substitute LDOMs with VMware, Xen, VirtualBox or Dynamic System Domains ? In this context all virtual machine technologies are similar enough to treat them as a class, so we will generalize to zones vs virtual machines for the rest of this discussion.

First to the question of zones. All applications in Solaris 10 and later should be deployed in zones with the following exceptions
  • The restricted set of privileges in a zone will not allow the application to operate correctly
  • The application interacts with the kernel in an intimate fashion (reads or writes kernel data)
  • The application loads or unloads kernel modules
  • There is a higher level virtualization or abstraction technology in use that would obviate any benefits from deploying the application in a zone
Presented a different way, if the security model allows the application to run and you aren't diminishing the benefits of a zone, deploy in a zone.

Some examples of applications that have difficulty with the restrictive privileges would be security monitoring and auditing, hardware monitoring, storage (volume) management software, specialized file systems, some forms of application monitoring, and intrusive debugging and inspection tools that use kernel facilities such as the DTrace FBT provider. With the introduction of configurable zone privileges in Solaris 10 11/06, the applications that fit into this category should be few in number, highly specialized, and not the type of application that you would want to deploy in a zone.
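
As a quick illustration of configurable zone privileges, granting a zone the DTrace privileges that are permitted inside a non-global zone looks something like this (the zone name is made up, and the zone needs a reboot to pick up the change):
# zonecfg -z webzone 'set limitpriv=default,dtrace_proc,dtrace_user'
# zoneadm -z webzone reboot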

For the higher level abstraction exclusion, think of something at the application layer that tries to hide the underlying platform. The best example would be Oracle RAC. RAC abstracts the details of the platform so that it can provide continuously operating database services. It also has the characteristic that it is itself a consolidation platform with some notion of resource controls. Given the complexity associated with RAC, it would not be a good idea to consolidate non-RAC workloads on a RAC cluster. And since zones are all about consolidation, RAC would trump zones in this case.

There are other examples such as load balancers and transaction monitors. These are typically deployed on smaller horizontally scalable servers to provide greater bandwidth or increase service availability. Although they do not provide consolidation services, their sophisticated availability features might not interact well with the restrictive non-global zone security model. High availability frameworks such as SunCluster do work well with zones. Zones abstract applications in such a way that service failover configurations can be significantly simplified.



Unless your application falls under one of these exemptions, the application should be deployed in a zone.

What about virtual machines ? This type of abstraction happens at a much lower level, in this case hardware resources (processors, memory, I/O). In contrast, zones abstract user space objects (processes, network stacks, resource controls). Virtual machines allow greater flexibility in running many types and versions of operating systems and applications, but they also eliminate many opportunities to share resources efficiently.

Where would I use virtual machines ? Where you need the diversity of multiple operating systems. This can be different types of operating system (Windows, Linux, Solaris) or different versions or patch levels of the same operating system. The challenge here is that large sites can have servers at many different patch and update versions, not by design but as a result of inadequate patching and maintenance tools. Enterprise patch management tools (xVM OpsCenter), patch managers (PCA), or automated provisioning tools (OpsWare) can help reduce the number of software combinations and online maintenance using Live Upgrade can reduce the time and effort required to maintain systems.

It is important to understand that zones are not virtual machines. Their differences and the implications of this are
  • Zones provide application isolation on a shared kernel
  • Zones share resources very efficiently (shared libraries, system caches, storage)
  • Zones have a configurable and restricted set of privileges
  • Zones allow for easy application of resource controls even in a complex dynamic application environment
  • Virtual machines provide relatively complete isolation between operating systems
  • Virtual machines allow consolidation of many types and versions of operating systems
  • Although virtual machines may allow oversubscription of resources, they provide very few opportunities to share critical resources
  • An operating system running in a virtual machine can still isolate applications using zones.
And it is that last point that carries this conversation a bit farther. If the decision between zones and virtual machines isn't an or, under what conditions would it be an and, and what sort of benefit can be expected ?

Consider the case of application consolidation. Suppose you have three applications: A, B and C. If they are consolidated without isolation then system maintenance becomes cumbersome as you can only patch or upgrade when all three application owners agree. Even more challenging is the time pressure to certify the newly patched or upgraded environment due to the fact that you have to test three things instead of one. Clearly isolation is a benefit in this case, and it is a persistent property (once isolated, forever isolated).

Isolation using zones alone will be very efficient but there will be times when the common shared kernel will be inconvenient - approaching the problems of the non-isolated case. Isolation using virtual machines is simple and very flexible but comes with a cost that might be unnecessary.

So why not do both ? Use zones to isolate the applications and use virtual machines for those times when you cannot support all of the applications with a common version of the operating system. In other words the isolation is a persistent property and the need for heterogeneous operating systems is temporary and specific. With some small improvements in the patching and upgrade tools, the time frame when you need heterogeneous operating systems can be reduced.

Using our three applications as an example, A, B and C are deployed in separate zones on a single system image, either on bare metal or in a virtual machine. Everything is operating spectacularly until a new OS upgrade becomes available that provides some important new functionality for application A. So application owner A wants to upgrade immediately, application B doesn't care one way or the other, and (naturally) application C has just gone into seasonal lock-down and cannot be altered for the rest of the year.

Using zones and virtual machines together provides a unique solution. Provision a new virtual machine with the new operating system software, either on the same platform by reassigning resources (CPU, memory) or on a separate platform. Next, clone the zone running application A. Detach the newly cloned zone and migrate it to the new virtual machine. A new feature in Solaris 10 10/08 will automatically upgrade the new zone upon attachment to a server running newer software. Leave the original zone alone for some period of time in case a regression appears that would force you to revert to the original version. Eventually the original zone can be reclaimed, at a convenient time.
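A rough sketch of that flow with the zones tooling follows. The zone name appA, the clone name appA-mig, and the /zones paths are all made up for illustration, and the transport of the zonepath between machines (tar, cpio, zfs send/receive) is left open.

# --- on the original system image ---
zoneadm -z appA halt                            # the source zone must be halted while it is cloned
zonecfg -z appA export -f /tmp/appA.cfg         # reuse appA's configuration for the clone
zonecfg -z appA-mig -f /tmp/appA.cfg            # create the clone's configuration from it
zonecfg -z appA-mig set zonepath=/zones/appA-mig
zoneadm -z appA-mig clone appA                  # copy appA's data into the new zonepath
zoneadm -z appA boot                            # the original zone goes back into service
zoneadm -z appA-mig detach                      # leaves a detached zone under /zones/appA-mig
# ... copy /zones/appA-mig to the new virtual machine ...

# --- on the new virtual machine running the newer OS ---
zonecfg -z appA-mig create -a /zones/appA-mig   # register the detached zone
zoneadm -z appA-mig attach -u                   # -u updates its packages to match the new OS
zoneadm -z appA-mig boot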

Migrate the other two applications at a convenient time using the same procedure. When all of the applications have been migrated and you are comfortable that they have been adequately tested, the old system image can be shut down and any remaining resources can be reclaimed for other purposes. Zones as the sole isolation agent cannot do this, and virtual machines by themselves will require more administrative effort and higher resource consumption during the long periods when you don't need different versions of the operating system. Combined, you get the best of both.

A less obvious example is ISV licensing. Consider the case of Oracle. Our friends at Oracle consider the combination of zones and capped resource controls to be a hard partition method, which allows you to license their software to the size of the resource cap rather than the size of the server. If you put Oracle in a zone on a 16-core system with a resource cap of 2 cores, you only pay for 2 cores. They have also made similar considerations for their Xen-based Oracle VM product, yet have been slow to respond to other virtual machine technologies. Zones to the rescue. If you deploy Oracle in a VM on a 16-core server, you pay for all 16 cores. If you put that same application in a zone in the same VM, but cap the zone at 4 cores, then you only pay for 4 cores.
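As a minimal sketch (the zone name oradb is made up, and whether a given cap satisfies a particular license agreement is of course between you and the vendor), the cap is just one more zone configuration setting on a recent Solaris 10 update:

# cap the zone at four CPUs worth of processing; the cap takes effect on the next zone boot
zonecfg -z oradb "add capped-cpu; set ncpus=4; end"
zoneadm -z oradb reboot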

Zones are all about isolation and application of resource controls. Virtual machines are all about heterogeneous operating systems. Use zones to persistently isolate applications. Use virtual machines during the times when a single operating system version is not feasible.

This is only the beginning of the conversation. A new Blueprint based on measured results from some more interesting use cases is clearly needed. Jeff Savit, Jeff Victor and I will be working on this over the next few weeks and I'm sure that we will be blogging with partial results as they become available. As always, questions and suggestions are welcome.

Monday Feb 18, 2008

ZFS and FMA - Two great tastes .....

Our good friend Isaac Rozenfeld talks about the Multiplicity of Solaris. When talking about Solaris I will use the phrase "The Vastness of Solaris". If you have attended a Solaris Boot Camp or Tech Day in the last few years you get an idea of what we are talking about - when we go on about Solaris hour after hour after hour.

But the key point in Isaac's multiplicity discussion is how the cornucopia of Solaris features work together to do some pretty spectacular (and competitively differentiating) things. In the past we've looked at combinations such as ZFS and Zones or Service Management, Role Based Access Control (RBAC) and Least Privilege. Based on a conversation last week in St. Louis, let's consider how ZFS and Solaris Fault Management (FMA) play together.

Preparation

Let's begin by creating some fake devices that we can play with. I don't have enough disks on this particular system, but I'm not going to let that slow me down. If you have sufficient real hot swappable disks, feel free to use them instead.
# mkfile 1g /dev/dsk/disk1
# mkfile 1g /dev/dsk/disk2
# mkfile 512m /dev/dsk/disk3
# mkfile 512m /dev/dsk/disk4
# mkfile 1g /dev/dsk/spare1

Now let's create a couple of zpools using the fake devices. pool1 will be a 1GB mirrored pool using disk1 and disk2. pool2 will be a 512MB mirrored pool using disk3 and disk4. Device spare1 will spare both pools in case of a problem - which we are about to inflict upon the pools.
# zpool create pool1 mirror disk1 disk2 spare spare1
# zpool create pool2 mirror disk3 disk4 spare spare1
# zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

So far so good. If we were to run a scrub on either pool, it would complete almost immediately. Remember that unlike a hardware RAID disk replacement, ZFS scrubbing and resilvering only touch blocks that contain actual data. Since there is no data in these pools (yet), there is little for the scrubbing process to do.
# zpool scrub pool1
# zpool scrub pool2
# zpool status
  pool: pool1
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 09:24:16 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 09:24:17 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

Let's populate both pools with some data. I happen to have a directory of scenic images that I use as screen backgrounds - that will work nicely.

# cd /export/pub/pix
# find scenic -print | cpio -pdum /pool1
# find scenic -print | cpio -pdum /pool2

# df -k | grep pool
pool1                1007616  248925  758539    25%    /pool1
pool2                 483328  248921  234204    52%    /pool2

And yes, cp -r would have been just as good.

Problem 1: Simple data corruption

Time to inflict some harm upon the pool. First, some simple corruption. Writing some zeros over half of the mirror should do quite nicely.
# dd if=/dev/zero of=/dev/dsk/disk1 bs=8192 count=10000 conv=notrunc
10000+0 records in
10000+0 records out 

At this point we are unaware that anything has happened to our data. So let's try accessing some of the data to see if we can observe ZFS self healing in action. If your system has plenty of memory and is relatively idle, accessing the data may not be sufficient - the reads may be satisfied from cache rather than from the damaged disk. If you still end up with no errors after the cpio, try a zpool scrub - that will catch all errors in the data.
# cd /pool1
# find . -print | cpio -ov > /dev/null
416027 blocks

Let's ask our friend fmstat(1m) if anything is wrong ?
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.1   0   0     0     0      0      0
disk-transport           0       0  0.0  366.5   0   0     0     0    32b      0
eft                      0       0  0.0    2.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       1       0  0.0    0.2   0   0     0     0      0      0
io-retire                0       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             1       0  0.0   16.0   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  620.3   0   0     0     0      0      0
syslog-msgs              1       0  0.0    9.7   0   0     0     0      0      0
zfs-diagnosis          162     162  0.0    1.5   0   0     1     0   168b   140b
zfs-retire               1       1  0.0  112.3   0   0     0     0      0      0

As the guys in the Guinness commercial say, "Brilliant!" The important thing to note here is that the zfs-diagnosis engine has run several times indicating that there is a problem somewhere in one of my pools. I'm also running this on Nevada so the zfs-retire engine has also run, kicking in a hot spare due to excessive errors.
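If you would rather watch the counters move than sample them once, fmstat also accepts an interval and count, and can focus on a single module; a small sketch (the interval and count are arbitrary, and the per-module output shows that engine's private statistics rather than the summary table):

# re-run the summary every five seconds, three times
fmstat 5 3
# or look at just the zfs-diagnosis engine's own statistics
fmstat -m zfs-diagnosis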

So which pool is having the problems ? We continue our FMA investigation to find out.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.


# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 44.83% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              disk1   DEGRADED     0     0   162  too many errors
              spare1  ONLINE       0     0     0
            disk2     ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors

This tells us all that we need to know. The device disk1 was found to have quite a few checksum errors - so many in fact that it was replaced automatically by a hot spare. The spare was resilvering and a full complement of data replicas would be available soon. The entire process was automatic and completely observable.

Since we inflicted harm upon the (fake) disk device ourselves, we know that it is in fact quite healthy. So we can restore our pool to its original configuration rather simply - by detaching the spare and clearing the error. We should also clear the FMA counters and repair the ZFS vdev so that we can tell if anything else misbehaves in either this or another pool.
# zpool detach pool1 spare1
# zpool clear pool1
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 10:25:26 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors


# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  223.5   0   0     0     0    32b      0
eft                      1       0  0.0    4.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       4       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             4       0  0.0    8.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  372.7   0   0     0     0      0      0
syslog-msgs              4       0  0.0    5.4   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.4   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0


# fmdump -v -u d82d1716-c920-6243-e899-b7ddd386902e
TIME                 UUID                                 SUNW-MSG-ID
Feb 18 09:51:49.3025 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
  100%  fault.fs.zfs.vdev.checksum

        Problem in: 
           Affects: zfs://pool=pool1/vdev=449a3328bc444732
               FRU: -
          Location: -

# fmadm repair zfs://pool=pool1/vdev=449a3328bc444732
fmadm: recorded repair to zfs://pool=pool1/vdev=449a3328bc444732

# fmadm faulty

Problem 2: Device failure

Time to do a little more harm. In this case I will simulate the failure of a device by removing the fake device. Again we will access the pool and then consult fmstat to see what is happening (are you noticing a pattern here?).
# rm -f /dev/dsk/disk2
# cd /pool1
# find . -print | cpio -oc > /dev/null
416027 blocks

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  214.2   0   0     0     0    32b      0
eft                      1       0  0.0    4.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       4       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             4       0  0.0    8.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  372.7   0   0     0     0      0      0
syslog-msgs              4       0  0.0    5.4   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.4   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0

Rats - the find was satisfied entirely from cache this time, so no errors were detected. As before, should this happen, proceed directly to zpool scrub.
# zpool scrub pool1
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  190.5   0   0     0     0    32b      0
eft                      1       0  0.0    4.1   0   0     0     0   1.4M      0
fmd-self-diagnosis       5       0  0.0    0.5   0   0     0     0      0      0
io-retire                1       0  0.0    1.0   0   0     0     0      0      0
snmp-trapgen             6       0  0.0    7.4   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  329.0   0   0     0     0      0      0
syslog-msgs              6       0  0.0    4.6   0   0     0     0      0      0
zfs-diagnosis           16       1  0.0   70.3   0   0     1     1   168b   140b
zfs-retire               1       0  0.0  509.8   0   0     0     0      0      0

Again, hot sparing has kicked in automatically. The evidence of this is the zfs-retire engine running.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 11:07:29 50ea07a0-2cd9-6bfb-ff9e-e219740052d5  ZFS-8000-D3    Major    
Feb 18 11:16:43 06bfe323-2570-46e8-f1a2-e00d8970ed0d

Fault class : fault.fs.zfs.device

Description : A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for
              more information.

Response    : No automated response will occur.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress, 4.94% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            disk1     ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              disk2   UNAVAIL      0     0     0  cannot open
              spare1  ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors

As before, this tells us all that we need to know. A device (disk2) has failed and is no longer in operation. Sufficient spares existed and one was automatically attached to the damaged pool. Resilvering completed successfully and the data is once again fully mirrored.

But here's the magic. Let's repair the device - again simulated with our fake device.
# mkfile 1g /dev/dsk/disk2
# zpool replace pool1 disk2
# zpool status pool1 
  pool: pool1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 4.86% done, 0h1m to go
config:

        NAME               STATE     READ WRITE CKSUM
        pool1              DEGRADED     0     0     0
          mirror           DEGRADED     0     0     0
            disk1          ONLINE       0     0     0
            spare          DEGRADED     0     0     0
              replacing    DEGRADED     0     0     0
                disk2/old  UNAVAIL      0     0     0  cannot open
                disk2      ONLINE       0     0     0
              spare1       ONLINE       0     0     0
        spares
          spare1           INUSE     currently in use

errors: No known data errors

Get a cup of coffee while the resilvering process runs.
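If coffee is not handy, a rough way to wait for completion is to poll zpool status; a minimal sketch (the 30 second interval is arbitrary):

# poll until the resilver no longer reports "in progress", then print the final status
while zpool status pool1 | grep "resilver in progress" > /dev/null
do
        sleep 30
done
zpool status pool1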
# zpool status
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   


# fmadm faulty

Notice the nice integration with FMA. Not only was the new device resilvered, but the hot spare was detached and the FMA fault was cleared. The fmstat counters still show that there was a problem and the fault report still exists in the fault log for later interrogation.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  171.5   0   0     0     0    32b      0
eft                      1       0  0.0    3.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       6       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    0.9   0   0     0     0      0      0
snmp-trapgen             6       0  0.0    6.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  294.3   0   0     0     0      0      0
syslog-msgs              6       0  0.0    4.2   0   0     0     0      0      0
zfs-diagnosis           36       1  0.0   51.6   0   0     0     1      0      0
zfs-retire               1       0  0.0  170.0   0   0     0     0      0      0

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Feb 16 11:38:16.0976 48935791-ff83-e622-fbe1-d54c20385afc ZFS-8000-GH
Feb 16 11:38:30.8519 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233 ZFS-8000-GH
Feb 18 09:51:49.3025 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713 ZFS-8000-GH
Feb 18 09:56:24.8029 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
Feb 18 10:23:07.2228 7c04a6f7-d22a-e467-c44d-80810f27b711 ZFS-8000-GH
Feb 18 10:25:14.6429 faca0639-b82b-c8e8-c8d4-fc085bc03caa ZFS-8000-GH
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3
Feb 18 11:16:44.2497 06bfe323-2570-46e8-f1a2-e00d8970ed0d ZFS-8000-D3


# fmdump -V -u 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
TIME                 UUID                                 SUNW-MSG-ID
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3

  TIME                 CLASS                                 ENA
  Feb 18 11:07:27.8476 ereport.fs.zfs.vdev.open_failed       0xb22406c635500401

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
        code = ZFS-8000-D3
        diag-time = 1203354449 236999
        de = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = Dimension XPS                
                        chassis-id = 7XQPV21
                        server-id = arrakis
                (end authority)

                mod-name = zfs-diagnosis
                mod-version = 1.0
        (end de)

        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = fault.fs.zfs.device
                certainty = 0x64
                asru = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x3a2ca6bebd96cfe3
                        vdev = 0xedef914b5d9eae8d
                (end asru)

                resource = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x3a2ca6bebd96cfe3
                        vdev = 0xedef914b5d9eae8d
                (end resource)

        (end fault-list[0])

        fault-status = 0x3
        __ttl = 0x1
        __tod = 0x47b9bb51 0x1ef7b430

# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset

# fmadm reset zfs-retire
fmadm: zfs-retire module has been reset

Problem 3: Unrecoverable corruption

As those of you who have attended one of my Boot Camps or Solaris Best Practices training classes know, House is one of my favorite TV shows - the only one that I watch regularly. And this next example would make a perfect episode. Is it likely to happen ? No, but it is so cool when it does :-)

Remember our second pool, pool2. It has the same contents as pool1. Now, let's do the unthinkable - let's corrupt both halves of the mirror. Surely data loss will follow, but the fact that Solaris stays up and running and can report what happened is pretty spectacular. But it gets so much better than that.
# dd if=/dev/zero of=/dev/dsk/disk3 bs=8192 count=10000 conv=notrunc
# dd if=/dev/zero of=/dev/dsk/disk4 bs=8192 count=10000 conv=notrunc
# zpool scrub pool2

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  166.0   0   0     0     0    32b      0
eft                      1       0  0.0    3.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       6       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    0.9   0   0     0     0      0      0
snmp-trapgen             8       0  0.0    6.3   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  294.3   0   0     0     0      0      0
syslog-msgs              8       0  0.0    3.9   0   0     0     0      0      0
zfs-diagnosis         1032    1028  0.6   39.7   0   0    93     2    15K    13K
zfs-retire               2       0  0.0  158.5   0   0     0     0      0      0

As before, lots of zfs-diagnosis activity. And two hits to zfs-retire. But we only have one spare - this should be interesting. Let's see what is happening.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    
Feb 18 13:18:42 c3889bf1-8551-6956-acd4-914474093cd7

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 16 11:38:30 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233  ZFS-8000-GH    Major    
Feb 18 09:51:49 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713
Feb 18 10:23:07 7c04a6f7-d22a-e467-c44d-80810f27b711
Feb 18 13:18:42 0a1bf156-6968-4956-d015-cc121a866790

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

# zpool status -x
  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         DEGRADED     0     0 2.60K
          mirror      DEGRADED     0     0 2.60K
            spare     DEGRADED     0     0 2.43K
              disk3   DEGRADED     0     0 5.19K  too many errors
              spare1  DEGRADED     0     0 2.43K  too many errors
            disk4     DEGRADED     0     0 5.19K  too many errors
        spares
          spare1      INUSE     currently in use

errors: 247 data errors, use '-v' for a list

So ZFS tried to bring in a hot spare, but there were insufficient replicas to be able to reconstruct all of the data. But here is where it gets interesting. Let's see what zpool status -v says about things.
# zpool status -v
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    INUSE     in use by pool 'pool2'

errors: No known data errors

  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         DEGRADED     0     0 2.60K
          mirror      DEGRADED     0     0 2.60K
            spare     DEGRADED     0     0 2.43K
              disk3   DEGRADED     0     0 5.19K  too many errors
              spare1  DEGRADED     0     0 2.43K  too many errors
            disk4     DEGRADED     0     0 5.19K  too many errors
        spares
          spare1      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        /pool2/scenic/cider mill crowds.jpg
        /pool2/scenic/Cleywindmill.jpg
        /pool2/scenic/csg_Landscapes001_GrandTetonNationalPark,Wyoming.jpg
        /pool2/scenic/csg_Landscapes002_ElowahFalls,Oregon.jpg
        /pool2/scenic/csg_Landscapes003_MonoLake,California.jpg
        /pool2/scenic/csg_Landscapes005_TurretArch,Utah.jpg
        /pool2/scenic/csg_Landscapes004_Wildflowers_MountRainer,Washington.jpg
        /pool2/scenic/csg_Landscapes!idx011.jpg
        /pool2/scenic/csg_Landscapes127_GreatSmokeyMountains-NorthCarolina.jpg
        /pool2/scenic/csg_Landscapes129_AcadiaNationalPark-Maine.jpg
        /pool2/scenic/csg_Landscapes130_GettysburgNationalPark-Pennsylvania.jpg
        /pool2/scenic/csg_Landscapes131_DeadHorseMill,CrystalRiver-Colorado.jpg
        /pool2/scenic/csg_Landscapes132_GladeCreekGristmill,BabcockStatePark-WestVirginia.jpg
        /pool2/scenic/csg_Landscapes133_BlackwaterFallsStatePark-WestVirginia.jpg
        /pool2/scenic/csg_Landscapes134_GrandCanyonNationalPark-Arizona.jpg
        /pool2/scenic/decisions decisions.jpg
        /pool2/scenic/csg_Landscapes135_BigSur-California.jpg
        /pool2/scenic/csg_Landscapes151_WataugaCounty-NorthCarolina.jpg
        /pool2/scenic/csg_Landscapes150_LakeInTheMedicineBowMountains-Wyoming.jpg
        /pool2/scenic/csg_Landscapes152_WinterPassage,PondMountain-Tennessee.jpg
        /pool2/scenic/csg_Landscapes154_StormAftermath,OconeeCounty-Georgia.jpg
        /pool2/scenic/Brig_Of_Dee.gif
        /pool2/scenic/pvnature14.gif
        /pool2/scenic/pvnature22.gif
        /pool2/scenic/pvnature7.gif
        /pool2/scenic/guadalupe.jpg
        /pool2/scenic/ernst-tinaja.jpg
        /pool2/scenic/pipes.gif
        /pool2/scenic/boat.jpg
        /pool2/scenic/pvhawaii.gif
        /pool2/scenic/cribgoch.jpg
        /pool2/scenic/sun1.gif
        /pool2/scenic/sun1.jpg
        /pool2/scenic/sun2.jpg
        /pool2/scenic/andes.jpg
        /pool2/scenic/treesky.gif
        /pool2/scenic/sailboatm.gif
        /pool2/scenic/Arizona1.jpg
        /pool2/scenic/Arizona2.jpg
        /pool2/scenic/Fence.jpg
        /pool2/scenic/Rockwood.jpg
        /pool2/scenic/sawtooth.jpg
        /pool2/scenic/pvaptr04.gif
        /pool2/scenic/pvaptr07.gif
        /pool2/scenic/pvaptr11.gif
        /pool2/scenic/pvntrr01.jpg
        /pool2/scenic/Millport.jpg
        /pool2/scenic/bryce2.jpg
        /pool2/scenic/bryce3.jpg
        /pool2/scenic/monument.jpg
        /pool2/scenic/rainier1.gif
        /pool2/scenic/arch.gif
        /pool2/scenic/pv-anzab.gif
        /pool2/scenic/pvnatr15.gif
        /pool2/scenic/pvocean3.gif
        /pool2/scenic/pvorngwv.gif
        /pool2/scenic/pvrmp001.gif
        /pool2/scenic/pvscen07.gif
        /pool2/scenic/pvsltd04.gif
        /pool2/scenic/banhall28600-04.JPG
        /pool2/scenic/pvwlnd01.gif
        /pool2/scenic/pvnature08.gif
        /pool2/scenic/pvnature13.gif
        /pool2/scenic/nokomis.jpg
        /pool2/scenic/lighthouse1.gif
        /pool2/scenic/lush.gif
        /pool2/scenic/oldmill.gif
        /pool2/scenic/gc1.jpg
        /pool2/scenic/gc2.jpg
        /pool2/scenic/canoe.gif
        /pool2/scenic/Donaldson-River.jpg
        /pool2/scenic/beach.gif
        /pool2/scenic/janloop.jpg
        /pool2/scenic/grobacro.jpg
        /pool2/scenic/fnlgld.jpg
        /pool2/scenic/bells.gif
        /pool2/scenic/Eilean_Donan.gif
        /pool2/scenic/Kilchurn_Castle.gif
        /pool2/scenic/Plockton.gif
        /pool2/scenic/Tantallon_Castle.gif
        /pool2/scenic/SouthStockholm.jpg
        /pool2/scenic/BlackRock_Cottage.jpg
        /pool2/scenic/seward.jpg
        /pool2/scenic/canadian_rockies_csg110_EmeraldBay.jpg
        /pool2/scenic/canadian_rockies_csg111_RedRockCanyon.jpg
        /pool2/scenic/canadian_rockies_csg112_WatertonNationalPark.jpg
        /pool2/scenic/canadian_rockies_csg113_WatertonLakes.jpg
        /pool2/scenic/canadian_rockies_csg114_PrinceOfWalesHotel.jpg
        /pool2/scenic/canadian_rockies_csg116_CameronLake.jpg
        /pool2/scenic/Castilla_Spain.jpg
        /pool2/scenic/Central-Park-Walk.jpg
        /pool2/scenic/CHANNEL.JPG



In my best Hugh Laurie voice trying to sound very Northeastern American, that is so cool! But we're not even done yet. Let's take this list of files and restore them - in this case, from pool1. Operationally this would be from a backup tape or nearline backup cache, but for our purposes, the contents in pool1 will do nicely.

First, let's clear the zpool error counters and return the spare disk. We want to make sure that our restore works as desired. Oh, and clear the FMA stats while we're at it.
# zpool clear pool2
# zpool detach pool2 spare1

# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset

# fmadm reset zfs-retire   
fmadm: zfs-retire module has been reset

Now individually restore the files that have errors in them and check again. You can even export and reimport the pool and you will find a very nice, happy, and thoroughly error-free ZFS pool. Some rather unpleasant gnashing of zpool status -v output with awk has been omitted for sanity's sake.
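For the curious, the omitted gnashing might look something like the sketch below. It assumes the damaged files are listed by zpool status -v exactly as shown above, and that known-good copies live under /pool1; if you are restoring from tape, substitute your restore command for the cp.

# pull the damaged file names out of 'zpool status -v' and copy good versions over from pool1
zpool status -v pool2 | grep '/pool2/' | while read f
do
        src=`echo "$f" | sed 's|^/pool2|/pool1|'`
        cp "$src" "$f"
done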
# zpool scrub pool2
# zpool status pool2
  pool: pool2
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 14:04:56 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

# zpool export pool2
# zpool import pool2
# dircmp -s /pool1 /pool2

Conclusions and Review

So what have we learned ? ZFS and FMA are two great tastes that taste great together. No, that's chocolate and peanut butter, but you get the idea. One more great example of Isaac's Multiplicity of Solaris.

That, and I have finally found a good lab exercise for the FMA training materials. Ever since Christine Tran put the FMA workshop together, we have been looking for some good FMA lab exercises. The materials reference a synthetic fault generator that is not available in public (for obvious reasons). I haven't explored the FMA test harness enough to know if there is anything in there that would make a good lab. But this exercise that we have just explored seems to tie a number of key pieces together.

And of course, one more reason why Roxy says, "You should run Solaris."


Sunday Feb 17, 2008

Roxy says "You Should Run Solaris"


Look into my eyes.
You want to run Solaris.
You want to run Solaris.
You want to run Solaris.

Repeat after me
You want to run Solaris.

And make me a peanut butter and banana sandwich!


Wednesday Oct 03, 2007

Live Upgrade from Solaris 10 11/06 to 8/07 without nonglobal zones

Live Upgrade is one of the most useful Solaris features, yet in my travels around the US I still don't see it used as much as I would like. I can think of several reasons for this - not all of them totally valid
  • I tried it once a long time ago and a patch or package that wasn't LU aware messed up my current boot environment. Not valid for Solaris components although we do see the occasional partner product with this problem. The last one I saw was the NVidia driver, and the good folks from NVidia fixed it very quickly once reported.
  • The documentation can be a bit intimidating. Valid with a capital V. But Live Upgrade is an amazingly flexible feature, so at some point you do have to describe these capabilities. As a guide through this documentation, several folks have blogged manageable how-to guides. You can find mine back in March 2007, although I've recently updated it. And there are other good blogs with plenty of examples. There is also a very good Blueprint on Live Upgrade.
  • It doesn't work with the Veritas Volume Manager.
  • I didn't know about Live Upgrade. Well, you do now. But I have noticed that a lot of the Solaris conversation is focused on new features like ZFS, Zones, SMF and DTrace, and some of the older features like Flash archives and Live Upgrade don't receive the attention they deserve. The simple fact is that Live Upgrade takes all of the pain out of the patching process, at least once you know what to patch.
And I'm sure there are other reasons, but these are the ones I hear most often.

Let's turn our attention to the topic at hand, upgrading a Solaris 10 11/06 system to 8/07, without zones. This example will be on an x64 system, but the SPARC approach is similar.

If you have read my earlier blog on Live Upgrade, you will recall the process is
  1. Read Infodoc 72099 and install any required patches
  2. Install the LU packages SUNWluu SUNWlur and SUNWlucfg (if present) from the installation media
  3. lurename(1m) if you want to change the name of your new boot environment
  4. lumake(1m) or ludelete(1m) + lucreate(1m) to repopulate the target boot environment with the proper software and configuration files
  5. luupgrade(1m) to upgrade the target boot environment
  6. luactivate(1m) to activate the new boot environment
  7. init 0 to perform the file synchronization and conversions, create the new boot archive and update your GRUB menu


So I fire up my web browser and run over to SunSolve to pick up Infodoc 72099 and see a rather large set of patches. And there are two lists, one for systems with non-global zones and one without. Since we're looking at a system without non-global zones we will start with the shorter of the two lists (the next article will cover systems with nonglobal zones).

Apparently we need the following patches (all Solaris 10, x86):

  118816-03 or higher    nawk patch
  120901-03 or higher    libzonecfg patch
  121334-04 or higher    SUNWzoneu required patch
  119255-42 or higher    patchadd/patchrm patches
  119318-01 or higher    SVr4 Packaging Commands (usr) Patch
  117435-02 or higher    biosdev patch for GRUB Boot

  Reboot after installation

  120236-01 or higher    SUNWluzone required patches
  121429-08 or higher    SUNWluzone required patches
  121003-03 or higher    pax patch
  123122-02 or higher    prodreg patch
  121005-03              sh patch
  119043-10              /usr/sbin/svccfg patch
  121902-02              i.manifest r.manifest class action script patch
  120901-03              libzonecfg patch
  120069-03              telnet security patch
  120070-02              cpio patch
  123333-01              tftp patch


Hmmm, seems like a lot of patches and a required reboot! So I fire up our new friend updatemanager to patch my system. I see that there is a new updatemanager patch available (121119-13), so I installed that one all by itself and restarted updatemanager.

I soon realize that my choice of patching tools is making this a bit challenging. Users of patch tools such as Patch Check Advanced (PCA) may have an easier time, but I was determined to do this with updatemanager, with occasional help from the patch READMEs in SunSolve.

The list of patches required for this upgrade applies to any release of Solaris 10. A fresh install of a Solaris 10 11/06 system only needed the following four patches - which is a lot better than I first thought.
  • 119255-42
  • 121429-08
  • 126539-01 (replaces the required 121902-02)
  • 125419-01 (replaces the required 120069-03)
The difficulty with updatemanager was the set of obsoleted patches. Working out that the required 121902-02 had been obsoleted by 126539-01, which was installed, took a bit of manual trolling through patch READMEs. So I'll save you the research - it came down to just the four patches above.
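If you want to see where a system already stands, one hedged shortcut is to grep the patch database for any revision of those four patch IDs. This does not chase obsolescence chains, so treat it as a starting point rather than the final word:

# list any installed revision of the four patches identified above
showrev -p | egrep "Patch: (119255|121429|126539|125419)-"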

One important note: the required reboot after patch 117435-02 wasn't needed after all - so I'll try to save all of you Solaris 10 11/06 users one reboot. While I have your attention, it is a good idea, if not a best practice, to install the patch and packaging tool patches separately.

Feeling a lot better about this process, I proceeded to install the four required patches using updatemanager in two steps (119255-42 first, then the other three patches), and all succeeded as expected. All that was left to do was finish the standard procedure:
# mount -o ro -F hsfs `lofiadm -a /export/iso/s10u4/solarisdvd.iso` /mnt 
# pkgadd -d /mnt/Solaris_10/Product SUNWlur SUNWluu SUNWlucfg 
# lurename -e nv71 -n s10u4 
# lumake -n s10u4 
# luupgrade -u -s /mnt -n s10u4 
# luactivate s10u4 
# init 0 


And all went as expected. Next time I will tackle the longer list of patches and examine the same upgrade path, but with nonglobal zones.


Thursday Jun 21, 2007

Updated Solaris Bootcamp Presentations

I've had a great time traveling around the country talking about Solaris. It's not exactly a difficult thing - there's plenty to talk about. Many of you have asked for copies of the latest Solaris update, virtualization overview and ZFS deep dive. Rather than have you dig through a bunch of old blog entries about bootcamps from 2005, here they are for your convenience.



I hope this will save you some digging through http://mediacast.sun.com and tons of old blogs.

In a few weeks I'll post a new "What's New in Solaris" which will have some really cool things. But we'll save that for later.


Monday Jun 11, 2007

True Virtualization ?

While this is inspired by a recent conversation with a customer, I have seen the term "true virtualization" used quite a bit lately - mostly by people who have just attended a VMware seminar, and to a lesser extent folks from IBM trying to compare LPARS with Solaris zones. While one must give due credit to the fine folks at VMware for raising Information Technology (IT) awareness and putting virtualization in the common vocabulary, they have hardly cornered the market on virtualization, and using the term "true virtualization" may reveal how narrow an understanding they have of the concept or an unfortunate arrogance that their approach is the only one that matters.

Wikipedia defines virtualization as a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. While Wikipedia isn't the final authority, this definition is quite good and we will use it to start our exploration.

So what is true virtualization ? Anything that (potentially) hides architectural details from running objects (programs, services, operating systems, data). No more, no less - end of discussion.

Clearly VMware's virtualization products (ESX, Workstation) do that. They provide virtual machines that emulate the Intel x86 Instruction Set Architecture (ISA) so that operating systems think they are running on real hardware when in fact they are not. This type of virtualization would be classified as an abstraction type of virtual machines. But so is Xen, albeit with an interesting twist. In the case of Xen, a synthetic ISA based on the x86 is emulated removing some of the instructions that are difficult to virtualize. This makes porting a rather simple task - none of the user space code needs to be modified and the privileged code is generally limited to parts of the kernel that actually touch the hardware (virtual memory management, device drivers). In some respects, Xen is less of an abstraction as it does allow the virtual machines to see the architectural details thus permitting specific optimizations to occur that would be prohibited in the VMware case. And our good friends at Intel and AMD are adding new features to their processors to make virtualization less complicated and higher performance so the differences in approach between the VMware and Xen hypervisors may well blur over time.

But is this true virtualization ? No, it is just one of many types of virtualization.

How about the Java Virtual Machine (JVM) ? It is a run time executive that provides a virtualized environment for a completely synthetic ISA (although real pcode implementations have been done, they are largely for embedded systems). This is the magic behind write once and run anywhere and in general the approach works very well. So this is another example of virtualization - and also an abstraction type. And given the number of JVMs running around out there - if anyone is going to claim true virtualization, it would be the Java folks. Fortunately their understanding of the computer industry is broad and they are not arrogant - thus they would never suggest such folly.

Sun4v Logical Domains (LDOMs) are a thin hypervisor based partitioning of a radically multithreaded SPARC processor. The guest domains (virtual machines) run on real hardware but generally have no I/O devices. These guest domains get their I/O over a private channel from a service domain (a special type of domain that owns devices and contains the real device drivers). So I/O is virtualized but all other operations are executed on real hardware. The hypervisor provides resource (CPU and memory) allocation and management and the private channels for I/O (including networking). This too is virtualization, but not like Xen or VMware. This is an example of partitioning. Another example is IBM (Power) LPARS albeit with a slightly different approach.

Are there other types of virtualization ? Of course there are.

Solaris zones are an interesting type of virtualization called OS Virtualization. In this case we interpose the virtualization layer between the privileged kernel layer and the non-privileged user space. The benefit here is that all user space objects (name space, processes, address spaces) are completely abstracted and isolated. Unlike the methods previously discussed, the kernel and underlying hardware resources are not artificially limited, so the full heavy lifting capability of the kernel is available to all zones (subject to other resource management policies). The trade-off for this capability is that all zones share a common kernel. This has some availability and flexibility limitations that should be considered in a system design using zones. Non-native (Branded) zones offer some interesting flexibilities that we are just now beginning to exploit, so the future of this approach is very bright indeed. And if I read my competitors' announcements correctly, even our good friends at IBM are embracing this approach with future releases of AIX. So clearly there is something to this thing called OS Virtualization.
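To give a concrete sense of how lightweight this is, configuring, installing and booting a basic zone takes only a few commands. A minimal sketch follows; the zone name web01 and the zonepath are made up, and the install step populates the zone from the global zone's packages:

# configure, install and boot a small zone, then attach to its console for the first-boot questions
zonecfg -z web01 "create; set zonepath=/zones/web01"
zoneadm -z web01 install
zoneadm -z web01 boot
zlogin -C web01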

And there are other approaches as well - hybrids of the types we have been discussing. Special purpose libraries that either replace or interpose between common system libraries can provide some very nice virtualization capabilities - some of these transparent to applications, some not. The open source project Wine is a good example of this. User mode Linux and its descendants offer some ability to run an operating system as a user mode program, albeit not particularly efficiently.

QEMU is an interesting general purpose ISA simulator/translator that can be used to host non-native operating systems (such as Windows while running Solaris or Linux). The interesting thing about QEMU is that you can strip out the translation features with a special kernel module (kqemu) and the result is very efficient and nicely performing OS hosting (essentially simulating x86 running on x86). Kernel-based Virtual Machines (KVM) extends the QEMU capability to add yet another style of virtualization to Linux. It is not entirely clear at present whether KVM is really a better idea or just another not invented here (NIH) Linux project. Time will tell, but it would have been nice for the Linux kernel maintainers to take a page from OpenSolaris and embrace an already existing project that had some non-Linux vendor participation (*BSD, Solaris, Plan 9, plus some mainstream Linux distributions). At the very least it is confusing as most experienced IT professionals will associate KVM with Keyboard Video and Mouse switching products. There are other commercial products such as QuickTransit that use a similar approach (ISA translation).

And there are many many more.

So clearly the phrase "true virtualization" has no common or useful meaning. Questioning the application or definition of the phrase will likely uncover a predisposition or bias that might be a good starting point to carry on an interesting dialog. And that's always a good idea.

I leave you with one last thought. It is probably human nature to seek out the one uniform solution to all of our problems, the Grand Unification Theory being a great example. But in general, be skeptical of one size fits all approaches - while they may in fact fit all situations, they are generally neither efficient nor flattering. What does this have to do with virtualization ? Combining various techniques quite often will yield spectacular results. In other words, don't think VMware vs Zones - think VMware and Zones. In fact if you think Solaris, don't even think about zones, just do zones. If you need the additional abstraction to provide flexibility (heterogeneous or multiple version OS support) then use VMware or LDOMs. And zones.

Next time we'll take a look at abstraction style virtualization techniques and see if we can develop a method of predicting the overhead that each technique might impose on a system. Since a good apples to apples benchmark is not likely to ever see the light of day, perhaps some good old fashioned reasoning can help us make sense of what information we can find.


Tuesday Nov 07, 2006

Updated multi-boot disk layout

It's been a while since our Getting Started article on multi-boot disk configurations. The old Solaris bootloader is long gone, in favor of the more flexible (and multiboot friendly) GRUB. ZFS is now available in both Solaris and OpenSolaris. The Branded Zones and Xen OpenSolaris projects are getting more interesting. And in making the best of a difficult situation, a laptop disk failure has given me an opportunity to re-think the original layout.

In addition to the project goals in the original article, here are some new capabilities that I would like to explore.
  • Zones - I want to build zones, and lots of them. Solaris as well as Linux zones.
  • Leverage Live Upgrade so that I can upgrade my system while winging my way across the US
  • Consolidate all of my home directories into a small but useful set
  • Be able to build OpenSolaris on a more regular basis
  • Find a more efficient way of sharing a large Open Source repository across multiple OS instances
  • Start using ZFS for more than just simple demonstrations
  • Reserve some space to play with Linux distributions such as Fedora Core and Ubuntu
  • GNOME 2.16 development builds for OpenSolaris


Before we proceed we need to make sure that the file systems will be sized properly. With only 80GB to work with, we will have to be efficient in a few places (like sharing the Software Companion, Blastwave repository, and Staroffice 8), even at the cost of a bit of complexity.
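The sharing itself can be done with loopback mounts from the Solaris 10 boot environment into the others. A rough sketch, assuming the Solaris 10 root is mounted at /s10 in the other boot environment (that mount point and the StarOffice path are made up):

# mount the Solaris 10 copies of the shared software read-only into this boot environment
mount -F lofs -o ro /s10/opt/csw /opt/csw                  # Blastwave repository
mount -F lofs -o ro /s10/opt/staroffice8 /opt/staroffice8  # Staroffice 8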

Estimated disk requirements

  • Solaris 10, Entire Software Group - 4GB - the main boot environment will include Solaris 10, Staroffice 8, development tools, Software Companion, and the Blastwave repository
  • Software Express (Nevada), Entire Software Group - 5GB - includes space for GNOME 2.16 development
  • OpenSolaris Nightly Build - 5GB - BFU from Nevada or nightly build from source
  • Solaris 10 Software Companion - 2GB - installed in Solaris 10 /opt, shared in other installations
  • Staroffice 8 - 500MB - installed in Solaris 10 /opt, shared in other installations
  • Compilers - 1GB - installed in Solaris 10 /opt, shared in other installations
  • Blastwave Repository - 1GB - installed in Solaris 10 /opt, shared in other installations
  • Swap - 2GB - not really needed for Solaris, but will be shared with Linux


Taking all of this into consideration, the new disk layout looks something like this.

  • Partition 1 - 12GB - NTFS - Windows XP C: (/xp)
    Read-only access under Linux using the Linux-ntfs kernel modules; no access from Solaris
  • Partition 2 - 44GB - Solaris UFS
    s0 - S10 boot environment (8GB)
    s1 - swap
    s3 - Nevada boot environment (6GB)
    s4 - OpenSolaris boot environment (6GB)
    s5 - ZFS slice 1 (2GB)
    s6 - ZFS slice 2 (2GB)
    s7 - /export (16GB)
    Solaris swap partition is available to Linux as /dev/hda9
  • Partition 4 - 22GB - Extended - N/A
  • Partition 5 - 4GB - FAT32 - Windows XP E: (/pc on Solaris (pcfs) and Linux (vfat))
    Device name is /dev/hda5 in Linux and /dev/dsk/c0d0p0:1 in Solaris
  • Partition 6 - 10GB - Linux (ext3) - /
    Linux root (today Fedora Core, may soon be Ubuntu as Fedora Core 5 was a major disappointment)
  • Partition 7 - 6GB - Linux (ext3) - /export
    Shared home directory that can quickly be reused for CentOS 3 (for testing BrandZ things) and Ubuntu releases.

Next time, a short example of using Live Upgrade to install a new version of Solaris Express while running the Solaris 10 daily driver.

About

Bob Netherton is a Principal Sales Consultant for the North American Commercial Hardware group, specializing in Solaris, Virtualization and Engineered Systems. Bob is also a contributing author of Solaris 10 Virtualization Essentials.

This blog will contain information about all three, but is primarily focused on topics for Solaris system administrators.

Please follow me on Twitter or Facebook, or send me email.
