Friday Feb 01, 2013

Installing Solaris 10 in VirtualBox (watch that memory setting)

During a recent Solaris 11 Hands on Workshop, a couple of attendees mentioned troubles installing Solaris 10 as a VirtualBox guest. In truth, it had been quite a while since I last installed Solaris 10, so I was unable to help at that particular moment, other than to recommend verifying the MD5 checksum of the downloaded ISO image - a corrupted download being a very common source of installation problems.

Thinking a bit more about this, the last time I installed Solaris 10 as a guest was 10/08 (u6) and it would have been on version 3.something of VirtualBox. Thanks to Live Upgrade and VirtualBox's cloning and snapshot capabilities, I really haven't had a reason to install anything newer. Until now.

Trying to duplicate my customer's problems, I grabbed a copy of the Solaris 10 8/11 ISO image from the Oracle Technology Network and verified the MD5 checksum. I then configured a guest VM with the following settings:

  • CPU: 1
  • Video Memory: 32MB
  • Base Memory: 1024MB
In other words, a pretty basic 1GB guest machine. More important, it is exactly the same configuration as the half dozen or so other guests that are running 8/11 just fine, courtesy of Live Upgrade.

After starting the guest machine with the Solaris 10 8/11 ISO image in the CDROM device, you quickly get a kernel oops that looks something like this.

[Screenshot: kernel panic during the Solaris 10 8/11 guest boot]

That's not quite what I expected, but it does look like what the workshop attendees were describing. Trying Solaris 10 10/09 (u8) produces slightly different results (the guest quietly stalls), but ultimately ends up in the same place - a failed installation. This led me down a path of changing the chipset to ICH9 and enabling IO APIC which helped with u8, but the u10 install was still punting.

Then I remembered reading something in the release notes about the minimum memory requirements bumping up. Sure enough, when I look up the System Requirements in the Solaris 10 8/11 Installation Guide, I see that the minimum memory size for x86 is now 1.5GB. After adjusting the base memory to 1500MB in the guest, the installation completes as expected.
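
If you prefer the command line to the VirtualBox GUI, the same adjustment can be made with VBoxManage while the guest is powered off. This is just a sketch - the guest name "sol10u10" is a placeholder for your own VM name:

$ VBoxManage modifyvm "sol10u10" --memory 1500
$ VBoxManage showvminfo "sol10u10" | grep -i "memory size"

The first command raises the base memory to match the documented minimum; the second confirms the new setting before you restart the installation.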

The moral to this story is that minimum system requirements are documented for a reason and they really should be followed.

Thursday Dec 20, 2012

pkg fix is my friend - a followup

We bloggers appreciate questions and comments about what we post, whether privately in email or attached as comments to some article. In my last post, a reader asked a set of questions that were so good, I didn't want them to get lost down in the comments section. A big thanks to David Lange for asking these questions. I shall try to answer them here (perhaps with a bit more detail than you might have wanted).

Does the pkg fix reinstall binaries if the hash or chksum doesn't match?

Yes, it does. Let's actually see this in action, and then we will take a look at where it is getting the information required to correct the error.

Since I'm working on a series of Solaris 11 Automated Installer (AI) How To articles, installadm seems a good choice to damage, courtesy of the random number generator.

# ls /sbin/install*
/sbin/install             /sbin/installadm-convert  /sbin/installf
/sbin/installadm          /sbin/installboot         /sbin/installgrub

# cd /sbin
# mv installadm installadm-

# dd if=/dev/random of=/sbin/installadm bs=8192 count=32
0+32 records in
0+32 records out

# ls -la installadm*
-rw-r--r--   1 root     root       33280 Dec 18 18:50 installadm
-r-xr-xr-x   1 root     bin        12126 Dec 17 08:36 installadm-
-r-xr-xr-x   1 root     bin        74910 Dec 17 08:36 installadm-convert
OK, that should do it. Unless I am terribly unlucky, those random bytes will produce something that doesn't match the stored hash value of the installadm binary.

This time, I will begin the repair process with a pkg verify, just to see what is broken.

# pkg verify installadm
PACKAGE                                                                 STATUS 
pkg://solaris/install/installadm                                         ERROR

	file: usr/sbin/installadm
		Group: 'root (0)' should be 'bin (2)'
		Mode: 0644 should be 0555
		Size: 33280 bytes should be 12126
		Hash: 2e862c7ebd5dce82ffd1b30c666364f23e9118b5 
                     should be 68374d71b9cb91b458a49ec104f95438c9a149a7
For clarity, I have removed all of the compiled python module errors. Most of these have been corrected in Solaris 11.1, but you may see these occasionally when doing a pkg verify.

Since we have a real package error, let's correct it.

# pkg fix installadm
Verifying: pkg://solaris/install/installadm                     ERROR          

	file: usr/sbin/installadm
		Group: 'root (0)' should be 'bin (2)'
		Mode: 0644 should be 0555
		Size: 33280 bytes should be 12126
		Hash: 2e862c7ebd5dce82ffd1b30c666364f23e9118b5 
                     should be 68374d71b9cb91b458a49ec104f95438c9a149a7
Created ZFS snapshot: 2012-12-19-00:51:00
Repairing: pkg://solaris/install/installadm                  
                                                                               

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1       24/24      0.1/0.1

PHASE                                        ACTIONS
Update Phase                                   24/24 

PHASE                                          ITEMS
Image State Update Phase                         2/2 
We can now run installadm as if it had never been damaged.
# installadm list

Service Name     Alias Of       Status  Arch   Image Path 
------------     --------       ------  ----   ---------- 
default-i386     solaris11-i386 on      x86    /install/solaris11-i386
solaris11-i386   -              on      x86    /install/solaris11-i386
solaris11u1-i386 -              on      x86    /install/solaris11u1-i386
Oh, if you are wondering about that hash, it is a SHA1 checksum.
# digest -a sha1 /usr/sbin/installadm
68374d71b9cb91b458a49ec104f95438c9a149a7

If so, does IPS keep the installation binaries in a depot, or does it have to point to the originating depot to fix the problem?

IPS does keep a local cache of package attributes. Before diving into the details, be aware that some, if not all, of these are private details of the current IPS implementation and can change in the future. Always consult the command and configuration file man pages before using any of these in scripts. In this case, the relevant information is in pkg(5) (i.e., man -s 5 pkg).

Our first step is to identify which publisher has provided the package that is currently installed. In my case, there is only one (solaris), but in a large and mature enterprise deployment, there could be many publishers.

# pkg info installadm
          Name: install/installadm
       Summary: installadm utility
   Description: Automatic Installation Server Setup Tools
      Category: System/Administration and Configuration
         State: Installed
     Publisher: solaris
       Version: 0.5.11
 Build Release: 5.11
        Branch: 0.175.0.0.0.2.1482
Packaging Date: October 19, 2011 12:26:24 PM 
          Size: 1.04 MB
          FMRI: pkg://solaris/install/installadm@0.5.11,5.11-0.175.0.0.0.2.1482:20111019T122624Z
From this we have learned that the actual package name is install/installadm and the publisher is, in fact, solaris. We have also learned that this version of installadm comes from the original Solaris 11 GA release (5.11-0.175.0.0). That will allow us to take a look at some of the configuration files (private interface warning still in effect).

Note: Since package names contain slashes (/), we will have to encode them as %2F to keep the shell from interpreting them as a directory delimiter.

# cd /var/pkg/publisher/solaris/pkg/install%2Finstalladm
# ls -la
drwxr-xr-x   2 root     root           4 Dec 18 00:55 .
drwxr-xr-x 818 root     root         818 Dec 17 08:36 ..
-rw-r--r--   1 root     root       25959 Dec 17 08:36
            0.5.11%2C5.11-0.175.0.0.0.2.1482%3A20111019T122624Z
-rw-r--r--   1 root     root       26171 Dec 18 00:55
            0.5.11%2C5.11-0.175.0.13.0.3.0%3A20121026T213106Z
The file 0.5.11%2C5.11-0.175.0.0.0.2.1482%3A20111019T122624Z is the one we are interested in.
# digest -a sha1 /usr/sbin/installadm
68374d71b9cb91b458a49ec104f95438c9a149a7

# grep 68374d71b9cb91b458a49ec104f95438c9a149a7 *
file 68374d71b9cb91b458a49ec104f95438c9a149a7
chash=a5c14d2f8cc854dbd4fa15c3121deca6fca64515 group=bin mode=0555 
owner=root path=usr/sbin/installadm pkg.csize=3194 pkg.size=12126

That's how IPS knows our version of installadm has been tampered with. Since the damage goes beyond simple file attributes, pkg fix has to download a new copy of the damaged files, in this case from the solaris publisher (or one of its mirrors). To keep from making things worse, it also takes a snapshot of the current boot environment, in case the repair goes terribly wrong - which it does not.

Armed with this information, we can use some other IPS features, such as searching by binary hash.

# pkg search -r 68374d71b9cb91b458a49ec104f95438c9a149a7
INDEX                                    ACTION VALUE               PACKAGE
68374d71b9cb91b458a49ec104f95438c9a149a7 file   usr/sbin/installadm 
                 pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
... or by name
# pkg search -r installadm
INDEX       ACTION VALUE                      PACKAGE
basename    dir    usr/lib/installadm         pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
basename    dir    var/installadm             pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
basename    file   usr/sbin/installadm        pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
pkg.fmri    set    solaris/install/installadm pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
pkg.summary set    installadm utility         pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
And finally...
# pkg contents -m installadm

..... lots of output truncated ......

file 68374d71b9cb91b458a49ec104f95438c9a149a7 chash=a5c14d2f8cc854dbd4fa15c3121deca6fca64515 
group=bin mode=0555 owner=root path=usr/sbin/installadm pkg.csize=3194 pkg.size=12126
There is our information using a public and stable interface. Now you know, not only where IPS caches the information, but a predictable way to retrieve it, should you ever need to do so.

As with the verify and fix operations, this is much more helpful than the SVR4 packaging commands in Solaris 10 and earlier.

Given that customers might come up with their own ideas of keeping pkgs at various levels, could they be shooting themselves in the foot and creating such a customized OS that it causes problems?

Stephen Hahn has written quite a bit on the origins of IPS, both on his archived Sun blog and on the OpenSolaris pkg project page. While it is a fascinating and useful read, the short answer is that IPS helps prevent this from happening - certainly much more so than the previous packaging system did.

The assistance comes in several ways.

Full packages: Since IPS delivers full packages only, that eliminates one of the most confusing and frustrating aspects of the legacy Solaris packaging system. Every time you update a package with IPS, you get a complete version of the software, the way it was assembled and tested at Oracle (and presumably other publishers as well). No more patch order files and, perhaps more important, no more complicated scripts to automate the patching process.

Dependencies: A rich dependency mechanism allows the package maintainer to guarantee that other related software is at a compatible version. This includes incorporations, which protect large groups of software, such as the basic desktop, GNOME, auto-install and the userland tools. Although not a part of dependencies, facets allow for the control of optional software components - locales being a good example.
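
As a quick illustration of what this looks like from the command line (just a sketch - the facet name is an arbitrary example, and installadm is simply a package we already know), you can list a package's dependency actions and inspect or change facets with commands along these lines:

# pkg contents -t depend -o type,fmri install/installadm
# pkg facet
# pkg change-facet facet.locale.de_DE=False

The first command shows the depend actions the package delivers, the second lists the current facet settings, and the third switches off an optional locale.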

Boot environments: Solaris 10 system administrators can enjoy many of the same benefits by using Live Upgrade with ZFS as the root file system. IPS takes this to the next level by automatically performing important operations, such as upgrading the pkg package when needed or taking a snapshot before performing any risky actions.
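
If you want one of those safety points on your own schedule, rather than waiting for pkg to decide, beadm makes it a one-liner (the boot environment name here is arbitrary):

# beadm create pre-change
# beadm list

The clone costs almost nothing until the boot environments diverge, and beadm activate pre-change gets you back to your starting point if the change goes badly.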

Expanding your question just a bit, IPS provides one new capability that should make updates much more predictable. If there is some specific component that an application requires, its version can be locked within a range. Here is an example, albeit a rather contrived one.

# pkg list -af jre-6
NAME (PUBLISHER)                                  VERSION                    IFO
runtime/java/jre-6                                1.6.0.37-0.175.1.2.0.3.0   ---
runtime/java/jre-6                                1.6.0.35-0.175.1.0.0.24.1  ---
runtime/java/jre-6                                1.6.0.35-0.175.0.11.0.4.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.10.0.2.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.9.0.2.0   ---
runtime/java/jre-6                                1.6.0.32-0.175.0.8.0.4.0   ---
runtime/java/jre-6                                1.6.0.0-0.175.0.0.0.2.0    i--
Suppose that we have an application that is tied to version 1.6.0.0 of the java runtime. You can lock it at that version and IPS will prevent you from applying any upgrade that would change it. In this example, an attempt to upgrade to SRU8 (which introduces version 1.6.0.32 of jre-6) will fail.
# pkg freeze -c "way cool demonstration of IPS" jre-6@1.6.0.0
runtime/java/jre-6 was frozen at 1.6.0.0

# pkg list -af jre-6
NAME (PUBLISHER)                                  VERSION                    IFO
runtime/java/jre-6                                1.6.0.37-0.175.1.2.0.3.0   ---
runtime/java/jre-6                                1.6.0.35-0.175.1.0.0.24.1  ---
runtime/java/jre-6                                1.6.0.35-0.175.0.11.0.4.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.10.0.2.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.9.0.2.0   ---
runtime/java/jre-6                                1.6.0.32-0.175.0.8.0.4.0   ---
runtime/java/jre-6                                1.6.0.0-0.175.0.0.0.2.0    if-

# pkg update --be-name s11ga-sru08  entire@0.5.11-0.175.0.8
What follows is a lengthy set of complaints about not being able to satisfy all of the constraints, conveniently pointing back to our frozen package.

But wait, there's more. IPS can figure out the latest update it can apply that satisfies the frozen package constraint. In this example, it should find SRU7.

# pkg update --be-name s11ga-sru07
            Packages to update:  89
       Create boot environment: Yes
Create backup boot environment:  No

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                89/89   3909/3909  135.7/135.7

PHASE                                        ACTIONS
Removal Phase                                720/720 
Install Phase                                889/889 
Update Phase                               5066/5066 

PHASE                                          ITEMS
Package State Update Phase                   178/178 
Package Cache Update Phase                     89/89 
Image State Update Phase                         2/2 

A clone of solaris exists and has been updated and activated.
On the next boot the Boot Environment s11ga-sru07 will be
mounted on '/'.  Reboot when ready to switch to this updated BE.


---------------------------------------------------------------------------
NOTE: Please review release notes posted at:

http://www.oracle.com/pls/topic/lookup?ctx=E23824&id=SERNS
---------------------------------------------------------------------------
When the system is rebooted, a quick look shows that we are indeed running with SRU7.

Perhaps we were too restrictive in locking down jre-6 to version 1.6.0.0. In this example, we will loosen the constraint to any 1.6.0 version, but prohibit upgrades that change it to 1.6.1. Note that I did not have to unfreeze the package as a new pkg freeze will replace the preceding one.

# pkg freeze jre-6@1.6.0
runtime/java/jre-6 was frozen at 1.6.0

# pkg list -af jre-6
NAME (PUBLISHER)                                  VERSION                    IFO
runtime/java/jre-6                                1.6.0.37-0.175.1.2.0.3.0   -f-
runtime/java/jre-6                                1.6.0.35-0.175.1.0.0.24.1  -f-
runtime/java/jre-6                                1.6.0.35-0.175.0.11.0.4.0  -f-
runtime/java/jre-6                                1.6.0.33-0.175.0.10.0.2.0  -f-
runtime/java/jre-6                                1.6.0.33-0.175.0.9.0.2.0   -f-
runtime/java/jre-6                                1.6.0.32-0.175.0.8.0.4.0   -f-
runtime/java/jre-6                                1.6.0.0-0.175.0.0.0.2.0    if-
This shows that all versions are available for upgrade (i.e., they all satisfy the frozen package constraint).

Once again, IPS gives us a wonderful capability that is missing in the legacy packaging system.

When you perform a pkg update on a system, are we guaranteed a highly tested configuration that has gone through multiple regression tests?

Short answer: yes.

For the details, I will turn your attention to our friend, Gerry Haskins, and his two excellent blogs: The Patch Corner (Solaris 10 and earlier) and Solaris 11 Maintenance Lifecycle. Both are well worth reading, and I encourage everybody to add them to their RSS reader of choice.

Of particular note is Gerry's presentation, Solaris 11 Customer Maintenance Lifecycle, which goes into some great detail about patches, upgrades and the like. If you dig back to around the time that Solaris 10 9/10 (u9) was released, you will find links to a pair of interesting documents titled Oracle Integrated Stack - Complete, Trusted Enterprise Solutions and Trust Your Enterprise Deployments to the Oracle Product Stack: The integrated platform that's been developed, tested and certified to get the job done. These documents describe several test environments, including the Oracle Certification Environment (OCE) and Oracle Automated Stress Test (OAST). All Solaris 10 patches and Solaris 11 package updates (including Oracle Solaris Cluster) are put through these tests prior to release. The result is a higher confidence that patches will not introduce stability or performance problems, negating the old practice of putting a release or patch bundle on the shelf while somebody else finds all of the problems. Local testing on your own equipment is still a necessary practice, but you are able to move more quickly to a new release thanks to these additional testing environments.

If I am allowed to ask a follow-up question, it would be something like, "what can I do proactively to keep my system as current as possible and reduce the risks of bad patch or package interactions?"

That is where the Critical Patch Updates come into play. Solaris 11 Support Repository Updates (SRUs) come out approximately once per month. Every third one (generally) is special and becomes the CPU for Solaris. If you have a regular cadence for applying CPUs or Patch Set Updates (PSUs) for your other Oracle software, choose the corresponding SRU that has been designated as that quarter's CPU. You can find this information in My Oracle Support (MOS), on the Oracle Technology Network (OTN), or just read Gerry's blog in mid-January, April, July and October.
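
If you are not sure which SRU a system is currently running, the version of the entire incorporation tells you (the exact version string will, of course, vary from system to system):

# pkg list entire

Roughly speaking, the branch portion of that version encodes the release and SRU level, which you can then match against the CPU designations in MOS.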

Thanks again to David Lange for asking such good questions. I hope the answers helped.

Tuesday Dec 11, 2012

Solaris 11 pkg fix is my new friend

While putting together some examples of the Solaris 11 Automated Installer (AI), I managed to really mess up my system, to the point where AI was completely unusable. This was my fault as a combination of unfortunate incidents left some remnants that were causing problems, so I tried to clean things up. Unsuccessfully. Perhaps that was a bad idea (OK, it was a terrible idea), but this is Solaris 11 and there are a few more tricks in the sysadmin toolbox.

Here's what I did.

# rm -rf /install/*
# rm -rf /var/ai

# installadm create-service -n solaris11-x86 --imagepath /install/solaris11-x86 \
                 -s solaris-auto-install@5.11-0.175.0

Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

Creating service from: solaris-auto-install@5.11-0.175.0
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1       130/130  264.4/264.4    0B/s

PHASE                                          ITEMS
Installing new actions                       284/284
Updating package state database                 Done 
Updating image state                            Done 
Creating fast lookup database                   Done 
Reading search index                            Done 
Updating search index                            1/1 

Creating i386 service: solaris11-x86

Image path: /install/solaris11-x86
So far so good. Then comes an oops.....
setup-service[168]: cd: /var/ai//service/.conf-templ: [No such file or directory]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is where you generally say a few things to yourself, and then promise to quit deleting configuration files and directories when you don't know what you are doing. Then you recall that the new Solaris 11 packaging system has some ability to correct common mistakes (like the one I just made). Let's give it a try.
# pkg fix installadm
Verifying: pkg://solaris/install/installadm                     ERROR
        dir: var/ai
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/ai-webserver
                Missing: directory does not exist
        dir: var/ai/ai-webserver/compatibility-configuration
                Missing: directory does not exist
        dir: var/ai/ai-webserver/conf.d
                Missing: directory does not exist
        dir: var/ai/image-server
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/image-server/cgi-bin
                Missing: directory does not exist
        dir: var/ai/image-server/images
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/image-server/logs
                Missing: directory does not exist
        dir: var/ai/profile
                Missing: directory does not exist
        dir: var/ai/service
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/service/.conf-templ
                Missing: directory does not exist
        dir: var/ai/service/.conf-templ/AI_data
                Missing: directory does not exist
        dir: var/ai/service/.conf-templ/AI_files
                Missing: directory does not exist
        file: var/ai/ai-webserver/ai-httpd-templ.conf
                Missing: regular file does not exist
        file: var/ai/service/.conf-templ/AI.db
                Missing: regular file does not exist
        file: var/ai/image-server/cgi-bin/cgi_get_manifest.py
                Missing: regular file does not exist
Created ZFS snapshot: 2012-12-11-21:09:53
Repairing: pkg://solaris/install/installadm                  
Creating Plan (Evaluating mediators): |

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1           3/3      0.0/0.0    0B/s

PHASE                                          ITEMS
Updating modified actions                      16/16
Updating image state                            Done 
Creating fast lookup database                   Done 
In just a few moments, IPS found and repaired the missing files and incorrect ownerships/permissions. Instead of reinstalling the system, or falling back to an earlier boot environment, I was able to create my AI services and now all is well.
# installadm create-service -n solaris11-x86 --imagepath /install/solaris11-x86 \
                   -s solaris-auto-install@5.11-0.175.0
Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

Creating service from: solaris-auto-install@5.11-0.175.0
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1       130/130  264.4/264.4    0B/s

PHASE                                          ITEMS
Installing new actions                       284/284
Updating package state database                 Done 
Updating image state                            Done 
Creating fast lookup database                   Done 
Reading search index                            Done 
Updating search index                            1/1 

Creating i386 service: solaris11-x86

Image path: /install/solaris11-x86

Refreshing install services
Warning: mDNS registry of service solaris11-x86 could not be verified.

Creating default-i386 alias

Setting the default PXE bootfile(s) in the local DHCP configuration
to:
bios clients (arch 00:00):  default-i386/boot/grub/pxegrub


Refreshing install services
Warning: mDNS registry of service default-i386 could not be verified.

# installadm create-service -n solaris11u1-x86 --imagepath /install/solaris11u1-x86 \
                    -s solaris-auto-install@5.11-0.175.1
Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

Creating service from: solaris-auto-install@5.11-0.175.1
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1       514/514  292.3/292.3    0B/s

PHASE                                          ITEMS
Installing new actions                       661/661
Updating package state database                 Done 
Updating image state                            Done 
Creating fast lookup database                   Done 
Reading search index                            Done 
Updating search index                            1/1 

Creating i386 service: solaris11u1-x86

Image path: /install/solaris11u1-x86

Refreshing install services
Warning: mDNS registry of service solaris11u1-x86 could not be verified.

# installadm list

Service Name    Alias Of      Status  Arch   Image Path 
------------    --------      ------  ----   ---------- 
default-i386    solaris11-x86 on      i386   /install/solaris11-x86
solaris11-x86   -             on      i386   /install/solaris11-x86
solaris11u1-x86 -             on      i386   /install/solaris11u1-x86


This is way, way better than pkgchk -f in Solaris 10. I'm really beginning to like this new IPS packaging system.
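
For comparison, the closest you can get on Solaris 10 is something like the following (the package name is just an example):

# pkgchk SUNWcsu
# pkgchk -f SUNWcsu

The first command reports problems and the second repairs what it can - owners, groups and modes - but it has no way to fetch a pristine copy of a damaged or missing file the way pkg fix does.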

Friday Jan 13, 2012

Live Upgrade, /var/tmp and the Ever Growing Boot Environments

Even if you are a veteran Live Upgrade user, you might be caught by surprise when your new ZFS root pool starts filling up, and you have no idea where the space is going. I tripped over this one while installing different versions of StarOffice and OpenOffice and forgot that they left a rather large parcel behind in /var/tmp. When recently helping a customer through some Live Upgrade issues, I noticed that they were downloading patch clusters into /var/tmp and then I remembered that I used to do that too.

And then stopped. This is why. What follows has been added to the list of Common Live Upgrade Problems, as Number 3.

Let's start with a clean installation of Solaris 10 10/09 (u8).

# df -k /
Filesystem                       kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10x_u8wos_08a      20514816 4277560 13089687    25%    /

So far, so good. Solaris is just a bit over 4GB. Another 3GB is used by the swap and dump devices. That should leave plenty of room for half a dozen or so patch cycles (assuming 1GB each) and an upgrade to the next release.

Now, let's put on the latest recommended patch cluster. Note that I am following the suggestions in my Live Upgrade Survival Guide, installing the prerequisite patches and the LU patch before actually installing the patch cluster.

# cd /var/tmp
# wget patchserver:/export/patches/10_x86_Recommended-2012-01-05.zip .
# unzip -qq 10_x86_Recommended-2012-01-05.zip

# wget patchserver:/export/patches/121431-69.zip
# unzip 121431-69

# cd 10_x86_Recommended
# ./installcluster --apply-prereq --passcode (you can find this in README)

# patchadd -M /var/tmp 121431-69

# lucreate -n s10u8-2012-01-05
# ./installcluster -d -B s10u8-2012-01-05 --passcode

# luactivate s10u8-2012-01-05
# init 0

After the new boot environment is activated, let's upgrade to the latest release of Solaris 10. In this case, it will be Solaris 10 8/11 (u10).

Yes, this does seem like an awful lot is happening in a short period of time. I'm trying to demonstrate a situation that really does happen when you forget something as simple as a patch cluster clogging up /var/tmp. Think of this as one of those time lapse video sequences you might see in a nature documentary.

# pkgrm SUNWluu SUNWlur SUNWlucfg
# pkgadd -d /cdrom/sol_10_811_x86  SUNWluu SUNWlur SUNWlucfg
# patchadd -M /var/tmp 121431-69

# lucreate -n s10u10-baseline
# echo "autoreg=disable" > /var/tmp/no-autoreg
# luupgrade -u -s /cdrom/sol_10_811_x86 -k /var/tmp/no-autoreg -n s10u10-baseline
# luactivate s10u10-baseline
# init 0
As before, everything went exactly as expected. Or so I thought, until I logged in the first time and checked the free space in the root pool.
# df -k /
Filesystem                       kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline     20514816 10795038 2432308    82%    /
Where did all of the space go ? Back of the napkin calculations of 4.5GB (s10u8) + 4.5GB (s10u10) + 1GB (patch set) + 3GB (swap and dump) = 13GB. 20GB pool - 13GB used = 7GB free. But there's only 2.4GB free ?

This is about the time that I smack myself on the forehead and realize that I put the patch cluster in the /var/tmp. Old habits die hard. This is not a problem, I can just delete it, right ?

Not so fast.

# du -sh /var/tmp
 5.4G   /var/tmp

# du -sh /var/tmp/10*
 3.8G   /var/tmp/10_x86_Recommended
 1.5G   /var/tmp/10_x86_Recommended-2012-01-05.zip

# rm -rf /var/tmp/10*

# du -sh /var/tmp
 3.4M   /var/tmp

Imagine the look on my face when I check the pool free space, expecting to see 7GB.
# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5074262 2424603    68%    /

We are getting closer. At least my root filesystem size is reasonable (5GB vs 11GB). But the free space hasn't changed at all.

Once again, I smack myself on the forehead. The patch cluster is also in the other two boot environments. All I have to do is get rid of them too, and I'll get my free space back.

# lumount s10u8-2012-01-05 /mnt
# rm -rf /mnt/var/tmp/10_x86_Recommended*
# luumount s10u8-2012-01-05

# lumount s10x_u8wos_08a /mnt
# rm -rf /mnt/var/tmp/10_x86_Recommended*
# luumount s10x_u8wos_08a
Surely, that will get my free space reclaimed, right ?
# df -k /
Filesystem                    kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline  20514816 5074265 2429261    68%    /

This is when I smack myself on the forehead for the third time in one afternoon. Just getting rid of them in the boot environments is not sufficient. It would be if I were using UFS as a root filesystem, but lucreate will use the ZFS snapshot and cloning features when used on a ZFS root. So the patch cluster is in the snapshot, and the oldest one at that.
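
If you want to see exactly where the space is hiding, list the snapshots under the root pool (your dataset and snapshot names will differ):

# zfs list -t snapshot -r rpool/ROOT

The snapshots that lucreate took are what keep those deleted blocks referenced, and their USED and REFER columns show roughly where the space has gone.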

Let's try this all over again, but this time I will put the patches somewhere else that is not part of a boot environment. If you are thinking of using root's home directory, think again - it is part of the boot environment. If you are running out of ideas, let me suggest that /export/patches might be a good place to put them.
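
If you like keeping things tidy, give the patches their own dataset so the space is easy to see and easy to give back (the dataset name is only a suggestion):

# zfs create rpool/export/patches
# zfs list rpool/export/patches

Since it lives outside of rpool/ROOT, nothing in this dataset gets captured by the snapshots that lucreate takes.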

Doing the exercise again, with the patches in /export/patches, I get similar results (to be expected), but with one significant difference. This time the patches are in a shared ZFS dataset (/export) and can be deleted.

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10x_u8wos_08a             yes      no     no        yes    -         
s10u8-2012-01-05           yes      no     no        yes    -         
s10u10-baseline            yes      yes    yes       no     -         

# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5184578 2445140    68%    /


# df -k /export
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/export                  20514816 5606384 2445142    70%    /export

This time, when I delete them, the disk space will be reclaimed.
# rm -rf /export/patches/10_x86_Recommended*

# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5184578 8048050    40%    /

Now, that's more like it. With this free space, I can continue to patch and maintain my system as I had originally planned - estimating a few hundred MB to 1.5GB per patch set.

The moral to the story is that even if you follow all of the best practices and recommendations, you can still be tripped up by old habits when you don't consider their consequences. And when you do, don't feel bad. Many best practices come from exercises just like this one.


Saturday Jan 07, 2012

Live Upgrade and ZFS Versioning

Thanks to John Kotches and Craig Bell for bringing this one up in the comments of an earlier article. I've included this in a new update to my Live Upgrade Survival Tips, but thought it worthy of posting all by itself.

ZFS pool and file system functionality may be added with a Solaris release. These new capabilities are identified in the ZFS zpool and file system version numbers. To find out what versions you are running, and what capabilities they provide, use the corresponding upgrade -v commands. Yes, it is a bit disconcerting at first, using an upgrade command, not to upgrade, but to determine which features exist.

Here is an example of each output, for your reference.

# zpool upgrade -v
This system is currently running ZFS pool version 31.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties
 23  Slim ZIL
 24  System attributes
 25  Improved scrub stats
 26  Improved snapshot deletion performance
 27  Improved snapshot creation performance
 28  Multiple vdev replacements
 29  RAID-Z/mirror hybrid allocator
 30  Encryption
 31  Improved 'zfs list' performance

For more information on a particular version, including supported releases,
see the ZFS Administration Guide.


# zfs upgrade -v
The following filesystem versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifier (FUID)
 4   userquota, groupquota properties
 5   System attributes

For more information on a particular version, including supported releases,
see the ZFS Administration Guide.

In this particular example, the kernel supports up to zpool version 31 and ZFS version 5.

Where you can run into trouble with this is when you create a pool or file system and then fall back to a boot environment that is older and doesn't support those particular versions. The survival tip is to keep your zpool and zfs versions at a level that is compatible with the oldest boot environment that you will ever fall back to. A corollary to this is that you can upgrade your pools and file systems once you have deleted the last boot environment that is limited to that particular version.

Your first question is probably, "what versions of ZFS go with the particular Solaris releases ?" Here is a table of Solaris releases since 10/08 (u6) and their corresponding zpool and zfs version numbers.

Solaris Release          ZPOOL Version   ZFS Version
Solaris 10 10/08 (u6)         10              3
Solaris 10 5/09 (u7)          10              3
Solaris 10 10/09 (u8)         15              4
Solaris 10 9/10 (u9)          22              4
Solaris 10 8/11 (u10)         29              5
Solaris 11 11/11 (ga)         33              5
Solaris 11.1                  34              6

Note that these versions apply both to the release itself and to a system patched to that same level. In other words, a Solaris 10 10/08 system with the latest recommended patch cluster installed might be at the 8/11 (u10) level. You can always use zpool upgrade -v and zfs upgrade -v to make sure.
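
The other half of the picture is what your existing pool and datasets are actually set to, which is just a property query away. For example, on a ZFS root system:

# zpool get version rpool
# zfs get -r version rpool/ROOT

Compare those numbers against the table above before deciding whether an upgrade is safe.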

Now you are wondering how you create a pool or file system at a version different than the default for your Solaris release. Fortunately, ZFS is flexible enough to allow us to do exactly that. Here is an example.

# zpool create testpool testdisk

# zpool get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   31       default

# zfs get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   5        -

This pool and associated top level file system can only be accessed on a Solaris 11 system. Let's destroy it and start again, this time making it possible to access it on a Solaris 10 10/09 system (zpool version 15, zfs version 4). We can use the -o version= and -O version= options when the pool is created to accomplish this.
# zpool destroy testpool
# zpool create -o version=15 -O version=4 testpool testdisk
# zfs create testpool/data

# zpool get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   15       local

# zfs get -r version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   4        -
testpool/data  version   4        -

In this example, we created the pool explicitly at version 15, and using -O to pass zfs file system creation options to the top level dataset, we set that to version 4. To make things easier, new file systems created in this pool will be at version 4, inheriting that from the parent, unless overridden by -o version= at the time the file system is created.

The last remaining task is to look at how you might upgrade a pool and file system when you have removed an old boot environment. We will go back to our previous example where we have a version 15 pool and 4 dataset. We have removed the Solaris 10 10/09 boot environment and now the oldest is Solaris 10 8/11 (u10). That supports version 29 pools and version 5 file systems. We will use zpool/zfs upgrade -V to set the specific versions to 29 and 5 respectively.

# zpool upgrade -V 29 testpool
This system is currently running ZFS pool version 31.

Successfully upgraded 'testpool' from version 15 to version 29

# zpool get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   29       local

# zfs upgrade -V 5 testpool
1 filesystems upgraded

# zfs get -r version testpool
testpool       version   5        -
testpool/data  version   4        -

That didn't go quite as expected, or did it ? The pool was upgraded as expected, as was the top level dataset. But testpool/data is still at version 4. It initially inherited that version from the parent when it was created. When using zfs upgrade, only the datasets listed are upgraded. If we wanted the entire pool of file systems to be upgraded, we should have used -r for recursive.
# zfs upgrade -V 5 -r testpool
1 filesystems upgraded
1 filesystems already at this version

# zfs get -r version testpool
NAME           PROPERTY  VALUE    SOURCE
testpool       version   5        -
testpool/data  version   5        -

Now, that's more like it.

For review, the tip is to keep your shared ZFS datasets and pools at the lowest versions supported by the oldest boot environments you plan to use. You can always use upgrade -v to see what versions are available for use, and by using -o version= and -O version=, you can create new pools and datasets that are accessible by older boot environments. This last bit can also come in handy if you are moving pools between systems that might be at different versions.

Thanks again to Craig and John for this great tip.


Wednesday Jul 06, 2011

Live Upgrade Survival Guide

When I started blogging about Live Upgrade, it was always my intention to post a list of tips. In this companion piece to Common Live Upgrade Problems, I will take a look at several proactive things you can do to make your Live Upgrade experience go more smoothly. Some of these are documented, although not always as obviously as I would like. Others are common sense. A few might surprise you.

Since this is getting to be a long article, here are the tips, with direct links down to the explanation and examples.

  1. Keep your patching and packaging utilities up to date
  2. Check the log files
  3. ZFS pool and file system versioning
  4. Use ZFS for your root file system
  5. Don't save the patch backout files
  6. Start using Live Upgrade immediately after initial installation
  7. Keep your patching and packaging utilities up to date
  8. Use the installcluster script instead of luupgrade -t
  9. Keep your boot configurations simple
  10. Keeping /var/tmp clean
Without any further delay, here are my Live Upgrade Survival Tips.

1. Always make sure your patching and packaging utilities are up to date

This is the most frequent source of beginners' troubles with Live Upgrade, and it is completely unnecessary. As such, if you call me or ask for help over email, my first question to you will be "Have you applied the prerequisite patches ? What about 121430/121431 ?" If your answer is, "I don't know", my response will be "Ok then. I'll wait while you check and apply what is out of date. Call me back when you have finished - long pause - if you are still having troubles."

Live Upgrade frequently stresses the installation tools. New versions are supplied on the update media, but we continue to fix corner cases, even after an update is released. It is important to check for any patches related to the patching or packaging tools and update them before performing any Live Upgrade activities.

Previously, you had to dig through Infodoc 72099; later it was rewritten as Infodoc 206844. Today, this document lives on as Solaris Live Upgrade Software Patch Requirements. It is a much better read, but it is still an intimidating list of patches to sort through. To ease the effort on system administrators, we now include these patches in the Solaris 10 recommended patch cluster, along with a safe way to install them in the current boot environment.

Note: it is still worth checking the status of the Live Upgrade patch itself (121430 SPARC or 121431 x86).
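
A quick way to see which revision of the Live Upgrade patch (if any) is currently installed, using the x86 patch as the example:

# showrev -p | grep 121431

If the revision shown is older than the one in the current patch cluster, install the newer one before running lucreate or luupgrade.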

In this example, I'm taking a system from Solaris 10 10/08 (u6) to Solaris 10 10/09 (u8). I am already in the directory where the patch cluster was downloaded and unpacked.

# lofiadm -a /export/iso/s10/s10u8-b08a-x86.iso
/dev/lofi/1
# mount -o ro -F hsfs /dev/lofi/1 /mnt
# pkgadd -d /mnt/Solaris_10/Product SUNWluu SUNWlur SUNWlucfg

# ./installcluster --apply-prereq --s10cluster
Setup .


Recommended OS Cluster Solaris 10 x86 (2011.06.17)

Application of patches started : 2011.06.29 11:19:11

Applying 120901-03 ( 1 of 11) ... skipped
Applying 121334-04 ( 2 of 11) ... skipped
Applying 119255-81 ( 3 of 11) ... skipped
Applying 119318-01 ( 4 of 11) ... skipped
Applying 121297-01 ( 5 of 11) ... skipped
Applying 138216-01 ( 6 of 11) ... skipped
Applying 122035-05 ( 7 of 11) ... skipped
Applying 127885-01 ( 8 of 11) ... skipped
Applying 145045-03 ( 9 of 11) ... skipped
Applying 142252-02 (10 of 11) ... skipped
Applying 125556-10 (11 of 11) ... skipped

Application of patches finished : 2011.06.29 11:19:13


Following patches were skipped :
 Patches already applied
 120901-03     119318-01     138216-01     127885-01     142252-02
 121334-04     121297-01     122035-05     145045-03     125556-10
 119255-81

Installation of prerequisite patches complete.

Install log files written :
  /var/sadm/install_data/s10x_rec_cluster_short_2011.06.29_11.19.11.log
  /var/sadm/install_data/s10x_rec_cluster_verbose_2011.06.29_11.19.11.log
After installing the new Live Upgrade packages from the installation media, I ran the installcluster script from the latest recommended patch cluster. The --apply-prereq argument tells the script to install only the required Live Upgrade patches in the current boot environment. Since I have run several live upgrades previously from this boot environment, it is not surprising that all of the patches had already been applied. Your mileage will vary.

The --s10cluster argument is the current patch cluster password. The intent is to make you read the included README, if for no other reason than to obtain the latest cluster password.

# lucreate -n s10u8-baseline
Checking GRUB menu...

Population of boot environment <s10u8-baseline> successful.
Creation of boot environment <s10u8-baseline> successful.

# luupgrade -u -s /mnt -n s10u8-baseline

Things will always go better when you have the proper versions of the patching and packaging utilities. This is not just a Live Upgrade survival tip, but a good one for general system maintenance.

2. Always check the logs. Always, always, always

How many problems could we prevent if we just read the documentation or took a look at the logs left after the maintenance activity finishes ? Repetitive success with Live Upgrade may lull you into a false sense of security. Things frequently work so well, and if the final output from the command is not proclaiming the end of civilization, we move on to the next step. Bzzzzt. Not so fast.

Patches

For patching, the situation is rather simple. Look at the summary output from luupgrade (or the installcluster script) and see if any patches failed to install properly. If you missed this, you can always go back into the patch logs themselves to see what happened.
# lumount s10u9-2011-06-23 /mnt
# grep -i failed /mnt/var/sadm/patch/*/log
# grep -i error /mnt/var/sadm/patch/*/log
/mnt/var/sadm/patch/118668-32/log:compress(1) returned error code 2
/mnt/var/sadm/patch/119314-42/log:compress(1) returned error code 2
/mnt/var/sadm/patch/119314-42/log:compress(1) returned error code 2
/mnt/var/sadm/patch/119314-42/log:compress(1) returned error code 2
/mnt/var/sadm/patch/124939-04/log:compress(1) returned error code 2
So no patches failed to install, but there were a few errors. A closer look at the log files will tell us that these are harmless, caused when the existing patch backout files failed to compress. That's fine, they were already compressed.
# cat /mnt/var/sadm/patch/119314-42/log

Installation of  was successful.

This appears to be an attempt to install the same architecture and
version of a package which is already installed.  This installation
will attempt to overwrite this package.

/.alt.s10u9-2011-06-23-undo/var/sadm/pkg/SUNWlvmg/save/119314-42/undo: -- file unchanged
compress(1) returned error code 2
The SUNWlvmg backout package will not be compressed.
Continuing to process backout package.

Installation of  was successful.

Upgrades

Upgrades are a bit more tricky because there are two different classes of problems: packages that failed to install and configuration files that couldn't be properly upgraded.

The easiest to see are packages that failed to install. These packages are clearly identified at the end of the output from luupgrade. In case you missed them, we will tell you about them again if you try to luactivate(1M) a boot environment where some packages failed to install.

As with patching, if you missed the messages, you can look back at the upgrade log file in the alternate boot environment. You can find it at /var/sadm/system/logs/upgrade_log.

# lumount s10u9-2011-06-23 /mnt
# tail -18 /mnt/var/sadm/system/logs/upgrade_log
Installation of  was successful.

The messages printed to the screen by this upgrade have been saved to:

	/a/var/sadm/system/logs/upgrade_log

After this system is rebooted, the upgrade log can be found in the file:

	/var/sadm/system/logs/upgrade_log


Please examine the file:

	/a/var/sadm/system/data/upgrade_cleanup

It contains a list of actions that may need to be performed to complete
the upgrade.  After this system is rebooted, this file can be found at:

	/var/sadm/system/data/upgrade_cleanup

After performing cleanup actions, you must reboot the system.
	- Environment variables (/etc/default/init)
Updating package information on boot environment .
Package information successfully updated on boot environment .
Adding operating system patches to the BE .
There may be cases where an upgrade isn't able to process a configuration file that has been customized. In that case, the upgrade process will either preserve the original, saving the new configuration file under a different name, or the reverse, saving the existing file under a new name and installing a new one. How can you tell which of these happened ?

Check the upgrade_cleanup log file. It is so important that we mention it twice as luupgrade finishes its output. Here is a snippet from an upgrade from Solaris 10 10/09 to Solaris 10 9/10.

# lumount s10u9-baseline /mnt
# cat /mnt/var/sadm/system/data/upgrade_cleanup

..... lots of output removed for readability ....

/a/kernel/drv/e1000g.conf: existing file preserved, the new version was installed as /a/kernel/drv/e1000g.conf.new

/etc/snmp/conf/snmpd.conf: existing file renamed to /etc/snmp/conf/snmpd.conf~10

/a/etc/mail/sendmail.cf: existing file renamed to /a/etc/mail/sendmail.cf.old
/a/etc/mail/submit.cf: existing file renamed to /a/etc/mail/submit.cf.old

Sendmail has been upgraded to version 8.14.4 .
After you reboot, you may want to run
/usr/sbin/check-hostname
and
/usr/sbin/check-permissions ALL
These two shell-scripts will check for common
misconfigurations and recommend corrective
action, or report if things are OK.

In this example, we see several different actions taken by the installer.

In the case of /kernel/drv/e1000g.conf (the e1000 driver configuration file), the original contents were preserved and a new default file was installed at /kernel/drv/e1000g.conf.new. Let's see what differences exist between the two files.

# lumount s10u9-baseline /mnt
# diff /mnt/kernel/drv/e1000g.conf /mnt/kernel/drv/e1000g.conf.new
# Copyright 2010 Sun Microsystems, Inc.  All rights reserved.
11c11
< # ident	"@(#)e1000g.conf	1.4	06/03/06 SMI"
---
> # ident	"@(#)e1000g.conf	1.5	10/01/12 SMI"
41,45c41,51
<         # These are maximum frame limits, not the actual ethernet frame
<         # size. Your actual ethernet frame size would be determined by
<         # protocol stack configuration (please refer to ndd command man pages)
<         # For Jumbo Frame Support (9k ethernet packet) 
<         # use 3 (upto 16k size frames)
---
> 	#
> 	# These are maximum frame limits, not the ethernet payload size
> 	# (usually called MTU).  Your actual ethernet MTU is determined by frame
> 	# size limit and protocol stack configuration (please refer to ndd
> 	# command man pages)
> 	#
> 	# For Jumbo Frame Support (9k ethernet packet) use 3 (upto 16k size
> 	# frames).  On PCH adapter type (82577 and 82578) you can configure up
> 	# to 4k size frames.  The 4k size is only allowed at 1gig speed, so if
> 	# you select 4k frames size, you cannot force or autonegotiate the
> 	# 10/100 speed options.
The differences in the two files are just comments. That is a common case, and not unexpected since I had not modified the e1000g driver configuration file.

For /etc/snmp/conf/snmpd.conf, the situation was the reverse. The existing copy was saved with a new file extension of ~10. A quick look shows these two files to be identical.

The last example is from our friend sendmail. Since this upgrade includes a new version of sendmail, it is reasonable to expect several differences in the old and new configuration files.

# diff /mnt/etc/mail/sendmail.cf /mnt/etc/mail/sendmail.cf.old

236c236,237
< O DaemonPortOptions=Name=MTA
---
> O DaemonPortOptions=Name=MTA-v4, Family=inet
> O DaemonPortOptions=Name=MTA-v6, Family=inet6
281c282
< # key for shared memory; 0 to turn off, -1 to auto-select
---
> # key for shared memory; 0 to turn off
284,285c285
< # file to store auto-selected key for shared memory (SharedMemoryKey = -1)
< #O SharedMemoryKeyFile
---
As with an earlier example, the output was truncated to improve readability. In this case, I would take all of my local modifications to sendmail.cf and apply those to the new configuration file. Note that the log file suggests running two scripts after I make these modifications to check for common errors.

There are several other actions the installer can take. To learn more about those, take a look at the top portion of the upgrade_cleanup file where they are all explained in great detail, including recommended actions for the system administrator.

3. Watch your ZFS Pool and File System Version Numbers

Thanks to John Kotches and Craig Bell for bringing this one up in the comments. This one is a bit sneaky and it can catch you totally unaware. As such, I've included this pretty high up in the list of survival tips.

ZFS pool and file system functionality may be added with a Solaris release. These new capabilities are identified in the ZFS zpool and file system version numbers. To find out what versions you are running, and what capabilities they provide, use the corresponding upgrade -v commands. Yes, it is a bit disconcerting at first, using an upgrade command, not to upgrade, but to determine which features exist.

Here is an example of each output, for your reference.

# zpool upgrade -v
This system is currently running ZFS pool version 31.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties
 23  Slim ZIL
 24  System attributes
 25  Improved scrub stats
 26  Improved snapshot deletion performance
 27  Improved snapshot creation performance
 28  Multiple vdev replacements
 29  RAID-Z/mirror hybrid allocator
 30  Encryption
 31  Improved 'zfs list' performance

For more information on a particular version, including supported releases,
see the ZFS Administration Guide.


# zfs upgrade -v
The following filesystem versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifier (FUID)
 4   userquota, groupquota properties
 5   System attributes

For more information on a particular version, including supported releases,
see the ZFS Administration Guide.

In this particular example, the kernel supports up to zpool version 31 and ZFS version 5.

Where you can run into trouble with this is when you create a pool or file system and then fall back to a boot environment that is older and doesn't support those particular versions. The survival tip is to keep your zpool and zfs versions at a level that is compatible with the oldest boot environment that you will ever fall back to. A corollary to this is that you can upgrade your pools and file systems once you have deleted the last boot environment that is limited to that particular version.

Your first question is probably, "what versions of ZFS go with the particular Solaris releases ?" Here is a table of Solaris releases since 10/08 (u6) and their corresponding zpool and zfs version numbers.

Solaris Release           ZPOOL Version    ZFS Version
Solaris 10 10/08 (u6)     10               3
Solaris 10 5/09 (u7)      10               3
Solaris 10 10/09 (u8)     15               4
Solaris 10 9/10 (u9)      22               4
Solaris 10 8/11 (u10)     29               5
Solaris 10 1/13 (u11)     32               5
Solaris 11 11/11 (ga)     33               5
Solaris 11.1              34               6

Note that these versions apply both to a fresh installation of that release and to a system that has been patched up to the same level. In other words, a Solaris 10 10/08 system with the latest recommended patch cluster installed might be at the 8/11 (u10) level. You can always use zpool upgrade -v and zfs upgrade -v to make sure.

Now you are wondering how you create a pool or file system at a version different than the default for your Solaris release. Fortunately, ZFS is flexible enough to allow us to do exactly that. Here is an example.

# zpool create testpool testdisk

# zpool get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   31       default

# zfs get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   5        -

This pool and associated top level file system can only be accessed on a Solaris 11 system. Let's destroy it and start again, this time making it possible to access it on a Solaris 10 10/09 system (zpool version 15, zfs version 4). We can use the -o version= and -O version= when the pool is created to accomplish this.
# zpool destroy testpool
# zpool create -o version=15 -O version=4 testpool testdisk
# zfs create testpool/data

# zpool get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   15       local

# zfs get -r version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   4        -
testpool/data  version   4        -

In this example, we created the pool explicitly at version 15, and using -O to pass zfs file system creation options to the top level dataset, we set that to version 4. To make things easier, new file systems created in this pool will be at version 4, inheriting that from the parent, unless overridden by -o version= at the time the file system is created.
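
As a quick sketch of that override (the testpool/legacy dataset name is hypothetical), a file system that also needs to be readable from a Solaris 10 10/08 boot environment (zfs version 3 in the table above) could be created and then checked like this:

# zfs create -o version=3 testpool/legacy
# zfs get version testpool/legacy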

The last remaining task is to look at how you might upgrade a pool and file system when you have removed an old boot environment. We will go back to our previous example where we have a version 15 pool and 4 dataset. We have removed the Solaris 10 10/09 boot environment and now the oldest is Solaris 10 8/11 (u10). That supports version 29 pools and version 5 file systems. We will use zpool/zfs upgrade -V to set the specific versions to 29 and 5 respectively.

# zpool upgrade -V 29 testpool
This system is currently running ZFS pool version 31.

Successfully upgraded 'testpool' from version 15 to version 29

# zpool get version testpool
NAME      PROPERTY  VALUE    SOURCE
testpool  version   29       local

# zfs upgrade -V 5 testpool
1 filesystems upgraded

# zfs get -r version testpool
testpool       version   5        -
testpool/data  version   4        -

That didn't go quite as expected, or did it ? The pool was upgraded as expected, as was the top level dataset. But testpool/data is still at version 4. It initially inherited that version from the parent when it was created. When using zfs upgrade, only the datasets listed are upgraded. If we wanted the entire pool of file systems to be upgraded, we should have used -r for recursive.
# zfs upgrade -V 5 -r testpool
1 filesystems upgraded
1 filesystems already at this version

# zfs get -r version testpool
NAME           PROPERTY  VALUE    SOURCE
testpool       version   5        -
testpool/data  version   5        -

Now, that's more like it.

For review, the tip is to keep your shared ZFS datasets and pools at the lowest versions supported by the oldest boot environments you plan to use. You can always use upgrade -v to see what versions are available for use, and by using -o version= and -O version=, you can create new pools and datasets that are accessible by older boot environments. This last tip can also come in handy if you are moving pools around systems that might be at different versions.
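
One easy way to audit this before an upgrade (a sketch; these commands change nothing) is to run zpool upgrade and zfs upgrade with no arguments, which simply report any pools or file systems formatted with a version older than what the running kernel supports:

# zpool upgrade
# zfs upgrade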

4. Use ZFS as your root file system

While Live Upgrade can take away a lot of the challenges of patching and upgrading Solaris systems, one small obstacle can make it nearly impossible to deploy - a lack of available disk slices. Disk sizes are growing much faster than the Solaris footprint, so on any relatively modern system there should be adequate space on the internal disks to place at least two, if not more, boot environments. This can also include a plethora of zones, if sparse root zones are used.

The problem is generally not space, but disk slices (partitions). With a regular disk label, there is a limit of 8 partitions (0-7). One of these (slice 2) is taken by the disk utilities to record the size of the disk, so it is not available for our use. Take another for the first swap area, one more for the Solaris Volume Manager (SVM) or two if you are using Veritas encapsulated root disks. Pretty soon, you run out of slices. Of course this assumes that you didn't use the entire boot disk to store things such as local data, home directories, backup configuration data, etc.

In other words, if you didn't plan on using Live Upgrade before provisioning the system, it is unlikely that you will have the necessary slices or space available to start using it later. Perhaps in an upcoming posting, I will put together a little cookbook to give some ideas on how to work around this.

The proper long term answer is to use ZFS for your root file system. As we can see in the Solaris 11 Express release notes, ZFS is now integrated with the new packaging and installation tools to simplify system maintenance. All of the capabilities of Live Upgrade are just built in and they work right out of the box. The key to making all of that work smoothly is the ability to rely on certain ZFS features being available for the root file system (snapshot, clone).

Beginning with Solaris 10 10/08, ZFS has been an optional choice for the root file system. Thanks to some early adopters that have helped sort out the corner cases, ZFS is an excellent choice for use as a root file system. In fact, I would go a bit further and suggest that ZFS is the recommended root file system today.

By using ZFS, the disk slice challenges have just gone away. The only question that remains is whether or not the root pool has enough space to hold the alternate boot environment, but even that has a different look with ZFS. Instead of copying the source boot environment, Live Upgrade makes a clone, saving both time and space. The new boot environment only needs enough disk space to hold the changes between boot environments, not the entire Solaris installation.

Time for another example.

# zfs list -r panroot/ROOT
NAME                                                USED  AVAIL  REFER  MOUNTPOINT
panroot/ROOT                                       36.7G  5.06G    18K  legacy
panroot/ROOT/s10u6_baseline                        10.6M  5.06G  6.92G  /
panroot/ROOT/s10u8-baseline                        34.7M  5.06G  7.08G  /
panroot/ROOT/s10u9-2011-06-23                      1.46G  5.06G  7.73G  /
panroot/ROOT/s10u9-2011-06-23-undo                 1.48G  5.06G  7.66G  /mnt
panroot/ROOT/s10u9-baseline                        12.7G  5.06G  7.43G  /
panroot/ROOT/s10x_u6wos_07b                         119M  5.06G  3.87G  /
Each of these ZFS datasets corresponds to a separate boot environment - a bootable Solaris installation. The space required to keep the extra boot environments around is the sum of the dataset used space plus that of its snapshot (not shown in this example). On this single disk configuration it would be impossible to hold this many full, independent Solaris images, but because they are clones, a df(1) shows that I have space for at least this many, if not more.

If you are using ZFS as your root file system, you are just one command away from being able to enjoy all of the benefits of Live Upgrade.
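
For example (the boot environment name is illustrative), creating a new boot environment on a ZFS root is simply:

# lucreate -n s10u9-2011-06-23

Because the new environment is a clone, the command completes in seconds and consumes almost no additional space until the two environments start to diverge.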

5. Don't save patch backout files

At first you might think this is a curious recommendation, but stick with me for a few moments.

One of the most important features of Live Upgrade is maintaining a safe fall back in case you run into troubles with a patch or upgrade. Rather than performing surgery on a malfunctioning boot environment, perhaps doing more harm with each patch backed out, why not boot back to a known safe configuration ? One luactivate and an init 0 and you are back to a known operating configuration where you can take your time performing forensic analysis of your troubled boot environment.

That would make all of those undo.Z files littering up /var/sadm/patch somewhat extraneous. And that gets us to the next reason for not saving the backout files, space - but not what you are thinking. Sure, the new boot environment is larger with all of those files laying around, but how much are we talking about ?

More than you think. Quite a bit more, actually.

Here is an example where I have installed the June 23, 2011 recommended patch cluster on a Solaris 10 9/10 system, with and without backout files.

# zfs list -r panroot/ROOT | grep s10u9
panroot/ROOT/s10u9-2011-06-23                      1.46G  4.06G  7.73G  /
panroot/ROOT/s10u9-2011-06-23-undo                 2.53G  4.06G  8.66G  /mnt
panroot/ROOT/s10u9-baseline                        12.7G  4.06G  7.43G  /
That's a gigabyte of difference between the boot environment with and without the undo.Z files. Surely there must be some other explanation. Let's see.
# lumount s10u9-2011-06-23-undo /mnt
# find /mnt/var/sadm/patch -name undo.Z -print | xargs -n1 rm -f 
# zfs list -r panroot/ROOT | grep s10u9
panroot/ROOT/s10u9-2011-06-23                      1.46G  5.06G  7.73G  /
panroot/ROOT/s10u9-2011-06-23-undo                 1.46G  5.06G  7.66G  /mnt
panroot/ROOT/s10u9-baseline                        12.7G  5.06G  7.43G  /
If it was just this one gigabyte, I might not be making such a big deal about it. Did you ever think about those zones you are deploying ? As the zone installer runs through all of those packages for the new non-global zone, it copies all of the applicable undo.Z files, if they are present. This compounds the space problem.

In this example, before removing the undo.Z files, I created a zone on each boot environment, so that I can see the space difference. Remember that these are sparse root zones, and should only be around 100MB in size.

# zfs list -r panroot/zones
NAME                              USED  AVAIL  REFER  MOUNTPOINT
panroot/zones                     761M  4.53G    22K  /zones
panroot/zones/with-undo-files     651M  4.53G   651M  /zones/with-undo-files
panroot/zones/without-undo-files  111M  4.53G   111M  /zones/without-undo-files
That's right - there's a 540MB difference between the two zones, and the only difference is whether or not the patch backout files were preserved. Throw in a couple of dozen zones, and this becomes more than just a nuisance. Not only does it take more space and time to create the zones, it also impacts the zone backups. All so that you can keep around files that you will never use.

When you run the installcluster script, don't forget the -d flag. If you prefer luupgrade -t, the magic sequence is -O "-d".
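
Here is roughly what each invocation looks like; the boot environment name and patch directory are carried over from the earlier examples:

# ./installcluster -d -B s10u9-2011-06-23 --s10cluster

# luupgrade -t -O "-d" -s /export/patches/10_Recommended-2011-06-23/patches -n s10u9-2011-06-23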

6. Start using Live Upgrade immediately after installation

This tip is largely influenced by how you provision your systems, and the frequency with which you might wipe the configuration and start again. My primary system is something of a lab experiment, but it isn't too dissimilar from many development environments I have seen.

Right after I installed Solaris 10 from the 10/08 media, I created a second boot environment, preserving this initial pristine configuration. Rather than reinstalling Solaris from media or a jumpstart server, I would just boot back to the original boot environment, delete the remaining boot environments, and in just a few moments, be back to square one.

Another useful boot environment to preserve is the initial customization, done immediately after installation. Users are added, security settings are changed, and a handful of software packages are installed. Preserving this system baseline can be very useful, should your system need to be refreshed in a hurry. In my case, that did happen at 34,000 ft, somewhere over Ohio - but that's a story for another day.

If a system is to live through multiple upgrades, it might be a good idea to encode the Solaris release and the patch cluster in the boot environment name. A taxonomy that works for me is <release>-<patch level>. For example, s10u9-baseline would be the initial upgrade to Solaris 10 9/10, and s10u9-2011-06-23 would be that same release, but patched using the June 23, 2011 patch cluster.

Putting this all together, we have something like this.

# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
s10x_u6wos_07b             yes      no     no        yes    -
s10u6-baseline             yes      no     no        yes    -
s10u8-baseline             yes      no     no        yes    -
s10u9-baseline             yes      yes    yes       no     -
s10u9-2011-06-23           yes      no     no        yes    -

The thing I like about this arrangement is that I can quickly jump to a particular Solaris release when a customer asks me if a particular feature exists, or some patch has been integrated. I can see how this might be useful for some development environments as well.

7. Remember to install the LU packages from the upgrade media

Using Live Upgrade for an upgrade has an additional step over using it for patching. When performing an upgrade, the Live Upgrade packages from the installation media need to be installed in the current boot environment. After doing this, it is still necessary to check for prerequisite patches, especially if several months have passed since the update was released.

Prior to Solaris 10 8/07 (u4), including Solaris 8 and 9, there were only two Live Upgrade packages: SUNWluu and SUNWlur. Solaris 10 8/07 (u4) and later have a third package, SUNWlucfg. These packages can be found in the Product directory on the installation media.

Here is an example.

# mount -o ro -F hsfs `lofiadm -a /export/iso/s10/s10u9-ga-x86.iso` /mnt
# pkgadd -d /mnt/Solaris_10/Product SUNWluu SUNWlur SUNWlucfg
# cd 
# ./installcluster --apply-prereq --s10cluster
Now we are ready to use lucreate and luupgrade -u to create a new boot environment and upgrade it to the release on the installation media.
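
Putting those pieces together, a minimal sketch looks like this, reusing the media still mounted at /mnt from the example above (with Solaris 10 10/09 or later media you may also need the -k auto-registration file described elsewhere on this blog):

# lucreate -n s10u9-baseline
# luupgrade -u -s /mnt -n s10u9-baseline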

8. Use the installcluster script from the Solaris patch cluster

It would be perfectly acceptable to unpack a Solaris recommended patch cluster and then use luupgrade -t to install the patches into an alternate boot environment. Live Upgrade will build a patch order file based on the metadata in all of the patches, and will generally do the right thing.

Occasionally, it might be more convenient to do things in a slightly different order, or to handle patch installation errors just a bit better. That's what the installcluster script does. For corner cases where the patch order file might be incorrectly generated, the script builds its own installation order, working around some inconvenient situations. It also does a better job with error handling, perhaps trying a different way or sequence to install a problematic patch.

The most important difference between the two installation methods is how they report their progress. Let's take a look at the two and see which you like better. First, luupgrade -t.

# lucreate -n zippy
# luupgrade -t -s /export/patches/10_Recommended-2011-06-23/patches -n zippy
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Architecture for package SUNWstaroffice-core01 from directory SUNWstaroffice-core01.i in patch 120186-22 differs from the package installed on the system.
Version of package SUNWmcosx from directory SUNWmcosx in patch 121212-02 differs from the package installed on the system.
Version of package SUNWmcos from directory SUNWmcos in patch 121212-02 differs from the package installed on the system.
..... lots of similar output deleted .......

The following requested patches are already installed on the system
Requested patch 113000-07 is already installed on the system.
Requested patch 117435-02 is already installed on the system.

..... more output deleted .......

The following requested patches do not update any packages installed on the system
No Packages from patch 121212-02 are installed on the system.
No Packages from patch 125540-06 are installed on the system.
No Packages from patch 125542-06 are installed on the system.

Checking patches that you specified for installation.

Done!

..... yet more output deleted .....

Approved patches will be installed in this order:

118668-32 118669-32 119281-25 119314-42 119758-20 119784-18 119813-13 119901-11
119907-18 120186-22 120544-22 121429-15 122912-25 123896-22 124394-11 124939-04
125138-28 125139-28 125216-04 125333-17 125732-06 126869-05 136999-10 137001-08
137081-05 138624-04 138823-08 138827-08 140388-02 140861-02 141553-04 143318-03
143507-02 143562-09 143600-10 143616-02 144054-04 144489-17 145007-02 145125-02
145797-01 145802-06 146020-01 146280-01 146674-01 146773-01 146803-02 146859-01
146862-01 147183-01 147228-01 147218-01 145081-04 145201-06

Checking installed patches...
Installing patch packages...

Patch 118668-32 has been successfully installed.
See /a/var/sadm/patch/118668-32/log for details

Patch packages installed:
  SUNWj5cfg
  SUNWj5dev
  SUNWj5dmo
  SUNWj5man
  SUNWj5rt

Checking installed patches...
Installing patch packages...

Patch 118669-32 has been successfully installed.
See /a/var/sadm/patch/118669-32/log for details

Patch packages installed:
  SUNWj5dmx
  SUNWj5dvx
  SUNWj5rtx

Checking installed patches...
Executing prepatch script...
Installing patch packages...

Patch 119281-25 has been successfully installed.
See /a/var/sadm/patch/119281-25/log for details
Executing postpatch script...

Patch packages installed:
  SUNWdtbas
  SUNWdtdst
  SUNWdtinc
  SUNWdtma
  SUNWdtmad
  SUNWmfrun

Checking installed patches...
Executing prepatch script...
Installing patch packages...

Patch 119314-42 has been successfully installed.
I think you get the picture. To gauge progress, you have to keep scrolling back to the list of packages, and find the one luupgrade is currently working on. After just a few minutes, the scroll buffer of your terminal window will be exhausted and you will be left guessing how long the operation will take to complete.

Let's compare this to the output from the installcluster script. Note the use of -d from an earlier recommendation.

# lucreate -n zippy
# ./installcluster -d -B zippy --s10cluster
Setup ..............


Recommended OS Cluster Solaris 10 x86 (2011.06.17)

Application of patches started : 2011.07.06 00:25:07

Applying 120901-03 (  1 of 216) ... skipped
Applying 121334-04 (  2 of 216) ... skipped
Applying 119255-81 (  3 of 216) ... skipped
Applying 119318-01 (  4 of 216) ... skipped
Applying 121297-01 (  5 of 216) ... skipped
Applying 138216-01 (  6 of 216) ... skipped
Applying 122035-05 (  7 of 216) ... skipped
Applying 127885-01 (  8 of 216) ... skipped
Applying 145045-03 (  9 of 216) ... skipped
Applying 142252-02 ( 10 of 216) ... skipped
Applying 125556-10 ( 11 of 216) ... skipped
Applying 140797-01 ( 12 of 216) ... skipped
Applying 113000-07 ( 13 of 216) ... skipped
Applying 117435-02 ( 14 of 216) ... skipped
Applying 118344-14 ( 15 of 216) ... skipped
Applying 118668-32 ( 16 of 216) ... success
Applying 118669-32 ( 17 of 216) ... success
Applying 118778-14 ( 18 of 216) ... skipped
Applying 121182-05 ( 19 of 216) ... skipped
Notice the nice clean output. You can always tell where you are in the installation process (nn out of 216) and there is not a lot of extra information cluttering up the controlling terminal.

9. Keep it Simple

This will be the most difficult and controversial of the survival tips, and that's why I have saved it for last. Remember that Live Upgrade must work across three releases and all the various patch combinations and clusters. At the very least, it stresses the installation programs and patching tools.

For a UFS root system, the administrator has a lot of control where the various file systems are laid out. All it takes is enough -m lines, or if that becomes too unwieldy, a list of slices in a control file passed by -M.
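
For reference, a UFS root lucreate might look something like this sketch; the boot environment name and device names are hypothetical:

# lucreate -n s10u9-ufs \
    -m /:/dev/dsk/c0t1d0s0:ufs \
    -m /var:/dev/dsk/c0t1d0s3:ufs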

ZFS provides a significant simplification of the Solaris file systems, and it is expected that system administrators will take advantage of this. Of the Solaris directories (/, /usr, /etc, /var, /opt, /kernel, /platform, /bin, /sbin, /lib, /dev, /devices), only /var is allowed to be broken out into its own dataset. Many legacy operational procedures, some dating back to SunOS 4.x days, will have /usr, /usr/local, /opt, /var and /var/crash split into different file systems. Not only is this not a recommended practice for a ZFS root system, it may actually prevent the use of Live Upgrade. If forced to choose between Live Upgrade and my old configuration habits, I will take Live Upgrade every time.

There will be more

I hope to occasionally revise this article, adding new tips, or reworking some of the current ones. Yes, that would make this more of a Wiki type of document than your typical blog, and that might be where this ends up some day. Until then, feel free to bookmark this page and return to it as often as you need to, especially as you plan out your Live Upgrade activities.

If you have some tips and suggestions, please leave them in the comments. If I can work them up with good examples, I'll add them to the page (with full credit, of course).

Thursday Jun 30, 2011

Common Live Upgrade Problems

As I have worked with customers deploying Live Upgrade in their environments, several problems seem to surface over and over. With this blog article, I will try to collect these troubles, as well as suggest some workarounds. If this sounds like the beginnings of a Wiki, you would be right. At present, there is not enough material for one, so we will use this blog for the time being. I do expect new material to be posted on occasion, so if you wish to bookmark it for future reference, a permanent link can be found here.

To help with your navigation, here is an index of the common problems.

  1. lucreate(1M) copies a ZFS root rather than making a clone
  2. luupgrade(1M) and the Solaris autoregistration file
  3. Watch out for an ever growing /var/tmp
Without any further delay, here are some common Live Upgrade problems.

Live Upgrade copies a ZFS root rather than making a clone

This was introduced in Solaris 10 10/09 (u8) and the root of the problem is a duplicate entry in the source boot environment's ICF configuration file. Prior to u8, a ZFS root file system was not included in /etc/vfstab, since the mount is implicit at boot time. Starting with u8, the root file system is included in /etc/vfstab, and when the boot environment is scanned to create the ICF file, a duplicate entry is recorded. Here's what the error looks like.
# lucreate -n s10u9-baseline
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment  file systems with the
file system(s) you specified for the new boot environment. Determining
which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Creating file systems on boot environment .
Creating  file system for  in zone  on .

The error indicator -----> /usr/lib/lu/lumkfs: test: unknown operator zfs

Populating file systems on boot environment .
Checking selection integrity.
Integrity check OK.
Populating contents of mount point .

This should not happen ------> Copying.

Ctrl-C and cleanup
If you weren't paying close attention, you might not even know this is an error. The symptoms are lucreate times that are way too long due to the extraneous copy, or the one that alerted me to the problem, the root file system is filling up - again thanks to a redundant copy.

This problem has already been identified and corrected, and a patch (121431-58 or later for x86, 121430-57 for SPARC) is available. Unfortunately, this patch has not yet made it into the Solaris 10 Recommended Patch Cluster. Applying the prerequisite patches from the latest cluster is a recommendation from the Live Upgrade Survival Guide blog, so an additional step will be required until the patch is included. Let's see how this works.

# patchadd -p | grep 121431
Patch: 121429-13 Obsoletes: Requires: 120236-01 121431-16 Incompatibles: Packages: SUNWluzone
Patch: 121431-54 Obsoletes: 121436-05 121438-02 Requires: Incompatibles: Packages: SUNWlucfg SUNWluu SUNWlur

# unzip 121431-58
# patchadd 121431-58
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Done!

Checking patches that you specified for installation.

Done!


Approved patches will be installed in this order:

121431-58


Checking installed patches...
Executing prepatch script...
Installing patch packages...

Patch 121431-58 has been successfully installed.
See /var/sadm/patch/121431-58/log for details
Executing postpatch script...

Patch packages installed:
  SUNWlucfg
  SUNWlur
  SUNWluu

# lucreate -n s10u9-baseline
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
INFORMATION: Unable to determine size or capacity of slice .
Comparing source boot environment  file systems with the
file system(s) you specified for the new boot environment. Determining
which file systems should be in the new boot environment.
INFORMATION: Unable to determine size or capacity of slice .
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Cloning file systems from boot environment  to create boot environment .
Creating snapshot for  on .
Creating clone for  on .
Setting canmount=noauto for  in zone  on .
Saving existing file  in top level dataset for BE  as //boot/grub/menu.lst.prev.
Saving existing file  in top level dataset for BE  as //boot/grub/menu.lst.prev.
Saving existing file  in top level dataset for BE  as //boot/grub/menu.lst.prev.
File  propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE  in GRUB menu
Population of boot environment  successful.
Creation of boot environment  successful.
This time it took just a few seconds. A cursory examination of the offending ICF file (/etc/lu/ICF.3 in this case) shows that the duplicate root file system entry is now gone.
# cat /etc/lu/ICF.3
s10u8-baseline:-:/dev/zvol/dsk/panroot/swap:swap:8388608
s10u8-baseline:/:panroot/ROOT/s10u8-baseline:zfs:0
s10u8-baseline:/vbox:pandora/vbox:zfs:0
s10u8-baseline:/setup:pandora/setup:zfs:0
s10u8-baseline:/export:pandora/export:zfs:0
s10u8-baseline:/pandora:pandora:zfs:0
s10u8-baseline:/panroot:panroot:zfs:0
s10u8-baseline:/workshop:pandora/workshop:zfs:0
s10u8-baseline:/export/iso:pandora/iso:zfs:0
s10u8-baseline:/export/home:pandora/home:zfs:0
s10u8-baseline:/vbox/HardDisks:pandora/vbox/HardDisks:zfs:0
s10u8-baseline:/vbox/HardDisks/WinXP:pandora/vbox/HardDisks/WinXP:zfs:0
This error can show up in a slightly different form. When activating a new boot environment, propagation of the bootloader and configuration files may fail with an error indicating that an old boot environment could not be mounted. That prevents the activation from taking place and you will find yourself booting back into the old BE.

Again, the root cause is the root file system entry in /etc/vfstab. Even though the mount at boot time flag is set to no, it confuses lumount(1M) as it cycles through during the propagation phase. To correct this problem, boot back to the offending boot environment and remove the vfstab entry for /.
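
For reference, here is a sketch of what that vfstab entry typically looks like (the dataset name is taken from the earlier example); it is the single line to remove, leaving the rest of /etc/vfstab alone:

#device                       device   mount  FS    fsck  mount    mount
#to mount                     to fsck  point  type  pass  at boot  options
panroot/ROOT/s10u8-baseline   -        /      zfs   -     no       -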

luupgrade(1M) and the new (Solaris 10 10/09 and later) autoregistration file

This one is actually mentioned in the Oracle Solaris 10 9/10 release notes. I know, I hate it when that happens too.

Here's what the "error" looks like.

# luupgrade -u -s /mnt -n s10u9-baseline

System has findroot enabled GRUB
No entry for BE  in GRUB menu
Copying failsafe kernel from media.
61364 blocks
miniroot filesystem is 
Mounting miniroot at 
ERROR:
        The auto registration file <> does not exist or incomplete.
        The auto registration file is mandatory for this upgrade.
        Use -k  argument along with luupgrade command.
        autoreg_file is path to auto registration information file.
        See sysidcfg(4) for a list of valid keywords for use in
        this file.

        The format of the file is as follows.

                oracle_user=xxxx
                oracle_pw=xxxx
                http_proxy_host=xxxx
                http_proxy_port=xxxx
                http_proxy_user=xxxx
                http_proxy_pw=xxxx

        For more details refer "Oracle Solaris 10 9/10 Installation
        Guide: Planning for Installation and Upgrade".

As with the previous problem, this is also easy to work around. Assuming that you don't want to use the auto-registration feature at upgrade time, create a file that contains just autoreg=disable and pass the filename on to luupgrade.

Here is an example.

# echo "autoreg=disable" > /var/tmp/no-autoreg
# luupgrade -u -s /mnt -k /var/tmp/no-autoreg -n s10u9-baseline
 
System has findroot enabled GRUB
No entry for BE  in GRUB menu
Copying failsafe kernel from media.
61364 blocks
miniroot filesystem is 
Mounting miniroot at 
#######################################################################
 NOTE: To improve products and services, Oracle Solaris communicates
 configuration data to Oracle after rebooting.

 You can register your version of Oracle Solaris to capture this data
 for your use, or the data is sent anonymously.

 For information about what configuration data is communicated and how
 to control this facility, see the Release Notes or
 www.oracle.com/goto/solarisautoreg.

 INFORMATION: After activated and booted into new BE ,
 Auto Registration happens automatically with the following Information

autoreg=disable
#######################################################################
Validating the contents of the media .
The media is a standard Solaris media.
The media contains an operating system upgrade image.
The media contains  version <10>.
Constructing upgrade profile to use.
Locating the operating system upgrade program.
Checking for existence of previously scheduled Live Upgrade requests.
Creating upgrade profile for BE .
Checking for GRUB menu on ABE .
Saving GRUB menu on ABE .
Checking for x86 boot partition on ABE.
Determining packages to install or upgrade for BE .
Performing the operating system upgrade of the BE .
CAUTION: Interrupting this process may leave the boot environment unstable
or unbootable.
The Live Upgrade operation now proceeds as expected. Once the system upgrade is complete, we can manually register the system. If you want to do a hands off registration during the upgrade, see the Oracle Solaris Auto Registration section of the Oracle Solaris Release Notes for instructions on how to do that.

/var/tmp and the ever growing boot environment

Let's start with a clean installation of Solaris 10 10/09 (u8).
# df -k /
Filesystem                       kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10x_u8wos_08a      20514816 4277560 13089687    25%    /

So far, so good. Solaris is just a bit over 4GB. Another 3GB is used by the swap and dump devices. That should leave plenty of room for half a dozen or so patch cycles (assuming 1GB each) and an upgrade to the next release.

Now, let's put on the latest recommended patch cluster. Note that I am following the suggestions in my Live Upgrade Survival Guide, installing the prerequisite patches and the LU patch before actually installing the patch cluster.

# cd /var/tmp
# wget patchserver:/export/patches/10_x86_Recommended-2012-01-05.zip .
# unzip -qq 10_x86_Recommended-2012-01-05.zip

# wget patchserver:/export/patches/121431-69.zip
# unzip 121431-69

# cd 10_x86_Recommended
# ./installcluster --apply-prereq --passcode (you can find this in README)

# patchadd -M /var/tmp 121431-69

# lucreate -n s10u8-2012-01-05
# ./installcluster -d -B s10u8-2012-01-05 --passcode

# luactivate s10u8-2012-01-05
# init 0

After the new boot environment is activated, let's upgrade to the latest release of Solaris 10. In this case, it will be Solaris 10 8/11 (u10).

Yes, this does seem like an awful lot is happening in a short period of time. I'm trying to demonstrate a situation that really does happen when you forget something as simple as a patch cluster clogging up /var/tmp. Think of this as one of those time lapse video sequences you might see in a nature documentary.

# pkgrm SUNWluu SUNWlur SUNWlucfg
# pkgadd -d /cdrom/sol_10_811_x86  SUNWluu SUNWlur SUNWlucfg
# patchadd -M /var/tmp 121431-69

# lucreate -n s10u10-baseline
# echo "autoreg=disable" > /var/tmp/no-autoreg
# luupgrade -u -s /cdrom/sol_10_811_x86 -k /var/tmp/no-autoreg -n s10u10-baseline
# luactivate s10u10-baseline
# init 0
As before, everything went exactly as expected. Or so I thought, until I logged in the first time and checked the free space in the root pool.
# df -k /
Filesystem                       kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline     20514816 10795038 2432308    82%    /
Where did all of the space go ? Back of the napkin calculations of 4.5GB (s10u8) + 4.5GB (s10u10) + 1GB (patch set) + 3GB (swap and dump) = 13GB. 20GB pool - 13GB used = 7GB free. But there's only 2.4GB free ?

This is about the time that I smack myself on the forehead and realize that I put the patch cluster in the /var/tmp. Old habits die hard. This is not a problem, I can just delete it, right ?

Not so fast.

# du -sh /var/tmp
 5.4G   /var/tmp

# du -sh /var/tmp/10*
 3.8G   /var/tmp/10_x86_Recommended
 1.5G   /var/tmp/10_x86_Recommended-2012-01-05.zip

# rm -rf /var/tmp/10*

# du -sh /var/tmp
 3.4M   /var/tmp

Imagine the look on my face when I check the pool free space, expecting to see 7GB free.
# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5074262 2424603    68%    /

We are getting closer, I suppose. At least my root filesystem size is reasonable (5GB vs 11GB). But the free space hasn't changed at all.

Once again, I smack myself on the forehead. The patch cluster is also in the other two boot environments. All I have to do is get rid of them too, and I'll get my free space back. Right ?

# lumount s10u8-2012-01-05 /mnt
# rm -rf /mnt/var/tmp/10_x86_Recommended*
# luumount s10u8-2012-01-05

# lumount s10x_u8wos_08a /mnt
# rm -rf /mnt/var/tmp/10_x86_Recommended*
# luumount s10x_u8wos_08a
Surely, the free space will now be 7GB.
# df -k /
Filesystem                    kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline  20514816 5074265 2429261    68%    /

This is when I smack myself on the forehead for the third time in one afternoon. Just getting rid of them in the boot environments is not sufficient. It would be if I were using UFS as a root filesystem, but lucreate will use the ZFS snapshot and cloning features when used on a ZFS root. So the patch cluster is in the snapshot, and the oldest one at that.

Let's try this all over again, but this time I will put the patches somewhere else that is not part of a boot environment. If you are thinking of using root's home directory, think again - it is part of the boot environment. If you are running out of ideas, let me suggest that /export/patches might be a good place to put them.
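
If a shared dataset for patches doesn't already exist, creating one is a one-liner (a sketch, using the rpool/export dataset that shows up in the df output below):

# zfs create rpool/export/patches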

Doing the exercise again, with the patches in /export/patches, I get similar results (to be expected), but this time the patches are in a shared ZFS dataset (/export).

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10x_u8wos_08a             yes      no     no        yes    -         
s10u8-2012-01-05           yes      no     no        yes    -         
s10u10-baseline            yes      yes    yes       no     -         

# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5184578 2445140    68%    /


# df -k /export
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/export                  20514816 5606384 2445142    70%    /export

This means that I can delete them, and reclaim the space.
# rm -rf /export/patches/10_x86_Recommended*

# df -k /
Filesystem                      kbytes    used   avail capacity  Mounted on
rpool/ROOT/s10u10-baseline    20514816 5184578 8048050    40%    /

Now, that's more like it. With this free space, I can continue to patch and maintain my system as I had originally planned - estimating a few hundred MB to 1.5GB per patch set.


Friday Sep 17, 2010

Getting Ready for Oracle Open World

Has it really been February since my last blog entry ? That can't be right, but I can't argue with the timestamps.

I would like to thank those of you that check back on my blog, looking for new things. Now that the Oracle acquisition is complete and we've been working in our new organizations for a while, I should be much more active.

That brings me to the reason for the blog today: Oracle Open World starts in just a few days. I still have a lot to get done before then, but I'm starting to look forward to the big kickoff at 5:30pm on Sunday, September 19.

There are a number of great Solaris sessions and demonstrations planned, many of them given by members of the US Solaris team in the North American Commercial Hardware pillar (check them out on my blogroll, over in the far right column).

I will be giving two different sessions this year: an Introduction to ZFS Hands on Lab plus a talk on Best Practices for Patching Solaris. If you are interested in attending either or both of these sessions, here are the details.

Session:136261 - Introduction to the ZFS File System (Hands on Lab)
When: Tuesday, Sep 20. 11am - noon
Location: Marriott Marquis, Nob Hill


Session:316255 - Patching Best Practices for the Oracle Solaris Operating System
When: Thursday, Sep 22. 1:30pm - 2:30pm
Location: Moscone South, Room 310

I hope that some of you can stop by and say hello. As soon as I'm finished with my presentation materials, I'll post them here for your reference. If there's any interest in the ZFS hands on lab, I can post that too - just let me know. That one is actually a lot of fun and works great in a VirtualBox guest.

Next week I hope to blog things I've learned in the keynotes and sessions. I will also be posting out notes on Twitter, if I can get any bandwidth with 43,000 people in attendance. Feel free to follow me on Twitter if you want to receive these updates.

If you are traveling to San Francisco for the conference, safe travels.


Wednesday Nov 25, 2009

Oracle Database 11g Release 2 is now available for Solaris (SPARC and x86)

Oracle Database 11g Release 2 for Solaris is now available for download at the Oracle Technology Network. This includes 64-bit binaries for both SPARC and x86. This is pretty exciting news since this is the first time that x86 binaries for Solaris have been available at the same time as SPARC. A big thanks to our friends at Oracle for making this happen.

Let the upgrades and installations begin.


Monday Nov 23, 2009

Great Lakes OpenSolaris Users Group - Nov 2009

I would like to thank Chip Bennett and all of the fine folks from Laurus Technologies for hosting the November meeting of the Great Lakes OpenSolaris Users Group (GLUG), especially on such short notice. It was a pleasure coming back and I enjoyed meeting up with some old friends and making some new ones.

We had a rather nice discussion around recent enhancements to ZFS. As promised, I have posted my slides for your review. Please let me know if you have any trouble downloading them or if you find any confusing or erroneous bits.

I appreciate all of the folks that turned out as well as those that connected to the webcast. I hope to see all of you again at a future meeting.

Sunday Nov 22, 2009

Taking ZFS deduplication for a test drive

Now that I have a working OpenSolaris build 128 system, I just had to take ZFS deduplication for a spin, to see if it was worth all of the hype.

Here is my test case: I have 2 directories of photos, totaling about 90MB each. And here's the trick - they are almost complete duplicates of each other. I downloaded all of the photos from the same camera on 2 different days. How many of you do that ? Yeah, me too.

Let's see what ZFS can figure out about all of this. If it is super smart we should end up with a total of 90MB of used space. That's what I'm hoping for.

The first step is to create the pool and turn on deduplication from the beginning.
# zpool create -f scooby -O dedup=on c2t2d0s2
This will use sha256 to determine if two blocks are the same. Since sha256 has such a low collision probability (something like 1x10^-77), we will not turn on automatic verification. If we were using an algorithm like fletcher4, which has a higher collision rate, we should also perform a complete block compare before allowing the block removal (dedup=fletcher4,verify).
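
If you do want the belt-and-suspenders block comparison even with sha256, it is just a property change on the pool's datasets; a sketch using the pool from this example:

# zfs set dedup=sha256,verify scooby

Setting dedup=verify should be equivalent, since verification implies the sha256 checksum.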

Now copy the first 180MB (remember, this is 2 sets of 90MB which are nearly identical sets of photos).
# zfs create scooby/doo
# cp -r /pix/Alaska* /scooby/doo
And the second set.
# zfs create scooby/snack
# cp -r /pix/Alaska* /scooby/snack
And finally the third set.
# zfs create scooby/dooby
# cp -r /pix/Alaska* /scooby/dooby
Let's make sure there are in fact three copies of the photos.
# df -k | grep scooby
scooby               74230572      25 73706399     1%    /scooby
scooby/doo           74230572  174626 73706399     1%    /scooby/doo
scooby/snack         74230572  174626 73706399     1%    /scooby/snack
scooby/dooby         74230572  174625 73706399     1%    /scooby/dooby


OK, so far so good. But I can't quite tell if the deduplication is actually doing anything. With all that free space, it's sort of hard to see. Let's look at the pool properties.
# zpool get all scooby
NAME    PROPERTY       VALUE       SOURCE
scooby  size           71.5G       -
scooby  capacity       0%          -
scooby  altroot        -           default
scooby  health         ONLINE      -
scooby  guid           5341682982744598523  default
scooby  version        22          default
scooby  bootfs         -           default
scooby  delegation     on          default
scooby  autoreplace    off         default
scooby  cachefile      -           default
scooby  failmode       wait        default
scooby  listsnapshots  off         default
scooby  autoexpand     off         default
scooby  dedupratio     5.98x       -
scooby  free           71.4G       -
scooby  allocated      86.8M       -
Now this is telling us something.

First notice the allocated space. Just shy of 90MB. But there's 522MB of data (174MB x 3). But only 87MB used out of the pool. That's a good start.

Now take a look at the dedupratio. Almost 6. And that's exactly what we would expect, if ZFS is as good as we are lead to believe. 3 sets of 2 duplicate directories is 6 total copies of the same set of photos. And ZFS caught every one of them.

So if you want to do this yourself, point your OpenSolaris package manager at the dev repository and wait for build 128 packages to show up. If you need instructions on using the OpenSolaris dev repository, point the browser of your choice at http://pkg.opensolaris.org/dev/en/index.shtml. And if you can't wait for the packages to show up, you can always .


Friday Oct 09, 2009

What's New in Solaris 10 10/09


Solaris 10 10/09 (u8) is now available for download at http://sun.com/solaris/get.jsp. DVD ISO images (full and segments that can be reassembled after download) are available for both SPARC and x86.

Here are a few of the new features in this release that caught my attention.

Packaging and Patching

Improved performance of SVR4 package commands: Improvements have been made in the SVR4 package commands (pkgadd, pkgrm, pkginfo et al). The impact of these can be seen in drastically reduced zone installation time. How much of an improvement you ask (and you know I have to answer with some data, right) ?
# cat /etc/release; uname -a

                        Solaris 10 5/09 s10x_u7wos_08 X86
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 30 March 2009
SunOS chapterhouse 5.10 Generic_141415-09 i86pc i386 i86pc

# time zoneadm -z zone1 install
Preparing to install zone .
Creating list of files to copy from the global zone.
Copying <2905> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1453> packages on the zone.
Initialized <1453> packages on zone.
Zone  is initialized.
Installation of these packages generated errors: 
The file  contains a log of the zone installation.

real    5m48.476s
user    0m45.538s
sys     2m9.222s
#  cat /etc/release; uname -a

                       Solaris 10 10/09 s10x_u8wos_08a X86
           Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 16 September 2009
SunOS corrin 5.10 Generic_141445-09 i86pc i386 i86pc

# time zoneadm -z zone1 install
Preparing to install zone .
Creating list of files to copy from the global zone.
Copying <2915> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1432> packages on the zone.
Initialized <1432> packages on zone.
Zone  is initialized.
Installation of these packages generated errors: 
The file  contains a log of the zone installation.

real    3m4.677s
user    0m44.593s
sys     0m48.003s
OK, that's pretty impressive. A zone installation on Solaris 10 10/09 takes about half of the time as it does on Solaris 10 5/09. It is also worth noting the rather large reduction in the amount of system time (48 seconds vs 129 seconds) too.

Zones parallel patching: Before Solaris 10 10/09, the patching process was single threaded, which could lead to prolonged patching times on a system with several non-global zones. Starting with this update, you can specify the number of threads to be used to patch a system with zones. Enable this feature by assigning a value to num_proc in /etc/patch/pdo.conf. The number of threads used is capped at 1.5 times the number of on-line CPUs, or at num_proc, whichever is lower.
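
For example, to allow up to four patching threads, the line in /etc/patch/pdo.conf would look something like this (a sketch; the 1.5 x on-line CPUs ceiling described above still applies):

num_proc=4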

This feature is also available by applying Solaris patches 119254-66 (SPARC) or 119255-66 (x86).

For more information on the effects of zone parallel patching, see Container Guru Jeff Victor's excellent Patching Zones Goes Zoom.

ZFS Enhancements

Flash archive install into a ZFS root filesystem: ZFS support for the root file system was introduced in Solaris 10 10/08, but the install tools did not work with flash archives. Solaris 10 10/09 provides the ability to install a flash archive created from an existing ZFS root system. This capability is also provided by patches 119534-15 + 124630-26 (SPARC) or 119535-15 + 124631-27 (x86) that can be applied to a Solaris 10 10/08 or later system. There are still a few limitations: the flash source must be from a ZFS root system and you cannot use differential archives. More information can be found in Installing a ZFS Root File System (Flash Archive Installation).
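
Creating the archive itself works the same way it always has; a minimal sketch, with a hypothetical archive name and destination, would be:

# flarcreate -n s10u8-zfsroot /export/archives/s10u8-zfsroot.flar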

Set ZFS properties on the initial zpool file system: Prior to Solaris 10 10/09, ZFS file system properties could only be set once the initial file system was created. This made it impossible to create a pool with the same name as an existing mounted file system, or to have properties such as copies or compression in effect from the moment the pool is created. In Solaris 10 10/09 you can specify any ZFS file system property for the top level dataset using zpool create -O.
# zpool create -O mountpoint=/data -O copies=3 -O compression=on datapool c1t1d0 c1t2d0
ZFS Read Cache (L2ARC): You now have the ability to add dedicated read cache devices to a ZFS zpool. This can improve the read performance of ZFS as well as reduce the pressure on the memory resident ARC.

L2ARC devices are added as cache vdevs to a pool. In the following example we will create a pool of 2 mirrored devices, 2 cache devices and a spare.
 
# zpool create datapool mirror c1t1d0 c1t2d0 cache c1t3d0 c1t4d0 spare c1t5d0

# zpool status datapool
  pool: datapool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
        cache
          c1t3d0    ONLINE       0     0     0
          c1t4d0    ONLINE       0     0     0
        spares
          c1t5d0    AVAIL

errors: No known data errors
So what do ZFS cache devices do ? Rather than go into a lengthy explanation of the L2ARC, I would rather refer you to Fishworks developer Brendan Gregg's excellent treatment of the subject.

Unlike the intent log (ZIL), L2ARC cache devices can be added and removed dynamically.
# zpool remove datapool c1t3d0
# zpool remove datapool c1t4d0

# zpool status datapool
  pool: datapool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
        spares
          c1t5d0    AVAIL

errors: No known data errors


# zpool add datapool cache c1t3d0

# zpool status datapool
  pool: datapool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
        cache
          c1t3d0    ONLINE       0     0     0
        spares
          c1t5d0    AVAIL

errors: No known data errors
New cache control properties: Two new ZFS properties are introduced with Solaris 10 10/09. These control what is stored (nothing, data + metadata, or metadata only) in the ARC (memory) and L2ARC (external) caches. These new properties are
  • primarycache - controls what is stored in the memory resident ARC cache
  • secondarycache - controls what is stored in the L2ARC
and they can take the values
  • none - the caches are not used
  • metadata - only file system metadata is cached
  • all - both file system data and the metadata is stored in the associated cache
# zpool create -O primarycache=metadata -O secondarycache=all datapool c1t1d0 c1t2d0 cache c1t3d0 
There are workloads such as databases that perform better or make more efficient use of memory if the system is not competing with the caches that the applications are maintaining themselves.
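
Both properties can also be changed on existing file systems with zfs set; for example, to keep only metadata in the ARC for a hypothetical database dataset:

# zfs set primarycache=metadata datapool/oradata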

User and group quotas: ZFS has always had quotas and reservations, but they were applied at the file system level. Achieving user or group quotas would require creating additional file systems, which might make administration more complex. Starting with Solaris 10 10/09, you can apply both user and group quotas to a file system, much like you would with UFS. The ZFS file system must be at version 4 or later and the zpool must be at version 15 or later.

Let's create a file system and see if we are at the proper versions to set quotas.
# zfs create rpool/newdata
# chown bobn:local /rpool/newdata

# zpool get version rpool
NAME   PROPERTY  VALUE    SOURCE
rpool  version   18       default


# zpool upgrade -v
This system is currently running ZFS pool version 18.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  snapshot user holds
For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.


# zfs get version rpool/newdata
NAME           PROPERTY  VALUE    SOURCE
rpool/newdata  version   4 

# zfs upgrade -v
The following filesystem versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifier (FUID)
 4   userquota, groupquota properties

For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/zpl/N

Where 'N' is the version number.
Excellent. Now let's set a user and group quota and see what happens. We'll set a group quota of 1GB and a user quota at 2GB.
# zfs set groupquota@local=1g rpool/newdata
# zfs set userquota@bobn=2g rpool/newdata

# su - bobn

% mkfile 500M /rpool/newdata/file1
% mkfile 500M /rpool/newdata/file2
% mkfile 500M /rpool/newdata/file3
file3: initialized 40370176 of 524288000 bytes: Disc quota exceeded

As expected, we have exceeded our group quota. Let's change the group of the existing files and see if we can proceed to our user quota.
% rm /rpool/newdata/file3
% chgrp sales /rpool/newdata/file1 /rpool/newdata/file2
% mkfile 500m /rpool/newdata/file3
Could not open /rpool/newdata/disk3: Disc quota exceeded

Whoa! What's going on here ? Relax - ZFS does things asynchronously unless told otherwise. And we should have noticed this when the mkfile for file3 actually started. ZFS wasn't quite caught up with the current usage. A good sync should do the trick.
% sync
% mkfile 500M /rpool/newdata/file3
% mkfile 500M /rpool/newdata/file4
% mkfile 500M /rpool/newdata/file5
/rpool/newdata/disk5: initialized 140247040 of 524288000 bytes: Disc quota exceeded

Great. We now have user and group quotas. How can I find out what I have used against my quota ? There are two new ZFS properties, userused and groupused that will show what the group or user is currently consuming.
% zfs get userquota@bobn,userused@bobn rpool/newdata
NAME           PROPERTY        VALUE           SOURCE
rpool/newdata  userquota@bobn  2G              local
rpool/newdata  userused@bobn   1.95G           local

% zfs get groupquota@local,groupused@local rpool/newdata
NAME           PROPERTY          VALUE             SOURCE
rpool/newdata  groupquota@local  1G                local
rpool/newdata  groupused@local   1000M             local

% zfs get groupquota@sales,groupused@sales rpool/newdata
NAME           PROPERTY          VALUE             SOURCE
rpool/newdata  groupquota@sales  none              local
rpool/newdata  groupused@sales   1000M             local

% zfs get groupquota@scooby,groupused@scooby rpool/newdata
NAME           PROPERTY           VALUE              SOURCE
rpool/newdata  groupquota@scooby  -                  -
rpool/newdata  groupused@scooby   -   
New space usage properties: Four new usage properties have been added to ZFS file systems.
  • usedbychildren (usedchild) - this is the amount of space that is used by all of the children of the specified dataset
  • usedbydataset (usedds) - this is the total amount of space that would be freed if this dataset and its snapshots and reservations were destroyed
  • usedbyrefreservation (usedrefreserv) - this is the amount of space that would be freed if the dataset's reservations were to be removed
  • usertbysnapshots (usedsnap) - the total amount of space that would be freed if all of the snapshots of this dataset were deleted.
# zfs get all datapool | grep used
datapool  used                  5.39G                  -
datapool  usedbysnapshots       19K                    -
datapool  usedbydataset         26K                    -
datapool  usedbychildren        5.39G                  -
datapool  usedbyrefreservation  0                      -


These new properties can also be viewed in a nice tabular form using zfs list -o space.
# zfs list -r -o space datapool
NAME           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
datapool        480M  5.39G       19K     26K              0      5.39G
datapool@now       -    19K         -       -              -          -
datapool/fs1    480M   400M         0    400M              0          0
datapool/fs2   1.47G  1.00G         0   1.00G              0          0
datapool/fs3    480M    21K         0     21K              0          0
datapool/fs4   2.47G      0         0       0              0          0
datapool/vol1  1.47G     1G         0     16K          1024M          0

Miscellaneous

Support for 2TB boot disks: Solaris 10 10/09 supports a disk Volume Table of Contents (VTOC) of up to 2TB in size. The previous maximum VTOC size was 1TB. On x86 systems you must be running Solaris with a 64-bit kernel and have at least 1GB of memory to use a VTOC larger than 1TB.
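A quick way to check whether an x86 system meets those requirements (standard commands; the exact wording of the output varies a bit by platform and release):
# isainfo -kv                            # confirms whether the 64-bit kernel is running
# prtconf | grep Memory                  # reports the installed physical memory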

pcitool: A new command for Solaris that can assign interrupts to specific threads or display the current interrupt routing. This command is available for both SPARC and x86.

New iSCSI initiator SMF service: svc:/network/iscsi/initiator:default is a new Service Management Facility (SMF) service to control discovery and enumeration of iSCSI devices early in the boot process. Other boot services that require iSCSI devices can add dependencies on it to ensure that the devices are available before they are needed.
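As a sketch of what such a dependency might look like (the service svc:/application/mydb and the property group name iscsi-devices are hypothetical; substitute your own service):
# svccfg -s svc:/application/mydb addpg iscsi-devices dependency
# svccfg -s svc:/application/mydb 'setprop iscsi-devices/grouping = astring: require_all'
# svccfg -s svc:/application/mydb 'setprop iscsi-devices/restart_on = astring: none'
# svccfg -s svc:/application/mydb 'setprop iscsi-devices/type = astring: service'
# svccfg -s svc:/application/mydb 'setprop iscsi-devices/entities = fmri: svc:/network/iscsi/initiator:default'
# svcadm refresh svc:/application/mydb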

Device Drivers

The following device drivers are either new to Solaris or have had some new features or chipsets added.
  • MPxIO support for the LSI 6180 Controller
  • LSI MPT 2.0 SAS 2.0 controllers (mpt_sas)
  • Broadcom NetXtreme II gigabit Ethernet (bcm5716c and bcm5716s) controllers
  • Interrupt remapping for Intel VT-d enabled platforms
  • Support for SATA AHCI tape
  • Sun StorageTek 6Gb/s SAS RAID controller and LSI MegaRAID 92xx (mr_sas)
  • Intel 82598 and 82599 10Gb/s PCIe Ethernet controllers

Open Source Software Updates

The following open source packages have been updated for Solaris 10 10/09.
  • NTP 4.2.5
  • PostgreSQL versions 8.1.17, 8.2.13 and 8.3.7
  • Samba 3.0.35

For more information

A complete list of new features and changes can be found in the Solaris 10 10/09 Release Notes and the What's New in Solaris 10 10/09 documentation at docs.sun.com.

Technocrati Tags:

Thursday May 21, 2009

Getting Rid of Pesky Live Upgrade Boot Environments

As we discussed earlier, Live Upgrade can solve most of the problems associated with patching and upgrading your Solaris system. I'm not quite ready to post the next installment in the LU series, but based on some of the comments and email I have received, there are two problems that I would like to help you work around.

Oh where oh where did that file system go ?

One thing you can do to stop Live Upgrade in its tracks is to remove a file system that it thinks another boot environment needs. This does fall into the category of user error, but you are more likely to run into it in a ZFS world, where file systems can be created and destroyed with great ease. You will also run into a variant of this if you change your zone configurations without recreating your boot environment, but I'll save that for a later day.

Here is our simple test case:
  1. Create a ZFS file system.
  2. Create a new boot environment.
  3. Delete the ZFS file system.
  4. Watch Live Upgrade fail.

# zfs create arrakis/temp

# lucreate -n test
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment <s10u7-baseline> file systems with the
file system(s) you specified for the new boot environment. Determining
which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <test>.
Source boot environment is <s10u7-baseline>.
Creating boot environment <test>.
Cloning file systems from boot environment <s10u7-baseline> to create boot environment <test>.
Creating snapshot for <rpool/ROOT/s10u7-baseline> on <rpool/ROOT/s10u7-baseline@test>.
Creating clone for <rpool/ROOT/s10u7-baseline@test> on <rpool/ROOT/test>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/test>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <s10u6_baseline> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <test> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv114> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <route66> as <mount-point>//boot/grub/menu.lst.prev.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <nv95> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <test> in GRUB menu
Population of boot environment <test> successful.
Creation of boot environment <test> successful.

# zfs destroy arrakis/temp

# luupgrade -t -s /export/patches/10_x86_Recommended-2009-05-14  -O "-d" -n test
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
The media contains 143 software patches that can be added.
All 143 patches will be added because you did not specify any specific patches to add.
Mounting the BE <test>.
ERROR: Read-only file system: cannot create mount point </.alt.tmp.b-59c.mnt/arrakis/temp>
ERROR: failed to create mount point </.alt.tmp.b-59c.mnt/arrakis/temp> for file system </arrakis/temp>
ERROR: unmounting partially mounted boot environment file systems
ERROR: cannot mount boot environment by icf file </etc/lu/ICF.5>
ERROR: Unable to mount ABE <test>: cannot complete lumk_iconf
Adding patches to the BE <test>.
Validating patches...

Loading patches installed on the system...

Cannot check name /a/var/sadm/pkg.
Unmounting the BE <test>.
The patch add to the BE <test> failed (with result code <1>).
The proper Live Upgrade solution to this problem would be to destroy and recreate the boot environment, or just recreate the missing file system (I'm sure that most of you have figured the latter part out on your own). The rationale is that the alternate boot environment no longer matches the storage configuration of its source. That was fine in a UFS world, but it is perhaps a bit constraining when ZFS rules the landscape. What if you really wanted the file system to be gone forever?
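Either of those sanctioned fixes is a one- or two-liner, using the names from this example:
# zfs create arrakis/temp                # put the missing file system back
or, alternatively
# ludelete test
# lucreate -n test                       # throw the stale boot environment away and rebuild it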

With a little more understanding of the internals of Live Upgrade, we can fix this rather easily.

Important note: We are about to modify undocumented Live Upgrade configuration files. The formats, names, and contents are subject to change without notice and any errors made while doing this can render your Live Upgrade configuration unusable.

The file system configurations for each boot environment are kept in a set of Internal Configuration Files (ICF) in /etc/lu named ICF.n, where n is the boot environment number. From the error message above we see that /etc/lu/ICF.5 is the one that is causing the problem. Let's take a look.
# cat /etc/lu/ICF.5
test:-:/dev/dsk/c5d0s1:swap:4225095
test:-:/dev/zvol/dsk/rpool/swap:swap:8435712
test:/:rpool/ROOT/test:zfs:0
test:/archives:/dev/dsk/c1t0d0s2:ufs:327645675
test:/arrakis:arrakis:zfs:0
test:/arrakis/misc:arrakis/misc:zfs:0
test:/arrakis/misc2:arrakis/misc2:zfs:0
test:/arrakis/stuff:arrakis/stuff:zfs:0
test:/arrakis/temp:arrakis/temp:zfs:0
test:/audio:arrakis/audio:zfs:0
test:/backups:arrakis/backups:zfs:0
test:/export:arrakis/export:zfs:0
test:/export/home:arrakis/home:zfs:0
test:/export/iso:arrakis/iso:zfs:0
test:/export/linux:arrakis/linux:zfs:0
test:/rpool:rpool:zfs:0
test:/rpool/ROOT:rpool/ROOT:zfs:0
test:/usr/local:arrakis/local:zfs:0
test:/vbox:arrakis/vbox:zfs:0
test:/vbox/fedora8:arrakis/vbox/fedora8:zfs:0
test:/video:arrakis/video:zfs:0
test:/workshop:arrakis/workshop:zfs:0
test:/xp:/dev/dsk/c2d0s7:ufs:70396830
test:/xvm:arrakis/xvm:zfs:0
test:/xvm/fedora8:arrakis/xvm/fedora8:zfs:0
test:/xvm/newfs:arrakis/xvm/newfs:zfs:0
test:/xvm/nv113:arrakis/xvm/nv113:zfs:0
test:/xvm/opensolaris:arrakis/xvm/opensolaris:zfs:0
test:/xvm/s10u5:arrakis/xvm/s10u5:zfs:0
test:/xvm/ub710:arrakis/xvm/ub710:zfs:0
The first step is to clean up the mess left by the failed luupgrade attempt. At the very least we will need to unmount the alternate boot environment root. It is also very likely that we will have to unmount a few temporary directories, such as /tmp and /var/run. Since this is ZFS, we will also have to remove the directories created when these file systems were mounted.
# df -k | tail -3
rpool/ROOT/test      49545216 6879597 7546183    48%    /.alt.tmp.b-Fx.mnt
swap                 4695136       0 4695136     0%    /a/var/run
swap                 4695136       0 4695136     0%    /a/tmp

# luumount test
# umount /a/var/run
# umount /a/tmp
# rmdir /a/var/run /a/var /a/tmp

Next we need to remove the missing file system entry from the current copy of the ICF file. Use whatever method you prefer (vi, perl, grep). Once we have corrected our local copy of the ICF file we must propagate it to the alternate boot environment we are about to patch. You can skip the propagation if you are going to delete the boot environment without doing any other maintenance activities. The normal Live Upgrade operations will take care of propagating the ICF files to the other boot environments, so we should not have to worry about them at this time.
# mv /etc/lu/ICF.5 /tmp/ICF.5
# grep -v arrakis/temp /tmp/ICF.5 > /etc/lu/ICF.5 
# cp /etc/lu/ICF.5 `lumount test`/etc/lu/ICF.5
# luumount test
At this point we should be good to go. Let's try the luupgrade again.
# luupgrade -t -n test -O "-d" -s /export/patches/10_x86_Recommended-2009-05-14
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
Validating the contents of the media </export/patches/10_x86_Recommended-2009-05-14>.
The media contains 143 software patches that can be added.
All 143 patches will be added because you did not specify any specific patches to add.
Mounting the BE <test>.
Adding patches to the BE <test>.
Validating patches...

Loading patches installed on the system...

Done!

Loading patches requested to install.

Approved patches will be installed in this order:

118668-19 118669-19 119214-19 123591-10 123896-10 125556-03 139100-02


Checking installed patches...
Verifying sufficient filesystem capacity (dry run method)...
Installing patch packages...

Patch 118668-19 has been successfully installed.
Patch 118669-19 has been successfully installed.
Patch 119214-19 has been successfully installed.
Patch 123591-10 has been successfully installed.
Patch 123896-10 has been successfully installed.
Patch 125556-03 has been successfully installed.
Patch 139100-02 has been successfully installed.

Unmounting the BE <test>.
The patch add to the BE <test> completed.
Now that the alternate boot environment has been patched, we can activate it at our convenience.

I keep deleting and deleting and still can't get rid of those pesky boot environments

This is an interesting corner case where the Live Upgrade configuration files get so scrambled that even simple tasks like deleting a boot environment are not possible. Every time I have gotten myself into this situation I can trace it back to some ill-advised shortcut that seemed harmless at the time, but I won't rule out bugs and environmental factors as possible causes.

Here is our simple test case: turn our boot environment from the previous example into a zombie - something that is neither alive nor dead but just takes up space and causes a mild annoyance.

Important note: Don't try this on a production system. This is for demonstration purposes only.
# dd if=/dev/random of=/etc/lu/ICF.5 bs=2048 count=2
0+2 records in
0+2 records out

# ludelete -f test
System has findroot enabled GRUB
No entry for BE <test> in GRUB menu
ERROR: The mount point </.alt.tmp.b-fxc.mnt> is not a valid ABE mount point (no /etc directory found).
ERROR: The mount point </.alt.tmp.b-fxc.mnt> provided by the <-m> option is not a valid ABE mount point.
Usage: lurootspec [-l error_log] [-o outfile] [-m mntpt]
ERROR: Cannot determine root specification for BE <test>.
ERROR: boot environment <test> is not mounted
Unable to delete boot environment.
Our first task is to make sure that any partially mounted boot environment is cleaned up. A df should help us here.
# df -k | tail -5
arrakis/xvm/opensolaris 350945280      19 17448377     1%    /xvm/opensolaris
arrakis/xvm/s10u5    350945280      19 17448377     1%    /xvm/s10u5
arrakis/xvm/ub710    350945280      19 17448377     1%    /xvm/ub710
swap                 4549680       0 4549680     0%    /.alt.tmp.b-fxc.mnt/var/run
swap                 4549680       0 4549680     0%    /.alt.tmp.b-fxc.mnt/tmp


# umount /.alt.tmp.b-fxc.mnt/tmp
# umount /.alt.tmp.b-fxc.mnt/var/run
Ordinarily you would use lufslist(1M) to try to determine which file systems are in use by the boot environment you are trying to delete. In this worst-case scenario that is not possible. A bit of forensic investigation and a bit more courage will help us figure this out.

The first place we will look is /etc/lutab. This is the configuration file that lists all boot environments known to Live Upgrade. There is a man page for this in section 4, so it is somewhat of a public interface, but please take note of the warning:
 
        The lutab file must not be edited by hand. Any user  modifi-
        cation  to  this file will result in the incorrect operation
        of the Live Upgrade feature.
This is very good advice, and failing to follow it has led to some of my most spectacular Live Upgrade meltdowns. But in this case Live Upgrade is already broken, and it may be possible to undo the damage and restore proper operation. So let's see what we can find out.
# cat /etc/lutab
# DO NOT EDIT THIS FILE BY HAND. This file is not a public interface.
# The format and contents of this file are subject to change.
# Any user modification to this file may result in the incorrect
# operation of Live Upgrade.
3:s10u5_baseline:C:0
3:/:/dev/dsk/c2d0s0:1
3:boot-device:/dev/dsk/c2d0s0:2
1:s10u5_lu:C:0
1:/:/dev/dsk/c5d0s0:1
1:boot-device:/dev/dsk/c5d0s0:2
2:s10u6_ufs:C:0
2:/:/dev/dsk/c4d0s0:1
2:boot-device:/dev/dsk/c4d0s0:2
4:s10u6_baseline:C:0
4:/:rpool/ROOT/s10u6_baseline:1
4:boot-device:/dev/dsk/c4d0s3:2
10:route66:C:0
10:/:rpool/ROOT/route66:1
10:boot-device:/dev/dsk/c4d0s3:2
11:nv95:C:0
11:/:rpool/ROOT/nv95:1
11:boot-device:/dev/dsk/c4d0s3:2
6:s10u7-baseline:C:0
6:/:rpool/ROOT/s10u7-baseline:1
6:boot-device:/dev/dsk/c4d0s3:2
7:nv114:C:0
7:/:rpool/ROOT/nv114:1
7:boot-device:/dev/dsk/c4d0s3:2
5:test:C:0
5:/:rpool/ROOT/test:1
5:boot-device:/dev/dsk/c4d0s3:2
We can see that the boot environment named test is (still) BE #5 and has its root file system at rpool/ROOT/test. This is the default dataset name and indicates that the boot environment has not been renamed. Consider the following example for a more complicated configuration.
# lucreate -n scooby
# lufslist scooby | grep ROOT
rpool/ROOT/scooby       zfs            241152 /                   -
rpool/ROOT              zfs       39284664832 /rpool/ROOT         -

# lurename -e scooby -n doo
# lufslist doo | grep ROOT
rpool/ROOT/scooby       zfs            241152 /                   -
rpool/ROOT              zfs       39284664832 /rpool/ROOT         -
The point is that we have to trust the contents of /etc/lutab, but it does not hurt to do a bit of sanity checking before we start deleting ZFS datasets. To remove boot environment test from the view of Live Upgrade, delete the three lines in /etc/lutab starting with 5 (in this example). We should also remove its Internal Configuration File (ICF), /etc/lu/ICF.5.
# mv -f /etc/lutab /etc/lutab.old
# grep -v \^5: /etc/lutab.old > /etc/lutab
# rm -f /etc/lu/ICF.5

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
s10u5_baseline             yes      no     no        yes    -         
s10u5_lu                   yes      no     no        yes    -         
s10u6_ufs                  yes      no     no        yes    -         
s10u6_baseline             yes      no     no        yes    -         
route66                    yes      no     no        yes    -         
nv95                       yes      yes    yes       no     -         
s10u7-baseline             yes      no     no        yes    -         
nv114                      yes      no     no        yes    -         
If the boot environment being deleted is in UFS then we are done. Well, not exactly - but pretty close. We still need to propagate the updated configuration files to the remaining boot environments. This will be done during the next live upgrade operation (lucreate, lumake, ludelete, luactivate) and I would recommend that you let Live Upgrade handle this part. The exception to this will be if you boot directly into another boot environment without activating it first. This isn't a recommended practice and has been the source of some of my most frustrating mistakes.

If the exorcised boot environment is in ZFS then we still have a little bit of work to do. We need to delete the old root datasets and any snapshots that they may have been cloned from. In our example the root dataset was rpool/ROOT/test. We need to look for any children as well as the originating snapshot, if present.
# zfs list -r rpool/ROOT/test
NAME                  USED  AVAIL  REFER  MOUNTPOINT
rpool/ROOT/test       234K  6.47G  8.79G  /.alt.test
rpool/ROOT/test/var    18K  6.47G    18K  /.alt.test/var

# zfs get -r origin rpool/ROOT/test
NAME             PROPERTY  VALUE                 SOURCE
rpool/ROOT/test  origin    rpool/ROOT/nv95@test  -
rpool/ROOT/test/var  origin    rpool/ROOT/nv95/var@test    -
# zfs destroy rpool/ROOT/test/var
# zfs destroy rpool/ROOT/nv95/var@test
# zfs destroy rpool/ROOT/test
# zfs destroy rpool/ROOT/nv95@test
Important note: luactivate will promote the newly activated root dataset so that snapshots used to create alternate boot environments should be easy to delete. If you are switching between boot environments without activating them first (which I have already warned you about doing), you may have to manually promote a different dataset so that the snapshots can be deleted.
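As a sketch of what that manual promotion looks like (the dataset names here are hypothetical - assume nv95 was built as a clone of route66@nv95 and that route66 is the boot environment being discarded):
# zfs promote rpool/ROOT/nv95            # nv95 becomes independent; route66@nv95 migrates to nv95@nv95
# zfs destroy rpool/ROOT/route66         # the old root dataset can now be removed
# zfs destroy rpool/ROOT/nv95@nv95       # along with the migrated snapshot, once nothing depends on it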

To BE or not to BE - how about no BE ?

You may find yourself in a situation where you have things so scrambled up that you want to start all over again. We can use what we have just learned to unwind Live Upgrade and start from a clean configuration. Specifically we want to delete /etc/lutab, the ICF and related files, all of the temporary files in /etc/lu/tmp and a few files that hold environment variables for some of the lu scripts. And if using ZFS we will also have to delete any datasets and snapshots that are no longer needed.
 
# rm -f /etc/lutab 
# rm -f /etc/lu/ICF.* /etc/lu/INODE.* /etc/lu/vtoc.*
# rm -f /etc/lu/.??*
# rm -f /etc/lu/tmp/* 

# lustatus
ERROR: No boot environments are configured on this system
ERROR: cannot determine list of all boot environment names

# lucreate -c scooby -n doo
Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
Current boot environment is named <scooby>.
Creating initial configuration for primary boot environment <scooby>.
The device </dev/dsk/c4d0s3> is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name <scooby> PBE Boot Device </dev/dsk/c4d0s3>.
Comparing source boot environment <scooby> file systems with the file 
system(s) you specified for the new boot environment. Determining which 
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <doo>.
Source boot environment is <scooby>.
Creating boot environment <doo>.
Cloning file systems from boot environment <scooby> to create boot environment <doo>.
Creating snapshot for <rpool/ROOT/scooby> on <rpool/ROOT/scooby@doo>.
Creating clone for <rpool/ROOT/scooby@doo> on <rpool/ROOT/doo>.
Setting canmount=noauto for </> in zone <global> on <rpool/ROOT/doo>.
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <doo> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE <doo> in GRUB menu
Population of boot environment <doo> successful.
Creation of boot environment <doo> successful.

# luactivate doo
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <scooby>

File  deletion successful
File  deletion successful
File  deletion successful
Activation of boot environment <doo> successful.

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
scooby                     yes      yes    no        no     -         
doo                        yes      no     yes       no     -        
Pretty cool, eh ?

There are still a few more interesting corner cases, but we will deal with those in one of the next articles. In the meantime, please remember to
  • Check Infodoc 206844 for Live Upgrade patch requirements
  • Keep your patching and package utilities updated
  • Use luactivate to switch between boot environments


Technocrati Tags:

Tuesday Mar 24, 2009

Adobe releases an x86 version of Acroread 9.1 for Solaris

Great Googly Moogly!!! Our friends at Adobe have finally released a new x86 version of Acroread for Solaris. Download Acroread 9.1 from Adobe.com and say goodbye to evince, xpdf, and the especially interesting Acroread out of the Linux branded zone trick.
Click image to enlarge 

Sunday Mar 22, 2009

Dr. Live Upgrade - Or How I Learned to Stop Worrying and Love Solaris Patching

Who loves to patch or upgrade a system ?

That's right, nobody. Or if you do, perhaps we should start a local support group to help you come to terms with this unusual fascination. Patching, and to a lesser extent upgrading (which can be thought of as patches delivered more efficiently through package replacement), is the most common complaint that I hear when meeting with system administrators and their management.

Most of the difficulties seem to fit into one of the following categories.
  • Analysis: What patches need to be applied to my system ?
  • Effort: What do I have to do to perform the required maintenance ?
  • Outage: How long will the system be down to perform the maintenance ?
  • Recovery: What happens when something goes wrong ?
And if a single system gives you a headache, adding a few containers into the mix will bring on a full migraine. And without some relief you may be left with the impression that containers aren't worth the effort. That's unfortunate, because containers don't have to be troublesome and patching doesn't have to be hard. But it does take getting to know one of the most important and sadly least used features in Solaris: Live Upgrade.

Before we look at Live Upgrade, let's start with a definition. A boot environment is the set of all file systems and devices that are unique to an instance of Solaris on a system. If you have several boot environments then some data will be shared (applications not installed via SVR4 packages, data, local home directories) and some will be exclusive to one boot environment. Without making this more complicated than it needs to be, a boot environment is generally your root (including /usr and /etc), /var (frequently split out as a separate file system), and /opt. Swap may or may not be part of a boot environment - it is your choice. I prefer to share swap, but there are some operational situations where this may not be feasible. There may be additional items, but generally everything else is shared. Network mounted file systems and removable media are assumed to be shared.

With this definition behind us, let's proceed.

Analysis: What patches need to be applied to my system ?

For all of the assistance that Live Upgrade offers, it doesn't do anything to help with the analysis phase. Fortunately there are plenty of tools that can help with this phase. Some of them work nicely with Live Upgrade, others take a bit more effort.

smpatch(1M) has an analyze capability that can determine which patches need to be applied to your system. It will get a list of patches from an update server, most likely one at Sun, and match up the dependencies and requirements with your system. smpatch can be used to download these patches for future application or it can apply them for you. smpatch works nicely with Live Upgrade, so from a single command you can upgrade an alternate boot environment. With containers!
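As a sketch of how I typically wire those two together (the boot environment name s10patch is hypothetical, and the download directory shown is the usual default - smpatch get will report the patchpro.download.directory setting on your system):
# smpatch analyze                                     # what does this system need ?
# smpatch download -i 118668-19 -i 119214-19          # fetch a couple of them (IDs are just examples)
# luupgrade -t -n s10patch -s /var/sadm/spool 118668-19 119214-19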

The Sun Update Manager is a simple to use graphical front end for smpatch. It gives you a little more flexibility during the inspection phase by allowing you to look at individual patch README files. It is also much easier to see what collection a patch belongs to (recommended, security, none) and if the application of that patch will require a reboot. For all of that additional flexibility you lose the integration with Live Upgrade. Not for lack of trying, but I have not found a good way to make Update Manager and Live Upgrade play together.

Sun xVM Ops Center has a much more sophisticated patch analysis system that uses additional knowledge engines beyond those used by smpatch and Update Manager. The result is a higher quality patch bundle tailored for each individual system, automated deployment of the patch bundle, detailed auditing of what was done and simple backout should problems occur. And it basically does the same for Windows and Linux. It is this last feature that makes things interesting. Neither Windows nor Linux have anything like Live Upgrade and the least common denominator approach of Ops Center in its current state means that it doesn't work with Live Upgrade. Fortunately this will change in the not too distant future, and when it does I will be shouting about this feature from rooftops (OK, what I really mean is I'll post a blog and a tweet about it). If I can coax Ops Center into doing the analysis and download pieces then I can manually bolt it onto Live Upgrade for a best of both worlds solution.

These are our offerings, and there are others. Some of them are quite good and in use in many places. Patch Check Advanced (PCA) is one of the more common tools in use. It operates on a patch dependency cross-reference file and does a good job with the dependency analysis (this is obsoleted by that, etc.). It can be used to maintain an alternate boot environment, and in simple cases that would be fine. If the alternate boot environment contains any containers then I would use Live Upgrade's luupgrade instead of PCA's patchadd -R approach. If I were familiar with PCA then I would still use it for the analysis and download features and just let luupgrade apply the patches, as sketched below. You might have to uncompress the patches downloaded by PCA before handing them over to luupgrade, but that is a minor implementation detail.
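For example, the hand-off might look something like this (the patch directory and boot environment name are hypothetical, and whether you need the unzip step depends on the format in which PCA delivered the patches):
# cd /var/tmp/patches
# for p in *.zip; do unzip -q $p; done                # unpack the downloaded patches if needed
# luupgrade -t -n test -s /var/tmp/patches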

In summary, use an analysis tool appropriate to the task (based on familiarity, budget and complexity) to figure out what patches are needed. Then use Live Upgrade (luupgrade) to deploy the desired patches.

Effort: What does it take to perform the required maintenance ?

This is a big topic and I could write pages on the subject. Even if I use an analysis tool like smpatch or PCA to save me hours of trolling through READMEs and drawing dependency graphs, there is still a lot of work to do in order to survive the ordeal of applying patches. Some of the more common techniques include the following.
Backing up your boot environment.
I should not have to mention this, but there are some operational considerations unique to system maintenance. Even though the chance is tiny, you are more likely to render your system non-bootable during system maintenance than during any other operational task. Even with mature processes, human factors can come into play and bad things can happen (oops - that was my fallback boot environment that I just ran newfs(1M) on).

This is why automation and time-tested scripting become so important. Should you do the unthinkable and render a system nonfunctional, rapid restoration of the boot environment is important. And getting it back to the last known good state is just as important. A fresh backup that can be restored by utilities from install media or a jumpstart miniroot is a very good idea. Flash archives (see flarcreate(1M)) are even better, although complications with containers make this less interesting now than in previous releases of Solaris. How many of you take a backup before applying patches ? Probably about the same number as replace the batteries in your RAID controllers or change out your UPS systems after their expiration date.

Split Mirrors
One interesting technique is to split mirrors instead of taking backups. Of course this only works if you mirror your boot environment (a recommended practice for systems with adequate disk space). Break your mirror, apply patches to the non-running half, cut over to the updated boot environment during the next maintenance window and see how it goes. At first glance this seems like a good idea, but there are two catches.
  1. Do you synchronize dynamic boot environment elements ? Things like /etc/passwd, /etc/shadow, /var/adm/messages, print and mail queues are constantly changing. It is possible that these have changed between the mirror split and subsequent activation.
  2. How long are you willing to run without your boot environment being mirrored ? This may cause you to certify the new boot environment too quickly. You want to reestablish your mirror, but if that is your fallback in case of trouble you have a conundrum. And if you are the sort that seems to have a black cloud following you through life, you will discover a problem shortly after you start the mirror resync.
Pez disks ?
OK, the mirror split thing can be solved by swinging in another disk. Operationally a bit more complex and you have at least one disk that you can't use for other purposes (like hosting a few containers), but it can be done. I wouldn't do it (mainly because I know where this story is heading) but many of you do.
Better living through Live Upgrade
Everything we do to try to make it better adds complexity, or another hundred lines of scripting. It doesn't need to be this way, and if you become one with the LU commands it won't be for you either. Live Upgrade will take care of building and updating multiple boot environments. It will check to make sure the disks being used are bootable and not part of another boot environment. It works with the Solaris Volume Manager, Veritas encapsulated root devices, and starting with Solaris 10 10/08 (update 6), ZFS. It also takes care of the synchronization problem. Starting with Solaris 10 8/07 (update 4), Live Upgrade also works with containers, both native and branded (and with Solaris 10 10/08 your zone roots can be in a ZFS pool).
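To make that concrete, the basic rhythm looks something like this (a minimal sketch - the boot environment name and media path are examples, and the final step should be an init or shutdown rather than a plain reboot so that the synchronization scripts get a chance to run):
# lucreate -n s10u8                                   # build an alternate boot environment
# luupgrade -u -n s10u8 -s /cdrom/sol_10_1008_x86     # upgrade it from install media (or -t to patch it)
# luactivate s10u8                                    # mark it as the next boot environment
# init 6                                              # restart when it suits you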

Outage: How long will my system be down for the maintenance?

Or perhaps more to the point, how long will my applications be unavailable ? The proper reply is: it depends on how big the patch bundle is and how many containers you have. And if a kernel patch is involved, double or triple your estimate. This can be a big problem and may cause you to take short cuts, like installing only some patches now and others later when it is more convenient. Our good friend Bart Smaalders has a nice discussion on the implications of this approach and what we are doing in OpenSolaris to solve it. That solution will eventually work its way into the Next Solaris, but in the meantime we have a problem to solve.

There is a large set (not really large, but more than one) of patches that require a quiescent system to be properly applied. An example would be a kernel patch that causes a change to libc. It is sort of hard to rip out libc on a running system (new processes get the new libc and may have issues with the running kernel; old processes get the old libc and tend to be fine, until they do a fork(2) and exec(2)). So we developed a brilliant solution to this problem - deferred activation patching. If you apply one of these troublesome patches then we will throw it in a queue to be applied the next time the system is quiesced (a fancy term for the next time we're in single user mode). This solves the current system stability concerns but may make the next reboot take a bit longer. And if you forgot you have deferred patches in your queue, don't get anxious and interrupt the shutdown or next boot. Grab a noncaffeinated beverage and put some Bobby McFerrin on your iPod. Don't Worry, Be Happy.

So deferred activation patching seems like a good way to deal with the situation where everything goes well. And some brilliant engineers are working on applying patches in parallel (where applicable), which will make this even better. But what happens when things go wrong ? This is when you realize that patchrm(1M) is not your friend. It has never been your friend, nor will it ever be. I have an almost paralyzing fear of dentists, but would rather visit one than start down a path where patchrm is involved. Well-tested tools and some automation can reduce this to simple anxiety, but if I could eliminate patchrm altogether I would be much happier.

For all that Live Upgrade can do to ease system maintenance, it is in the areas of outage and recovery that it really shines. And when speaking about Solaris, either in training or evangelism events, this is why I urge attendees to drop whatever they are doing and adopt Live Upgrade immediately.

Since Live Upgrade (lucreate, lumake, luupgrade) operates on an alternate boot environment, the currently running set of applications is not affected. The system stays up, applications stay running and nothing is changing underneath them, so there is no cause for concern. The only impact is some additional load from the Live Upgrade operations. If that is a concern, run Live Upgrade in a project and cap the resource consumption of that project.
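A sketch of one way to do that (the project name, cap value, patch directory and boot environment name are arbitrary examples, and project.cpu-cap requires a Solaris 10 release that includes CPU caps):
# projadd -K "project.cpu-cap=(privileged,100,deny)" lumaint     # cap the project at the equivalent of one CPU
# newtask -p lumaint luupgrade -t -n test -s /var/tmp/patches    # run the upgrade inside that project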

An interesting implication of Live Upgrade is that the system no longer has to remain in a sane operational state at every intermediate step - all that matters is the end state. This gives us more freedom to apply patches in a more efficient fashion than would be possible on a running boot environment. This is especially noticeable on a system with containers. The time that the upgrade takes is significantly reduced, and all the while applications keep running. No more deferred activation patches, no more single user mode patching. And if all goes poorly after activating the new boot environment, you still have your old one to fall back on. Cue Bobby McFerrin for another round of "Don't Worry, Be Happy".

This brings up another feature of Live Upgrade - the synchronization of system files that change between the time a boot environment is created and the time it is activated. After a boot environment is activated, a synchronization process is queued as a K0 script to be run during shutdown. Live Upgrade will catch a lot of private files that we know about and the obvious public ones (/etc/passwd, /etc/shadow, /var/adm/messages, mail queues). It also provides a place (/etc/lu/synclist) for you to include things we might not have thought about or that are unique to your applications.
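For example, synclist entries are just a pathname followed by an action keyword (OVERWRITE, APPEND or PREPEND are the ones I have seen documented; check the comments in the file on your release). The application files below are hypothetical:
/etc/myapp/myapp.conf    OVERWRITE
/var/myapp/activity.log  APPEND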

When using Live Upgrade, applications are only unavailable for the amount of time it takes to shut down the system (including the synchronization process) and boot the new boot environment. This may include some minor SMF manifest importing, but that should not add much to the new boot time. You only have to complete the restart during a maintenance window, not the entire upgrade. While vampires are all the rage for teenagers these days, system administrators can now come out into the light and work regular hours.

Recovery: What happens when something goes wrong?

This is when you will fully appreciate Live Upgrade. After activation of a new boot environment, now called the Primary Boot Environment (PBE), your old boot environment, now called an Alternate Boot Environment (ABE), can still be called upon in case of trouble. Just activate it and shut down the system. Applications will be down for a short period (the K0 sync and the subsequent start up), but there will be no more wringing of hands, reaching for beverages with too much caffeine and vitamin B12, or trying to remember where you kept your bottle of Tums. Cue Bobby McFerrin one more time and "Don't Worry, Be Happy". You will be back to your previous operational state in a matter of a few minutes (longer if you have a large server with many disks). Then you can mount up your ABE and troll through the logs trying to determine what went wrong. If you have a service contract then we will troll through the logs with you.

I neglected to mention earlier that the disks that comprise boot environments can be mirrored, so there is no rush to certification. Everything can be mirrored, at all times. Which is a very good thing. You still need to back up your boot environments, but you will find yourself reaching for the backup media much less often when using Live Upgrade.

All that is left are a few simple examples of how to use Live Upgrade. I'll save that for next time.

Technocrati Tags:
About

Bob Netherton is a Principal Sales Consultant for the North American Commercial Hardware group, specializing in Solaris, Virtualization and Engineered Systems. Bob is also a contributing author of Solaris 10 Virtualization Essentials.

This blog contains information about all three, but is primarily focused on topics for Solaris system administrators.

Please follow me on Twitter or Facebook, or send me email.
