Thursday Dec 20, 2012

pkg fix is my friend - a followup

We bloggers appreciate questions and comments about what we post, whether privately in email or attached as comments to some article. In my last post, a reader asked a set of questions that were so good, I didn't want them to get lost down in the comments section. A big thanks to David Lange for asking these questions. I shall try to answer them here (perhaps with a bit more detail than you might have wanted).

Does the pkg fix reinstall binaries if the hash or chksum doesn't match?

Yes, it does. Let's actually see this in action, and then we will take a look at where it is getting the information required to correct the error.

Since I'm working on a series of Solaris 11 Automated Installer (AI) How To articles, installadm seems a good choice to damage, courtesy of the random number generator.

# ls /sbin/install*
/sbin/install             /sbin/installadm-convert  /sbin/installf
/sbin/installadm          /sbin/installboot         /sbin/installgrub

# cd /sbin
# mv installadm installadm-

# dd if=/dev/random of=/sbin/installadm bs=8192 count=32
0+32 records in
0+32 records out

# ls -la installadm*
-rw-r--r--   1 root     root       33280 Dec 18 18:50 installadm
-r-xr-xr-x   1 root     bin        12126 Dec 17 08:36 installadm-
-r-xr-xr-x   1 root     bin        74910 Dec 17 08:36 installadm-convert
OK, that should do it. Unless I am terribly unlucky, those random bytes will produce something that doesn't match the stored hash value of the installadm binary.

This time, I will begin the repair process with a pkg verify, just to see what is broken.

# pkg verify installadm
PACKAGE                                                                 STATUS 
pkg://solaris/install/installadm                                         ERROR

	file: usr/sbin/installadm
		Group: 'root (0)' should be 'bin (2)'
		Mode: 0644 should be 0555
		Size: 33280 bytes should be 12126
		Hash: 2e862c7ebd5dce82ffd1b30c666364f23e9118b5 
                     should be 68374d71b9cb91b458a49ec104f95438c9a149a7
For clarity, I have removed all of the compiled python module errors. Most of these have been corrected in Solaris 11.1, but you may still see them occasionally when doing a pkg verify.

Since we have a real package error, let's correct it.

# pkg fix installadm
Verifying: pkg://solaris/install/installadm                     ERROR          

	file: usr/sbin/installadm
		Group: 'root (0)' should be 'bin (2)'
		Mode: 0644 should be 0555
		Size: 33280 bytes should be 12126
		Hash: 2e862c7ebd5dce82ffd1b30c666364f23e9118b5 
                     should be 68374d71b9cb91b458a49ec104f95438c9a149a7
Created ZFS snapshot: 2012-12-19-00:51:00
Repairing: pkg://solaris/install/installadm                  
                                                                               

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1       24/24      0.1/0.1

PHASE                                        ACTIONS
Update Phase                                   24/24 

PHASE                                          ITEMS
Image State Update Phase                         2/2 
We can now run installadm as if it had never been damaged.
# installadm list

Service Name     Alias Of       Status  Arch   Image Path 
------------     --------       ------  ----   ---------- 
default-i386     solaris11-i386 on      x86    /install/solaris11-i386
solaris11-i386   -              on      x86    /install/solaris11-i386
solaris11u1-i386 -              on      x86    /install/solaris11u1-i386
Oh, if you are wondering about that hash, it is a SHA1 checksum.
# digest -a sha1 /usr/sbin/installadm
68374d71b9cb91b458a49ec104f95438c9a149a7

If so does IPS keep the installation binaries in a depot or have to point to the originating depot to fix the problem?

IPS does keep a local cache of package attributes. Before diving into the details, be aware that some, if not all, of these are private details of the current IPS implementation and can change in the future. Always consult the command and configuration file man pages before using any of these in scripts. In this case, the relevant information would be in pkg(5) (i.e. man -s 5 pkg).

Our first step is to identify which publisher has provided the package that is currently installed. In my case, there is only one (solaris), but in a large and mature enterprise deployment, there could be many publishers.

# pkg info installadm
          Name: install/installadm
       Summary: installadm utility
   Description: Automatic Installation Server Setup Tools
      Category: System/Administration and Configuration
         State: Installed
     Publisher: solaris
       Version: 0.5.11
 Build Release: 5.11
        Branch: 0.175.0.0.0.2.1482
Packaging Date: October 19, 2011 12:26:24 PM 
          Size: 1.04 MB
          FMRI: pkg://solaris/install/installadm@0.5.11,5.11-0.175.0.0.0.2.1482:20111019T122624Z
From this we have learned that the actual package name is install/installadm and the publisher is, in fact, solaris. We have also learned that this version of installadm comes from the original Solaris 11 GA release (5.11-0.175.0.0). That will allow us to go take a look at some of the configuration files (private interface warning still in effect).

Note: Since package names contain slashes (/), we will have to encode them as %2F to keep the shell from interpreting them as a directory delimiter.

# cd /var/pkg/publisher/solaris/pkg/install%2Finstalladm
# ls -la
drwxr-xr-x   2 root     root           4 Dec 18 00:55 .
drwxr-xr-x 818 root     root         818 Dec 17 08:36 ..
-rw-r--r--   1 root     root       25959 Dec 17 08:36
            0.5.11%2C5.11-0.175.0.0.0.2.1482%3A20111019T122624Z
-rw-r--r--   1 root     root       26171 Dec 18 00:55
            0.5.11%2C5.11-0.175.0.13.0.3.0%3A20121026T213106Z
The file 0.5.11%2C5.11-0.175.0.0.0.2.1482%3A20111019T122624Z is the one we are interested in.
# digest -a sha1 /usr/sbin/installadm
68374d71b9cb91b458a49ec104f95438c9a149a7

# grep 68374d71b9cb91b458a49ec104f95438c9a149a7 *
file 68374d71b9cb91b458a49ec104f95438c9a149a7
chash=a5c14d2f8cc854dbd4fa15c3121deca6fca64515 group=bin mode=0555 
owner=root path=usr/sbin/installadm pkg.csize=3194 pkg.size=12126

That's how IPS knows our version of installadm has been tampered with. Since the damage involves more than just changed file attributes, it has to download a new copy of the damaged files, in this case from the solaris publisher (or one of its mirrors). To keep from making things worse, it also takes a snapshot of the current boot environment, in case things go terribly wrong - which, in this case, they do not.
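If you are curious about that safety snapshot, it is an ordinary ZFS snapshot of the boot environment and can be inspected with the usual tools. A quick sketch (the dataset names will vary with your rpool layout):

# beadm list -s
# zfs list -t snapshot | grep 2012-12-19-00:51:00

Since the fix completed cleanly, there is nothing to roll back to; the snapshot simply sits there as insurance.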

Armed with this information, we can use some other IPS features, such as searching by binary hash.

# pkg search -r 68374d71b9cb91b458a49ec104f95438c9a149a7
INDEX                                    ACTION VALUE               PACKAGE
68374d71b9cb91b458a49ec104f95438c9a149a7 file   usr/sbin/installadm 
                 pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
... or by name
# pkg search -r installadm
INDEX       ACTION VALUE                      PACKAGE
basename    dir    usr/lib/installadm         pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
basename    dir    var/installadm             pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
basename    file   usr/sbin/installadm        pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
pkg.fmri    set    solaris/install/installadm pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
pkg.summary set    installadm utility         pkg:/install/installadm@0.5.11-0.175.0.0.0.2.1482
And finally...
# pkg contents -m installadm

..... lots of output truncated ......

file 68374d71b9cb91b458a49ec104f95438c9a149a7 chash=a5c14d2f8cc854dbd4fa15c3121deca6fca64515 
group=bin mode=0555 owner=root path=usr/sbin/installadm pkg.csize=3194 pkg.size=12126
There is our information, using a public and stable interface. Now you know not only where IPS caches the information, but also a predictable way to retrieve it, should you ever need to do so.

As with the verify and fix operations, this is much more helpful than the SVR4 packaging commands in Solaris 10 and earlier.

Given that customers might come up with their own ideas of keeping pkgs at various levels, could they be shooting themselves in the foot and creating such a customized OS that it causes problems?

Stephen Hahn has written quite a bit on the origins of IPS, both on his archived Sun blog and on the OpenSolaris pkg project page. While it is a fascinating and useful read, the short answer is that IPS helps prevent this from happening - certainly much more so than the previous packaging system did.

The assistance comes in several ways.

Full packages: Since IPS delivers full packages only, that eliminates one of the most confusing and frustrating aspects of the legacy Solaris packaging system. Every time you update a package with IPS, you get a complete version of the software, the way it was assembled and tested at Oracle (and presumably other publishers as well). No more patch order files and, perhaps more important, no more complicated scripts to automate the patching process.

Dependencies: A rich dependency mechanism allows the package maintainer to guarantee that other related software is at a compatible version. This includes incorporations, which protect large groups of software, such as the basic desktop, GNOME, auto-install and the userland tools. Although not a part of dependencies, facets allow for the control of optional software components - locales being a good example.
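As a small aside on facets, you can see and change them directly with pkg. The facet name below (facet.locale.de) is just an example locale facet, not something this article depends on:

# pkg facet
# pkg change-facet facet.locale.de=False

The second command would drop the German locale files from the image, and setting the facet back to True would bring them back.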

Boot environments: Solaris 10 system administrators can enjoy many of the benefits of IPS boot environment integration by using Live Upgrade and ZFS as a root file system. IPS takes this to the next level by automatically performing important operations, such as upgrading the pkg package when needed or taking a snapshot before performing any risky actions.
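If you want to see what IPS is doing with boot environments on your own system, a couple of commands are worth knowing (the BE name below is just an example):

# beadm list
# beadm create pre-update

beadm list shows every boot environment and which one is active; beadm create gives you a manual restore point before you try something adventurous.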

Expanding your question just a bit, IPS provides one new capability that should make updates much more predictable. If there is some specific component that an application requires, its version can be locked within a range. Here is an example, albeit a rather contrived one.

# pkg list -af jre-6
NAME (PUBLISHER)                                  VERSION                    IFO
runtime/java/jre-6                                1.6.0.37-0.175.1.2.0.3.0   ---
runtime/java/jre-6                                1.6.0.35-0.175.1.0.0.24.1  ---
runtime/java/jre-6                                1.6.0.35-0.175.0.11.0.4.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.10.0.2.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.9.0.2.0   ---
runtime/java/jre-6                                1.6.0.32-0.175.0.8.0.4.0   ---
runtime/java/jre-6                                1.6.0.0-0.175.0.0.0.2.0    i--
Suppose that we have an application that is tied to version 1.6.0.0 of the java runtime. You can lock it at that version and IPS will prevent you from applying any upgrade that would change it. In this example, an attempt to upgrade to SRU8 (which introduces version 1.6.0.32 of jre-6) will fail.
# pkg freeze -c "way cool demonstration of IPS" jre-6@1.6.0.0
runtime/java/jre-6 was frozen at 1.6.0.0

# pkg list -af jre-6
NAME (PUBLISHER)                                  VERSION                    IFO
runtime/java/jre-6                                1.6.0.37-0.175.1.2.0.3.0   ---
runtime/java/jre-6                                1.6.0.35-0.175.1.0.0.24.1  ---
runtime/java/jre-6                                1.6.0.35-0.175.0.11.0.4.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.10.0.2.0  ---
runtime/java/jre-6                                1.6.0.33-0.175.0.9.0.2.0   ---
runtime/java/jre-6                                1.6.0.32-0.175.0.8.0.4.0   ---
runtime/java/jre-6                                1.6.0.0-0.175.0.0.0.2.0    if-

# pkg update --be-name s11ga-sru08  entire@0.5.11-0.175.0.8
What follows is a lengthy set of complaints about not being able to satisfy all of the constraints, conveniently pointing back to our frozen package.

But wait, there's more. IPS can figure out the latest update it can apply that satisfies the frozen package constraint. In this example, it should find SRU7.

# pkg update --be-name s11ga-sru07
            Packages to update:  89
       Create boot environment: Yes
Create backup boot environment:  No

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                89/89   3909/3909  135.7/135.7

PHASE                                        ACTIONS
Removal Phase                                720/720 
Install Phase                                889/889 
Update Phase                               5066/5066 

PHASE                                          ITEMS
Package State Update Phase                   178/178 
Package Cache Update Phase                     89/89 
Image State Update Phase                         2/2 

A clone of solaris exists and has been updated and activated.
On the next boot the Boot Environment s11ga-sru07 will be
mounted on '/'.  Reboot when ready to switch to this updated BE.


---------------------------------------------------------------------------
NOTE: Please review release notes posted at:

http://www.oracle.com/pls/topic/lookup?ctx=E23824&id=SERNS
---------------------------------------------------------------------------
When the system is rebooted, a quick look shows that we are indeed running with SRU7.
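One quick way to confirm which SRU you are running is to look at the entire incorporation, which carries the release version (the exact version string will differ from system to system):

# pkg list entire
# pkg info entire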

Perhaps we were too restrictive in locking down jre-6 to version 1.6.0.0. In this example, we will loosen the constraint to any 1.6.0 version, but prohibit upgrades that change it to 1.6.1. Note that I did not have to unfreeze the package as a new pkg freeze will replace the preceding one.

# pkg freeze jre-6@1.6.0
runtime/java/jre-6 was frozen at 1.6.0

# pkg list -af jre-6
NAME (PUBLISHER)                                  VERSION                    IFO
runtime/java/jre-6                                1.6.0.37-0.175.1.2.0.3.0   -f-
runtime/java/jre-6                                1.6.0.35-0.175.1.0.0.24.1  -f-
runtime/java/jre-6                                1.6.0.35-0.175.0.11.0.4.0  -f-
runtime/java/jre-6                                1.6.0.33-0.175.0.10.0.2.0  -f-
runtime/java/jre-6                                1.6.0.33-0.175.0.9.0.2.0   -f-
runtime/java/jre-6                                1.6.0.32-0.175.0.8.0.4.0   -f-
runtime/java/jre-6                                1.6.0.0-0.175.0.0.0.2.0    if-
This shows that all versions are available for upgrade (i.e., they all satisfy the frozen package constraint).
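When the application no longer needs the constraint, removing it is just as simple. pkg freeze with no arguments lists the current freezes, and pkg unfreeze releases one:

# pkg freeze
# pkg unfreeze jre-6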

Once again, IPS gives us a wonderful capability that is missing in the legacy packaging system.

When you perform a pkg update on a system are we guaranteed a highly tested configuration that has gone thru multiple regression tests?

Short answer: yes.

For the details, I will turn your attention to our friend, Gerry Haskins, and his two excellent blogs: The Patch Corner (Solaris 10 and earlier) and Solaris 11 Maintenance Lifecycle. Both are well worth reading, and I encourage everybody to add them to their RSS reader of choice.

Of particular note is Gerry's presentation, Solaris 11 Customer Maintenance Lifecycle, which goes into some great detail about patches, upgrades and the like. If you dig back to around the time that Solaris 10 9/10 (u9) was released, you will find links to a pair of interesting documents titled Oracle Integrated Stack - Complete, Trusted Enterprise Solutions and Trust Your Enterprise Deployments to the Oracle Product Stack: The integrated platform that's been developed, tested and certified to get the job done. These documents describe several test environments, including the Oracle Certification Environment (OCE) and Oracle Automated Stress Test (OAST). All Solaris 10 patches and Solaris 11 package updates (including Oracle Solaris Cluster) are put through these tests prior to release. The result is higher confidence that patches will not introduce stability or performance problems, negating the old practice of putting a release or patch bundle on the shelf while somebody else finds all of the problems. Local testing on your own equipment is still a necessary practice, but you are able to move more quickly to a new release thanks to these additional testing environments.

If I am allowed to ask a follow up question, it would be something like, "what can I do proactively to keep my system as current as possible and reduce the risks of bad patch or package interactions?"

That is where the Critical Patch Updates come into play. Solaris 11 Support Repository Updates (SRU) come out approximately once per month. Every third one (generally) is special and becomes the CPU for Solaris. If you have a regular cadence for applying CPUs or Patch Set Updates (PSU) for your other Oracle software, choose the corresponding SRU that has been designated as that quarter's CPU. You can find this information in My Oracle Support (MOS), on the Oracle Technology Network (OTN), or just read Gerry's blog in mid January, April, July and October.
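Mechanically, applying the quarterly CPU is the same pkg update we used earlier, constrained to the entire incorporation that corresponds to the designated SRU. The SRU version and BE name below are placeholders - check MOS for the one marked as the current CPU:

# pkg update --be-name s11-cpu entire@0.5.11-0.175.0.12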

Thanks again to David Lange for asking such good questions. I hope the answers helped.

Tuesday Dec 11, 2012

Solaris 11 pkg fix is my new friend

While putting together some examples of the Solaris 11 Automated Installer (AI), I managed to really mess up my system, to the point where AI was completely unusable. This was my fault: a combination of unfortunate incidents left some remnants that were causing problems, so I tried to clean things up. Unsuccessfully. Perhaps that was a bad idea (OK, it was a terrible idea), but this is Solaris 11 and there are a few more tricks in the sysadmin toolbox.

Here's what I did.

# rm -rf /install/*
# rm -rf /var/ai

# installadm create-service -n solaris11-x86 --imagepath /install/solaris11-x86 \
                 -s solaris-auto-install@5.11-0.175.0

Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

Creating service from: solaris-auto-install@5.11-0.175.0
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1       130/130  264.4/264.4    0B/s

PHASE                                          ITEMS
Installing new actions                       284/284
Updating package state database                 Done 
Updating image state                            Done 
Creating fast lookup database                   Done 
Reading search index                            Done 
Updating search index                            1/1 

Creating i386 service: solaris11-x86

Image path: /install/solaris11-x86
So far so good. Then comes an oops.....
setup-service[168]: cd: /var/ai//service/.conf-templ: [No such file or directory]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is where you generally say a few things to yourself, and then promise to quit deleting configuration files and directories when you don't know what you are doing. Then you recall that the new Solaris 11 packaging system has some ability to correct common mistakes (like the one I just made). Let's give it a try.
# pkg fix installadm
Verifying: pkg://solaris/install/installadm                     ERROR
        dir: var/ai
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/ai-webserver
                Missing: directory does not exist
        dir: var/ai/ai-webserver/compatibility-configuration
                Missing: directory does not exist
        dir: var/ai/ai-webserver/conf.d
                Missing: directory does not exist
        dir: var/ai/image-server
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/image-server/cgi-bin
                Missing: directory does not exist
        dir: var/ai/image-server/images
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/image-server/logs
                Missing: directory does not exist
        dir: var/ai/profile
                Missing: directory does not exist
        dir: var/ai/service
                Group: 'root (0)' should be 'sys (3)'
        dir: var/ai/service/.conf-templ
                Missing: directory does not exist
        dir: var/ai/service/.conf-templ/AI_data
                Missing: directory does not exist
        dir: var/ai/service/.conf-templ/AI_files
                Missing: directory does not exist
        file: var/ai/ai-webserver/ai-httpd-templ.conf
                Missing: regular file does not exist
        file: var/ai/service/.conf-templ/AI.db
                Missing: regular file does not exist
        file: var/ai/image-server/cgi-bin/cgi_get_manifest.py
                Missing: regular file does not exist
Created ZFS snapshot: 2012-12-11-21:09:53
Repairing: pkg://solaris/install/installadm                  
Creating Plan (Evaluating mediators): |

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1           3/3      0.0/0.0    0B/s

PHASE                                          ITEMS
Updating modified actions                      16/16
Updating image state                            Done 
Creating fast lookup database                   Done 
In just a few moments, IPS found the missing files and incorrect ownerships/permissions. Instead of reinstalling the system, or falling back to an earlier Live Upgrade boot environment, I was able to create my AI services and now all is well.
# installadm create-service -n solaris11-x86 --imagepath /install/solaris11-x86 \
                   -s solaris-auto-install@5.11-0.175.0
Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

Creating service from: solaris-auto-install@5.11-0.175.0
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1       130/130  264.4/264.4    0B/s

PHASE                                          ITEMS
Installing new actions                       284/284
Updating package state database                 Done 
Updating image state                            Done 
Creating fast lookup database                   Done 
Reading search index                            Done 
Updating search index                            1/1 

Creating i386 service: solaris11-x86

Image path: /install/solaris11-x86

Refreshing install services
Warning: mDNS registry of service solaris11-x86 could not be verified.

Creating default-i386 alias

Setting the default PXE bootfile(s) in the local DHCP configuration
to:
bios clients (arch 00:00):  default-i386/boot/grub/pxegrub


Refreshing install services
Warning: mDNS registry of service default-i386 could not be verified.

# installadm create-service -n solaris11u1-x86 --imagepath /install/solaris11u1-x86 \
                    -s solaris-auto-install@5.11-0.175.1
Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

Creating service from: solaris-auto-install@5.11-0.175.1
DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                                1/1       514/514  292.3/292.3    0B/s

PHASE                                          ITEMS
Installing new actions                       661/661
Updating package state database                 Done 
Updating image state                            Done 
Creating fast lookup database                   Done 
Reading search index                            Done 
Updating search index                            1/1 

Creating i386 service: solaris11u1-x86

Image path: /install/solaris11u1-x86

Refreshing install services
Warning: mDNS registry of service solaris11u1-x86 could not be verified.

# installadm list

Service Name    Alias Of      Status  Arch   Image Path 
------------    --------      ------  ----   ---------- 
default-i386    solaris11-x86 on      i386   /install/solaris11-x86
solaris11-x86   -             on      i386   /install/solaris11-x86
solaris11u1-x86 -             on      i386   /install/solaris11u1-x86


This is way, way better than pkgchk -f in Solaris 10. I'm really beginning to like this new IPS packaging system.
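For comparison, here is roughly what the old SVR4 tooling could do. pkgchk can verify a path against the installed package database and reset ownership, mode, and group, but it cannot pull down a fresh copy of damaged file contents - a rough sketch from memory:

# pkgchk -p /usr/sbin/installadm
# pkgchk -f -p /usr/sbin/installadm

If the file contents themselves were corrupted, your options were restoring from backup or re-applying the package or patch.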

Wednesday Aug 15, 2012

Pre-work for Upcoming Solaris 11 Boot Camps

Over the next few weeks, I will be hosting some Solaris 11 hands on workshops. Some of these will be public events at an Oracle office while others will be private sessions for a specific customer.

The public sessions I'm hosting are

Note: there is also an identical Solaris 11 session hosted by my colleague, Pavel Anni, in Broomfield, Colorado on August 23.

If you are planning on attending any of these sessions (including Pavel's), there are several things you can do in advance that will help not only you, but your fellow attendees.

Enable VT-x or AMD-V on your Laptop

If you will be using VirtualBox to host your workshop guest image, you need to enable the hardware virtualization feature. This is typically found in your BIOS and where you find the setting varies by laptop manufacturer. If you do not find it in the system or CPU settings, try looking in security. If you are given the choice of VT-x and VT-d, you only need to enable VT-x.

If you have a company laptop that does not allow you to change the BIOS settings, you might ask your employer if they can provide you one for the day that is not locked down.

Note: Enabling hardware virtualization is a requirement to complete the workshop.

Download and Install VirtualBox

Since this will be primarily a hands on lab, you are encouraged to bring a laptop. The labs will all be run in a Solaris guest machine, so your laptop will also need a virtualization application, such as VMware or VirtualBox. We recommend VirtualBox. You can download a free copy at VirtualBox.org. Binaries are available for Windows, MacOS, Solaris and most Linux distributions.

After installing VirtualBox, you should also install the VirtualBox Extensions Pack. These are not required for the lab, but should you continue to use the guest machine after the workshop, you might find some of the features very useful.

Download a Solaris 11 VM Appliance from the Oracle Technology Network (OTN)

You can download a pre-built Solaris 11 guest image directly from the Oracle Technology Network. Here is a link to the VM download page. Accept the license and download the latest Solaris 11 VirtualBox guest image.

Once downloaded, you can use the VirtualBox VM import function to create a usable guest. Clicking File -> Import Appliance on the VirtualBox main window will launch the import wizard. Select the file you just downloaded and in a few minutes you will have a bootable Solaris 11 guest. The import process should look something like this.
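If you prefer the command line, the same import can be done with VBoxManage; the .ova file name here is simply a placeholder for whatever you downloaded from OTN:

$ VBoxManage import OracleSolaris11_11-11.ova
$ VBoxManage list vms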



Configure the Solaris Guest

The first time you boot the Solaris 11 guest, you will be required to complete a short configuration dialog. Once you have specified all of the items on the page, press F2 to advance to the next screen.

The introduction screen looks like this.




On the second page, specify the host name and default network setup. The default name of solaris is used throughout the lab. For the network setup, select Automatic.




The next item in the initial system configuration is the timezone. That does not matter for the hands on labs. If you are experiencing poor weather, I have found that setting the system to Aruba time can be helpful.

The final step is to set the root password and set up the initial user. To stay consistent with the lab handouts, set the root password to oracle2011. The initial user should be specified as lab and its password should be oracle1.




Finally, you will be presented a summary screen, which should look something like this. When satisfied, press F2 to complete.




The Solaris 11 VM image from the Oracle Technology Network has the VirtualBox Guest Additions already installed. This enables keyboard and mouse integration as well as resize/seamless windows.

Set up a Local Repository

To complete the zone installation labs in the workshop, you will need to access the Oracle public Solaris 11 repository, which means you also must have wireless network access. This does not always work well in a workshop with 30 or 40 users stressing out the local wireless access point. To make this easier, you can create your own customized package repository in your newly imported Solaris 11 guest. My colleague, Pavel Anni, has supplied this excellent set of instructions on how to do that.

1. Create a directory or a ZFS file system to hold your local repository.

# mkdir /repo
or 
# zfs create -o mountpoint=/repo -o compress=gzip rpool/repo
2. Create an empty repository in it
# pkgrepo create /repo
3. Create a text file 'zone-pkgs.txt' with the list of necessary packages. That list should look like this (cut and paste is your best friend).

pkg://solaris/compress/bzip2
pkg://solaris/compress/gzip
pkg://solaris/compress/p7zip
pkg://solaris/compress/unzip
pkg://solaris/compress/zip
pkg://solaris/consolidation/SunVTS/SunVTS-incorporation
pkg://solaris/consolidation/X/X-incorporation
pkg://solaris/consolidation/admin/admin-incorporation
pkg://solaris/consolidation/cacao/cacao-incorporation
pkg://solaris/consolidation/cde/cde-incorporation
pkg://solaris/consolidation/cns/cns-incorporation
pkg://solaris/consolidation/dbtg/dbtg-incorporation
pkg://solaris/consolidation/desktop/desktop-incorporation
pkg://solaris/consolidation/desktop/gnome-incorporation
pkg://solaris/consolidation/gfx/gfx-incorporation
pkg://solaris/consolidation/install/install-incorporation
pkg://solaris/consolidation/ips/ips-incorporation
pkg://solaris/consolidation/java/java-incorporation
pkg://solaris/consolidation/jdmk/jdmk-incorporation
pkg://solaris/consolidation/l10n/l10n-incorporation
pkg://solaris/consolidation/ldoms/ldoms-incorporation
pkg://solaris/consolidation/man/man-incorporation
pkg://solaris/consolidation/nspg/nspg-incorporation
pkg://solaris/consolidation/nvidia/nvidia-incorporation
pkg://solaris/consolidation/osnet/osnet-incorporation
pkg://solaris/consolidation/sfw/sfw-incorporation
pkg://solaris/consolidation/sic_team/sic_team-incorporation
pkg://solaris/consolidation/solaris_re/solaris_re-incorporation
pkg://solaris/consolidation/sunpro/sunpro-incorporation
pkg://solaris/consolidation/ub_javavm/ub_javavm-incorporation
pkg://solaris/consolidation/userland/userland-incorporation
pkg://solaris/consolidation/vpanels/vpanels-incorporation
pkg://solaris/consolidation/xvm/xvm-incorporation
pkg://solaris/crypto/ca-certificates
pkg://solaris/database/sqlite-3
pkg://solaris/developer/base-developer-utilities
pkg://solaris/developer/debug/mdb
pkg://solaris/developer/macro/cpp
pkg://solaris/diagnostic/cpu-counters
pkg://solaris/diagnostic/snoop
pkg://solaris/diagnostic/tcpdump
pkg://solaris/driver/serial/asy
pkg://solaris/driver/storage/cmdk
pkg://solaris/driver/storage/mpt
pkg://solaris/driver/x11/xsvc
pkg://solaris/editor/vim/vim-core
pkg://solaris/entire
pkg://solaris/group/system/solaris-small-server
pkg://solaris/library/database/gdbm
pkg://solaris/library/expat
pkg://solaris/library/libffi
pkg://solaris/library/libidn
pkg://solaris/library/libmilter
pkg://solaris/library/libtecla
pkg://solaris/library/libxml2
pkg://solaris/library/libxslt
pkg://solaris/library/ncurses
pkg://solaris/library/nspr
pkg://solaris/library/perl-5/sun-solaris-512
pkg://solaris/library/python-2/cherrypy-26
pkg://solaris/library/python-2/lxml-26
pkg://solaris/library/python-2/m2crypto-26
pkg://solaris/library/python-2/mako-26
pkg://solaris/library/python-2/ply-26
pkg://solaris/library/python-2/pybonjour-26
pkg://solaris/library/python-2/pycurl-26
pkg://solaris/library/python-2/pyopenssl-26
pkg://solaris/library/python-2/python-extra-26
pkg://solaris/library/python-2/simplejson-26
pkg://solaris/library/readline
pkg://solaris/library/security/nss
pkg://solaris/library/security/openssl
pkg://solaris/library/security/trousers
pkg://solaris/library/zlib
pkg://solaris/media/cdrtools
pkg://solaris/media/xorriso
pkg://solaris/naming/ldap
pkg://solaris/network/bridging
pkg://solaris/network/dns/bind
pkg://solaris/network/ipfilter
pkg://solaris/network/open-fabrics
pkg://solaris/network/ping
pkg://solaris/network/rsync
pkg://solaris/network/ssh
pkg://solaris/network/ssh/ssh-key
pkg://solaris/package/pkg
pkg://solaris/package/pkg/zones-proxy
pkg://solaris/package/svr4
pkg://solaris/release/name
pkg://solaris/release/notices
pkg://solaris/runtime/perl-512
pkg://solaris/runtime/python-26
pkg://solaris/security/nss-utilities
pkg://solaris/security/sudo
pkg://solaris/security/tcp-wrapper
pkg://solaris/service/file-system/nfs
pkg://solaris/service/network/dns/mdns
pkg://solaris/service/network/smtp/sendmail
pkg://solaris/service/network/ssh
pkg://solaris/service/security/gss
pkg://solaris/service/security/kerberos-5
pkg://solaris/shell/bash
pkg://solaris/shell/ksh
pkg://solaris/system/boot-environment-utilities
pkg://solaris/system/boot/wanboot
pkg://solaris/system/core-os
pkg://solaris/system/data/terminfo/terminfo-core
pkg://solaris/system/data/timezone
pkg://solaris/system/device-administration
pkg://solaris/system/dtrace
pkg://solaris/system/dtrace/dtrace-toolkit
pkg://solaris/system/fault-management
pkg://solaris/system/fault-management/smtp-notify
pkg://solaris/system/file-system/autofs
pkg://solaris/system/file-system/hsfs
pkg://solaris/system/file-system/nfs
pkg://solaris/system/file-system/pcfs
pkg://solaris/system/file-system/udfs
pkg://solaris/system/file-system/ufs
pkg://solaris/system/file-system/zfs
pkg://solaris/system/install
pkg://solaris/system/install/configuration
pkg://solaris/system/install/locale
pkg://solaris/system/kernel
pkg://solaris/system/kernel/platform
pkg://solaris/system/kernel/secure-rpc
pkg://solaris/system/kernel/security/gss
pkg://solaris/system/library
pkg://solaris/system/library/boot-management
pkg://solaris/system/library/c++-runtime
pkg://solaris/system/library/gcc-3-runtime
pkg://solaris/system/library/iconv/utf-8
pkg://solaris/system/library/install
pkg://solaris/system/library/libpcap
pkg://solaris/system/library/math
pkg://solaris/system/library/openmp
pkg://solaris/system/library/security/gss
pkg://solaris/system/library/security/gss/diffie-hellman
pkg://solaris/system/library/security/gss/spnego
pkg://solaris/system/library/security/libsasl
pkg://solaris/system/library/security/rpcsec
pkg://solaris/system/library/storage/libdiskmgt
pkg://solaris/system/library/storage/scsi-plugins
pkg://solaris/system/linker
pkg://solaris/system/locale
pkg://solaris/system/manual
pkg://solaris/system/manual/locale
pkg://solaris/system/network
pkg://solaris/system/network/nis
pkg://solaris/system/network/routing
pkg://solaris/system/prerequisite/gnu
pkg://solaris/system/resource-mgmt/resource-caps
pkg://solaris/system/resource-mgmt/resource-pools
pkg://solaris/system/system-events
pkg://solaris/system/zones
pkg://solaris/system/zones/brand/brand-solaris
pkg://solaris/terminal/luit
pkg://solaris/terminal/resize
pkg://solaris/text/doctools
pkg://solaris/text/doctools/ja
pkg://solaris/text/groff/groff-core
pkg://solaris/text/less
pkg://solaris/text/spelling-utilities
pkg://solaris/web/curl
pkg://solaris/web/wget
pkg://solaris/x11/header/x11-protocols
pkg://solaris/x11/library/libfontenc
pkg://solaris/benchmark/iperf
4. Populate your local repository with the required packages. At present, it is not possible to do this in parallel, so the packages must be received one at a time. Depending on your network speed, this step could take 2 to 3 hours.
# for f in `cat zone-pkgs.txt`
> do
>    pkgrecv -s http://pkg.oracle.com/solaris/release -d /repo $f
>    echo $f
> done
# pkgrepo rebuild -s /repo
5. Check that you really have 167 packages (if you have downloaded and installed the archive, it might be more; we have added the apache and iperf packages for our demo purposes).
# pkgrepo info -s file:///repo
6. Set up and enable the package repository service in the global zone:
# svccfg -s application/pkg/server setprop pkg/inst_root=/repo   
# svcprop -p pkg/inst_root application/pkg/server   (Just checking...)
# svcadm refresh application/pkg/server 
# svcadm enable application/pkg/server 
7. Switch repositories (disable all the existing origins and mirrors and enable the local one):
# pkg set-publisher -G '*' -M '*' -g http://10.0.2.15/ solaris
Note that it should use your global zone's IP address (in this case, provided automatically by VirtualBox). All of the zones you create will then use this address and be able to install packages from the global zone. It won't work if you set your repository's HTTP address to just http://localhost, because inside a zone localhost refers to the zone itself, not to the global zone.
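A quick sanity check after switching: pkg publisher should now show only the local repository, and the same check works inside any zone you create later (zone1 below is just an example zone name):

# pkg publisher
# zlogin zone1 pkg publisher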

Download zoneplot

The zones portion of the hands on lab will make use of two utilities that are not in Solaris. You will need to download both Pavel Anni's zoneplot and Andreas Bernauer's Gnuplot driver utility.

Optional: Return your Solaris publisher to the Oracle default repository

When you have completed all of the labs, you can restore the original Oracle default repository.
# pkg set-publisher -G '*' -g http://pkg.oracle.com/solaris/release -P solaris
That should be about it. Please leave a comment if you have any questions. I am looking forward to seeing you at one of these, or a future Solaris event.


Sunday Nov 22, 2009

Taking ZFS deduplication for a test drive

Now that I have a working OpenSolaris build 128 system, I just had to take ZFS deduplication for a spin, to see if it was worth all of the hype.

Here is my test case: I have 2 directories of photos, totaling about 90MB each. And here's the trick - they are almost complete duplicates of each other. I downloaded all of the photos from the same camera on 2 different days. How many of you do that? Yeah, me too.

Let's see what ZFS can figure out about all of this. If it is super smart we should end up with a total of 90MB of used space. That's what I'm hoping for.

The first step is to create the pool and turn on deduplication from the beginning.
# zpool create -f scooby -O dedup=on c2t2d0s2
This will use sha256 to determine if two blocks are the same. Since sha256 has such a low collision probability (something like 1x10^-77), we will not turn on automatic verification. If we were using an algorithm like fletcher4, which has a higher collision rate, we should also perform a complete block compare before allowing the block removal (dedup=fletcher4,verify).
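If you do want belt-and-suspenders verification even with sha256, that is just another property value on the pool's top-level dataset we created above:

# zfs set dedup=sha256,verify scooby
# zfs get dedup scooby

I am leaving it at plain sha256 for this test, since the verify step adds a read for every potential duplicate block.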

Now copy the first 180MB (remember, this is 2 sets of 90MB which are nearly identical sets of photos).
# zfs create scooby/doo
# cp -r /pix/Alaska* /scooby/doo
And the second set.
# zfs create scooby/snack
# cp -r /pix/Alaska* /scooby/snack
And finally the third set.
# zfs create scooby/dooby
# cp -r /pix/Alaska* /scooby/dooby
Let's make sure there are in fact three copies of the photos.
# df -k | grep scooby
scooby               74230572      25 73706399     1%    /scooby
scooby/doo           74230572  174626 73706399     1%    /scooby/doo
scooby/snack         74230572  174626 73706399     1%    /scooby/snack
scooby/dooby         74230572  174625 73706399     1%    /scooby/dooby


OK, so far so good. But I can't quite tell if the deduplication is actually doing anything. With all that free space, it's sort of hard to see. Let's look at the pool properties.
# zpool get all scooby
NAME    PROPERTY       VALUE       SOURCE
scooby  size           71.5G       -
scooby  capacity       0%          -
scooby  altroot        -           default
scooby  health         ONLINE      -
scooby  guid           5341682982744598523  default
scooby  version        22          default
scooby  bootfs         -           default
scooby  delegation     on          default
scooby  autoreplace    off         default
scooby  cachefile      -           default
scooby  failmode       wait        default
scooby  listsnapshots  off         default
scooby  autoexpand     off         default
scooby  dedupratio     5.98x       -
scooby  free           71.4G       -
scooby  allocated      86.8M       -
Now this is telling us something.

First, notice the allocated space: just shy of 90MB. There is 522MB of data in the pool (174MB x 3), yet only 87MB is allocated. That's a good start.

Now take a look at the dedupratio. Almost 6. And that's exactly what we would expect, if ZFS is as good as we are led to believe. 3 sets of 2 duplicate directories is 6 total copies of the same set of photos. And ZFS caught every one of them.
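By the way, if all you want is that one number, you can ask for the property directly rather than wading through zpool get all:

# zpool get dedupratio scooby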

So if you want to do this yourself, point your OpenSolaris package manager at the dev repository and wait for build 128 packages to show up. If you need instructions on using the OpenSolaris dev repository, point the browser of your choice at http://pkg.opensolaris.org/dev/en/index.shtml. And if you can't wait for the packages to show up, you can always .


Tuesday Mar 17, 2009

Time-slider saves the day (or at least a lot of frustration)

As I was tidying up my Live Upgrade boot environments yesterday, I did something that I thought was terribly clever but had some pretty wicked side effects. While linking up all of my application configuration directories (firefox, mozilla, thunderbird, [g]xine, staroffice) I got blindsided by the GNOME message client: pidgin, or more specifically one of our migration assistants from GAIM to pidgin.

As a quick background, Solaris, Solaris Express Community Edition (SXCE), and OpenSolaris all have different versions of the GNOME desktop. Since some of the configuration settings are incompatible across releases the easy solution is to keep separate home directories for each version of GNOME you might use. Which is fine until you grow weary of setting your message filters for Thunderbird again or forget which Firefox has that cached password for the local recreation center that you only use once a year. Pretty quickly you come up with the idea of a common directory for all shared configuration files (dot directories, collections of pictures, video, audio, presentations, scripts).

For one boot environment you do something like
$ mkdir /export/home/me
$ for dotdir in .thunderbird .purple .mozilla .firefox .gxine .xine .staroffice .wine .staroffice* .openoffice* .VirtualBox .evolution bin lib misc presentations
> do
> mv $dotdir /export/home/me
> ln -s /export/home/me/$dotdir   $dotdir
> done
And for the other GNOME home directories you do something like
$ for dotdir in .thunderbird .purple .mozilla .firefox .gxine .xine .staroffice .wine .staroffice* .openoffice* .VirtualBox .evolution bin lib misc presentations
> do
> mv $dotdir ${dotdir}.old
> ln -s /export/home/me/$dotdir   $dotdir
> done
And all is well. Until......

Booted into Solaris 10 and fired up pidgin, thinking I would get all of my accounts activated and the default chatrooms started. Instead I was met by a rather nasty note that I had incompatible GAIM entries and it would try to convert them for me. What it did was wipe out all of my pidgin settings. And sure enough, when I looked into the shared directory, .purple contained all new and quite empty configuration settings.

This is where I am hoping to get some sympathy, since we have all done things like this. But then I remembered I had started time-slider earlier in the day (from the OpenSolaris side of things).
$ time-slider-setup
And there were my .purple files from 15 minutes ago, right before the GAIM conversion tools made a mess of them.
$ cd /export/home/.zfs/snapshot
$ ls
zfs-auto-snap:daily-2009-03-16-22:47
zfs-auto-snap:daily-2009-03-17-00:00
zfs-auto-snap:frequent-2009-03-17-11:45
zfs-auto-snap:frequent-2009-03-17-12:00
zfs-auto-snap:frequent-2009-03-17-12:15
zfs-auto-snap:frequent-2009-03-17-12:30
zfs-auto-snap:hourly-2009-03-16-22:47
zfs-auto-snap:hourly-2009-03-16-23:00
zfs-auto-snap:hourly-2009-03-17-00:00
zfs-auto-snap:hourly-2009-03-17-01:00
zfs-auto-snap:hourly-2009-03-17-02:00
zfs-auto-snap:hourly-2009-03-17-03:00
zfs-auto-snap:hourly-2009-03-17-04:00
zfs-auto-snap:hourly-2009-03-17-05:00
zfs-auto-snap:hourly-2009-03-17-06:00
zfs-auto-snap:hourly-2009-03-17-07:00
zfs-auto-snap:hourly-2009-03-17-08:00
zfs-auto-snap:hourly-2009-03-17-09:00
zfs-auto-snap:hourly-2009-03-17-10:00
zfs-auto-snap:hourly-2009-03-17-11:00
zfs-auto-snap:hourly-2009-03-17-12:00
zfs-auto-snap:monthly-2009-03-16-11:38
zfs-auto-snap:weekly-2009-03-16-22:47

$ cd zfs-auto-snap:frequent-2009-03-17-12:15/me/.purple
$ rm -rf /export/home/me/.purple/*
$ cp -r * /export/home/me/.purple

(and this is really, really important)
$ mv $HOME/.gaim $HOME/.gaim-never-to-be-heard-from-again

Log out and back in to refresh the GNOME configuration settings and everything is as it should be. OpenSolaris time-slider is just one more reason that I'm glad that it is my daily driver.


Monday Mar 02, 2009

Alaska and Oregon Solaris Boot Camps

A big thanks to all who attended the Solaris Boot Camps in Juneau, Fairbanks, Portland and Salem. I hope that you found the information useful. And thanks for all of the good questions and discussion.

Here are the materials that were used during the bootcamp.

Please send me email if you have any questions or want to follow up on any of the discussions.

Thanks again for your attendance and continued support for Solaris.


Monday Feb 18, 2008

ZFS and FMA - Two great tastes .....

Our good friend Isaac Rozenfeld talks about the Multiplicity of Solaris. When talking about Solaris, I will use the phrase "The Vastness of Solaris". If you have attended a Solaris Boot Camp or Tech Day in the last few years, you get an idea of what we are talking about - when we go on about Solaris hour after hour after hour.

But the key point in Isaac's multiplicity discussion is how the cornucopia of Solaris features work together to do some pretty spectacular (and competitively differentiating) things. In the past we've looked at combinations such as ZFS and Zones or Service Management, Role Based Access Control (RBAC) and Least Privilege. Based on a conversation last week in St. Louis, let's consider how ZFS and Solaris Fault Management (FMA) play together.

Preparation

Let's begin by creating some fake devices that we can play with. I don't have enough disks on this particular system, but I'm not going to let that slow me down. If you have sufficient real hot swappable disks, feel free to use them instead.
# mkfile 1g /dev/dsk/disk1
# mkfile 1g /dev/dsk/disk2
# mkfile 512m /dev/dsk/disk3
# mkfile 512m /dev/dsk/disk4
# mkfile 1g /dev/dsk/spare1

Now let's create a couple of zpools using the fake devices. pool1 will be a 1GB mirrored pool using disk1 and disk2. pool2 will be a 512MB mirrored pool using disk3 and disk4. Device spare1 will spare both pools in case of a problem - which we are about to inflict upon the pools.
# zpool create pool1 mirror disk1 disk2 spare spare1
# zpool create pool2 mirror disk3 disk4 spare spare1
# zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

So far so good. If we were to run a scrub on either pool, it would complete almost immediately. Remember that unlike a hardware RAID disk replacement, ZFS scrubbing and resilvering only touch blocks that contain actual data. Since there is no data in these pools (yet), there is little for the scrubbing process to do.
# zpool scrub pool1
# zpool scrub pool2
# zpool status
  pool: pool1
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 09:24:16 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 09:24:17 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

Let's populate both pools with some data. I happen to have a directory of scenic images that I use as screen backgrounds - that will work nicely.

# cd /export/pub/pix
# find scenic -print | cpio -pdum /pool1
# find scenic -print | cpio -pdum /pool2

# df -k | grep pool
pool1                1007616  248925  758539    25%    /pool1
pool2                 483328  248921  234204    52%    /pool2

And yes, cp -r would have been just as good.

Problem 1: Simple data corruption

Time to inflict some harm upon the pool. First, some simple corruption. Writing some zeros over half of the mirror should do quite nicely.
# dd if=/dev/zero of=/dev/dsk/disk1 bs=8192 count=10000 conv=notrunc
10000+0 records in
10000+0 records out 

At this point we are unaware that anything has happened to our data. So let's try accessing some of the data to see if we can observe ZFS self healing in action. If your system has plenty of memory and is relatively idle, accessing the data may not be sufficient. If you still end up with no errors after the cpio, try a zpool scrub - that will catch all errors in the data.
# cd /pool1
# find . -print | cpio -ov > /dev/null
416027 blocks

Let's ask our friend fmstat(1M) if anything is wrong.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.1   0   0     0     0      0      0
disk-transport           0       0  0.0  366.5   0   0     0     0    32b      0
eft                      0       0  0.0    2.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       1       0  0.0    0.2   0   0     0     0      0      0
io-retire                0       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             1       0  0.0   16.0   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  620.3   0   0     0     0      0      0
syslog-msgs              1       0  0.0    9.7   0   0     0     0      0      0
zfs-diagnosis          162     162  0.0    1.5   0   0     1     0   168b   140b
zfs-retire               1       1  0.0  112.3   0   0     0     0      0      0

As the guys in the Guinness commercial say, "Brilliant!" The important thing to note here is that the zfs-diagnosis engine has run several times indicating that there is a problem somewhere in one of my pools. I'm also running this on Nevada so the zfs-retire engine has also run, kicking in a hot spare due to excessive errors.

So which pool is having the problems? We continue our FMA investigation to find out.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.


# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 44.83% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              disk1   DEGRADED     0     0   162  too many errors
              spare1  ONLINE       0     0     0
            disk2     ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors

This tells us all that we need to know. The device disk1 was found to have quite a few checksum errors - so many in fact that it was replaced automatically by a hot spare. The spare was resilvering and a full complement of data replicas would be available soon. The entire process was automatic and completely observable.

Since we inflicted harm upon the (fake) disk device ourself, we know that it is in fact quite healthy. So we can restore our pool to its original configuration rather simply - by detaching the spare and clearing the error. We should also clear the FMA counters and repair the ZFS vdev so that we can tell if anything else is misbehaving in either this or another pool.
# zpool detach pool1 spare1
# zpool clear pool1
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 10:25:26 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors


# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  223.5   0   0     0     0    32b      0
eft                      1       0  0.0    4.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       4       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             4       0  0.0    8.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  372.7   0   0     0     0      0      0
syslog-msgs              4       0  0.0    5.4   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.4   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0


# fmdump -v -u d82d1716-c920-6243-e899-b7ddd386902e
TIME                 UUID                                 SUNW-MSG-ID
Feb 18 09:51:49.3025 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
  100%  fault.fs.zfs.vdev.checksum

        Problem in: 
           Affects: zfs://pool=pool1/vdev=449a3328bc444732
               FRU: -
          Location: -

# fmadm repair zfs://pool=pool1/vdev=449a3328bc444732
fmadm: recorded repair to zfs://pool=pool1/vdev=449a3328bc444732

# fmadm faulty

Problem 2: Device failure

Time to do a little more harm. In this case I will simulate the failure of a device by removing the fake device. Again we will access the pool and then consult fmstat to see what is happening (are you noticing a pattern here????).
# rm -f /dev/dsk/disk2
# cd /pool1
# find . -print | cpio -oc > /dev/null
416027 blocks

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  214.2   0   0     0     0    32b      0
eft                      1       0  0.0    4.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       4       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             4       0  0.0    8.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  372.7   0   0     0     0      0      0
syslog-msgs              4       0  0.0    5.4   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.4   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0

Rats - the find was satisfied entirely out of the cache left over from the last example, so it generated no new I/O against the missing device. As before, should this happen, proceed directly to zpool scrub.
# zpool scrub pool1
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  190.5   0   0     0     0    32b      0
eft                      1       0  0.0    4.1   0   0     0     0   1.4M      0
fmd-self-diagnosis       5       0  0.0    0.5   0   0     0     0      0      0
io-retire                1       0  0.0    1.0   0   0     0     0      0      0
snmp-trapgen             6       0  0.0    7.4   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  329.0   0   0     0     0      0      0
syslog-msgs              6       0  0.0    4.6   0   0     0     0      0      0
zfs-diagnosis           16       1  0.0   70.3   0   0     1     1   168b   140b
zfs-retire               1       0  0.0  509.8   0   0     0     0      0      0

Again, hot sparing has kicked in automatically; the evidence is the activity on the zfs-retire engine.
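
If you are curious about the raw telemetry that fed this diagnosis, fmdump can read the error log (the ereports) as well as the fault log; a quick peek, output omitted here:
# fmdump -e
# fmdump -eV
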
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 11:07:29 50ea07a0-2cd9-6bfb-ff9e-e219740052d5  ZFS-8000-D3    Major    
Feb 18 11:16:43 06bfe323-2570-46e8-f1a2-e00d8970ed0d

Fault class : fault.fs.zfs.device

Description : A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for
              more information.

Response    : No automated response will occur.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress, 4.94% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            disk1     ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              disk2   UNAVAIL      0     0     0  cannot open
              spare1  ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors

As before, this tells us all that we need to know. A device (disk2) has failed and is no longer in operation. Sufficient spares existed and one was automatically attached to the damaged pool. Once the resilver completes, the data will once again be fully mirrored.

But here's the magic. Let's repair the failed device - again simulated by recreating our fake device file and telling ZFS to replace it.
# mkfile 1g /dev/dsk/disk2
# zpool replace pool1 disk2
# zpool status pool1 
  pool: pool1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 4.86% done, 0h1m to go
config:

        NAME               STATE     READ WRITE CKSUM
        pool1              DEGRADED     0     0     0
          mirror           DEGRADED     0     0     0
            disk1          ONLINE       0     0     0
            spare          DEGRADED     0     0     0
              replacing    DEGRADED     0     0     0
                disk2/old  UNAVAIL      0     0     0  cannot open
                disk2      ONLINE       0     0     0
              spare1       ONLINE       0     0     0
        spares
          spare1           INUSE     currently in use

errors: No known data errors

Get a cup of coffee while the resilvering process runs.
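
Or, if you would rather script the wait, a one-liner along these lines will do - a minimal sketch that simply watches for the status text shown above to go away:
# while zpool status pool1 | grep 'resilver in progress' > /dev/null; do sleep 30; done
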
# zpool status
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   


# fmadm faulty

Notice the nice integration with FMA. Not only was the new device resilvered, but the hot spare was detached and the FMA fault was cleared. The fmstat counters still show that there was a problem, and the fault report still exists in the fault log for later interrogation.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  171.5   0   0     0     0    32b      0
eft                      1       0  0.0    3.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       6       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    0.9   0   0     0     0      0      0
snmp-trapgen             6       0  0.0    6.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  294.3   0   0     0     0      0      0
syslog-msgs              6       0  0.0    4.2   0   0     0     0      0      0
zfs-diagnosis           36       1  0.0   51.6   0   0     0     1      0      0
zfs-retire               1       0  0.0  170.0   0   0     0     0      0      0

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Feb 16 11:38:16.0976 48935791-ff83-e622-fbe1-d54c20385afc ZFS-8000-GH
Feb 16 11:38:30.8519 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233 ZFS-8000-GH
Feb 18 09:51:49.3025 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713 ZFS-8000-GH
Feb 18 09:56:24.8029 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
Feb 18 10:23:07.2228 7c04a6f7-d22a-e467-c44d-80810f27b711 ZFS-8000-GH
Feb 18 10:25:14.6429 faca0639-b82b-c8e8-c8d4-fc085bc03caa ZFS-8000-GH
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3
Feb 18 11:16:44.2497 06bfe323-2570-46e8-f1a2-e00d8970ed0d ZFS-8000-D3


# fmdump -V -u 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
TIME                 UUID                                 SUNW-MSG-ID
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3

  TIME                 CLASS                                 ENA
  Feb 18 11:07:27.8476 ereport.fs.zfs.vdev.open_failed       0xb22406c635500401

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
        code = ZFS-8000-D3
        diag-time = 1203354449 236999
        de = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = Dimension XPS                
                        chassis-id = 7XQPV21
                        server-id = arrakis
                (end authority)

                mod-name = zfs-diagnosis
                mod-version = 1.0
        (end de)

        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = fault.fs.zfs.device
                certainty = 0x64
                asru = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x3a2ca6bebd96cfe3
                        vdev = 0xedef914b5d9eae8d
                (end asru)

                resource = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x3a2ca6bebd96cfe3
                        vdev = 0xedef914b5d9eae8d
                (end resource)

        (end fault-list[0])

        fault-status = 0x3
        __ttl = 0x1
        __tod = 0x47b9bb51 0x1ef7b430

# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset

# fmadm reset zfs-retire
fmadm: zfs-retire module has been reset

Problem 3: Unrecoverable corruption

As those of you who have attended one of my Boot Camps or Solaris Best Practices training classes know, House is one of my favorite TV shows - the only one that I watch regularly. And this next example would make a perfect episode. Is it likely to happen? No, but it is so cool when it does :-)

Remember our second pool, pool2. It has the same contents as pool1. Now, let's do the unthinkable - let's corrupt both halves of the mirror. Surely data loss will follow, but the fact that Solaris stays up and running and can report what happened is pretty spectacular. But it gets so much better than that.
# dd if=/dev/zero of=/dev/dsk/disk3 bs=8192 count=10000 conv=notrunc
# dd if=/dev/zero of=/dev/dsk/disk4 bs=8192 count=10000 conv=notrunc
# zpool scrub pool2

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  166.0   0   0     0     0    32b      0
eft                      1       0  0.0    3.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       6       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    0.9   0   0     0     0      0      0
snmp-trapgen             8       0  0.0    6.3   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  294.3   0   0     0     0      0      0
syslog-msgs              8       0  0.0    3.9   0   0     0     0      0      0
zfs-diagnosis         1032    1028  0.6   39.7   0   0    93     2    15K    13K
zfs-retire               2       0  0.0  158.5   0   0     0     0      0      0

As before, lots of zfs-diagnosis activity. And two hits to zfs-retire. But we only have one spare - this should be interesting. Let's see what is happening.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    
Feb 18 13:18:42 c3889bf1-8551-6956-acd4-914474093cd7

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 16 11:38:30 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233  ZFS-8000-GH    Major    
Feb 18 09:51:49 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713
Feb 18 10:23:07 7c04a6f7-d22a-e467-c44d-80810f27b711
Feb 18 13:18:42 0a1bf156-6968-4956-d015-cc121a866790

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

# zpool status -x
  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         DEGRADED     0     0 2.60K
          mirror      DEGRADED     0     0 2.60K
            spare     DEGRADED     0     0 2.43K
              disk3   DEGRADED     0     0 5.19K  too many errors
              spare1  DEGRADED     0     0 2.43K  too many errors
            disk4     DEGRADED     0     0 5.19K  too many errors
        spares
          spare1      INUSE     currently in use

errors: 247 data errors, use '-v' for a list

So ZFS tried to bring in a hot spare, but there were insufficient replicas to reconstruct all of the data. But here is where it gets interesting. Let's see what zpool status -v says about things.
# zpool status -v
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    INUSE     in use by pool 'pool2'

errors: No known data errors

  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         DEGRADED     0     0 2.60K
          mirror      DEGRADED     0     0 2.60K
            spare     DEGRADED     0     0 2.43K
              disk3   DEGRADED     0     0 5.19K  too many errors
              spare1  DEGRADED     0     0 2.43K  too many errors
            disk4     DEGRADED     0     0 5.19K  too many errors
        spares
          spare1      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        /pool2/scenic/cider mill crowds.jpg
        /pool2/scenic/Cleywindmill.jpg
        /pool2/scenic/csg_Landscapes001_GrandTetonNationalPark,Wyoming.jpg
        /pool2/scenic/csg_Landscapes002_ElowahFalls,Oregon.jpg
        /pool2/scenic/csg_Landscapes003_MonoLake,California.jpg
        /pool2/scenic/csg_Landscapes005_TurretArch,Utah.jpg
        /pool2/scenic/csg_Landscapes004_Wildflowers_MountRainer,Washington.jpg
        /pool2/scenic/csg_Landscapes!idx011.jpg
        /pool2/scenic/csg_Landscapes127_GreatSmokeyMountains-NorthCarolina.jpg
        /pool2/scenic/csg_Landscapes129_AcadiaNationalPark-Maine.jpg
        /pool2/scenic/csg_Landscapes130_GettysburgNationalPark-Pennsylvania.jpg
        /pool2/scenic/csg_Landscapes131_DeadHorseMill,CrystalRiver-Colorado.jpg
        /pool2/scenic/csg_Landscapes132_GladeCreekGristmill,BabcockStatePark-WestVirginia.jpg
        /pool2/scenic/csg_Landscapes133_BlackwaterFallsStatePark-WestVirginia.jpg
        /pool2/scenic/csg_Landscapes134_GrandCanyonNationalPark-Arizona.jpg
        /pool2/scenic/decisions decisions.jpg
        /pool2/scenic/csg_Landscapes135_BigSur-California.jpg
        /pool2/scenic/csg_Landscapes151_WataugaCounty-NorthCarolina.jpg
        /pool2/scenic/csg_Landscapes150_LakeInTheMedicineBowMountains-Wyoming.jpg
        /pool2/scenic/csg_Landscapes152_WinterPassage,PondMountain-Tennessee.jpg
        /pool2/scenic/csg_Landscapes154_StormAftermath,OconeeCounty-Georgia.jpg
        /pool2/scenic/Brig_Of_Dee.gif
        /pool2/scenic/pvnature14.gif
        /pool2/scenic/pvnature22.gif
        /pool2/scenic/pvnature7.gif
        /pool2/scenic/guadalupe.jpg
        /pool2/scenic/ernst-tinaja.jpg
        /pool2/scenic/pipes.gif
        /pool2/scenic/boat.jpg
        /pool2/scenic/pvhawaii.gif
        /pool2/scenic/cribgoch.jpg
        /pool2/scenic/sun1.gif
        /pool2/scenic/sun1.jpg
        /pool2/scenic/sun2.jpg
        /pool2/scenic/andes.jpg
        /pool2/scenic/treesky.gif
        /pool2/scenic/sailboatm.gif
        /pool2/scenic/Arizona1.jpg
        /pool2/scenic/Arizona2.jpg
        /pool2/scenic/Fence.jpg
        /pool2/scenic/Rockwood.jpg
        /pool2/scenic/sawtooth.jpg
        /pool2/scenic/pvaptr04.gif
        /pool2/scenic/pvaptr07.gif
        /pool2/scenic/pvaptr11.gif
        /pool2/scenic/pvntrr01.jpg
        /pool2/scenic/Millport.jpg
        /pool2/scenic/bryce2.jpg
        /pool2/scenic/bryce3.jpg
        /pool2/scenic/monument.jpg
        /pool2/scenic/rainier1.gif
        /pool2/scenic/arch.gif
        /pool2/scenic/pv-anzab.gif
        /pool2/scenic/pvnatr15.gif
        /pool2/scenic/pvocean3.gif
        /pool2/scenic/pvorngwv.gif
        /pool2/scenic/pvrmp001.gif
        /pool2/scenic/pvscen07.gif
        /pool2/scenic/pvsltd04.gif
        /pool2/scenic/banhall28600-04.JPG
        /pool2/scenic/pvwlnd01.gif
        /pool2/scenic/pvnature08.gif
        /pool2/scenic/pvnature13.gif
        /pool2/scenic/nokomis.jpg
        /pool2/scenic/lighthouse1.gif
        /pool2/scenic/lush.gif
        /pool2/scenic/oldmill.gif
        /pool2/scenic/gc1.jpg
        /pool2/scenic/gc2.jpg
        /pool2/scenic/canoe.gif
        /pool2/scenic/Donaldson-River.jpg
        /pool2/scenic/beach.gif
        /pool2/scenic/janloop.jpg
        /pool2/scenic/grobacro.jpg
        /pool2/scenic/fnlgld.jpg
        /pool2/scenic/bells.gif
        /pool2/scenic/Eilean_Donan.gif
        /pool2/scenic/Kilchurn_Castle.gif
        /pool2/scenic/Plockton.gif
        /pool2/scenic/Tantallon_Castle.gif
        /pool2/scenic/SouthStockholm.jpg
        /pool2/scenic/BlackRock_Cottage.jpg
        /pool2/scenic/seward.jpg
        /pool2/scenic/canadian_rockies_csg110_EmeraldBay.jpg
        /pool2/scenic/canadian_rockies_csg111_RedRockCanyon.jpg
        /pool2/scenic/canadian_rockies_csg112_WatertonNationalPark.jpg
        /pool2/scenic/canadian_rockies_csg113_WatertonLakes.jpg
        /pool2/scenic/canadian_rockies_csg114_PrinceOfWalesHotel.jpg
        /pool2/scenic/canadian_rockies_csg116_CameronLake.jpg
        /pool2/scenic/Castilla_Spain.jpg
        /pool2/scenic/Central-Park-Walk.jpg
        /pool2/scenic/CHANNEL.JPG



In my best Hugh Laurie voice, trying to sound very Northeastern American: that is so cool! But we're not even done yet. Let's take this list of files and restore them - in this case, from pool1. Operationally this would be from a backup tape or nearline backup cache, but for our purposes, the contents of pool1 will do nicely.

First, let's clear the zpool error counters and return the spare disk. We want to make sure that our restore works as desired. Oh, and clear the FMA stats while we're at it.
# zpool clear pool2
# zpool detach pool2 spare1

# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset

# fmadm reset zfs-retire   
fmadm: zfs-retire module has been reset

Now individually restore the files that have errors in them and check again. You can even export and reimport the pool, and you will find a very nice, happy, and thoroughly error-free ZFS pool. Some rather unpleasant gnashing of zpool status -v output with awk has been omitted for sanity's sake; a rough sketch of the idea follows.
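
Something along these lines will do it - a rough ksh or bash sketch, assuming pool1 still holds a good copy of every damaged file and that the file list looks like the zpool status -v output above:
# zpool status -v pool2 | grep '/pool2/' | sed 's|^ *||' | \
      while read -r f; do cp -p "/pool1${f#/pool2}" "$f"; done
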
# zpool scrub pool2
# zpool status pool2
  pool: pool2
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 14:04:56 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

# zpool export pool2
# zpool import pool2
# dircmp -s /pool1 /pool2
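
With -s, dircmp stays quiet about identical files, so the lack of complaints here means the restored pool2 matches pool1. If you prefer a second opinion, something like this (just a sketch) gives the same assurance:
# diff -r /pool1 /pool2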

Conclusions and Review

So what have we learned? ZFS and FMA are two great tastes that taste great together. No, that's chocolate and peanut butter, but you get the idea. One more great example of Isaac's Multiplicity of Solaris.

That, and I have finally found a good lab exercise for the FMA training materials. Ever since Christine Tran put the FMA workshop together, we have been looking for some good FMA lab exercises. The materials reference a synthetic fault generator that is not publicly available (for obvious reasons). I haven't explored the FMA test harness enough to know whether there is anything in it that would make a good lab, but the exercise we have just walked through ties a number of key pieces together.

And of course, one more reason why Roxy says, "You should run Solaris."


Thursday Jun 21, 2007

Updated Solaris Bootcamp Presentations

I've had a great time traveling around the country talking about Solaris. It's not exactly a difficult thing - there's plenty to talk about. Many of you have asked for copies of the latest Solaris update, virtualization overview and ZFS deep dive. Rather than have you dig through a bunch of old blog entries about bootcamps from 2005, here they are for your convenience.



I hope this will save you some digging through http://mediacast.sun.com and tons of old blogs.

In a few weeks I'll post a new "What's New in Solaris" which will have some really cool things. But we'll save that for later.

About

Bob Netherton is a Principal Sales Consultant for the North American Commercial Hardware group, specializing in Solaris, Virtualization and Engineered Systems. Bob is also a contributing author of Solaris 10 Virtualization Essentials.

This blog will contain information about all three, but will focus primarily on topics for Solaris system administrators.

Please follow me on Twitter or Facebook, or send me email.
