ZFS and FMA - Two great tastes .....

Our good friend Isaac Rozenfeld talks about the Multiplicity of Solaris; I tend to use the phrase "The Vastness of Solaris". If you have attended a Solaris Boot Camp or Tech Day in the last few years, you have an idea of what we are talking about - we can go on about Solaris hour after hour after hour.

But the key point in Isaac's multiplicity discussion is how the cornucopia of Solaris features works together to do some pretty spectacular (and competitively differentiating) things. In the past we've looked at combinations such as ZFS and Zones, or Service Management, Role Based Access Control (RBAC) and Least Privilege. Based on a conversation last week in St. Louis, let's consider how ZFS and Solaris Fault Management (FMA) play together.

Preparation

Let's begin by creating some fake devices that we can play with. I don't have enough disks on this particular system, but I'm not going to let that slow me down. The backing files are created in /dev/dsk so that zpool can find them by their short names. If you have sufficient real hot-swappable disks, feel free to use them instead.
# mkfile 1g /dev/dsk/disk1
# mkfile 1g /dev/dsk/disk2
# mkfile 512m /dev/dsk/disk3
# mkfile 512m /dev/dsk/disk4
# mkfile 1g /dev/dsk/disk5
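
A sharp-eyed reader will notice that the pools below also use a hot spare named spare1, which never shows up in the mkfile list; as Isaac points out in the comments, disk5 was really meant to be that spare. If you are following along, just create one more backing file (or rename disk5):
# mkfile 1g /dev/dsk/spare1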

Now let's create a couple of zpools using the fake devices. pool1 will be a 1GB mirrored pool using disk1 and disk2, and pool2 will be a 512MB mirrored pool using disk3 and disk4. Device spare1 will serve as a hot spare for both pools in case of a problem - which we are about to inflict upon them.
# zpool create pool1 mirror disk1 disk2 spare spare1
# zpool create pool2 mirror disk3 disk4 spare spare1
# zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

So far so good. If we run a scrub on either pool, it will complete almost immediately. Remember that, unlike a hardware RAID disk replacement, ZFS scrubbing and resilvering only touch blocks that contain actual data. Since there is no data in these pools (yet), there is little for the scrubbing process to do.
# zpool scrub pool1
# zpool scrub pool2
# zpool status
  pool: pool1
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 09:24:16 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 09:24:17 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

Let's populate both pools with some data. I happen to have a directory of scenic images that I use as screen backgrounds - that will work nicely.

# cd /export/pub/pix
# find scenic -print | cpio -pdum /pool1
# find scenic -print | cpio -pdum /pool2

# df -k | grep pool
pool1                1007616  248925  758539    25%    /pool1
pool2                 483328  248921  234204    52%    /pool2

And yes, cp -r would have been just as good.
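
For example, something like this would have produced the same result (same source directory as above):
# cp -r /export/pub/pix/scenic /pool1
# cp -r /export/pub/pix/scenic /pool2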

Problem 1: Simple data corruption

Time to inflict some harm upon the pool. First, some simple corruption. Writing some zeros over half of the mirror should do quite nicely.
# dd if=/dev/zero of=/dev/dsk/disk1 bs=8192 count=10000 conv=notrunc
10000+0 records in
10000+0 records out 

At this point we are unaware that anything has happened to our data. So let's try accessing some of the data to see if we can observe ZFS self-healing in action. If your system has plenty of memory and is relatively idle, accessing the data may not be sufficient - the reads may be satisfied from cache without ever touching the damaged half of the mirror. If you still end up with no errors after the cpio, try a zpool scrub; that will catch all errors in the data.
# cd /pool1
# find . -print | cpio -ov > /dev/null
416027 blocks

Let's ask our friend fmstat(1M) if anything is wrong.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.1   0   0     0     0      0      0
disk-transport           0       0  0.0  366.5   0   0     0     0    32b      0
eft                      0       0  0.0    2.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       1       0  0.0    0.2   0   0     0     0      0      0
io-retire                0       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             1       0  0.0   16.0   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  620.3   0   0     0     0      0      0
syslog-msgs              1       0  0.0    9.7   0   0     0     0      0      0
zfs-diagnosis          162     162  0.0    1.5   0   0     1     0   168b   140b
zfs-retire               1       1  0.0  112.3   0   0     0     0      0      0

As the guys in the Guinness commercial say, "Brilliant!" The important thing to note here is that the zfs-diagnosis engine has run several times, indicating that there is a problem somewhere in one of my pools. I'm also running this on Nevada, so the zfs-retire engine has also run, kicking in a hot spare due to excessive errors.
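
If you want to look at the raw telemetry behind those counters, and not just the statistics, the error reports can be dumped and a single module can be watched over an interval. Neither is needed for the rest of this walkthrough, but both are handy:
# fmdump -e
# fmstat -m zfs-diagnosis 5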

So which pool is having the problem? We continue our FMA investigation to find out.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.


# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 44.83% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              disk1   DEGRADED     0     0   162  too many errors
              spare1  ONLINE       0     0     0
            disk2     ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors

This tells us all that we need to know. The device disk1 was found to have quite a few checksum errors - so many in fact that it was replaced automatically by a hot spare. The spare was resilvering and a full complement of data replicas would be available soon. The entire process was automatic and completely observable.
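
As a side note, the syslog-msgs counter in the fmstat output above tells us the diagnosis was also sent to syslog, so a quick grep of the system log should turn up the same ZFS-8000-GH message - purely a cross-check, nothing we need for what follows:
# grep ZFS-8000 /var/adm/messages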

Since we inflicted the harm upon the (fake) disk device ourselves, we know that it is in fact quite healthy. So we can restore our pool to its original configuration rather simply - by detaching the spare and clearing the error. We should also clear the FMA counters and repair the ZFS vdev so that we can tell if anything else is misbehaving in either this or another pool.
# zpool detach pool1 spare1
# zpool clear pool1
# zpool status pool1
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 10:25:26 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors


# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  223.5   0   0     0     0    32b      0
eft                      1       0  0.0    4.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       4       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             4       0  0.0    8.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  372.7   0   0     0     0      0      0
syslog-msgs              4       0  0.0    5.4   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.4   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0


# fmdump -v -u d82d1716-c920-6243-e899-b7ddd386902e
TIME                 UUID                                 SUNW-MSG-ID
Feb 18 09:51:49.3025 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
  100%  fault.fs.zfs.vdev.checksum

        Problem in: 
           Affects: zfs://pool=pool1/vdev=449a3328bc444732
               FRU: -
          Location: -

# fmadm repair zfs://pool=pool1/vdev=449a3328bc444732
fmadm: recorded repair to zfs://pool=pool1/vdev=449a3328bc444732

# fmadm faulty

Problem 2: Device failure

Time to do a little more harm. This time I will simulate the outright failure of a device by removing the fake device file. Again we will access the pool and then consult fmstat to see what is happening (are you noticing a pattern here?).
# rm -f /dev/dsk/disk2
# cd /pool1
# find . -print | cpio -oc > /dev/null
416027 blocks

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  214.2   0   0     0     0    32b      0
eft                      1       0  0.0    4.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       4       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    1.1   0   0     0     0      0      0
snmp-trapgen             4       0  0.0    8.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  372.7   0   0     0     0      0      0
syslog-msgs              4       0  0.0    5.4   0   0     0     0      0      0
zfs-diagnosis            0       0  0.0    1.4   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0

Rats - the find was satisfied entirely from cache left over from the last example, so no I/O ever reached the missing device. As before, should this happen, proceed directly to zpool scrub.
# zpool scrub pool1
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  190.5   0   0     0     0    32b      0
eft                      1       0  0.0    4.1   0   0     0     0   1.4M      0
fmd-self-diagnosis       5       0  0.0    0.5   0   0     0     0      0      0
io-retire                1       0  0.0    1.0   0   0     0     0      0      0
snmp-trapgen             6       0  0.0    7.4   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  329.0   0   0     0     0      0      0
syslog-msgs              6       0  0.0    4.6   0   0     0     0      0      0
zfs-diagnosis           16       1  0.0   70.3   0   0     1     1   168b   140b
zfs-retire               1       0  0.0  509.8   0   0     0     0      0      0

Again, hot sparing has kicked in automatically. The evidence of this is the zfs-retire engine running.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 11:07:29 50ea07a0-2cd9-6bfb-ff9e-e219740052d5  ZFS-8000-D3    Major    
Feb 18 11:16:43 06bfe323-2570-46e8-f1a2-e00d8970ed0d

Fault class : fault.fs.zfs.device

Description : A ZFS device failed.  Refer to http://sun.com/msg/ZFS-8000-D3 for
              more information.

Response    : No automated response will occur.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress, 4.94% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            disk1     ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              disk2   UNAVAIL      0     0     0  cannot open
              spare1  ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors

As before, this tells us all that we need to know. A device (disk2) has failed and is no longer in operation. Sufficient spares existed and one was automatically attached to the damaged pool. Resilvering completed successfully and the data is once again fully mirrored.

But here's the magic. Let's repair the device - again simulated, this time by recreating our fake device file.
# mkfile 1g /dev/dsk/disk2
# zpool replace pool1 disk2
# zpool status pool1 
  pool: pool1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 4.86% done, 0h1m to go
config:

        NAME               STATE     READ WRITE CKSUM
        pool1              DEGRADED     0     0     0
          mirror           DEGRADED     0     0     0
            disk1          ONLINE       0     0     0
            spare          DEGRADED     0     0     0
              replacing    DEGRADED     0     0     0
                disk2/old  UNAVAIL      0     0     0  cannot open
                disk2      ONLINE       0     0     0
              spare1       ONLINE       0     0     0
        spares
          spare1           INUSE     currently in use

errors: No known data errors

Get a cup of coffee while the resilvering process runs.
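
If you would rather have the shell do the waiting, a simple loop like this (just a convenience of mine, nothing ZFS specific) returns when the resilver finishes:
# while zpool status pool1 | grep "resilver in progress" > /dev/null
> do
>     sleep 30
> done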
# zpool status
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    AVAIL   


# fmadm faulty

Notice the nice integration with FMA. Not only was the new device resilvered, but the hot spare was detached and the FMA fault was cleared. The fmstat counters still show that there was a problem, and the fault report still exists in the fault log for later interrogation.
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  171.5   0   0     0     0    32b      0
eft                      1       0  0.0    3.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       6       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    0.9   0   0     0     0      0      0
snmp-trapgen             6       0  0.0    6.8   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  294.3   0   0     0     0      0      0
syslog-msgs              6       0  0.0    4.2   0   0     0     0      0      0
zfs-diagnosis           36       1  0.0   51.6   0   0     0     1      0      0
zfs-retire               1       0  0.0  170.0   0   0     0     0      0      0

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Feb 16 11:38:16.0976 48935791-ff83-e622-fbe1-d54c20385afc ZFS-8000-GH
Feb 16 11:38:30.8519 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233 ZFS-8000-GH
Feb 18 09:51:49.3025 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713 ZFS-8000-GH
Feb 18 09:56:24.8029 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
Feb 18 10:23:07.2228 7c04a6f7-d22a-e467-c44d-80810f27b711 ZFS-8000-GH
Feb 18 10:25:14.6429 faca0639-b82b-c8e8-c8d4-fc085bc03caa ZFS-8000-GH
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3
Feb 18 11:16:44.2497 06bfe323-2570-46e8-f1a2-e00d8970ed0d ZFS-8000-D3


# fmdump -V -u 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
TIME                 UUID                                 SUNW-MSG-ID
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3

  TIME                 CLASS                                 ENA
  Feb 18 11:07:27.8476 ereport.fs.zfs.vdev.open_failed       0xb22406c635500401

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
        code = ZFS-8000-D3
        diag-time = 1203354449 236999
        de = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = fmd
                authority = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        product-id = Dimension XPS                
                        chassis-id = 7XQPV21
                        server-id = arrakis
                (end authority)

                mod-name = zfs-diagnosis
                mod-version = 1.0
        (end de)

        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = fault.fs.zfs.device
                certainty = 0x64
                asru = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x3a2ca6bebd96cfe3
                        vdev = 0xedef914b5d9eae8d
                (end asru)

                resource = (embedded nvlist)
                nvlist version: 0
                        version = 0x0
                        scheme = zfs
                        pool = 0x3a2ca6bebd96cfe3
                        vdev = 0xedef914b5d9eae8d
                (end resource)

        (end fault-list[0])

        fault-status = 0x3
        __ttl = 0x1
        __tod = 0x47b9bb51 0x1ef7b430

# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset

# fmadm reset zfs-retire
fmadm: zfs-retire module has been reset

Problem 3: Unrecoverable corruption

As those of you who have attended one of my Boot Camps or Solaris Best Practices training classes know, House is one of my favorite TV shows - the only one that I watch regularly. And this next example would make a perfect episode. Is it likely to happen? No, but it is so cool when it does :-)

Remember our second pool, pool2. It has the same contents as pool1. Now, let's do the unthinkable - let's corrupt both halves of the mirror. Surely data loss will follow, but the fact that Solaris stays up and running and can report what happened is pretty spectacular. But it gets so much better than that.
# dd if=/dev/zero of=/dev/dsk/disk3 bs=8192 count=10000 conv=notrunc
# dd if=/dev/zero of=/dev/dsk/disk4 bs=8192 count=10000 conv=notrunc
# zpool scrub pool2

# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.5   0   0     0     0      0      0
disk-transport           0       0  0.0  166.0   0   0     0     0    32b      0
eft                      1       0  0.0    3.6   0   0     0     0   1.4M      0
fmd-self-diagnosis       6       0  0.0    0.6   0   0     0     0      0      0
io-retire                1       0  0.0    0.9   0   0     0     0      0      0
snmp-trapgen             8       0  0.0    6.3   0   0     0     0    32b      0
sysevent-transport       0       0  0.0  294.3   0   0     0     0      0      0
syslog-msgs              8       0  0.0    3.9   0   0     0     0      0      0
zfs-diagnosis         1032    1028  0.6   39.7   0   0    93     2    15K    13K
zfs-retire               2       0  0.0  158.5   0   0     0     0      0      0

As before, lots of zfs-diagnosis activity, and two hits to zfs-retire. But we only have one spare - this should be interesting. Let's see what is happening.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    
Feb 18 13:18:42 c3889bf1-8551-6956-acd4-914474093cd7

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 16 11:38:30 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233  ZFS-8000-GH    Major    
Feb 18 09:51:49 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713
Feb 18 10:23:07 7c04a6f7-d22a-e467-c44d-80810f27b711
Feb 18 13:18:42 0a1bf156-6968-4956-d015-cc121a866790

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

# zpool status -x
  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         DEGRADED     0     0 2.60K
          mirror      DEGRADED     0     0 2.60K
            spare     DEGRADED     0     0 2.43K
              disk3   DEGRADED     0     0 5.19K  too many errors
              spare1  DEGRADED     0     0 2.43K  too many errors
            disk4     DEGRADED     0     0 5.19K  too many errors
        spares
          spare1      INUSE     currently in use

errors: 247 data errors, use '-v' for a list

So ZFS tried to bring in a hot spare, but there were insufficient replicas to reconstruct all of the data. But here is where it gets interesting. Let's see what zpool status -v says about things.
# zpool status -v
  pool: pool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk1   ONLINE       0     0     0
            disk2   ONLINE       0     0     0
        spares
          spare1    INUSE     in use by pool 'pool2'

errors: No known data errors

  pool: pool2
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:

        NAME          STATE     READ WRITE CKSUM
        pool2         DEGRADED     0     0 2.60K
          mirror      DEGRADED     0     0 2.60K
            spare     DEGRADED     0     0 2.43K
              disk3   DEGRADED     0     0 5.19K  too many errors
              spare1  DEGRADED     0     0 2.43K  too many errors
            disk4     DEGRADED     0     0 5.19K  too many errors
        spares
          spare1      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        /pool2/scenic/cider mill crowds.jpg
        /pool2/scenic/Cleywindmill.jpg
        /pool2/scenic/csg_Landscapes001_GrandTetonNationalPark,Wyoming.jpg
        /pool2/scenic/csg_Landscapes002_ElowahFalls,Oregon.jpg
        /pool2/scenic/csg_Landscapes003_MonoLake,California.jpg
        /pool2/scenic/csg_Landscapes005_TurretArch,Utah.jpg
        /pool2/scenic/csg_Landscapes004_Wildflowers_MountRainer,Washington.jpg
        /pool2/scenic/csg_Landscapes!idx011.jpg
        /pool2/scenic/csg_Landscapes127_GreatSmokeyMountains-NorthCarolina.jpg
        /pool2/scenic/csg_Landscapes129_AcadiaNationalPark-Maine.jpg
        /pool2/scenic/csg_Landscapes130_GettysburgNationalPark-Pennsylvania.jpg
        /pool2/scenic/csg_Landscapes131_DeadHorseMill,CrystalRiver-Colorado.jpg
        /pool2/scenic/csg_Landscapes132_GladeCreekGristmill,BabcockStatePark-WestVirginia.jpg
        /pool2/scenic/csg_Landscapes133_BlackwaterFallsStatePark-WestVirginia.jpg
        /pool2/scenic/csg_Landscapes134_GrandCanyonNationalPark-Arizona.jpg
        /pool2/scenic/decisions decisions.jpg
        /pool2/scenic/csg_Landscapes135_BigSur-California.jpg
        /pool2/scenic/csg_Landscapes151_WataugaCounty-NorthCarolina.jpg
        /pool2/scenic/csg_Landscapes150_LakeInTheMedicineBowMountains-Wyoming.jpg
        /pool2/scenic/csg_Landscapes152_WinterPassage,PondMountain-Tennessee.jpg
        /pool2/scenic/csg_Landscapes154_StormAftermath,OconeeCounty-Georgia.jpg
        /pool2/scenic/Brig_Of_Dee.gif
        /pool2/scenic/pvnature14.gif
        /pool2/scenic/pvnature22.gif
        /pool2/scenic/pvnature7.gif
        /pool2/scenic/guadalupe.jpg
        /pool2/scenic/ernst-tinaja.jpg
        /pool2/scenic/pipes.gif
        /pool2/scenic/boat.jpg
        /pool2/scenic/pvhawaii.gif
        /pool2/scenic/cribgoch.jpg
        /pool2/scenic/sun1.gif
        /pool2/scenic/sun1.jpg
        /pool2/scenic/sun2.jpg
        /pool2/scenic/andes.jpg
        /pool2/scenic/treesky.gif
        /pool2/scenic/sailboatm.gif
        /pool2/scenic/Arizona1.jpg
        /pool2/scenic/Arizona2.jpg
        /pool2/scenic/Fence.jpg
        /pool2/scenic/Rockwood.jpg
        /pool2/scenic/sawtooth.jpg
        /pool2/scenic/pvaptr04.gif
        /pool2/scenic/pvaptr07.gif
        /pool2/scenic/pvaptr11.gif
        /pool2/scenic/pvntrr01.jpg
        /pool2/scenic/Millport.jpg
        /pool2/scenic/bryce2.jpg
        /pool2/scenic/bryce3.jpg
        /pool2/scenic/monument.jpg
        /pool2/scenic/rainier1.gif
        /pool2/scenic/arch.gif
        /pool2/scenic/pv-anzab.gif
        /pool2/scenic/pvnatr15.gif
        /pool2/scenic/pvocean3.gif
        /pool2/scenic/pvorngwv.gif
        /pool2/scenic/pvrmp001.gif
        /pool2/scenic/pvscen07.gif
        /pool2/scenic/pvsltd04.gif
        /pool2/scenic/banhall28600-04.JPG
        /pool2/scenic/pvwlnd01.gif
        /pool2/scenic/pvnature08.gif
        /pool2/scenic/pvnature13.gif
        /pool2/scenic/nokomis.jpg
        /pool2/scenic/lighthouse1.gif
        /pool2/scenic/lush.gif
        /pool2/scenic/oldmill.gif
        /pool2/scenic/gc1.jpg
        /pool2/scenic/gc2.jpg
        /pool2/scenic/canoe.gif
        /pool2/scenic/Donaldson-River.jpg
        /pool2/scenic/beach.gif
        /pool2/scenic/janloop.jpg
        /pool2/scenic/grobacro.jpg
        /pool2/scenic/fnlgld.jpg
        /pool2/scenic/bells.gif
        /pool2/scenic/Eilean_Donan.gif
        /pool2/scenic/Kilchurn_Castle.gif
        /pool2/scenic/Plockton.gif
        /pool2/scenic/Tantallon_Castle.gif
        /pool2/scenic/SouthStockholm.jpg
        /pool2/scenic/BlackRock_Cottage.jpg
        /pool2/scenic/seward.jpg
        /pool2/scenic/canadian_rockies_csg110_EmeraldBay.jpg
        /pool2/scenic/canadian_rockies_csg111_RedRockCanyon.jpg
        /pool2/scenic/canadian_rockies_csg112_WatertonNationalPark.jpg
        /pool2/scenic/canadian_rockies_csg113_WatertonLakes.jpg
        /pool2/scenic/canadian_rockies_csg114_PrinceOfWalesHotel.jpg
        /pool2/scenic/canadian_rockies_csg116_CameronLake.jpg
        /pool2/scenic/Castilla_Spain.jpg
        /pool2/scenic/Central-Park-Walk.jpg
        /pool2/scenic/CHANNEL.JPG



In my best Hugh Laurie voice, trying to sound very Northeastern American: that is so cool! But we're not even done yet. Let's take this list of files and restore them - in this case, from pool1. Operationally this would be from a backup tape or a nearline backup cache, but for our purposes the contents of pool1 will do nicely.

First, let's clear the zpool error counters and return the spare disk. We want to make sure that our restore works as desired. Oh, and clear the FMA stats while we're at it.
# zpool clear pool2
# zpool detach pool2 spare1

# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset

# fmadm reset zfs-retire   
fmadm: zfs-retire module has been reset

Now individually restore the files that have errors in them and check again. You can even export and reimport the pool, and you will find a very nice, happy, and thoroughly error-free ZFS pool. Some rather unpleasant gnashing of the zpool status -v output with awk has been omitted for sanity's sake.
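
For the curious, that gnashing looked roughly like the following - a sketch (run from ksh or bash) rather than the exact incantation I used, and it assumes the directory layout is identical in both pools:
# zpool status -v pool2 | sed -n 's|^ *\(/pool2/.*\)|\1|p' | while IFS= read -r f
> do
>     cp "/pool1${f#/pool2}" "$f"
> done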
# zpool scrub pool2
# zpool status pool2
  pool: pool2
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Feb 18 14:04:56 2008
config:

        NAME        STATE     READ WRITE CKSUM
        pool2       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            disk3   ONLINE       0     0     0
            disk4   ONLINE       0     0     0
        spares
          spare1    AVAIL   

errors: No known data errors

# zpool export pool2
# zpool import pool2
# dircmp -s /pool1 /pool2

Conclusions and Review

So what have we learned? ZFS and FMA are two great tastes that taste great together. No, that's chocolate and peanut butter, but you get the idea. One more great example of Isaac's Multiplicity of Solaris.

That, and I have finally found a good lab exercise for the FMA training materials. Ever since Christine Tran put the FMA workshop together, we have been looking for some good FMA lab exercises. The materials reference a synthetic fault generator that is not publicly available (for obvious reasons). I haven't explored the FMA test harness enough to know if there is anything in there that would make a good lab. But the exercise we have just walked through seems to tie a number of key pieces together.

And of course, one more reason why Roxy says, "You should run Solaris."

Comments:

Bob,

What great blogs you write! The knowledge I receive every time I read them is unbelievable.

Thanks
Mark

Posted by Mark Huff on February 20, 2008 at 08:47 AM CST #

Great example of taking a real world problem and showing how Solaris behaves. Believe Roxy says spare1=disk5 ? ;)

Posted by Isaac on February 24, 2008 at 04:37 AM CST #

and thx for the Multiplicity plug!

Posted by isaac on February 24, 2008 at 04:40 AM CST #

I guess I could just ask you directly (you're about 5 meters away from me), but I'm seeing 149 events for zfs-diagnosis on a Thumper, but nothing faulted. I assume these are zfs checksum errors, automagically fixed by the zfs gods?

bash-3.00# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  0.0    0.0   0   0     0     0      0      0
disk-monitor             0       0  0.0    0.0   0   0     0     0    54K      0
disk-transport           0       0  0.0 20680.8  0   1     0     0    32b      0
eft                      0       0  0.0    3.7   0   0     0     0   1.4M      0
fabric-xlate             0       0  0.0    0.0   0   0     0     0      0      0
fmd-self-diagnosis       0       0  0.0    0.0   0   0     0     0      0      0
io-retire                0       0  0.0    0.0   0   0     0     0      0      0
snmp-trapgen             0       0  0.0    0.0   0   0     0     0      0      0
sp-monitor               0       0  0.0    3.6   0   0     0     0    20b      0
sysevent-transport       0       0  0.0  161.7   0   0     0     0      0      0
syslog-msgs              0       0  0.0    0.0   0   0     0     0    32b      0
zfs-diagnosis          149       0  0.0    0.8   0   0     0     0      0      0
zfs-retire               0       0  0.0    0.0   0   0     0     0      0      0

Posted by Charles Soto on April 20, 2009 at 03:51 AM CDT #

