I'm the kind of guy who likes to tinker. To see under the bonnet. I
used to have a go at "fixing" TVs by taking the back off and seeing
what could be adjusted (which is kind of anathema to one of the
philosophies of ZFS).
So, when I have been presenting and demonstrating ZFS to customers,
the thing I really like to show is what ZFS does when I inject "silent
data corruption" into one device in a mirrored storage pool.
This is cool, because ZFS does a couple of things that are not done by
any comparable product:
- It detects the corruption by using checksums on all data and metadata.
- It automatically repairs the damage, using data from the other
  mirror, assuming the checksum(s) on that mirror are OK.
This all happens before the data is passed off to the process
that asked for it. This is how it looks in slideware:

    [slide: an application read from a mirrored pool - the bad copy is
    caught by its checksum and repaired from the good mirror]
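For those who prefer code to slides, here is a toy model of that read
path, written as shell. This is entirely my own illustration, not ZFS
source: the file names ("mirror0.copy", "mirror1.copy",
"checksum.expected") are hypothetical stand-ins, and real ZFS does this
per block, in the kernel, with the checksum stored in the parent
metadata rather than next to the data.

    #!/bin/ksh
    # Toy self-healing mirrored read (illustration only).
    expected=$(cat checksum.expected)   # checksum stored away from the data
    for side in mirror0.copy mirror1.copy; do
        actual=$(cksum < "$side" | awk '{print $1}')
        if [ "$actual" = "$expected" ]; then
            cat "$side"                 # hand verified data to the caller
            for other in mirror0.copy mirror1.copy; do
                [ "$other" = "$side" ] && continue
                # self-heal: rewrite any copy that fails its checksum
                [ "$(cksum < "$other" | awk '{print $1}')" = "$expected" ] || \
                    cp "$side" "$other"
            done
            exit 0
        fi
    done
    echo "unrecoverable error: no copy matched its checksum" >&2
    exit 1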
The key to demonstrating this live is finding a way to inject
corruption without having to apply a magnet or a lightning bolt to my
disk. Here is my version of such a demonstration:
1. Create a mirrored storage pool and a filesystem:

    cleek[bash]# zpool create demo mirror /export/zfs/zd0 /export/zfs/zd1
    cleek[bash]# zfs create demo/ccs
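(If you want to follow along: /export/zfs/zd0 and /export/zfs/zd1 are
not real disks, but plain files used as vdevs, which ZFS is quite happy
with for testing. I'm assuming here that they were pre-created at
roughly 256MB each, to match the available space shown below; mkfile is
the stock Solaris way to do that.)

    cleek[bash]# mkdir -p /export/zfs
    cleek[bash]# mkfile 256m /export/zfs/zd0 /export/zfs/zd1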
2. Load up some data into that filesystem, see how we are doing:

    cleek[bash]# cp -pr /usr/ccs/bin /demo/ccs
    cleek[bash]# zfs list
    NAME       USED  AVAIL  REFER  MOUNTPOINT
    demo      2.57M   231M  9.00K  /demo
    demo/ccs  2.51M   231M  2.51M  /demo/ccs
3. Get a personal checksum of all the data in the files. The "find/cat"
   will output the contents of all files, then I pipe all that data
   into "cksum":

    cleek[bash]# cd /demo/ccs
    cleek[bash]# find . -type f -exec cat {} + | cksum
    1891695928      2416605
4. Now for the fun part. I will inject some corruption by writing some
   zeroes onto the start of one of the mirrors:

    cleek[bash]# dd bs=1024k count=32 conv=notrunc if=/dev/zero of=/export/zfs/zd0
    32+0 records in
    32+0 records out
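An aside on why this doesn't kill the vdev outright: ZFS keeps four
copies of its label on every device, two at the front and two at the
end, so my dd wipes the leading pair but the trailing pair survives.
If you're curious, zdb can dump them (this is my own side-check, not
part of the demo; I'd expect labels 0 and 1 to fail to unpack after the
dd, while labels 2 and 3 remain intact):

    cleek[bash]# zdb -l /export/zfs/zd0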
5. If I re-read the data now, ZFS will not find any problems, and I can
   verify this at any time using "zpool status":

    cleek[bash]# find . -type f -exec cat {} + | cksum
    1891695928      2416605
    cleek[bash]# zpool status demo
      pool: demo
     state: ONLINE
     scrub: none requested
    config:

            NAME                 STATE     READ WRITE CKSUM
            demo                 ONLINE       0     0     0
              mirror             ONLINE       0     0     0
                /export/zfs/zd0  ONLINE       0     0     0
                /export/zfs/zd1  ONLINE       0     0     0
The reason for this is that ZFS still has all the data for this
filesystem cached, so it does not need to read anything from the
storage pool's devices.
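A side-check, if you want to see the cache at work: watch device-level
I/O with "zpool iostat" in another window while re-running the
find/cat. As long as the cache is serving the data, the pool's read
operations should stay at or near zero.

    cleek[bash]# zpool iostat demo 1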
6. To force ZFS' cached data to be flushed, I export and re-import my
   storage pool:

    cleek[bash]# cd /
    cleek[bash]# zpool export -f demo
    cleek[bash]# zpool import -d /export/zfs demo
    cleek[bash]# cd -
    /demo/ccs
7. At this point, I should see that ZFS has found some corrupt
   metadata:

    cleek[bash]# zpool status demo
      pool: demo
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool online' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: none requested
    config:

            NAME                 STATE     READ WRITE CKSUM
            demo                 ONLINE       0     0     0
              mirror             ONLINE       0     0     0
                /export/zfs/zd0  ONLINE       0     0
                /export/zfs/zd1  ONLINE       0     0     0
8. Cool - Solaris Fault Manager at work. I'll bring that mirror back
   online, so ZFS will try using it for what I plan to do next...

    cleek[bash]# zpool online demo /export/zfs/zd0
    Bringing device /export/zfs/zd0 online
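An aside on the Fault Manager: the checksum failures reach it as error
reports (ereports), and you can inspect that telemetry directly with
fmdump. I believe these arrive with a class of ereport.fs.zfs.checksum,
which is what feeds the diagnosis that "zpool status" displayed above.

    cleek[bash]# fmdump -e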
9. Now, I can repeat my read of the data to generate my checksum, and
   check what happens:

    cleek[bash]# find . -type f -exec cat {} + | cksum
    1891695928      2416605

   Note that my checksum is the same.

    cleek[bash]# zpool status
    [...]
            NAME                 STATE     READ WRITE CKSUM
            demo                 ONLINE       0     0     0
              mirror             ONLINE       0     0     0
                /export/zfs/zd0  ONLINE       0     0
                /export/zfs/zd1  ONLINE       0     0     0
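One caveat worth making explicit: a demand read only repairs the blocks
it actually touches. To have ZFS walk the whole pool and repair every
bad copy in one go, I would run a scrub and watch the status:

    cleek[bash]# zpool scrub demo
    cleek[bash]# zpool status demo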
Of course, if I wanted to know the instant things happened, I could
also use DTrace (in another window):
    cleek[bash]# dtrace -n :zfs:zio_checksum_error:entry
    dtrace: description ':zfs:zio_checksum_error:entry' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0  40650          zio_checksum_error:entry
      0  40650          zio_checksum_error:entry
      0  40650          zio_checksum_error:entry
      0  40650          zio_checksum_error:entry
    [...]
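If I wanted a running tally rather than a line per event, a standard
DTrace aggregation over the same probe does it (Ctrl-C prints the
count):

    cleek[bash]# dtrace -n ':zfs:zio_checksum_error:entry { @[probefunc] = count(); }'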
Technorati Tag: ZFS
In step 7, the faulted device still reads as "ONLINE", so it seems somewhat counter-intuitive that it could be re-enabled with "zpool online".
Is that an error in the output of "zpool status"?
The ZFS self-healing feature sounds really good. But based on my experience, there are always some catches with automatic recovery, or at least something to be aware of.
Step 6 is pretty critical. Otherwise, ZFS still has the data cached.