corrupted files and 'zpool status -v'

If ZFS detects a checksum error or a read I/O failure and cannot correct it (say, by successfully reading the other side of a mirror), it keeps a persistent log of the objects that are permanently damaged (perhaps due to silent corruption).

Previously (that is, before snv_57), the output we gave was only somewhat useful:

# zpool status -v
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE      26     0     0
          c1t1d0s7  ONLINE      12     0     0
          c1t1d0s6  ONLINE      14     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          0x0      0x13    lvl=0 blkid=0
          0x5      0x4     lvl=0 blkid=0
          0x17     0x4     lvl=0 blkid=0
          0x1d     0x4     lvl=0 blkid=0
          0x24     0x5     lvl=0 blkid=0
          0x2a     0x4     lvl=0 blkid=0
          0x2a     0x6     lvl=0 blkid=0
          0x30     0x4     lvl=0 blkid=0
          0x36     0x0     lvl=0 blkid=2

If you were lucky, the DATASET object number would actually get converted into a dataset name. If it didn't, you had to use zdb(1M) to figure out the dataset's name and mountpoint, and then use the '-inum' option to find(1) to figure out what the actual file was (see the opensolaris thread on it). While it is really powerful to even have this ability, it would be much nicer to have ZFS do all the dirty work for you - we are, after all, shooting for easy administration.
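That old workflow can be sketched as follows. The zdb step needs a real pool, so it is only shown as a comment; the find -inum step is demonstrated on an ordinary file, and all paths below are made up for illustration:

```shell
#!/bin/sh
# Pre-snv_57 workflow sketch: turning object numbers into filenames by hand.

# Step 1 (commented out, needs a real pool): map the DATASET number to a
# dataset name/mountpoint with zdb, e.g.:
#   zdb -d monkey        # lists the pool's datasets and their numbers

# Step 2: a plain file's OBJECT number is its inode number, so once you
# know the mountpoint you can locate the file with find -inum.
# Demonstrated here on an ordinary file:
mkdir -p /tmp/inum_demo/sub/dir
echo data > /tmp/inum_demo/sub/dir/victim.txt
inum=$(ls -i /tmp/inum_demo/sub/dir/victim.txt | awk '{print $1}')
find /tmp/inum_demo -inum "$inum"
```

find then prints the path that corresponds to that inode number - the same trick, applied at a dataset's mountpoint, recovers the filename for a damaged object.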

With the putback of 6410433 "'zpool status -v' would be more useful with filenames", observability has been greatly improved:

# zpool status -v
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE      24     0     0
          c1t1d0s6  ONLINE      10     0     0
          c1t1d0s7  ONLINE      14     0     0

errors: Permanent errors have been detected in the following files:

        /monkey/a.txt
        /monkey/bananas/b.txt
        /eric/c.txt
        /monkey/sub/dir/d.txt
        monkey/ghost:/e.txt
        monkey/ghost:/boo/f.txt
        monkey/dnode:<0x0>
        <metadata>:<0x13>

For the listings above, we attempt to print out the full path to the file. If we successfully find the full path and the dataset is mounted, then we print the full path with a leading "/" (as in the "/monkey/a.txt" example above). If we successfully find it but the dataset is not mounted, then we print the dataset name (no leading "/"), followed by the path within the dataset to the file (see the "monkey/ghost:/e.txt" example above).

If we can't translate the object number to a file path (either due to an error, or because the object doesn't have a real file path associated with it, as is the case for, say, a dnode_t), then we print the dataset name followed by the object number (as in the "monkey/dnode:<0x0>" case above). If an object in the MOS gets corrupted, then we print the special tag <metadata>, followed by the object number.
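Those naming rules amount to a small decision tree. Here is a toy sketch of the display logic - the function and its arguments are my own illustration, not ZFS code:

```shell
#!/bin/sh
# Toy rendering of the 'zpool status -v' naming rules described above.
# Args: dataset name ("" for a MOS object), path ("" when the object has
# no file path), mounted flag (yes/no), and the object number.
render_error() {
  ds=$1; path=$2; mounted=$3; obj=$4
  if [ -z "$ds" ]; then
    echo "<metadata>:<$obj>"     # corrupted object in the MOS
  elif [ -z "$path" ]; then
    echo "$ds:<$obj>"            # no file path (e.g. a dnode_t)
  elif [ "$mounted" = "yes" ]; then
    echo "/$path"                # mounted: full path with leading "/"
  else
    echo "$ds:/$path"            # unmounted: dataset name, then path
  fi
}

render_error monkey       "monkey/a.txt" yes 0x4   # -> /monkey/a.txt
render_error monkey/ghost "e.txt"        no  0x5   # -> monkey/ghost:/e.txt
render_error monkey/dnode ""             no  0x0   # -> monkey/dnode:<0x0>
render_error ""           ""             no  0x13  # -> <metadata>:<0x13>
```

The four calls reproduce the four kinds of lines in the example output above.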

Couple this with background scrubbing and you have very impressive fault management and observability. What other filesystem/storage system can give you this ability?

Note: these changes are in snv_57, will hopefully make s10u4, and perhaps even Leopard :)

If you're stuck on old bits (without the above-mentioned changes) and are trying to figure out how to translate object numbers to filenames, then check out this thread.

Comments:

Hi Eric, This caused me to ask a few questions. 1) How did this corruption happen? With COW, I thought ZFS was pretty much bullet proof. 2) How often does ZFS break like this? 3) Is it possible to build 2 raidz2 pools and then mirror them in ZFS? Regards, Scott

Posted by Scott Johnson on March 05, 2007 at 09:44 PM PST #

I purposely caused the corruption using a special in-house program (zinject).

ZFS is bullet proof in the sense that it uses checksums to provide end-to-end integrity, which means it won't return corrupted data to the user/app. Neither ZFS nor any other filesystem can actually stop a device from corrupting data. Imagine the root user dd'ing over your devices - ZFS can't stop that. Or a monster truck trampling your disks - ZFS can't stop that either. What ZFS can do is not return bad data.

Posted by eric kustarz on March 06, 2007 at 05:47 AM PST #

"If an object in the MOS gets corrupted then we print out the special tag of <metadata>, followed by the object number." But what is "MOS"?

Posted by UX-admin on March 14, 2007 at 07:30 PM PDT #

The MOS is the Meta Object Set. You have one per pool.

Check out:
http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

Posted by eric kustarz on March 15, 2007 at 02:09 AM PDT #
