ZFS saved my data. Right now.

As you know, I have a server at home that stores all my photos, music, backups and more on the Solaris ZFS filesystem. You could say that I store my life on my server.

For storage, I use Western Digital's MyBook Essential Edition USB drives because they are the cheapest ones I could find from a well-known brand. The packaging says "Put your life on it!". How fitting.

Last week, I had a team meeting where a colleague introduced us to some performance tuning techniques. When we started playing with iostat(1M), I logged into my server to do some stress tests. That was when my server told me something like this:

constant@condorito:~$ zpool status

(data from other pools omitted)

  pool: santiago
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 16h28m with 0 errors on Fri Aug  8 11:19:37 2008
config:

	NAME         STATE     READ WRITE CKSUM
	santiago     DEGRADED     0     0     0
	  mirror     DEGRADED     0     0     0
	    c10t0d0  DEGRADED     0     0   135  too many errors
	    c9t0d0   DEGRADED     0     0    20  too many errors
	  mirror     ONLINE       0     0     0
	    c8t0d0   ONLINE       0     0     0
	    c7t0d0   ONLINE       0     0     0

errors: No known data errors

This tells us 3 important things:

  • Two of my disks (c10t0d0 and c9t0d0) are happily handing me back garbage instead of my data, without even knowing it.
    Thanks to ZFS' checksumming, we can detect this, even though the drive thinks everything is ok.
    No other storage device, RAID array, NAS or file system I know of can do this. Not even the increasingly hyped (and admittedly cool-looking) Drobo [1].
  • Because both drives are configured as a mirror, bad data from one device can be corrected by reading good data from the other device. This is the "applications are unaffected" and "no known data errors" part.
    Again, it's the checksums that enable ZFS to distinguish good data blocks from bad ones, and thus to heal itself while the system reads data from disk.
    As a result, even though neither disk is functioning properly, my data is still safe, because the erroneous blocks don't overlap in terms of what pieces of data they store (luckily, albeit with millions of blocks per disk, statistics are on my side here).
    Again, no other storage technology can do this. RAID arrays only kick in when a disk drive becomes inaccessible as a whole or diagnoses itself as broken. They do nothing against silent data corruption, which is what we see here, and which everyone on this planet who doesn't use ZFS (yet) can't see (yet). Until it's too late.
  • Data hygiene is a good thing. Do a "zpool scrub <poolname>" once in a while. Use cron(1M) to automate this, for example every other week for all pools.
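A minimal crontab sketch of this schedule (the pool name "santiago" is from this post; swap in your own pools and adjust the times to taste):

```shell
# Add to root's crontab, e.g. via: crontab -e
# Field order: minute hour day-of-month month day-of-week
# Run at 02:00 on the 1st and 15th of every month -- roughly
# every other week. Repeat the line once per pool.
0 2 1,15 * * /usr/sbin/zpool scrub santiago
```

A scrub runs in the background, so scheduling it for quiet hours just keeps it from competing with interactive use.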

Over the weekend, I ordered myself a new disk (sheesh, the price dropped EUR 5 after just 5 days...) and after a "zpool replace santiago c10t0d0 c11t0d0" on Monday, my pool started resilvering:

constant@condorito:~$ zpool status

(data from other pools omitted)

  pool: santiago
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 1h13m, 6.23% done, 18h23m to go
config:

        NAME           STATE     READ WRITE CKSUM
        santiago       DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            replacing  DEGRADED     0     0     0
              c10t0d0  DEGRADED     0     0   135  too many errors
              c11t0d0  ONLINE       0     0     0
            c9t0d0     DEGRADED     0     0    20  too many errors
          mirror       ONLINE       0     0     0
            c8t0d0     ONLINE       0     0     0
            c7t0d0     ONLINE       0     0     0

errors: No known data errors

The next step for me is to send the c10t0d0 drive back and ask for a replacement under warranty (it's only a couple of months old). After receiving c10's replacement, I'll consider sending in c9 for replacement (depending on how the next scrub goes).

Which makes me wonder: How will drive manufacturers react to a new wave of warranty cases based on drive errors that were not easily detectable before?

[1] To the guys at Drobo: Of course you're invited to implement ZFS into the next revision of your products. It's open source. In fact, Drobo and ZFS would make a perfect team!

Comments:

This is Tom from Data Robotics, makers of Drobo. ZFS is definitely an interesting technology and one that we'll explore as it could be complementary to Drobo as you point out. Right now though we haven't seen very wide adoption of ZFS and it is still a little rough around the edges from a user interface point of view (especially relative to Drobo) :)

Also, in the situation you pointed out, your data would have been just fine on a Drobo. Please see the following blog posting to learn more:
http://www.drobospace.com/blog/entry/11007/How-Does-Drobo-Protect-My-Data-/

See the "soft failures" section.

That said, I do understand that ZFS has some unique features--so thanks very much for the feedback. Feel free to email me with more thoughts!

Posted by Tom on August 12, 2008 at 12:42 PM CEST #

It's quite simple: if you like your data, you keep it on sensible filesystems.

And before something like ReiserFS breaks down on its own, I'd rather delete my data myself ;)

Posted by Sebastian on August 12, 2008 at 02:43 PM CEST #

So ... before you return the disks and foist the issue onto the disk vendors: what common components in the data path can you identify that might be responsible for the corruption? Especially if you've been running happily for a while and then corruption appears on two distinct devices in a similar timeframe ;-)

Common driver, PCI bus, PCI card, USB controller, etc ... there are a number of documented cases of power supply issues causing corruption which was picked up by ZFS checksumming for instance ...

Just a thought, but a bit of SGR may be called for!

Posted by Craig Morgan on August 12, 2008 at 03:20 PM CEST #

Hi Constantin, sorry to hear of your drive errors. As it's the summer, and you have USB disks enclosed in a small box, could it be that the drives are overheating? What happens if you run this script to check the drive temps?
http://breden.org.uk/2008/05/16/home-fileserver-drive-temps/

I'm running my drives around 40 degrees C and, so far, in 7 months of operation, I have not seen any read, write or checksum errors after scrubbing the pool.

Cheers,
Simon

Posted by Simon Breden on August 12, 2008 at 04:05 PM CEST #

Hi,

first of all: thank you all for your comments. I wasn't expecting that many comments in such a short time!

Tom, thanks for commenting, and let me point out that Drobo is truly a brilliant concept. You mention the ability of drives to detect broken blocks upon reading (soft failures). That kind of detection relies on the data block having been transported to the drive correctly in the first place; as Craig pointed out, the issue might have been the power supply jamming the bus or the USB connection. Also, the drive only uses 8 bits per block for error detection, which is not enough to detect multiple bit errors per block. In this particular case, the drive did not report any read errors (I checked the logfiles). The checksum errors that ZFS reported were at the ZFS level, not the drive level, so ZFS could detect what the drive couldn't.
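For anyone who wants to see this detect-and-repair behavior for themselves, it can be reproduced on a throwaway pool built from plain files. This is only a sketch (it needs root on a Solaris box with ZFS, and the pool name, paths and sizes are arbitrary examples):

```shell
# Build a mirrored scratch pool from two 128 MB backing files.
mkfile 128m /var/tmp/vdev1 /var/tmp/vdev2
zpool create scratch mirror /var/tmp/vdev1 /var/tmp/vdev2

# Put some data on it so the scrub has something to verify.
dd if=/dev/urandom of=/scratch/testfile bs=1024k count=64

# Simulate silent corruption: scribble over part of one mirror side
# behind ZFS's back. Seeking past the first few MB avoids the vdev
# labels at the front of the file.
dd if=/dev/urandom of=/var/tmp/vdev1 bs=1024k seek=10 count=10 conv=notrunc

# The scrub reads every block, notices the bad checksums on vdev1
# and repairs them from the intact copy on vdev2.
zpool scrub scratch
zpool status scratch    # CKSUM errors on vdev1, yet no data errors

# Clean up.
zpool destroy scratch
rm /var/tmp/vdev1 /var/tmp/vdev2
```

The drive (here, the backing file) never reported an error; only the end-to-end checksums revealed the damage, which is exactly what happened on my pool.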

In a possible Drobo scenario, the controller inside the Drobo box could implement ZFS and provide better data protection. Even better would be a ZFS implementation at the driver level on the host, but that may be difficult to get the user base to accept.

To Craig: Yes, the errors could have been caused by common components such as the motherboard, the USB controller, etc. Incidentally, both drives that exhibited errors are 1 TB models, while the other two are only 512 GB. Also, one of the failing drives reported (retryable) read errors a while ago, so in this particular case I suspect that 1 TB drives are not (yet?) as reliable as 512 GB ones. Let's see what WD says.

To Simon: Thanks for the pointer. I'll download and install the monitoring scripts and check the temperatures. My server is in a basement and the drives are set up with their vent holes facing up, so I think I'm OK here, but it never hurts to check.
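For reference, if smartmontools happens to be installed, the drive's own temperature sensor can also be queried directly. A rough sketch (the device path is just an example, and reading SMART data through a USB bridge may or may not work depending on the enclosure):

```shell
# Print the drive's SMART attribute table, which usually includes
# a Temperature_Celsius row. "-d sat" tells smartctl to talk
# SCSI-to-ATA translation through the USB bridge.
smartctl -d sat -A /dev/rdsk/c10t0d0s0
```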

Again, thanks and keep up the good comments!

Cheers,
Constantin

Posted by Constantin Gonzalez on August 13, 2008 at 02:58 AM CEST #
