Holy smokes! A holey file!

I was RASing around with ZFS the other day, and managed to find a file which was corrupted.

# zpool scrub zpl_slim
# zpool status -v zpl_slim
  pool: zpl_slim
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.

action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h2m with 1 errors on Tue Mar 11 13:12:42 2008
config:
        NAME        STATE     READ WRITE CKSUM
        zpl_slim    DEGRADED     0     0     9
          c2t0d0s0  DEGRADED     0     0     9

errors: Permanent errors have been detected in the following files:
                /mnt/root/lib/amd64/libc.so.1

# ls -ls /mnt/root/lib/amd64/libc.so.1
4667 -rwxr-xr-x 1 root bin 2984368 Oct 31 18:04 /mnt/root/lib/amd64/libc.so.1

argv! Of course, this particular file is easily extracted from the original media, it does't contain anything unique. For those who might be concerned that it is the C runtime library, and thus very critical to running Solaris, the machine in use is only 32-bit, so the 64-bit (amd64) version of this file is never used. But suppose this were an important file for me and I wanted to recover something from it? This is a more interesting challenge...

First, let's review a little bit about how ZFS works. By default, when ZFS writes anything, it generates a checksum which is recorded someplace else, presumably safe. Actually, the checksum is recorded at least twice, just to be doubly sure it is correct. And that record is also checksummed. Back to the story, the checksum is computed on a block, not for the whole file. This is an important distinction which will come into play later. If we perform a storage pool scrub, ZFS will find the broken file and report it to you (see above), which is a good thing -- much better than simply ignoring it, like many other file systems will do.

OK, so we know that somewhere in the midst of this 2.8 MByte file, we have some corruption. But can we at least recover the bits that aren't corrupted? The answer is yes. But if you try a copy, then it bails with an error.

# cp /mnt/root/lib/amd64/libc.so.1 /tmp
/mnt/root/lib/amd64/libc.so.1: I/O error

Since the copy was not successful, there is no destination file, not even a partial file. It turns out that cp uses mmap(2) to map the input file and copies it to the output file with a big write(2). Since the write doesn't complete correctly, it complains and removes the output file. What we need is something less clever, dd.

# dd if=/mnt/root/lib/amd64/libc.so.1 of=/tmp/whee
read: I/O error
2304+0 records in
2304+0 records out
# ls -ls /tmp/whee
2304 -rw-r--r-- 1 root root 1179648 Mar 12 18:53 /tmp/whee

OK, from this experiment we know that we can get about 1.2 MBytes by directly copying with dd. But this isn't all, or even half of the file. We can get a little more clever than that. To make it simpler, I wrote a little ksh script:

#!/bin/ksh
integer i=0
while ((i < 23))
do
    typeset -RZ2 j=$i
    dd if=$1 of=$2.$j bs=128k iseek=$i count=1
    i=i+1
done

This script will write each of the first 23 128kByte blocks from the first argument (a file) to a unique filename as a number appended to the second argument. dd is really dumb and doesn't offer much error handling which is why I hardwired the count into the script. An enterprising soul with a little bit of C programming skill could do something more complex which handles the more general case. Ok, that was difficult to understand, and I wrote it. To demonstrate, I first appologize for the redundant verbosity:

# ./getaround.ksh libc.so.1 /tmp/zz
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
read: I/O error
0+0 records in
0+0 records out

1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
0+1 records in
0+1 records out
# ls -ls /tmp/zz.\*
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.00
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.01
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.02
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.03
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.04
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.05
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.06
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.07
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.08
   0 -rw-r--r-- 1 root root      0 Mar 12 19:00 /tmp/zz.09
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.10
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.11
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.12
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.13
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.14
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.15
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.16
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.17
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.18
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.19
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.20
 256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.21
 200 -rw-r--r-- 1 root root 100784 Mar 12 19:00 /tmp/zz.22

So we can clearly see that the 10th (128kByte) block is corrupted, but the rest of the blocks are ok. We can now reassemble the file with a zero-filled block.

# dd if=/dev/zero of=/tmp/zz.09 bs=128k count=1
1+0 records in
1+0 records out
# cat /tmp/zz.\* > /tmp/zz
# ls -ls /tmp/zz
5832 -rw-r--r-- 1 root root 2984368 Mar 12 19:03 /tmp/zz

Now I have recreated the file with a zero-filled hole where the data corruption was. Just for grins, if you try to compare with the previous file, you should get what you expect.

# cmp libc.so.1 /tmp/zz+
cmp: EOF on libc.so.1

How is this useful?

Personally, I'm not sure this will be very useful for many corruption cases. As a RAS guy, I advocate many verified copies of important data placed on diverse systems and media. But most folks aren't so inclined. Everytime we talk about this on the zfs-discuss alias, somebody will say that they don't care about corruption in the middle of their mp3 files. I'm no audiophile, but I prefer my mp3s to be hole-less. So I did this little exercise to show how you can regain full access to the non-corrupted bits of a corrupted file in a more-or-less easy way. Consider this a proof of concept. There are many possible variations, such as filling with spaces instead of nulls when you are missing parts of a text file -- opportunities abound.

Comments:

Post a Comment:
Comments are closed for this entry.
About

relling

Search

Archives
« March 2015
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    
       
Today