Wednesday Sep 28, 2011

btrfs scrub - go fix corruptions with mirror copies please!

Another day, another btrfs entry. I'm trying to learn all the in's and out's of the filesystem here.

As many of you know, btrfs supports CRC for data and metadata. I created a simple btrfs filesystem :

# mkfs.btrfs -L btrfstest -d raid1 -m raid1 /dev/sdb /dev/sdc
then created a file on the volume :
# dd if=/dev/urandom of=foo bs=1M count=100

# md5sum /btrfs/foo 
76f4c03dc7a3477939467ee230696b70  /btrfs/foo
so now lets play the bad guy and write over the disk itself, underneath the filesystem so it has no idea. This could be a shared device with another server that accidentally had data written on it, or a bad userspace program that spews out to the wrong device or even a bug in kernel...

Step 1: find the physical layout of the file :
# filefrag -v /btrfs/foo
Filesystem type is: 9123683e
File size of /btrfs/foo is 104857600 (25600 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0   269312           25600 eof
/btrfs/foo: 1 extent found

# echo $[4096*269312]
1103101952
The filesystem is 4k blocksize and we know it's at block 269312. Now we call btrfs-map-logical to find out what the physical offsets are on both the mirrors (/dev/sdb /dev/sdc) so I can happily overwrite it with junk.
# btrfs-map-logical -l 1103101952 -o scratch /dev/sdb
mirror 1 logical 1103101952 physical 1083179008 device /dev/sdc
mirror 2 logical 1103101952 physical 1103101952 device /dev/sdb
there we go. now. let's scribble :
# dd if=/dev/urandom of=/dev/sdc bs=1 count=50000 seek=1083179008
so we just wrote 50k bytes of random stuff to /dev/sdc at the offset of its copy of file foo
accessing the file gives the right md5sum still but now we have this command called scrub that can be run at any time and it will go through the filesystem you specific and check for any nasty errors and recover them. This happens through creating a kernel thread that does this in the background and then you can just use scrub status to see where it's at later.
# btrfs scrub start /btrfs

# btrfs scrub status /btrfs
scrub status for 15e213ad-4e2a-44f6-85d8-86d13e94099f
scrub started at Wed Sep 28 12:36:26 2011 and finished after 2 seconds
     total bytes scrubbed: 200.48MB with 13 errors
     error details: csum=13
     corrected errors: 13, uncorrectable errors: 0, unverified
As you can see above, the scrubber found 13 errors. A quick peek in dmesg shows the following :
btrfs: fixed up at 1103101952
btrfs: fixed up at 1103106048
btrfs: fixed up at 1103110144
btrfs: fixed up at 1103114240
btrfs: fixed up at 1103118336
btrfs: fixed up at 1103122432
btrfs: fixed up at 1103126528
btrfs: fixed up at 1103130624
btrfs: fixed up at 1103134720
btrfs: fixed up at 1103138816

# md5sum /btrfs/foo 
76f4c03dc7a3477939467ee230696b70  /btrfs/foo
Everything got repaired. This happens on both data and metadata. If there was a true IO error reading from one of the 2 sides we'd have handled that in the filesystem as well. If you don't have mirroring then with CRC it would have told you it was bad data and given you an IO error (instead of reading junk).
About

Wim Coekaerts is the Senior Vice President of Linux and Virtualization Engineering for Oracle. He is responsible for Oracle's complete desktop to data center virtualization product line and the Oracle Linux support program.

You can follow him on Twitter at @wimcoekaerts

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
9
10
11
12
13
14
15
16
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today