btrfs scrub - go fix corruptions with mirror copies please!
By Wcoekaer-Oracle on Sep 28, 2011
As many of you know, btrfs keeps CRC checksums for both data and metadata. I created a simple btrfs filesystem:
# mkfs.btrfs -L btrfstest -d raid1 -m raid1 /dev/sdb /dev/sdc

Then I created a file on the volume (mounted at /btrfs in what follows):
# dd if=/dev/urandom of=foo bs=1M count=100
# md5sum /btrfs/foo
76f4c03dc7a3477939467ee230696b70  /btrfs/foo

So now let's play the bad guy and write over the disk itself, underneath the filesystem, so that it has no idea. This could be a device shared with another server that accidentally wrote data to it, a bad userspace program that spews output to the wrong device, or even a bug in the kernel...
Step 1: find the physical layout of the file :
# filefrag -v /btrfs/foo
Filesystem type is: 9123683e
File size of /btrfs/foo is 104857600 (25600 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0   269312            25600 eof
/btrfs/foo: 1 extent found
# echo $[4096*269312]
1103101952

The filesystem uses a 4k block size, and filefrag tells us the file starts at block 269312, which is byte offset 1103101952. On btrfs, the "physical" column of filefrag is really an address in the filesystem's logical address space, not an offset on a particular disk. So now we call btrfs-map-logical to find out what the physical offsets are on both mirrors (/dev/sdb and /dev/sdc), so I can happily overwrite one of them with junk.
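The block-to-byte conversion above can be sketched as a small shell function. This is only an illustration: byte_offset is a made-up helper name, not part of filefrag or any btrfs tool, and it uses the POSIX $(( )) arithmetic form rather than the older $[ ] form.

```shell
# byte_offset BLOCK BLOCKSIZE
# Hypothetical helper: turn a block number reported by filefrag into a
# byte offset by multiplying it by the filesystem block size.
byte_offset() {
    echo $(( $1 * $2 ))
}

# Block 269312 with 4096-byte blocks, as in the filefrag output above.
byte_offset 269312 4096    # prints 1103101952
```

The result is the value passed to btrfs-map-logical with -l in the next step.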
# btrfs-map-logical -l 1103101952 -o scratch /dev/sdb
mirror 1 logical 1103101952 physical 1083179008 device /dev/sdc
mirror 2 logical 1103101952 physical 1103101952 device /dev/sdb

There we go. Now let's scribble:
# dd if=/dev/urandom of=/dev/sdc bs=1 count=50000 seek=1083179008

So we just wrote 50,000 bytes of random data to /dev/sdc, at the offset of its copy of file foo.
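It is worth working out how many 4k blocks that write damaged, since checksums are verified per block. A minimal sketch, assuming one checksum per 4096-byte block (blocks_touched is a hypothetical helper, not a btrfs command):

```shell
# blocks_touched OFFSET LENGTH BLOCKSIZE
# Hypothetical helper: count the blocks that a write of LENGTH bytes
# starting at byte OFFSET overlaps, for blocks of BLOCKSIZE bytes.
blocks_touched() {
    first=$(( $1 / $3 ))                 # index of the first block hit
    last=$(( ($1 + $2 - 1) / $3 ))       # index of the last block hit
    echo $(( last - first + 1 ))
}

# The dd above: 50000 bytes at offset 1083179008, 4096-byte blocks.
blocks_touched 1083179008 50000 4096     # prints 13
```

The offset is block aligned, so 50,000 bytes covers 12 full blocks plus part of a 13th. That lines up with the 13 csum errors scrub reports further down.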
Accessing the file still gives the right md5sum. But we now have a command called scrub that can be run at any time: it goes through the filesystem you specify, checks for any nasty errors, and repairs them. It works by creating a kernel thread that does the checking in the background, so you can just run scrub status later to see how far along it is.
# btrfs scrub start /btrfs
# btrfs scrub status /btrfs
scrub status for 15e213ad-4e2a-44f6-85d8-86d13e94099f
scrub started at Wed Sep 28 12:36:26 2011 and finished after 2 seconds
total bytes scrubbed: 200.48MB with 13 errors
error details: csum=13
corrected errors: 13, uncorrectable errors: 0, unverified

As you can see above, the scrubber found 13 errors. A quick peek in dmesg shows the following:
btrfs: fixed up at 1103101952
btrfs: fixed up at 1103106048
btrfs: fixed up at 1103110144
btrfs: fixed up at 1103114240
btrfs: fixed up at 1103118336
btrfs: fixed up at 1103122432
btrfs: fixed up at 1103126528
btrfs: fixed up at 1103130624
btrfs: fixed up at 1103134720
btrfs: fixed up at 1103138816
# md5sum /btrfs/foo
76f4c03dc7a3477939467ee230696b70  /btrfs/foo

Everything got repaired. This works for both data and metadata. If there had been a true I/O error reading from one of the two sides, the filesystem would have handled that as well. If you don't have mirroring, the CRC would still have caught the bad data and given you an I/O error instead of returning junk.
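To make the repair idea concrete, here is a toy model in plain shell. It is not how btrfs is implemented: two ordinary files stand in for the two mirror copies of one block, cksum (the POSIX CRC) stands in for the crc32c checksum btrfs stores, and crc_of and read_block are names I made up for the illustration.

```shell
# crc_of FILE: print the CRC of FILE (cksum stands in for crc32c here).
crc_of() { cksum < "$1" | cut -d' ' -f1; }

# read_block EXPECTED_CRC MIRROR...
# Print the path of the first mirror whose CRC matches, and rewrite any
# mismatching mirror from that good copy. If no copy matches, return 1,
# which plays the role of the I/O error you get without a good mirror.
read_block() {
    expected=$1; shift
    good=""
    for m in "$@"; do
        if [ "$(crc_of "$m")" = "$expected" ]; then good=$m; break; fi
    done
    [ -n "$good" ] || return 1
    for m in "$@"; do
        [ "$(crc_of "$m")" = "$expected" ] || cp "$good" "$m"
    done
    echo "$good"
}

# Two identical "mirrors" and the checksum recorded at write time.
tmp=$(mktemp -d)
printf 'original data' > "$tmp/mirror1"
cp "$tmp/mirror1" "$tmp/mirror2"
want=$(crc_of "$tmp/mirror1")

# Scribble over one copy, then read: mirror2 is used and mirror1 is fixed.
printf 'junk junk junk' > "$tmp/mirror1"
read_block "$want" "$tmp/mirror1" "$tmp/mirror2"
```

If both files are scribbled on, read_block has nothing to repair from and fails, which is the toy version of getting an I/O error instead of junk.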