
Oracle Linux, virtualization, Enterprise and Cloud Management Cloud technology musings

  • September 28, 2011

btrfs scrub - go fix corruptions with mirror copies please!

Another day, another btrfs entry. I'm trying to learn all the ins and outs of the filesystem here.



As many of you know, btrfs supports CRC checksums for both data and metadata. I created a simple btrfs filesystem with data and metadata mirrored across two disks:


# mkfs.btrfs -L btrfstest -d raid1 -m raid1 /dev/sdb /dev/sdc

Then, with the volume mounted at /btrfs, I created a file on it:
# dd if=/dev/urandom of=/btrfs/foo bs=1M count=100
# md5sum /btrfs/foo
76f4c03dc7a3477939467ee230696b70 /btrfs/foo
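
The checksum recorded here is compared by eye later in the post. As a minimal sketch, the same comparison could be made mechanical with `md5sum -c`. The temporary directory and the `foo.md5` file name below are stand-ins chosen for illustration so the sketch runs anywhere; in the setup above the file would be /btrfs/foo.

```shell
# Hypothetical stand-in paths; in the post's setup the file is /btrfs/foo.
workdir=$(mktemp -d)
file="$workdir/foo"
sumfile="$workdir/foo.md5"

# Create a small random file (64 KiB here, just to keep the sketch quick).
dd if=/dev/urandom of="$file" bs=1024 count=64 2>/dev/null

# Record the checksum once...
md5sum "$file" > "$sumfile"

# ...and verify it later. md5sum -c exits non-zero on a mismatch.
if md5sum -c "$sumfile" >/dev/null 2>&1; then
    verify_result=ok
else
    verify_result=mismatch
fi
echo "$verify_result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Saving the checksum to a file means a later check needs no copy and paste of the hash.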

So now let's play the bad guy and write over the disk itself, underneath the filesystem, so it has no idea. This could be a shared device that another server accidentally wrote data to, a bad userspace program that spews output to the wrong device, or even a bug in the kernel...



Step 1: find the physical layout of the file:
# filefrag -v /btrfs/foo
Filesystem type is: 9123683e
File size of /btrfs/foo is 104857600 (25600 blocks, blocksize 4096)
ext logical physical expected length flags
  0       0   269312           25600 eof
/btrfs/foo: 1 extent found
# echo $((4096*269312))
1103101952
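
The multiplication above turns the block number reported by filefrag into a byte address. A small sketch of the same arithmetic with named variables (the variable names are mine, the values come from the filefrag output):

```shell
# Values taken from the filefrag output above.
block_size=4096       # "blocksize 4096"
start_block=269312    # "physical" column of extent 0

# Byte address = block number * block size.
logical_offset=$((start_block * block_size))
echo "$logical_offset"
```

This prints 1103101952, the value passed to btrfs-map-logical with -l.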

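The filesystem uses a 4 KiB block size, and filefrag shows the extent starting at block 269312, which is byte address 1103101952. On btrfs, the "physical" value filefrag reports is an address in the filesystem's logical address space, not an offset on either disk. So now we call btrfs-map-logical to find out what the physical offsets are on both mirrors (/dev/sdb and /dev/sdc) so I can happily overwrite one of them with junk.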

# btrfs-map-logical -l 1103101952 -o scratch /dev/sdb
mirror 1 logical 1103101952 physical 1083179008 device /dev/sdc
mirror 2 logical 1103101952 physical 1103101952 device /dev/sdb

There we go. Now let's scribble:
# dd if=/dev/urandom of=/dev/sdc bs=1 count=50000 seek=1083179008

So we just wrote 50,000 bytes of random data to /dev/sdc at the offset of its copy of file foo.
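
As a rough sanity check on how much damage that is, the sketch below counts the 4096-byte blocks the overwrite touches and lists the matching logical addresses. It assumes checksums cover 4096-byte blocks (the blocksize filefrag reported) and that the file's single extent maps linearly from logical address 1103101952; the variable names are mine.

```shell
block_size=4096
write_offset=1083179008    # physical offset on /dev/sdc (from btrfs-map-logical)
write_len=50000            # dd bs=1 count=50000
logical_start=1103101952   # logical address of the start of the file

# First and last device block touched by the overwrite.
first_block=$((write_offset / block_size))
last_block=$(((write_offset + write_len - 1) / block_size))
damaged_blocks=$((last_block - first_block + 1))
echo "blocks touched: $damaged_blocks"

# Logical address of each damaged block (the write starts on a block boundary).
i=0
while [ "$i" -lt "$damaged_blocks" ]; do
    echo $((logical_start + i * block_size))
    i=$((i + 1))
done
last_addr=$((logical_start + (damaged_blocks - 1) * block_size))
```

The count comes out to 13, which is the number of checksum errors scrub reports further down, and the addresses step by 4096 from 1103101952 in the same way as the "fixed up at" lines in dmesg.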

Accessing the file still gives the right md5sum. But now we have a command called scrub that can be run at any time: it goes through the filesystem you specify, checks for any nasty errors and repairs them. It works by creating a kernel thread that does the checking in the background, and you can then use scrub status to see how far along it is.
# btrfs scrub start /btrfs
# btrfs scrub status /btrfs
scrub status for 15e213ad-4e2a-44f6-85d8-86d13e94099f
scrub started at Wed Sep 28 12:36:26 2011 and finished after 2 seconds
total bytes scrubbed: 200.48MB with 13 errors
error details: csum=13
corrected errors: 13, uncorrectable errors: 0, unverified

As you can see above, the scrubber found 13 errors. A quick peek in dmesg shows the following:
btrfs: fixed up at 1103101952
btrfs: fixed up at 1103106048
btrfs: fixed up at 1103110144
btrfs: fixed up at 1103114240
btrfs: fixed up at 1103118336
btrfs: fixed up at 1103122432
btrfs: fixed up at 1103126528
btrfs: fixed up at 1103130624
btrfs: fixed up at 1103134720
btrfs: fixed up at 1103138816
# md5sum /btrfs/foo
76f4c03dc7a3477939467ee230696b70 /btrfs/foo

Everything got repaired. This works for both data and metadata. If there had been a true I/O error reading from one of the two mirrors, the filesystem would have handled that as well. Without mirroring, the checksum would still have detected the bad data, and you would have gotten an I/O error instead of silently reading junk.
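
For anyone who wants to check a scrub result from a script rather than by reading it, here is a minimal sketch that pulls the error counts out of the status text. It assumes the output format printed in this post; other versions of the tool may format the summary differently, so the sed patterns are an assumption, and the sample text is pasted in so the sketch runs without a btrfs volume.

```shell
# Sample text copied from the scrub status output in this post.
# In real use it would come from: btrfs scrub status /btrfs
status_text='scrub status for 15e213ad-4e2a-44f6-85d8-86d13e94099f
scrub started at Wed Sep 28 12:36:26 2011 and finished after 2 seconds
total bytes scrubbed: 200.48MB with 13 errors
error details: csum=13
corrected errors: 13, uncorrectable errors: 0, unverified'

# Extract the two counters from the "corrected errors: ..." line.
corrected=$(printf '%s\n' "$status_text" |
    sed -n 's/^corrected errors: \([0-9][0-9]*\).*/\1/p')
uncorrectable=$(printf '%s\n' "$status_text" |
    sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p')

# Treat a missing counter as a failure rather than as zero.
if [ "${uncorrectable:-1}" -gt 0 ]; then
    scrub_ok=no
else
    scrub_ok=yes
fi
echo "corrected=$corrected uncorrectable=$uncorrectable ok=$scrub_ok"
```

Defaulting a missing counter to "not ok" is a deliberate choice: if the format changes and the pattern stops matching, the check fails loudly instead of passing silently.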


Comments ( 1 )
  • peterK Friday, January 5, 2018
    Thanks for this nice and detailed description of how CRC works.
    Inspired by your example, I will put a similar sequence in my courseware if you don't object.
    I hope so.
    Peter
    Sorry for my bad English, and thanks.