ZFS and the uberblock

Inspired by Constantin's comment about USB sticks wearing out, on Matthias's blog entry about an eco-friendly home server, I tried to find out more about how, and how often, the ZFS uberblock is written.

Using DTrace, it's not that difficult:

We start by finding out which DTrace probes exist for the uberblock:

$ dtrace -l | grep -i uberblock
31726        fbt               zfs            vdev_uberblock_compare entry
31727        fbt               zfs            vdev_uberblock_compare return
31728        fbt               zfs          vdev_uberblock_load_done entry
31729        fbt               zfs          vdev_uberblock_load_done return
31730        fbt               zfs          vdev_uberblock_sync_done entry
31731        fbt               zfs          vdev_uberblock_sync_done return
31732        fbt               zfs               vdev_uberblock_sync entry
31733        fbt               zfs               vdev_uberblock_sync return
34304        fbt               zfs          vdev_uberblock_sync_list entry
34305        fbt               zfs          vdev_uberblock_sync_list return
34404        fbt               zfs                  uberblock_update entry
34405        fbt               zfs                  uberblock_update return
34408        fbt               zfs                  uberblock_verify entry
34409        fbt               zfs                  uberblock_verify return
34416        fbt               zfs               vdev_uberblock_load entry
34417        fbt               zfs               vdev_uberblock_load return

So there are two probes on uberblock_update: fbt:zfs:uberblock_update:entry and fbt:zfs:uberblock_update:return!
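To quickly see the entry probe fire, we can run a one-liner like the following (a sketch; the probe name comes from the listing above, the printf body is just an illustration):

$ dtrace -n 'fbt:zfs:uberblock_update:entry { printf("%Y txg %d", walltimestamp, args[2]); }'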

Now we can find out more about it by searching the OpenSolaris sources: searching for the definition of uberblock_update in the onnv project yields one hit, at line 49 of file uberblock.c, and clicking on it, we see:

[source extract: uberblock_update(), around line 49 of file uberblock.c]
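From memory, the function looks roughly like this (an abridged sketch, not a verbatim copy of the onnv source):

int
uberblock_update(uberblock_t *ub, vdev_t *rvd, uint64_t txg)
{
	ASSERT(ub->ub_txg < txg);

	ub->ub_magic = UBERBLOCK_MAGIC;		/* mark this as a valid uberblock */
	ub->ub_txg = txg;			/* transaction group being synced */
	ub->ub_guid_sum = rvd->vdev_guid_sum;	/* sum of the guids of all vdevs */
	ub->ub_timestamp = gethrestime_sec();	/* wall-clock time of this sync */

	return (ub->ub_rootbp.blk_birth == txg);
}

So the three arguments we can inspect from DTrace are the uberblock being updated, the root vdev, and the transaction group number.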

Now, when searching again for the definitions of the first two arguments of uberblock_update (args[0] and args[1], of types uberblock and vdev), we get:

For uberblock, the search shows a few hits; clicking on the one for the definition of struct uberblock (around line 53 in file uberblock_impl.h), we get:
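From memory, the structure looks roughly like this (a sketch; comments paraphrased, not a verbatim copy):

struct uberblock {
	uint64_t	ub_magic;	/* UBERBLOCK_MAGIC */
	uint64_t	ub_version;	/* on-disk format (SPA) version */
	uint64_t	ub_txg;		/* txg of last sync */
	uint64_t	ub_guid_sum;	/* sum of all vdev guids */
	uint64_t	ub_timestamp;	/* UTC time of last sync */
	blkptr_t	ub_rootbp;	/* block pointer to the meta-object set */
};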

For the members of struct vdev, it's not that easy. Searching for the definition of vdev in the source browser returns a long hit list, but narrowing that list down with the browser's own search function for "struct vdev" leads to the right entry.

Clicking on the definition of struct vdev (around line 108 in file vdev_impl.h), we can see all the members of this structure.
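The structure is large; from memory, the members that matter here look roughly like this (a heavily abridged sketch):

struct vdev {
	uint64_t	vdev_id;	/* child number in vdev parent */
	uint64_t	vdev_guid;	/* unique ID for this vdev */
	uint64_t	vdev_guid_sum;	/* self guid + sum of child guids */
	uint64_t	vdev_asize;	/* allocatable device capacity */
	uint64_t	vdev_ashift;	/* block alignment shift */
	spa_t		*vdev_spa;	/* the pool this vdev belongs to */
	/* ... many more members ... */
};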

Here are all the links, plus one more for struct blkptr (the type of ub_rootbp, a member of struct uberblock), again in one place:
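Since the DTrace script below prints ub_rootbp.blk_prop, the layout of struct blkptr is worth a look too; from memory, it is roughly this (a sketch, not a verbatim copy of spa.h):

typedef struct blkptr {
	dva_t		blk_dva[3];	/* data virtual addresses (up to 3 copies) */
	uint64_t	blk_prop;	/* size, compression, type, etc. */
	uint64_t	blk_pad[3];	/* padding */
	uint64_t	blk_birth;	/* txg in which the block was written */
	uint64_t	blk_fill;	/* fill count */
	zio_cksum_t	blk_cksum;	/* 256-bit checksum */
} blkptr_t;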

Now we are prepared to access the data via DTrace, by printing the arguments and members as in the following example:

printf ("%d %d %d", args[0]->ub_timestamp, args[1]->vdev_id, args[2]);

So here is a sample DTrace script that prints as much information as we can get whenever uberblock_update fires, and also prints any relevant I/O (the hope being that, by showing both at the same time, we can see where and how often the uberblocks are written):

io:genunix:default_physio:start,
io:genunix:bdev_strategy:start,
io:genunix:biodone:done
{
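   /* one line per physical I/O: timestamp, issuing process, block number, size */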
   printf ("%d %s %d %d", timestamp, execname,
     args[0]->b_blkno, args[0]->b_bcount);
}

fbt:zfs:uberblock_update:entry
{
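   /* timestamp, process, pid, rootbp properties, vdev size, txg */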
   printf ("%d %s, %d, %d, %d, %d", timestamp, execname,
     pid, args[0]->ub_rootbp.blk_prop, args[1]->vdev_asize, args[2]);
}

The lines for showing the I/O are derived from the I/O analysis scripts in the DTrace Toolkit.

Although I was unable to print members of struct vdev (the second argument to uberblock_update()) with the fbt:zfs:uberblock_update:entry probe (I also tried fbt:zfs:uberblock_update:return, but ran into other problems with that one), the results of running the script with

$ dtrace -s zfs-uberblock-report-02.d

are quite interesting. Here's an extract (long lines shortened):

  0  33280  uberblock_update:entry 102523281435514 sched, 0, 922..345, 0, 21005
  0   5510     bdev_strategy:start 102523490757174 sched 282 1024
  0   5510     bdev_strategy:start 102523490840779 sched 794 1024
  0   5510     bdev_strategy:start 102523490873844 sched 18493722 1024
  0   5510     bdev_strategy:start 102523490903928 sched 18494234 1024
  0   5498            biodone:done 102523491215729 sched 282 1024
  0   5498            biodone:done 102523491576878 sched 794 1024
  0   5498            biodone:done 102523491873015 sched 18493722 1024
  0   5498            biodone:done 102523492232464 sched 18494234 1024
...
  0  33280  uberblock_update:entry 102553280316974 sched, 0, 922..345, 0, 21006
  0   5510     bdev_strategy:start 102553910907205 sched 284 1024
  0   5510     bdev_strategy:start 102553910989248 sched 796 1024
  0   5510     bdev_strategy:start 102553911022603 sched 18493724 1024
  0   5510     bdev_strategy:start 102553911052733 sched 18494236 1024
  0   5498            biodone:done 102553911344640 sched 284 1024
  0   5498            biodone:done 102553911623733 sched 796 1024
  0   5498            biodone:done 102553911981236 sched 18493724 1024
  0   5498            biodone:done 102553912250614 sched 18494236 1024
...
  0  33280  uberblock_update:entry 102583279275573 sched, 0, 922..345, 0, 21007
  0   5510     bdev_strategy:start 102583540376459 sched 286 1024
  0   5510     bdev_strategy:start 102583540459265 sched 798 1024
  0   5510     bdev_strategy:start 102583540492968 sched 18493726 1024
  0   5510     bdev_strategy:start 102583540522840 sched 18494238 1024
  0   5498            biodone:done 102583540814677 sched 286 1024
  0   5498            biodone:done 102583541091636 sched 798 1024
  0   5498            biodone:done 102583541406962 sched 18493726 1024
  0   5498            biodone:done 102583541743494 sched 18494238 1024

Using the following (n)awk one-liners:

$ nawk '/uberblock/{print}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a==2)){print}}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a>=1)&&(a<=5)){print}}' zfs-ub-report-02.d.out
we can print, respectively:
  • only the uberblock_update lines,
  • just the line that follows each uberblock_update entry, or
  • each uberblock_update entry itself plus the 4 lines that follow it.

When running the script for a while and capturing its output, we can later analyze at which block number the first write after uberblock_update() lands. The numbers are always even, the lowest is 256 and the highest is 510, with a block size of 1024 bytes (i.e. two 512-byte disk blocks per uberblock). The block numbers advance from 256 to 258, 260, and so forth, until they reach 510; then they start at 256 again. So every (510-256)/2+1 = 128th iteration (yes, it's one more, as we have to include the first element after subtracting the first from the last element), the first block is overwritten again. The same is true for blocks 768...1022, 18493696...18493950 and 18494208...18494462 (the third and fourth block ranges will differ for different zpool sizes). This matches the ZFS on-disk layout: each device carries four copies of the vdev label, two at the front and two at the end, and each label contains a ring of 128 one-kilobyte uberblock slots, which is also why the last two ranges depend on the device size.
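To check this from a captured output file, a one-liner along the following lines helps (assuming the layout shown above, where the block number is the sixth whitespace-separated field of each I/O line, and the file name used earlier):

$ nawk '/uberblock/{a=0}{a++;if (a==2){print $6}}' zfs-ub-report-02.d.out | sort -n | uniq -c

It prints each distinct "first block after an uberblock_update" together with how often that block was written during the trace.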

Now that we understand how, and in which order, the uberblocks are written, we are prepared to examine after how many days the uberblock area of a USB stick without wear leveling would probably be worn out. More on that, and on how we can use zdb for it, in my next blog entry.
