Wednesday Mar 19, 2008

how dedupalicious is your pool?

With the putback of 6656655 "zdb should be able to display blkptr signatures", we can now get the "signature" of each block pointer in a pool. To see an example, let's first put some content into an empty pool:

heavy# zpool create bigIO c0t0d0 c0t1d0
heavy# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
bigIO   928G  95.5K   928G     0%  ONLINE  -
heavy# mkfile 1m /bigIO/1m.txt
heavy# echo "dedup me" > /bigIO/ejk.txt
heavy# cp /bigIO/ejk.txt /bigIO/ejk2.txt
heavy# echo "no dedup" > /bigIO/nope.txt
heavy# cp /bigIO/ejk.txt /bigIO/ejk3.txt

Now let's run zdb with the new option "-S". We pass in "user:all", where "user" tells zdb that we only want user data blocks (as opposed to both user data and metadata) and "all" tells zdb to print out all blocks (skipping any checksum algorithm strength comparisons).

heavy# zdb -L -S user:all bigIO
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       131072  1       ZFS plain file  fletcher2       uncompressed    0:0:0:0
0       512     1       ZFS plain file  fletcher2       uncompressed    656d207075646564:a:ada40e0eac8cac80:140
0       512     1       ZFS plain file  fletcher2       uncompressed    656d207075646564:a:ada40e0eac8cac80:140
0       512     1       ZFS plain file  fletcher2       uncompressed    7075646564206f6e:a:eac8cac840dedc0:140
0       512     1       ZFS plain file  fletcher2       uncompressed    656d207075646564:a:ada40e0eac8cac80:140
heavy# 

This displays the signature of each block pointer, where the columns are: level, physical size (PSIZE), number of DVAs, object type, checksum algorithm, compression type, and finally the actual checksum of the block.
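
Since the fields are tab-separated (that's what the sort and perl script below rely on), a single line of this output is easy to pull apart. A minimal sketch in Python, using one of the "dedup me" lines from the transcript above:

```python
# Split one tab-separated line of `zdb -S` output into its seven
# fields, in the column order described above.
line = ("0\t512\t1\tZFS plain file\tfletcher2\tuncompressed\t"
        "656d207075646564:a:ada40e0eac8cac80:140")

level, psize, ndvas, obj_type, cksum_alg, compress, cksum = line.split("\t")
print(psize, cksum_alg, cksum)
```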

So this is interesting, but what could we do with this information? Well, one thing we could do is figure out how much your pool could take advantage of dedup. Let's assume that the dedup implementation matches blocks based on the actual checksum, and that any checksum algorithm is strong enough (in reality, we'd need sha256 or stronger). So starting with the above pool and using a simple perl script 'line_by_line_process.pl' (shown at the end of this blog), we find:

heavy# zdb -L -S user:all bigIO > /tmp/zdb_out.txt
heavy# sort -k 7 -t "`/bin/echo '\t'`" /tmp/zdb_out.txt > /tmp/zdb_out_sorted.txt
heavy# ./line_by_line_process.pl /tmp/zdb_out_sorted.txt 
total PSIZE:               0t1050624
total unique PSIZE:        0t132096
total that can be duped:   0t918528
percent that can be duped  87.4269005847953%
heavy#   
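
The accounting behind these numbers can be sketched in a few lines (this is my Python paraphrase of the perl script's logic, not code from the putback): sum PSIZE over all blocks, and count each distinct checksum's PSIZE only once toward the unique total. The toy input mirrors the first pool - eight all-zero 128K blocks from mkfile, three copies of the "dedup me" block, and one "no dedup" block; the checksum strings are just placeholder labels.

```python
def dedup_estimate(rows):
    """rows: iterable of (psize, checksum) pairs from zdb -S output."""
    total = 0
    unique_psize = {}                  # checksum -> PSIZE, first one wins
    for psize, cksum in rows:
        total += psize
        unique_psize.setdefault(cksum, psize)
    unique = sum(unique_psize.values())
    return total, unique, total - unique

# Toy rows matching the first pool's transcript.
rows = ([(131072, "0:0:0:0")] * 8
        + [(512, "cksum_dedup_me")] * 3
        + [(512, "cksum_no_dedup")])
total, unique, dupable = dedup_estimate(rows)
print(total, unique, dupable)          # 1050624 132096 918528
```

These match the script's totals above: 0t1050624 total, 0t132096 unique, 0t918528 dedup-able.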

In our trivial case, we can see that we could get a huge win: 87% of the pool can be dedup'd! Upon closer examination, though, we notice that mkfile writes out all-zero blocks, so if you had compression enabled there wouldn't be any actual blocks for this file at all. Let's look instead at a case where just the "ejk.txt" contents are getting dedup'd:

heavy# zpool destroy bigIO
heavy# zpool create bigIO c0t0d0 c0t1d0
heavy# dd if=/dev/random of=/bigIO/1m.txt bs=1024 count=5
5+0 records in
5+0 records out
heavy# echo "dedup me" > /bigIO/ejk.txt
heavy# cp /bigIO/ejk.txt /bigIO/ejk2.txt
heavy# echo "no dedup" > /bigIO/nope.txt
heavy# cp /bigIO/ejk.txt /bigIO/ejk3.txt
heavy# zdb -L -S user:all bigIO > /tmp/zdb_out.txt
heavy# sort -k 7 -t "`/bin/echo '\t'`" /tmp/zdb_out.txt > /tmp/zdb_out_sorted.txt
heavy# ./line_by_line_process.pl /tmp/zdb_out_sorted.txt       
total PSIZE:               0t7168
total unique PSIZE:        0t6144
total that can be duped:   0t1024
percent that can be duped  14.2857142857143%
heavy# 

OK, in this different setup we can see that ~14% of the pool can actually be dedup'd - still a nice savings on capacity.
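
As a quick sanity check (my arithmetic, not part of the original transcript), the second run's totals can be reproduced directly from the block sizes: one 5 KB block of random data from dd, plus four 512-byte blocks, three of which share the "dedup me" checksum.

```python
# Recompute the second run's totals from the block sizes.
total   = 5120 + 4 * 512           # 0t7168
unique  = 5120 + 512 + 512         # 0t6144: random data, "dedup me", "no dedup"
dupable = total - unique           # 0t1024
print(total, unique, dupable, dupable / total * 100)
```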

So the question becomes - how dedupalicious is your pool?




ps: here is the simple perl script 'line_by_line_process.pl':

#!/usr/bin/perl
use strict;
use warnings;

# Run this script on zdb -S output that has been sorted by checksum, e.g.:
#   % ./line_by_line_process.pl /tmp/zdb_out_sorted.txt

# total PSIZE
my $totalps = 0;

# total unique PSIZE
my $totalups = 0;

my $last_cksum = "";

my $path = $ARGV[0];
open(FH, $path) or die "Can't open $path: $!";

while (<FH>) {
        my $line = $_;
        my ($level, $psize, $ndvas, $type, $cksum_alg, $compress, $cksum) =
            split /\t/, $line, 7;
        # a checksum we haven't seen yet counts toward the unique total
        if ($cksum ne $last_cksum) {
                $totalups += $psize;
        }
        $last_cksum = $cksum;
        $totalps += $psize;
}
close(FH);

print "total PSIZE:               0t".$totalps."\n";
print "total unique PSIZE:        0t".$totalups."\n";
print "total that can be duped:   0t".($totalps - $totalups)."\n";
print "percent that can be duped  ".(($totalps - $totalups) / $totalps * 100)."%\n";

erickustarz