Taking ZFS deduplication for a test drive
By User12611829-Oracle on Nov 22, 2009
Here is my test case: I have 2 directories of photos, totaling about 90MB each. And here's the trick - they are almost complete duplicates of each other. I downloaded all of the photos from the same camera on 2 different days. How many of you do that? Yeah, me too.
Let's see what ZFS can figure out about all of this. If it is super smart we should end up with a total of 90MB of used space. That's what I'm hoping for.
The first step is to create the pool and turn on deduplication from the beginning.
# zpool create -f scooby -O dedup=on c2t2d0s2
This will use sha256 for determining if 2 blocks are the same. Since sha256 has such a low collision probability (something like 1x10^-77), we will not turn on automatic verification. If we were using an algorithm like fletcher4, which has a higher collision rate, we should also perform a complete block compare before allowing the block removal (dedup=fletcher4,verify).
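To get a feel for what checksum-based deduplication is looking for, here is a rough sketch that hashes fixed-size blocks of ordinary files and counts how many are unique. It is not how ZFS is implemented. The 128K block size, the helper names block_hashes and dedup_estimate, and the use of GNU split and sha256sum are all my own assumptions for illustration.

```shell
#!/bin/sh
# Illustration only: estimate block-level duplication across some files.
# Assumptions (not taken from ZFS): 128 KiB fixed blocks, GNU coreutils
# `split` and `sha256sum`, and the hypothetical helper names below.

# Print one SHA-256 digest per 128 KiB block of every file given.
block_hashes() {
    for f in "$@"; do
        tmp=$(mktemp -d) || return 1
        split -b 131072 "$f" "$tmp/blk."
        for b in "$tmp"/blk.*; do
            if [ -e "$b" ]; then
                sha256sum < "$b" | cut -d' ' -f1
            fi
        done
        rm -rf "$tmp"
    done
}

# Print total blocks divided by unique blocks, to two decimal places.
dedup_estimate() {
    block_hashes "$@" | awk '
        { total++; if (!seen[$0]++) unique++ }
        END { if (unique) printf "%.2f\n", total / unique; else print "0.00" }'
}
```

Called as dedup_estimate file1 file2, it divides the number of blocks read by the number of distinct digests seen, so two byte-identical copies of a file count every block twice.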
Now copy the first 180MB (remember, this is 2 sets of 90MB which are nearly identical sets of photos).
# zfs create scooby/doo
# cp -r /pix/Alaska* /scooby/doo
And the second set.
# zfs create scooby/snack
# cp -r /pix/Alaska* /scooby/snack
And finally the third set.
# zfs create scooby/dooby
# cp -r /pix/Alaska* /scooby/dooby
Let's make sure there are in fact three copies of the photos.
# df -k | grep scooby
scooby         74230572      25  73706399   1%  /scooby
scooby/doo     74230572  174626  73706399   1%  /scooby/doo
scooby/snack   74230572  174626  73706399   1%  /scooby/snack
scooby/dooby   74230572  174625  73706399   1%  /scooby/dooby
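If you would rather have the total than add the numbers by eye, a small awk filter can sum the used column for the child file systems. This is only a sketch: sum_used_kb is a made-up helper name, and it assumes each file system fits on one line of df -k output, with the used KB in the third field.

```shell
# Sum the "used" column (KB, third field of `df -k`) for the child
# file systems of a pool. Reads df output on stdin.
# sum_used_kb is a hypothetical helper name, not a system command.
sum_used_kb() {
    awk -v pool="$1" 'index($1, pool "/") == 1 { sum += $3 } END { print sum + 0 }'
}
```

Run as df -k | sum_used_kb scooby. With the numbers above it prints 523877, which is the logical size of the three copies in KB.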
OK, so far so good. But I can't quite tell if the deduplication is actually doing anything. With all that free space, it's sort of hard to see. Let's look at the pool properties.
# zpool get all scooby
NAME    PROPERTY       VALUE                SOURCE
scooby  size           71.5G                -
scooby  capacity       0%                   -
scooby  altroot        -                    default
scooby  health         ONLINE               -
scooby  guid           5341682982744598523  default
scooby  version        22                   default
scooby  bootfs         -                    default
scooby  delegation     on                   default
scooby  autoreplace    off                  default
scooby  cachefile      -                    default
scooby  failmode       wait                 default
scooby  listsnapshots  off                  default
scooby  autoexpand     off                  default
scooby  dedupratio     5.98x                -
scooby  free           71.4G                -
scooby  allocated      86.8M                -
Now this is telling us something.
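That is a long listing for the two or three values that matter here. As a sketch, a tiny filter can pull out a single property. zpool_prop is my own hypothetical helper name, and it assumes the four-column NAME/PROPERTY/VALUE/SOURCE layout of zpool get all.

```shell
# Print the VALUE column for one property from `zpool get all <pool>`
# output read on stdin.
# zpool_prop is a hypothetical helper name, not a ZFS command.
zpool_prop() {
    awk -v prop="$1" '$2 == prop { print $3 }'
}
```

For example, zpool get all scooby | zpool_prop dedupratio prints 5.98x for this pool. zpool get should also accept a property name in place of all, which avoids the filter entirely.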
First notice the allocated space: just shy of 90MB. There's 522MB of data (174MB x 3), but only 87MB is used out of the pool. That's a good start.
Now take a look at the dedupratio. Almost 6. And that's exactly what we would expect, if ZFS is as good as we are led to believe. 3 sets of 2 duplicate directories is 6 total copies of the same set of photos. And ZFS caught every one of them.
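The arithmetic behind that is simple enough to check by hand, or with a one-liner. This sketch just divides the logical data by the allocated space. expected_ratio is a hypothetical helper name, and the inputs are the rounded MB figures quoted above, not anything ZFS reports directly.

```shell
# Divide logical data by allocated space (same units) and print a
# ratio such as "6.00x". Returns non-zero if allocated is 0.
# expected_ratio is a hypothetical helper name.
expected_ratio() {
    awk -v logical="$1" -v allocated="$2" 'BEGIN {
        if (allocated == 0) exit 1
        printf "%.2fx\n", logical / allocated
    }'
}

expected_ratio 522 87    # rounded MB figures from the text; prints 6.00x
```

The rounded figures give 6.00x, which is close to the 5.98x the pool reports from its own accounting.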
So if you want to do this yourself, point your OpenSolaris package manager at the dev repository and wait for build 128 packages to show up. If you need instructions on using the OpenSolaris dev repository, point the browser of your choice at http://pkg.opensolaris.org/dev/en/index.shtml. And if you can't wait for the packages to show up, you can always .