ZFS deduplication

Inspired by reading Jeff Bonwick's blog, I decided to give it a go on my development gates. A lot of files are shared between clones of the gate, and even between builds, so hopefully I should see a saving in the number of blocks used in my project file system.

Being cautious, I am using an alternate boot environment created using beadm, and backing up my code using hg backup (a useful Mercurial extension included in the ON build tools).
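For reference, setting up a disposable boot environment with beadm looks roughly like this (the BE name here is my own placeholder, not one from the post):

```shell
$ pfexec beadm create dedup-test    # clone the active boot environment
$ pfexec beadm activate dedup-test  # make it the default for the next boot
$ beadm list                        # confirm which BE is active ('N'/'R' flags)
```

If the experiment goes wrong, you just activate the original BE again and reboot.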

I'm impressed. Because it works at the block level rather than the file level, the saving isn't directly proportional to the number of duplicate files, but you still get a significant saving, albeit at the expense of using more CPU: it needs to do a SHA-256 checksum comparison of the blocks to ensure they're really identical.
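A small Python sketch (illustration only, not ZFS code) of why the accounting is per block: hash each fixed-size block and count how many are unique. Two files that share most of their content, as clones of a gate would, need far fewer unique blocks than total blocks.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # ZFS's default recordsize is 128K

def unique_blocks(datasets):
    """Count (total, unique) blocks across byte strings, identifying
    duplicates by SHA-256 hash, as dedup does."""
    seen = set()
    total = 0
    for data in datasets:
        for off in range(0, len(data), BLOCK_SIZE):
            total += 1
            seen.add(hashlib.sha256(data[off:off + BLOCK_SIZE]).hexdigest())
    return total, len(seen)

# Two "files" sharing three distinct common blocks, plus one unique block each.
common = b"".join(bytes([i]) * BLOCK_SIZE for i in range(3))
a = common + b"a" * BLOCK_SIZE
b = common + b"b" * BLOCK_SIZE
total, unique = unique_blocks([a, b])
print(total, unique)  # 8 total blocks, only 5 unique
```

Eight blocks are written but only five distinct ones need to be stored; the ratio depends on how block boundaries line up, not on the file count.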

Enabling it is simply a case of:

 $ pfexec zfs set dedup=on <pool-name> 

Though obviously you can do so much more. Jeff's blog (and the comments) are a goldmine of information about the subject.
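To see what the feature is actually buying you, the pool reports a dedup ratio. A rough sketch of the commands (the pool name `rpool` is just a placeholder):

```shell
$ zfs get dedup rpool       # confirm the property is set
$ zpool list rpool          # the DEDUP column shows the ratio, e.g. 1.91x
$ pfexec zdb -S rpool       # simulate dedup on existing data before enabling it
```

The zdb -S run prints a histogram of the simulated dedup table, which is a cheap way to estimate the saving before committing a pool to it.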



Comments:

Can you share the build times with and without dedup on? That is, is the hash calculation having a material impact on getting your work done or is it mostly using up otherwise idle CPU cycles.

Some details on your machine (# cpus/cores/strands) and the parallelism of dmake would be helpful as well.

Posted by Mike Gerdts on December 03, 2009 at 04:36 PM GMT #

This was just in a virtual machine on my laptop which I use when traveling, so the comparisons aren't really valid. It did max out the one virtual CPU I'd assigned to the guest, though, so the build time for a full nightly was significantly affected.

Posted by guest on December 04, 2009 at 06:34 AM GMT #

About

Chris W Beal
