An Oracle blog about Solaris

ZFS deduplication

Chris Beal
Senior Principal Software Engineer

Inspired by reading Jeff Bonwick's Blog I decided to give it a go on my development gates. A lot of files are shared between clones of the gate and even between builds, so hopefully I should get a saving in the number of blocks used in my project file system.

Being cautious, I am using an alternate boot environment created with beadm, and backing up my code using hg backup (a useful Mercurial extension included in the ON build tools).
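The safety net above can be sketched roughly like this ("dedup-test" is a boot environment name I made up for illustration; these are Solaris-only commands, shown as comments rather than run):

```shell
# Create a scratch boot environment so the running BE stays untouched:
#   pfexec beadm create dedup-test
#   pfexec beadm activate dedup-test
#   # reboot into the new BE; "pfexec beadm activate <old-be>" rolls back
#
# And back up the workspace first (hg backup ships with the ON build tools):
#   hg backup
```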

I'm impressed. Because it works at the block level rather than the file level, the saving isn't directly proportional to the number of duplicate files. But you still get a significant saving, albeit at the expense of using more CPU: it needs to do a SHA-256 checksum comparison of the blocks to ensure they're really identical.
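The idea behind the checksum comparison can be illustrated outside ZFS entirely. Here `sha256sum` (GNU coreutils; on Solaris the equivalent is `digest -a sha256`) stands in for ZFS's internal block checksumming: two blocks with identical contents hash to the same value, so only one copy needs to be stored.

```shell
# Illustration only: dedup compares block checksums, not file names.
printf 'same data' > blk1
printf 'same data' > blk2

h1=$(sha256sum blk1 | awk '{print $1}')
h2=$(sha256sum blk2 | awk '{print $1}')

# Identical blocks produce identical hashes, so the second copy is redundant.
[ "$h1" = "$h2" ] && echo "duplicate block: store once"

rm -f blk1 blk2
```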

Enabling it is simply a case of

 $ pfexec zfs set dedup=on <pool-name> 

Though obviously you can do so much more. Jeff's blog (and the comments) are a goldmine of information about the subject.
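One thing worth doing after enabling it is watching the deduplication ratio the pool reports. A small sketch, assuming a hypothetical pool called "rpool" (the example ratio is made up, not a measurement from this post):

```shell
# Solaris-only, shown as comments:
#   pfexec zpool get dedupratio rpool
#   pfexec zpool list rpool          # the DEDUP column shows e.g. 2.06x
#
# Converting a reported ratio into space saved (runnable anywhere):
ratio=2.06   # hypothetical example value
awk -v r="$ratio" 'BEGIN { printf "%.1f%% of blocks saved\n", (1 - 1/r) * 100 }'
# → 51.5% of blocks saved
```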

Join the discussion

Comments ( 2 )
  • Mike Gerdts Thursday, December 3, 2009

    Can you share the build times with and without dedup on? That is, is the hash calculation having a material impact on getting your work done or is it mostly using up otherwise idle CPU cycles.

    Some details on your machine (# cpus/cores/strands) and the parallelism of dmake would be helpful as well.

  • guest Friday, December 4, 2009

This was just in a virtual machine on my laptop, which I use when travelling, so the comparisons aren't really valid. It did max out the one virtual CPU I'd assigned to the guest, though, so the build time for a full nightly was significantly affected.
