News, tips, partners, and perspectives for the Oracle Solaris operating system

Having my secured cake and Cloning it too (aka Encryption + Dedup with ZFS)

Darren Moffat
Senior Software Architect

The main goal of encryption is to make the (presumably sensitive) cleartext data indistinguishable from random data.  Good file system encryption usually aims to have the same plaintext encrypt to different ciphertext, at least when written at a different "location", even if the same key is used.  One way to achieve that is to derive the initialisation vector (IV) from where the blocks of the file are stored on disk.  In this respect the encryption support in ZFS is no different: by default we derive the IV from a combination of which dataset / object the block belongs to and when (in which transaction) it was written.  This means that the same block of plaintext data written to a different file in the same filesystem will get a different IV and thus different ciphertext.  Since ZFS is copy-on-write and we use the transaction identifier, it also means that if we "overwrite" the same block of a file at a later time it still ends up with a different IV and thus different ciphertext.  Each encrypted dataset in ZFS has a different set of data encryption keys (see my earlier post on assured delete for more details on that), so across datasets both the IV and the encryption key change, and we have a really high level of confidence of getting different ciphertext when the same data is written to different datasets.
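
To make the effect of a location-derived IV concrete, here is a minimal sketch in Python. It is not the ZFS implementation: `derive_iv`, its inputs (including the block number) and the hash-based `toy_encrypt` stream cipher are assumptions made purely for illustration, and the toy cipher is not secure. It only shows that the same plaintext under the same key yields different ciphertext once the IV depends on where and when the block is written.

```python
import hashlib

def derive_iv(dataset_id: int, object_id: int, block_id: int, txg: int) -> bytes:
    """Hypothetical IV: a hash of where (dataset/object/block) and when (txg) we write."""
    material = f"{dataset_id}:{object_id}:{block_id}:{txg}".encode()
    return hashlib.sha256(material).digest()[:12]

def toy_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHA-256 counter keystream. NOT secure, NOT what ZFS uses."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)

key = b"k" * 32                      # one data encryption key, reused for every write below
block = b"same plaintext block" * 4

file_a = toy_encrypt(key, derive_iv(1, 100, 0, txg=50), block)
file_b = toy_encrypt(key, derive_iv(1, 101, 0, txg=50), block)    # different file (object)
rewrite = toy_encrypt(key, derive_iv(1, 100, 0, txg=51), block)   # same block, later transaction

print(file_a != file_b, file_a != rewrite)
```

Both comparisons come out true: changing either the object or the transaction changes the IV, and with it the ciphertext.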

The goal of deduplication in storage is to coalesce matching disk blocks into a smaller number of copies (ideally 1, but in ZFS that number depends on the value of the copies property on the dataset and the pool-wide dedupditto property, so it could be more than 1).  Given the above description of how we do encryption it would seem that encryption and deduplication are fundamentally at odds with each other - and usually that is true.

When we write a block to disk in ZFS it goes through the ZIO pipeline and in doing so a number of transforms are optionally applied to the data:  compress -> encryption -> checksum -> dedup -> raid.

The deduplication step uses the checksums of the blocks to find suitable matches. This means it is acting on the already compressed and encrypted data.  Also, in ZFS, deduplication matches are searched for across all datasets in the pool that have dedup=on.

So we have very little chance of getting any deduplication hits with encrypted datasets because of how the IV is generated and the fact that each dataset has its own set of encryption keys.  In fact not getting hits with deduplication is actually a good test that we are using different keys and IVs and thus getting different ciphertext for the same plaintext.
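
The ordering described above (compress, then encrypt, then checksum, then dedup) can be sketched in a few lines of Python. Everything here is a simplified model rather than ZFS code: `write_block`, the dictionary standing in for the dedup table, the byte-string keys and IVs, and the insecure hash-based `toy_encrypt` are all assumptions for illustration. The point is only that dedup looks up the checksum of the ciphertext, so identical plaintext written with different keys and IVs never matches.

```python
import hashlib
import zlib

def toy_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHA-256 counter keystream. NOT secure, NOT what ZFS uses."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)

def write_block(ddt: dict, key: bytes, iv: bytes, plaintext: bytes) -> bool:
    """Run one block through compress -> encrypt -> checksum -> dedup; return True on a dedup hit."""
    compressed = zlib.compress(plaintext)
    ciphertext = toy_encrypt(key, iv, compressed)
    checksum = hashlib.sha256(ciphertext).digest()   # checksum of the block as it would be stored
    hit = checksum in ddt
    ddt[checksum] = ddt.get(checksum, 0) + 1         # reference count per unique stored block
    return hit

ddt = {}                                             # stand-in for the pool-wide dedup table
block = b"identical plaintext " * 200

hit_first = write_block(ddt, b"dataset-one-key", b"iv-for-write-1", block)
hit_second = write_block(ddt, b"dataset-two-key", b"iv-for-write-2", block)

print(hit_first, hit_second, len(ddt))
```

Neither write is a hit, and the table ends up with two entries for what was one block of plaintext.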

So encryption=on + dedup=on is pointless, right?

Not so with ZFS.  I wasn't happy about giving up on deduplication for encrypted datasets, so we found a solution.  It has some restrictions, but I think they are reasonable and realistic ones.

Within what I'll call a "clone family", i.e. all datasets are clones of the same original dataset or are clones of those clones, we would be sharing data encryption keys in the default case, because they share data (again see my earlier post on assured delete for info on the data encryption keys). So I found a method of generating the IV such that within the "clone family" we will get dedup hits for the same plaintext.  For this to work you must not run 'zfs key -K' on any of the clones and you must not pass '-K' to 'zfs clone' when you create your clones.  Note that this applies only to snapshots/clones, not to child datasets; by that I mean nothing breaks with child datasets, you just won't get deduplication matches.
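
This post does not spell out how that IV is generated, so the following Python sketch is only one plausible shape for the idea, not the actual method. It assumes the IV is derived from a keyed hash (HMAC) of the plaintext under the family's shared key; `content_iv`, `on_disk_checksum`, the key values and the insecure hash-based `toy_encrypt` are all hypothetical. Under that assumption, datasets sharing a key produce identical ciphertext for identical plaintext, while a clone that has been given its own key does not.

```python
import hashlib
import hmac

def toy_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHA-256 counter keystream. NOT secure, NOT what ZFS uses."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + iv + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)

def content_iv(key: bytes, plaintext: bytes) -> bytes:
    """Assumed scheme: the IV depends only on the key and the plaintext, not on the location."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()[:12]

def on_disk_checksum(key: bytes, plaintext: bytes) -> str:
    """Checksum of the ciphertext, which is what the dedup step would compare."""
    ciphertext = toy_encrypt(key, content_iv(key, plaintext), plaintext)
    return hashlib.sha256(ciphertext).hexdigest()

family_key = b"key shared by master and its clones"
rekeyed_key = b"new key given to one clone"          # models a clone created or changed with -K
block = b"patched library contents" * 64

master = on_disk_checksum(family_key, block)
clone = on_disk_checksum(family_key, block)
rekeyed_clone = on_disk_checksum(rekeyed_key, block)

print(master == clone)           # shared key: identical ciphertext, so dedup can match
print(master == rekeyed_clone)   # different key: no match
```

The first comparison is true and the second is false, which mirrors the restriction above: dedup hits depend on the clones continuing to share their data encryption keys.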

So no, it isn't pointless, and what's more, for some configurations it will actually work really well.  A common use case for a configuration that does work well is a set of virtualisation images (maybe filesystems for local Zones or ZVOLs shared over iSCSI for OVM or similar) that are all derived from the same original master by using zfs clones and that all get patched/updated with pretty much the same set of patches/updates.  This is a case where clones+dedup work well for the unencrypted case, and one which, as shown above, can still work well even when encryption is enabled.

The usual deployment caveats with ZFS deduplication still apply, i.e. it is block based and it works best when you have lots of available DRAM and/or L2ARC for caching the DDT (the deduplication table).  ZFS Encryption doesn't add any additional requirements to this.

So we can happily do this type of thing, and have it "work as expected":

$ zfs create -o compression=on -o encryption=on -o dedup=on tank/builds
$ zfs create tank/builds/master
$ zfs snapshot tank/builds/master@1
$ zfs clone tank/builds/master@1 tank/builds/project-one
$ zfs clone tank/builds/master@1 tank/builds/project-two

General documentation for ZFS support of encryption is in the Oracle Solaris ZFS Administration Guide in the Encrypting ZFS File Systems section.


Comments ( 3 )
  • Jim Klimov Sunday, November 21, 2010

    Hello, Darren, and thanks for your write-ups on ZFS.

    I wanted to clarify a point which was true in the past for deduplication with compression - is it still true now? Here goes:

    When we enable deduplication and compression (and now encryption) on a dataset, each of our data blocks is heavily processed (compressed and encrypted and finally checksummed) before "falling prey" to deduplication. That is, after committing lots of computational resources, the compressed and encrypted block itself is discarded and some counter is incremented.

    One optimization discussed in the past was to create a lower-level ZFS Volume dataset with compression (and now encryption), and inside that volume create another ZFS pool and datasets with deduplication. In this case the plaintext block's checksum is almost instantly available, and if the block is unique - it goes to the heavy processing to be written in the underlying volume.

    Another idea was to store the checksums of uncompressed blocks as well as checksums of on-disk blocks, much to the same effect of finding duplicates before heavy processing. Then it might be possible, for example, to either add a pointer to the on-disk block, or store it in the best of the assigned compression levels and change other pointers from less-compressed (or uncompressed) block to the new better-compressed one. But implementing this idea would probably involve changing too much "under the hood" of ZFS, so it seems less feasible.

    As all of these points were discussed in the past, I wonder if the current implementation of ZFS smartly takes advantage of some such solution to save CPU resources, increase performance and hide complexity from the user at the same time?

    If not, have there been any measurements about the performance setbacks vs. gains of having two-level pools like in the first scenario above?

    Now that we have encryption in the mix, would any of these proposed solutions reduce the encrypted data security and put it at risk somehow?



  • Darren Moffat Monday, November 22, 2010

    Deduplication is still based on the block as stored on disk. For encrypted data this is very important: matching on the untransformed data (before compression and encryption) would give very different results, and ones that wouldn't actually work. You wouldn't be able to decrypt the ciphertext blocks if they belonged to a different dataset (one that isn't a clone), because the dataset the read is done on behalf of wouldn't have the correct keys.

    Also, while it is possible to build a pool directly on top of local ZVOLs, doing so is not recommended. zpool -> iSCSI -> zvol is fully supported though.

  • Nitin V Monday, June 3, 2013

    So does it mean that if I have 3rd party in-line encryption between system (ZFS) & SAN Storage, the 'ZFS Deduplication' wouldn't work as it won't have the ability (keys) to decrypt the ciphertext blocks?


