Wednesday Dec 16, 2009

Can we improve ZFS dedup performance via SHA256?

Today I integrated the fix for 6344836 ZFS should not have a private copy of SHA256.  The original intent was to avoid having a duplicate implementation of SHA256 in the ONNV source tree.  The hope was that on some platforms the common implementation would be faster than the private ZFS copy, and that this would have some impact on ZFS performance.  Until deduplication support arrived in ZFS, SHA256 wasn't heavily used, since the default data checksum is fletcher, not SHA256.  However, I have been running a variant of this fix in the ZFS crypto project gate for almost 2 years, because when encryption is enabled on a ZFS dataset we force the use of sha256 as the checksum for that dataset's data and metadata.

As part of approving my RTI, Jeff Bonwick rightly wanted to know that the change wouldn't regress the performance of the deduplication support he had just integrated.  So he asked for a ZFS micro test based on deduplication, and also for some micro benchmark numbers comparing the time to do a SHA256 digest over the various ZFS block sizes, 1k through 128k (in powers of two).  The micro benchmark uses an all zeros block (malloc followed by bzero) of the appropriate size and averages the time to do SHA2Init, SHA2Update, SHA2Final (or the equivalent) and put the result into a zio_cksum_t (a big endian layout of 4 unsigned 64 bit ints).  The results are averages in nanoseconds over a run of 10,000 iterations for each block size.

The micro benchmark was run in userland using 64 bit processes but the SHA256 code used is identical to that used by the zfs kernel module and the misc/sha2 kernel module from the cryptographic framework.
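The shape of that micro benchmark can be sketched roughly as follows.  This is not the benchmark I ran: it is a minimal Python stand-in that assumes hashlib's SHA256 in place of SHA2Init/SHA2Update/SHA2Final, and the names sha256_words and average_ns are made up for the illustration.  It only shows the loop structure (all zeros block, digest, unpack into four big endian 64 bit words, average over the iterations), so its timings say nothing about the kernel code.

```python
import hashlib
import struct
import time

ITERATIONS = 10_000  # iteration count quoted in the post


def sha256_words(block: bytes) -> tuple:
    """Digest a block and unpack the 32 byte result as four big endian
    unsigned 64 bit words (the layout described for zio_cksum_t)."""
    digest = hashlib.sha256(block).digest()  # init + update + final in one call
    return struct.unpack(">4Q", digest)


def average_ns(block_size: int, iterations: int = ITERATIONS) -> float:
    """Average time in nanoseconds to checksum one all zeros block."""
    block = bytes(block_size)  # zero filled, like malloc followed by bzero
    start = time.perf_counter_ns()
    for _ in range(iterations):
        sha256_words(block)
    return (time.perf_counter_ns() - start) / iterations


if __name__ == "__main__":
    for shift in range(10, 18):  # 1k (2**10) through 128k (2**17)
        size = 1 << shift
        print(f"{size // 1024:4d}k  {average_ns(size):12.1f} ns")
```

The timing includes the unpack step on purpose, since the original measurement also covered putting the digest into the zio_cksum_t.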

Note these are micro benchmarks and may not be indicative of real world performance.  I selected two modern machines from Sun's hardware portfolio: an X4140 (the head unit of a Sun Unified Storage 7310) and an UltraSPARC T2 based T5120.  Note that the fix I integrated only uses a software implementation of SHA256 on the T5120 (UltraSPARC T2) and is not (yet) using the on CPU hardware implementation of SHA256.  The reason for this is to do with boot time availability of the Solaris Cryptographic Framework and the need to have ZFS as the root filesystem.  I know how to fix this but it was a slightly more detailed fix and one I didn't think appropriate to introduce in build 131 of OpenSolaris - which means it needs to wait until post 134.

All very nice, but as I said that is a micro benchmark; what does this actually look like at the ZFS level?  Again this is still a benchmark and is not necessarily indicative of any possible real world performance improvements.  The goal here is to show whether or not an improvement in the implementation of calculating a SHA256 checksum for ZFS will be noticed for dedup.  This test was suggested by Jeff as a quick way to determine if there is any impact to dedup in changing the SHA256 implementation - it uses data that will obviously dedup.  Note that the actual numbers aren't important here, because the goal wasn't to show how fast ZFS can do IO but to show the relative difference between the before and after implementations of SHA256.  As such I didn't build a pool config with the purpose of doing the best possible IO; I just used a single disk pool using one of the available internal drives in each of my machines (again an X4140 and a T5120).

# zpool create mypool -O dedup=on c1t1d0 
# rm -f 100g; sync; ptime sh -c 'mkfile 100g 100g; sync' 

X4140 running build 129 with private ZFS copy of SHA256

real     3:34.386051924
user        0.341710806
sys      1:02.587268898

X4140 running build with misc/sha256

real     2:25.386560294
user        0.317230220
sys        56.600231785

T5120 running build 129 with private ZFS copy of SHA256

real     8:40.703912346
user        2.704046212
sys      4:06.518025697

T5120 running build with misc/sha256

real     5:40.593874259
user        2.704308565
sys      3:59.648897024
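
To put the real times above in relative terms, here is a small Python sketch that converts the ptime values to seconds and works out the percentage reduction.  The helper names to_seconds and reduction_percent are just for this illustration; the numbers are copied from the runs above.

```python
def to_seconds(ptime_value: str) -> float:
    """Convert a ptime value such as '3:34.386051924' or '56.600231785'
    to seconds."""
    seconds = 0.0
    for part in ptime_value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def reduction_percent(before: str, after: str) -> float:
    """Percentage by which the elapsed time dropped from before to after."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100


# (before, after) real times: private ZFS copy vs. common SHA256 code
REAL_TIMES = {
    "X4140": ("3:34.386051924", "2:25.386560294"),
    "T5120": ("8:40.703912346", "5:40.593874259"),
}

for machine, (before, after) in REAL_TIMES.items():
    print(f"{machine}: {reduction_percent(before, after):.1f}% less real time")
```

That works out to roughly 32% less real time on the X4140 and roughly 35% less on the T5120 for this particular test.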

So for both the X4140 and the T5120 there is a noticeable decrease in the real time taken to run the test.  I ran the test 6 times on each machine and picked the lowest time in each case - the variance was actually very small anyway (usually under a second).  Given that mkfile produces blocks of all zeros, and compression was not on, there would be massive amounts of dedup hits going on.  How much?

# zpool get dedup mypool
mypool  dedupratio  819200.00x  -

Big hits for dedup so lots of exercising of SHA256 in this "test" case.
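
That ratio is what you would expect if every block in the file is identical.  The arithmetic below is a sketch under two assumptions the test output doesn't state: that mkfile's 100g means 100 * 2^30 bytes, and that the dataset was using a 128k block size (the largest of the block sizes mentioned earlier).

```python
GIB = 1 << 30
KIB = 1 << 10

file_size = 100 * GIB    # mkfile 100g (assumed to be binary gigabytes)
record_size = 128 * KIB  # assumed dataset block size

# Every block is all zeros, so they all dedup down to a single stored
# block and the ratio equals the number of blocks written.
blocks = file_size // record_size
print(blocks)  # 819200
```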

Again, these tests were primarily done to show no regression in performance from switching to the common copy of the SHA256 code, but they have shown that there is actually a significant improvement.  Whether or not you will see that improvement with your real world data on your real systems depends on many factors - not least of which is whether your data is dedupable and whether or not SHA256 was anywhere near being the critical factor in your observed performance.

My RTI advocate was happy I'd done the due diligence and approved my RTI, so this was integrated into ONNV in build 131.

Happy deduplicating.



