    January 31, 2007

gzip for ZFS update

Guest Author

The other day I posted about a prototype I had created that adds a gzip compression algorithm to ZFS. ZFS already allows administrators to choose to compress filesystems using the LZJB compression algorithm. This prototype introduced a more effective -- albeit more computationally expensive -- alternative based on zlib.

As an arbitrary measure, I used tar(1) to create and expand archives of an ON (Solaris kernel) source tree on ZFS filesystems compressed with lzjb and gzip algorithms as well as on an uncompressed ZFS filesystem for reference:



Thanks for the feedback. I was curious if people would find this interesting and they do. As a result, I've decided to polish this wad up and integrate it into Solaris. I like Robert Milkowski's recommendation of options for different gzip levels, so I'll be implementing that. I'll also upgrade the kernel's version of zlib from 1.1.4 to 1.2.3 (the latest) for some compression performance improvements. I've decided (with some hand-wringing) to succumb to the requests for me to make these code modifications available. This is not production quality. If anything goes wrong it's completely your problem/fault -- don't make me regret this. Without further disclaimer:
pdf
patch
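
To give a feel for what per-level options would trade off, here is a minimal sketch. It uses Python's standard `zlib` module as a stand-in for the kernel's zlib, not the code from the patch. The sample data and the `compress_at_levels` helper are made up for illustration, and real results will depend heavily on the data being compressed.

```python
import zlib


def compress_at_levels(data, levels=range(1, 10)):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Hypothetical, highly repetitive stand-in for source-code-like data.
sample = b"static int zfs_example(void *arg) { return (0); }\n" * 2000

sizes = compress_at_levels(sample)
for level, size in sizes.items():
    print(f"level {level}: {len(sample)} -> {size} bytes")
```

Higher levels generally spend more CPU time searching for matches in exchange for smaller output, though the gain from one level to the next is often small.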

In reply to some of the comments:


UX-admin
One could choose between lzjb for day-to-day use, or bzip2 for heavily compressed, "archival" file systems (as we all know, bzip2 beats the living daylights out of gzip in terms of compression about 95-98% of the time).

It may be that bzip2 is a better algorithm, but we already have (and need) zlib in the kernel, and I'm loath to add another algorithm.


ivanvdb25
Hi, I was just wondering if the gzip compression has been enabled, does it give problems when an ZFS volume is created on an X86 system and afterwards imported on a Sun Sparc?

That isn't a problem. Data can be moved from one architecture to another (and I'll be verifying that before I putback).


dennis
Are there any documents somewhere explaining the hooks of zfs and how to add features like this to zfs? Would be useful for developers who want to add features like filesystem-based encryption to it. Thanks for your great work!

There aren't any documents exactly like that, but there's plenty of documentation in the code itself -- that's how I figured it out, and it wasn't too bad. The ZFS source tour will probably be helpful for figuring out the big picture.

Update 3/22/2007: This work was integrated into build 62 of onnv.




Comments ( 7 )
  • Jeff Bonwick Thursday, February 1, 2007
    Thanks for doing this -- way cool. One thought on the different compression levels: if I recall correctly, the level isn't part of the compressed output, so it's not strictly necessary to burn 9 slots (gzip1...gzip9) in the compression vector array. Keeping an extra byte per dataset to remember the level would be trivial; the only awkward part is the dnode. Although we don't currently export it in the admin model, compression is actually settable on a per-file basis, not just per-filesystem, and follows the same inheritance rules that we use for nested datasets. So the dnode also has a byte for compression flavor, and if we didn't use 9 distinct values, we'd need another byte to encode the level. We have spare bytes in the dnode, but it seems kind of wasteful. Then again, so is this comment. If you want to burn 9 slots, be my guest. ;-)
  • alexvdb25 Thursday, February 1, 2007
    Thanks Adam for your work and your answers to our questions :-)
  • Igor Thursday, February 1, 2007
    Adam,
    That's great feature!
    What is time(sec) on left diagram - wall or CPU time? What was the other time?
    What hardware it was measured on (CPUs, HD etc)?
    Is your implementation expected to scale with # of CPUs?
  • Adam Leventhal Thursday, February 1, 2007
    Jeff,


    Thanks for the note. I'll ping you about the trade-offs between using extra compression slots versus using a byte in the dnode.




    Igor,


    The performance test was very ad hoc, but the numbers are the real time on an otherwise idle 2-way 2 GHz Opteron. As for whether it will scale, tar itself is single-threaded and there's the problem of ZFS compression executing only on a single CPU (6460622). When these issues get sorted out and I'm closer to putting back, I or someone else will gather some more meaningful data.
  • Igor Thursday, February 1, 2007
    Thanks for clarifications. It seems there's no good reason to use tar for per. measurements of ZFS.
  • roland Saturday, February 3, 2007
    hey, this is really damn great stuff!
    i would really like to try zfs with zlib compression, especially doing performance comparison and compression efficiency.
    unfortunately, i`m not that much into solaris but more on linux, so i`m a real newbie when it comes to applying patches to solaris kernel source and compile/install such kernel.
    any hints how to do this ?
    or - should i just sit and wait for some number of days/weeks until this shows up in some update-package/distro release ?
    regards
    roland
  • Adam Leventhal Friday, February 9, 2007
    roland,


    I hope to have this putback to Solaris within the next few weeks, and it will be available shortly after that. I didn't really intend anyone but the most adventurous OpenSolaris user to attempt to apply the patch, so unless that describes you sit tight -- gzip for ZFS will be available relatively soon.
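
Jeff Bonwick's point earlier in the thread, that the decompressor does not need to know the level, can be checked with a small sketch. This again uses Python's standard `zlib` module as a stand-in for the kernel's zlib; the payload is made up, and nothing here reflects how ZFS actually stores its compression setting.

```python
import zlib

# Hypothetical payload standing in for the contents of a block.
payload = b"ZFS block contents, repeated to make it compressible. " * 500

# Compress the same payload at every level from 1 to 9.
streams = {level: zlib.compress(payload, level) for level in range(1, 10)}

# Decompress each stream; the level is never passed to the decompressor.
decoded = {level: zlib.decompress(stream) for level, stream in streams.items()}

all_round_trip = all(out == payload for out in decoded.values())
print(all_round_trip)  # prints True
```

The zlib stream header does carry a coarse two-bit hint about the level used, but decompression ignores it. So the level only needs to be remembered for future writes, which is why the choice between nine distinct algorithm slots and a separate level byte is a bookkeeping question and not a format one.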