a small ZFS hack

I've been dabbling a bit in ZFS recently, and what's amazing is not just how well it solves the well-understood filesystem problems, but how its design opens the door to novel ways to manage data. Compression is a great example. An almost accidental by-product of the design is that your data can be stored compressed on disk. This is especially interesting in an era when we have CPU cycles to spare, far too few available IOPs, and disk latencies that you can measure with a stopwatch (well, not really, but you get the idea). With ZFS, you can trade in some of those spare CPU cycles for IOPs by turning on compression, and the additional latency introduced by decompression is dwarfed by the time we spend twiddling our thumbs waiting for the platter to complete another revolution.

smaller and smaller

Turning on compression in ZFS (zfs set compression=on <dataset>) enables the so-called LZJB compression algorithm -- a Lempel-Ziv variant tagged with the initials of its humble author, Jeff Bonwick. LZJB is fast, reasonably effective, and quite simple (compress and decompress are implemented in about a hundred lines of code). But the ZFS architecture can support many compression algorithms. Just as users can choose from several different checksum algorithms (fletcher2, fletcher4, or sha256), ZFS lets you pick your compression routine -- it's just that there's only the one so far.
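
Under the hood there isn't much to picking a routine: each algorithm is just an entry in a kernel dispatch table of compress/decompress functions. Here's a rough sketch of the shape of that table -- the type and field names are approximations from memory, not the verbatim OpenSolaris source, so consult zio_compress.c for the real thing:

#include <stddef.h>

/* Sketch only: names approximate the real zio_compress.c, not quote it. */
typedef size_t zio_compress_func_t(void *src, void *dst, size_t s_len,
    size_t d_len);
typedef int zio_decompress_func_t(void *src, void *dst, size_t s_len,
    size_t d_len);

typedef struct zio_compress_info {
	zio_compress_func_t	*ci_compress;	/* NULL for no-op entries */
	zio_decompress_func_t	*ci_decompress;
	char			*ci_name;	/* what 'zfs set compression=' accepts */
} zio_compress_info_t;

/* Implemented elsewhere in the kernel (lzjb.c). */
extern size_t lzjb_compress(void *, void *, size_t, size_t);
extern int lzjb_decompress(void *, void *, size_t, size_t);

zio_compress_info_t zio_compress_table[] = {
	{ NULL,			NULL,			"inherit" },
	{ lzjb_compress,	lzjb_decompress,	"lzjb" },
	/* a gzip entry would slot in right here */
};

Adding an algorithm is then mostly a matter of adding a table entry that points at kernel-resident compress and decompress routines.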

putting the z(lib) in ZFS

I thought it might be interesting to add a gzip compression algorithm based on zlib. I was able to hack this up pretty quickly because the Solaris kernel already contains a complete copy of zlib (albeit scattered around a little) for decompressing CTF data for DTrace, and apparently for some sort of compressed PPP streams module (or whatever... I don't care). Here's what the ZFS/zlib mash-up looks like (for the curious, this is with the default compression level -- 6 on a scale from 1 to 9):

# zfs create pool/gzip
# zfs set compression=gzip pool/gzip
# cp -r /pool/lzjb/* /pool/gzip
# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
pool/gzip  64.9M  33.2G  64.9M  /pool/gzip
pool/lzjb   128M  33.2G   128M  /pool/lzjb

That's with a 1.2G crash dump (pretty much the most compressible file imaginable). Here are the compression ratios with a pile of ELF binaries (/usr/bin and /usr/lib):

# zfs get compressratio
NAME       PROPERTY       VALUE      SOURCE
pool/gzip  compressratio  3.27x      -
pool/lzjb  compressratio  1.89x      -

Pretty cool. Actually compressing these files with gzip(1) yields a slightly smaller result, but it's very close, and the convenience of getting the same compression transparently from the filesystem is awfully compelling. It's just a prototype at the moment. I have no idea how well it will perform in terms of speed, but early testing suggests that it will be lousy compared to LZJB. I'd be very interested in any feedback: Would this be a useful feature? Is there an ideal trade-off between CPU time and compression ratio? I'd like to see if this is worth integrating into OpenSolaris.
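
For the curious, the guts of the hack boil down to a pair of small wrappers around the kernel's copy of zlib. The sketch below assumes the z_compress_level()/z_uncompress() wrappers from sys/zmod.h -- treat the signatures as approximate and check the header before relying on them; the function names and the GZIP_LEVEL define are mine, for illustration:

#include <sys/types.h>
#include <sys/zmod.h>	/* kernel zlib wrappers */

#define	GZIP_LEVEL	6	/* zlib's default: 1 (fastest) to 9 (smallest) */

/*
 * Compress one block. If the data doesn't shrink, return the original
 * size so the caller stores the block uncompressed.
 */
size_t
gzip_compress(void *src, void *dst, size_t s_len, size_t d_len)
{
	size_t dstlen = d_len;

	if (z_compress_level(dst, &dstlen, src, s_len, GZIP_LEVEL) != Z_OK ||
	    dstlen >= s_len)
		return (s_len);	/* incompressible; keep the raw block */

	return (dstlen);
}

/* Decompress one block; a failure here would mean a corrupt block. */
int
gzip_decompress(void *src, void *dst, size_t s_len, size_t d_len)
{
	size_t dstlen = d_len;

	return (z_uncompress(dst, &dstlen, src, s_len) == Z_OK ? 0 : -1);
}

With those two routines in the kernel, wiring gzip up is essentially just another entry in the compression table sketched above, plus a new value for the compression property.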


Comments:

Useful? Yes

The trouble with any compression in the filesystem -- and this one makes it even clearer -- is that you would want to be able to get at both the compressed and the uncompressed data.

Consider an FTP server: it would be good if it could offer compressed data without having the system uncompress it in the filesystem only to compress it again in the FTP server.

Then there is NFS...

Posted by Chris Gerhard on January 28, 2007 at 06:37 PM PST #

Useful? Yes

It's very interesting that a compression algorithm could so easily be added to ZFS. Is this hack available as source code somewhere? :-) Thanks and best regards, Ivan

Posted by ivanvdb25 on January 28, 2007 at 07:57 PM PST #

Of course it would be useful and should be integrated! I was playing with the same idea some time ago, and adding another compression algorithm to ZFS is easy -- the hard part is to do compression/decompression in the kernel. If you've got gzip, it should be integrated ASAP.

There's one thing which can limit performance: there's an open bug that causes all compression/decompression in ZFS to be run by only one thread, so only one CPU is utilized. Anyway, a lot of people, especially with SATA disks, are using ZFS for long-term storage and do not necessarily need a lot of IOPs.

A simple way of specifying the level of compression would also be useful -- maybe in the form compression=gzip-N, where N is the compression level. Without specifying -N (so only compression=gzip), the default level would be enforced.

Hope to see it integrated in hours... ok, in days :))) Great job! If you can provide your code changes privately right now, that would be great.

Posted by Robert Milkowski on January 28, 2007 at 08:26 PM PST #

One side effect of the compression feature is that it skews the CPU utilization. I've been using the compression feature on a 3TB filesystem with excellent results. The one issue I notice is that when I have a decent amount of I/O against the filesystem, my CPU spends most of its time in 'sys'; >40% is not abnormal on my E2900 (24 x 96GB).

Posted by Anantha on January 28, 2007 at 10:25 PM PST #

Hi, I was just wondering: if gzip compression has been enabled, does it cause problems when a ZFS volume is created on an x86 system and afterwards imported on a Sun SPARC system? Best regards, Ivan

Posted by ivanvdb25 on January 29, 2007 at 12:07 AM PST #

Adam, this is positively and without a doubt some really great stuff! One could choose between lzjb for day-to-day use, or bzip2 for heavily compressed, "archival" file systems (as we all know, bzip2 beats the living daylights out of gzip in terms of compression about 95-98% of the time).

Historical tidbit:

ZFS finally implements the per-filesystem -- one could say "per-directory" -- compression that AmigaOS had with the XFH: pseudo-drive, implemented with xpkmaster.library (http://www.dstoecker.eu/xpkmaster.html).

Posted by UX-admin on January 29, 2007 at 02:29 AM PST #

Are there any documents somewhere explaining the hooks of ZFS and how to add features like this? That would be useful for developers who want to add features like filesystem-based encryption to it. Thanks for your great work!

Posted by dennis on January 29, 2007 at 08:01 AM PST #

UX-admin, how can you claim that bzip2 "beats the living daylights out of gzip"? I haven't seen it compress files significantly better than gzip, and it uses considerably more CPU time to do so.

Posted by Derek Morr on January 29, 2007 at 09:02 AM PST #

Dennis - just look into the source, in the .h files, which have a lot of useful comments. See also http://opensolaris.org/os/community/zfs/source/. ZFS is really nicely written, and as I wrote before, if you want to add another compression algorithm to ZFS, the hard part is to implement the algorithm in the kernel rather than hook it up into ZFS, which is easy.

Posted by Robert Milkowski on January 29, 2007 at 04:47 PM PST #

Thanks for all the feedback! I've posted an update that includes some more information and responses to your questions and suggestions.

Posted by Adam Leventhal on January 31, 2007 at 02:34 PM PST #
