Compression on the Sun Storage 7000

Guest Author

Built-in filesystem compression has been part of ZFS since day one, but is only now
gaining some enterprise storage spotlight. Compression reduces the disk space needed to store data, not only increasing effective capacity but often improving performance as well
(since fewer bytes means less I/O). Beyond that, having compression built into the filesystem (as opposed to using an external appliance between your storage and your clients to do compression, for example) simplifies the
management of an already complicated storage architecture.

Compression in ZFS

Your mail client
might use WinZIP to compress attachments before sending them, or you might unzip tarballs in order to open the documents inside. In
these cases, you (or your program) must explicitly invoke a separate
program to compress and uncompress the data before actually using it. This
works fine in these limited cases, but isn't a very general solution. You
couldn't easily store your entire operating system compressed on disk, for
example.

With ZFS, compression is built directly into the I/O pipeline. When
compression is enabled on a dataset (filesystem or LUN), data is compressed
just before being sent to the spindles and decompressed as it's read back.
Since this happens in the kernel, it's completely transparent to userland
applications, which need not be modified at all. Besides the initial
configuration (which we'll see in a moment is rather trivial), users need not
do anything to take advantage of the space savings offered by compression.
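
To make the idea of transparent compression concrete, here is a minimal sketch in Python. It is a toy, not how ZFS is implemented: the `CompressedStore` class and its method names are hypothetical, and `zlib` stands in for the compression step. The point is only that the caller writes and reads plain bytes while compression happens underneath.

```python
import zlib


class CompressedStore:
    """Toy block store (hypothetical): compresses on write, decompresses on read."""

    def __init__(self, level: int = 6):
        self.level = level
        self._blocks: dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        # Compress just before the data is "sent to disk".
        self._blocks[name] = zlib.compress(data, self.level)

    def read(self, name: str) -> bytes:
        # Decompress as the data is read back.
        return zlib.decompress(self._blocks[name])

    def stored_size(self, name: str) -> int:
        # Size of the compressed representation actually held by the store.
        return len(self._blocks[name])


store = CompressedStore()
payload = b"the quick brown fox jumps over the lazy dog\n" * 1000
store.write("notes.txt", payload)

# The caller gets back exactly what it wrote and never sees compressed bytes.
assert store.read("notes.txt") == payload
print(len(payload), "bytes written,", store.stored_size("notes.txt"), "bytes stored")
```

The caller's code is identical whether or not the store compresses, which is the property that lets existing applications benefit without modification.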

A simple example

Let's take a look at how this works on the 7000 series. Like all software features, compression comes free. Enabling compression
for user data is simple because it's just a share property. After creating a
new share, double-click it to modify its properties, select a compression
level from the drop-down box, and apply your changes:


GZIP options

After that, all new data written to the share will be compressed with the
specified algorithm. Turning compression off is just as easy: just select
'Off' from the same drop-down. In both cases, extant data will remain as-is -
the system won't go rewrite everything that already existed on the share.

Note that when compression is enabled, all data
written to the share is compressed, no matter where it comes from:
NFS, CIFS, HTTP, and FTP clients all reap the benefits. In fact,
we use compression under the hood for some of the system data (analytics data, for
example), since the performance impact is negligible (as we will see below)
and the space savings can be significant.

You can observe the compression ratio for a share in the sidebar on the share
properties screen. This is the ratio of uncompressed data size to actual
(compressed) disk space used and tells you exactly how much space
you're saving.
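
As a worked example of that definition, the following Python sketch computes the ratio and the corresponding space savings. The function names and the 30 GB / 12 GB figures are made up for illustration and do not come from the appliance.

```python
def compression_ratio(logical_bytes: int, physical_bytes: int) -> float:
    """Ratio of uncompressed (logical) size to compressed (on-disk) size."""
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    return logical_bytes / physical_bytes


def space_saved_fraction(ratio: float) -> float:
    """Fraction of disk space saved at a given compression ratio."""
    return 1.0 - 1.0 / ratio


# Hypothetical share: 30 GB of user data occupying 12 GB on disk.
GB = 1024 ** 3
ratio = compression_ratio(30 * GB, 12 * GB)
print(f"ratio: {ratio:.2f}x, saved: {space_saved_fraction(ratio):.0%}")
```

Note that the savings grow slowly at higher ratios: going from 2x to 4x only moves the savings from 50% to 75% of the original size.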

The cost of compression

People are often concerned about the CPU overhead associated with compression, but the actual cost is difficult to calculate. On the one hand,
compression does trade CPU utilization for disk space savings. And up to a point, if
you're willing to trade more CPU time, you can get more space savings. But by
reducing the space used, you end up doing less disk I/O, which can improve overall
performance if your workload is bandwidth-limited.

But even when reduced I/O doesn't improve overall performance (because bandwidth isn't the bottleneck), it's important to keep in mind that the 7410 has a great deal of CPU horsepower (up to 4 quad-core 2GHz Opterons), making the "luxury" of compression very affordable.

The only way to really know the impact of compression on your disk
utilization and system performance is to run your workload with different
levels of compression and observe the results. Analytics is the perfect vehicle for this: we can observe
CPU utilization and I/O bytes per second over time on shares configured with
different compression algorithms.

Analytics results

I ran some experiments to show the impact of compression on performance. Before we get to the good stuff, here's the nitty-gritty about the experiment and results:

  • These results do not demonstrate maximum performance. I intended to show the effects of compression, not the maximum throughput of our box. Brendan's already got that covered.
  • The server is a quad-core 7410 with 1 JBOD (configured with mirrored storage) and 16GB of RAM. No SSD.
  • The client machine is a quad-core 7410 with 128GB of DRAM.
  • The basic workload consists of 10 clients, each writing 3GB to its own share and then reading it back for a total of 30GB in each direction. This fits entirely in the client's DRAM, but it's about twice the size of the server's total memory. While each client has its own share, they all use the same compression level for each run, so only one level is tested at a time.
  • The experiment is run for each of the compression levels supported on the 7000 series: lzjb, gzip-2, gzip (which is gzip-6), gzip-9, and none.
  • The experiment uses two data sets: 'text' (copies of /usr/dict/words, which is fairly compressible) and 'media' (copies of the Fishworks code swarm video, which is not very compressible).
  • I saw similar results with between 3 and 30 clients (with the same total write/read throughput, so they were each handling more data).
  • I saw similar results whether each client had its own share or not.
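
For readers who want a feel for the text-versus-media contrast without an appliance, here is a small-scale sketch in Python. It is not the experiment described above. It assumes that `zlib` levels 2, 6, and 9 are a reasonable stand-in for gzip-2, gzip, and gzip-9, it omits lzjb (which has no standard-library equivalent), and it uses synthetic data: repeated words for 'text' (far more repetitive than /usr/dict/words) and random bytes for 'media'.

```python
import os
import time
import zlib


def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (compression ratio, seconds spent) for zlib at the given level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data) / len(compressed), elapsed


# Stand-ins for the two data sets (assumptions, not the original files).
words = b"apple banana cherry date elderberry fig grape honeydew\n"
text_like = words * 20000                 # highly repetitive, very compressible
media_like = os.urandom(len(text_like))   # random bytes, effectively incompressible

results = {}
for name, data in (("text", text_like), ("media", media_like)):
    for level in (2, 6, 9):
        ratio, seconds = measure(data, level)
        results[(name, level)] = ratio
        print(f"{name:5s} level {level}: {ratio:8.2f}x in {seconds * 1000:7.1f} ms")
```

The absolute timings depend entirely on the machine running the script, so only the relative differences between levels and data types are meaningful.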

Now, below is an overview of the text (compressible) data set experiments in terms of NFS ops and network throughput. This gives a good idea of what the test does. For all graphs below, five experiments are shown, each with a different compression level in increasing order of CPU usage and space savings: off, lzjb, gzip-2, gzip, gzip-9. Within each experiment, the first half is writes and the second half reads:

NFS and network stats

Not surprisingly, from the NFS and network levels, the experiments basically appear the same, except that the writes are spread out over a longer period for higher compression levels. The read times are pretty much unchanged across all compression levels. The total NFS and network traffic should be the same for all runs. Now let's look at CPU utilization over these experiments:

CPU usage

Notice that CPU usage increases with higher compression levels, but caps out at about 50%. I need to do some digging to understand why this happens on my workload, but it may have to do with the number of threads available for compression. Anyway, since it only uses 50% of CPU, the more expensive compression runs end up taking longer.

Let's shift our focus now to disk I/O. Keep in mind that the disk throughput rate is twice that of the data we're actually reading and writing because the storage is mirrored:

Disk I/O

We expect to see an actual decrease in disk bytes written and read as the compression level increases because we're writing and reading more compressed data.
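
A rough back-of-the-envelope version of that expectation can be sketched in Python. The helper name is hypothetical, and the sketch assumes a two-way mirror, ignores metadata overhead, and reads the 1.47x and 2.52x figures from the text summary further down as compression ratios.

```python
def disk_bytes_written(logical_bytes: float, ratio: float, copies: int = 2) -> float:
    """Physical bytes hitting disk: compressed size times the number of mirror copies."""
    return logical_bytes / ratio * copies


LOGICAL_GB = 30  # total data written in each run of the experiment

# Assumed compression ratios; "off" is 1.00x by definition.
for label, ratio in (("off", 1.00), ("lzjb", 1.47), ("gzip-9", 2.52)):
    print(f"{label:7s} {disk_bytes_written(LOGICAL_GB, ratio):5.1f} GB to disk")
```

Under those assumptions the mirrored write volume drops from 60 GB with compression off to roughly 24 GB at 2.52x.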

I collected similar data for the media (uncompressible) data set. The three important differences were that with higher compression levels, each workload took less time than the corresponding text one:

Network bytes

the CPU utilization during reads was less than in the text workload:

CPU utilization

and the total disk I/O didn't decrease nearly as much with the compression level as it did in the text workloads (which is to be expected):

Disk throughput

The results can be summarized by looking at the total execution time for each workload at various levels of compression:

Summary: text data set

  • lzjb: 1.47x
  • gzip-9: 2.52x

Summary: media data set

  • off: 1.00x
  • gzip-2: 1.01x
  • gzip-9: 1.01x

Space chart

Time chart

What conclusions can we draw from these results? First, what we already knew: compression performance and space savings vary greatly with the compression level and the type of data. But more specifically, with my workloads:

  • Read performance is generally unaffected by compression.
  • lzjb can afford decent space savings, but performs well whether or not it's able to generate much savings.
  • Even modest gzip imposes a noticeable performance hit, whether or not it reduces I/O load.
  • gzip-9 in particular can spend a lot of extra time for marginal gain.

Moreover, the 7410 has plenty of CPU headroom to spare, even with high compression levels.

Summing it all up

We've seen that compression is free, built-in, and very easy to enable on the 7000 series. The performance effects
vary based on the workload and compression algorithm, but powerful CPUs
allow compression to be used even on top of serious loads. Moreover, the appliance provides great visibility into overall system
performance and effectiveness of compression, allowing administrators to see
whether compression is helping or hurting their workload.

Comments ( 9 )
  • rnixon Monday, March 16, 2009

    curious about the cpu usage - when u find out why it capped at 50% in the text example, pls post.

  • Dave Pacheco Monday, March 16, 2009

    @rnixon: It appears like I'm hitting the maximum number of ZIO pipeline threads defined in spa.c: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/spa.c#zio_taskq_threads

    The limit is hardcoded to 8 threads, or 50% of the cores we have available on this 7410. I'll post more when I have more details about this. Keep in mind that if you're running into this limit, that means you've already got 8 cores running compression full tilt.

  • PW Monday, March 16, 2009

    This is cool!. At least we knew that read the compressed the data won't impact performance (or minimal). It's good to use it in the data where we know it's more on read (i.e., web page).

    Thanks for sharing the test.

  • Adam Cath Wednesday, March 18, 2009

    Thanks for the candid experiment, Dave!

  • Peter U Thursday, March 26, 2009


    Very interesting results! Thanks for sharing it with the world.

    Here are some observation and a couple of questions.

    ># The server is a quad-core 7410 with 1 JBOD (configured with mirrored storage) >and 16GB of RAM. No SSD.

    ># The client machine is a quad-core 7410 with 128GB of DRAM.

    ># The basic workload consists of 10 clients,

    Interesting selection of server and clients. You have 10 x 7410 w/ 128GB RAM available and choose to use one w/ 16GB of RAM as server.

    What impact would you think it would have to your test result if

    1. You use 128GB RAM on the 7410 server side?

    2. You have readzillas and logzillas on the server side?



  • Dave Pacheco Thursday, March 26, 2009

    @Peter: Good questions. The 10 clients were not separate machines, but rather 10 threads on the same machine. Remember, I was trying to observe the effects of compression, not achieve maximum throughput. Using just 16GB of server DRAM allowed me to use a relatively small dataset (30GB) and still show that the server wasn't just caching writes in memory. By contrast, I chose a client with 128GB of DRAM so the data *would* be cached there to make sure that we were observing the performance of the server and not running into client I/O limits.

    With 128GB of DRAM on the server, the 30GB data set would fit entirely in cache. ZFS would eventually write this to disk, of course, but it may not happen for some time if the client didn't explicitly demand it. Performance would be much better, but we wouldn't observe any effects of compression on workloads where the client speed is limited by server I/O capacity. It might be interesting, though, since it would show just how much throughput the CPU could handle with compression enabled.

    Neither Readzilla nor Logzilla would really help this workload because it's just streaming I/O. The flash disks have great latency for small reads and writes, but less bandwidth than a shelf of disks. So it's faster to write and read 30GB to disk than to send it through Logzilla or Readzilla.

  • Peter U Thursday, March 26, 2009


    Thanks for answering my questions.

    So when you have L2ARC, does it cache the compressed data or the uncompressed data? I guess I am trying to figure out at what point the compression occurs and how does it interact with the hybrid caching mechanism.

    For example, on a write, does it compress it on ARC then it flush the compressed data to disk? or does it continue to cache the data in ARC and do the compression when it gets flushed to disk?

    Similarly on read, does it decompress the data from disk then place data to ARC/L2ARC as it serve the request or is it caching the compressed data?

    If the ARC and L2ARC caches the uncompressed data, I would think that having L2ARC could make a big difference, at least in some scenarios.

    Similarly on ZIL, if the compression occurs when the data is moved from ZIL to disks, I would think it could make a big difference in some scenarios.


  • Dave Pacheco Monday, March 30, 2009

    @Peter: Compression happens as ZFS reads or writes the data to disk (not cache), so the ARC and L2ARC cache the uncompressed data. I believe the ZIL is uncompressed, but that shouldn't matter (see below).

    Brendan explains (see http://blogs.sun.com/brendan/entry/test) how the L2ARC behaves like an extension of the in-memory ARC: recently accessed data is kept in the ARC/L2ARC with the expectation that some of it will be accessed again. This works best when caching random reads of a large working set (that is, larger than main memory). With 600GB of L2ARC on top of the 128GB of DRAM, a random read is much more likely to hit in cache and not have to go to disk. And you're right that for this kind of workload, the L2ARC could hide the read-side performance penalty of compression.

    But if your workload is truly streaming (as in this experiment), then having accessed data recently does not suggest that you might access it again in the near future. As a result, the L2ARC doesn't even bother caching streaming workloads (see http://blogs.sun.com/brendan/entry/test).

    The ZIL is a different story: the ZIL is used to improve the *latency* of small, synchronous writes. You'll always have many more spindles than log devices, and thus more aggregate bandwidth to the disks than to the ZIL. So if latency isn't an issue or the writes are large (both are the case in this experiment), the system just skips the ZIL and sends the data to the spindles, since that will yield better performance. If latency *is* the issue, then Logzilla will help you a lot regardless of compression.

  • Peter U Tuesday, March 31, 2009


    Thank you for the explanations. In our env we use NAS to support RDBMS for both OLTP and DSS type workload, so that is where I am coming from. I should be getting our eval unit of 7410 and can't wait to get my hands on it.


