
Tamp - a Lightweight Multi-Threaded Compression Utility

UPDATE: Tamp has been ported to Linux, and is now at version 2.5


Packages for Solaris (x86 and SPARC) and a source tarball are available below.

Back Then

Many years ago (more than I care to remember), I saw an opportunity to
improve the performance of a database backup. This was before the time of
Oracle on-line backup, so the best choice at that time was to:

  1. shut down the database
  2. export to disk
  3. start up the database
  4. back up the export to tape

The obvious thing to improve here is the time between steps 1 and 3.
We had a multi-CPU system running this database, so it occurred to me
that perhaps compressing the export may speed things up.

I say "may" because it is important to remember that if the compression utility has lower throughput than the output of the database export (i.e. raw output; excluding any I/O operations to save that data) we may just end up with a different bottleneck, and not run any faster; perhaps even slower.

As it happens, this era also pre-dated gzip and the other newer
compression utilities, so I used the venerable "compress" - and the
backup actually ran slower. It did save some disk space, though,
because Oracle export files are eminently compressible.

So, I went off looking for a better compression utility, this time
with an emphasis on speed - it could not afford to be the bottleneck
in the whole process.

What I found did the trick - it reduced the export time by 20-30%, and
saved some disk space as well. It saved time because it could compress
at least as fast as Oracle's "exp" utility could produce data, and
because it eliminated some of the I/O - the real bottleneck.

More Recently

I came across a similar situation more recently - I was again doing
"cold" database restores and wanted to speed them up. It was a
little more challenging this time, as the restore was already parallel
at the file level, and there were more files than CPUs involved (72).
In the end, I could not speed up my 8-odd-minute restore of ~180GB
unless I already had the source files in memory (via the filesystem
cache). That would only work in some cases, and is unlikely to work
in the "real world", where you would not normally have this much spare
memory available to the filesystem cache.

Anyway, it took my restore down to about 3 minutes in the cases where
all my compressed backup files were in memory, because that eliminated
all read I/O against the set of arrays holding my backup - and with
it, all the I/O competing with the re-writing of the database files on
those same arrays.

Multi-Threaded Lightweight Compression

I could not even remember the name of the utility I used all those
years ago, but I knew I would need something better. The computers
of 2008 have multiple cores, and often multiple hardware threads per
core, yet almost all of the compression utilities bundled with Unix
distributions are still single-threaded - a very effective way to
limit throughput on a multi-CPU system.

Now, there are some multi-threaded compression utilities available,
if not widely available (example invocations follow this list):

  • PBZIP2 is a parallel implementation of BZIP2; see http://compression.ca/pbzip2/
  • PIGZ is a parallel implementation of GZIP, although it turns out it is
    not possible to decompress a GZIP stream with more than one thread; see http://zlib.net/pigz/
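
Both also let you choose the number of threads. As example invocations (flags per each tool's own documentation, not from my tests below):

    # pigz and pbzip2 with 8 threads each; --fast trades
    # compression ratio for speed
    pigz --fast -p 8 export.dmp
    pbzip2 --fast -p8 export.dmp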

Here is a chart showing some utilities I have tested on a 64-way Sun
T5220. The place to be on this chart is toward the bottom right-hand
corner.

[Chart: compression (reduction %) versus elapsed time (s) for the utilities tested]
Here is a table with some of the numbers from that chart:

Utility          Reduction (%)   Elapsed (s)
tamp                 66.18           0.31
pigz --fast          71.18           1.04
pbzip2 --fast        77.17           4.17
gzip --fast          71.10          16.13
gzip                 75.73          40.29
compress             61.61          18.21

To answer your question - yes, tamp really is 50-plus-times faster than "gzip --fast" (0.31 s versus 16.13 s is a factor of about 52).

Tamp

The utility I have developed is called tamp. As the name suggests, it
does not aim to provide the best compression (although it is better
than compress, and sometimes beats "gzip --fast").

It is, however, a proper parallel implementation of an already fast
compression algorithm.

If you wish to use it, feel free to download it. I will be blogging in the near future on a different performance test I conducted using tamp.
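
A minimal usage sketch (tamp reads either stdin or named files, as the comments below illustrate; the -d decompression flag shown is an assumption in the gzip/compress style - the README in the tarball has the definitive options):

    # Usage sketch only - consult the README for the real options.
    # The -d flag below is assumed, in the style of gzip/compress.
    tamp < export.dmp > export.dmp.tmp    # compress from stdin
    tamp -d export.dmp.tmp                # decompress a named file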

Compression Algorithm

Tamp makes use of the compression algorithm from QuickLZ version 1.40. I have tested a couple of other algorithms, and the code in tamp.c can easily be modified to use a different algorithm. You can get QuickLZ from http://www.quicklz.com/ (you will need to download the source yourself if you want to build tamp).
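
For anyone building from source, the rough sequence looks like this (QuickLZ 1.4x per the comments below - 1.5.x is not compatible; the unpacked layout is an assumption, so adjust paths as needed):

    # Rough build sketch; paths and layout assumed - adjust as needed.
    wget http://quicklz.com/quicklz141.zip    # QuickLZ 1.4x (see comments)
    unzip quicklz141.zip                      # provides quicklz.c/quicklz.h
    cp quicklz.c quicklz.h tamp-2.5/
    cd tamp-2.5
    make -f Makefile.linux                    # on Linux (per the compile log in the comments)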

Update, Jan 2012 - changed the downloads to .zip files, as it seems blogs.oracle.com interprets a download of a file ending in .gz as a request to compress the file via gzip before sending it. That confuses most people.


Resources

Comments (13)
  • Eisbaer Saturday, September 27, 2008

    Question 1: Why do you not use LZO 2.03 (released Apr 30, 2008)? You wrote in the README that it "is not widely available".

    Question 2: If using LZO, would the resulting file be compatible with lzop (http://www.lzop.org)? That is, could it be unpacked with lzop? In that case there would be a very widely available unpacker.

    Question 3: Would it be possible for you to modify the program to create a zip-compatible resulting file?


  • Tim Sunday, September 28, 2008

    1. I did test with LZO 2.03, so I know that it can be used with the LZO macro in tamp.c. I have chosen to use QuickLZ for better performance.

    2. Files compressed with tamp are not compatible with any other decompression utility.

    3. I do not intend to develop something that is compatible with the ZIP format.


  • Lasse Reinhold Tuesday, September 30, 2008

    Hi Eisbaer,

    There's a fast .zip-compatible compressor at http://www.quicklz.com/zip.html

    It's not as fast as LZO or QuickLZ, but still a lot faster than gzip -1.

    I'm not working on that project anymore and it hasn't been tested very well. It also doesn't support files larger than 2 GB.


  • Stuart Anderson Saturday, November 13, 2010

    Fantastic!

    Just what I needed to compress large ZFS snapshot images being sent with "zfs send" and mbuffer to a SAM-QFS archive system.

    It is great to be able to compress at several hundred MByte/s with just a few CPU cores.

    This should be bundled with Solaris.


  • Lasse Reinhold Tuesday, January 18, 2011

    Hi Stuart (and others) - for ZFS snapshots I believe that eXdupe from http://www.exdupe.com/ would be better suited, since it also performs deduplication. It is in the same speed class as tamp (~1 GByte/s on 8 cores).


  • Ed Tuesday, January 25, 2011

    Sir,

    This is a great tool, thank you for creating it. I agree with the other poster that this should be bundled with Solaris. I think I may have discovered a bug, however.

    I created two different flash archives (cpio archives with text headers) compressed using tamp v2.5, one 2.1GB and the other 11GB. Compression worked fine, but during decompression both of them fail - but only when reading the compressed data from stdin (if I pass the same flash archive(s) to tamp as arguments on the command line, then decompression works fine).

    I stripped the header and piped the compressed archive through a bit of Perl on its way to tamp, to count the number of bytes read before tamp fails, and it appears that the entire archive (minus the header) is read and fed to tamp before tamp fails.

    The error tamp produces is

    tamp: fread: Error 0

    which appears to be caused by a block size discrepancy (line 1204 of tamp.c, I think). That is, the last block is smaller than tamp is expecting. I'm not sure why this would be, however, since tamp created the compressed data being fed to it.

    I tried using null bytes to pad the last block to the default block size of 256KB, but that failed, too.

    I was eventually able to get it to work by compressing the archive with a forced block size of 64KB, then padding the last block with null bytes to 64KB before feeding it to tamp during decompression. If I don't pad the last block, tamp fails with the same error.

    I know cpio pads its archives to 8KB blocks; is it possible that tamp is ignoring the padding at the end of the cpio archive during compression, or something along those lines?

    I suppose another workaround might be to force cpio to pad to 256KB?

    If I am missing something, or if you need more information/clarification, please let me know. Otherwise, I hope this feedback is helpful.

    -Ed


  • Ed Tuesday, January 25, 2011

    Just a follow up to my last comment...

    My workaround with the 64KB block size only worked for the 2.1GB archive. When I tried padding to 64KB during decompression with the 11GB archive, tamp gave me the error:

    (stdin)1: read errors

    When I try decompressing the 11GB archive without padding, I get the "tamp: fread: Error 0" error again.

    -Ed


  • Tim Tuesday, February 15, 2011

    Ed,

    Can you please send me your files (or a link to them)?

    Please post details here, or send me an e-mail at timothy dot cook at oracle dot com.

    Thanks,

    Tim


  • Jan van Haarst Friday, April 29, 2011

    Dear Tim,

    Under which licence do you publish this code?

    As it is now, we are all in violation of the law if we download it :-)

    Oh, and it doesn't compile on Linux, using the latest (beta) version of QuickLZ:

    jvh@dev1:~/code/tamp/tamp-2.5$ make -f Makefile.linux
    cc -m32 -DNDEBUG -O3 -DQUICKLZ=1 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -c -o tamp.o tamp.c
    tamp.c: In function ‘compress’:
    tamp.c:772: error: ‘QLZ_SCRATCH_COMPRESS’ undeclared (first use in this function)
    tamp.c:772: error: (Each undeclared identifier is reported only once
    tamp.c:772: error: for each function it appears in.)
    tamp.c:820: warning: passing argument 4 of ‘qlz_compress’ from incompatible pointer type
    quicklz.h:133: note: expected ‘struct qlz_state_compress *’ but argument is of type ‘char *’
    tamp.c: In function ‘decompress’:
    tamp.c:1067: error: ‘QLZ_SCRATCH_DECOMPRESS’ undeclared (first use in this function)
    tamp.c:1137: warning: passing argument 3 of ‘qlz_decompress’ from incompatible pointer type
    quicklz.h:134: note: expected ‘struct qlz_state_decompress *’ but argument is of type ‘char *’
    make: *** [tamp.o] Error 1


  • Lasse Reinhold Monday, November 21, 2011

    You need to download QuickLZ 1.4.x, http://quicklz.com/quicklz141.zip . Version 1.5.x isn't compatible.



  • eisbaer Tuesday, January 17, 2012

    Is there a Windows binary of the compression tool tamp?


  • guest Monday, January 23, 2012

    eisbaer,

    I have not ported Tamp to Windows, but qpress might suit your needs - available at http://www.quicklz.com/.

    Regards,

    Tim

