LZMA Numbers

I recently wrote that LZMA has been used to pack more languages
onto the LiveCD. Here are some charts that show how LZMA
stacks up against some of the other popular compression algorithms.
(Apologies for the poor image quality; open in another window
for a clearer image.)


These tests were run on a LiveCD archive using 7za(1). As you'll note, the compression ratio provided by LZMA is about 35% better than gzip-9. However, LZMA is more CPU intensive, so both compression and decompression are slower than with the alternatives. For some use cases, this CPU versus compression tradeoff might make LZMA unsuitable. For the LiveCD use case it is reasonable, provided we architect our solution so that decompression speed isn't a bottleneck. (Compression speed isn't a problem for the LiveCD architecture.)
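
For readers who want to try a similar comparison themselves, here is a minimal sketch using Python's standard library. It is not the method used for the charts above (those came from 7za, gzip and bzip2 as shipped on the system); the `benchmark` function and the sample data are made up for illustration, and the numbers you get will depend on your input and machine.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Return {codec: (compressed/original ratio, compress secs, decompress secs)}."""
    # Hypothetical codec table: gzip and bzip2 at level 9, LZMA (xz) at its default preset.
    codecs = {
        "gzip-9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
        "bzip2-9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        # Sanity check: the round trip must reproduce the input exactly.
        assert restored == data
        results[name] = (len(packed) / len(data), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; substitute the bytes of a real archive.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for name, (ratio, c_secs, d_secs) in benchmark(sample).items():
        print(f"{name:8s} ratio={ratio:.4f} "
              f"compress={c_secs:.3f}s decompress={d_secs:.3f}s")
```

A smaller ratio means better compression. To approximate the LiveCD case, read the archive into `sample` instead of using the synthetic text, and keep in mind that a single run is a rough measurement at best.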
Comments:

Interesting numbers; in my tests, bzip2 was beating gzip by a 6.07% margin when compressing.

I used -9 (maximum compression) on both.

Now for the kicker:

- gzip was stock, as delivered by hp
- bzip2 was compiled from source by myself, using profile feedback generated by hp's C compiler.

The OS is HP-UX 11.23 (11iv2).
The processor is hp PA-RISC 8800 @750MHz.

I guess that in addition to getting bzip2 at maximum compression to beat gzip, there is a moral to the story as well.

The moral is: Sun Studio has a profile feedback facility. Use it to get significant performance gains, and lzma is definitely a tool that will benefit from such optimizations.

Recommended reading:

"Improved code layout can improve application performance" by Darryl Gove

http://developers.sun.com/solaris/articles/codelayout.html

Posted by UX-admin on May 01, 2008 at 02:30 AM EDT #

I guess I should qualify my previous observation:

bzip2 was 6.07% FASTER when compressing at maximum compression than the stock gzip delivered with the operating system.

Posted by UX-admin on May 01, 2008 at 02:35 AM EDT #

There is no question about code optimization yielding different results. My tests were done with the versions of gzip, bzip2 and 7za as delivered on stock snv_84, no tweaking whatsoever and with default options.

Posted by Alok Aggarwal on May 01, 2008 at 03:02 AM EDT #

UX-admin: sure, profile feedback is great, but it is not that convenient to use for kernel code.

Interesting figures. I think memory footprint would be another interesting measure (it is the only advantage of lzjb over lzo for instance, except for the license).

Another point is what gets compressed. For instance, in zfs, blocks of a fixed (not so large) size are compressed independently. Here I think you are talking about compressing one huge file (in which case we should look at which implementations can use parallelism). I don't know exactly how the livecd works.

Posted by Marc on May 02, 2008 at 03:40 AM EDT #
