LZMA on OpenSolaris

The OpenSolaris 2008.05 release, due out sometime in May,
will ship two versions of the same LiveCD: one with a limited
set of languages and locales, and another with a fuller set
of languages.

One of the big challenges with creating a LiveCD with a
full set of languages was that there was a limited amount
of free space on the CD for including all the languages.
How do you cram more stuff on the CD? Compress it harder,
I say! Even better, compress it with LZMA!
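
To get a rough feel for what "compress it harder" buys you, here is a minimal sketch that compresses the same bytes with DEFLATE (the algorithm behind gzip), bzip2 and LZMA and reports the sizes. It uses Python's standard library rather than anything from the OpenSolaris build; the function name and the sample data are made up for illustration, and the numbers will depend entirely on what you feed it.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` under each compressor."""
    return {
        "raw": len(data),
        # zlib level 9 uses DEFLATE, the same algorithm as gzip -9
        "deflate-9": len(zlib.compress(data, 9)),
        "bzip2-9": len(bz2.compress(data, 9)),
        # default preset, written in the .xz container
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    # Made-up, highly repetitive sample; substitute a real file's bytes.
    sample = b"LC_MESSAGES placeholder locale entry\n" * 5000
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>10}: {size} bytes")
```

Run it against real binaries or locale files instead of the placeholder sample to get a meaningful comparison.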

The OpenSolaris kernel did not have an in-kernel implementation
of LZMA that could be taken advantage of (why we need an
in-kernel implementation, I'll answer in a separate blog entry).
So, in our quest to provide one, we started looking at the LZMA SDK.
Porting the source from this SDK to the OpenSolaris kernel posed
two challenges: our lawyers were not amenable to the licensing
terms, and the compression code was all written in C++ (which,
for the uninitiated, is strongly discouraged in the kernel).

If you've ever dealt with lawyers you'll be quick to spot
that the licensing can be particularly troublesome. It was.
But only until we contacted the author of LZMA, Igor Pavlov.
Igor was not only willing to relicense the source code under
the CDDL (which would of course be agreeable to the lawyers) but
also willing to rewrite the compression code in C. And he
did that in just a couple of weeks, which is truly outstanding.
That, to me, is the power behind open source and the sharing
opportunities it provides for the broader good.

So, thank you, Igor, for an excellent compression algorithm
in LZMA, and thanks for all your assistance in making the
OpenSolaris 2008.05 release what it is. We look forward to
working with you in the future too.
Comments:

What kind of compression ratio did you achieve with LZMA compared to others like gzip or zip?

Posted by steve on April 24, 2008 at 01:21 PM EDT #

P.S. Kudos to Igor Pavlov for working with the OpenSolaris team to get this in there!

Posted by steve on April 24, 2008 at 01:52 PM EDT #

Are you aware of the fact that gzip is already in the kernel? Unless there's a demonstrable need, I wouldn't be so fast to put _another_ compression algorithm in there...

Posted by Adam Leventhal on April 24, 2008 at 02:56 PM EDT #

Steve: In trying to compress some of the LiveCD components, we observed LZMA to provide on the order of 30% better compression than gzip.

Adam: Yes, I'm aware of that. We hit the limit in trying to compress using gzip, which is why we started looking for other algorithms that provide better compression, and LZMA seemed to be at the top of that list for binary data.

Posted by Alok Aggarwal on April 24, 2008 at 04:30 PM EDT #

Even with gzip-9 you aren't seeing sufficient final compressed sizes? We don't want every compression algorithm du jour being dumped into the kernel. Let's pick one and stick with it.

Posted by Adam Leventhal on April 24, 2008 at 08:22 PM EDT #

@Adam:

We've already discussed this issue before, when I suggested that bzip2 be implemented as another compression option for ZFS. You weren't exactly amenable to that.

However, LZMA is really worth implementing, because it achieves up to 66% better compression in comparison with gzip, and up to 33% better compression than bzip2, according to my tests.

Also, an added bonus is that LZMA has faster decompression than even gzip.

This is really worthwhile to implement, and Solaris can really profit from this. Let's not let this excellent opportunity pass us by!

Posted by UX-admin on April 24, 2008 at 09:34 PM EDT #

I think that, without including everybody's favorite compression in the kernel, it is worth having a few different ones. lzma with slow write, fast read and high compression (ideal for a CD, could also be used with some kind of scrub that recompresses files that are already on disk), lzjb for the minimal impact on the system, gzip for good compression without making writes unbearably slow. If a license agreement was made, lzo would be a nice alternative to lzjb (it uses more memory, but ends up being faster and compressing better). And I can't think of more use cases, so that would probably be it.

Posted by Marc on April 25, 2008 at 06:18 AM EDT #

Adam: Give 7za(1) a whirl sometime (available in snv_79 and above) and you'll see a significant difference in the compression ratios provided by gzip-9 and LZMA. Different algorithms cater to different use case scenarios imo.

Posted by Alok Aggarwal on April 25, 2008 at 07:18 AM EDT #

Interesting. You should post the results of your tests.

Posted by Adam Leventhal on April 25, 2008 at 10:26 AM EDT #

Yes, I should, and I will, as soon as I actually go about compiling my findings.

I was truly amazed to see that something out there can beat bzip2 in compression.

Posted by UX-admin on April 26, 2008 at 02:06 AM EDT #

About

aalok
