Friday Apr 17, 2009

ZIP64, The Format for > 4G Zipfile, Is Now Supported

We heard you! finally:-)

The support for ZIP64, the format for > 4G ZIP file, has finally been added into the latest OpenJDK7 build(b55). This RFE (request for enhancements) had been buried in the 200+ Jar/ZIP bug/rfe pile so deep that I was not even aware of / remembered its existence until a recent call from A custom strongly asking for it. Given everyone now has 200G+ disk space (and yes, most of my kid's video clips are now around 1G after I got that new camcorder), it is relatively easy for the Jar/ZIP user to run into this 4G ceiling these days. The RFE was quickly climbing itself to the top of my to-do list and is now in b55. From now on only the sky is the limit for your Jar/ZIP file:-)

So if you have > 4G stuff (either the total size of files zipped in > 4G or the individual files themselves are > 4G) to jar/zip, try out the OpenJDK7-b55 (and later), either via the java.util.jar/zip  APIs or the Jar tool. Let me know if you find anything not working or broken, I hope it's perfect though:-)

Here is the brief background info regarding the 4G size problem in Jar and ZIP file.

(1)Various size and position offset related fields in original ZIP format are 4 bytes, so by nature ZIP has this 4G size limitation.

(2)The field for total number of files zipped/stored in ZIP's Central directory record is 2 bytes, so it has the 65536 limit (Java's ZIP file implementation has some hacky code to workaround this issue though)

(3)ZIP64 format extensions was introduced in (in spec 4.5) to address above size limitation.

(4)JDK7 now fully supports the ZIP64(tm) format extensions
defined by the PKWARE's  ZIP specification

If you are interested in source code, here is my ZIP64 "to-do" list (a copy/paste note from the spec) and the code diffs.

Thursday Feb 19, 2009

Superduper slow jar?

It's well known that jar is a "little" slow:-) How slow? On my "aged" SunBlad1000, it takes about 1 minute and 40 seconds to jar the whole rt.jar in "cf0M" mode (no compress, no manifest), costs you a little more if in compress mode.

While we did feel it is a little bit "too slow", but then figured we are talking about jarring 10 thousands of classes with a total size of 50+M, given the number of files and the total size, it might just need that much of time. So it has been this slow for years and we never took sometime to figure out if it really needs that much of the time, until someone "accidentally" noticed that "the CPU went to 100% busy for quite some time, perhaps a minute or more on my laptop, before starting to hit the disk to create the jar archive".

That sounds strange, AFAIK the major job jar is supposed to do is to copy and compress the files (into the jar), it should hit the disk from the very beginning to the end. So I took a first peek into the jar source code after so many years, it turned out we had a very "embarrassing" bug in the jar code, we were doing a O(n) look-up on a Hashtable (via the contains() method) for each and every file we were jarring, in which it really should be a O(1) look-up operation with a HashSet. Given the number of files we are working on, this "simple" mistake costs us to spend majority of the time (that 1 min 40+ sec) in "collecting" the list of files that need to jar, instead of the real "jarring" work it is supposed to do, sigh:-(

With that fixed (in JDK7 build44 and later) the jar is much faster now. Below are the quick "time measuring" numbers of 10 runs of jarring/zipping the rt.jar/zip, in "store only" mode and "zip ompression" mode.

b43: the JDK7/build43, which does not have the fix.
b47: the JDK7/build47, which does have the fix.
zip: the default zip installed on my Solaris, which is zip2.3/1999

(1)jar cf0M / zip -r0q (simply "store" no zip compression)
  1:43.7   20.6   10.2
  1:40.3   20.2    9.2
  1:40.1   21.0    9.0
  1:40.5   19.6   10.4
  1:40.9   19.6    8.7
  1:40.2   19.6    9.1
  1:40.0   18.6   10.0
  1:39.1   20.0    8.6
  1:41.3   18.5    9.0
  1:42.1   19.6    9.6

(2)jar cfM/zip -rq (with zip compression)

  1:47.0   25.3   15.7
  1:45.9   23.4   14.2
  1:44.7   23.3   14.9
  1:45.4   23.7   14.3
  1:45.6   23.3   14.3
  1:44.9   23.6   14.0
  1:45.9   23.2   14.6
  1:44.0   23.0   14.2
  1:44.9   23.3   14.8
  1:45.8   23.5   14.2

Jar is making big progress and doing much much better, though is still slower compared to the "zip". So we will continue our "catch-up" going forward (I have to say it:-) actually I do have some code that make jar much closer to zip, but it will take a while to make it into the product)

  \*1 The fix now is only in JDK7.
  \*2 If you are interested in the detail of the fix, you can have a peek at here




« June 2016