Superduper Slow Jar Command

By Xueming Shen

It's well known that creating a Jar file can be a "little" slow. How slow? On my aged Sun Blade 1000, it takes about 1 minute and 40 seconds to jar the whole rt.jar in cf0M mode (no compression, no manifest), and a little more in compress mode.

But then we figured we were talking about creating a jar from tens of thousands of classes with a total size of over 50 MB. Given the number of files and the total size, that seemed a reasonable amount of time. So we assumed it really needed that long, until someone accidentally noticed that "the CPU went to 100% busy for quite some time, perhaps a minute or more on my laptop, before starting to hit the disk to create the Jar archive."

That sounded strange, since the main job of the jar tool is to copy and compress files into the archive. It should be hitting the disk from the very beginning to the end.

So I peeked into the Jar source code (after many years), and it turned out we had a very embarrassing bug in the jar code: we were doing an O(n) look-up on a Hashtable (via the contains() method) for each and every file we were jarring, where it really should have been an O(1) look-up with a HashSet. Given the number of files the command is working on, this simple mistake caused us to spend the majority of the time (that 1 min 40+ seconds) collecting the list of files to be jarred, instead of doing the real "jarring" work. Sigh:-(
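To make the difference concrete, here is a minimal sketch of the two look-up patterns. It is not the actual jar source: the class name, method names, and the generated file names are all made up for illustration. The only real APIs it relies on are Hashtable.contains(Object), which searches the table's values, and HashSet.add(), which hashes straight to the right bucket.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.List;
import java.util.Set;

// Illustrative only: not the jar tool's code.
class ContainsLookupSketch {

    // Slow pattern: Hashtable.contains(Object) tests whether some key maps
    // to the given VALUE, so every call walks the whole table. Doing this
    // once per file makes the loop quadratic in the number of files.
    static int countUniqueSlow(List<String> names) {
        Hashtable<String, String> seen = new Hashtable<>();
        for (String name : names) {
            if (!seen.contains(name)) {
                seen.put(name, name);
            }
        }
        return seen.size();
    }

    // Fast pattern: HashSet.add() is an expected constant-time operation
    // and silently ignores names that are already present.
    static int countUniqueFast(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            seen.add(name);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical input: 20,000 distinct class-file names.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            names.add("pkg/Class" + i + ".class");
        }

        long t0 = System.nanoTime();
        int slow = countUniqueSlow(names);
        long t1 = System.nanoTime();
        int fast = countUniqueFast(names);
        long t2 = System.nanoTime();

        System.out.printf("Hashtable.contains: %d names, %d ms%n", slow, (t1 - t0) / 1_000_000);
        System.out.printf("HashSet.add:        %d names, %d ms%n", fast, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same count; only the time spent differs, and the gap widens quickly as the number of names grows.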

With that fixed (in JDK 7 build44 and later), the Jar is now much faster.

Below are quick timings from 10 runs of jarring/zipping rt.jar, in store-only mode and in zip compression mode.

  • b43: the JDK 7/build43, which does not have the fix.
  • b47: the JDK 7/build47, which does have the fix.
  • zip: the default zip installed on my Solaris system, Zip 2.3 (1999).

jar cf0M / zip -r0q (store, no zip compression)

Build 43    Build 47 [1]    Zip
1:43.7      20.6            10.2
1:40.3      20.2            9.2
1:40.1      21.0            9.0
1:40.5      19.6            10.4
1:40.9      19.6            8.7
1:40.2      19.6            9.1
1:40.0      18.6            10.0
1:39.1      20.0            8.6
1:41.3      18.5            9.0
1:42.1      19.6            9.6

jar cfM / zip -rq (with zip compression)

Build 43    Build 47 [1]    Zip
1:47.0      25.3            15.7
1:45.9      23.4            14.2
1:44.7      23.3            14.9
1:45.4      23.7            14.3
1:45.6      23.3            14.3
1:44.9      23.6            14.0
1:45.9      23.2            14.6
1:44.0      23.0            14.2
1:44.9      23.3            14.8
1:45.8      23.5            14.2

[1] The fix is in JDK 7 only, for now.
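For readers unfamiliar with the two modes measured above, the sketch below shows the difference using java.util.zip directly. It is only an illustration under my own assumptions, not what the jar command does internally: the class and method names are invented, and the archive is built in memory rather than on disk. A STORED entry is copied as-is (its size and CRC must be set up front), while the default DEFLATED entry is compressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustrative only: a one-entry archive written in memory.
class StoreVsDeflateSketch {

    // Writes a single entry and returns the resulting archive bytes.
    // store == true  -> STORED (no compression, like the "0" in cf0M)
    // store == false -> DEFLATED (the ZipOutputStream default)
    static byte[] zipOne(String name, byte[] data, boolean store) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry(name);
            if (store) {
                // STORED entries need size, compressed size and CRC
                // before putNextEntry() is called.
                CRC32 crc = new CRC32();
                crc.update(data);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());
            }
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000]; // highly compressible: all zeros
        System.out.println("stored:   " + zipOne("Example.class", data, true).length + " bytes");
        System.out.println("deflated: " + zipOne("Example.class", data, false).length + " bytes");
    }
}
```

Store mode skips the compression work, which is why both jar and zip are a few seconds faster in the first table than in the second.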

This page contains details of the fix.

We are making good progress on the Jar tool, and it performs much better now, though it is still slower than the Zip command. So we will continue our efforts. I have to admit I do have some code that makes Jar processing time much closer to Zip, but it will take time to make it into the product. Stay tuned!

Xueming Shen is an engineer at Sun Microsystems, working in the Java core technologies group.

Comments:

So that's why netbeans chokes and asphyxiates when doing build all.

Posted by guest on May 15, 2009 at 12:18 PM PDT #

Good - and long overdue.

Was quite annoyed when I first looked at the ZIP (jar) code, more than a decade ago, and saw code that was not at all efficient (and as confirmed by comparing with Info-Zip ZIP times).

Posted by Preston L. Bannister on May 16, 2009 at 05:38 AM PDT #

[Trackback] story has entered the popular today section on popurls.com

Posted by popurls.com // popular today on May 17, 2009 at 07:00 AM PDT #

There is just no way I would have picked up on this difference without a profiler. It's not obvious from a quick glance at the Hashtable docs that containsValue was going to be so expensive (if you think about it then yes it is obvious but when it is just called contains this fact is somewhat hidden).

Posted by Anon on May 17, 2009 at 07:09 AM PDT #

Thank you for your work!

Posted by Not My Real Name on May 17, 2009 at 10:19 AM PDT #

You meant an add to a vector and not the hashtable.

Posted by peterreilly on May 17, 2009 at 05:02 PM PDT #

I am a redditor, and I approve this change!

Java FTW

Posted by reddit on May 17, 2009 at 05:04 PM PDT #

Haha, nice! I actually never cared much about the performance of JAR, but good to know if things get faster :)

How about doing multithreading stuff to run faster on smp systems? ;)

Posted by Stefan Lotties on May 17, 2009 at 09:06 PM PDT #

Can you improve the unjarring too?
Unzip is vastly faster than jar x.

Posted by goodness me on May 17, 2009 at 10:17 PM PDT #

goodness me:

Some performance tuning has been done in jdk7-b46 for "unjaring" as well. It should be "much faster ("vastly faster"?) now to extract "a list of files" out of a huge jar file, if this "a list of files" is not the whole jar file. The gain is very obvious if this "a list of files" is a very small portion of a huge jar file.

For example "jar xf rt.jar sun/util" is almost 2 times faster on my system (rt.jar is compressed when measuring).
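As a rough illustration of why extracting only part of an archive can be cheap, here is a sketch using java.util.zip.ZipFile, which looks entries up through the archive's central directory and reads data only for the entries you ask for. Whether the jar tool does exactly this is an assumption on the reader's side; the class name, method name, and parameters below are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative only: extracts the entries whose names start with a prefix.
class PartialExtractSketch {

    static List<String> extractPrefix(Path archive, String prefix, Path destDir) throws IOException {
        List<String> extracted = new ArrayList<>();
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Skip directories and everything outside the requested prefix;
                // the data of skipped entries is never read or decompressed.
                if (entry.isDirectory() || !entry.getName().startsWith(prefix)) {
                    continue;
                }
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entry names that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(entry.getName());
            }
        }
        return extracted;
    }
}
```

Calling extractPrefix(Paths.get("rt.jar"), "sun/util", Paths.get("out")) would roughly correspond to the "jar xf rt.jar sun/util" example above.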

I might be able to comment more if you can provide a specific use scenario.

Please try the jar in latest jdk7, see if the "unjaring" improves in your case. And let me know the result:-)

Posted by Xueming on May 18, 2009 at 04:47 PM PDT #

Will the classes of java.util.zip (and java.util.jar) benefit of this perfomance gain?
Can we also expect these classes to run faster even if the zipped content doesn't contain too many files?

Posted by Emmanuel Puybaret on May 20, 2009 at 04:27 PM PDT #

Nice to hear that, so when can we officially enjoy jdk7 ? Any release date ?
Also, I'm waiting for the JWebPane, do you know (as an insider ) if it will come with jdk7 ? Thanks !

Posted by Frank on May 21, 2009 at 05:03 AM PDT #

Regarding JDK 7 releases: Watch for announcements at the 2009 JavaOne Conference, June 2-5, San Francisco, California. Website is http://java.sun.com/javaone/

Posted by Christine Dorffi on May 21, 2009 at 09:05 AM PDT #

Xueming: great to see someone at Sun looking into stuff like this.

I had to give up on the jar command for my backups because of its crippling 32-bitness (it cannot do > 2 GB files). I would love to see that behavior fixed. Plus, proper AES-256 encryption.

Posted by Spaceman Spiff on May 24, 2009 at 06:01 AM PDT #

The fix would help us a lot as our projects heavily rely on jar files and compiling, including auto-compiling project jar files, in netbeans takes us lots of time to wait until it has completely finished. Thank you for the fix.

Posted by Maung on May 25, 2009 at 05:59 AM PDT #

That does seem like a dumb mistake with Hashtable. In this case I guess it seems obvious, but, it might be nice if the Javadocs said explicitly what the big O is for the different methods in the various collection classes. I noticed that .NET's documentation does this which I thought was nice. I don't remember whether Java's does or not. It doesn't seem to in this case. Again, it should be pretty obvious, but, given the fact that the implementation is encapsulated and hidden it's probably not a bad idea to mention in the docs.

Posted by Jon on May 26, 2009 at 01:00 AM PDT #

Can you now look at Tomcat or Java's classloader code and find out why reloading a web app causes PermGen OutOfMemoryErrors? This IMHO is a ridiculus problem that's existed for years which I've now written off as something that will never be fixed.

Posted by Jon on May 26, 2009 at 01:02 AM PDT #

Thanks a lot.

We have seen the "little" slow in jar
creation but the reason arround it
we give to the project dimension.

Posted by alban on May 26, 2009 at 01:03 AM PDT #

The java.net web site and a lot of Sun's other web sites including this one need speed improvements also. They have a tendency to run as slow as molasses.

Posted by Jon on May 26, 2009 at 01:06 AM PDT #

Spaceman Spiff,

I definitely hope (believe) id#6599383 (
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6599383) is the last nasty 2G-4G bug we have in the jar/zip code. This one has been fixed in the latest product release 6u12, as well as JDK7. So go use jar as your backup command from now:-) Oh, one more reason to start to use jar as the backup command is that jar now supports ZIP64 format (JDK7 for now), see my blog http://blogs.sun.com/xuemingshen/entry/zip64_support_for_4g_zipfile for a bit more details. Give JDK7 a try!

Posted by Xueming Shen on May 26, 2009 at 04:55 AM PDT #

Jon,

The API doc for containsValue() does include a warning: "This operation will probably require time linear in the map size for most implementations of the Map interface". But I guess most people don't pay attention to it until they have a problem in hand, as we did:-)

I'm not a GC expert, but if you have a detailed test case or use scenario, instead of the general "Tomcat" + A webapp, I might try to pass it on...or you might have the bugid already.

Posted by Xueming Shen on May 26, 2009 at 05:32 AM PDT #

that is like 7 times improvement with the fix. amazing

Posted by daniel on May 26, 2009 at 07:14 PM PDT #

I thought the Jar command was perfect since everybody is using it! : )

Posted by Norman on May 27, 2009 at 12:00 AM PDT #

I realize now this is very important. I build or rebuild many times during a day and that's 100% lose of time. More than 20 minutes lost I estimate. It is 100% of lose of time because I have to wait until it is finished since I want to see the results right away, also each build doesn't take way too much of time so I can't really do other thing in the meanwhile. Please backport it to jdk6.

Posted by Antonio Dieguez on May 27, 2009 at 04:53 AM PDT #

Xueming, thanks much for your response.

Concerning 2-4 GB jar/zip bugs, I cannot use 6u12 because it (and 6u13) has another bug that breaks some of my gui code. 6u14 is claimed to fix that bug, so I am stuck at 6u11 until it comes out.

Oh man, I am SO glad that you have finally added 64 bit ZIP support. Thats really good to hear.

However, I am really nervous about using a beta JDK, since some of my applications (trading, data backup) are just too critical. Is there any chance that this code will get released earlier than JDK 7, say in 6u14? I would love to see if sooner. For now, I have had to resort to using TAR files for my archiving (using an Apache library of dubious quality assurance...).

Speaking of JDK versions, is there any website that documents when we can expect new JDK releases to come out, like say when 6u14 is coming out?

On a different note, I have some other benchmarking results that are likely of interest to you (and anyone else reading this blog). I recently did a thorough exploration of the effect of buffer sizes on both byte and char writing using the usual JDK OutputStreams and Writers. One result is how bad PrintStream is for char writing compared to, say, BufferedWriter. (Looking at PrintStream's obscure source code, I think that I know why--and also how to fix it). I would like to share my results with you. What is the best way to contact you, if you are interested? I found your LinkedIn page http://www.linkedin.com/pub/xueming-shen/8/138/809 but my free LinkedIn account cannot contact you. My LinkedIn page is http://www.linkedin.com/pub/brent-boyer/11/636/246 if you would like to contact me.

Posted by Spaceman Spiff on May 27, 2009 at 05:13 AM PDT #

Ouch, nasty, you could at least have asked the Hashtable if it contained the key value instead. IMHO it is way overdue that Sun removed all use of obsolete, useless, synchronized classes like Hashtable, Vector, and StringBuffer from all new JRE and JDK releases, when suitable alternatives are available.

I hope the jar command can be release as a standalone, with Java 1.3, 1.4, and 1.5, compatible code, so that us commercial Java developers, stuck on JDK 1.4, and 1.5, can patch our JDK's and have an easier life.

Posted by infernoz on May 28, 2009 at 06:36 AM PDT #

I got confused in my previous post, I got wrong the time of just building jars. It may be about 3 min of lost time daily at max in _my_ case, but obviously with other people it can be worse. Thanks for any improvement. Now please discover a hidden time-consuming loop at startup of java programs that someone forgot to delete :)

Posted by Antonio Dieguez on May 28, 2009 at 11:33 AM PDT #

I never bothered about the time it takes to create a jar....but this has opened my eyes....thanks...nice article....

Posted by TARUN KARMAKAR on May 31, 2009 at 09:23 PM PDT #

Is there a BugID for this jar performance enhancement done in JDK 7? I would like to vote on porting the fix to JDK 5.

Posted by Stephen Hart on June 08, 2009 at 11:09 PM PDT #

The bug number is 6496274, per Xueming.

Posted by Christine Dorffi on June 10, 2009 at 06:30 AM PDT #
