
Superduper Slow Jar Command

Guest Author
By Xueming Shen

It's well known that creating a Jar file can be a "little" slow. How slow? On my aged Sun Blade 1000, it takes about 1 minute and 40 seconds to jar the whole rt.jar in cf0M mode (no compression, no manifest), and a little longer in compress mode.

But then, we were creating jars for tens of thousands of classes with a total size of over 50M. Given the number of files and the total size, that seemed a reasonable amount of time. So we assumed the job really needed that long, until someone accidentally noticed that "the CPU went to 100% busy for quite some time, perhaps a minute or more on my laptop, before starting to hit the disk to create the Jar archive."

That sounded strange: the main job of the Jar tool is to copy and compress files into the archive, so it should hit the disk from the very beginning to the end.

So I peeked into the Jar source code (after many years), and it turned out we had a very embarrassing bug in the jar code: we were doing an O(n) look-up on a Hashtable (via the contains() method) for each and every file we were jarring, where it really should have been an O(1) look-up with a HashSet. Given the number of files the command works on, this simple mistake caused us to spend the majority of the time (that 1 min 40+ seconds) collecting the list of files to be jarred, instead of doing the real "jarring" work. Sigh:-(
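To make the difference concrete, here is a minimal sketch of the two look-up styles. It is not the jar tool's actual source: the class name, method names, and the generated file paths are all made up for illustration. It only assumes what the paragraph above describes, namely a duplicate check done with Hashtable.contains() (which searches the table's values) versus one done with a HashSet.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical,
// and this is not the jar tool's actual source.
class ContainsLookupSketch {

    // Slow variant: Hashtable.contains(Object) tests the VALUES, so every
    // call scans the table. Collecting n files costs O(n^2) comparisons.
    static int collectWithHashtableContains(String[] paths) {
        Hashtable<Integer, String> seen = new Hashtable<>();
        for (String path : paths) {
            if (!seen.contains(path)) {      // linear scan over the values
                seen.put(seen.size(), path);
            }
        }
        return seen.size();
    }

    // Fast variant: HashSet.add is a hashed look-up, O(1) on average,
    // and leaves the set unchanged when the path is already present.
    static int collectWithHashSet(String[] paths) {
        Set<String> seen = new HashSet<>();
        for (String path : paths) {
            seen.add(path);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary size, in the "tens of thousands" range
        String[] paths = new String[n];
        for (int i = 0; i < n; i++) {
            paths[i] = "com/example/Class" + i + ".class";
        }

        long t0 = System.nanoTime();
        int slow = collectWithHashtableContains(paths);
        long t1 = System.nanoTime();
        int fast = collectWithHashSet(paths);
        long t2 = System.nanoTime();

        System.out.printf("Hashtable.contains: %d entries, %d ms%n", slow, (t1 - t0) / 1_000_000);
        System.out.printf("HashSet.add:        %d entries, %d ms%n", fast, (t2 - t1) / 1_000_000);
    }
}
```

Both methods collect the same set of paths; only the cost of the duplicate check differs. The absolute timings depend on the machine and JVM, so the sketch prints them rather than promising particular numbers.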

With that fixed (in JDK 7 build 44 and later), the Jar tool is now much faster.

Below are quick timings from 10 runs of jarring/zipping the contents of rt.jar, in store-only mode and in zip-compression mode.

  • b43: the JDK 7/build43, which does not have the fix.
  • b47: the JDK 7/build47, which does have the fix.
  • zip: the default zip installed on my Solaris system, which is Zip 2.3 (1999).

Times are in seconds (m:ss.s where a run took over a minute).

jar cf0M / zip -r0q (store, no zip compression)

  Build 43    Build 47 [1]    Zip
  1:43.7      20.6            10.2
  1:40.3      20.2             9.2
  1:40.1      21.0             9.0
  1:40.5      19.6            10.4
  1:40.9      19.6             8.7
  1:40.2      19.6             9.1
  1:40.0      18.6            10.0
  1:39.1      20.0             8.6
  1:41.3      18.5             9.0
  1:42.1      19.6             9.6

jar cfM / zip -rq (with zip compression)

  Build 43    Build 47 [1]    Zip
  1:47.0      25.3            15.7
  1:45.9      23.4            14.2
  1:44.7      23.3            14.9
  1:45.4      23.7            14.3
  1:45.6      23.3            14.3
  1:44.9      23.6            14.0
  1:45.9      23.2            14.6
  1:44.0      23.0            14.2
  1:44.9      23.3            14.8
  1:45.8      23.5            14.2

[1] The fix is in JDK 7 only, for now.
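For readers who want a single number out of the store-mode table above, the sketch below averages the ten runs per column and prints the ratios. The class and field names are made up for illustration; the timings are copied from the table, with the Build 43 values converted from m:ss.s to seconds.

```java
// Illustrative sketch: summarizes the store-mode timings from the table above.
// Class and field names are hypothetical.
class JarTimingSummary {

    // jar cf0M / zip -r0q timings in seconds, copied from the table.
    static final double[] BUILD_43 =
        {103.7, 100.3, 100.1, 100.5, 100.9, 100.2, 100.0, 99.1, 101.3, 102.1};
    static final double[] BUILD_47 =
        {20.6, 20.2, 21.0, 19.6, 19.6, 19.6, 18.6, 20.0, 18.5, 19.6};
    static final double[] ZIP =
        {10.2, 9.2, 9.0, 10.4, 8.7, 9.1, 10.0, 8.6, 9.0, 9.6};

    // Arithmetic mean of a set of timings.
    static double mean(double[] seconds) {
        double sum = 0.0;
        for (double s : seconds) {
            sum += s;
        }
        return sum / seconds.length;
    }

    public static void main(String[] args) {
        double b43 = mean(BUILD_43);
        double b47 = mean(BUILD_47);
        double zip = mean(ZIP);

        System.out.printf("mean: b43 = %.2f s, b47 = %.2f s, zip = %.2f s%n", b43, b47, zip);
        System.out.printf("b43 / b47 = %.1fx, b47 / zip = %.1fx%n", b43 / b47, b47 / zip);
    }
}
```

On these numbers the fixed build is roughly five times faster than build 43 in store mode, and still roughly twice as slow as zip.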

This page contains details of the fix.

We are making good progress on the Jar tool, and it is performing much better, though it is still slower than the Zip command. So we will continue our efforts going forward. I have to admit I do have some code that makes Jar processing time much closer to Zip, but it will take time to make it into the product. Stay tuned!

Xueming Shen is an engineer at Sun Microsystems, working in the Java core technologies group.


Comments ( 30 )
  • guest Friday, May 15, 2009

    So that's why netbeans chokes and asphyxiates when doing build all.


  • Preston L. Bannister Saturday, May 16, 2009

    Good - and long overdue.

    Was quite annoyed when I first looked at the ZIP (jar) code, more than a decade ago, and saw code that was not at all efficient (and as confirmed by comparing with Info-Zip ZIP times).


  • Anon Sunday, May 17, 2009

    There is just no way I would have picked up on this difference without a profiler. It's not obvious from a quick glance at the Hashtable docs that containsValue was going to be so expensive (if you think about it then yes it is obvious but when it is just called contains this fact is somewhat hidden).


  • Not My Real Name Sunday, May 17, 2009

    Thank you for your work!


  • peterreilly Monday, May 18, 2009

    You meant an add to a vector and not the hashtable.


  • reddit Monday, May 18, 2009

    I am a redditor, and I approve this change!

    Java FTW


  • Stefan Lotties Monday, May 18, 2009

    Haha, nice! I actually never cared much about the performance of JAR, but good to know if things get faster :)

    How about doing multithreading stuff to run faster on smp systems? ;)


  • goodness me Monday, May 18, 2009

    Can you improve the unjarring too?

    Unzip is vastly faster than jar x.


  • Xueming Monday, May 18, 2009

    goodness me:

    Some performance tuning has been done in jdk7-b46 for "unjarring" as well. It should now be much faster ("vastly faster"?) to extract a list of files out of a huge jar file, as long as that list is not the whole jar file. The gain is most obvious when the list is a very small portion of a huge jar file.

    For example "jar xf rt.jar sun/util" is almost 2 times faster on my system (rt.jar is compressed when measuring).

    I might be able to comment more if you can provide a specific use scenario.

    Please try the jar in the latest jdk7 and see if the "unjarring" improves in your case. And let me know the result:-)


  • Emmanuel Puybaret Wednesday, May 20, 2009

    Will the classes of java.util.zip (and java.util.jar) benefit from this performance gain?

    Can we also expect these classes to run faster even if the zipped content doesn't contain too many files?


  • Frank Thursday, May 21, 2009

    Nice to hear that, so when can we officially enjoy jdk7 ? Any release date ?

    Also, I'm waiting for the JWebPane, do you know (as an insider ) if it will come with jdk7 ? Thanks !


  • Christine Dorffi Thursday, May 21, 2009

    Regarding JDK 7 releases: Watch for announcements at the 2009 JavaOne Conference, June 2-5, San Francisco, California. Website is http://java.sun.com/javaone/


  • Spaceman Spiff Sunday, May 24, 2009

    Xueming: great to see someone at Sun looking into stuff like this.

    I had to give up on the jar command for my backups because of its crippling 32-bitness (it cannot do > 2 GB files). I would love to see that behavior fixed. Plus, proper AES-256 encryption.


  • Maung Monday, May 25, 2009

    The fix would help us a lot as our projects heavily rely on jar files and compiling, including auto-compiling project jar files, in netbeans takes us lots of time to wait until it has completely finished. Thank you for the fix.


  • Jon Tuesday, May 26, 2009

    That does seem like a dumb mistake with Hashtable. In this case I guess it seems obvious, but, it might be nice if the Javadocs said explicitly what the big O is for the different methods in the various collection classes. I noticed that .NET's documentation does this which I thought was nice. I don't remember whether Java's does or not. It doesn't seem to in this case. Again, it should be pretty obvious, but, given the fact that the implementation is encapsulated and hidden it's probably not a bad idea to mention in the docs.


  • Jon Tuesday, May 26, 2009

    Can you now look at Tomcat or Java's classloader code and find out why reloading a web app causes PermGen OutOfMemoryErrors? This IMHO is a ridiculous problem that's existed for years which I've now written off as something that will never be fixed.


  • alban Tuesday, May 26, 2009

    Thanks a lot.

    We had seen the "little" slowness in jar creation, but we attributed it to the size of the project.


  • Jon Tuesday, May 26, 2009

    The java.net web site and a lot of Sun's other web sites including this one need speed improvements also. They have a tendency to run as slow as molasses.


  • Xueming Shen Tuesday, May 26, 2009

    Spaceman Spiff,

    I definitely hope (believe) id#6599383 (

    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6599383) is the last nasty 2G-4G bug we have in the jar/zip code. This one has been fixed in the latest product release 6u12, as well as JDK7. So go use jar as your backup command from now:-) Oh, one more reason to start to use jar as the backup command is that jar now supports ZIP64 format (JDK7 for now), see my blog http://blogs.sun.com/xuemingshen/entry/zip64_support_for_4g_zipfile for a bit more details. Give JDK7 a try!


  • Xueming Shen Tuesday, May 26, 2009

    Jon,

    The API doc for containsValue() does include some warning words: "This operation will probably require time linear in the map size for most implementations of the Map interface". But I guess most people don't pay attention to it until they have a problem in hand, as we did:-)

    I'm not a GC expert, but if you have a detailed test case or use scenario, instead of the general "Tomcat" + A webapp, I might try to pass it on...or you might have the bugid already.


  • daniel Wednesday, May 27, 2009

    that is like 7 times improvement with the fix. amazing


  • Norman Wednesday, May 27, 2009

    I thought the Jar command was perfect since everybody is using it! : )


  • Antonio Dieguez Wednesday, May 27, 2009

    I realize now this is very important. I build or rebuild many times during a day and that's 100% lost time. More than 20 minutes lost, I estimate. It is 100% lost time because I have to wait until the build is finished, since I want to see the results right away; also, each build doesn't take that much time, so I can't really do anything else in the meantime. Please backport it to jdk6.


  • Spaceman Spiff Wednesday, May 27, 2009

    Xueming, thanks much for your response.

    Concerning 2-4 GB jar/zip bugs, I cannot use 6u12 because it (and 6u13) has another bug that breaks some of my gui code. 6u14 is claimed to fix that bug, so I am stuck at 6u11 until it comes out.

    Oh man, I am SO glad that you have finally added 64 bit ZIP support. That's really good to hear.

    However, I am really nervous about using a beta JDK, since some of my applications (trading, data backup) are just too critical. Is there any chance that this code will get released earlier than JDK 7, say in 6u14? I would love to see it sooner. For now, I have had to resort to using TAR files for my archiving (using an Apache library of dubious quality assurance...).

    Speaking of JDK versions, is there any website that documents when we can expect new JDK releases to come out, like say when 6u14 is coming out?

    On a different note, I have some other benchmarking results that are likely of interest to you (and anyone else reading this blog). I recently did a thorough exploration of the effect of buffer sizes on both byte and char writing using the usual JDK OutputStreams and Writers. One result is how bad PrintStream is for char writing compared to, say, BufferedWriter. (Looking at PrintStream's obscure source code, I think that I know why--and also how to fix it). I would like to share my results with you. What is the best way to contact you, if you are interested? I found your LinkedIn page http://www.linkedin.com/pub/xueming-shen/8/138/809 but my free LinkedIn account cannot contact you. My LinkedIn page is http://www.linkedin.com/pub/brent-boyer/11/636/246 if you would like to contact me.


  • infernoz Thursday, May 28, 2009

    Ouch, nasty, you could at least have asked the Hashtable if it contained the key value instead. IMHO it is way overdue that Sun removed all use of obsolete, useless, synchronized classes like Hashtable, Vector, and StringBuffer from all new JRE and JDK releases, when suitable alternatives are available.

    I hope the jar command can be released as a standalone, with Java 1.3, 1.4, and 1.5 compatible code, so that us commercial Java developers, stuck on JDK 1.4 and 1.5, can patch our JDKs and have an easier life.


  • Antonio Dieguez Thursday, May 28, 2009

    I got confused in my previous post, I got wrong the time of just building jars. It may be about 3 min of lost time daily at max in _my_ case, but obviously with other people it can be worse. Thanks for any improvement. Now please discover a hidden time-consuming loop at startup of java programs that someone forgot to delete :)


  • TARUN KARMAKAR Monday, June 1, 2009

    I never bothered about the time it takes to create a jar....but this has opened my eyes....thanks...nice article....


  • Stephen Hart Tuesday, June 9, 2009

    Is there a BugID for this jar performance enhancement done in JDK 7? I would like to vote on porting the fix to JDK 5.


  • Christine Dorffi Wednesday, June 10, 2009

    The bug number is 6496274, per Xueming.

