Packaging Java code

Java applets were originally just collections of class files and other resources, such as sound and image files, published on a web server and downloaded over HTTP, one at a time, by early web browsers.

This approach was, of course, not terribly efficient in terms of bandwidth, and so the user experience over a slow connection was pretty unpleasant. There was, moreover, no way to cryptographically sign an applet in order to guarantee its integrity and authenticate its publisher.

Thus in 1996 was born the JAR file format, a simple extension of the popular ZIP archive format with manifest and signature files. In the absence of a better alternative the JAR format rapidly became the standard way to package reusable Java libraries and, with the advent of the Main-Class manifest header, even entire applications. Experience has shown, however, that the JAR format is not very well-suited to these more-ambitious uses.

Failures of expedience

A JAR file is little more than a set of classes packaged into a single unit which can be transported over the wire, cryptographically verified, and, ultimately, placed on a traditional class loader’s class path. Code packaged in JAR files is hence susceptible to all the usual problems of the class-path model, in particular those arising from the fact that all classes on the path are in the same flat namespace with their visibility determined only by the order, on the path, of their containing JAR files.

There are many ways in which this model can fail. The most common failure mode occurs when two versions of a library are inadvertently placed on the class path. In this situation chaos is likely to ensue—and be very difficult to debug. Application code might appear to use the older version, the newer version, or some bizarre combination of the two, all depending on the exact differences between the versions and their placement on the class path.
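To make the "bizarre combination" case concrete, here is a toy model of first-match-wins lookup on a flat class path. It is only a sketch: `ClassPathSketch`, the `lib-1.0.jar` and `lib-2.0.jar` names, and the class names inside them are all invented for illustration, and no real class loader is involved.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of class-path lookup: each "JAR" maps class names to a label
// saying which JAR supplied the class. This is not a real class loader.
class ClassPathSketch {

    // Returns the label of the first JAR on the path that contains the class,
    // mirroring the first-match-wins rule of a flat class path.
    static String resolve(List<Map<String, String>> classPath, String className) {
        for (Map<String, String> jar : classPath) {
            String origin = jar.get(className);
            if (origin != null) {
                return origin;
            }
        }
        return null; // stands in for ClassNotFoundException
    }

    // Builds a hypothetical JAR containing the given class names.
    static Map<String, String> jar(String origin, String... classNames) {
        Map<String, String> contents = new HashMap<>();
        for (String name : classNames) {
            contents.put(name, origin);
        }
        return contents;
    }

    public static void main(String[] args) {
        // Assumed scenario: version 2.0 dropped lib.Helper and added lib.Codec.
        Map<String, String> libV1 = jar("lib-1.0.jar", "lib.Parser", "lib.Helper");
        Map<String, String> libV2 = jar("lib-2.0.jar", "lib.Parser", "lib.Codec");

        List<Map<String, String>> path = Arrays.asList(libV2, libV1);
        System.out.println(resolve(path, "lib.Parser")); // lib-2.0.jar
        System.out.println(resolve(path, "lib.Helper")); // lib-1.0.jar
        System.out.println(resolve(path, "lib.Codec"));  // lib-2.0.jar
    }
}
```

With both versions on the path, the application silently gets `lib.Parser` from one version and `lib.Helper` from the other. Swapping the order of the two JARs changes which `lib.Parser` wins.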

This failure mode, and others, have been encountered by so many unwary developers over the years that this general set of problems has been dubbed “JAR hell.”

Modular code

If we’re going to modularize the JDK then why not do the same for Java libraries and applications? If we have the facilities required to divide the JDK into a set of well-specified and separate, yet interdependent, modules then we should be able to leverage those same tools even further in order to climb out of JAR hell.

This can work because the metadata in a module is much richer than that in a simple JAR file. A module’s metadata describes its own version as well as the dependences it has upon other modules. The dependences can themselves be constrained with respect to version numbers so that, e.g., a module X can declare that it needs version 1.2.3 of module Y, or any later version.
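A constraint such as "version 1.2.3 of module Y, or any later version" comes down to an ordered comparison of version numbers. The sketch below assumes purely numeric, dot-separated versions. `ModuleVersion` and `satisfiesAtLeast` are hypothetical names, not part of any actual module system.

```java
// Hypothetical dotted version number, compared field by field.
final class ModuleVersion implements Comparable<ModuleVersion> {
    private final int[] parts;

    ModuleVersion(String text) {
        String[] fields = text.split("\\.");
        parts = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            parts[i] = Integer.parseInt(fields[i]);
        }
    }

    @Override
    public int compareTo(ModuleVersion other) {
        int n = Math.max(parts.length, other.parts.length);
        for (int i = 0; i < n; i++) {
            int a = i < parts.length ? parts[i] : 0;             // missing fields count as 0
            int b = i < other.parts.length ? other.parts[i] : 0;
            if (a != b) {
                return Integer.compare(a, b);
            }
        }
        return 0;
    }

    // True when this version meets an "at least minimum" constraint.
    boolean satisfiesAtLeast(ModuleVersion minimum) {
        return compareTo(minimum) >= 0;
    }
}
```

For example, `new ModuleVersion("1.10.0").satisfiesAtLeast(new ModuleVersion("1.2.3"))` is true, because the fields are compared as numbers rather than as strings.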

At run time the classes in a module are not simply added to a class path. The loading and linking of the classes within a module is, rather, guided by the dependence and versioning constraints in the module’s metadata. During this process care is taken to ensure that classes in one module are visible to classes in another only when intended, and that no module is ever linked to more than one version of another.
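One way to picture the "never more than one version" rule is a resolver that collects every dependent's minimum version for a module and then picks a single available version that satisfies all of them. This is only an assumed strategy for illustration. `SingleVersionResolver`, `choose`, and the newest-wins policy are inventions of this sketch, not a description of any real linker.

```java
import java.util.List;
import java.util.Optional;

class SingleVersionResolver {

    // Compares dotted numeric versions such as "1.2.3"; missing fields count as 0.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Picks the newest available version that meets every dependent's minimum,
    // so that all dependents are linked to the same single version.
    // Returns an empty Optional when no available version satisfies them all.
    static Optional<String> choose(List<String> available, List<String> minimums) {
        return available.stream()
                .filter(v -> minimums.stream()
                        .allMatch(min -> compareVersions(v, min) >= 0))
                .max(SingleVersionResolver::compareVersions);
    }
}
```

An empty result would be reported as a resolution failure up front, rather than surfacing later as a puzzling run-time error.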

Packaging Java libraries and applications as modules would allow us to ascend from JAR hell into the clear light of day, where versioning and dependence information is declared at compile time and then leveraged during distribution, installation, and run time. The techniques for improving download time, startup time, and memory footprint applicable to a modular JDK would, moreover, be equally applicable to modularized Java libraries and applications. Truly modular Java components would, finally, also enable us to address at least one other longstanding problem area …

Native packaging

One of the age-old critiques of the Java platform is that it doesn’t integrate very well with the native operating systems upon which it runs. An oft-cited aspect of this critique is that the usual means of packaging Java code (i.e., JAR files) has no well-defined relationship to native packaging systems.

Many an application developer has coped with this impedance-mismatch problem by creating, for each target platform, a native package containing the JAR files for the application, the JAR files for all required libraries—and an entire JRE.

This is an effective solution, but a crude one which introduces problems of its own. Delivering monolithic, self-contained native packages wastes both download time and disk space. More importantly, the lack of sharing of common components—whether of libraries or of the JRE itself—makes it impossible to update those components independently in order to fix security bugs and other critical issues.

A different approach to the impedance-mismatch problem is to address it directly, by building suitable native packages for Java libraries and applications that were originally delivered as JAR files. The enterprising developers behind the JPackage Project have done exactly this, for RPM-based Linux platforms, for a wide variety of popular Java components.

A large part of the time spent in this sort of effort, whether for RPM or for any other reasonably-capable native packaging system—e.g., Debian, SVR4, or IPS—rests in identifying and encoding inter-component dependences. Simple JAR files do not contain such metadata, so in many cases quite a bit of tedious detective work is required.

If Java libraries and applications were packaged as modules, complete with accurate version and dependence metadata, then it would be almost trivial to transform them into sensible native packages. One could even imagine a single tool, delivered as part of the JDK, that would implement this transformation for common native platforms, at least for simple cases.

Now—that would be cool.
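As a rough illustration of how mechanical that transformation could be, the sketch below renders a fragment of a Debian-style control stanza from module metadata. The metadata shape (a name, a version, and a map of minimum-version dependences) and the names `NativeMetadataSketch`, `module-x`, and `module-y` are assumptions made for this example. A real control file needs further fields, such as a maintainer and a description, which are omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class NativeMetadataSketch {

    // Renders Package, Version, and Depends lines from hypothetical module
    // metadata. Each dependence becomes "name (>= minimumVersion)".
    static String toControlFragment(String name, String version,
                                    Map<String, String> minimumVersions) {
        StringJoiner depends = new StringJoiner(", ");
        for (Map.Entry<String, String> dep : minimumVersions.entrySet()) {
            depends.add(dep.getKey() + " (>= " + dep.getValue() + ")");
        }
        StringBuilder out = new StringBuilder();
        out.append("Package: ").append(name).append('\n');
        out.append("Version: ").append(version).append('\n');
        if (depends.length() > 0) {
            // Only emit Depends when the module declares dependences.
            out.append("Depends: ").append(depends).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("module-y", "1.2.3"); // "needs version 1.2.3 of Y, or later"
        System.out.print(toControlFragment("module-x", "1.0", deps));
    }
}
```

The point is that nothing here requires detective work: every line of the output is read straight from metadata the module already declares.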

Comments:

Hi Mark

I'm all in favor of the native packaging--I look at what the Perl, Python and Ruby communities have, being able to pull down new packages into a central repository--with some envy. Maven gets you part of the way but is far from being as well integrated as what's available for other languages.

That said, this makes me wonder whether the solution you are working on will allow two different libraries to each refer to different versions of a third, dependent library? For example, if two libraries I'm importing both use CGLib, will they be able to use different versions of CGLib at runtime? Currently we find, with Maven, that this is one of our biggest problems. We regularly run into problems where an upgrade to one lib causes a new version of some other lib to be pulled in (via transitive dependencies) and if we aren't on top of it, it causes problems which only show up at runtime.

I am glad that the Java group at Sun is working on this. I think I'd prefer a little more openness--these blog entries are _very_ welcome, but I don't understand the need for secrecy regarding this topic. I can live with it for now (not that I or anyone outside the process has much choice in the issue :)) but would prefer this process and discussion about modularizing the JDK be more transparent.

Looking forward to the upcoming proposal(s).

Cheers
Patrick

Posted by Patrick Wright on December 01, 2008 at 12:49 AM PST #

Isn't JSR 277: Java Module System, tasked with solving this problem? http://jcp.org/en/jsr/detail?id=277

Franqueli

Posted by Franqueli Mendez on December 01, 2008 at 01:32 AM PST #

JAR files can in fact have rich metadata too: http://java.sun.com/j2se/1.4.2/docs/guide/jar/jar.html
This specifies versioning and dependence information which could be leveraged further by the classloader. There is already java.lang.Package in this direction.

Posted by Mark Hale on December 01, 2008 at 02:03 AM PST #

Good blog post; I think the right way to think about this is from the dynamic language perspective where I'm sitting at a REPL, and I type something like:

import org.objectweb.asm

What does that do? It drives all sorts of questions, like the obvious need for some sort of standard library identifier outside of the package name.

Right now you have to make all of these choices in an incredibly manual way by manipulating CLASSPATH. It's an extremely poor developer experience, besides all the other problems you mentioned.

Posted by Colin Walters on December 01, 2008 at 03:03 AM PST #

I've wrestled with the client-deployment problem for a few projects now. I prefer a single 'executable JAR' as a deployment unit for small applications (the Java equivalent of PuTTY's single EXE that contains everything it needs).

This means a single file download for clients, then click-n-run, and no chance of lost dependencies etc. (of course this only works for smaller apps in the UNIX vein of "do one thing well"). However, the fly in the ointment is packaging library JARs within the main JAR. This is a curse as I've had to explode the libraries to put them in the main JAR. Once that is done I have an executable JAR that is completely self contained, which makes it *extremely* easy for the customer to use (no environment setting, batch files, setting a confusing hundred-character classpath or native launchers required).

Deployment consists of drag-n-drop to wherever they want to install (since it contains all the code and resources it needs). No need to add complexity with a complicated installer.

The 'saving bandwidth' argument is a distraction from the real argument. People don't mind downloading big things, they just don't want to wait. This is why they'll happily download a 2 GB client-side application but can't stand waiting 10 seconds for an applet to load, since they are conditioned to different experiences on the desktop and web. For client-side apps size is not an issue, it only matters for 'rich' web-apps. Considering it is the JDK that is the big download and most Java apps are minuscule in comparison (especially compared to their data) I'd say that this is less important than making the application reliable to the user (meaning there is no chance of dependencies breaking the application).

For me the modularity required to 'upgrade-in-place' is a fantastic idea but absolutely crappy in practice (one reason the inferior interface of the web is preferred over client-side apps is to get away from the hassle of client-side installations). Nobody I know does modular upgrades for Java apps, they replace the app instead. Sure Debian and Ubuntu can do the equivalent and it's fantastic, but most applications don't get upgraded piecemeal.

Therefore, for my stuff at least, I'd much prefer to be able to bind everything into a *completely foolproof* monolith rather than accept the modular upgrade myth and risk having the application break for customers. Being able to bind library JARs within the application JAR would make a big difference.

Note that apart from O/S-level libraries and true plug-in modules I'm not a big fan of large numbers of dynamic libraries either, they always cause trouble (not only the DLLs on Windows, but the .so's on Unixes/Linuxes too). On the few instances where developers have been considerate enough to provide statically-linked binaries I prefer these when I can as they are much more reliable than their dynamic counterparts.

I haven't tried FatJar yet but I guess that I'm an advocate of that deployment model for client-side Java.

Oh yeah, from what I've seen the typical .NET application deployment model ends up with hundreds of DLLs and as a result is quite brittle. I don't think Java should head this way. Instead, the module work should make it easy to bundle things together (and possible to check dependencies at build-time not run-time).

Keep things
"As simple as possible, but no simpler" - Albert Einstein

Posted by Mike Reid on December 01, 2008 at 03:15 AM PST #

Colin Walters wrote:
"I think the right way to think about this is from the dynamic language perspective where I'm sitting at a REPL, and I type something like:

import org.objectweb.asm

What does that do? It drives all sorts of questions, like the obvious need for some sort of standard library identifier outside of the package name."

NOOOOOOOOOOOOOOOOOOOO!!!!!!!

That is a myopic developer's view of the situation and I disagree completely. This is not the right way to think of what Java needs for modularity. The need should be driven by

THE APPLICATION USER'S EXPERIENCE

not what the developer sees. Developers forgetting this for layers and layers of complication is what will doom Java. This is where Java deployment is falling over for zero-Java-skill end users (which are the majority of business types and IT support) because it is far too complicated for end-users to deal with the installation hassle and the borked dependencies.

I feel this way because I get pissed every time some Python script falls over whenever I try and configure a printer on Ubuntu, or set the time, or anything else trivial. If an application doesn't have to work correctly 100% of the time then sure, use deployment models that can break. But if it has to work every time the user wants it then perhaps it is time to look at where applications can break through their dependencies and do something about it (like reduce them).

Native installers ease the pain somewhat but the chance still exists for the installation to leave an application that won't run. Plus, in large organisations where they have every kind of platform who wants to introduce a 'native' deployment dependency for Java? Not me.

The requirements should be driven by thinking of the user experience first, and how to make the application work as expected 100% of the time. Then start worrying about what developers have to do to get there. Modularity and plugins are a nice idea, but if they make programs fail then the whole Java stack will be rejected, so reliability should be a first objective of the modularity system.

Posted by Mike Reid on December 01, 2008 at 03:33 AM PST #

The problem with JSR 277 the last time I checked is that it *mandates* a specific versioning format. Last time I looked it was "aa.bb.cc.dd_eee" which I find very very ugly.

In my opinion, it should not mandate a specific format because this is a very subjective matter. Let module developers specify a version string and a comparator hook against it.

Finally, on the topic of OSGi versus JSR 277 I think whatever we end up with has to be extremely clean from a design point of view. I would hate to have to specify a long list of imports and exports by hand.

I would love to see the JSR 277 expert group list specific things they did to improve readability and ease-of-use. Keep that conceptual weight low people! :)

Posted by Gili on December 01, 2008 at 03:58 AM PST #

Mike, you have a different perspective. See, I want to use the JVM for building parts of the operating system itself, and I view the OS as generally blending into the applications it ships with.

OS vendors have to take responsibility for stuff they ship; having duplicated libraries is very bad.

Deploying Java apps on Windows is quite different. Obviously you can just manage this for yourself, but ideally the module system would actually help you do this bundling rather than you having to fight it.

Posted by Colin Walters on December 01, 2008 at 05:17 AM PST #

What would be neat is a really small native executable for each platform that can be packaged with any application. This executable will first look for a suitable JRE and if not present, download one (or needed modules when we have a modular JRE).
Then it would read a local XML file (webstart-like) that specifies all the required jars (and other resources) or downloads them if not present, adds them to the classpath, and then calls the main class. The XML file can also be updated at the start (if the user permits it), allowing updates to the application to be downloaded smoothly.

Posted by Mark Souza on December 01, 2008 at 07:22 AM PST #

What would really be neat is for Sun to not continuously reinvent stuff that has already been invented.

Posted by Hal on December 01, 2008 at 09:51 AM PST #

Mark, for the first half of this post I was nodding furiously in agreement. These problems have plagued Java developers for years and we all recognize them. In fact, these are the same problems that I always describe in the introduction to my talks and courses on OSGi, before describing how OSGi solves them.

For God's sake, stop reinventing stuff that already works, has already been standardised through the JCP (JSR 291) and has already been accepted by most of the rest of the Java community, including Sun's own GlassFish group. If there is something missing from OSGi that you need, then either use your membership of the OSGi Alliance to effect those changes, or revive JSR 277 and work through the requirements in the open and with the participation of the JCP and wider Java community.

Posted by Neil Bartlett on December 01, 2008 at 03:53 PM PST #

This blog has moved to http://mreinhold.org/blog.