Versioning in the Java platform
By abuckley on Jul 31, 2009
The best-versioned artifact in the Java world today is the ClassFile structure. Two numbers that evolve with the Java platform (as documented in the draft Java VM Specification, Third Edition) are found in every .class file, governing its content. But what determines the version of a particular .class file, and how is the version really used? The answer turns out to be tricky because there are many interesting versionable artifacts in the Java platform.
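To make the "two numbers" concrete, here is a minimal sketch of reading them from the first eight bytes of a ClassFile. The class name ClassFileVersion and its methods are invented for illustration; only the layout (magic, then minor_version, then major_version) comes from the ClassFile structure.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Illustrative helper (not a JDK class): holds the two version numbers of a ClassFile.
final class ClassFileVersion {
    final int major;
    final int minor;

    ClassFileVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Parses the ClassFile header: u4 magic, u2 minor_version, u2 major_version.
    static ClassFileVersion parse(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("Need at least 8 bytes of ClassFile header");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like a ClassFile
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a ClassFile: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // minor_version is stored first
        int major = buf.getShort() & 0xFFFF; // major_version follows
        return new ClassFileVersion(major, minor);
    }

    // Finds the .class resource of an already-loaded class and reads its header.
    static ClassFileVersion of(Class<?> c) {
        String resource = "/" + c.getName().replace('.', '/') + ".class";
        try (InputStream in = c.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No ClassFile found for " + c.getName());
            }
            byte[] header = new byte[8];
            new DataInputStream(in).readFully(header);
            return parse(header);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

Calling ClassFileVersion.of(String.class) reports whatever version the running JDK's own standard library was compiled to, which is a quick way to see that the version belongs to the file rather than to the VM reading it.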
The source language is the most obvious. A compiler doesn't have to accept multiple versions of a source language, though javac does, via the -source flag. (-source works on a global basis; it is also conceivable to work on a local basis, accepting different versions of the source language for different compilation units.) Less obvious versioned artifacts are hidden in plain sight: character sets and compilation strategies. And .class files themselves sometimes have their versions used in surprising ways. Let's see how javac handles all these versions, and make some claims about how an "ideal" compiler might work.
In the remainder, X and Y are versions. "source language X" means "version X of a source language". "Java X" means "version X of Java SE". "javac X" means "the javac that ships in version X of the JDK".
Happily, the Java platform has used the Unicode character set from day one. Unhappily, when javac for source language X is configured to accept an earlier source language Y, it uses the Unicode version specified for source language X rather than Y. For example, javac 1.4 -source 1.3 uses Unicode 3.0, since that was the Unicode specified for Java 1.4. It should use Unicode 2.1 as specified for Java 1.3.
Claim: A compiler configured to accept source language X should use the Unicode version specified for source language X.
It is difficult for javac to use multiple Unicode versions since the standard library (notably java.lang.Character) effectively controls the version of Unicode available, and only one version of the standard library is usually available. We will return to the issue of multiple standard libraries later.
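The dependence on java.lang.Character can be sketched with an identifier check of the kind a lexer might perform. This is an assumption-laden illustration, not javac's actual scanner: the class IdentifierCheck is invented here, and it ignores keywords and literals. The point is only that the answer comes from whichever java.lang.Character is loaded, and therefore from that library's Unicode version.

```java
// Illustrative only: a simplified identifier test, not javac's real lexer.
final class IdentifierCheck {
    // Returns true if every code point is acceptable in a Java identifier,
    // according to the java.lang.Character of the library this code runs on.
    // Reserved words such as "class" are deliberately not excluded.
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

A character added to Unicode after the version bundled with the running library would be rejected here, and one added before would be accepted, regardless of any -source setting.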
Sidebar: You may be surprised to discover that some other languages don't use Unicode by default. A factoid from 2008's JVM Language Summit was the existence of a performance bottleneck in converting 8-bit ASCII strings (used by dynamic languages' libraries) to and from UTF-8 strings (used by canonical JVM libraries). Who knows what the 2009 JVM Language Summit will reveal?
A compilation strategy is the translation of source language constructs to idiomatic bytecode, flags, and attributes in a ClassFile. As the Java platform evolves by changing the source language and ClassFile features, a compilation strategy can evolve too. For example, javac 1.4 may compile an inner class one way when accepting the Java 1.3 source language and another way when accepting the Java 1.4 source language.
Claim: A compiler may use a different compilation strategy for each source language.
The javac flag '-target' selects the compilation strategy associated with a particular source language. This mainly has the effect of setting the version of the emitted ClassFile: 46.0 for Java 1.2, 47.0 for Java 1.3, 48.0 for Java 1.4, 49.0 for Java 1.5, 50.0 for Java 1.6. For example, javac 1.4 could compile an inner class the same way when configured with a Java 1.3 target versus a Java 1.4 target *, but emit 47.0 and 48.0 ClassFiles respectively:
javac 1.4 -source 1.3 -target 1.3 -> 47.0
javac 1.4 -source 1.3 -target 1.4 -> 48.0
However, ClassFile version should be orthogonal to compilation strategy. For example, javac 1.4 could conceivably compile an inner class to a 48.0 ClassFile in two ways, one when configured to accept the Java 1.3 source language and another when configured to accept the Java 1.4 source language:
javac 1.4 -source 1.3 -target 1.4 -> 48.0
javac 1.4 -source 1.4 -target 1.4 -> 48.0
You would have to inspect the ClassFiles carefully to see the difference, since their versions wouldn't - don't - reveal the compilation strategy. Of course, the ClassFile version "dominates" a compilation strategy, since a strategy can only use artifacts legal in a given ClassFile version, even though the concepts are different. Joe has written more about the history of -source and -target.
The combination missing above is:
javac 1.4 -source 1.4 -target 1.3 -> 47.0
or, given that the target could refer strictly to compilation strategy and not ClassFile version:
javac 1.4 -source 1.4 -target 1.3 -> 48.0
javac does not accept a target (or compilation strategy) lower than the source language it is configured to accept. Each new version of the source language is generally accompanied by a new ClassFile version that allows the ClassFile to give meaning to new bytecode instructions, flags, and attributes. Encoding new source language constructs in older ClassFile versions is likely to be difficult. How would javac encode annotations from the Java 1.5 source language without the Runtime[In]Visible[Parameter]Annotations attributes that appeared in the 49.0 ClassFile?
Claim: A compiler configured to accept source language X should not support a compilation strategy corresponding to a source language lower than X.
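The claim amounts to a simple ordering check. The sketch below models it with an invented Release enum; none of these names exist in javac, and the ClassFile major versions are the ones listed earlier for Java 1.2 through 1.6.

```java
// Hypothetical model of the releases discussed in this entry; not a javac API.
enum Release {
    JAVA_1_2(46), JAVA_1_3(47), JAVA_1_4(48), JAVA_1_5(49), JAVA_1_6(50);

    // Major version of the ClassFile that accompanies this release.
    final int classFileMajor;

    Release(int classFileMajor) {
        this.classFileMajor = classFileMajor;
    }

    // The policy in the claim: the target may equal or exceed the source, never trail it.
    static boolean isAccepted(Release source, Release target) {
        return target.compareTo(source) >= 0;
    }
}
```

Under this model, Release.isAccepted(Release.JAVA_1_3, Release.JAVA_1_4) holds, while any pair whose target is older than its source is rejected.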
This policy can be rather restrictive. There were no changes ** between the Java 1.5 and 1.6 source languages, and only minor changes in the 49.0 and 50.0 ClassFiles that accompany those languages (really, platforms). Nevertheless, javac 1.6 does not accept -source 1.6 -target 1.5.
** Except for a minor change in the definition of @Override to do what we meant, not what we said. Unfortunately, the definition changed in javac 1.6 but not in the JDK6 javadoc. Happily, javac 1.7 and the JDK7 javadoc are consistent.
The famous example of the restriction is that javac 1.5 does not accept -source 1.5 -target 1.4, so source code using generics cannot be compiled for pre-Java 1.5 VMs even though the generics are erased. This is partly because the compilation strategy for class literals changed between Java 1.4 and 1.5, to use the upgraded ldc instruction in the 49.0 ClassFile rather than call Class.forName. If javac's compilation strategy were more configurable, it would be conceivable to produce a 48.0 ClassFile from generic source code. There is, however, another reason why -source 1.5 -target 1.4 is disallowed ... read on.
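The two strategies for class literals can be approximated in source form. This is a hand-written analogue under stated assumptions, not the exact code any javac emitted: the class, field, and method names are invented, and the older strategy is shown as an explicit cached Class.forName lookup of roughly the shape a compiler would have had to generate.

```java
// Illustrative source-level analogue of two compilation strategies for String.class.
final class ClassLiteralStrategies {
    // Plays the role of a compiler-generated cache field (name is hypothetical).
    private static Class<?> cachedStringClass;

    // Older strategy, approximated: resolve the class by name once, then reuse it.
    static Class<?> viaForName() {
        if (cachedStringClass == null) {
            try {
                cachedStringClass = Class.forName("java.lang.String");
            } catch (ClassNotFoundException e) {
                // A missing class surfaces as a linkage error, not a checked exception.
                throw new NoClassDefFoundError(e.getMessage());
            }
        }
        return cachedStringClass;
    }

    // Newer strategy: in a 49.0 or later ClassFile the literal needs only a single ldc.
    static Class<?> viaLiteral() {
        return String.class;
    }
}
```

Both methods return the same Class object; what differs is the bytecode needed to get there, which is exactly the kind of difference a ClassFile version number does not reveal.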
Prior to JDK7, if javac for source language X was configured to accept an earlier source language Y, it used the ClassFile definition associated with source language X. For example, if javac 1.5 -source 1.2 reads a 46.0 ClassFile, it treats the ClassFile as a 49.0 ClassFile. This is unfortunate because user-defined attributes in the 46.0 ClassFile could share the names of attributes defined in the 49.0 ClassFile spec, and interpreting them as authentic 49.0 attributes is unlikely to succeed.
Even if javac 1.5 -source 1.2 reads a 49.0 ClassFile, there is little point in reading 49.0-defined attributes since they had no semantics in the Java 1.2 platform. This holds for non-attribute artifacts such as bridge methods too; if physically present in a 49.0 ClassFile, they should be logically invisible from a Java 1.2 point of view. In summary:
javac 1.5 -source 1.2 reading a Java 1.5 ClassFile -> should interpret as Java 1.2
javac 1.5 -source 1.5 reading a Java 1.2 ClassFile -> should interpret as Java 1.2
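The point about bridge methods can be observed with ordinary reflection. In the sketch below, the names BridgeFilter, Version, and visibleMethods are invented for illustration; Method.isBridge() is the standard reflection call. Filtering on it gives roughly the view of a class that a pre-generics reader ought to have.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hides bridge methods that are physically present in a ClassFile.
final class BridgeFilter {
    // Implementing Comparable<Version> makes the compiler emit a second,
    // synthetic compareTo(Object) that forwards to compareTo(Version).
    static final class Version implements Comparable<Version> {
        final int major;

        Version(int major) {
            this.major = major;
        }

        @Override
        public int compareTo(Version other) {
            return Integer.compare(major, other.major);
        }
    }

    // Returns the declared methods of c with bridge methods removed.
    static List<Method> visibleMethods(Class<?> c) {
        List<Method> result = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isBridge()) {
                result.add(m);
            }
        }
        return result;
    }
}
```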
Claim: A compiler configured to accept source language X should interpret a ClassFile read during compilation as if the ClassFile's version is the smaller of a) the ClassFile version associated with source language X, and b) the actual ClassFile version.
In JDK7, javac behaves as per the claim. First, it interprets a ClassFile according to the ClassFile's actual version, regardless of the configured source language. For example, a 46.0 ClassFile is interpreted as it would have been in Java 1.2, ignoring attributes corresponding to a newer source language. Second, when the configured source language is older than a ClassFile, javac ignores ClassFile features newer than the source language it is configured to accept.
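The claim can be written down as a small rule. The sketch below is a model of the policy, not javac's implementation: the class and method names are invented, and the table lists only a few predefined attributes with the ClassFile major version in which the JVM specification introduces them. An attribute name is honored only if it was already defined at the effective version; otherwise it is treated as an unknown (possibly user-defined) attribute and ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how a compiler might decide which attributes to honor.
final class ClassFileInterpretation {
    // ClassFile major version in which a few predefined attributes first appeared.
    private static final Map<String, Integer> INTRODUCED_IN = new HashMap<>();
    static {
        INTRODUCED_IN.put("InnerClasses", 45);
        INTRODUCED_IN.put("Signature", 49);
        INTRODUCED_IN.put("RuntimeVisibleAnnotations", 49);
        INTRODUCED_IN.put("StackMapTable", 50);
    }

    // The smaller of the version tied to the source language and the file's actual version.
    static int effectiveMajor(int sourceLanguageMajor, int actualMajor) {
        return Math.min(sourceLanguageMajor, actualMajor);
    }

    // True if the named attribute should be given its predefined meaning.
    static boolean honors(String attribute, int sourceLanguageMajor, int actualMajor) {
        Integer introduced = INTRODUCED_IN.get(attribute);
        if (introduced == null) {
            return false; // not in the table: treat as an unknown attribute
        }
        return introduced <= effectiveMajor(sourceLanguageMajor, actualMajor);
    }
}
```

With -source 1.2 (46) reading a 49.0 ClassFile, or -source 1.5 (49) reading a 46.0 ClassFile, effectiveMajor yields 46 in both cases, so an attribute named Signature is not given its 49.0 meaning in either.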
An important part of a compiler's environment is the standard library it is configured to use. The standard library used by javac can be configured by setting the bootclasspath. In future, a module system shipped with the JDK will allow a dependency on a particular standard library to be expressed directly.
Note that running against standard library X is deeply different from compiling against standard library X. Consider the Unicode issue raised earlier: javac implicitly uses the java.lang.Character from the standard library against which it runs, but should use the class in the standard library for the configured source language. For example, javac 1.6 -source 1.2 should use the Unicode in effect for Java 1.2, not Java 1.6. In this case, suitable versioning can only be achieved at the application level, by javac either reflecting over the appropriate java.lang.Character class or using overloaded java.lang.Character.isJavaIdentifierStart/Part methods that each take a version parameter.
Things also get tricky when compiling an older source language to a newer target ClassFile version (and hence a later JVM with a newer standard library). For example, should javac 1.6 -source 1.2 -target 1.5 compile against the Java 1.2 or 1.5 standard library? Both answers have merit, which suggests further concepts are needed to disambiguate.
Using the right libraries matters at runtime too. The introduction of a source language feature in Java 1.5 - enums - added constraints on the standard library against which ClassFiles produced from the Java 1.5 source language can run. The java.lang.Enum class must be present, and you can read the code of ObjectInputStream and ObjectOutputStream to see for yourself the mechanism for serializing enum constants. The simple way to guarantee that a suitable standard library is available for enum-using code at runtime is to ensure that only 49.0 ClassFiles are produced from the Java 1.5 source language. Such ClassFiles will not run on a Java 1.4 VM since it only accepts <=48.0 ClassFiles.
In a nutshell, the compilation strategy for enums is erasure++: an enum type compiles to an ordinary ClassFile with ordinary static members for the enum constants and ordinary static methods to list and compare constants. With a few changes in that strategy (to not extend java.lang.Enum) and a serious amount of magic in the Java 1.5 VM (to track reflection and serialization of objects of enum type), the ClassFiles emitted by a compiler for the Java 1.5 source language could run safely enough on a Java 1.4 VM. But the drawbacks to such hackery are enormous, so erasure++ it was.
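The shape of a compiled enum can be checked with reflection. The sketch below uses an invented EnumShape class and a two-constant Color enum; the reflection calls are standard. It shows the dependency on java.lang.Enum and the ordinary static members described above.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Illustrative only: inspects what an enum type looks like once compiled.
final class EnumShape {
    enum Color { RED, GREEN }

    // A plain enum's direct superclass is java.lang.Enum, which must exist at runtime.
    static boolean extendsEnum(Class<?> c) {
        return c.getSuperclass() == Enum.class;
    }

    // Each enum constant is an ordinary static final field of the enum's own type.
    static boolean isStaticFinalConstant(Class<?> c, String name) {
        try {
            Field f = c.getDeclaredField(name);
            int mods = f.getModifiers();
            return Modifier.isStatic(mods) && Modifier.isFinal(mods) && f.getType() == c;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }
}
```

The compiler-supplied static methods are visible too: EnumShape.Color.values() lists the constants and EnumShape.Color.valueOf("GREEN") looks one up by name.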
Thus, the reason why one new language feature implemented by erasure - generics - cannot run on earlier JVMs is because another new language feature - enums - is implemented by erasure. Such is life at the foundation of the Java platform.
Thanks to "Mr javac" Jon Gibbons for feedback on this entry.