So you want to change the Java Programming Language...

With the talk of closures, modules, more annotations, and other language features in the air for JDK 7, what are all the tasks that might need to happen to fully add a language feature to the platform? Besides the general advice of being open about the project's status and soliciting feedback, there are specific technical considerations for language changes. Designing language features generates a lot of interest, so providing a design rationale and FAQ is especially important. Based on my experience helping to add the host of language features in JDK 5, there are many interactions to consider; some are obvious, while others are quite surprising. The following list is not exhaustive, but every item on it deserves consideration:

  1. Update the Java Language Specification. This is obviously a required task for a language change, but the JLS is a large and complicated document and it may not be immediately obvious how and where all the updates need to occur. Considering the JLS roughly chapter by chapter:

    • How does the grammar need to be revised?

    • How is the type system affected?

    • Are any new conversions defined?

    • Are naming conventions or name visibility modified?

    • Is the existing structure of packages, classes, or interfaces changed?

    • How can the new feature be annotated?

    • Is method resolution impacted?

    • How does the change impact source compatibility?

    • How does the change impact binary compatibility?

    • Does the feature affect the reachability of code or the definite assignment of variables?

    • Compute the Buckley Complexity Quotient of your change.

  2. Implement the language change in a compiler. Sun's javac compiler has been open-sourced and experiments are welcome in the Kitchen Sink Language project. Be warned, the kitchen sink may have a garbage disposal. However, batch compiler support alone is not sufficient; language changes today should have IDE support too.

  3. Add any essential library support. Some language changes rely on concomitant library updates. For example, all enum types are subclasses of java.lang.Enum.

  4. Write tests. Tests are good. Unit and regression tests for the compiler changes are only part of the job. Language changes are integrated into the platform under the authority of a JSR, and under the JCP a JSR must deliver a specification, a reference implementation, and conformance tests. For language changes, that means the compiler JCK tests must be updated.

  5. Update the Java Virtual Machine Specification. Some language changes define new modifier bits, add attributes, or make other changes to the class file format. Conceivably the verifier rules need to be updated too.

  6. Update the JVM and other tools that consume classfiles. Generally each feature release updates the major number of the class file version and there are at least small changes to the classfile format. This requires at least a trivial JVM update to accept the new version number. Exposing information in the class file to the Java programming layer requires at least minor changes to let the data be passed through. Additionally, data that is passed through needs to be updated when the class is redefined through JVM TI or another interface. There are other tools in the JDK that manipulate class files too, including but not limited to:

    • pack200/unpack200: New attributes should not be stripped away and their semantics should be preserved through a compress-decompress cycle.

    • javap: New information should be printed accordingly.

  7. Update the Java Native Interface (JNI). JNI defines a way for Java code and C/C++ to call and communicate with one another. While this API has been very stable in the face of language changes, new language features may justify direct support for new kinds of Java ↔ native interactions.

  8. Update the reflective APIs. Since the reflective APIs model the language, when the language is updated, the model may need to be updated as well. Reflective APIs in the JDK as of JDK 7 include:

    • Core reflection: (java.lang.Class and java.lang.reflect.*) Implementing core reflection changes may require new JVM entry points. The java.lang.reflect.Proxy mechanism may also need amending.

    • JSR 269 model: (javax.lang.model.*) This API was designed to smoothly accommodate the addition of new language features. In addition, the annotation processing API in javax.annotation.processing may need changes too.

    • Doclet API: Whether or not the existing doclet API itself needs to be updated may depend on the results of JSR 260.

    • apt API: The com.sun.mirror.* API has been superseded by JSR 269 as of JDK 6 and will not be updated for any JDK 7 or later language changes.

    • Java Platform Debugger Architecture (JPDA): This collection of APIs provides a model of a running JVM for inspection and manipulation by debuggers and other tools. As such, portions of the API model the language, especially the Java Debug Interface (JDI); the JVM Tool Interface (JVM TI) and the Java Debug Wire Protocol (JDWP) could also be affected.

  9. Update serialization support. Besides normal serialization, is IIOP serialization affected too?

  10. Update the javadoc output. Separate from updating the language model used for generating javadoc, the generated API documentation must include information on the new constructs.


Case Study: Enum types

I like enums and use them extensively in my API designs; however, getting them fully into the platform was a significant amount of work. At first blush, adding built-in enum types to the Java language seems straightforward. Effective Java has an item detailing two variants of the type-safe enum pattern, and the language change is based on the non-subclassable variant. Implementing the basic desugaring from the new enum syntax to the pattern's boilerplate of supporting code is not very tricky. However, unexpected complications occurred with the reflective APIs and with serialization, and there were many details to get right throughout.

Recalling some war stories from JDK 5 development, here is a summary of the work needed to implement enums for a subset of the areas identified above:

  • Updating the Java Language Specification: Besides adding a new keyword (§3.9), a new section describing the syntax of and semantic restrictions on an enum type (§8.9), and a few miscellaneous updates, there was an update in the binary compatibility chapter:

    §13.4.26 Evolution of Enums

    Adding or reordering constants from an enum type will not break compatibility with pre-existing binaries.

    If a precompiled binary attempts to access an enum constant that no longer exists, the client will fail at runtime with a NoSuchFieldError.

    Therefore such a change is not recommended for widely distributed enums. In all other respects, the binary compatibility rules for enums are identical to those for classes.

    However, this doesn't fully capture the implementation flexibility that is available due to the various restrictions on an enum. For example, the following two implementations of an enum are functionally equivalent:

    // Implementation 1
    enum MetaSyntaticVariable {
        FOO(21),
        BAR(42);

        private int answer;
        MetaSyntaticVariable(int answer) { this.answer = answer; }
        public int answer() { return this.answer; }
    }

    // Implementation 2
    enum MetaSyntaticVariable {
        FOO {
            public int answer() { return 21; }
        },
        BAR {
            public int answer() { return 42; }
        };

        public abstract int answer();
    }

    All enums are effectively final since they only have private constructors. This is true even when anonymous classes are used to implement functionality for enum constants, as in the second example. The effective contract of an enum class is its set of enum constants, the set of methods callable on those enum constants, and any other static methods and fields defined in the enum itself. The class file generated for the second implementation is marked abstract because of the abstract method. Usually, making a non-abstract class abstract causes binary compatibility problems (§13.4.1) because existing binaries may attempt to create new instances of the class. However, this is not an issue for enums because all instance creation must occur within the enum class due to the private constructors. At the method level, adding abstract to a method declaration in an enum won't cause problems either, because the compiler insists that all instances of the class (the enum constants) have a concrete implementation of the method in question. (These kinds of implementation variations don't affect the serial form of an enum either; the serial form is based on the name of the enum type and the names of the constants.)
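
As a rough sketch of the point above, the two implementation styles can be compared through reflection. The names EnumShapes, FieldBased, and BodyBased are hypothetical stand-ins for the two versions of MetaSyntaticVariable (renamed so both fit in one file), and the modifier results assume compilation with javac.

```java
import java.lang.reflect.Modifier;

final class EnumShapes {
    // Hypothetical counterpart of "Implementation 1": state held in a field.
    enum FieldBased {
        FOO(21), BAR(42);

        private final int answer;
        FieldBased(int answer) { this.answer = answer; }
        public int answer() { return answer; }
    }

    // Hypothetical counterpart of "Implementation 2": constant-specific bodies.
    enum BodyBased {
        FOO { @Override public int answer() { return 21; } },
        BAR { @Override public int answer() { return 42; } };

        public abstract int answer();
    }

    // True if both enums expose the same constants with the same answers.
    static boolean sameAnswers() {
        for (FieldBased f : FieldBased.values()) {
            if (f.answer() != BodyBased.valueOf(f.name()).answer()) {
                return false;
            }
        }
        return FieldBased.values().length == BodyBased.values().length;
    }

    static boolean isAbstract(Class<?> c) { return Modifier.isAbstract(c.getModifiers()); }
    static boolean isFinal(Class<?> c) { return Modifier.isFinal(c.getModifiers()); }

    public static void main(String[] args) {
        System.out.println("same answers:       " + sameAnswers());
        System.out.println("FieldBased final:   " + isFinal(FieldBased.class));
        System.out.println("BodyBased abstract: " + isAbstract(BodyBased.class));
        // The runtime class of BAR is an anonymous subclass, but its
        // declaring class is still the enum type itself.
        System.out.println("BAR class:          " + BodyBased.BAR.getClass().getName());
        System.out.println("BAR declaring:      " + BodyBased.BAR.getDeclaringClass().getName());
    }
}
```

Callers see the same constants and the same answer() results either way, which is why switching between the two styles is a compatible change.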

  • Implementing the language change in a compiler: While there is a reasonably direct desugaring of an enum into a class with a set of static final fields and a few implicit methods, there are some intricate details in the corners. For example, the enum specification requires that in source code enum types are declared neither final nor abstract. However, the classfile generated for an enum must be marked final, unless the enum has an abstract method, in which case the classfile must be abstract. If the enum is abstract, then all the enum constants must be initialized with anonymous classes that provide a concrete implementation of all abstract methods in the enum type. Neal did the bulk of the work implementing enums in javac; during a very busy week late in JDK 5, I got to implement the final round of javac changes to address these kinds of details (5005748, 5009574, 5009601, 5010455, 5019572, 5020490).
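
To make the desugaring concrete, here is a hand-written approximation of what a two-constant enum roughly turns into, in the style of the type-safe enum pattern. This is only a sketch: the names DesugaredSuitDemo and Suit are hypothetical, and the class javac actually generates extends java.lang.Enum (which hand-written source is not allowed to do) and differs in other details.

```java
final class DesugaredSuitDemo {
    // Hypothetical hand-written counterpart of: enum Suit { CLUBS, HEARTS }
    static final class Suit implements Comparable<Suit> {
        // One public static final field per constant, created in declaration order.
        public static final Suit CLUBS = new Suit("CLUBS", 0);
        public static final Suit HEARTS = new Suit("HEARTS", 1);

        // Private array backing values(); must be initialized after the constants.
        private static final Suit[] VALUES = { CLUBS, HEARTS };

        private final String name;
        private final int ordinal;

        // Private constructor: no instances can be created outside this class.
        private Suit(String name, int ordinal) {
            this.name = name;
            this.ordinal = ordinal;
        }

        public String name() { return name; }
        public int ordinal() { return ordinal; }

        // Implicit method: returns a fresh copy so callers cannot alter VALUES.
        public static Suit[] values() { return VALUES.clone(); }

        // Implicit method: looks a constant up by its exact name.
        public static Suit valueOf(String name) {
            for (Suit s : VALUES) {
                if (s.name.equals(name)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("No constant " + name);
        }

        @Override public int compareTo(Suit other) { return Integer.compare(ordinal, other.ordinal); }
        @Override public String toString() { return name; }
    }

    public static void main(String[] args) {
        for (Suit s : Suit.values()) {
            System.out.println(s.ordinal() + " " + s);
        }
    }
}
```

The name and ordinal arguments passed to the private constructor here correspond to information the compiler has to supply for every real enum constant.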

  • Adding essential library support: The new type java.lang.Enum creates a typing relationship between all enums and the final methods defined in java.lang.Enum ensure that certain consistency requirements, such as no cloning, are enforced. While not strictly essential, high-performance set and map implementations for enums are prerequisites to allow enums to replace implicit integer enumerations in performance-sensitive contexts.
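
For illustration, here is a small sketch of an implicit integer enumeration next to its enum-based replacement using java.util.EnumSet and java.util.EnumMap. The names TextStyles, Style, and the STYLE_* constants are hypothetical and exist only for this example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class TextStyles {
    // Old style: an implicit integer enumeration combined with bit operations.
    static final int STYLE_BOLD = 1 << 0;
    static final int STYLE_ITALIC = 1 << 1;
    static final int STYLE_UNDERLINE = 1 << 2;

    // New style: a real enum type.
    enum Style { BOLD, ITALIC, UNDERLINE }

    static boolean intHasBold(int styles) {
        return (styles & STYLE_BOLD) != 0;
    }

    // EnumSet plays the role of the bit mask, but is type safe.
    static Set<Style> boldUnderline() {
        return EnumSet.of(Style.BOLD, Style.UNDERLINE);
    }

    // EnumMap is a map specialized for enum keys.
    static Map<Style, String> htmlTags() {
        EnumMap<Style, String> tags = new EnumMap<>(Style.class);
        tags.put(Style.BOLD, "b");
        tags.put(Style.ITALIC, "i");
        tags.put(Style.UNDERLINE, "u");
        return tags;
    }

    public static void main(String[] args) {
        System.out.println("int flags have bold: " + intHasBold(STYLE_BOLD | STYLE_UNDERLINE));
        System.out.println("enum set:            " + boldUnderline());
        System.out.println("tags:                " + htmlTags());
    }
}
```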

  • Updating the class file format: An ACC_ENUM flag is added to the set of recognized ClassFile access_flags; an analogous addition is made to the field_info access_flags to mark enum constants.
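
The flag can be observed from Java code through the modifier bits that core reflection returns. The sketch below assumes the class file value 0x4000 for ACC_ENUM; the names AccEnumFlags, Color, and DEFAULT are hypothetical.

```java
import java.lang.reflect.Field;

final class AccEnumFlags {
    // Class file value of ACC_ENUM; java.lang.reflect.Modifier has no public constant for it.
    static final int ACC_ENUM = 0x4000;

    enum Color {
        RED, GREEN;

        // An ordinary static field that is not an enum constant.
        static final Color DEFAULT = RED;
    }

    static boolean classHasEnumFlag(Class<?> c) {
        return (c.getModifiers() & ACC_ENUM) != 0;
    }

    static boolean fieldHasEnumFlag(Class<?> c, String fieldName) {
        try {
            Field f = c.getDeclaredField(fieldName);
            return (f.getModifiers() & ACC_ENUM) != 0;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Color class flagged:   " + classHasEnumFlag(Color.class));
        System.out.println("RED field flagged:     " + fieldHasEnumFlag(Color.class, "RED"));
        System.out.println("DEFAULT field flagged: " + fieldHasEnumFlag(Color.class, "DEFAULT"));
    }
}
```

In everyday code, Field.isEnumConstant() is the more readable way to ask the field-level question.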

  • Updating the reflective APIs: Without a doubt, the single line of code I spent the most time working on during JDK 5 was the "one liner" Class.isEnum. A related question is, "are the anonymous classes used to implement specialized enum constants themselves enums?" After some deliberation, the answer was determined to be "no." Originally, Class.isEnum just checked that the superclass of the class in question was java.lang.Enum. However, malicious types could try to extend that class directly (and javac didn't always prohibit this) so the isEnum check was switched to testing for the presence of the ACC_ENUM bit in the class's modifiers. However, the anonymous classes for specialized enum constants need their ACC_ENUM bit set too so just checking for the modifier bit wasn't enough. The final implementation of isEnum looks both at the superclass and the modifier bit.

    The reason both enum classes and the anonymous classes for enum constants need the ACC_ENUM bit set is so that reflective calls to Constructor.newInstance fail with an exception, preventing any rogue enum objects from being created even if setAccessible(true) has been called on the constructor.
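
Both behaviors can be checked with a few reflective calls. This is a minimal sketch with hypothetical names (RogueEnumCheck, Op); it only attempts to create a rogue instance of the enum class itself, not of the anonymous classes for its constants.

```java
import java.lang.reflect.Constructor;

final class RogueEnumCheck {
    enum Op {
        NEGATE { @Override int apply(int x) { return -x; } },
        IDENTITY { @Override int apply(int x) { return x; } };

        abstract int apply(int x);
    }

    // Tries to create an extra instance reflectively and reports what happened.
    static String tryToCreateRogue(Class<?> target) {
        try {
            Constructor<?> ctor = target.getDeclaredConstructors()[0];
            ctor.setAccessible(true);      // bypass the private access check
            ctor.newInstance("ROGUE", 99); // hypothetical name and ordinal
            return "created";
        } catch (IllegalArgumentException e) {
            return "rejected";
        } catch (ReflectiveOperationException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        Class<?> constantClass = Op.NEGATE.getClass();
        System.out.println("Op.isEnum():              " + Op.class.isEnum());
        System.out.println("constant class isEnum():  " + constantClass.isEnum());
        System.out.println("constant declaring class: " + Op.NEGATE.getDeclaringClass().getName());
        System.out.println("rogue creation:           " + tryToCreateRogue(Op.class));
    }
}
```

Code that needs the enum type of an arbitrary constant should therefore call getDeclaringClass() rather than getClass().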

    A minor omission in the specification of enums is that it does not explicitly state whether the automatically generated valueOf and values methods should be marked synthetic. Like implicit default constructors, these methods are not present in the source but are described in the JLS, so although they are synthesized by the compiler, they should not be regarded as synthetic. Whether or not the methods are synthetic affects how certain reflective APIs should model the methods. For example, the javax.lang.model.element API for annotation processing does not present synthetic elements.

    When compiling enum constructors, javac chooses to add two synthetic parameters to set the name and ordinal of the enum constant. However, the class file format doesn't include a way to mark just a parameter of a constructor as synthetic. This same issue exists for inner class constructors. The synthetic parameters might or might not be visible through a given reflective API.
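
Core reflection offers one way to look at both questions. In the sketch below (hypothetical names EnumSynthetics and Planet), the isSynthetic results for values and valueOf reflect how javac marks them; the constructor parameter count is printed rather than asserted, since what a reflective API reports there depends on the compiler and the API used.

```java
import java.lang.reflect.Constructor;

final class EnumSynthetics {
    enum Planet {
        MERCURY(3.3e23), VENUS(4.87e24);

        private final double mass;
        Planet(double mass) { this.mass = mass; } // one parameter in the source
        double mass() { return mass; }
    }

    // Reports whether a declared method of Planet is marked synthetic.
    static boolean isSynthetic(String name, Class<?>... parameterTypes) {
        try {
            return Planet.class.getDeclaredMethod(name, parameterTypes).isSynthetic();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of constructor parameters as seen through core reflection.
    static int reflectedConstructorParameterCount() {
        Constructor<?> ctor = Planet.class.getDeclaredConstructors()[0];
        return ctor.getParameterCount();
    }

    public static void main(String[] args) {
        System.out.println("values() synthetic:        " + isSynthetic("values"));
        System.out.println("valueOf(String) synthetic: " + isSynthetic("valueOf", String.class));
        System.out.println("constructor parameters:    " + reflectedConstructorParameterCount());
    }
}
```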

  • Updating serialization support: The recommended serial form of a type-safe enum in Effective Java is based on the ordinal positions of the enum constants. This was changed for built-in enums to be based on the names of the enum constants, not their positions. The latter strategy means reordering enum constants is both a binary compatible and a serial compatible change. The core serialization machinery was updated to avoid creating rogue enum constants upon deserialization. However, the platform also supports the separate IIOP serialization mechanism as part of CORBA, and the IIOP standard was not updated to support enums when they were added to the language.
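
A round trip through the core serialization machinery shows both properties: the constant's name appears in the stream, and deserialization returns the existing constant rather than a new object. The names EnumSerialRoundTrip and Level are hypothetical, and the sketch covers only normal Java serialization, not IIOP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

final class EnumSerialRoundTrip {
    enum Level { LOW, HIGH }

    // Serializes an object to a byte array.
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads a single object back from a byte array.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the serialized bytes contain the given text.
    static boolean streamMentions(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) {
        byte[] data = serialize(Level.HIGH);
        System.out.println("stream mentions name: " + streamMentions(data, "HIGH"));
        System.out.println("same instance back:   " + (deserialize(data) == Level.HIGH));
    }
}
```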

  • Updating the javadoc output: While the generated javadoc for an enum is very similar to the generated javadoc for a class, some of the special rules for enum modifiers and the like need to be taken into consideration. One small remaining glitch is that in an enum like the second version of MetaSyntaticVariable above, the answer method will be listed as abstract even though an enum can't have a meaningfully abstract method.

Conclusions

Updating the language has the potential to fundamentally expand the set of programs that are reasonable to write, but that expansion can have a high engineering cost even when the programming interface to the feature is small. Therefore, extra judiciousness is appropriate when considering language changes. Seeking out simplicity is especially important since any extra complexity can ripple throughout many APIs and system boundaries.

Thanks to Alex, Andreas, and Neal for feedback on earlier drafts of this entry.
