Wednesday Sep 12, 2012

JSR 308 Moves Forward

I am pleased to announce a number of recent milestones for JSR 308, Annotations on Java Types:

Adoption of JCP 2.8

Thanks to the agreement of the Expert Group (EG), JSR 308 operates under JCP 2.8 from September 2012.

There is a publicly archived mailing list for EG members, and a companion list for anyone who wishes to follow EG traffic by email. There is also a "suggestion box" mailing list where anyone can send feedback to the EG directly. Feedback will be discussed on the main EG list.

Co-spec lead Prof. Michael Ernst maintains an issue tracker and a document archive.

Early-Access Builds of the Reference Implementation

Oracle has published binaries for all platforms of JDK 8 with support for type annotations.

Builds are generated from OpenJDK's type-annotations/type-annotations forest (notably the langtools repo). The forest is owned by the Type Annotations project.

Integration with Enhanced Metadata

On the enhanced metadata mailing list, Oracle has proposed support for repeating annotations in the Java language in Java SE 8. For completeness, it must be possible to repeat annotations on types as well as declarations. The implementation of repeating annotations on declarations is already in the type-annotations/type-annotations forest (and hence in the early-access builds above) and work is underway to extend it to types.

Wednesday Feb 15, 2012

JLS7 and JVMS7 online

I am pleased to announce that the Java SE 7 Editions of the Java Language Specification and the JVM Specification are available at http://docs.oracle.com/javase/specs/ in both PDF and HTML form.

We refer to these specifications as JLS7 and JVMS7 to emphasize that both are part of Java SE 7. Only a major Java SE release can change the Java language and JVM.

There is no JLS4 or JVMS3. Historically, the JLS and JVMS pre-date the Java Community Process and hence the concept of Java SE. They were versioned by book editions, e.g. JLS 1st Edition (JLS1) and JVMS 2nd Edition (JVMS2). As time went on, production difficulties caused the book editions to diverge, leading to a confusing situation where Java SE 5.0 incorporated JLS3 and a complex combination of JVMS2 + JSR 14 + JSR 45 + JSR 175 + JSR 201. In my view, it would be needlessly confusing if Java SE 7 incorporated JLS4 and JVMS3, while Java SE 8 incorporated JLS5 and JVMS4. Adopting the SE version number for the specifications is easier and more transparent for everyone. The specifications after JLS7 and JVMS7 will be JLS8 and JVMS8.

The specifications are written in DocBook and rendered as PDF and HTML via DocBook XSL. The HTML is well-formed and has a consistent naming scheme for files and anchors. You can link directly to any section via a predictable URL, such as:

http://docs.oracle.com/javase/specs/jls/se7/html/jls-14.html#jls-14.20.3
http://docs.oracle.com/javase/specs/jls/se7/jls7.pdf#jls-14.20.3

or:

http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.invokedynamic
http://docs.oracle.com/javase/specs/jvms/se7/jvms7.pdf#jvms-6.5.invokedynamic

Officially, the specifications have been available in PDF form since July 2011, in Annex 3 of the Final Release of JSR 336 (Java SE 7). The specifications on docs.oracle.com incorporate minor spelling and formatting improvements.

Per the JCP pages for JSR 901 and JSR 924, please report technical errors in the JLS or JVMS to me directly. All proposals for new features should be made through the JDK Enhancement Proposal Process.

I hope to publish JLS7 and JVMS7 as books later in 2012.

Tuesday Jan 31, 2012

JSR 308 Early Draft Review

I am pleased to announce an Early Draft Review of JSR 308, which extends the Java language in Java SE 8 so that annotations may appear on essentially any use of a type.

This generalization of annotations with respect to Java SE 7 (where annotations may only appear on declarations) enables new uses of annotations, such as the Checker Framework. JSR 308 itself makes no commitment about the semantics of annotations which might appear in any given location.

The PDF available for Early Draft Review covers language changes and class file support, and sketches interactions with other language features planned for Java SE 8. We expect the next milestone review will include APIs for reflection (java.lang.reflect) and annotation processing (JSR 269's javax.lang.model). The reference implementation will be moved from the jsr308-langtools project into OpenJDK's Type Annotations project by that time.

Thursday Nov 17, 2011

QCon SF 2011

To San Francisco for QCon SF 2011, where I spoke on Java SE: Where We've Been, Where We're Going.

QCon is much further "up the stack" than JavaOne, so it has far fewer talks about the "foundation", Java SE. I thought it was important to review the features delivered in Java SE 7 before discussing what's planned for Java SE 8. This worked out well, as most of the audience were using Java SE 6. The language changes in SE 7 look small, but examining merely two of them - precise rethrow and suppressed exceptions - reveals a new exception handling idiom applicable to many thousands of Java classes.
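As a rough illustration of those two Java SE 7 features working together, here is a minimal sketch. It is not taken from the talk; the class name ExceptionIdiomSketch and the FailingResource type are hypothetical, invented only to force both the body and close() to fail.

```java
import java.io.IOException;

final class ExceptionIdiomSketch {

    /** Hypothetical resource whose close() always fails, to force a suppressed exception. */
    static final class FailingResource implements AutoCloseable {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    /** The body and close() both fail; the body's exception wins and close()'s is suppressed. */
    static void use() throws IOException {
        try (FailingResource resource = new FailingResource()) {
            throw new IllegalStateException("body failed");
        } catch (Exception e) {
            // Precise rethrow: e is effectively final, so the compiler knows that only
            // IOException (from close) or an unchecked exception can arrive here, and
            // the method does not need to declare "throws Exception".
            throw e;
        }
    }

    /** Runs use() and reports how many exceptions were suppressed on the primary one. */
    static int suppressedCount() {
        try {
            use();
            return -1; // not reached in practice: use() always throws
        } catch (IllegalStateException | IOException e) {
            return e.getSuppressed().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("suppressed exceptions: " + suppressedCount());
    }
}
```

The exception from the body is the one that propagates, and the failure from close() travels with it via getSuppressed() rather than replacing it.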

And thumbs up to the QCon organizers for the instant feedback mechanism!

Thursday Mar 31, 2011

Maintenance Review of the Java Language Specification

The Java Language Specification is the authoritative definition of the Java programming language. Officially, the Specification is maintained in the Java Community Process as JSR 901. The Specification was last updated in 2004 by JSRs 14, 133, 175, and 201, and published as the book, "The Java Language Specification, Third Edition" (2005).

Since that time, numerous corrections and clarifications to the Specification have been recorded in Sun/Oracle's public bug tracking system. Often, they align with changes made in JDK7's javac, the Reference Implementation of a compiler for the Java programming language.

Oracle has now produced a cohesive document integrating these corrections and clarifications. Per the JCP maintenance procedure, Oracle initiated a Maintenance Review of JSR 901 in March 2011. It proposes the "Java SE 7 Edition" of the Java Language Specification.

In an effort to support long-term readability and testability, Oracle has strongly differentiated normative material from informative material in the Specification. For example, the compile-time errors possible for a field declaration are normative, but the conventional order of modifiers in a field declaration is informative. More details are given in the change log.

Changes for JSR 334 (a.k.a. Project Coin) are not included because they are not yet final. They will be integrated before Java SE 7 goes final.

As Maintenance Lead for JSR 901, I hope you find the proposed Specification interesting and useful. If you have substantive technical comments about the specific changes proposed for the Java SE 7 Edition, please send them to me directly.

Monday Feb 28, 2011

Maintenance Review of the Java VM Specification

The Java Virtual Machine Specification is the authoritative reference for the design of the Java virtual machine that underpins the Java SE platform. In an implementation-independent manner, the Specification describes the architecture, linking model, and instruction set of the Java virtual machine, plus the class file format.

Many Java developers are familiar with the book, "The Java Virtual Machine Specification, Second Edition" (1999). Officially, the Specification is maintained in the Java Community Process as JSR 924. The Specification has incorporated changes arising in 2004 from a Maintenance Review for Java SE 5.0, and in 2006 from JSR 202 in Java SE 6.

However, no single document was available that incorporated all these changes plus the smaller corrections and improvements that are made from time to time. A single document is needed to serve as the base for further changes in Java SE 7 and beyond.

Oracle has now produced such a document. Per the JCP maintenance procedure, Oracle initiated a Maintenance Review of JSR 924 in February 2011. It proposes the "Java SE 7 Edition" of the Java Virtual Machine Specification.

Changes for JSR 292 are not included because they are not yet final. They will be integrated before Java SE 7 goes final.

As Maintenance Lead for JSR 924, I hope you find the proposed Specification interesting and useful. If you have substantive technical comments about the specific changes proposed for the Java SE 7 Edition, please send them to me directly.

Friday Jan 22, 2010

JSR 294 and Module Systems

JSR 294 is often, and incorrectly, described as a module system. In fact, JSR 294 provides language and VM features for the benefit of module systems such as OSGi and Jigsaw, similar to how JSR 292 provides VM features for the benefit of dynamic language runtimes such as JRuby and Jython. Where JSR 292 standardizes linkage protocols, but not linkage behavior, JSR 294 standardizes module accessibility, but not module boundaries.

Please see this short PDF on JSR 294 and Module Systems to understand the relationship of JSR 294 to module systems such as OSGi and Jigsaw. BJ Hargrave (OSGi CTO) has also blogged on how a module system may use JSR 294 to enforce boundaries in "I am Visible but am I Accessible?".

Friday Jul 31, 2009

Versioning in the Java platform

The best-versioned artifact in the Java world today is the ClassFile structure. Two numbers that evolve with the Java platform (as documented in the draft Java VM Specification, Third Edition) are found in every .class file, governing its content. But what determines the version of a particular .class file, and how is the version really used? The answer turns out to be tricky because there are many interesting versionable artifacts in the Java platform.
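To make the "two numbers" concrete, here is a minimal sketch that reads them from the first eight bytes of a ClassFile: the 0xCAFEBABE magic number, then the minor version, then the major version. The class and method names are hypothetical, and the sample bytes in main are hand-built rather than taken from a real compiler run.

```java
import java.nio.ByteBuffer;

final class ClassFileVersionSketch {

    /** Returns the version as "major.minor", reading only the ClassFile header. */
    static String versionOf(byte[] classFileBytes) {
        ByteBuffer in = ByteBuffer.wrap(classFileBytes); // big-endian by default
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a ClassFile: bad magic number");
        }
        int minor = in.getShort() & 0xFFFF; // u2 minor_version
        int major = in.getShort() & 0xFFFF; // u2 major_version
        return major + "." + minor;
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor_version = 0, major_version = 49.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x00,
            0x00, 0x31
        };
        System.out.println(versionOf(header)); // prints 49.0
    }
}
```

To inspect a real .class file, the same method could be given the bytes returned by java.nio.file.Files.readAllBytes.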

The source language is the most obvious. A compiler doesn't have to accept multiple versions of a source language, though javac does, via the -source flag. (-source works on a global basis; it could also conceivably work on a local basis, accepting different versions of the source language for different compilation units.) Less obvious versioned artifacts are hidden in plain sight: character sets and compilation strategies. And .class files themselves sometimes have their versions used in surprising ways. Let's see how javac handles all these versions, and make some claims about how an "ideal" compiler might work.

In the remainder, X and Y are versions. "source language X" means "version X of a source language". "Java X" means "version X of Java SE". "javac X" means "the javac that ships in version X of the JDK".

Character set

Happily, the Java platform has used the Unicode character set from day one. Unhappily, when javac for source language X is configured to accept an earlier source language Y, it uses the Unicode version specified for source language X rather than Y. For example, javac 1.4 -source 1.3 uses Unicode 3.0, since that was the Unicode specified for Java 1.4. It should use Unicode 2.1 as specified for Java 1.3.

Claim: A compiler configured to accept source language X should use the Unicode version specified for source language X.

It is difficult for javac to use multiple Unicode versions since the standard library (notably java.lang.Character) effectively controls the version of Unicode available, and only one version of the standard library is usually available. We will return to the issue of multiple standard libraries later.

Sidebar: You may be surprised to discover that some other languages don't use Unicode by default. A factoid from 2008's JVM Language Summit was the existence of a performance bottleneck in converting 8-bit ASCII strings (used by dynamic languages' libraries) to and from UTF-8 strings (used by canonical JVM libraries). Who knows what the 2009 JVM Language Summit will reveal?

Compilation strategy

A compilation strategy is the translation of source language constructs to idiomatic bytecode, flags, and attributes in a ClassFile. As the Java platform evolves by changing the source language and ClassFile features, a compilation strategy can evolve too. For example, javac 1.4 may compile an inner class one way when accepting the Java 1.3 source language and another way when accepting the Java 1.4 source language.

Claim: A compiler may use a different compilation strategy for each source language.

The javac flag '-target' selects the compilation strategy associated with a particular source language. This mainly has the effect of setting the version of the emitted ClassFile: 46.0 for Java 1.2, 47.0 for Java 1.3, 48.0 for Java 1.4, 49.0 for Java 1.5, 50.0 for Java 1.6. For example, javac 1.4 could compile an inner class the same way when configured with a Java 1.3 target versus a Java 1.4 target \*, but emit 47.0 and 48.0 ClassFiles respectively:

javac 1.4 -source 1.3 -target 1.3 -> 47.0

javac 1.4 -source 1.3 -target 1.4 -> 48.0

\* It doesn't, as per Neal's comment, but suppose for sake of argument it does.

However, ClassFile version should be orthogonal to compilation strategy. For example, javac 1.4 could conceivably compile an inner class to a 48.0 ClassFile in two ways, one when configured to accept the Java 1.3 source language and another when configured to accept the Java 1.4 source language:

javac 1.4 -source 1.3 -target 1.4 -> 48.0
javac 1.4 -source 1.4 -target 1.4 -> 48.0

You would have to inspect the ClassFiles carefully to see the difference, since their versions wouldn't - don't - reveal the compilation strategy. Of course, the ClassFile version "dominates" a compilation strategy, since a strategy can only use artifacts legal in a given ClassFile version, even though the concepts are different. Joe has written more about the history of -source and -target.

The combination missing above is:

javac 1.4 -source 1.4 -target 1.3 -> 47.0

or, given that the target could refer strictly to compilation strategy and not ClassFile version:

javac 1.4 -source 1.4 -target 1.3 -> 48.0

javac does not accept a target (or compilation strategy) lower than the source language it is configured to accept. Each new version of the source language is generally accompanied by a new ClassFile version that allows the ClassFile to give meaning to new bytecode instructions, flags, and attributes. Encoding new source language constructs in older ClassFile versions is likely to be difficult. How would javac encode annotations from the Java 1.5 source language without the Runtime[In]Visible[Parameter]Annotations attributes that appeared in the 49.0 ClassFile?

Claim: A compiler configured to accept source language X should not support a compilation strategy corresponding to a source language lower than X.

This policy can be rather restrictive. There were no changes \*\* between the Java 1.5 and 1.6 source languages, and only minor changes in the 49.0 and 50.0 ClassFiles that accompany those languages (really, platforms). Nevertheless, javac 1.6 does not accept -source 1.6 -target 1.5.

\*\* Except for a minor change in the definition of @Override to do what we meant, not what we said. Unfortunately, the definition changed in javac 1.6 but not in the JDK6 javadoc. Happily, javac 1.7 and the JDK7 javadoc are consistent.
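The policy and the version numbers listed earlier can be modeled in a few lines. This is only a sketch of the rule as described in this entry, not javac's actual option handling; the class and method names are hypothetical, and versions are passed as the n in "Java 1.n".

```java
final class SourceTargetSketch {

    /** ClassFile major version for Java 1.n, per the list above: 1.2 -> 46, ..., 1.6 -> 50. */
    static int classFileMajor(int n) {
        if (n < 2 || n > 6) {
            throw new IllegalArgumentException("only Java 1.2 to 1.6 are modeled here");
        }
        return 44 + n;
    }

    /** Returns the emitted ClassFile version, or rejects a target lower than the source. */
    static String emittedVersion(int source, int target) {
        if (target < source) {
            throw new IllegalArgumentException(
                "target 1." + target + " is lower than source 1." + source);
        }
        return classFileMajor(target) + ".0";
    }

    public static void main(String[] args) {
        System.out.println(emittedVersion(3, 3)); // -source 1.3 -target 1.3, prints 47.0
        System.out.println(emittedVersion(3, 4)); // -source 1.3 -target 1.4, prints 48.0
        try {
            emittedVersion(6, 5); // -source 1.6 -target 1.5
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```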

The famous example of the restriction is that javac 1.5 does not accept -source 1.5 -target 1.4, so source code using generics cannot be compiled for pre-Java 1.5 VMs even though the generics are erased. This is partly because the compilation strategy for class literals changed between Java 1.4 and 1.5, to use the upgraded ldc instruction in the 49.0 ClassFile rather than call Class.forName. If javac's compilation strategy were more configurable, it would be conceivable to produce a 48.0 ClassFile from generic source code. There is, however, another reason why -source 1.5 -target 1.4 is disallowed ... read on.

Environment

Prior to JDK7, if javac for source language X was configured to accept an earlier source language Y, it used the ClassFile definition associated with source language X. For example, if javac 1.5 -source 1.2 reads a 46.0 ClassFile, it treats the ClassFile as a 49.0 ClassFile. This is unfortunate because user-defined attributes in the 46.0 ClassFile could share the names of attributes defined in the 49.0 ClassFile spec, and interpreting them as authentic 49.0 attributes is unlikely to succeed.

Even if javac 1.5 -source 1.2 reads a 49.0 ClassFile, there is little point in reading 49.0-defined attributes since they had no semantics in the Java 1.2 platform. This holds for non-attribute artifacts such as bridge methods too; if physically present in a 49.0 ClassFile, they should be logically invisible from a Java 1.2 point of view. In summary:

javac 1.5 -source 1.2 reading a Java 1.5 ClassFile -> should interpret as Java 1.2
javac 1.5 -source 1.5 reading a Java 1.2 ClassFile -> should interpret as Java 1.2

Claim: A compiler configured to accept source language X should interpret a ClassFile read during compilation as if the ClassFile's version is the smaller of a) the ClassFile version associated with source language X, and b) the actual ClassFile version.
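The claim amounts to taking a minimum. A small sketch, assuming the ClassFile major versions listed earlier (46 for Java 1.2 through 50 for Java 1.6) and using hypothetical names throughout:

```java
final class EffectiveClassFileVersionSketch {

    /** ClassFile major version associated with source language 1.n (assumes 2 <= n <= 6). */
    static int majorForSource(int n) {
        return 44 + n;
    }

    /** The version a compiler should assume when reading a ClassFile during compilation. */
    static int effectiveMajor(int configuredSource, int actualMajor) {
        return Math.min(majorForSource(configuredSource), actualMajor);
    }

    public static void main(String[] args) {
        // javac 1.5 -source 1.2 reading a 49.0 ClassFile: interpret as 46.0 (Java 1.2).
        System.out.println(effectiveMajor(2, 49)); // prints 46
        // javac 1.5 -source 1.5 reading a 46.0 ClassFile: interpret as 46.0 (Java 1.2).
        System.out.println(effectiveMajor(5, 46)); // prints 46
    }
}
```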

In JDK7, javac behaves as per the claim. First, it interprets a ClassFile according to the ClassFile's actual version, regardless of the configured source language. For example, a 46.0 ClassFile is interpreted as it would have been in Java 1.2, ignoring attributes corresponding to a newer source language. Second, when the configured source language is older than a ClassFile, javac ignores ClassFile features newer than the source language it is configured to accept.

An important part of a compiler's environment is the standard library it is configured to use. The standard library used by javac can be configured by setting the bootclasspath. In future, a module system shipped with the JDK will allow a dependency on a particular standard library to be expressed directly.

Note that running against standard library X is deeply different from compiling against standard library X. Consider the Unicode issue raised earlier: javac implicitly uses the java.lang.Character from the standard library against which it runs, but should use the class in the standard library for the configured source language. For example, javac 1.6 -source 1.2 should use the Unicode in effect for Java 1.2, not Java 1.6. In this case, suitable versioning can only be achieved at the application level, by javac either reflecting over the appropriate java.lang.Character class or using overloaded java.lang.Character.isJavaIdentifierStart/Part methods that each take a version parameter.
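To see why the running library decides the answer, consider a simplified identifier check of the kind a lexer might perform. This is a sketch, not javac's code: the class name is hypothetical, and it ignores keywords. It uses the existing one-argument Character methods, so its result for any given character depends on the Unicode tables of the JDK it runs on; the version-taking overloads mentioned above are a proposal and do not exist.

```java
final class IdentifierCheckSketch {

    /** True if s is lexically shaped like a Java identifier (keywords are not excluded). */
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so that supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("x1")); // prints true
        System.out.println(isIdentifier("1x")); // prints false
    }
}
```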

Things also get tricky when compiling an older source language to a newer target ClassFile version (and hence a later JVM with a newer standard library). For example, should javac 1.6 -source 1.2 -target 1.5 compile against the Java 1.2 or 1.5 standard library? Both answers have merit, which suggests further concepts are needed to disambiguate.

Using the right libraries matters at runtime too. The introduction of a source language feature in Java 1.5 - enums - added constraints on the standard library against which ClassFiles produced from the Java 1.5 source language can run. The java.lang.Enum class must be present, and you can read the code of ObjectInputStream and ObjectOutputStream to see for yourself the mechanism for serializing enum constants. The simple way to guarantee that a suitable standard library is available for enum-using code at runtime is to ensure that only 49.0 ClassFiles are produced from the Java 1.5 source language. Such ClassFiles will not run on a Java 1.4 VM since it only accepts <=48.0 ClassFiles.

In a nutshell, the compilation strategy for enums is erasure++: an enum type compiles to an ordinary ClassFile with ordinary static members for the enum constants and ordinary static methods to list and compare constants. With a few changes in that strategy (to not extend java.lang.Enum) and a serious amount of magic in the Java 1.5 VM (to track reflection and serialization of objects of enum type), the ClassFiles emitted by a compiler for the Java 1.5 source language could run safely enough on a Java 1.4 VM. But the drawbacks to such hackery are enormous, so erasure++ it was.
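For a feel of what "ordinary static members" and "ordinary static methods" means, here is a hand-written approximation of an enum with three constants. It is a sketch only: the name HandWrittenColor is hypothetical, and because source code cannot explicitly extend java.lang.Enum, this class does not, which is exactly the variation discussed above. It therefore gets none of the VM and library support for reflection and serialization of real enum constants.

```java
final class HandWrittenColor {

    // Ordinary static members standing in for the enum constants.
    static final HandWrittenColor RED = new HandWrittenColor("RED", 0);
    static final HandWrittenColor GREEN = new HandWrittenColor("GREEN", 1);
    static final HandWrittenColor BLUE = new HandWrittenColor("BLUE", 2);

    private static final HandWrittenColor[] VALUES = {RED, GREEN, BLUE};

    private final String name;
    private final int ordinal;

    private HandWrittenColor(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    String name() {
        return name;
    }

    int ordinal() {
        return ordinal;
    }

    /** Ordinary static method to list the constants; returns a copy so callers cannot mutate it. */
    static HandWrittenColor[] values() {
        return VALUES.clone();
    }

    /** Ordinary static method to look up a constant by name. */
    static HandWrittenColor valueOf(String name) {
        for (HandWrittenColor c : VALUES) {
            if (c.name.equals(name)) {
                return c;
            }
        }
        throw new IllegalArgumentException("No constant named " + name);
    }

    /** Compares constants by declaration order. */
    int compareTo(HandWrittenColor other) {
        return Integer.compare(ordinal, other.ordinal);
    }
}
```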

Thus, the reason one new language feature implemented by erasure - generics - cannot run on earlier JVMs is that another new language feature - enums - is implemented by erasure. Such is life at the foundation of the Java platform.

Thanks to "Mr javac" Jon Gibbons for feedback on this entry.

Tuesday May 12, 2009

Draft of the Java VM Specification, Third Edition

The Second Edition of the Java Virtual Machine Specification was published in 1999 and describes the Java SE platform circa JDK 1.2. Since then, numerous JSRs have updated the content, notably JSR 14 (generics) in Java SE 5.0 and JSR 202 (typechecking verification) in Java SE 6. Some of these updates are on the maintenance page for the Second Edition. However, no single document has been available that incorporated all these updates plus the smaller corrections and improvements that are made from time to time.

Certain JCP procedures are required to produce an official Third Edition of the Java Virtual Machine Specification. In the meantime, I am making available a draft of the Third Edition (ZIP, 1.9MB) to let the Java community observe the changing structure of the specification. There is an ongoing effort to identify and remove a) references to the Java Language Specification and b) assumptions about the compilation process that produced a ClassFile.

To emphasize the informal nature of the draft, I am not providing a change log or anything else that could be construed as starting a formal review. Nor are potential updates from JSR 292 and JSR 294 included; the draft pertains solely to Java SE 6 as defined by JSR 270 in 2006.

Saturday Feb 07, 2009

FOSDEM 2009

Back to Belgium for my first time at FOSDEM, where I presented on progress Towards a Universal VM in the Free Java track. (An updated form of my Devoxx talk with Brian Goetz.) It was good to meet Andrew Haley at last, chat with Martin Odersky about Scala and modularity, and of course catch up with Dalibor.

Monday Dec 22, 2008

Devoxx 2008

Devoxx is always a pleasure to attend for the energy and enthusiasm that 3000+ Java developers bring from all over Europe. I made two presentations this year: one on modularity in Java that dovetailed with Mark Reinhold's keynote, and one with Brian Goetz on the JVM's progress towards being a "universal" VM for all programming languages. Thanks to the many hundreds who attended and gave warm feedback.

Bear in mind that these presentations are targeted at a broad developer base, and that we cannot address every last detail of a topic in 45-50 minutes. Just because something is missing from the slides doesn't mean we don't care about it, or that it's not important, or that it didn't come up in Q&A or a BOF.

In other news, Stephen Colebourne continued his fine tradition of taking the pulse of the community regarding Java language changes. By asking people to rank features globally, we gained crucial information over a local yes/no vote on each feature. With yes/no voting, imagine if 86 people vote yes for properties and 53 vote no, while 51 people vote yes for multi-line strings and 9 vote no. All you really learn is that the properties "community" is more divided than the multi-line strings "community". This saps authority from the larger number of yes votes for properties. Plus, yes/no voting allows different communities to talk past each other forever, ignoring the fact that Java language designers can consider the needs of only one community: everybody. So while there is debate about which features to include in the rankings, overall I was very pleased with how informative and decisive the community's rankings were.

Friday Sep 05, 2008

Named parameters

In a method call, it can be convenient to label the actual parameters according to the method's formal parameter names. For example, the method void m(int x, int y) {} could be called as m(x:4, y:5). This is especially worthwhile in two cases:


  • When a method has adjacent parameters with the same type but different semantics. For example, Math.pow(double,double) might be easier to use if declared as double pow(double raise, double toThePower) {} and called as pow(raise: 4, toThePower: 5). An entrypoint to a banking application void login(boolean showBalance, boolean showOffers, boolean trackActivity) could be called as login(showBalance: getUserPrefs(), showOffers: true, trackActivity: false).

  • When a method has many parameters, regardless of their type. Too many parameters indicates misdesign - one of the Epigrams of Programming is "If you have a procedure with 10 parameters, you probably missed some" - but it would never do to impose a hard limit, so inline documentation from named parameters is the best option.

Named parameters raise some interesting questions:


  • If both actual and formal parameters are named, can the actual parameters be reordered w.r.t. the formal parameters?
  • Must all actual parameters be named, or only some?

If parameter order is fixed, then names could potentially be omitted. Consider the method:

void m(int x, int y, int z) {}

It's not difficult to match actual parameters to formal parameters, even without names:

m(x:1, y:2, 3) or
m(1, 2, z:3)

But if parameter order is variable, then omitting names is a disaster in the making. The call:

m(z:3, 1, 2)

makes you work to realize that x binds to 1 and y to 2. In the worst case, you destroy the convenience of reordering because you must match all actual parameters to formal parameters to understand the bindings, as in:

m(y:2, 1, x:3)

So, allowing reordering is the crucial question. Some will say that the whole point of named parameters is to aid readability of the caller in the context of an obtuse or verbose callee, and that reordering can improve readability. Others will disagree. Personally, I think the dissonance is greater when some parameters are named and some are not, than when all parameters are named but given out of order. Therefore, I would like to allow reordering and disallow omission.

Reordering actually has a profitable interaction with variable-arity methods. Consider:

void m(int x, int... y) {}

It would be nice to call it as:

m(x:1, y:2, y:3) or
m(x:1, y:{2,3})

Allowing reordering would allow the vararg parameter to come first:

m(y:2, y:3, x:1) or
m(y:{2,3}, x:1)

or even allow the vararg parameter to be distributed:

m(y:2, x:1, y:3)

This undoes the tradition of variable-arity parameters coming last, but then, they're only last because in any other position and without names, you can't differentiate a variable-arity actual parameter from a fixed-arity actual parameter. Named parameters with reordering make the last-position requirement unnecessary, and also the requirement for only one vararg per method. It would be quite reasonable to declare:

void m(int... x, int... y, int z) {}

and call:

m(x:1, y:100, x:2, y:200, x:3, y:300, z:1000)

if there is a natural association of x values with y values.

Clearly, reordering is a powerful concept. That usually means complexity. Let's look at how a method call works, and how named parameters with or without reordering would affect it.

Into the heart of darkness: Overload resolution

Method resolution is the process of matching the method name and set of actual parameters supplied in a call to a method declaration in the receiver's class. If successful, the method call is resolved. Because the Java language and virtual machine support method overloading, a Java compiler uses the actual parameter types of a method call to select a single method declaration (obviously with the right name) with the "best" matching formal parameter types. This process is called overload resolution. It is rather complex because overloading is tricky in principle and because methods can be generic, of variable arity, and have formal parameters which require boxing/unboxing conversion of the actual parameters. On the bright side, the complexity is only at compile-time, because at run-time, the JVM's invokevirtual instruction simply calls a method with the exact name and formal parameter types chosen by the compiler. The important point is that the language and VM both use formal parameter types to resolve a method call. Static method resolution (i.e. overload resolution) and dynamic method resolution (i.e. invokevirtual's lookup) are aligned.
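The static half of this can be seen in a small example of today's overload resolution, with no named parameters involved. The class name is hypothetical. The same String object is passed in both calls, yet the compiler picks a different method each time because it looks only at the static type of the actual parameter.

```java
final class OverloadResolutionSketch {

    static String m(Object o) {
        return "m(Object)";
    }

    static String m(String s) {
        return "m(String)";
    }

    /** The actual parameter has static type Object, so m(Object) is chosen at compile time. */
    static String callWithStaticTypeObject() {
        Object o = "hi";
        return m(o);
    }

    /** The actual parameter has static type String, so the more specific m(String) is chosen. */
    static String callWithStaticTypeString() {
        return m("hi");
    }

    public static void main(String[] args) {
        System.out.println(callWithStaticTypeObject()); // prints m(Object)
        System.out.println(callWithStaticTypeString()); // prints m(String)
    }
}
```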

To involve parameter names as well as parameter types, there are two fundamental approaches. One is to use names as a simple static sanity check and leave static and dynamic resolution untouched. The other is to aggressively thread names through static and dynamic resolution. Let's compare the approaches.

Conservative

The conservative approach is to leave overload resolution unchanged, and add a final step to check that the name of each actual parameter matches the name of the corresponding formal parameter in the resolved method. Formal parameter names are not stored in classfiles today, but there is an RFE to make them available at runtime, and the first step would be to reify them in the classfile. Let's assume that's been done, and that a Java compiler can see the formal parameter names of any method declaration. Of course, if the resolved method is in a legacy classfile without formal parameter names, then the name-matching step must be skipped, but that's OK because it was only a sanity check anyway. The logic of this approach is simple, at the cost of not allowing reordering of actual parameters.
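The final name-matching step could look roughly like the sketch below. It leans on an API that postdates this entry: Java SE 8 added java.lang.reflect.Parameter, whose isNamePresent method reports whether the classfile recorded formal parameter names (javac records them only when run with -parameters). A real compiler would read the classfile directly rather than use reflection, so treat the reflective lookup, the class name, and the namesMatch helper as assumptions made for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

final class NameCheckSketch {

    /** The method a call such as m(x:4, y:5) would resolve to. */
    static void m(int x, int y) {}

    /** Looks up m(int,int), standing in for the result of ordinary overload resolution. */
    static Method resolved() {
        try {
            return NameCheckSketch.class.getDeclaredMethod("m", int.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sanity check: do the caller's names match the formal names, position by position? */
    static boolean namesMatch(Method resolved, String... actualNames) {
        Parameter[] formals = resolved.getParameters();
        if (formals.length != actualNames.length) {
            return false;
        }
        for (int i = 0; i < formals.length; i++) {
            if (!formals[i].isNamePresent()) {
                return true; // legacy classfile without names: skip the check
            }
            if (!formals[i].getName().equals(actualNames[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Method m = resolved();
        System.out.println("names recorded: " + m.getParameters()[0].isNamePresent());
        System.out.println("m(x:.., y:..) passes: " + namesMatch(m, "x", "y"));
        System.out.println("m(y:.., x:..) passes: " + namesMatch(m, "y", "x"));
    }
}
```

Whether the reordered call passes depends on how the class was compiled: with names recorded it is rejected, and without them the check is skipped.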

Aggressive

The aggressive approach is to alter overload resolution when it identifies the set of potentially applicable methods. The set would consist of precisely those methods whose formal parameter names match those at the caller, up to ordering and varargs (see below). There would never be a compile-time error about a call using an actual parameter name which is not a formal parameter name of the resolved method, because the set of potentially applicable methods is correct by construction. Which potentially applicable method is resolved is up to the actual and formal parameter types as usual.

This approach is distinctly unfriendly to migration compatibility, because compiling against a legacy classfile, without formal parameter names, will mean there are no potentially applicable methods for the call. Dropping back to traditional overload resolution based on classfile version is ugly, and having to ignore actual parameter names when they were intended to play a central role in resolution is repugnant.

Perhaps being aggressive is worthwhile if it allows richer overloadings than today, based on names as well as types? Currently, overloadings "erase" formal parameter names and are legal up to formal parameter types:

void m(Object x, String y) {} // m(Object,String)
void m(String x, Object y) {} // m(String,Object)

The following is illegal because both signatures "erase" to m(Object,String) (they are override-equivalent in JLS terms) so a call which does not use named parameters cannot differentiate between them:

void m(Object x, String y) {} // m(Object,String)
void m(Object y, String x) {} // m(Object,String)

You might say this is a shame, and that override-equivalence should take formal parameter names into account, because a call using named parameters can differentiate:

m(x:new Object(), y:"hi") // resolves to m(Object x, String y)
m(x:"hi", y:new Object()) // resolves to m(Object y, String x)

However, migration compatibility dictates that we can't assume all calls will use named parameters, and that it is unreasonable for aggressive overloadings which assume such callers to break other callers. Therefore, the overloading must stay illegal and we may as well keep static method resolution based on types. This is just as well, because some legal type-based overloadings cause problems for the aggressive name-based approach. Consider these methods:

void m(Object x, String y) {} // m(Object,String)
void m(String y, Object x) {} // m(String,Object)

and this call:

m(x:new Object(), y:"hi")

Its set of potentially applicable methods contains duplicates:

void m(Object x, String y) {} // m(Object,String)
void m(Object x, String y) {} // m(String,Object) after shuffling formal parameters to match the actual parameter ordering

The call is thus ambiguous. It makes no sense to aggressively change overload resolution to use names when doing so inherently rules out the use of names to call some legacy methods.
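
The duplication can be reproduced with a small model. This is only a sketch under stated assumptions: the class NameBasedResolution and its helper methods are hypothetical, a method is modelled as an ordered map from formal parameter name to type, and "potentially applicable" is taken to mean "same set of names as the call".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the aggressive, name-based approach.
class NameBasedResolution {
    // Builds an ordered name -> type map from alternating name/type arguments.
    static Map<String, Class<?>> formals(Object... pairs) {
        Map<String, Class<?>> result = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i += 2) {
            result.put((String) pairs[i], (Class<?>) pairs[i + 1]);
        }
        return result;
    }

    // Returns the signature with formals reordered to follow the caller's names,
    // or null if the names differ (the method is not potentially applicable).
    static String shuffle(Map<String, Class<?>> formals, List<String> actualNames) {
        if (formals.size() != actualNames.size()
                || !new HashSet<>(actualNames).equals(formals.keySet())) {
            return null;
        }
        StringBuilder sb = new StringBuilder("m(");
        for (String name : actualNames) {
            if (sb.length() > 2) sb.append(", ");
            sb.append(formals.get(name).getSimpleName()).append(' ').append(name);
        }
        return sb.append(')').toString();
    }

    // Models the call m(x:new Object(), y:"hi") against the two legal overloads.
    static List<String> demo() {
        List<Map<String, Class<?>>> methods = new ArrayList<>();
        methods.add(formals("x", Object.class, "y", String.class)); // m(Object x, String y)
        methods.add(formals("y", String.class, "x", Object.class)); // m(String y, Object x)
        List<String> applicable = new ArrayList<>();
        for (Map<String, Class<?>> f : methods) {
            String s = shuffle(f, Arrays.asList("x", "y"));
            if (s != null) applicable.add(s);
        }
        return applicable;
    }

    public static void main(String[] args) {
        // Both entries print identically, which is the ambiguity described above.
        System.out.println(demo());
    }
}
```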

Since we say that no two methods can have formal parameters with the same types and different names, dynamic method resolution need not change. This is convenient because allowing name-and-type-based overloadings would mean significant VM changes. Today, a classfile cannot store duplicate methods (same name and formal parameter types) and the invokevirtual instruction could not differentiate between them in any case. A compiler would need to use invokedynamic to accurately resolve such methods.
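
To see why the classfile cannot tell such methods apart, it helps to look at a method descriptor, which encodes only the parameter and return types. The sketch below uses java.lang.invoke.MethodType to print the descriptor that both m(Object x, String y) and m(Object y, String x) would share; the class name DescriptorDemo is an assumption for illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Descriptor for a void method taking (Object, String); formal names do not appear in it.
    static String descriptor() {
        return MethodType.methodType(void.class, Object.class, String.class)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor()); // (Ljava/lang/Object;Ljava/lang/String;)V
    }
}
```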

As a matter of interest, is it possible to use names and types at static method resolution and then erase the names (akin to generics), so types alone are used at dynamic resolution? Not if you want to stay sane. Consider these methods:

void m(Number x) {}
void m(Integer y) {}

and the call:

m(x:5)

which statically resolves to m(Number). Changing the name of its formal parameter:

void m(Number z) {} // z not x
void m(Integer y) {}

means the types still match at runtime (i.e. it's a binary-compatible change) but recompiling the call would give a compile-time error because no methods are potentially applicable. This discrepancy is typical of features implemented by erasure. And not recompiling the call means invokevirtual now targets a less-specific method in terms of type (m(Number) rather than m(Integer)) for a reason (compile-time belief in the presence of an x formal parameter) which no longer holds. More discrepancy.

Finally, note the oddity that m(x:5) resolves to m(Number) while m(5) resolves to m(Integer). It seems that the aggressive approach is dead.
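
The type-based half of that oddity can be checked in today's Java. In this sketch the class name and the String return values are assumptions added so the chosen overload is visible; the named call m(x:5) is of course not expressible.

```java
class SpecificityDemo {
    static String m(Number x) { return "m(Number)"; }
    static String m(Integer y) { return "m(Integer)"; }

    public static void main(String[] args) {
        // Both overloads are applicable once boxing is allowed; m(Integer) is more specific.
        System.out.println(m(5));
        // With a static type of Number, only m(Number) is applicable.
        Number n = 5;
        System.out.println(m(n));
    }
}
```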

In conclusion, named parameters are possible in Java, but reordering - a considerable benefit - is incompatible with a practical design that preserves high levels of compatibility and usability. Nevertheless, named parameters increase readability where it's needed most, and would be an interesting addition to the language.

Thanks to Jon Gibbons, Maurizio Cimadamore, and Keith McGuigan for feedback and assistance.

Monday Jul 28, 2008

A wrinkle with 'module'

We hoped very much that the 'module' restricted keyword could be disambiguated everywhere in the language with only a fixed syntactic lookahead. That is, a compiler could treat 'module' as an identifier everywhere except in certain productions, for which a simple algorithm would use the immediate context to determine if 'module' was a modifier or an identifier. Even edge cases seemed to support this hope:

module class C { ...
module module module;
module module module() { ...

However, consider this code:

class foo {
module foo() { ...

Is foo() a method with a return type of 'module', or a module-private constructor? The former is legal today, and though it's very bad practice to have a method take the name of the class, it must remain legal in JDK7. So how can we disambiguate it from a module-private constructor?

One option is to do semantic analysis of the subsequent method body. javac currently perceives any 'return' statement with an expression in the body of a constructor, regardless of control flow, as an error, so finding one would be enough to claim the declaration is not a constructor. But this level of analysis is completely inappropriate in a parser.

Another option, which we prefer, is to recognize that method name == class name is bad practice and that it's defensible to parse the term:

  'module' <identifier> '('
as a module-private constructor if the identifier is equivalent to the class name. If you're currently using the term to declare a method which returns a 'module' object, the compiler will complain about the method's 'return' statement(s) having an expression - but fear not, you will be able to put the 'package' modifier on your dubious declaration to make the compiler realize that 'module' is a return type not a modifier. (Actually, any accessibility modifier will do.) Admittedly, this means that the 'module' restricted keyword is not 100% backward-compatible, but it's pretty close. We've thought for years about introducing 'package' as an explicit modifier, to increase consistency and to allow package-private interface members. It finally looks like we have a compelling reason to do it.
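
The preferred rule amounts to a three-token lookahead. The following is a rough sketch, not javac's parser: the class ModuleLookahead, the Kind enum and the pre-split token list are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical classifier for a member declaration inside a class named className.
class ModuleLookahead {
    enum Kind { MODULE_PRIVATE_CONSTRUCTOR, METHOD_RETURNING_MODULE, OTHER }

    static final List<String> ACCESS = Arrays.asList("public", "protected", "private", "package");

    static boolean isIdentifier(String token) {
        return !token.isEmpty() && Character.isJavaIdentifierStart(token.charAt(0));
    }

    // True if tokens[from..from+2] has the shape: 'module' <identifier> '('
    static boolean isModuleIdentParen(List<String> tokens, int from) {
        return tokens.size() >= from + 3
                && tokens.get(from).equals("module")
                && isIdentifier(tokens.get(from + 1))
                && tokens.get(from + 2).equals("(");
    }

    static Kind classify(String className, List<String> tokens) {
        // An explicit accessibility modifier means 'module' can only be a return type.
        if (!tokens.isEmpty() && ACCESS.contains(tokens.get(0)) && isModuleIdentParen(tokens, 1)) {
            return Kind.METHOD_RETURNING_MODULE;
        }
        if (isModuleIdentParen(tokens, 0)) {
            // Identifier equal to the class name: treat 'module' as a modifier.
            return tokens.get(1).equals(className)
                    ? Kind.MODULE_PRIVATE_CONSTRUCTOR
                    : Kind.METHOD_RETURNING_MODULE;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo", Arrays.asList("module", "foo", "(")));
        System.out.println(classify("foo", Arrays.asList("module", "bar", "(")));
        System.out.println(classify("foo", Arrays.asList("package", "module", "foo", "(")));
    }
}
```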

If constructors were more strongly called out in the language, then no ambiguity would occur for 'module'. A 'constructor' modifier would suffice, or mandating a reserved name like 'init', or just defining any method-like declaration as a constructor if it has the name of the class.

Compared to class types, enum types are simple. Their constructors cannot be 'public' or 'protected' because creation of enum objects is heavily controlled. Making module-private constructors illegal is a no-brainer. Therefore, in a poorly named method (shares the enum's name), a 'module' identifier is automatically a return type. Interface types are also simple; with no constructors to worry about, the simple syntactic lookahead rules disambiguate 'module'-as-modifier from 'module'-as-identifier in any method, poorly named or not.

Edit: Clarified that a 'return' in an existing 'module <identifier> (' method would have an expression, which is illegal in a constructor.

Wednesday Jun 25, 2008

Bootstrapping modules into Java

We plan to modularize the source of the Java compiler in Java SE 7, i.e. group its packages into modules. As with any large piece of software, modularization brings more precise dependencies, a clearer API, and easier reuse.

A module-aware compiler will be needed to compile the modularized SE 7 compiler source. The compiler in SE 6 is not module-aware. This is a problem because Sun has a policy that the Java compiler source for SE n must be compilable by the compiler in SE n-1. (And that the resulting SE n compiler can execute on SE n-1.)

We can solve the problem via a two-step process:

1) Bootstrap. We will use the SE 6 compiler to compile the SE 7 compiler source, hiding the package-info.java files which associate packages in the SE 7 compiler source with modules. The result of the SE 6 compiler run will be a "bootstrap" SE 7 compiler which is module-aware (knows how to compile modularized code) but is not itself modularized.

2) Modularize. We run the bootstrap SE 7 compiler on SE 6 to compile the SE 7 compiler source again. Since the bootstrap compiler is module-aware, the package-info.java files can be visible. The result of the SE 7 compiler run will be a "real" SE 7 compiler which is module-aware and is itself modularized.

Now that we have a modularized SE 7 compiler, we need a module-aware JVM to run it on. The SE 7 JVM is written in C++, so it can be made module-aware independently of these compiler shenanigans.

What about core libraries? A modularized compiler should ship with modularized libraries. Compiling a modularized library requires a module-aware compiler, which happily we have after step 1. So in step 2, we can run the bootstrap compiler on SE 6 to compile not only the modularized SE 7 compiler but also the modularized SE 7 library source.

Ultimately, we have a fully modularized SE 7 reference implementation, containing a module-aware JVM, a module-aware and modularized compiler, and a module-aware and modularized set of libraries.

Wednesday Jun 04, 2008

Consistent module membership declarations

You can use the 'module' keyword at the start of a compilation unit to declare which module the unit's types belong to. This keeps important information about program organization close to the code. We require every compilation unit in a package to declare the same module membership, since no-one likes split packages. (I will not discuss package-info module declaration here.) What should a compiler do if it comes across inconsistent compilation units, or even inconsistent classfiles?

Consider these two compilation units, and suppose that R is compiled first:

P/Q.java:
module M;
package P;
... new R(); ...

P/R.java:
module N;
package P;
public class R { ... }

Since R is public, it's not material to Q that R declares a different module than Q declares. We could let the inconsistency slide when compiling Q, only raising an error if Q tries to access a module-private member of R or some module-private type in P. But this would let a package become really split. There will be lots of public types joining modules and staying 'public', so the problem will be common.

Our plan is to require a compiler to give an error if any reference is made, from the current compilation unit, to a type claiming to be in the same package but different module. This catches potential split packages early. It won't matter whether the referenced type is in a compilation unit or a classfile, nor whether the referenced type or its referenced member is public. The error is logically the fault of the referenced type (P.R) though it should be reported in the context of compiling the referring type (P.Q).

Inspired by JLS 7.6, the rule is:

- The host system must enforce the restriction that it is a compile-time error if an observable compilation unit C belongs to a module which is not consistent with the module of any other compilation unit D in the same package as C to which code in C refers (directly or indirectly).

- The host system may choose to enforce the restriction that it is a compile-time error if an observable compilation unit belongs to a module which is not consistent with the module of any other observable compilation unit in the same package.
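
The mandatory rule can be sketched as a simple check over the units a compilation unit refers to. Everything here is an assumption for illustration: the Unit class, its fields and the error text are hypothetical, and only direct references are followed, whereas the rule also covers indirect ones.

```java
import java.util.ArrayList;
import java.util.List;

class ModuleConsistency {
    // Hypothetical model of a compilation unit (or classfile) and the units it refers to.
    static final class Unit {
        final String name;
        final String moduleName;
        final String pkg;
        final List<Unit> refs = new ArrayList<>();

        Unit(String name, String moduleName, String pkg) {
            this.name = name;
            this.moduleName = moduleName;
            this.pkg = pkg;
        }
    }

    // Reports an error for each referenced unit in the same package but a different module.
    static List<String> check(Unit c) {
        List<String> errors = new ArrayList<>();
        for (Unit d : c.refs) {
            if (d.pkg.equals(c.pkg) && !d.moduleName.equals(c.moduleName)) {
                errors.add("while compiling " + c.pkg + "." + c.name + ": " + d.pkg + "." + d.name
                        + " is in module " + d.moduleName + ", expected " + c.moduleName);
            }
        }
        return errors;
    }

    // Models P/Q.java (module M) referring to P/R.java (module N).
    static List<String> demo() {
        Unit q = new Unit("Q", "M", "P");
        Unit r = new Unit("R", "N", "P");
        q.refs.add(r);
        return check(q);
    }

    public static void main(String[] args) {
        for (String error : demo()) {
            System.out.println(error);
        }
    }
}
```

Note that the check never consults whether R or its members are public, and the error is reported against Q even though R is logically at fault.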

About

Alex Buckley is the Specification Lead for the Java language and JVM at Oracle.
