Thursday Sep 10, 2009

Unhashing a String

My working on the strings in switch implementation has gotten swapped back in. The implementation is following the translation strategy outlined in the strings in switch proposal: find a perfect hash function for the input strings and semantically replace case "Foo": with case hash("Foo"): (along with some additional checks and logic) where hash("Foo") is computed as a constant by the compiler.

With this approach, to write the regression tests it is helpful to be able to construct a string with a given hash value to force collisions and test the alternate logic, which led me to write the "unhash" method below to create a string with a given hash code:

  \* Returns a string with a hash equal to the argument.
  \* @return string with a hash equal to the argument.
 public static String unhash(int target) {
    StringBuilder answer = new StringBuilder();
    if (target < 0) {
        // String with hash of Integer.MIN_VALUE, 0x80000000

        if (target == Integer.MIN_VALUE)
            return answer.toString();
        // Find target without sign bit set
        target = target & Integer.MAX_VALUE;
    unhash0(answer, target);
    return answer.toString();

private static void unhash0(StringBuilder partial, int target) {
    int div = target / 31;
    int rem = target % 31;

    if (div <= Character.MAX_VALUE) {
        if (div != 0)
    } else {
        unhash0(partial, div);

The algorithm for hashing strings multiplies the old hash value by 31 and adds in the integer value of the next character:

h = 0;
for (int i = 0; i < len; i++) {
    h = 31\*h + val[off++];

The unhash method works in reverse; for the non-negative values handled by unhash0, divide the target value by 31:

  • if the quotient is less than Character.MAX_VALUE, the quotient and remainder can be fully captured using at most two characters.

  • if the quotient is greater than or equal to Character.MAX_VALUE, unhash the quotient and append to that string a character for the remainder.

With some additional care, negative values can reuse the same process. The key observation is that if a string hashes to Integer.MIN_VALUE (0x80000000), subsequent multiples by 31 (0x1f) do not change the result since
0x1f × 0x80000000 = 0xf80000000
which is again 0x80000000 when limited to int range. Therefore, the sign bit can be set and then the remaining bits handled as if the target were positive.

Using a few minutes of computer time, I tested the unhash method on all integer values and it always returned a correct string. When available, exhaustive testing is pleasantly simple and reassuring! The generated strings are relatively short; as shown in the table below, for non-negative values the average length is slightly over four characters.

Distribution of String Lengths of Unhashed Non-negative Values
Length Frequency
1 31
2 2,031,585
3 60,948,480
4 1,889,402,880
5 195,100,672
Total 2,147,483,648

The unhash of 0 could be special-cased to return the empty string of length zero rather than a length-one string of the \\u0000 character, but this was not necessary for the purposes at hand. Likewise, generating somewhat shorter strings for negative hash values may be possible, but further investigation is not needed just to be able to generate collisions. As is stands, unhash will return a string whose length is at most ten; five characters for a negative sign bit and at most another five characters for the non-sign bits.

Friday Sep 04, 2009

Project Coin: Solidarity with C++ Evolution

Recently I read with interest Bjarne Stroustrup's HOPL III paper Evolving a language in and for the real world: C++ 1991-2006. Despite the numerous technical differences between Java and C++, I was struck by some of the similarities in community involvement and expectations in the evolution of both languages. Selected excerpts from the paper are in block quotes below.

In particular, this very open process [in the C++ committee] is vulnerable to disruption by individuals whose technical or personal level of maturity doesn’t encourage them to understand or respect the views of others. Part of the consideration of a proposal is a process of education of the committee members. Some members have claimed — only partly in jest — that they attend to get that education.

The Project Coin mailing list is a world-readable and world-writable list. While this approach does let anyone join in, the traffic can be very high and at times the signal to noise ratio was quite low. In the future, I'll be inclined to impose temporary moderation on the list to quell unproductive email storms.

The answer to “Why didn’t we provide a much more useful library?” is simpler: We didn’t have the resources (time and people) to do significantly more than we did.

The most common reaction to these extensions among developers is “that was about time; why did it take you so long?” and “I want much more right now”. That’s understandable (I too want much more right now — I just know that I can’t get it), but such statements reflect a lack of understanding what an ISO committee is and can do.

As ever, there are far more proposals than the committee could handle or the language could absorb. As ever, even accepting all the good proposals is infeasible. As ever, there seems to be as many people claiming that the committee is spoiling the language by gratuitous complicated features as there are people who complain that the committee is killing the language by refusing to accept essential features. If you take away consistent overstatement of arguments, both sides have a fair degree of reason behind them. The balancing act facing the committee is distinctly nontrivial.

Viewed over the long term, one goal to evolving a platform is trying maximize value delivered over time. This is analogous to a net present value-style consideration from economics. A feature delivered in the future is less valuable than having the feature today, but the value of choosing to do a feature needs to be weighed against the opportunity costs of doing something else instead. Developers are chronically optimistic and eager to deliver something sooner rather than later, especially when the next release vehicle may be in the relatively distant future. As previously indicated, I too would prefer to see additional language changes as part of Project Coin in JDK 7. However, given the available resources, overcommitting to a large set of features is not responsible; either the large set won't get done in the end, it won't get done well, or the schedule would slip — all of which lead to reduced value too.

Much of the best standards work is invisible to the average programmer and appears quite esoteric and often boring when presented. The reason is that a lot of effort is expended in finding ways of expressing clearly and completely “what everyone already knows, but just happens not to be spelled out in the manual” and in resolving obscure issues that—at least in theory—don’t affect most programmers. The maintenance is mostly such “boring and esoteric” issues.

Some attendees of my JavaOne talk this year were not happy with the length of time spent relating complications with adding enum types in JDK 5. However, I included such a large section on the apparent simplicity of enums still leading to many surprising complexities to help convey the disproportionate efforts that adding even modest features to the language can take.

I fully expect to be surprised in the future with novel interactions and issues as experience is gained with the Project Coin features and prudent planning anticipates the need to deal with such surprises.

Thursday Sep 03, 2009

Java Posse #277 Feedback: Not a view from an ivory tower

The entry below is a slightly edited copy of a message I used to start a new thread on the Java Posse's Google Group, largely in response to comments make by Dick Wall in the first twenty minutes of episode #277 of the Java Posse podcast.

After listening to episode 277, I'm led to conclude I'm thought of by some as one of the "ivory tower guys" who "just says no" to ideas about changing the Java programming language.

I have a rather different perspective.

In November 2006, Sun published javac and related code under the familiar GPLv2 with Classpath exception. [1]

Shortly thereafter in January 2007, no less a Java luminary than James Gosling endorsed the Kitchen Sink Language (KSL) project. [2] In James' words KSL is "A place where people could throw language features, no matter how absurd, just so that folks could play around" since he has "... never been real happy with debates about language features, I'd much rather implement them and try them out." [3]

KSL received no significant community response.

Later in 2007, after the remaining core components of the platform were published as open source software as part of OpenJDK during JavaOne, in November Kijaro was created. [4] Kijaro is similar in spirit to KSL, but does note require contributors to sign the Sun Contributor Agreement (SCA). Before Project Coin, Kijaro saw a modest number of features developed, fewer than ten, which is also not a particular vigorous community response given the millions of Java developers in the world.

The earliest posts on what would become Project Coin mentioned the utility of prototypes, the Project Coin proposal form included a section to provide a link to an optional prototype, and I repeated stated throughout Project Coin the helpfulness of providing a prototype along with a proposal.

Despite the availability of the OpenJDK sources for javac and the repeated suggestions to produce prototypes, only a handful of prototypes were developed for the 70 proposals sent into Project Coin.

Dick asked rhetorically during the podcast whether alternative projects exploring language changes were inevitable as the "only approach given strict control exercised over the JVM [by Sun]."

IMO, such approaches are inevitable only if Sun's repeated efforts to collaborate continue to be ignored.

Classes on compilers are a core component of many undergraduate compiler science curricula. All the Java sources in the JDK 7 langtools repository adds up to about 160,000 lines of code and javac itself is a bit over 70,000 lines currently. These are far from trivial code bases and some portions of them are quite tricky, but implementing certain changes isn't that hard. Really. Try it out.

Dick says toward the end of the opening segment "If people do want to do this stuff, right now they are being told they can't."

I certainly do not have the authority to tell others what they can and cannot do. Indeed, I have advocated producing prototypes of language changes as a much more productive outlet than whining and pouting that other people aren't busy implementing the language changes you want to little avail. Others have already noted in previous messages to this group the irony of Lombok using the annotation processing facility I added to javac in JDK 6 as an alternate way to explore language changes (together with an agent API to rewrite javac internal classes!) . However, way back before JDK 5 shipped in 2004, we at Sun recognized that annotation processors by themselves would be a possible way to implement certain kinds of de facto language changes. The apt tool and later javac were always designed to be general meta- programming frameworks not directly tied to annotations; for example, an annotation processor can process a type containing no annotations to, say, enforce a chosen extra-linguistic check based on the structure of the program. [Such as the naming convention checker shipped as a sample annotation processor in JDK 6.]

As an example of what can be done just using annotation processing, long-time annotation processing hacker Bruce Chapman implemented "multi-line strings" as part of his rapt project [5]; the value of the string is populated from a multi-line comment. After repeatedly outlining how it would be possible to do so on the annotation processing forum [6], I've gotten around to hacking up a little proof- of-concept annotation processor based implementation of Properties. [7] The user writes code like

public class TestProperty extends TestPropertyParent {
   protected TestProperty() {};

   protected int property1;

   @ProofOfConceptProperty(readOnly = false)
   protected long property2;

   protected double property3;

   public static TestProperty newInstance(int property1,
                      long property2,
                      double property3) {
       return new TestPropertyChild(property1, property2, property3);

and the annotation processor generates the superclass and subclass to implement the desired semantics, including the getter and setter methods, etc. Using annotation processors in this way is a bit clunky compared to native language support, but if people want to experiment, the capabilities have been standardized as part of the platform since JDK 6.

It is technically possible to take the OpenJDK sources and construct a version of javac that accepts language extensions; after all, this is how we generally evolve the language and also how the JSR 308 changes were developed before going back into the JDK 7 code base. Additionally, the IcedTea project and the shipping of OpenJDK 6 in Linux distributions has provided an existence proof that folks other than Sun can take the entire OpenJDK code base, add various patches and additional components to it, and ship it as a product.

Given the OpenJDK sources Sun has published, subject to the license and trademark terms and conditions, anyone is free to implement and use language changes, as long as they assume the costs and responsibilities for doing so. Experimentation has long been encouraged and experiences from experiments using language changes on real code bases would certainly better inform language evolution decisions. Unfortunately, others have generally not done these experiments, or if the experiments have been done, the results have not be shared.

I also do not have the power to prevent others from using non-Java languages on the JVM or to force others to run anything on the JVM, nor would I want to exercise such powers even if I had them. Indeed, for some time Sun has endorsed having additional languages for the platform and the main beneficiary of John Rose's JSR 292 work will not be the Java language, but all the non-Java languages hosted on top of the JVM.

I do have the authority to speak on what Sun will and will not spend our resources on in relation to Project Coin, certainly a right any organization reserves to do with its resources.

If there are frustrations waiting for Java language changes, I assure you there are also frustrations working on Java language changes. For example, I find it frustrating (and self-inconsistent) that people state "I don't have technical expertise in this area" while simultaneously expecting their preferences to be selected without any contribution on their part. [8]

Finally, going back to a white paper from 1996, the design of Java quite intentionally said "No!" to various widely-used features from the C/C++ world including a preprocessor and multiple inheritance. Again since the beginning, Java admittedly borrowed features from many other established languages. [9] Given the vast number of potential ways to change the language that have been proposed, many language changes will continue to be called and few will continue to be chosen. In any endeavor there is a tension to balance stability and progress. For the Java language, given the vast numbers of programmers and existing code bases, we try to err on the side of being conservative (saying "No." by default) first to do no harm, but also to preserve the value of existing Java sources, class files, and programmer skills.

There are many other fine languages which run on the JVM platform and I expect the Java language to continue to adopt changes, big and small, informed both by direct experiments with prototypes and by experiences with other languages.

[5]; see also Bruce's

Wednesday Sep 02, 2009

Properties via Annotation Processing

The annotation processing APIs provided by apt in JDK 5 and javac in JDK 6 both present a read-only view of source code; by design directly modifying the input sources is not supported through either annotation processing API. Recently, Project Lombok has used the hook of being able to run an annotation processor inside javac to start an agent which rewrites javac internal classes to change the compiler's behavior; yikes! Such extreme measures are not needed to get much of the effect of modifying the input sources. As I've outlined in the annotation processing forum over the years, just using standard annotation processing to generate a subclass or superclass of a type being processed is a very powerful technique for controlling the ultimate semantics and behavior of the original class. For example, I've hacked up a proof-of-concept annotation type and matching processor to generate property-style getter and setter methods based on annotations on fields.

Concretely, the programmer writes something like

public class TestProperty extends TestPropertyParent {
    protected TestProperty() {}

    protected int property1;

    @ProofOfConceptProperty(readOnly = false) // Generate a setter too.
    protected long property2;

    protected double property3;
    public static TestProperty newInstance(int property1,
                                           long property2,
                                           double property3) {
        return new TestPropertyChild(property1, property2, property3);

and, after suitable annotation processing, using the TestProperty type as in

public class Main {
    public static void main(String... args) {
        TestProperty testProperty = TestProperty.newInstance(1, 2, 3);

    private static void output(TestProperty testProperty) {
        System.out.format("%d, %d, %g%n",

produces the expected output:

prompt$ java Main 
1, 2, 3.00000
1, 42, 3.00000

This approach does have limitations; primarily the annotated class like TestProperty needs to be written to allow its superclass and subclass(es) to be generated. Since it runs as part of the build, the annotation processor needs to be built separately beforehand. The javac command to run the annotation processor looks like:

javac -s ../gen_src/ -d ../bin -processor foo.PocProcessor -cp ../lib 

Good practice sets an output location for generated source code, TestPropertyParent and TestPropertyChild in this case, separate from the output location for class files. When compiling this, javac emits an error before the superclass and subclass are generated, but the entire compilation process completes fine correctly generating the source files and compiling all the files together. Java IDEs have varying levels of support for annotation processing; check your IDEs' documentation for details.

The annotation type and processor is only a proof of concept; many possible refinements are left as "exercises for the reader" including:

  • Developing a second annotation type to mark a class separate from the annotation to configure how each field should be treated.

  • Additional structural checks on the annotated code, proper modifiers on fields and constructors, etc.

  • Generation of equals and hashCode methods.

While using an annotation processor to approximate properties is awkward compared to built-in language support, annotation processors can be used today as part of some toolchains and they are configurable by the user. The code provided should be enough of a starting part for others to experiment with using annotation processors in this fashion; have fun.

Friday Aug 28, 2009

Project Coin: The Final Five (Or So)

First, thanks to all those who submitted interesting proposals and thoughtful comments to Project Coin; a demonstrably vibrant community wants to evolve the Java programming language!

Without further ado, the final Project Coin changes accepted for inclusion in JDK 7 are:

The specification, implementation, and testing of these changes are not final and will continue to evolve as interactions are explored and issues cleared. Two of the listed items are combinations of multiple submitted proposals. The omnibus proposal for improved integer literals includes at least binary literals and underscores in numbers; a way to better handle unsigned literals is desired too. Language support for Collections covers collection literals and allows for developing indexing access for Lists and Maps, assuming the technical problems are resolved.

That leaves a few proposals that went through further consideration, but were not accepted on the final list:

Improved exception handling would be a fine change for the language, but it ventures near the type system and we do not estimate we have resources to fully develop the feature within JDK 7. I would like to see improved exception handling reconsidered in subsequent efforts to evolve the language. While the Elvis operator and related operators are helpful in Groovy, differences between Groovy and Java, such as the presence of primitive types and interactions with boxing/unboxing render the operators less useful in Java. JDK 7 will support other ways to ease the pain of null-handling, such as the null checking enabled by JSR 308. Aggregates natively supporting more than 32-bits worth of entries will no doubt be increasingly needed over time. Language support for collections may allow such facilities to be developed as libraries until the platform directly handles large data structures.

The coin-dev list will remain available for the continued refinement of the accepted changes. Further discussion of proposals not selected is off-topic for the list.

The majority of the implementation of the Project Coin changes should be in JDK 7's development Mercurial repositories by the end of October 2009.

In due course, we intend to file a JSR covering the Project Coin changes.

Monday Aug 17, 2009

Project Coin: Elephants, Knapsacks, and Language Features

Paraphrasing some thoughts already sent to the Project Coin list and discussed at JavaOne this year, there continues to be traffic on the list and elsewhere about the criteria for proposal selection (and non-selection) and those criteria are worth elaborating ahead of the final proposal list being determined in the near future.

First, a reminder from some earlier blog entries describing the context for Project Coin:

"Especially with the maturity of the Java platform, the onus is on the proposer to convince that a language change should go in; the onus is not to prove the change should stay out."
Criteria for desirable small language changes, December 23, 2008

"Given the rough timeline for JDK 7 and other on-going efforts to change the language, such as modules and annotations on types, only a limited number of small changes can be considered for JDK 7."
Guidance on measuring the size of a language change, December 11, 2008

With nearly 70 proposals submitted to the mailing list and the Sun bug database having well over 100 open "requests for enhancements" (rfe's) for the language, the large majority of those proposals and rfe's will not be included in JDK 7 as part of Project Coin or any other effort.

Project Coin will be limited to around 5 proposals total. That's it.

Therefore for Project Coin, in addition to determining whether a proposal to change the language is in and of itself appropriate, a determination also has to be made as to whether the change is more compelling than all but four or so other proposals. In economic terms, there an an opportunity cost in the proposal selection; that is, because of finite resources, choosing to have a particular proposal in the platform removes the opportunity to do other proposals. There will be good, compelling proposals that would improve the language not selected for Project Coin because there are a full set of better, more compelling proposals that are more useful to include instead. Having available prototypes for proposals, running the existing tests, and writing new tests can only better inform the continuing proposal evaluation and selection.

Part of evaluating proposals is judging their utility to the Java platform as a whole. In this way, I've long thought the Java platform is like the elephant in the parable about the blind men and the elephant:

Six Blind Dukes and an Elephant

While each Duke and each of us may know and understand our own usage of Java quite well and have valid ideas about how Java could be changed to improve programming in our own areas of interest (properties! operator overloading! design by contract! your favorite feature!), putting together all that accurate local information might just result in a patchwork elephant:

Patchwork Elephant

Rather than just a collection of local views, a broad perspective is needed to have an accurate, unified picture of the platform so Java can keep evolving and improving as a general-purpose programming language. This approach favors features with broader applicability. For example, a usability improvement to generics, like the diamond operator which allows type parameters to be elided during constructor calls, is usable more often than, say, one of the various proposals to allow extensible or abstract enum types, a change that would only helpful in a small minority of enum declarations.

Even with a broad perspective, there are complexities in feature selection because choosing a set of language proposals is a kind of knapsack problem. That is, each feature has some discrete size and complexity to implement and confers some improvement to the language. There is a bounded size and complexity budget and the goal is maximizing the value held in the knapsack, the value of the set of improvements shipped in a release. Of note is that implementing a language change has much more of a discrete size (or a small selection of possible sizes) rather than a continuous range of possible sizes. In other words, because of the coordinated set of deliverables associated with a language change, it may be reasonable to implement 0%, 50,% or 100% of a possible feature but no other fraction. And doing 50% of the feature might take on quarter of the effort of doing the whole thing or three fourths of that effort.

Even when precise costs and benefits can be quantified, because of these discrete sizes the "greedy" algorithm of putting the highest value / cost item in the knapsack first can lead to globally poor results. If nothing else, having a pre-pass to reduce the number of proposals being considered for further review greatly reduces the combinatorial possibilities of subsets of features that could be included in a release.

Thursday Jul 16, 2009

Project Coin: Literal Grammar Hackery

Correction: External to this blog, it was been pointed out to me that the original grammar disallowed two-digit numbers, which is unintended. The fix is to make the DigitsAndUnderscores component in Digit DigitsAndUnderscores Digit optional, as done in the corrected grammar below

Circling back to look at some unresolved technical details of the underscores in numbers proposal, I wrote up a combined grammar to allow binary literals as well as underscores as separators between digits. That is, underscores cannot appear as the first or last character in a sequence of digits.

The basic grammar change is to convert the definition of Digits (in any base) from the simple left recursive list of digits found in JLSv3, like

Digits Digit

to a list where underscores can appear between numbers but the list must start and end with a digit:

Digit DigitsAndUnderscoresopt Digit
DigitsAndUnderscores DigitOrUnderscore

This grammar is unambiguous, but as written it requires a look ahead of more than 1 because the recursion is in the middle of the Digits production. I have not attempted any of the usual grammar refactorings to restore a look ahead of 1 since in practice purging the underscores will be implemented by a small amount of additional logic in the scanner as opposed to the actual parsing machinery.

The existing rules for distinguishing decimal and octal literals cause minor grammar complications to accommodate underscores immediately after the first digit. Octal numbers must start with a leading zero digit and nonzero decimal numbers must start with a nonzero digit, requirements reflected in rules like NonZeroDigit Digitsopt. To allow underscores after the first digit, a new rule requiring at least one underscore is added, such as NonZeroDigit Underscores Digits. The structure of binary literals is straightforward and entirely analogous to hexadecimal ones. Changing the digit-level productions automatically allows underscores in floating-point literals without the need to explicitly update the rules for those literals.

Productions in blue below are additional or changed productions to existing non-terminals; the other non-terminals below are newly introduced to support the enhanced literal syntax.

BinaryNumeral IntegerTypeSuffixopt
0 b BinaryDigits
0 B BinaryDigits
NonZeroDigit Digitsopt
NonZeroDigit Underscores Digits
Underscores _
Digit DigitsAndUnderscoresopt Digit
DigitsAndUnderscores DigitOrUnderscore
HexDigit HexDigitsAndUnderscoresopt HexDigit
HexDigitsAndUnderscores HexDigitOrUnderscore
0 OctalDigits
0 Underscores OctalDigits
OctalDigit OctalDigitsAndUnderscoresopt OctalDigit
OctalDigitsAndUnderscores OctalDigitOrUnderscore
BinaryDigit BinaryDigitsAndUnderscoresopt BinaryDigit
BinaryDigitsAndUnderscores BinaryDigitOrUnderscore
BinaryDigit: one of
0 1

Tuesday Jun 02, 2009

JavaOne 2009: Project Coin Slides Posted

I presented my technical session about Project Coin, titled more verbosely Small Language Changes in JDK™ Release 7, this afternoon at JavaOne. I've posted the slides. Besides discussing the proposals under further consideration, I also went over some of the experiences from JDK 5 and other considerations we take into account when evolving the language.

Wednesday May 27, 2009

Project Coin: For further consideration, round 2

The first group of proposals selected for further consideration were:

After due deliberation, and next set of proposals meeting the Project Coin criteria for further consideration are:

All the selected proposals were reviewed and judged to have favorable effort to reward ratios and to preserve the essential character of the language.

Work should continue refining the selected proposals and producing prototypes. In particular, a unified proposal for integer literals should be produced.

Language change proposals not on the combined "for further consideration" list will not be included in JDK 7; there is no need for continued discussion about them on the Project Coin mailing list. Detailed rationales for why particular proposals were not selected will not be provided.

Final selection of the five or so proposals to be included in the platform will occur within the next few months.

Wednesday Apr 15, 2009

JavaOne 2009: Small Language Changes in JDK™ Release 7

My JavaOne 2009 proposal for a talk about Small Language Changes in JDK™ Release 7 was accepted and scheduled for the first day of the conference, 3:20 PM to 4:20 PM on June 2, 2009. Besides discussing experiences evaluating proposals submitted to Project Coin, I plan to include some broader thoughts on Java language evolution, including sharing some informative war stories from back in JDK 5.

Tuesday Mar 31, 2009

Project Coin: The Call for Proposals Phase is Over!

Update: Added links to proposal for switch for all types and simple expressions and a link to a revised method chaining proposal.

Project Coin's call for proposals phase is now over! Thirty four days long, the proposal period included nearly 70 proposals being sent to the mailing list, 19 coming in over the last two days, and over 1100 messages on the list discussing those proposals and related topics. With the flurry of pre-deadline activity over, the more deliberative task of finishing reviewing and evaluating the proposals awaits. Including several sent in a few hours after deadline, the proposals received since week four are:

The figure below graphs when proposals were received; nothing like an impending deadline to focus the mind!

Project Coin Proposal Submissions

Sometime after the next for further consideration cut is made, I'll post some thoughts on and reaction to the call for proposals phase as a whole. This will not include a detailed analysis of why each proposal was or was not chosen; however, there will be discussion of common aspects of proposals that led them to be selected or not.

Friday Mar 27, 2009

Project Coin: Week 4 Update

Update: Corrected to include Stephen Colebourne's enhanced enhanced for loop proposal.

Further update: Added links to updated large arrays and compile time access proposals.

Project Coin's fourth week saw continued lively traffic on the mailing list. As the submission deadline approaches, a flurry of new proposals were sent in:

The field of over two dozen proposals previously sent in over the first three weeks of Project Coin was narrowed to six proposals still in consideration for inclusion in JDK 7. The proposals submitted this week and until the end of the call for proposals period will be similarly evaluated for their appropriateness to be added to the language. Finally, the combined list of candidate changes will be produced.

Tuesday Mar 24, 2009

A trail of coins

For those interested in specifically following Project Coin related posts, my Project Coin entries are tagged with "projectcoin":

Project Coin: For further consideration...

In the first three weeks of Project Coin over two dozen proposals have been sent to the mailing list for evaluation. The proposals have ranged the gamut from new kinds of expressions, to new statements forms, to improved generics support. Thanks to all Java community members who have sent in interesting, thoughtful proposals and contributed to informative discussions on the list!

While there is a bit less than a week left in the call for proposals period, there has been enough discussion on the list to winnow the slate of proposals sent in so far to those that merit further consideration for possible inclusion in the platform.

First, Bruce Chapman's proposal to extend the scope of imports to include package annotations will be implemented under JLS maintenance so further action is unnecessary on this matter as part of Project Coin. Second, since the JSR 294 expert group is discussing adding a module level of accessibility to the language, the decision of whether or not to include Adrian Kuhn's proposal of letting "package" explicitly name the default accessibility level will be deferred to that body. Working with Alex, I reviewed the remaining proposals. Sun believes that the following proposals are small enough, have favorable estimated reward to effort ratios, and advance the stated criteria of making things programmers do everyday easier or supporting platform changes in JDK 7:

As this is just an initial cut and the proposals are not yet in a form suitable for direct inclusion in the JLS, work should continue to refine these proposed specifications and preferably also to produce prototype implementations to allow a more thorough evaluation of the utility and scope of the changes. The email list should focus on improving the selected proposals and on getting any remaining new proposals submitted; continued discussion of the other proposals is discouraged.

The final list of small language changes will be determined after the call for proposals is over so proposals sent in this week are certainly still in the running! The final list will only have around five items so it is possible not all the changes above will be on the eventual final list.

Friday Mar 20, 2009

Project Coin: Week 3 Update

Project Coin's third week was another week of lively traffic on the mailing list. New proposals were sent in:

existing proposals were revised:

and discussion continued on ARM and other proposals. The scoping and utility of a few pre-proposals was discussed on the list too.

Ten days remain to get language change proposals in! (Purely libraries changes will be handled by other JDK 7 processes.)

Friday Mar 13, 2009

Project Coin: Week 2 Update

After the vigorous start of week 1, the pace of new proposals being sent to the list slowed:

However, brisk discussion continued on refining and exploring ARM blocks and their variations.

Thursday Mar 12, 2009

Language Model Changes as of JDK 7 Build 50

To date, there have been a few API changes to javax.lang.model.\* in JDK 7. Early in the release, SourceVersion.RELEASE_7 was added to correspond to any new source-level changes coming in JDK 7 (6458819). Eventually, there will be changes to support the modulue construct being added by JSR 294; changes may or may not be needed for language changes coming from Project Coin and JSR 308. Once modulues are added, the JSR 269 API elements meant to cope with the expression problem will be tested. In the mean time, some minor API cleanups have occurred, one to make handling exceptions easier (6794071), another to group common functionality in mixin interfaces (6460529), and some minor API clarifications (6501749). Along the way, there has been a little general bug fixing too (6478017, 6583626, 6498938).

Small tweaks and bug fixing will continue to occur in the javax.lang.model.\* and javax.annotation.processing packages throughout JDK 7 in addition to changes needed to support new language features.

Friday Mar 06, 2009

Project Coin: Week 1 Update

In its first week, Project Coin enjoyed a vigorous start with well over a dozen proposals submitted:

Traffic on the the list has been high, with lots of feedback and analysis leading to some revised proposals.

A few general comments on the proposals that have been sent in so far to help refine those proposals and improve future proposals before they are sent in.

The proposals submitted to Project Coin should already be well thought-through. The goal is to have in short order specifications approaching JLS quality, preferably with a prototype to help validate the design. The feedback on the list should be much closer to finding and illuminating any remaining dark corners of a proposal rather than fleshing out its basic structure. If a proposal does not cite chapter and verse of the JLS for parts of its specification, that is a good indication the proposal is too vague. All affected sections of the JLS should be listed, including binary compatibility and the flow analysis in definite assignment.

It is fine if someone posts to the list to solicit help writing a proposal for a given change.

Proposal writers should be aware of the size and scope parameters established for the project; for background see:

Also, proposal writers should search Sun's bug database for bugs related to the change. The URL for the database is; Java specification issues are in category Java SE and subcategory specification. Of course the database is also searchable with your favorite search engine restricted to that site. Besides the evaluation field from the bug database, the external comment can often also have valuable insight into and discussion of alternatives to solving the problem or reasons why the problem shouldn't be solved.

As has already been happening on the list, authors and advocates of a proposal are responsible for responding to feedback and incorporating changes into any subsequent iterations of the proposal. For now, I think it is adequate to just send the revised proposals to the list. Only if there turns out to be frequent change would a more formal tracking system be warranted. Keeping such discussions on the list is important both to allow easy, centralized tracking of the proposal drafts and also for future language archaeologists who are curious about why a particular decision was made.

After a few iterations of feedback and refinements, the specification and compilation strategy should be sufficiently detailed to provide high-confidence that the proposal is practical and can be reduced to practice. For example, I think the initial proposal for the admittedly simple strings in switch change provides adequate detail on these fronts.

Sunday Mar 01, 2009

Project Coin: Proposal for Strings in switch

Below is a Project Coin language proposal form I wrote for Strings in switch; send any comment to the Project Coin mailing list.


AUTHOR(S): Joseph D. Darcy

Provide a two sentence or shorter description of these five aspects of the feature:
FEATURE SUMMARY: Should be suitable as a summary in a language tutorial.

Add the ability to switch on string values analogous to the existing ability to switch on values of the primitive types.

MAJOR ADVANTAGE: What makes the proposal a favorable change?

More regular coding patterns can be used for operations selected on the basis of a set of constant string values; the meaning of the new construct should be obvious to Java developers.

MAJOR BENEFIT: Why is the platform better if the proposal is adopted?

Potentially better performance for string-based dispatch code.

MAJOR DISADVANTAGE: There is always a cost.

Some increased implementation and testing complexity for the compiler.

ALTERNATIVES: Can the benefits and advantages be had some way without a language change?

No; chained if-then-else tests for string equality are potentially expensive and introducing an enum for its switchable constants, one per string value of interest, would add another type to a program without good cause.

Show us the code!
SIMPLE EXAMPLE: Show the simplest possible program utilizing the new feature.

String s = ...
switch(s) {
 case "foo":

ADVANCED EXAMPLE: Show advanced usage(s) of the feature.

String s = ...
switch(s) {
case "quux":
   // fall-through

 case "foo":
 case "bar":

 case "baz":
   // fall-through


SPECIFICATION: Describe how the proposal affects the grammar, type system, and meaning of expressions and statements in the Java Programming Language as well as any other known impacts.

The lexical grammar is unchanged. String is added to the set of types valid for a switch statement in JLSv3 section 14.11. Since Strings are already included in the definition of constant expressions, JLSv3 section 15.28, the SwitchLabel production does not need to be augmented. The existing restrictions in 14.11 on no duplicate labels, at most one default, no null labels, etc. all apply to Strings as well. The type system is unchanged. The definite assignment analysis of switch statement, JLSv3 section 16.2.9, is unchanged as well.

COMPILATION: How would the feature be compiled to class files? Show how the simple and advanced examples would be compiled. Compilation can be expressed as at least one of a desugaring to existing source constructs and a translation down to bytecode. If a new bytecode is used or the semantics of an existing bytecode are changed, describe those changes, including how they impact verification. Also discuss any new class file attributes that are introduced. Note that there are many downstream tools that consume class files and that they may to be updated to support the proposal!

One way to support this change would be to augment the JVM's lookupswitch instruction to operate on String values; however, that approach is not recommended or necessary. It would be possible to translate the switches to equivalent if-then-else code, but that would require unnecessary equality comparisons which are potentially expensive. Instead, a switch should occur on a predictable and fast integer (or long) function value computed from the string. The most natural choice for this function is String.hashCode, but other functions could also be used either alone or in conjunction with hashCode. (The specification of String.hashCode is assumed to be stable at this point.) If all the string labels have different lengths, String.length() could be used instead of hashCode. Generally a String.equals() check will be needed to verify the candidate string's identity in addition to the evaluation of the screening function because multiple string inputs could evaluate to the same result.

A single case label, a single case label with a default, and two case labels can be special-cased to just equality checks without function evaluations. If there are collisions in String.hashCode on the set of case labels in a switch block, a different function without collisions on that set of inputs should be used; for example ((long)s.hashCode<<32 ) + s.length()) is another candidate function.

Here are desugarings to currently legal Java source for the two examples above where the default hash code do not collide:

// Simple Example
if (s.equals("foo")) { // cause NPE if s is null

// Advanced example
{  // new scope for synthetic variables
 boolean $take_default = false;
 boolean $fallthrough = false;
 $default_label: {
     switch(s.hashCode()) { // cause NPE if s is null
     case 3482567: // "quux".hashCode()
         if (!s.equals("quux")) {
             $take_default = true;
             break $default_label;
         $fallthrough = true;
               case 101574: // "foo".hashCode()
         if (!$fallthrough && !s.equals("foo")) {
             $take_default = true;
             break $default_label;
         $fallthrough = true;
     case 97299:  // "bar".hashCode()
         if (!$fallthrough && !s.equals("bar")) {
             $take_default = true;
             break $default_label;

     case 97307: // "baz".hashCode()
         if (!s.equals("baz")) {
             $take_default = true;
             break $default_label;
         $fallthrough = true;

         $take_default = true;
         break $default_label;

In the advanced example, the boolean "fallthrough" variable is needed to track whether a fall-through has occurred so the string equality checks can be skipped. If there are no fall-throughs, this variable can be removed. Likewise, if there is no default label in the original code, the $take_default variable is not needed and a simple break can be used instead.

In a translation directly to bytecode, the synthetic state variables can be replaced with goto's; expressing this in pseudo Java source with goto:

// Advanced example in pseudo Java with goto
switch(s.hashCode()) { // cause NPE if s is null
case 3482567: // "quux".hashCode()
   if (!s.equals("quux"))
       goto $default_label;
   goto $fooOrBarCode_label;
  case 101574: // "foo".hashCode()
   if (!s.equals("foo"))
       goto $default_label;
   goto $fooOrBarCode_label;

case 97299:  // "bar".hashCode()
   if (!s.equals("bar"))
       goto $default_label;


case 97307: // "baz".hashCode()
   if (!s.equals("baz"))
       goto $default_label;


Related to compilation, a compiler's existing diagnostics around falling through switches, such as javac's -Xlint:fallthrough option and @SuppressWarnings("fallthrough"), should work identically on switch statements based on Strings.

TESTING: How can the feature be tested?

Generating various simple and complex uses of the new structure and verifying the proper execution paths occur; combinations to test include switch statements with and without fall-throughs, with and without collisions in the hash codes, and with and without default labels.

LIBRARY SUPPORT: Are any supporting libraries needed for the feature?


REFLECTIVE APIS: Do any of the various and sundry reflection APIs need to be updated? This list of reflective APIs includes but is not limited to core reflection (java.lang.Class and java.lang.reflect.\*), javax.lang.model.\*, the doclet API, and JPDA.

Only reflective APIs that model statements in the source language might be affected. None of core reflection, javax.lang.model.\*, the doclet API, and JDPA model statements; therefore, they are unaffected. The tree API in javac, does model statements, but the existing API for switch statements is general enough to model the revised language without any API changes.

OTHER CHANGES: Do any other parts of the platform need be updated too? Possibilities include but are not limited to JNI, serialization, and output of the javadoc tool.


MIGRATION: Sketch how a code base could be converted, manually or automatically, to use the new feature.

Look for sequences of if ("constant string".equals(foo)) or if (foo.equals("constant string")) and replace accordingly.

BREAKING CHANGES: Are any previously valid programs now invalid? If so, list one.

All existing programs remain valid.

EXISTING PROGRAMS: How do source and class files of earlier platform versions interact with the feature? Can any new overloadings occur? Can any new overriding occur?

The semantics of existing class files and legal source files and are unchanged by this feature.

EXISTING BUGS: Please include a list of any existing Sun bug ids related to this proposal.

5012262 Using Strings and Objects in Switch case statements.


No prototype at this time.

Friday Feb 27, 2009

Project Coin Now Live

The Project Coin OpenJDK page and mailing list are now live. The call for proposal period will run until March 30, 2009. Let the proposing begin!




« July 2016

No bookmarks in folder