Friday Jul 23, 2010

Understanding Inception using try-with-resources

SPOILERS for the film "Inception" as represented using the try-with-resources statement.

Thursday Jul 22, 2010

Writing javac regression and unit tests for new language features

With Java language changes in progress for JDK 7, such as Project Coin's strings in switch, improved exception handling, and the try-with-resources statement, writing effective regression and unit tests for javac is an integral component of developing the new language features.

Unit and regression tests differ from conformance tests. Unit and regression tests focus on achieving code coverage of an implementation while conformance tests focus on specification coverage, having tests that probe a high percentage of the testable assertions in the specification. The underlying functionality probed by both styles of testing is shared (source code is accepted or rejected, compiled class files do or do not implement the required semantics), but the different test methodologies categorize the space of concerns to be spanned differently. For more information on the methodology of conformance testing, see these blog entries about finding assertions in the Java language specification and on direct and derived assertions.

While there is overlap in the functionality testable by unit/regression tests and conformance tests, a key difference is that unit/regression tests can be written against implementation-specific functionality while conformance tests must be valid for any implementation (since all implementations are written against the same specification). For example, a conformance test suite, like the JCK compiler test suite, can test that a particular source file is properly accepted or rejected by a compiler. Testing that the compiled class file(s) accurately implement the semantics of the source is also in-bounds for a conformance test. (The JCK test suite uses an interview process to configure the test suite to work with a particular compiler so that compiler-specific properties such as how acceptance/rejection of sources is indicated can be accommodated.) The JCK compiler test suite includes both positive tests (testing that programs that are in the language are properly accepted) and negative tests (testing that programs that are outside the language are properly rejected). Besides properties that can be verified in conformance-style testing, the regression and unit tests for a new language feature as implemented in javac also need to verify that various javac-specific properties hold.

For both positive and negative aspects of a language feature, unit and regression tests can cover all properties of interest for a particular compiler. A subset of those properties is also in the domain of conformance tests:

  • Negative Tests

    • Conformance and unit/regression: Invalid source files are rejected. This includes verifying that sources valid only under, say, -source 7 are rejected when javac is run with -source 6 or earlier source settings.

    • Unit/regression only: The expected error messages are generated, referencing the correct source locations.

  • Positive Tests

    • Conformance and unit/regression: Valid source is compiled.

    • Conformance and/or unit/regression: Proper modeling of the new language construct. Depending on the language feature, the feature may be surfaced in the standard language model API javax.lang.model.*, which can be tested with a conformance test. However, aspects of a language feature reflected in the javac tree API are only testable with regression or unit tests.

    • Conformance and unit/regression: Resulting class files are structurally well-formed. This includes passing verification and other checks done by the JVM upon loading and linking.

    • Unit/regression only: Resulting class files follow compiler-specific idioms. There are many ways a compiler can transform source code into valid class files which have the correct operational semantics.

    • Conformance and unit/regression: Resulting class file(s) when run have correct operational semantics. (Being able to run a class file implies the class file is well-formed.) Tests of operational semantics should be structured so that the code in question must positively run for the test to pass. For example, if a piece of code, like a string switch, was erroneously compiled to no byte codes, the test should fail.

An example of a negative test verifying both conformance and implementation-specific properties is a test included in the changeset for strings in switch:

/*
 * @test  /nodynamiccopyright/
 * @bug 6827009
 * @summary Check for case labels of different types.
 * @compile/fail -source 6 BadlyTypedLabel1.java
 * @compile/fail/ref=BadlyTypedLabel1.out -XDrawDiagnostics BadlyTypedLabel1.java
 */
class BadlyTypedLabel1 {
    String m(String s) {
        switch(s) {
        case "Hello World":
            return(s);
        case 42:
            return ("Don't forget your towel!");
        }
    }
}

Decoding the initial comments as jtreg directives, @test indicates the file is a jtreg test; jtreg is the test harness used for JDK regression and unit tests. By default, jtreg builds the source file and runs its main method; however, for compiler tests that combination of actions is often inappropriate. The directive
@compile/fail -source 6 BadlyTypedLabel1.java
means that jtreg should compile the indicated source and expect a failure, meaning the overall test will fail if the compile succeeds. In this case, the @compile/fail directive tests that a string switch is rejected under -source 6. The next directive
@compile/fail/ref=BadlyTypedLabel1.out -XDrawDiagnostics BadlyTypedLabel1.java
is more specific. Not only must the compilation fail, but as specified in the ref= option to @compile/fail, the reported error messages must match the expected output in file BadlyTypedLabel1.out:

BadlyTypedLabel1.java:13:14: compiler.err.prob.found.req: (compiler.misc.incompatible.types), int, java.lang.String
1 error 

The -XDrawDiagnostics option to javac turns on "raw diagnostics" where the source location and resource keys are output instead of the localized text of the error messages. (For more information about javac diagnostics see Improving javac diagnostics and Playing with formatters.) The effect of the second @compile/fail line is thus to verify the proper error message is generated at "case 42:" where an integer label erroneously occurs inside a string switch. Since this kind of test relies on checking messages reported at particular code locations, no explicit copyright occurs in such test files so that the golden output files do not have to be updated if the length of the copyright notice changes; "/nodynamiccopyright/" indicates the explicit copyright is omitted for this reason.

For positive tests, a @compile directive without /fail declares a compile must succeed for the overall test to pass. Modeling tests can generally be run as annotation processors over selected source code. Since annotation processing is built into javac as of JDK 6, @compile directives are one option for running annotation processing tests. The tree API can also be tested via annotation processors by using the Trees class to bridge between javax.lang.model.element.Element objects and the corresponding abstract syntax trees, a technique also used in some regression tests for ordinary javac bug fixes.
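
For instance, a minimal processor along these lines might look like the following sketch (class name hypothetical; Trees and the com.sun.source API are the javac-specific pieces):

import java.util.Set;
import javax.annotation.processing.*;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import com.sun.source.tree.Tree;
import com.sun.source.util.Trees;

@SupportedAnnotationTypes("*")
public class TreeKindPrinter extends AbstractProcessor {
    private Trees trees;

    @Override
    public synchronized void init(ProcessingEnvironment env) {
        super.init(env);
        trees = Trees.instance(env); // bridge from elements to javac trees
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations,
                           RoundEnvironment roundEnv) {
        for (Element e : roundEnv.getRootElements()) {
            Tree tree = trees.getTree(e); // AST corresponding to the element
            processingEnv.getMessager().printMessage(Diagnostic.Kind.NOTE,
                "Tree kind for " + e.getSimpleName() + ": " + tree.getKind());
        }
        return false; // do not claim any annotations
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latest();
    }
}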

The regression tests for multi-catch include checks for generating idiomatic class files. There are a variety of ways the multi-catch aspect of improved exception handling could be implemented. One way to compile multi-catch would be to duplicate the code blocks for each exception, in source terms treating

try {...}
catch (A | B except) {Statements}

as

try {...}
catch (A except) {Statements}
catch (B except) {Statements}

However, for javac we do not consider this compilation strategy to result in an acceptable class file. (There are many implicit requirements for generating class files from Java source code, but generally no explicit specification for the compiler's behavior.) Instead, for javac we require the catch of multiple exceptions to be represented as repeated entries in the class file's exception table, the structure that maps ranges of instructions to their handling catch blocks. This check is implemented by using javap APIs to introspect on the structure of the exception table.
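
As an outline of what such introspection can look like (this is a sketch, not the actual regression test; it assumes the unsupported com.sun.tools.classfile library that javap is built on in JDK 7, and the exact field and method names may differ):

import java.io.File;
import com.sun.tools.classfile.*;

public class ExceptionTableDump {
    public static void main(String... args) throws Exception {
        ClassFile cf = ClassFile.read(new File(args[0]));
        for (Method m : cf.methods) {
            Code_attribute code =
                (Code_attribute) m.attributes.get(Attribute.Code);
            if (code == null)
                continue; // abstract or native method; no Code attribute
            System.out.println(m.getName(cf.constant_pool));
            for (Code_attribute.Exception_data entry : code.exception_table) {
                // A multi-catch clause should appear as several entries
                // sharing the same handler_pc.
                System.out.printf("  [%d, %d) -> handler %d, catch_type %d%n",
                                  entry.start_pc, entry.end_pc,
                                  entry.handler_pc, entry.catch_type);
            }
        }
    }
}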

Testing operational semantics can be challenging due to the difficulty of computing a known-good result for all code paths of interest. For strings in switch, testing the operational semantics of the generated code included comparing the control paths taken through a switch on an enum type with the control paths taken through an analogous switch on the names of the enum constants:

private static int 
enumSwitch(MetaSynVar msv) {
    int result = 0;
    switch(msv) {
    case FOO:
        result |= (1<<0);
        // fallthrough:

    case BAR:
    case BAZ:
        result |= (1<<1);
        break;

    default:
        switch(msv) {
        case QUX:
            result |= (1<<2);
            break;

        case QUUX:
            result |= (1<<3);

        default:
            result |= (1<<4);
        }
        result |= (1<<5);
        break;

    case MUMBLE:
        result |= (1<<6);
        return result;

    case FOOBAR:
        result |= (1<<7);
        break;
    }
    result |= (1<<8);
    return result;
}
private static int 
stringSwitch(String msvName) {
    int result = 0;
    switch(msvName) {
    case "FOO":
        result |= (1<<0);
        // fallthrough:

    case "BAR":
    case "BAZ":
        result |= (1<<1);
        break;

    default:
        switch(msvName) {
        case "QUX":
            result |= (1<<2);
            break;

        case "QUUX":
            result |= (1<<3);

        default:
            result |= (1<<4);
        }
        result |= (1<<5);
        break;

    case "MUMBLE":
        result |= (1<<6);
        return result;

    case "FOOBAR":
        result |= (1<<7);
        break;
    }
    result |= (1<<8);
    return result;
}

Matching code sections in the two switch statements are identified with the same bit position; executing a code section sets the corresponding bit. The test loops over all enum constants and verifies that the set of code sections run when the enum constant itself is switched on matches the set of code sections run when the name of the enum constant is switched on. Notice that if the string switch statement did nothing, the result would be set to 0, which would not match any proper run of the enum-based switch statement. Inside javac, enum switches and string switches have different implementations so checking the new string switch behavior against the old enum switch behavior is a reasonable approach to validate string switch functionality.
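
A driver for the comparison can be a simple loop over the constants; a minimal sketch, assuming a MetaSynVar enum with the constants used above and residing in the same class as the two methods:

enum MetaSynVar { FOO, BAR, BAZ, QUX, QUUX, MUMBLE, FOOBAR }

public static void main(String... args) {
    for (MetaSynVar msv : MetaSynVar.values()) {
        int enumResult   = enumSwitch(msv);
        int stringResult = stringSwitch(msv.name());
        if (enumResult != stringResult)
            throw new RuntimeException("Mismatched code sections for " + msv +
                                       ": " + enumResult + " != " + stringResult);
    }
}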

Analogous checks implemented with annotations were used to see if exceptions were caught in the proper catch blocks for improved exception handling.

Analogous approaches were employed to develop the unit and regression tests for try-with-resources.

Thanks to Alex, Maurizio, Jon, and Brian for comments on initial drafts of this entry.

Friday Jul 16, 2010

Project Coin ARM Implementation

I'm happy to announce that starting with a prototype written by Tom Ball, Oracle's javac team has produced and pushed an implementation of the try-with-resources statement, otherwise known as ARM blocks, into JDK 7.

Today the resourceful can apply a changeset to a copy of the JDK 7 langtools repository and do a build to get a compiler supporting this feature. Otherwise, following the integration process, support for try-with-resources statements will be available in the promoted JDK 7 builds in due course.

Besides possible refinements to the semantics of the translations, there will likely be other small adjustments to the implementation of the language feature in the future. For example, the lint category named "arm" will likely be renamed to something more like "resource."

Thursday Jul 15, 2010

Project Coin: Updated ARM Spec

Starting with Project Coin proposal for Automatic Resource Management (ARM) (Google Docs version), in consultation with Josh Bloch, Maurizio, Jon, and others, Alex and I have produced a specification for ARM blocks that is much closer to Java Language Specification (JLS) style and rigor. The specification involves changes to the existing JLS section §14.20 "The try statement," and will eventually introduce a new subsection §14.20.3 "Execution of try-with-resources," although the specification below is not partitioned as such. Non-normative comments about the specification text below appear inside "[]". Differences between the new specification and the earlier Project Coin proposal for ARM are discussed after the specification.


SYNTAX: The existing set of grammar productions for TryStatement in JLS §14.20 is augmented with:

TryStatement:
try ResourceSpecification Block Catches_opt Finally_opt

Supporting new grammar productions are added:

ResourceSpecification:
( Resources )
Resources:
Resource
Resource ; Resources
Resource:
VariableModifiers Type VariableDeclaratorId = Expression
Expression

[An implication of the combined grammar is that a try statement must have at least one of a catch clause, a finally block, and a resource specification. Furthermore, it is permissible for a try statement to have exactly one of these three components. Note that it is illegal to have a trailing semi-colon in the resource specification.]

A try-with-resources statement has a resource specification that expresses resources to be automatically closed at the end of the Block. A resource specification declares one or more local variables and/or has one or more expressions, each of whose type must be a subtype of AutoCloseable or a compile-time error occurs.
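
[As a non-normative illustration, a method using a try-with-resources statement with a single declared resource (the method and class names are hypothetical):]

import java.io.*;

class FirstLine {
    static String firstLine(String path) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            return br.readLine();
        } // br is closed automatically at the end of the Block
    }
}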

If a resource specification declares a variable, the variable must not have the same name as a variable declared earlier in the resource specification, a local variable, or parameter of the method or initializer block immediately enclosing the try statement, or a compile-time error occurs.

The scope of a variable declared in a resource specification of a try-with-resources statement (§14.20) is from the declaration rightward over the remainder of the resource specification and the entire Block associated with the try. Within the Block of the try, the name of the variable may not be redeclared as a local variable of the directly enclosing method or initializer block, nor may it be redeclared as an exception parameter of a catch clause in a try statement of the directly enclosing method or initializer block, nor may it be redeclared as a variable in the resource specification, or a compile-time error occurs. However, a variable declared in a resource specification may be shadowed (§6.3.1) anywhere inside a class declaration nested within the Block of the try.

The meaning of a try-with-resources statement with a Catches clause or Finally block is given by translation to a try-with-resources statement with no Catches clause or Finally block:

try ResourceSpecification
  Block
Catches_opt
Finally_opt

try {
  try ResourceSpecification
  Block
}
Catches_opt
Finally_opt

In a try-with-resources statement that manages a single resource:

  • If the initialization of the resource completes abruptly because of a throw of a value V, or if the Block of the try-with-resources statement completes abruptly because of a throw of a value V and the automatic closing of the resource completes normally, then the try-with-resources statement completes abruptly because of the throw of value V.

  • If the Block of the try-with-resources statement completes abruptly because of a throw of a value V1, and the automatic closing of the resource completes abruptly because of a throw of a value V2, then the try-with-resources statement completes abruptly because of the throw of value V1, provided that V2 is an Exception. In this case, V2 is added to the suppressed exception list of V1. If V2 is an error (i.e. a Throwable that is not an Exception), then the try-with-resources statement completes abruptly because of the throw of value V2. In this case, V1 is not suppressed by V2.

In a try-with-resources statement that manages multiple resources:

  • If the initialization of a resource completes abruptly because of a throw of a value V, or if the Block of the try-with-resources statement completes abruptly because of a throw of a value V (which implies that the initialization of all resources completed normally) and the automatic closings of all resources complete normally, then the try-with-resources statement completes abruptly because of the throw of value V.

  • If the Block of the try-with-resources statement completes abruptly because of a throw of a value V1, and the automatic closings of one or more resources (that were previously successfully initialized) complete abruptly because of throws of values V2...Vn, then the try-with-resources statement completes abruptly because of the throw of a value Vi (1 ≤ i ≤ n) determined by the translation below.

The exceptions that can be thrown by a try-with-resources statement are the exceptions that can be thrown by the Block of the try-with-resources statement plus the union of the exceptions that can be thrown by the automatic closing of the resources themselves. Regardless of the number of resources managed by a try-with-resources statement, it is possible for a Catches clause to catch an exception due to the initialization or automatic closing of any resource.

A try-with-resources statement with a ResourceSpecification clause that declares multiple Resources is treated as if it were multiple try-with-resources statements, each of which has a ResourceSpecification clause that declares a single Resource. When a try-with-resources statement with n Resources (n > 1) is translated, the result is a try-with-resources statement with n-1 Resources. After n such translations, there are n nested try-catch-finally statements, and the overall translation is complete.
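
[For example (stream and variable names hypothetical), under this treatment

try (InputStream in = new FileInputStream(src);
     OutputStream out = new FileOutputStream(dst)) {
    // copy bytes from in to out
}

is treated as if it were

try (InputStream in = new FileInputStream(src)) {
    try (OutputStream out = new FileOutputStream(dst)) {
        // copy bytes from in to out
    }
}

before the translation below is applied.]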

The meaning of a try-with-resources statement with a ResourceSpecification clause and no Catches clause or Finally block is given by translation to a local variable declaration and a try-catch-finally statement. During translation, if the ResourceSpecification clause declares one Resource, then the try-catch-finally statement is not a try-with-resources statement, and ResourceSpecification_tail is empty. If the ResourceSpecification clause declares n Resources, then the try-catch-finally statement is treated as if it were a try-with-resources-catch-finally statement, where ResourceSpecification_tail is a ResourceSpecification consisting of the 2nd, 3rd, ..., nth Resources in order. The translation is as follows, where the identifiers #primaryException, #t, and #suppressedException are fresh:

try ResourceSpecification
  Block

{
final VariableModifiers_minus_final R #resource = Expression;
Throwable #primaryException = null;

try ResourceSpecification_tail
  Block
catch (final Throwable #t) {
  #primaryException = #t;
  throw #t;
} finally {
  if (#primaryException != null) {
    try {
      #resource.close();
    } catch(Exception #suppressedException) {
      #primaryException.addSuppressedException(#suppressedException);
    }
  } else {
    #resource.close();
  }
}

If the Resource being translated declares a variable, then VariableModifiers_minus_final is the set of modifiers on the variable (except for final if present); R is the type of the variable declaration; and #resource is the name of the variable declared in the Resource.

Discussion: Resource declarations in a resource specification are implicitly final. For consistency with existing declarations that have implicit modifiers, it is legal (though discouraged) for a programmer to provide an explicit "final" modifier. By allowing non-final modifiers, annotations such as @SuppressWarnings will be preserved on the translated code. It is unlikely that the Java programming language will ever ascribe a meaning to an explicit final modifier in this location other than the traditional meaning.
[Unlike the new meaning ascribed to a final exception parameter.]

Discussion: Unlike the fresh identifier in the translation of the enhanced-for statement, the #resource variable is in scope in the Block of a try-with-resources statement.

If the Resource being translated is an Expression, then the translation includes a local variable declaration for which VariableModifiers_minus_final is empty; the type R is the type of the Expression (under the condition that the Expression is assigned to a variable of type AutoCloseable); and #resource is a fresh identifier.

Discussion: The method Throwable.addSuppressedException has a parameter of type Throwable, but the translation is such that only an Exception from #resource.close() will be passed for suppression. In the judgment of the designers of the Java programming language, an Error due to automatic closing of a resource is sufficiently serious that it should not be automatically suppressed in favor of an exception from the Block or the initialization or automatic closing of lexically rightward resources.
[However, perhaps such an Error should instead be recorded as suppressing an exception from the Block or other lexically rightward component.]

Discussion: This translation exploits the improved precision of exception analysis now triggered by the rethrow of a final exception parameter.

The reachability and definite assignment rules for the try statement with a resource specification are implicitly specified by the translations above.


Compared to the earlier proposal, this draft specification:

  • Assumes the revised supporting API with java.lang.AutoCloseable as the type indicating participation in the new language feature.

  • Changes the official grammar for a declared resource from
    LocalVariableDeclaration
    to
    VariableModifiers Type VariableDeclaratorId = Expression
    The former syntactically allowed code like
    AutoCloseable a, b, c
    which would not be useful in this context.

  • Preserves modifiers on explicitly declared resources, which implies @SuppressWarnings on a resource should have the intended effect.

  • States how the exception behavior of close methods is accounted for in determining the set of exceptions a try-with-resources statement can throw.

  • Gives a more precise determination of the type used for the local variable holding a resource given as an Expression. This precision is important to allow accurate exception information to be computed.

  • Provides typing constraints so that type inference works as expected if the Expression given as a Resource in a ResourceSpecification is, say, a generic method or null.

Compiler changes implementing this revised specification remain in progress. After experience is gained with the initial implementation, I expect various changes to the feature to be contemplated:

  • Dropping support for a resource to be specified as a general Expression. Nontrivial specification and implementation complexities arise from allowing a general Expression to be used as resource. Allowing a restricted expression that was just a name may provide nearly all the additional flexibility at marginal additional implementation and specification impact.

  • Adjustments to the suppressed exception logic: in the present specification, an incoming primary exception will suppress an Exception thrown by a close method; however, if the close method throws an Error, that Error is propagated out without suppressing an incoming primary exception. Possible alternatives include having a primary exception in a try-with-resources statement suppress all subsequent Throwables originating in the statement and having a non-Exception thrown by a close suppress any incoming primary exception.

    These alternatives could be implemented by replacing the translated code

        try {
          #resource.close();
        } catch(Exception #suppressedException) {
          #primaryException.addSuppressedException(#suppressedException);
        }

    with

        try {
          #resource.close();
        } catch(Throwable #suppressedException) {
          #primaryException.addSuppressedException(#suppressedException);
        }

    or

        try {
          #resource.close();
        } catch(Exception #suppressedException) {
          #primaryException.addSuppressedException(#suppressedException);
        } catch(Throwable #throwable) {
          #throwable.addSuppressedException(#primaryException);
          throw #throwable;
        }

    respectively.

Thursday Jul 08, 2010

JavaOne 2010 Talks Scheduled

My two JavaOne talks this year have now been scheduled:

  • Monday, September 20, 11:30AM, Project Coin: Small Language Changes for JDK 7

    Abstract: Project Coin is an effort to select and implement a set of small language changes to enhance programmer productivity in JDK 7. Project Coin language changes include improved integer literals, strings in switch, and the diamond operator. This session will describe the language changes and demo IDE support. In addition, aspects of the selection process and criteria for general language evolution will be discussed.
  • Tuesday, September 21, 8:00PM, Patents, Copyrights, and TMs: An Intellectual Property Primer for Engineers

    Abstract: Increasingly, software engineers interact with a complicated landscape of intellectual property (IP), from software patents, to various copyright licenses, to trademarks and trade secrets. Formulated in terms familiar to engineers, this session will summarize the different types of intellectual property, with a focus on U.S. law. Copyright terms, criteria for patentability, and freedom to operate will all be discussed with examples. The speaker is not a lawyer (he's an engineer) and this talk (and this abstract) does not constitute legal advice. However, the speaker has been issued several U.S. patents and has studied IP in a number of graduate courses.

Lots to do between now and September; see you in San Francisco!

Tuesday Jul 06, 2010

Project Coin: Bringing it to a Close(able)

As a follow-up to the initial API changes to support automatic resource management (ARM) I wrote an annotation processor, CloseableFinder, to programmatically look for types that were candidates to be retrofitted as Closeable or AutoCloseable.

The processor issues a note for a type that has a public no-args instance method returning void whose name is "close" where the type does not already implement/extend Closeable or AutoCloseable. Based on the exceptions a close method is declared to throw, the processor outputs whether the type is a candidate to be retrofitted to just AutoCloseable or to either of Closeable and AutoCloseable. Which of Closeable and AutoCloseable is more appropriate can depend on the semantics of the close method not captured in its signature. For example, Closeable.close is defined to be idempotent: repeated calls to close have no effect beyond the first. If a close method is defined to not be idempotent, without changing the specification the type can only be correctly retrofitted to AutoCloseable.
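
The heart of such a processor is a scan over the methods of each root element. A rough sketch (not the actual CloseableFinder source; it elides the exception-based distinction between Closeable and AutoCloseable candidates described above):

import java.util.Set;
import javax.annotation.processing.*;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.*;
import javax.lang.model.type.TypeKind;
import javax.lang.model.util.ElementFilter;
import javax.tools.Diagnostic;

@SupportedAnnotationTypes("*")
public class CloseFinderSketch extends AbstractProcessor {
    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latest();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations,
                           RoundEnvironment roundEnv) {
        // Null under JDK 6, where only java.io.Closeable exists.
        TypeElement autoCloseable = processingEnv.getElementUtils()
            .getTypeElement("java.lang.AutoCloseable");
        for (TypeElement type : ElementFilter.typesIn(roundEnv.getRootElements())) {
            if (autoCloseable != null &&
                processingEnv.getTypeUtils().isSubtype(type.asType(),
                                                       autoCloseable.asType()))
                continue; // already retrofitted
            for (ExecutableElement m :
                     ElementFilter.methodsIn(type.getEnclosedElements())) {
                if (m.getSimpleName().contentEquals("close") &&
                    m.getParameters().isEmpty() &&
                    m.getReturnType().getKind() == TypeKind.VOID &&
                    m.getModifiers().contains(Modifier.PUBLIC) &&
                    !m.getModifiers().contains(Modifier.STATIC)) {
                    processingEnv.getMessager().printMessage(Diagnostic.Kind.NOTE,
                        "Candidate for (Auto)Closeable retrofit", type);
                }
            }
        }
        return false;
    }
}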

To use the processor, first compile it and then configure your compiler or IDE to run the processor. The processor can be compiled under JDK 6. Once compiled, it can be run either under JDK 6 or under a JDK 7 build that has the AutoCloseable interface; the processor will configure itself appropriately based on the JDK version it is running under. For javac, the command line to run the processor can look like:

javac -proc:only \
-processor CloseableFinder \
-processorpath Path_to_processor \
List_of_files

A thread on build-dev discusses how to run an annotation processor over the JDK sources; a larger than default heap size may be needed to process all the files in one command. When run over the JDK 7 sources, the processor finds many candidate types to be retrofitted. After consulting with the teams in question, an additional nine types were retrofitted to work with ARM, two in java.beans, two in java.io, one in java.util, and four in javax.sound; these additional retrofittings have been pushed into JDK 7 and will appear in subsequent builds.

Besides the potential updating of JDBC at some point in the future, other significant retrofitting of JDK classes in java.* and javax.* to AutoCloseable/Closeable should not be expected. Unofficial JDK APIs in other namespaces might be examined for retrofitting in the future. The compiler changes to support the ARM language feature remain in progress.

Wednesday Jun 23, 2010

Project Coin: ARM API

The initial API changes to support the Project Coin feature automatic resource management (ARM) blocks have been pushed into JDK 7 (langtools, jdk) and will appear in subsequent builds. The corresponding compiler changes to support the actual language feature remain in progress.

The initial API work to support ARM was divided into two pieces, essential API support and retrofitting platform classes. The essential support includes:

  • A new interface java.lang.AutoCloseable which defines a single method
    void close() throws Exception

  • A new enum constant in the language model:
    javax.lang.model.element.ElementKind.RESOURCE_VARIABLE

  • Methods on java.lang.Throwable to add and retrieve information about suppressed exceptions, including printing out suppressed exceptions in stack traces.

The retrofitting includes:

  • Having java.io.Closeable extend java.lang.AutoCloseable. (From a typing perspective, a subtype of AutoCloseable can be declared to throw fewer exceptions than the supertype. Therefore it is fine for the close method in AutoCloseable to throw Exception and the close method in Closeable to throw the more specific IOException. It would even be fine for the close method in a subtype of AutoCloseable to be declared to throw no exceptions at all; see the sketch after this list.)

  • Adding a close method to java.nio.channels.FileLock and having FileLock implement AutoCloseable.

  • Adding Closeable as an interface implemented by javax.imageio.stream.ImageInputStream.
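
To illustrate the typing point in the first item above (class name hypothetical):

import java.io.Closeable;

// Closeable.close() already narrows the "throws Exception" of
// AutoCloseable.close() to "throws IOException"; a subtype can narrow
// further, down to declaring no checked exceptions at all.
class QuietResource implements Closeable {
    @Override
    public void close() { // no throws clause needed
        // release whatever this resource holds
    }
}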

Other platform classes may be retrofitted to implement AutoCloseable or Closeable in future builds.

Compared to the API support in earlier versions of the ARM proposal, the top-level interface to mark participation in ARM is in package java.lang rather than its own package and, after consultation with the JDBC and graphics teams, neither java.sql.* nor java.awt.Graphics were retrofitted for ARM.

Tuesday Jun 15, 2010

Syntax Sin Tax

In various forums, recent discussions about Project Lambda have commented on, and often noted with dismay, the current syntax for lambda expressions in the initial prototype. "Don't panic!" is advice as valid for work on language evolution as on any other endeavor. Since syntax is the easiest aspect of a language change to form an opinion on, it is the aspect of language changes most susceptible to bikeshedding. While syntax is an important component of language changes, it is far from the only important component; the semantics matter too! Fixation on the syntax of a feature early in its development is premature and counterproductive. Having a prototype to gain actual experience with the feature is more valuable than continued informed analysis and commentary without working code. I believe the point made by a diagram included in a talk on the Project Coin language change process holds for language changes in Java more generally.

While proposing and commenting can be helpful, the effort required to produce a prototype is disproportionately beneficial and the incremental effort of using the prototype has even higher leverage. Experience trumps speculation. And not all efforts lead to positive results; complaining and obstructing alone are rarely helpful contributions.

Just the engineering needed to fully deliver a language change involves many coordinated deliverables even without including documentation, samples, and user guides. A consequence of an open style of development is that changes are pushed early, even if not often, and early changes imply the full fit and finish of a final product will of necessity not be present from the beginning. Long digressions on small issues, syntactical or otherwise, are a distraction from the other work that needs to get done.

True participation in a project means participating in the work of the project. The work of a language change involves much more than just discussing syntax. Once a prototype exists, the most helpful contribution is to use the prototype and report experiences using it.

Wednesday Jun 09, 2010

Project Coin: Inducing contributory heap pollution

US patent law defines various kinds of patent infringement, as do other jurisdictions. (I am not a lawyer! This is not legal advice! Check your local listings! Don't kill kittens! Example being used for analogy purposes only!) One can infringe on a patent directly, say, by making, using, selling, offering to sell, or importing a patented widget without a suitable license. A computer scientist looking to infringe might (erroneously) believe the conditions for infringement can be circumvented by applying the familiar technique of adding a level of indirection. For example, one indirection would be selling 90% of the patented widget, leaving the end-user to complete the final 10% and thereby infringe. Such contributory infringement is also verboten. Likewise, providing step-by-step instructions on how to infringe the patent is outlawed as inducing infringement. Putting both techniques together, inducing contributory infringement is also disallowed.

Starting in JDK 5, a compiler must issue mandatory unchecked warnings at sites of possible heap pollution:

Java Language Specification, Third Edition — §4.12.2.1 Heap Pollution
It is possible that a variable of a parameterized type refers to an object that is not of that parameterized type. This situation is known as heap pollution. This situation can only occur if the program performed some operation that would give rise to an unchecked warning at compile-time.

One case where unchecked warnings occur is a call to a varargs method where the type of the variable argument is not reifiable. That is, where the type information for the parameter is not fully expressible at runtime due to the erasure of generics. Varargs are implemented using arrays and arrays are reified; that is, the component type of an array is stored internally and used when needed for various type checks at runtime. However, the type information stored for an array's component type cannot store the information needed to represent a non-reifiable parameterized type.

The mismatch between reified arrays being used to pass non-reified (and non-reifiable) parameterized types is the basis for the unchecked warnings when such conflicted methods are called. However, in JDK 5, only calling one of these conflicted methods causes a compile-time warning; declaring such a method doesn't lead to any similar warning. This is analogous to the compiler only warning of direct patent infringement, while ignoring or being oblivious to indirect infringement. While the mere existence of a conflicted varargs method does not cause heap pollution per se, its existence contributes to heap pollution by providing an easy way to cause heap pollution to occur and induces heap pollution by offering the method to be called. By this reasoning, if method calls that cause heap pollution deserve a compiler warning, so do method declarations which induce contributory heap pollution.
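
To make the hazard concrete, here is the usual style of heap pollution example (class and method names are illustrative):

import java.util.Arrays;
import java.util.List;

class HeapPollutionDemo {
    // Merely declaring this method induces contributory heap pollution:
    // every call with List<String> arguments forces the creation of a
    // List<String>[] array, which can only exist with erased component type.
    static void faultyMethod(List<String>... lists) {
        Object[] array = lists;          // allowed: arrays are covariant
        List<Integer> bad = Arrays.asList(42);
        array[0] = bad;                  // heap pollution; no ArrayStoreException
        String s = lists[0].get(0);      // ClassCastException at runtime
    }

    public static void main(String... args) {
        faultyMethod(Arrays.asList("hello"), Arrays.asList("world")); // unchecked warning
    }
}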

Additionally, the warnings issued for some calls to varargs methods involving heap pollution are arguably spurious since nothing bad happens. For example, calling various useful helper varargs methods in the platform triggers unchecked warnings, including:

  • java.util.Arrays.asList(T... a)

  • java.util.Collections.addAll(Collection<? super T> c, T... elements)

  • java.util.EnumSet.of(E first, E... rest)

These three methods all iterate over the varargs array pulling out the elements in turn and processing them. If the varargs array is constructed by the compiler using proper type inference, the bodies of the methods won't experience any ClassCastExceptions due to handling of the array's elements. Currently, to eliminate the warnings associated with calling these methods, each call site needs a @SuppressWarnings("unchecked") annotation.

To address these usability issues with varargs, Project Coin accepted simplifying varargs as one of the project's changes. The initial prototype version of this feature, pushed by Maurizio, has several parts:

  • A new mandatory compiler warning is generated on declaration sites of problematic varargs methods that are able to induce contributory heap pollution.

  • The ability to suppress those mandatory warnings at a declaration site using an @SuppressWarnings("varargs") annotation. The warnings may also be suppressed using the -Xlint:-varargs option to the compiler.

  • If the @SuppressWarnings("varargs") annotation is used on a problematic varargs method declaration, the unchecked warnings at call sites of that method are also suppressed.

This prototype will allow experience to be gained with the algorithms to detect and suppress the new mandatory warnings. However, the annotation used to suppress the warnings should be part of the varargs method's contract, denoting that when a compiler-constructed array is passed to the method nothing bad will happen, for a suitable definition of nothing bad. Therefore, an @Documented annotation type needs to be used for this purpose and SuppressWarnings is not @Documented. Additionally, the suppressing annotation for varargs should also be @Inherited so the method implementation restrictions are passed on to subclasses.

Subsequent design discussions about the new annotation type with the properties in question to suppress the varargs warnings as well as criteria for the annotation to be correctly applied can occur on the Project Coin mailing list.

Monday May 10, 2010

JavaOne 2010 Talks Accepted!

I was happy to be notified today that I have two JavaOne talks accepted for this year. The first is a session on Project Coin: Small Language Changes for JDK 7 in the core Java platform track. When I spoke at JavaOne 2009 about Project Coin, the feature selection wasn't yet finalized and I covered some of the general concerns that influence language evolution and spoke on feature selection methodology. Now that additional Coin features are becoming available in JDK 7 builds, this year I expect to focus more on experiences implementing and using the new features.

My second accepted talk is a bof on Patents, Copyrights, and TMs: An Intellectual Property Primer for Engineers over in the Java Frontier track. I've wanted to put together a talk on this topic for several years to condense what I've learned from taking a few classes on intellectual property and filing (and eventually getting issued) several patents.

See you in San Francisco in a few short months!

Monday May 03, 2010

Project Coin: multi-catch and final rethrow

As alluded to as a possibility previously, I'm happy to announce that improved exception handling with multi-catch and final rethrow will be part of an upcoming JDK 7 build. Improved exception handling is joining other Project Coin features available in the repository after successful experiences with a multi-catch implementation developed by Maurizio Cimadamore.

Maurizio's work also revealed and corrected a flaw in the originally proposed static analysis for the set of exceptions that can be rethrown; from the original proposal form for this feature:

[a] final catch parameter is treated as throwing precisely those exception types that

  • the try block can throw,
  • no previous catch clause handles, and
  • is a subtype of one of the types in the declaration of the catch parameter

Consider a final rethrow statement as below where the dynamic class of a thrown exception differs from the static type (due to a cast in this case):

class Neg04 {
  static class A extends Exception {}
  static class B extends Exception {}

  void test(boolean b1, boolean b2) throws B {
      try {
          if (b1) {
              throw new A();
          } else if (b2) {
              throw new B();
          } else {
              throw (Throwable)new Exception();
          }
      }
      catch (A e) {}
      catch (final Exception e) {
          throw e;
      }
      catch (Throwable t) {}
  }
}

The set of exceptions thrown by the try block is computed as {A, B, Throwable}; therefore, the set of exceptions that can be rethrown is the set of exceptions from the try block:

  1. minus A, handled by a previous catch clause, giving {B, Throwable}
  2. minus Throwable since Throwable is not a subtype of one of the types declared for the catch parameter (just Exception in this case), leaving only {B}

However, if an Exception is thrown from the try block it should be caught in the "catch(final Exception e)" clause even if the exception is cast to Throwable since catch clauses work based on the runtime class of the exceptions being thrown.

To address this, the third clause is changed to

  • is a subtype/supertype of one of the types in the declaration of the catch parameter

More formally, this clause covers computing a join over the set of thrown exceptions, eliminating subtypes. In the example above {Throwable} is computed as the set of exceptions being throwable from the try block. This is then intersected with the exceptions that can be caught by the catch block, resulting in {Exception}, a properly sound result.

Very general exception types being thrown by a try block would reduce the utility of multi-catch since only imprecise information would be available. Fortunately, from analyzing the JDK sources, throwing a statically imprecise exception seems rare, indicating multi-catch with the amended specification should still be very helpful in practice.

Today the adventurous can apply a changeset to a copy of the JDK 7 langtools repository and do a build to get a compiler supporting this feature. Otherwise, following the integration process, improved exception handling will appear in the promoted JDK 7 builds in due course.

Thursday Feb 25, 2010

Notions of Floating-Point Equality

Moving on from identity and equality of objects, different notions of equality are also surprisingly subtle in some numerical realms.

As comes up from time to time and is often surprising, the "==" operator defined by IEEE 754 and used by Java for comparing floating-point values (JLSv3 §15.21.1) is not an equivalence relation. Equivalence relations satisfy three properties, reflexivity (something is equivalent to itself), symmetry (if a is equivalent to b, b is equivalent to a), and transitivity (if a is equivalent to b and b is equivalent to c, then a is equivalent to c).

The IEEE 754 standard defines four possible mutually exclusive ordering relations between floating-point values:

  • equal

  • greater than

  • less than

  • unordered

A NaN (Not a Number) is unordered with respect to every floating-point value, including itself. This was done so that NaNs would not quietly slip by without due notice. Since (NaN == NaN) is false, the IEEE 754 "==" relation is not an equivalence relation since it is not reflexive.

An equivalence relation partitions a set into equivalence classes; each member of an equivalence class is "the same" as the other members of the class for the purposes of that equivalence relation. In terms of numerics, one would expect equivalent values to result in equivalent numerical results in all cases. Therefore, the size of the equivalence classes over floating-point values would be expected to be one; a number would only be equivalent to itself. However, in IEEE 754 there are two zeros, -0.0 and +0.0, and they compare as equal under ==. For IEEE 754 addition and subtraction, the sign of a zero argument can at most affect the sign of a zero result. That is, if the sum or difference is not zero, a zero of either sign doesn't change the result. If the sum or difference is zero and one of the arguments is zero, the other argument must be zero too:

  • -0.0 + -0.0 ⇒ -0.0

  • -0.0 + +0.0 ⇒ +0.0

  • +0.0 + -0.0 ⇒ +0.0

  • +0.0 + +0.0 ⇒ +0.0

Therefore, under addition and subtraction, both signed zeros are equivalent. However, they are not equivalent under division since 1.0/-0.0 ⇒ -∞ but 1.0/+0.0 ⇒ +∞, and -∞ and +∞ are not equivalent.[1]

Despite the rationales for the IEEE 754 specification to not define == as an equivalence relation, there are legitimate cases where one needs a true equivalence relation over floating-point values, such as when writing test programs, and cases where one needs a total ordering, such as when sorting. In my numerical tests I use a method that returns true for two floating-point values x and y if:
((x == y) &&
(if x and y are both zero they have the same sign)) ||
(x and y are both NaN)
Conveniently, this is just computed by using (Double.compare(x, y) == 0). For sorting or a total order, the semantics of Double.compare are fine; NaN is treated as the largest floating-point value, greater than positive infinity, and -0.0 < +0.0. That ordering is the total order used by java.util.Arrays.sort(double[]). In terms of semantics, it doesn't really matter where the NaNs are ordered with respect to other values as long as they are consistently ordered that way.[2]
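
A few lines of code make the difference between == and Double.compare concrete:

double nan = Double.NaN;
System.out.println(nan == nan);                    // false: NaN is unordered
System.out.println(Double.compare(nan, nan) == 0); // true: an equivalence relation
System.out.println(-0.0 == +0.0);                  // true under IEEE 754 ==
System.out.println(Double.compare(-0.0, +0.0));    // negative: -0.0 sorts first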

These subtleties of floating-point comparison were also germane on the Project Coin mailing list last year; the definition of floating-point equality was discussed in relation to adding support for relational operations based on a type implementing the Comparable interface. That thread also broached the complexities involved in comparing BigDecimal values.

The BigDecimal class has a natural ordering that is inconsistent with equals; that is, for at least some inputs bd1 and bd2,
bd1.compareTo(bd2) == 0
has a different boolean value than
bd1.equals(bd2).[3]
In BigDecimal, the same numerical value can have multiple representations, such as (100 × 10⁰) versus (10 × 10¹) versus (1 × 10²). These are all "the same" numerically (compareTo == 0) but are not equals with each other. Such values are not equivalent under the operations supported by BigDecimal; for example (100 × 10⁰) has a scale of 0 while (1 × 10²) has a scale of -2.[4]
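
The representation sensitivity of BigDecimal.equals is easy to demonstrate:

import java.math.BigDecimal;

BigDecimal a = new BigDecimal("100"); // unscaled value 100, scale 0
BigDecimal b = new BigDecimal("1e2"); // unscaled value 1, scale -2
System.out.println(a.compareTo(b));   // 0: numerically equal
System.out.println(a.equals(b));      // false: different representations
System.out.println(a.scale() + " " + b.scale()); // 0 -2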

While subtle, the different notions of numerical equality each serve a useful purpose and knowing which notion is appropriate for a given task is an important factor in writing correct programs.


[1] There are two zeros in IEEE 754 because there are two infinities. Another way to extend the real numbers to include infinity is to have a single (unsigned) projective infinity. In such a system, there is only one conceptual zero. Early x87 chips before IEEE 754 was standardized had support for both signed (affine) and projective infinities. Each style of infinity is more convenient for some kinds of computations.

[2] Besides the equivalence relation offered by Double.compare(x, y), another equivalence relation can be induced by either of the bitwise conversion routines, Double.doubleToLongBits or Double.doubleToRawLongBits. The former collapses all bit patterns that encode a NaN value into a single canonical NaN bit pattern, while the latter can let through a platform-specific NaN value. Implementation freedoms allowed by the original IEEE 754 standard have allowed different processor families to define different conventions for NaN bit patterns.

[3] I've at times considered whether it would be worthwhile to include an "@NaturalOrderingInconsistentWithEquals" annotation in the platform to flag the classes that have this quirk. Such an annotation could be used by various checkers to find potentially problematic uses of such classes in sets and maps.

[4] Building on wording developed for the BigDecimal specification under JSR 13, when I was editor of the IEEE 754 revision, I introduced several pieces of decimal-related terminology into the draft. Those terms include preferred exponent, analogous to the preferred scale from BigDecimal, and cohort, "The set of all floating-point representations that represent a given floating-point number in a given floating-point format." Put in terms of BigDecimal, the members of a cohort would be all the BigDecimal numbers with the same numerical value, but distinct pairs of scale (negation of the exponent) and unscaled value.

Thursday Feb 11, 2010

Project Coin: Taking a Break for Strings in Switch

The initial way a string switch statement was implemented in JDK 7 was to desugar a string switch into a series of two switch statements, the first switch mapping from the argument string's hash code to the ordinal position of a matching string label followed by a second switch mapping from the computed ordinal position to the code to be executed. Before this approach was settled on, Jon, Maurizio, and I had extensive discussions about alternative implementation techniques. One approach from Maurizio we seriously considered using employed labeled break statements (in lieu of unavailable goto statements) to allow a string switch to be desugared into a single integer switch statement. In this approach as well, the integer switch was built around the strings' hash codes.
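
As a sketch of the two-switch desugaring that was ultimately chosen (the synthetic $idx variable and the fooAction/barAction methods are illustrative, not the actual generated names):

// Original:
switch (s) {
case "foo": fooAction(); break;
case "bar": barAction(); break;
}

// Desugared (sketch):
int $idx = -1;
switch (s.hashCode()) {       // first switch: hash code -> ordinal position
case 101574:                  // "foo".hashCode()
    if (s.equals("foo")) $idx = 0;
    break;
case 97299:                   // "bar".hashCode()
    if (s.equals("bar")) $idx = 1;
    break;
}
switch ($idx) {               // second switch: ordinal position -> original code
case 0: fooAction(); break;
case 1: barAction(); break;
}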

One kind of complication in desugaring string switch statements stems from irregular control flow, such as when control transfers to one label, code is executed, and then control falls through to the code under the next label rather than exiting the switch statement after the initial code execution. When using hash codes to identify the string being switched on, another class of complications stems from dealing with the possibility of hash collisions, the situation where two distinct strings have the same hash code. A string can be constructed to have any integer hash code, so collisions are always a possibility. Since many strings have the same hash code, it is not sufficient to verify that the string being switched on has the same hash value as a string case label; the string being switched on must be checked for equality with the case label string. Furthermore, when two string case labels have the same hash value, a string being switched on with a matching hash code must potentially be checked for equality against both case labels.

While relying on hash codes to implement string switch is contentious with some, the hashing algorithm of java.lang.String is an extremely stable part of the platform and there would be too much behavioral compatibility risk in changing it. Therefore, the stability of the algorithm can be relied on as a resource in possible string switch implementations. Switching on the hash code in the desugaring confers a number of benefits. First, most immediately the hash code maps the string to an integer value, matching the type required for the existing switch statement. Second, switching on the hash code of a string bounds the worst case behavior. The simplest way to see if a chosen string is in a set of other strings, such as the set of string case labels, would be to compare the chosen string to each of the strings in the set. This could be expensive since the chosen string would need to be traversed many times, potentially once for each case label. The hash code of a string is typically cached after it is first computed. Therefore, when switching on the hash code, the chosen string is not expected to be traversed more than twice (once to compute the hash code if not cached, again to compare against strings from the set of strings with the same hash value — a set usually with only one element).

If instead of a series of two desugared switch statements, only a single switch statement were desired in the desugaring, extra synthetic state variables could be used to contend with hash collisions, fall-through control flows, and default cases, as described in the Project Coin strings in switch proposal. A goto construct could be used to eliminate state variables, but goto is neither available in the source language nor in javac's intermediate representation. However, by a novel use of nested labeled breaks, a single switch statement can be used in the desugaring without introducing additional synthetic control variables.

Consider the strings switch statement in the method f below

static void f(String s) { // Original sugared code
  switch (s) {
    case "azvl":
      System.out.println("azvl: "+s); // fallthrough
    case "quux":
      System.out.println("Quux: "+s); // fallthrough
    case "foo":
      int i = 5; //fallthrough
    case "bar":
      System.out.println("FooOrBar " + (i = 6) + ": "+s);
      break;
    case "bmjrabc": // same hash as "azvl"
      System.out.println("bmjrabc: "+s);
      break;
    case "baz":
       System.out.println("Baz " + (i = 7) + ": "+s); // fallthrough
    default:
      System.out.println("default: "+s);
  }
}

and the following desugaring procedure. Create a labeled block to enclose the entire switch statement. Within that enclosing block, create a series of nested blocks, one for each case label, including a default option, if any. In the innermost block, have a switch statement based on the hash code of the strings in the original case labels. For each hash value present in the set of case labels, have an if-then-else chain comparing the string being switched on to the cases having that hash value, breaking to the corresponding label if there is a match. If a match does not occur and the original switch has a default option, a break should transfer control to the label for the default case; if the original switch does not have a default option, a break should occur to the switch exit label.

If a hash value only corresponds to a single case label, the sense of the equality/inequality comparison in the desugared code can be tuned for branch prediction purposes. After the block for a case label is closed, the code for that alternative appears. In the original switch code, there are two normal completion paths of interest: the code for an alternative is run and execution falls through to the next alternative, or there is an unlabeled break to exit the switch. In the desugaring, these paths are represented by execution falling through to code for the next alternative and by a labeled break to the label synthesized for the switch statement exit. The preservation of fall-through semantics is possible because the code interspersed in the nested labeled statements appears in the same textual order as in the original "sugared" string switch. Local variables can be declared in the middle of a switch block. In desugared code, such variable declarations are hoisted out to reside in the block for the entire switch statement; the declaration of the variable and its uses are then renamed with a synthetic name to avoid changing the meaning of names in other scopes. Sample results of this procedure are shown below.

static void f(String s) { // Desugared code
  $exit: {
    int i$foo = 0;
    $default_label: {
      $baz: {
        $bmjrabc: {
          $bar: {
            $foo: {
              $quux: {
                $azvl: {
                  switch(s.hashCode()) { // cause NPE if s is null
                    case 3010735: // "azvl" and "bmjrabc".hashCode()
                      if (s.equals("azvl"))
                        break $azvl;
                      else if (s.equals("bmjrabc"))
                        break $bmjrabc;
                      else
                        break $default_label;
                    case 3482567: // "quux".hashCode()
                      if (!s.equals("quux")) // inequality compare
                        break $default_label;
                      break $quux;
                    case 101574: // "foo".hashCode()
                      if (s.equals("foo")) // equality compare
                        break $foo;
                      break $default_label;
                    case 97299:  // "bar".hashCode()
                      if (!s.equals("bar"))
                        break $default_label;
                      break $bar;
                    case 97307: // "baz".hashCode()
                      if (!s.equals("baz"))
                        break $default_label;
                      break $baz;
                    default:
                      break $default_label;
                  }//switch
                }//azvl
                System.out.println("azvl: "+s); // fallthrough
              } //quux
              System.out.println("Quux: "+s); // fallthrough
            } //foo
            i$foo = 5;
          }//bar
          System.out.println("FooOrBar " + (i$foo = 6) + ": "+s);
          break $exit;
        }//bmjrabc
        System.out.println("bmjrabc: " + s);
        break $exit;
      } //baz
      System.out.println("Baz " + (i$foo = 7) + ": "+s); // fallthrough
    }//default_label
    System.out.println("default: "+s);
  }//exit
}

While the series of two switches and the labeled break-based desugaring were both viable alternatives, we chose the series of two switches since that transformation seemed more localized and straightforward. The two-switch solution also has simpler interactions with debuggers. If string switches become widely used, profiling information can guide future engineering efforts to optimize their performance.

Monday Nov 30, 2009

Project Coin: Post-Devoxx Update, closures and exception handling

As has been covered in various places, including threads on the coin-dev mailing list, Mark Reinhold made several announcements about JDK 7 at this year's Devoxx:

  1. JDK 7 will have a form of closures.

  2. The JDK 7 schedule is being extended to fall 2010.

On the first announcement, the coin-dev list is not the appropriate forum to discuss closures in Java. Closures are hereby decreed as off-topic for coin-dev.

Mark's blog entry "Closures for Java" invites those with an informed opinion to participate in the current discussion; watch Mark's blog for news about creation of a new list or project, etc., to host this closures effort.

On the second announcement, while the JDK 7 schedule has been extended, many of the current final five (or so) Project Coin features have not yet been fully implemented, specified, and tested. Therefore, there will not be a general reassessment of Project Coin feature selection or another call for proposals in JDK 7. The final five (or so) proposals remain selected for inclusion in JDK 7 and work will continue to complete those features. However, given its technical merit and the possibility of providing useful infrastructure for ARM, improved exception handling is now being reconsidered for inclusion in JDK 7. No other "for further consideration" proposal is under reconsideration.

Wednesday Nov 18, 2009

Project Coin at Devoxx 2009

Wednesday evening local time I gave a talk about Project Coin at Devoxx 2009. The video of the talk will be available on Parleys in due course; in the meantime, my slides are online. Tempting the wrath of the demo gods, I enjoyed showing the support Project Coin features have today in a developer build of NetBeans.

Project Coin: Milestone 5 Language Features in NetBeans

To go along with the language changes available in JDK 7 milestone 5, the NetBeans team has created a developer build of NetBeans supporting the same set of language changes, including improved integer literals, the diamond operator, and strings in switch.

In addition to just accepting the new syntax, the NetBeans build has some deeper support too. For example, when auto-completing on a constructor with type arguments, the diamond operator is offered as a completion. To see what bounds were computed for a diamond instance, you can just hit Ctrl and click on the constructor; the bounds will appear in a pop-up. To encourage use of strings in switch, NetBeans will recognize a pattern like if (s.equals("foo")) {...} else if (s.equals("bar")) ... and offer to transform the chain into a string switch; just click on the "Convert to switch" lightbulb on the left-hand side. A sketch of the transformation appears below.
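To illustrate, here is a minimal sketch of the kind of rewrite the hint performs; handleFoo and handleBar are hypothetical placeholders, not methods from NetBeans itself:

// Before: an equality chain on a string
if (s.equals("foo")) {
    handleFoo();
} else if (s.equals("bar")) {
    handleBar();
}

// After applying the "Convert to switch" hint
switch (s) {
    case "foo":
        handleFoo();
        break;
    case "bar":
        handleBar();
        break;
}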

This NetBeans build with Coin support is based on NetBeans 6.8, but when 6.8 ships later this year it will not include support for Project Coin features. Project Coin support will be included in subsequent NetBeans releases.

Wednesday Nov 04, 2009

Project Coin: Anatomy of adding strings in switch to javac

Earlier this week, I happily pushed an implementation of Project Coin's strings in switch language feature into a repository being used for JDK 7 milestone 5. JDK 7 binaries with strings in switch will be available for downloading in due course.

The javac compiler uses the standard compiler architecture of successive phases, and adding strings in switch required modifications to several links in the chain of phases.

Since JDK 1.4.x, javac has been multiple compilers in one, as it supports multiple source settings to specify which version of the language to use. (But when cross-compiling to an older language version, make sure to set the bootclasspath!) Information about which features are supported in a given version is encapsulated in javac's class com.sun.tools.javac.code.Source. When other parts of the compiler need to query whether a feature is supported, they call a boolean method like Source.allowFeature(). Strings in switch added another such method to this class; in diff notation:

+    public boolean allowStringsInSwitch() {
+        return compareTo(JDK1_7) >= 0;
+    }

The text of a source file being compiled is first turned into an abstract syntax tree (AST). The lexing and parsing processes which create the trees only verify the syntactic structure of the code. Semantic checks are done in subsequent phases of the compiler; many of those checks are done during "attribution" by code in com.sun.tools.javac.comp.Attr. For switch statements, the constant-ness of case labels, having no repeated labels, and the labels' types matching the type of the expression being switched on are all handled in Attr. Attr also enforces the type restrictions on the switch expression; the expression can have an integer type, an enum type (as of JDK 5), or, now, the String type (as of JDK 7).

Because the trees in javac were sufficiently abstract, modifying Attr to support strings in switch was very straightforward. Most of the code was just for adding the allowStringsInSwitch check and generating an error message specific to a non-constant string. As shown in the patch below, no code implementing the actual semantic checks for non-repeated labels, etc. had to be adjusted to support strings in switch; all the existing checking logic was reused without modification.

--- a/src/share/classes/com/sun/tools/javac/comp/Attr.java	Thu Aug 27 13:40:48 2009 +0100
+++ b/src/share/classes/com/sun/tools/javac/comp/Attr.java	Mon Nov 02 21:36:59 2009 -0800
@@ -115,6 +115,8 @@ public class Attr extends JCTree.Visitor
         allowBoxing = source.allowBoxing();
         allowCovariantReturns = source.allowCovariantReturns();
         allowAnonOuterThis = source.allowAnonOuterThis();
+        allowStringsInSwitch = source.allowStringsInSwitch();
+        sourceName = source.name;
         relax = (options.get("-retrofit") != null ||
                  options.get("-relax") != null);
         useBeforeDeclarationWarning = options.get("useBeforeDeclarationWarning") != null;
@@ -166,6 +168,16 @@ public class Attr extends JCTree.Visitor
      * API warnings.
      */
     boolean enableSunApiLintControl;
+
+    /**
+     * Switch: allow strings in switch?
+     */
+    boolean allowStringsInSwitch;
+
+    /**
+     * Switch: name of source level; used for error reporting.
+     */
+    String sourceName;
 
     /** Check kind and type of given tree against protokind and prototype.
      *  If check succeeds, store type in tree and return it.
@@ -886,7 +898,15 @@ public class Attr extends JCTree.Visitor
         boolean enumSwitch =
             allowEnums &&
             (seltype.tsym.flags() & Flags.ENUM) != 0;
-        if (!enumSwitch)
+        boolean stringSwitch = false;
+        if (types.isSameType(seltype, syms.stringType)) {
+            if (allowStringsInSwitch) {
+                stringSwitch = true;
+            } else {
+                log.error(tree.selector.pos(), "string.switch.not.supported.in.source", sourceName);
+            }
+        }
+        if (!enumSwitch && !stringSwitch)
             seltype = chk.checkType(tree.selector.pos(), seltype, syms.intType);
 
         // Attribute all cases and
@@ -909,7 +929,8 @@ public class Attr extends JCTree.Visitor
                     Type pattype = attribExpr(c.pat, switchEnv, seltype);
                     if (pattype.tag != ERROR) {
                         if (pattype.constValue() == null) {
-                            log.error(c.pat.pos(), "const.expr.req");
+                            log.error(c.pat.pos(),
+                                      (stringSwitch ? "string.const.req" : "const.expr.req"));
                         } else if (labels.contains(pattype.constValue())) {
                             log.error(c.pos(), "duplicate.case.label");
                         } else {

Two new error messages were added for strings in switch. The first reports that a constant string expression is required rather than just a constant expression. The second reports that a string switch statement has been found in a source level where that feature is not allowed. The javac convention for this kind of error message is to report the source level being used in the current compile and the source level where the structure starts being supported:

+compiler.err.string.const.req=\
+    constant string expression required

+compiler.err.string.switch.not.supported.in.source=\
+    strings in switch are not supported in -source {0}\n\
+(use -source 7 or higher to enable strings in switch)

The bulk of the compiler changes to support strings in switch were made in com.sun.tools.javac.comp.Lower, the compiler phase which translates away syntactic sugar, lowering structures to trees implementing the formerly sugar-coated functionality. For example, Lower already had a method to translate enum switches into a switch on an integer value retrieved from an enum → int map. The initial strings in switch implementation uses a similar technique: a single string switch in source code is lowered into a series of two switches. The first switch is a new synthesized switch on the string's hash code, which gets mapped to a label's ordinal position in the list of case statements. The second switch is structurally identical to the original string switch from the source except that the string case labels are replaced by integer positions and the computed position from the synthesized switch is the expression being switched on. A sketch of this lowering is shown below.
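As an illustrative sketch, not javac's literal output, a simple string switch might be lowered along these lines; the synthetic names tmp$ and g$desugared are hypothetical:

static void g(String s) { // Original code
    switch (s) {
        case "foo": System.out.println("got foo"); break;
        case "bar": System.out.println("got bar"); break;
        default:    System.out.println("neither");
    }
}

static void g$desugared(String s) { // Sketch of the lowered form
    int tmp$ = -1; // synthetic position variable
    switch (s.hashCode()) { // first switch: hash code -> ordinal position
        case 101574: // "foo".hashCode()
            if (s.equals("foo")) tmp$ = 0;
            break;
        case 97299:  // "bar".hashCode()
            if (s.equals("bar")) tmp$ = 1;
            break;
    }
    switch (tmp$) { // second switch: structurally like the original
        case 0:  System.out.println("got foo"); break;
        case 1:  System.out.println("got bar"); break;
        default: System.out.println("neither");
    }
}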

Discussion of various design issues and implementation alternatives can be found in the full code changes in Lower; see the webrev of the changeset for details. The chained series of switch statements used to implement strings in switch is a refinement of the implementation approaches outlined in the original Project Coin strings in switch proposal.

Various smaller implementation points about the code in Lower are worth noting too.

A pair of maps is built to hold information about translating string case labels → positions and hash codes → case labels with that hash; a sketch of their construction follows below. In order to make the compiler's output predictable and repeatable, the maps and sets used in these data structures are LinkedHashMaps and LinkedHashSets rather than just HashMaps and HashSets. In terms of functional correctness of the code generated during a given compile, using HashMap and HashSet would be fine; the iteration order does not matter. However, we find it beneficial to have javac's output not vary based on implementation details of system classes.
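A minimal sketch of building such maps, assuming java.util.* is imported; the variable names are hypothetical and javac's actual code in Lower differs in detail:

// caseLabels: the string case labels in source order
List<String> caseLabels = Arrays.asList("foo", "bar", "baz");

Map<String, Integer> labelToPosition = new LinkedHashMap<String, Integer>();
Map<Integer, Set<String>> hashToLabels = new LinkedHashMap<Integer, Set<String>>();

int position = 0;
for (String label : caseLabels) {
    labelToPosition.put(label, position++);     // label -> ordinal position
    Set<String> sameHash = hashToLabels.get(label.hashCode());
    if (sameHash == null) {
        sameHash = new LinkedHashSet<String>(); // insertion order preserved
        hashToLabels.put(label.hashCode(), sameHash);
    }
    sameHash.add(label);                        // hash -> labels with that hash
}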

The code synthesis in Lower uses a compiler-internal tree maker API. The resulting trees are unprocessed; the various consistency properties and derived information calculated by earlier compiler phases are not automatically established. The code in Lower is responsible for generating correct code and computing the necessary derived information; if this is not done correctly, later phases like Gen, which translates trees into bytecode, can fail. For example, trees have an associated source position, which is preserved in debugging information; the position for the initial synthesized switch is set to the source position of the start of the original switch statement. The tree node for a break statement stores information about the target of the break, which is another tree node. Therefore, break statements synthesized as part of the initial switch need this target information to be set, and any break nodes from the original switch statement need to have their targets reset to the newly generated enclosing switch in which integer labels have replaced the string ones. This target-patching logic gets thoroughly exercised by the tests with nested strings in switch statements!

The new tests verify that the proper structural checks are enforced for string switches, as well as that the proper execution paths are taken on different inputs for switches with a variety of control flow shapes, including multiple case labels, case labels with colliding hash codes, and nested switches.

Now that strings in switch are specified, implemented, and tested, I'm looking forward to working on the remaining Project Coin features on the final acceptance list.

Wednesday Sep 23, 2009

Java Posse #279: A View from an Eggshell-Colored Clock Tower

Dick Wall and the rest of the Java Posse graciously offered to interview Alex Buckley and me one evening at the JVM Language Summit last week. Our discussion about general evolution of the Java programming language, including Project Coin and reactions to previous podcasts and Google group threads, is now available as episode #279 of the Java Posse podcast.

Monday Sep 14, 2009

Java Posse #277 Feedback: Still not a view from an ivory tower

A follow-up entry responding to Dick Wall's Google Group post about my earlier reaction to the Java language evolution and management concerns raised in the first twenty minutes of episode #277 of the Java Posse podcast.


Anyone can have an opinion. Having an informed opinion takes some effort. Implementing the conclusions of an informed opinion can take considerably more effort.

Project Coin != http://bugreport.sun.com/bugreport/

For many years, people with ideas for language changes (and other matters) have been welcome to submit them to bugs.sun.com; nothing more than a rough idea is expected. These ideas are evaluated, and there are well over 100 open "requests for enhancement" (rfes) related to language changes. I reviewed all of these open ideas before embarking on Project Coin. Many other submitted language rfes have been considered and subsequently closed.

Project Coin explicitly offered a different social contract than bugs.sun.com; beyond just a vague idea, contributors were invited to participate in the work of bringing a language change into the platform. To be clear, the abundance of open language rfes means an additional idea for language change in and of itself has essentially no value. Instead, the coin of the realm in Project Coin was the analysis of the impacts and benefits of a language change and code for a prototype implementation. Those are valuable because they are essential components of the work needed to bring a language change to the platform. The Project Coin Proposal form [1] guided the analysis and the OpenJDK langtools repository gave a starting point for a compiler prototype. People could collaborate on different aspects of a proposal and the Project Coin list explicitly made such requests for assistance on-topic. The recommendation for prototypes was not made for punitive purposes; rather it was made so that more accurate information could be gathered about the language change. Quoting a comment left by Mark Mahieu on my blog [2]:

"By the time I posted a link to a runnable prototype [of the Elvis operator] to the list, my own understanding was much, much clearer, and very different; after delving into the real detail I'd come to the conclusion that it's not a good enough fit (for Java) in the form proposed."

In contrast to this productive discourse, consider the brouhaha, left in comments on my blog, [3] over not including multi-catch in the Coin final five. My message announcing the final five makes clear that this decision was made based on resourcing concerns rather than the merits of the idea itself. Not one of the people leaving comments full of wailing and gnashing of teeth about the omission offered to do anything to help implement the feature.

It is far easier to impose demands than to satisfy them. When there is no "cost connection" between those imposing demands and those satisfying them, ridiculous expectations can result, such as this individual [4] whose series of requests to jdk7-dev in June I will paraphrase:

"Hi. I'm some random Java developer with admittedly little technical expertise [5] and no money. I read one blog entry written by Neal Gafter several years ago [6]; despite my lack of money and admitted lack of technical expertise, I think reading that single blog entry written by someone else should imbue me with enough authority to dictate [7] how other people should allocate their resources to work on the cool Java language changes I personally want to see."

I have exactly zero respect for this line of thinking and see no reason to tolerate it.

If someone says he doesn't know what he is talking about, I believe him. I also take the next logical step of not giving much credence to his conclusions and demands.

No one is stopping this fellow or any other interested person from taking a compiler course, downloading the OpenJDK sources to javac, reading the considerable programming language literature on generics, and working on an implementation of reified generics or some other language variant. (The careful reader will note that Microsoft's papers describing reified generics in the CLR emphasize how fast they were able to make List<int> go and do not focus on the performance penalty paid by List<Integer> because of the extra level of indirection between an object and its dispatch table.)

On the subject of listening to developers' ideas for changes, I posted some related thoughts to the coin list back in July [8]:

"Design by committee" is often derided as an inappropriate way to manage technical projects. Simple polling about technical issues is design by committee where the committee can be arbitrarily large and any pretense of expertise in the area is removed. Therefore, polling would be a poor way to drive specific technical decisions about the Java Programming Language. One of the benefits of working in a technical field is that technical problems often have right answers, regardless of how many people agree or disagree with them.

This is not intended to be a slight against Java programmers who contributed suggestions informed by their daily experiences with the language to the Coin alias, to Sun's bug database, or elsewhere. Rather, it is a recognition that, just as being a player and being a coach are distinct roles requiring distinct skills, using the language and evolving it are distinct tasks with related but separate skill sets. Polling can provide a good indication of what pain points people are experiencing; that is an important input, but only one kind of input, into how the language should change.

Most Java programmers do not need to be language experts. This is very much a feature and not a bug of the Java ecosystem. Not having to be a language expert means Java programmers have time to be experts about other things :-)

Moreover, the responsibilities of stewardship include preserving the conceptual integrity of the platform, which does not necessarily follow from point decisions.

I don't understand who is being accused of preventing or impeding use of, say, Scala. For its part, Sun has encouraged experimentation with Java language changes and has funded work to improve the support of non-Java languages on top of the JVM too. Lack of corporate sponsorship of particular other languages should certainly not be equated with impeding their use. Conversely, Java developers are in no way obliged to participate in Project Coin or OpenJDK activities. However, if the extent of a developer's interaction with those working on the language is leaving childish comments on blogs, don't expect to have much influence over the results.

[1] Project Coin: Small Language Change Proposal Form Available
[2] http://blogs.sun.com/darcy/entry/project_coin_final_five#comment-1252023525000
[3] http://blogs.sun.com/darcy/entry/project_coin_final_five#comments
[4] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000666.html
[5] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000704.html
[6] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000675.html
[7] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000686.html
[8] http://mail.openjdk.java.net/pipermail/coin-dev/2009-July/002120.html

Thursday Sep 10, 2009

Unhashing a String

My work on the strings in switch implementation has gotten swapped back in. The implementation is following the translation strategy outlined in the strings in switch proposal: find a perfect hash function for the input strings and semantically replace case "Foo": with case hash("Foo"): (along with some additional checks and logic), where hash("Foo") is computed as a constant by the compiler; a sketch of this replacement is shown below.
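As an illustrative sketch, assuming String.hashCode is used as the hash and an equals check guards against collisions (doSomething is a hypothetical placeholder, not code from the actual implementation):

// Original code
switch (s) {
    case "Foo":
        doSomething();
        break;
}

// Conceptual translation: 70822 == "Foo".hashCode(), folded to a
// constant at compile time; the equals guard handles hash collisions.
switch (s.hashCode()) {
    case 70822:
        if (s.equals("Foo")) {
            doSomething();
        }
        break;
}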

With this approach, to write the regression tests it is helpful to be able to construct a string with a given hash value to force collisions and test the alternate logic, which led me to write the "unhash" method below to create a string with a given hash code:

/**
 * Returns a string with a hash equal to the argument.
 * @return string with a hash equal to the argument.
 */
public static String unhash(int target) {
    StringBuilder answer = new StringBuilder();
    if (target < 0) {
        // String with hash of Integer.MIN_VALUE, 0x80000000
        answer.append("\u0915\u0009\u001e\u000c\u0002");

        if (target == Integer.MIN_VALUE)
            return answer.toString();
        // Find target without sign bit set
        target = target & Integer.MAX_VALUE;
    }

    unhash0(answer, target);
    return answer.toString();
}

private static void unhash0(StringBuilder partial, int target) {
    int div = target / 31;
    int rem = target % 31;

    if (div <= Character.MAX_VALUE) {
        if (div != 0)
            partial.append((char)div);
        partial.append((char)rem);
    } else {
        unhash0(partial, div);
        partial.append((char)rem);
    }
}
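
For example, a quick usage sketch forcing a collision with a known label (the variable name is hypothetical; the exact characters of the returned string are unimportant, only its hash):

String collider = unhash("quux".hashCode());
System.out.println(collider.hashCode() == "quux".hashCode()); // true: same hash
System.out.println(collider.equals("quux"));                  // false: different characters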

The algorithm for hashing strings multiplies the old hash value by 31 and adds in the integer value of the next character:

int h = 0;
for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];
}

The unhash method works in reverse; for the non-negative values handled by unhash0, divide the target value by 31:

  • if the quotient is at most Character.MAX_VALUE, the quotient and remainder can be fully captured using at most two characters.

  • if the quotient is greater than Character.MAX_VALUE, unhash the quotient recursively and append to that string a character for the remainder.

With some additional care, negative values can reuse the same process. The key observation is that if a string hashes to Integer.MIN_VALUE (0x80000000), subsequent multiplications by 31 (0x1f) do not disturb the sign bit, since 0x1f × 0x80000000 = 0xf80000000, which is again 0x80000000 when truncated to int range. Therefore, the sign bit can be set first and the remaining bits then handled as if the target were positive.

Using a few minutes of computer time, I tested the unhash method on all integer values and it always returned a correct string. When available, exhaustive testing is pleasantly simple and reassuring! The generated strings are relatively short; as shown in the table below, for non-negative values the average length is slightly over four characters.

Distribution of String Lengths of Unhashed Non-negative Values

Length    Frequency
1                    31
2             2,031,585
3            60,948,480
4         1,889,402,880
5           195,100,672
Total     2,147,483,648

The unhash of 0 could be special-cased to return the empty string of length zero rather than a length-one string of the \u0000 character, but this was not necessary for the purposes at hand. Likewise, generating somewhat shorter strings for negative hash values may be possible, but further investigation is not needed just to be able to generate collisions. As it stands, unhash will return a string whose length is at most ten: five characters to set the sign bit and at most another five characters for the non-sign bits.
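
For the record, a minimal sketch of the exhaustive check described above, assuming unhash is in scope:

// Iterate with a long so the loop bound Integer.MAX_VALUE is reachable
// without int overflow; this checks every possible hash target.
for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
    int target = (int) i;
    if (unhash(target).hashCode() != target)
        throw new AssertionError("unhash failed for " + target);
}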
