Project Coin: Proposal for Strings in switch

Below is a Project Coin language proposal form I wrote for Strings in switch; send any comment to the Project Coin mailing list.


PROJECT COIN SMALL LANGUAGE CHANGE PROPOSAL FORM v1.0

AUTHOR(S): Joseph D. Darcy

OVERVIEW
Provide a two sentence or shorter description of these five aspects of the feature:
FEATURE SUMMARY: Should be suitable as a summary in a language tutorial.

Add the ability to switch on string values analogous to the existing ability to switch on values of the primitive types.

MAJOR ADVANTAGE: What makes the proposal a favorable change?

More regular coding patterns can be used for operations selected on the basis of a set of constant string values; the meaning of the new construct should be obvious to Java developers.

MAJOR BENEFIT: Why is the platform better if the proposal is adopted?

Potentially better performance for string-based dispatch code.

MAJOR DISADVANTAGE: There is always a cost.

Some increased implementation and testing complexity for the compiler.

ALTERNATIVES: Can the benefits and advantages be had some way without a language change?

No; chained if-then-else tests for string equality are potentially expensive and introducing an enum for its switchable constants, one per string value of interest, would add another type to a program without good cause.

EXAMPLES
Show us the code!
SIMPLE EXAMPLE: Show the simplest possible program utilizing the new feature.

String s = ...
switch(s) {
 case "foo":
   processFoo(s);
   break;
}

ADVANCED EXAMPLE: Show advanced usage(s) of the feature.

String s = ...
switch(s) {
case "quux":
   processQuux(s);
   // fall-through

 case "foo":
 case "bar":
   processFooOrBar(s);
   break;

 case "baz":
    processBaz(s);
   // fall-through

 default:
   processDefault(s);
   break;
}

DETAILS
SPECIFICATION: Describe how the proposal affects the grammar, type system, and meaning of expressions and statements in the Java Programming Language as well as any other known impacts.

The lexical grammar is unchanged. String is added to the set of types valid for a switch statement in JLSv3 section 14.11. Since Strings are already included in the definition of constant expressions, JLSv3 section 15.28, the SwitchLabel production does not need to be augmented. The existing restrictions in 14.11 on no duplicate labels, at most one default, no null labels, etc. all apply to Strings as well. The type system is unchanged. The definite assignment analysis of switch statement, JLSv3 section 16.2.9, is unchanged as well.

COMPILATION: How would the feature be compiled to class files? Show how the simple and advanced examples would be compiled. Compilation can be expressed as at least one of a desugaring to existing source constructs and a translation down to bytecode. If a new bytecode is used or the semantics of an existing bytecode are changed, describe those changes, including how they impact verification. Also discuss any new class file attributes that are introduced. Note that there are many downstream tools that consume class files and that they may to be updated to support the proposal!

One way to support this change would be to augment the JVM's lookupswitch instruction to operate on String values; however, that approach is not recommended or necessary. It would be possible to translate the switches to equivalent if-then-else code, but that would require unnecessary equality comparisons which are potentially expensive. Instead, a switch should occur on a predictable and fast integer (or long) function value computed from the string. The most natural choice for this function is String.hashCode, but other functions could also be used either alone or in conjunction with hashCode. (The specification of String.hashCode is assumed to be stable at this point.) If all the string labels have different lengths, String.length() could be used instead of hashCode. Generally a String.equals() check will be needed to verify the candidate string's identity in addition to the evaluation of the screening function because multiple string inputs could evaluate to the same result.

A single case label, a single case label with a default, and two case labels can be special-cased to just equality checks without function evaluations. If there are collisions in String.hashCode on the set of case labels in a switch block, a different function without collisions on that set of inputs should be used; for example ((long)s.hashCode<<32 ) + s.length()) is another candidate function.

Here are desugarings to currently legal Java source for the two examples above where the default hash code do not collide:

// Simple Example
if (s.equals("foo")) { // cause NPE if s is null
 processFoo(s);
}


// Advanced example
{  // new scope for synthetic variables
 boolean $take_default = false;
 boolean $fallthrough = false;
 $default_label: {
     switch(s.hashCode()) { // cause NPE if s is null
     case 3482567: // "quux".hashCode()
         if (!s.equals("quux")) {
             $take_default = true;
             break $default_label;
         }
         processQuux(s);
         $fallthrough = true;
               case 101574: // "foo".hashCode()
         if (!$fallthrough && !s.equals("foo")) {
             $take_default = true;
             break $default_label;
         }
         $fallthrough = true;
     case 97299:  // "bar".hashCode()
         if (!$fallthrough && !s.equals("bar")) {
             $take_default = true;
             break $default_label;
         }
         processFooOrBar(s);
         break;

     case 97307: // "baz".hashCode()
         if (!s.equals("baz")) {
             $take_default = true;
             break $default_label;
         }
         processBaz(s);
         $fallthrough = true;

     default:
         $take_default = true;
         break $default_label;
     }
 }
 if($take_default)
     processDefault(s);
} 

In the advanced example, the boolean "fallthrough" variable is needed to track whether a fall-through has occurred so the string equality checks can be skipped. If there are no fall-throughs, this variable can be removed. Likewise, if there is no default label in the original code, the $take_default variable is not needed and a simple break can be used instead.

In a translation directly to bytecode, the synthetic state variables can be replaced with goto's; expressing this in pseudo Java source with goto:

// Advanced example in pseudo Java with goto
switch(s.hashCode()) { // cause NPE if s is null
case 3482567: // "quux".hashCode()
   if (!s.equals("quux"))
       goto $default_label;
   goto $fooOrBarCode_label;
  case 101574: // "foo".hashCode()
   if (!s.equals("foo"))
       goto $default_label;
   goto $fooOrBarCode_label;

case 97299:  // "bar".hashCode()
   if (!s.equals("bar"))
       goto $default_label;

   $fooOrBarCode_label:
   processFooOrBar(s);
   break;

case 97307: // "baz".hashCode()
   if (!s.equals("baz"))
       goto $default_label;
   processBaz(s);

default:
$default_label:
   processDefault(s);
   break;
} 

Related to compilation, a compiler's existing diagnostics around falling through switches, such as javac's -Xlint:fallthrough option and @SuppressWarnings("fallthrough"), should work identically on switch statements based on Strings.

TESTING: How can the feature be tested?

Generating various simple and complex uses of the new structure and verifying the proper execution paths occur; combinations to test include switch statements with and without fall-throughs, with and without collisions in the hash codes, and with and without default labels.

LIBRARY SUPPORT: Are any supporting libraries needed for the feature?

No.

REFLECTIVE APIS: Do any of the various and sundry reflection APIs need to be updated? This list of reflective APIs includes but is not limited to core reflection (java.lang.Class and java.lang.reflect.\*), javax.lang.model.\*, the doclet API, and JPDA.

Only reflective APIs that model statements in the source language might be affected. None of core reflection, javax.lang.model.\*, the doclet API, and JDPA model statements; therefore, they are unaffected. The tree API in javac, does model statements, but the existing API for switch statements is general enough to model the revised language without any API changes.

OTHER CHANGES: Do any other parts of the platform need be updated too? Possibilities include but are not limited to JNI, serialization, and output of the javadoc tool.

No.

MIGRATION: Sketch how a code base could be converted, manually or automatically, to use the new feature.

Look for sequences of if ("constant string".equals(foo)) or if (foo.equals("constant string")) and replace accordingly.

COMPATIBILITY
BREAKING CHANGES: Are any previously valid programs now invalid? If so, list one.

All existing programs remain valid.

EXISTING PROGRAMS: How do source and class files of earlier platform versions interact with the feature? Can any new overloadings occur? Can any new overriding occur?

The semantics of existing class files and legal source files and are unchanged by this feature.

REFERENCES
EXISTING BUGS: Please include a list of any existing Sun bug ids related to this proposal.

5012262 Using Strings and Objects in Switch case statements.

URL FOR PROTOTYPE (optional):

No prototype at this time.

Comments:

Your simplest example could be shortened - remove the 'break'

Posted by lowell on March 01, 2009 at 09:38 AM PST #

Hmmm, that NPE on null looks dodgy. Following the Principle of Least Surprise, I would expect the following to work:

switch (s) {
case "foo": ....
case "bar":
case null:
default:
}

and/or I would null values to trigger the default clause.

Posted by Noel Grandin on March 03, 2009 at 03:29 PM PST #

How about objects in 'switch'? They'll call the toString method in this case.

Posted by Rael on March 04, 2009 at 08:53 PM PST #

Null values are not in the type system. That would mean that there is no way for the compiler to know that that switch construct to work as proposed, since it's doing s.hashcode() right at the start.
A
if(s != null)
switch(s.hashcode()){...}
else
warning();
might work.

Posted by guest on March 05, 2009 at 10:47 PM PST #

How about case-insensitive switch?

Posted by Weijun on March 06, 2009 at 08:33 PM PST #

What would be even awesomer is regex and/or glob string switches ala TCL.

Posted by Robert Dale on March 07, 2009 at 11:39 PM PST #

I would definitely disallow Object since with autounboxing, it will be unclear what the meaning of a switch on an Integer would mean.

The proposal should mention that the sometimes-suggested approach of introducing an enum and switching on that doesn't handle the usecase where the strings are not valid enum values, such as
switch(s) {
case " " :
System.out.print("three spaces"); break;
case "for" :
System.out.print("for"); break;
default:
System.out.print("Something else"); break;
}

I agree about letting null appear as an option (or go to default), following the principle of least surprise.

Case insensitivity is Locale dependant concept, which I definitely do not think should go into the language - let us keep that in the libraries.

While regex's are great, allowing them would open for a case where a value matched more than one label, leading to ambiguity. In addition, the performance of a switch would be much less predictable, since matching a regex can be a very timeconsuming operation.

For the implementation details, it might be worth including an initial check of str.length()<=4, since the performance of hashCode is not all that great for a 10 Mb string.

Posted by Niels Harremoes on March 16, 2009 at 04:39 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

darcy

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
News

No bookmarks in folder

Blogroll