Better Closures

As I mentioned earlier, a closure per se has a special nature which it does not share with other kinds of functions. A closure is produced from snippet of code, written in the midst of a larger block of code, in such a way that the meaning of names within the snippet are consistent with the enclosing block.

Let‘s continue to examine the case for (a) better closures and (b) slightly better type parameters but not (c) function types, with the most enjoyable part, (a). We will find that giving up function types produces the most compact, natural-looking syntax for separable code snippets.

(a) Better Closures

Closures arose from the happy combination of functional language (initially Lisp‘s version of lambda calculus) with a realization of the benefits of simple, uniform treatment of names (lexical scoping, referential transparency, etc.). As noted before, a language can incorporate those benefits without an explicit function type system. This note describes one way to do that for Java.

Let's start by noting that inner classes are imperfect closures. The imperfections are (1) inaccessible nonfinal local variables, (2) inaccessible statement labels, (3) bulky syntax, and (4) shadowing by class and method scopes. As a group, these imperfections stem from caution in the early days of Java; the designers avoided constructs that could invisibly slow down the system. (Nobody had an optimizing JIT yet.) Imperfections (3) and (4) are due, moreover, to the very explicit “class-ness” of the syntax which wraps the programmer‘s snippet of code, even if the class is anonymous. For better or worse, everything needed to look like a class.

For the curious and/or pedantic, I will explain more about these imperfections later. But first I‘d rather assume we‘re on the same page about inner classes, and propose some incremental improvements, in the direction of closures with full-blown lexical scoping of all names and labels. The four imperfections motivate four groups of improvements: (1) fully visible local variables, (2) fully visible statement labels, (3) concise syntax, and (4) suppression of extranous scopes.

(1) Fully Visible Local Variables

(1a) Allow the public keyword in all places where final is allowed. This keyword does not affect the meaning of the variable at all, but makes it fully accessible within its scope.

(1b) Allow uplevel access to a local variable if it is declared public or final, or if it could be declared final. (That is, it is statically single assigned.)

  public int counter = 0;
  int increment = 10;      // acts like a final
  Thread t = new MyFancyRepeatingThread(
    new Runnable() { public void run() {
      counter += increment; } });
  // I know how to avoid racing the counter:
  t.start(); t.join();
  println("thread ran "+(counter / increment)+" times");

Why not just abolish the final rule? Because there are at least two classes of serious bug which come from the combination of true closures with mutable variables. Declaring a variable public has an appropriate connotation of danger, and will convey the need for caution to authors, code reviewers, and bug-hunters.

This is a key step toward what Guy Steele calls “full-blown closures”. But there‘s more…, because closures also turn out to be great (in combination with visitor-like patterns) for creating new kinds of loops, searches, and other control flow.

(2) Fully Visible Statement Labels

(2a) Allow all return, break, and continue statements to transfer control to (respectively) the innermost matching method definition, matching statement, or matching loop body statement.

It is helpful, when thinking about the meaning of branches, to remember that a continue is always equivalent to a break from the loop's body, and a return can be rendered as a variable assignment followed by a break from the method's body. Thus all branches can be converted to breaks with suitable shuffling of labels.

Label matching rules are unchanged, and continue to take labels into account. (We can view an unlabeled break or continue to refer to some fixed label, which is implicitly attached to all loops and switches. A return can taken to refer to the label of its enclosing method definition (as in BGGA).

(2b) Define the semantics of non-local branches compatibly with Common Lisp or Smalltalk. A branch statement can break out of any statement whose label is lexically in scope.

This lets the programmer continue to insert early-return logic into loops, even if those loops are implemented by library code:

  loop:
  mycoll.forEach(new Visitor() {
    public void visit(T x) {
      if (ranOutOfTime())  break loop;
      if (!isTrivial(x))   res.add(x); } });

Use unspecified subtypes of Throwable to manage the popping of stack frames, preserving the integrity of try/finally and catch(Throwable), but otherwise keeping the operation opaque to users. As before, blocks and method activations can only exit once, either normally or abruptly. An attempt to exit a block a second time will throw an unchecked exception, akin to a stack overflow, probably with a helpful backtrace.

(2c) Allow return statements to carry a label. The label must be the name M of an enclosing method definition. The syntax is return M: or return M:X . The expression X (if any) is the value returned from the method M

(2d) For the sake of error checking a certain class of concurrent library, define a standard compile-time statement and method parameter annotation BreakProtect. It is a compile-time error for an anonymous inner class or closure expression to be passed directly to a method parameter marked with this annotation. It is a compile-time error for a branch within a statement marked with a BreakProtect annotationto break out of that statement. And it is an error for a closure expression converted to a marked method parameter to break out of the closure body. (Returns and properly declared exceptions continue to be legal, of course.)

If an inner class instance x (or other closure) contains a branch, and x is returned from its creating method or passed to another thread, then the branch will fail with a RuntimeException. The backtrace of this exception will include at least the backtrace of the point of closure creation.

Here is an example which branches from one method activation to another, written with the proposed extensions:

  int outer() {
    middle(
      new Runnable() { public void run() {
        return outer: 123; } });
    return 0;
  }

Assume a suitable library type $Break, here is an equivalent rewrite in the present language:

  int outer() {
    final $Break ret[] = {null};
    final int val[] = {0};
    try {
      middle(
        new Runnable() { public void run() {
          val[0] = 123; throw ret[0] = new $Break(); } });
      val[0] = 456;
    } catch ($Break ex) {
      if (ex != ret[0])  throw ex;  // rethrow
    }
    return val[0];
  }

There are many ways to improve this code, notably by means of cloned or preallocated exceptions. It may be profitable (at the compiler‘s discretion) to merge the exception object for a given block with the block‘s autoboxed locals.

Because the standard 228_jack benchmark uses preallocated exceptions for frequent non-local control transfers, it is likely that JITs are already able to optimize similar code shapes into straight a goto. (Hotspot can.)

All of the foregoing changes apply to pre-existing inner class notations as well as a new closure notation.

(3) Shrink the Notation

This is the pretty part. We can minimize the wrapper around the payload down to one or two tokens, in a way that looks and feels like Java, but is intelligible to users of Smalltalk, Ruby, and Groovy.

(3a) Allow the following syntaxes, to be used for creating anonymous inner instances:

  ClosureExpression:
    SimpleClosureExpression
    BlockClosureExpression
    DelegateClosureExpression

  SimpleClosureExpression:
    { Expression }
    { ClosureParameters : Expression } 

  BlockClosureExpression:
    { Statements } 
    { ClosureParameters : Statements } 

  ClosureParameters:
    Type Name
    ( MethodParameters )
    ( MethodParameters ) MethodThrows

  DelegateClosureExpression:
    & MethodName
    & Expression . MethodName

  Statement: YieldStatement
  YieldStatement:
    \^ Expression ; 

A yield statement exits immediately from the innermost enclosing block closure expression. When a closure is compiled down to a bytecoded class and method, its yield statements will similarly be compiled down to return instructions. A closure block can either produce a void result by running off the end, or it can produce one or more values via yield statements, but it cannot do both.

Because a closure expression is not a method or class definition, a return statement cannot make it produce a value or terminate normally. (A return statement can make an enclosing method terminate normally, abruptly exiting the closure and any intervening callers. It is a normal reaction, but a confused one, to suggest that since a closure in implemented in the VM using methods, then the closure syntax must use the return statement syntax to specify the closure's final value. But if closures are done right, there will be no trace of that method's existence, unless you disassemble the compiler's output, and even then it may be inlined into some other method.)

A simple closure is a single expression X surrounded by braces, and possibly preceded by formal parameters. For any X the expression {X} is identical in meaning to {X;} if X is a void-returning method invocation, else it means the yield statement {\^X;}.

The syntax for closure parameter declarations is identical to that of method parameter declarations, including the possible throws clause. If the parentheses are missing, there must be a single parameter. If there are no parameters the parameter declarations can be completely elided, along with the delimiting colon.

This syntax is inspired by Smalltalk (and more recently by Ruby and Groovy). The syntax is intended to resemble a Smalltalk block, a free-floating snippet of code, more than a lambda expression, with its parameter-bearing front bumper.

Smalltalk-style blocks are somewhat more concise than lambda expressions. As a bonus, they lend themselves to the following pleasant syntax, courtesy of Ruby and Groovy:

(3b) If the last argument to a method call is a closure expression, allow that expression to be appended after the closing parenthesis of the call. As a matter of style, programmers are discouraged from doing this unless the method call is itself a statement.

  f(x, {y});
  f(x) {y};

(3c)

Now for the fiddly details which make it all possible.

Closure expressions are not ordinary primary expressions. They are allowed only in syntactic contexts which allow them to be typed as one-method anonymous Java objects. (Yes, this is target typing. If closures are to be truly concise, they have canonical function types or else they must assume a type imposed by context.) Closure expressions can occur only in the following constructs:

  AssignmentExpression:         Name = ClosureExpression
  VariableInitialization:       Name = ClosureExpression
  MethodArgument:               Method ( ... ClosureExpression ... )
  NewInstanceExpression:        new Type ClosureExpression
  ReturnStatement:              return ... ClosureExpression ;
  YieldStatement:               \^ ClosureExpression ;
  CastExpression:               ( Type ) ClosureExpression
  MethodInvocationExpression:   ClosureExpression ( Expression ... )

Every syntax for a closure expression supplies a context type which becomes the type of the closure object created. The context type of a cast ed closure expression is the cast type, for example. (Other context types are left as an exercise to the reader; direct invocation and overloaded functions will be dealt with shortly.) The context type must be a reference type K (interface or class) with exactly one abstract method K.m. (If K is a class, it must have a zero-argument constructor accessible.) In addition, the method signature and throws must be compatible with the closure.

The closure expression is converted by closure conversion to the context type, using an anonymous class which closes over the lexical context of the expression. Note that closure expressions in isolation do not have types; they are expressions whose interaction with the type system is determined by their formal arguments, thrown exceptions, and yield statements. (One could make a functional type schema for closure expressions, but it‘s not strictly necessary, except perhaps as a bridge from the category of checked expressions to the category of types, for the sole purpose of closure conversion.)

An attempted closure conversion of a closure expression to its context type can succeed under these conditions:

  1. The actual arguments of K.m must be applicable, as a group, to the formal arguments of the closure. All method call conversions, including boxing and varargs conversion, are allowed.
  2. If the closure has a yield statement, the value of every yield statement must convert, by either method call conversion or closure conversion, to the return type of K.m, which must not be void. If there is no yield statement, a null, zero, or false value suitable to the return type is produced by K.m.
  3. Each exception thrown by the closure body must be compatible with the checked exceptions declared by K.m.

The rules for a delegating closure expression x=&y.z are similar, except that the signature and throws of the delegate method z are matched to K.m. For a delegate expression, the return value of K.m is allowed to be void regardless of the return type of z. (This parallels the syntax of calling a method in a statement expression, throwing away its value.) If z is overloaded, then each overloading zn is treated as a distinct method, the set of zn which are closure-convertible to K.m is formed, and (if the set is non-empty) the unique least specific method z is chosen from that set. (This is the dual of the usual for for selecting the most specific method of an overloaded set.) If there are convertible zn but there is no unique answer, the program is in error.

In the special case of a method call where there are several candidate context types K (because overloadings accept differing arguments at the closure's position), closure conversion is applied for each overloading, to the corresponding Kn.mn. If the conversion can succeed, the overloading is applicable, otherwise not. (Note: Non-unique results converting delegate expressions during overload matching still lead to errors, even if there would be other applicable methods. Two-way overloading does not require an M-by-N free-for-all.)

The special case of direct invocation of a closure expression is left as an exercise. (Or it could be omitted, but that is perhaps too surprising.) The result is as if the compiler came up with an anonymous interface K0 which is used nowhere else in the program, whose method K0.m0 takes exactly the closures declared arguments, throws what the closure throws, and returns what the yield statements return (void if none, or the type of true?x:y for yields x and y if there are more than one).

The purpose of delegate expressions is to convert cleanly from one calling sequence (method descriptor) to another. We could also have defined a system of interconversions between all one-abstract-method types (or one-method interfaces), but this would likely lead to expensive chains of delegating wrappers as types convert back and forth at module boundaries. We make this conversion process explicit by delegate creation expressions, so programmers can control it. We make it a simple special case of closure expression, so programmers can convert delegate types in a simple cast-like notation (NewType)&oldClosure.oldMethod.

The ugly ampersand is needed to respect Java‘s absolute distinction between method names and field names. Otherwise we would have to introduce new rules (as in BGGA) for scoping method and field names simultaneously.

(4) Suppressing Method and Class Scopes

We must point out a few more details about the cleanliness of closure expressions. Because an inner class is explicitly and syntactically a class definition with method definitions, expressions nested inside “see” the class and method wrappers and everything those wrappers bring into scope. Nested expressions can refer to class and superclass members by name, they can can request the this and super pointers, and they can issue return statements to the method.

A closure expression does not have the syntactic trappings of class and method, and so it cannot “see” the associated scoped and predefined names and labels. The expression this within a closure body refers not the the eventual closure object but to the same object (if any) that this would refer to at the point where the closure expression is introduced. (Indeed, it would be exceedingly hard to say that the type of this would be in a closure expression, until closure conversion forces a type onto the expression.) This considerations imply that a closure has does not any intrinsic “secret name” it can use to invoke itself recursively; any such name must be supplied explicitly by the enclosing block.

  public Function fact = null;
  fact = { int n : n <= 1 ? 1 : n \* fact.call(n-1); };
  println(fact(4));  // prints 24

Likewise, a return statement does not refer to an eventual method created for the VM, but to the syntactically enclosing method (if any) around the closure expression. It doesn't matter that a closure evaluates to an instance, any more than an autoboxed integer evaluates to an instance; the self pointers are not lexically in scope. (If you think about it, this wouldn‘t have a definite type anyway, because a closure gets its type from context.)

It is possible to peel away the class nature but keep the method nature of an inner class. (This is a separable extension, but a very useful one.) This allows method declarations to occur anywhere a local variable or inner class declaration can occur. A named local method can call itself and previously declared methods. (For general recursion, one must make a full local class. Otherwise the definite assignment rules get tricky.) The delegate expression provides a way to capture a handle to a local function.

Comments:

I don't get how a context is supplied when a closure is passed to a method invocation. The method to be invoked is selected during overload resolution based on the type of the actual arguments, but you have the type of the actual argument depending on the method being called. That would be a circular specification. We need a compositional semantics.

If you read my blog post about Tennent's principle, I think you'll see why we're moving away from the yield statement in the specification.

Another small point: you have {} being an expression, but it is already a statement, and given the expression-statement already in Java this is an ambiguity. Even if you disallow this form of expression as an expression statement so that it is not a formal ambiguity, it is poor language design to have one syntax mean two very different things.

Posted by Neal Gafter on August 31, 2006 at 04:28 PM PDT #

Hi John, the idea with the label is a good one I think. May be a solution for Groovy! At last to support break/continue. Just a crazy idea... does the label have to be written manually? Couldn't we do that automatically? Can't we do that mechanism whenever we encounter a break in a closure? I guess the explicit labeling is targeted at the variant of passing a closure as variable. This brings me back to the problem of what happens if people use the same label name multiple times. But ok, let us assume we have some kind of unique id. Then the compiler needs to know that id for the label as well as for the closure... hmm.. bad.

Posted by Jochen Theodorou on February 01, 2007 at 07:08 AM PST #

I forgot to ask you one thing.... let us assume I have code like this: list = [1,2,3] label: result = list.func (foo) { if (foo==2) continue label \^foo\*foo } assert result == [1,9] (Note: this is pseudo syntax!) While it might not be a problem of converting the continue into a break, I still have the problem, that the produced list may not the expected list. So maybe we can't simulate continue with break in that case and maybe we need to add Excpetion handling to in the closure taking methods to cover these cases... That would mean to move the bloat from inner classes to exception handling... Well at last it I wouldn't have to write it then

Posted by Jochen Theodorou on February 01, 2007 at 07:08 AM PST #

Jochen: The implementation pattern for non-local break/continue/return should work for Groovy. (In fact, the most recent version is motivated by Groovy.) As for requiring explicit labels, the compiler can introduce synthetic labels anywhere it wants to, but it's probably best to require labels for all new uses of break and continue, since people will expect the old uses to have their old meanings. (I.e., to apply to enclosing loop or 'switch' statements, and to fail otherwise.) One option to make this easier to work with would be to have method calls implicitly label themselves with their method name, so that the following label would be unnecessary: forEach: mylist.forEach{x -> if(x.isBad()) break forEach}

Posted by John Rose on February 05, 2007 at 10:26 AM PST #

Neal: Thanks for the comments. --- The circularity is broken by giving a closure a set of candidate types, if it appears as a method parameter. The candidate types interact individually with the overloading rules, and the unique selected overloading then determines which candidate is chosen. If you apply an explicit cast to every closure, the (restricted) syntax becomes completely compositional, without changing the program's meaning. The target typing amounts to a programming shorthand which allows the programmer to omit the casts in almost all cases. Your proposal supplies the same shorthands, by making function types explicit and allowing conversions in the same places I supply target typing. My aim here is to show that function types are not needed in order to get the good notation. --- The similarity between {} as a statement and as a closure expression is not an ambiguity. In my proposal, closures only appear in syntactic contexts where a statement is illegal. I think your proposal has more of a problem with that ambiguity, requiring lookahead to the "=>" token. Consider two statements: {x y, z a => b}.invoke(anx, az); {x y, z; y = z = null;}. Now consider an arbitrarily complex type expression instead of 'x'. --- I agree about the yield statement, and regret including it!

Posted by John Rose on February 05, 2007 at 11:02 AM PST #

John, Regarding the following sentence from your discussion I'd like to ask two questions:

"Likewise, a return statement does not refer to an eventual method created for the VM, but to the syntactically enclosing method (if any) around the closure expression."

1) If a closure is passed elsewhere as an argument and invoked later at a totally different "place on the call stack", how could it possibly "return from the syntactically enclosing method" (I assume you meant "lexically enclosing")? Shouldn't that be a "return from the method that invoked the closure" instead? Let's call this the "Smalltalk way" because this is what I think Smalltalk is doing with its blocks.

2) If closures perform return-operations the "Smalltalk way", shouldn't we be concerned about the possibility to use them as "bombs"? What I mean here is this: If e.g. a closure is registered as an event handler (which seems likely to become a frequent kind of use for closures) and the "event handler invoker" tries to process a list of handlers by invoking each of them in sequence, a malicious "handler" could issue a return and cause remaining handlers in the sequence to be skipped. It is quite cumbersome to write an invoker to safeguard against this and people will often forget it anyway. Haven't seen this problem occur within normal code yet, but if some security measure has forgotten about this possibility and turns out to be actually vulnerable somehow hackers will certainly exploit that...

Posted by Martin Valjavec on December 01, 2007 at 12:44 AM PST #

Martin: The "Smalltalk way" is, IMO, a mistake, the same sort of mistake as if your division operator tried to "help" you by returning zero whenever you made a divide-by-zero error. It is sometimes better for an operation to be meaningless, rather than be given an inconsistent meaning. See http://docs.codehaus.org/display/GroovyJSR/Out-of-scope+return+from+Closure.

Returns from closures do not introduce a new type of "bomb". Any Runnable can cause the same kind of damage by throwing a RuntimeException. If a callback queue wants to attempt a defense against such things, it can catch unintentional exceptions. (This is one reason why I think it is wrongheaded to give closure jumps special non-catchable status in the JVM, as some propose.) However, if the code in the callback is malicious, it can always find a way to DOS the API. And closures do not change this situation at all. Remember that in present JVMs non-local jumps will be built on top of exceptions. (And see my blog post "Longjumps Considered Inexpensive".)

Posted by John Rose on December 11, 2007 at 02:31 AM PST #

Thanks for the reply, John. I cannot see my post in the thread (and thought that was closed already) but I think I did not describe Smalltalk's behavior totally correctly (my Smalltalk days go way back ;) but still the basic concerns would remain; and seem to be partly the same as yours: closure jumps having non-catchable status.

If a bad return is an Exception, it's still a bomb, but one that I can catch. Fine so far, but...

Will I only know that it is bad if it was an out-of-scope return? Or can I always find out that it was a return, not a "normal exception"? And then I'll always have to distinguish between normal exceptions and closure returns - not good. A return should not be an exception and developers should not have to ponder whether they have "catch special return-exceptions" or not (of course only for in-scope-returns: the others are always errors) every time they are dealing with some closure. Sometimes we want to allow this (it it's not out of scope) sometimes we must safeguard against it.

But we must never forget to make a conscious design decision there - in every such situation! That's cumbersome and error prone - and I still fail to see, what such a non-local return facility wood buy me in Java (Smalltalk optimized away the need for some keywords that Java already has).

There is yet another thing that bugs me, though: What if I invoke a closure in another thread? That should be regarded out-of-scope, too, and thus be forbidden. That would prevent a lot of potential mischief. Still, an "innocent looking" return statement, which - even in legal use - amounts to an exception being thrown (that I can catch but then has a meaning fundamentally different from all other kinds of "exception"!) gives me the willies. IMO this smells like "clever reuse" of a mechanism for a totally unrelated feature - possibly too clever?

Posted by Martin Valjavec on December 12, 2007 at 10:37 PM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

John R. Rose

Java maven, HotSpot developer, Mac user, Scheme refugee.

Once Sun and present Oracle engineer.

Search

Categories
Archives
« July 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today