X
  • Java
    August 31, 2006

Better Closures

John Rose
Architect

As I mentioned earlier, a closure per se has a special nature which
it does not share with other kinds of functions. A closure is
produced from snippet of code, written in the midst of a larger block
of code, in such a way that the meaning of names within the snippet
are consistent with the enclosing block.

Let‘s continue to examine the case for (a) better closures
and (b) slightly better type parameters but not (c) function types,
with the most enjoyable part, (a). We will find that giving up
function types produces the most compact, natural-looking syntax
for separable code snippets.

(a) Better Closures

Closures arose from the happy combination of functional language
(initially Lisp‘s version of lambda calculus) with a realization
of the benefits of simple, uniform treatment of names (lexical
scoping, referential transparency, etc.). As noted before, a language
can incorporate those benefits without an explicit function type
system. This note describes one way to do that for Java.

Let's start by noting that inner classes are imperfect closures.
The imperfections are (1) inaccessible nonfinal local variables, (2)
inaccessible statement labels, (3) bulky syntax, and (4) shadowing by
class and method scopes. As a group, these imperfections stem from
caution in the early days of Java; the designers avoided constructs
that could invisibly slow down the system. (Nobody had an optimizing
JIT yet.) Imperfections (3) and (4) are
due, moreover, to the very explicit “class-ness” of the
syntax which wraps the programmer‘s snippet of code, even if the
class is anonymous. For better or worse, everything needed to look
like a class.

For the curious and/or pedantic, I will explain more about these
imperfections later. But first I‘d rather assume we‘re on
the same page about inner classes, and propose some incremental
improvements, in the direction of closures with full-blown lexical
scoping of all names and labels. The four imperfections motivate four
groups of improvements: (1) fully visible local variables, (2) fully
visible statement labels, (3) concise syntax, and (4) suppression of
extranous scopes.

(1) Fully Visible Local Variables

(1a) Allow the public keyword in all places where
final is allowed. This keyword does not affect the
meaning of the variable at all, but makes it fully accessible within
its scope.

(1b) Allow uplevel access to a local variable if it is declared
public or final, or if it could be
declared final. (That is, it is statically single assigned.)

  public int counter = 0;
int increment = 10; // acts like a final
Thread t = new MyFancyRepeatingThread(
new Runnable() { public void run() {
counter += increment; } });
// I know how to avoid racing the counter:
t.start(); t.join();
println("thread ran "+(counter / increment)+" times");

Why not just abolish the final rule? Because there
are at least two classes of serious bug which come from the combination
of true closures with mutable variables. Declaring a variable
public has an appropriate connotation of danger, and will
convey the need for caution to authors, code reviewers, and
bug-hunters.

This is a key step toward what Guy Steele calls href="http://article.gmane.org/gmane.comp.lang.lightweight/2274">“full-blown
closures”. But there‘s more…, because closures
also turn out to be great (in combination with visitor-like patterns)
for creating new kinds of loops, searches, and other control flow.

(2) Fully Visible Statement Labels

(2a) Allow all return, break, and
continue statements to transfer control to (respectively)
the innermost matching method definition, matching statement, or
matching loop body statement.

It is helpful, when thinking about the meaning of branches, to
remember that a continue is always equivalent to a
break from the loop's body, and a return can be rendered
as a variable assignment followed by a break from the
method's body. Thus all branches can be converted to
breaks with suitable shuffling of labels.

Label matching rules are unchanged, and continue to take labels
into account. (We can view an unlabeled break or
continue to refer to some fixed label, which is
implicitly attached to all loops and switches. A return
can taken to refer to the label of its enclosing method
definition (as in
BGGA).

(2b) Define the semantics of non-local branches compatibly with
Common Lisp or Smalltalk. A branch statement can break out of any
statement whose label is lexically in scope.

This lets the programmer continue to insert early-return logic
into loops, even if those loops are implemented by library code:

  loop:
mycoll.forEach(new Visitor() {
public void visit(T x) {
if (ranOutOfTime()) break loop;
if (!isTrivial(x)) res.add(x); } });

Use unspecified subtypes of Throwable to manage the
popping of stack frames, preserving the integrity of
try/finally and catch(Throwable), but
otherwise keeping the operation opaque to users. As before, blocks
and method activations can only exit once, either normally or
abruptly. An attempt to exit a block a second time will throw an
unchecked exception, akin to a stack overflow, probably with
a helpful backtrace.

(2c) Allow return statements to carry a label. The
label must be the name M of an enclosing method definition.
The syntax is return M: or return M:X . The
expression X (if any) is the value returned from the method
M

(2d) For the sake of error checking a certain class of concurrent
library, define a standard compile-time statement and method parameter
annotation BreakProtect. It is a compile-time error for
an anonymous inner class or closure expression to be passed directly
to a method parameter marked with this annotation. It is a
compile-time error for a branch within a statement marked with a
BreakProtect annotationto break out of that statement.
And it is an error for a closure expression converted to a marked
method parameter to break out of the closure body. (Returns and
properly declared exceptions continue to be legal, of course.)

If an inner class instance x (or other closure)
contains a branch, and x is returned from its creating
method or passed to another thread, then the branch will fail with a
RuntimeException. The backtrace of this exception will
include at least the backtrace of the point of closure creation.

Here is an example which branches from one method activation to
another, written with the proposed extensions:

  int outer() {
middle(
new Runnable() { public void run() {
return outer: 123; } });
return 0;
}

Assume a suitable library type $Break, here is an
equivalent rewrite in the present language:

  int outer() {
final $Break ret[] = {null};
final int val[] = {0};
try {
middle(
new Runnable() { public void run() {
val[0] = 123; throw ret[0] = new $Break(); } });
val[0] = 456;
} catch ($Break ex) {
if (ex != ret[0]) throw ex; // rethrow
}
return val[0];
}

There are many ways to improve this code, notably by means of
cloned or preallocated exceptions. It may be profitable (at the
compiler‘s discretion) to merge the exception object for a given
block with the block‘s autoboxed locals.

Because the standard 228_jack benchmark uses
preallocated exceptions for frequent non-local control transfers, it
is likely that JITs are already able to optimize similar code shapes
into straight a goto. (Hotspot can.)

All of the foregoing changes apply to pre-existing inner class
notations as well as a new closure notation.

(3) Shrink the Notation

This is the pretty part. We can minimize the wrapper around the
payload down to one or two tokens, in a way that looks and feels like
Java, but is intelligible to users of Smalltalk, Ruby, and Groovy.

(3a) Allow the following syntaxes, to be used for creating
anonymous inner instances:

  ClosureExpression:
SimpleClosureExpression
BlockClosureExpression
DelegateClosureExpression
SimpleClosureExpression:{ Expression }{ ClosureParameters : Expression }
BlockClosureExpression:{ Statements } { ClosureParameters : Statements }
ClosureParameters:
Type Name( MethodParameters )( MethodParameters ) MethodThrows
DelegateClosureExpression:& MethodName& Expression . MethodName
Statement: YieldStatement
YieldStatement:\^ Expression ;

A yield statement exits immediately from the innermost enclosing
block closure expression. When a closure is compiled down to a
bytecoded class and method, its yield statements will similarly be
compiled down to return instructions. A closure block can either
produce a void result by running off the end, or it can produce one or
more values via yield statements, but it cannot do both.

Because a closure expression is not a method or class definition, a
return statement cannot make it produce a value or
terminate normally. (A return statement can make an
enclosing method terminate normally, abruptly exiting the closure and
any intervening callers. It is a normal reaction, but a confused one,
to suggest that since a closure in implemented in the VM using
methods, then the closure syntax must use the return
statement syntax to specify the closure's final value. But if
closures are done right, there will be no trace of that method's
existence, unless you disassemble the compiler's output, and
even then it may be inlined into some other method.)

A simple closure is a single expression X surrounded by
braces, and possibly preceded by formal parameters. For any
X the expression {X} is identical in meaning to
{X;} if X is a void-returning method
invocation, else it means the yield statement {\^X;}.

The syntax for closure parameter declarations is identical to that
of method parameter declarations, including the possible
throws clause. If the parentheses are missing, there
must be a single parameter. If there are no parameters the parameter
declarations can be completely elided, along with the delimiting
colon.

This syntax is inspired by Smalltalk (and more recently by
Ruby
and Groovy).
The syntax is intended to resemble a Smalltalk block, a free-floating
snippet of code, more than a lambda expression, with its parameter-bearing
front bumper.

Smalltalk-style blocks are somewhat more concise than lambda
expressions. As a bonus, they lend themselves to the following
pleasant syntax, courtesy of Ruby and Groovy:

(3b) If the last argument to a method call is a closure expression,
allow that expression to be appended after the closing parenthesis of
the call. As a matter of style, programmers are discouraged from
doing this unless the method call is itself a statement.

  f(x, {y});
f(x) {y};

(3c)

Now for the fiddly details which make it all possible.

Closure expressions are not ordinary primary expressions. They are
allowed only in syntactic contexts which allow them to be typed as
one-method anonymous Java objects. (Yes, this is target typing. If
closures are to be truly concise, they have canonical function types
or else they must assume a type imposed by context.) Closure
expressions can occur only in the following constructs:

  AssignmentExpression:         Name = ClosureExpression
VariableInitialization: Name = ClosureExpression
MethodArgument: Method ( ... ClosureExpression ... )
NewInstanceExpression: new Type ClosureExpression
ReturnStatement: return ... ClosureExpression ;
YieldStatement: \^ ClosureExpression ;
CastExpression: ( Type ) ClosureExpression
MethodInvocationExpression: ClosureExpression ( Expression ... )

Every syntax for a closure expression supplies a context
type
which becomes the type of the closure object created. The
context type of a cast ed closure expression is the cast type, for
example. (Other context types are left as an exercise to the reader;
direct invocation and overloaded functions will be dealt with
shortly.) The context type must be a reference type K
(interface or class) with exactly one abstract method
K.m. (If K is a class, it must have a
zero-argument constructor accessible.) In addition, the method
signature and throws must be compatible with the closure.

The closure expression is converted by closure conversion
to the context type, using an anonymous class which closes
over
the lexical context of the expression. Note that closure
expressions in isolation do not have types; they are expressions whose
interaction with the type system is determined by their formal
arguments, thrown exceptions, and yield statements. (One could make a
functional type schema for closure expressions, but it‘s not
strictly necessary, except perhaps as a bridge from the category of
checked expressions to the category of types, for the sole purpose of
closure conversion.)

An attempted closure conversion of a closure expression to its context type
can succeed under these conditions:


  1. The actual arguments of K.m must be applicable, as
    a group, to the formal arguments of the closure. All method call
    conversions, including boxing and varargs conversion, are allowed.
  2. If the closure has a yield statement, the value of every yield
    statement must convert, by either method call conversion or closure
    conversion, to the return type of K.m, which must not be
    void. If there is no yield statement, a null, zero, or false value
    suitable to the return type is produced by K.m.
  3. Each exception thrown by the closure body must be compatible
    with the checked exceptions declared by K.m.

The rules for a delegating closure expression
x=&y.z are similar, except that the signature and
throws of the delegate method z are matched to
K.m. For a delegate expression, the return value of
K.m is allowed to be void regardless of the
return type of z. (This parallels the syntax of calling
a method in a statement expression, throwing away its value.) If
z is overloaded, then each overloading zn is
treated as a distinct method, the set of zn which are
closure-convertible to K.m is formed, and (if the set is
non-empty) the unique least specific method z is
chosen from that set. (This is the dual of the usual for for
selecting the most specific method of an overloaded set.) If there
are convertible zn but there is no unique answer, the
program is in error.

In the special case of a method call where there are several
candidate context types K (because overloadings accept
differing arguments at the closure's position), closure conversion is
applied for each overloading, to the corresponding Kn.mn.
If the conversion can succeed, the overloading is applicable,
otherwise not. (Note: Non-unique results converting delegate
expressions during overload matching still lead to errors, even if
there would be other applicable methods. Two-way overloading does not
require an M-by-N free-for-all.)

The special case of direct invocation of a closure expression
is left as an exercise. (Or it could be omitted, but that is perhaps
too surprising.) The result is as if the compiler came up with an
anonymous interface K0 which is used nowhere else in the
program, whose method K0.m0 takes exactly the closures
declared arguments, throws what the closure throws, and returns what
the yield statements return (void if none, or the type of
true?x:y for yields x and y if
there are more than one).

The purpose of delegate expressions is to convert cleanly from one
calling sequence (method descriptor) to another. We could also have
defined a system of interconversions between all one-abstract-method
types (or one-method interfaces), but this would likely lead to
expensive chains of delegating wrappers as types convert back and
forth at module boundaries. We make this conversion process explicit
by delegate creation expressions, so programmers can control it. We
make it a simple special case of closure expression, so programmers
can convert delegate types in a simple cast-like notation
(NewType)&oldClosure.oldMethod.

The ugly ampersand is needed to respect Java‘s absolute
distinction between method names and field names. Otherwise we would
have to introduce new rules (as in
BGGA)
for scoping method and field names simultaneously.

(4) Suppressing Method and Class Scopes

We must point out a few more details about the cleanliness of
closure expressions. Because an inner class is explicitly and
syntactically a class definition with method definitions, expressions
nested inside “see” the class and method wrappers and
everything those wrappers bring into scope. Nested expressions can
refer to class and superclass members by name, they can can request
the this and super pointers, and they
can issue return statements to the method.

A closure expression does not have the syntactic trappings of class
and method, and so it cannot “see” the associated
scoped and predefined names and labels. The expression this within
a closure body refers not the the eventual closure object but to the
same object (if any) that this would refer to at the
point where the closure expression is introduced. (Indeed, it would
be exceedingly hard to say that the type of this would
be in a closure expression, until closure conversion forces a type
onto the expression.) This considerations imply that a closure
has does not any intrinsic “secret name” it can use
to invoke itself recursively; any such name must be supplied
explicitly by the enclosing block.

  public Function fact = null;
fact = { int n : n <= 1 ? 1 : n \* fact.call(n-1); };
println(fact(4)); // prints 24

Likewise, a return statement does not refer to an
eventual method created for the VM, but to the syntactically enclosing
method (if any) around the closure expression. It doesn't matter that
a closure evaluates to an instance, any more than an autoboxed integer
evaluates to an instance; the self pointers are not lexically in
scope. (If you think about it, this wouldn‘t have
a definite type anyway, because a closure gets its type from
context.)

It is possible to peel away the class nature but keep the method
nature of an inner class. (This is a separable extension, but a very
useful one.) This allows method declarations to occur anywhere a
local variable or inner class declaration can occur. A named local
method can call itself and previously declared methods. (For general
recursion, one must make a full local class. Otherwise the definite
assignment rules get tricky.) The delegate expression provides a way
to capture a handle to a local function.

Join the discussion

Comments ( 8 )
  • Neal Gafter Thursday, August 31, 2006
    I don't get how a context is supplied when a closure is passed to a method invocation. The method to be invoked is selected during overload resolution based on the type of the actual arguments, but you have the type of the actual argument depending on the method being called. That would be a circular specification. We need a compositional semantics.

    If you read my blog post about Tennent's principle, I think you'll see why we're moving away from the yield statement in the specification.

    Another small point: you have {} being an expression, but it is already a statement, and given the expression-statement already in Java this is an ambiguity. Even if you disallow this form of expression as an expression statement so that it is not a formal ambiguity, it is poor language design to have one syntax mean two very different things.

  • Jochen Theodorou Thursday, February 1, 2007
    I forgot to ask you one thing.... let us assume I have code like this:
    list = [1,2,3]
    label:
    result = list.func (foo) {
    if (foo==2) continue label
    \^foo\*foo
    }
    assert result == [1,9]
    (Note: this is pseudo syntax!)
    While it might not be a problem of converting the continue into a break, I still have the problem, that the produced list may not the expected list. So maybe we can't simulate continue with break in that case and maybe we need to add Excpetion handling to in the closure taking methods to cover these cases... That would mean to move the bloat from inner classes to exception handling... Well at last it I wouldn't have to write it then
  • Jochen Theodorou Thursday, February 1, 2007
    Hi John,
    the idea with the label is a good one I think. May be a solution for Groovy! At last to support break/continue.
    Just a crazy idea... does the label have to be written manually? Couldn't we do that automatically? Can't we do that mechanism whenever we encounter a break in a closure?
    I guess the explicit labeling is targeted at the variant of passing a closure as variable. This brings me back to the problem of what happens if people use the same label name multiple times. But ok, let us assume we have some kind of unique id. Then the compiler needs to know that id for the label as well as for the closure... hmm.. bad.
  • John Rose Monday, February 5, 2007
    Jochen: The implementation pattern for non-local break/continue/return should work for Groovy. (In fact, the most recent version is motivated by Groovy.)
    As for requiring explicit labels, the compiler can introduce synthetic labels anywhere it wants to, but it's probably best to require labels for all new uses of break and continue, since people will expect the old uses to have their old meanings. (I.e., to apply to enclosing loop or 'switch' statements, and to fail otherwise.) One option to make this easier to work with would be to have method calls implicitly label themselves with their method name, so that the following label would be unnecessary:
    forEach: mylist.forEach{x -> if(x.isBad()) break forEach}
  • John Rose Monday, February 5, 2007
    Neal: Thanks for the comments.
    --- The circularity is broken by giving a closure a set of candidate types, if it appears as a method parameter. The candidate types interact individually with the overloading rules, and the unique selected overloading then determines which candidate is chosen. If you apply an explicit cast to every closure, the (restricted) syntax becomes completely compositional, without changing the program's meaning. The target typing amounts to a programming shorthand which allows the programmer to omit the casts in almost all cases. Your proposal supplies the same shorthands, by making function types explicit and allowing conversions in the same places I supply target typing. My aim here is to show that function types are not needed in order to get the good notation.
    --- The similarity between {} as a statement and as a closure expression is not an ambiguity. In my proposal, closures only appear in syntactic contexts where a statement is illegal. I think your proposal has more of a problem with that ambiguity, requiring lookahead to the "=>" token. Consider two statements: {x y, z a => b}.invoke(anx, az); {x y, z; y = z = null;}. Now consider an arbitrarily complex type expression instead of 'x'.
    --- I agree about the yield statement, and regret including it!
  • Martin Valjavec Saturday, December 1, 2007

    John, Regarding the following sentence from your discussion I'd like to ask two questions:

    "Likewise, a return statement does not refer to an eventual method created for the VM, but to the syntactically enclosing method (if any) around the closure expression."

    1) If a closure is passed elsewhere as an argument and invoked later at a totally different "place on the call stack", how could it possibly "return from the syntactically enclosing method" (I assume you meant "lexically enclosing")? Shouldn't that be a "return from the method that invoked the closure" instead? Let's call this the "Smalltalk way" because this is what I think Smalltalk is doing with its blocks.

    2) If closures perform return-operations the "Smalltalk way", shouldn't we be concerned about the possibility to use them as "bombs"? What I mean here is this: If e.g. a closure is registered as an event handler (which seems likely to become a frequent kind of use for closures) and the "event handler invoker" tries to process a list of handlers by invoking each of them in sequence, a malicious "handler" could issue a return and cause remaining handlers in the sequence to be skipped. It is quite cumbersome to write an invoker to safeguard against this and people will often forget it anyway. Haven't seen this problem occur within normal code yet, but if some security measure has forgotten about this possibility and turns out to be actually vulnerable somehow hackers will certainly exploit that...


  • John Rose Tuesday, December 11, 2007

    Martin: The "Smalltalk way" is, IMO, a mistake, the same sort of mistake as if your division operator tried to "help" you by returning zero whenever you made a divide-by-zero error. It is sometimes better for an operation to be meaningless, rather than be given an inconsistent meaning. See http://docs.codehaus.org/display/GroovyJSR/Out-of-scope+return+from+Closure.

    Returns from closures do not introduce a new type of "bomb". Any Runnable can cause the same kind of damage by throwing a RuntimeException. If a callback queue wants to attempt a defense against such things, it can catch unintentional exceptions. (This is one reason why I think it is wrongheaded to give closure jumps special non-catchable status in the JVM, as some propose.) However, if the code in the callback is malicious, it can always find a way to DOS the API. And closures do not change this situation at all. Remember that in present JVMs non-local jumps will be built on top of exceptions. (And see my blog post "Longjumps Considered Inexpensive".)


  • Martin Valjavec Thursday, December 13, 2007

    Thanks for the reply, John. I cannot see my post in the thread (and thought that was closed already) but I think I did not describe Smalltalk's behavior totally correctly (my Smalltalk days go way back ;) but still the basic concerns would remain; and seem to be partly the same as yours: closure jumps having non-catchable status.

    If a bad return is an Exception, it's still a bomb, but one that I can catch. Fine so far, but...

    Will I only know that it is bad if it was an out-of-scope return? Or can I always find out that it was a return, not a "normal exception"? And then I'll always have to distinguish between normal exceptions and closure returns - not good. A return should not be an exception and developers should not have to ponder whether they have "catch special return-exceptions" or not (of course only for in-scope-returns: the others are always errors) every time they are dealing with some closure. Sometimes we want to allow this (it it's not out of scope) sometimes we must safeguard against it.

    But we must never forget to make a conscious design decision there - in every such situation! That's cumbersome and error prone - and I still fail to see, what such a non-local return facility wood buy me in Java (Smalltalk optimized away the need for some keywords that Java already has).

    There is yet another thing that bugs me, though: What if I invoke a closure in another thread? That should be regarded out-of-scope, too, and thus be forbidden. That would prevent a lot of potential mischief. Still, an "innocent looking" return statement, which - even in legal use - amounts to an exception being thrown (that I can catch but then has a meaning fundamentally different from all other kinds of "exception"!) gives me the willies. IMO this smells like "clever reuse" of a mechanism for a totally unrelated feature - possibly too clever?


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.