Whenever the conversation about full closures (on any platform) gets to a certain level of detail, the question of closing over branch targets comes up. For example, we know what
break means in a regular
for loop. But suppose that loop is expressed as an internal iterator; that is, suppose the loop body is packaged into a closure and passed to a loop-driver method. Then the same
break statement has to do something scary to return control past the loop-driver method.
Every Java programmer know that the natural way to express this sort of early exit in the JVM is to throw an exception, and programmers are also taught to use this feature sparingly, because it is expensive. Back in the day, C programmers called this a “long jump”, or
longjmp, and they too were warned that it could enfeeble their optimizer.
But the conventional wisdom needs adjustment here. The most costly part of exception processing on the JVM is creating the exception, not throwing it. Exception creation involves a native method named
Throwable.fillInStackTrace, which looks down the stack (before the actual throw) and puts a whole backtrace into the exception object. It’s great for debugging, but a terrible waste if all you want to do is pop a frame or two.
In fact, you can pre-allocate an exception and use it as many times as you want. One JVM benchmark a decade old (named “jack”) uses this technique in a parser to implement backtracking. The Hotspot JIT can optimize a throw of a preallocated exception into a simple
goto, if the thrower and catcher are together in the same compilation unit (which happens because of inlining). Since “jack” is widely used for JVM bragging rights, it is likely that every commercial-quality JVM has a similar optimization in its JIT.
A similar technique, not so widely used yet, is to clone a pre-allocated exception and throw the clone. This can be handy if there is information (such as a return value) which differs from use to use; the variable information can be attached to the exception by subclassing and adding a field. The generated code can still collapse to a simple
goto, and the extra information will stay completely in registers, assuming complete escape analysis of the exception. (This level of EA is on the horizon.)
Here is a concrete example of non-local return, implemented efficiently via a cloned exception. I have observed the Hotspot server compiler emitting a machine-level
goto for the throws. With current server VM optimizations of this code, the cost of a non-local return is less than three times the cost of a plain (local) return; with the client VM the ratio is ten. See the code example and benchmark for details: NonLocalExit.java (javadoc).
The exception cloning technique is not widely used in part because
Object.clone is not optimized. But an upcoming build of Hotspot fixes this by making
clone an intrinsic which simply does a variable-length object allocation and a bulk copy. See the evaluation (mine!) of 2007-04-13 on bug 6428387. Even in existing deployed VMs, the cost of cloning is often less than the cost of walking the stack.
The bottom line is that languages on the JVM (including Java, if that’s the direction it goes) can implement non-local returns reasonably directly, and the JITs will turn them into local jumps in important common cases. In particular, when an internal iterator is properly inlined, a
break in its body closure will not surprise the programmer with a performance pothole. In fact, it will compile to about the same code as the equivalent old-style