Longjumps Considered Inexpensive

John Rose

Whenever the conversation about full closures (on any platform) gets to a certain level of detail, the question of closing over branch targets comes up. For example, we know what break means in a regular for loop. But suppose that loop is expressed as an internal iterator; that is, suppose the loop body is packaged into a closure and passed to a loop-driver method. Then the same break statement has to do something scary to return control past the loop-driver method.

Every Java programmer know that the natural way to express this sort of early exit in the JVM is to throw an exception, and programmers are also taught to use this feature sparingly, because it is expensive. Back in the day, C programmers called this a “long jump”, or longjmp, and they too were warned that it could enfeeble their optimizer.

But the conventional wisdom needs adjustment here. The most costly part of exception processing on the JVM is creating the exception, not throwing it. Exception creation involves a native method named Throwable.fillInStackTrace, which looks down the stack (before the actual throw) and puts a whole backtrace into the exception object. It’s great for debugging, but a terrible waste if all you want to do is pop a frame or two.

In fact, you can pre-allocate an exception and use it as many times as you want. One JVM benchmark a decade old (named “jack”) uses this technique in a parser to implement backtracking. The Hotspot JIT can optimize a throw of a preallocated exception into a simple goto, if the thrower and catcher are together in the same compilation unit (which happens because of inlining). Since “jack” is widely used for JVM bragging rights, it is likely that every commercial-quality JVM has a similar optimization in its JIT.

A similar technique, not so widely used yet, is to clone a pre-allocated exception and throw the clone. This can be handy if there is information (such as a return value) which differs from use to use; the variable information can be attached to the exception by subclassing and adding a field. The generated code can still collapse to a simple goto, and the extra information will stay completely in registers, assuming complete escape analysis of the exception. (This level of EA is on the horizon.)

Here is a concrete example of non-local return, implemented efficiently via a cloned exception. I have observed the Hotspot server compiler emitting a machine-level goto for the throws. With current server VM optimizations of this code, the cost of a non-local return is less than three times the cost of a plain (local) return; with the client VM the ratio is ten. See the code example and benchmark for details: NonLocalExit.java (javadoc).

The exception cloning technique is not widely used in part because Object.clone is not optimized. But an upcoming build of Hotspot fixes this by making clone an intrinsic which simply does a variable-length object allocation and a bulk copy. See the evaluation (mine!) of 2007-04-13 on bug 6428387. Even in existing deployed VMs, the cost of cloning is often less than the cost of walking the stack.

The bottom line is that languages on the JVM (including Java, if that’s the direction it goes) can implement non-local returns reasonably directly, and the JITs will turn them into local jumps in important common cases. In particular, when an internal iterator is properly inlined, a break in its body closure will not surprise the programmer with a performance pothole. In fact, it will compile to about the same code as the equivalent old-style for loop.

Join the discussion

Comments ( 3 )
  • Charles Oliver Nutter Wednesday, June 6, 2007
    Catching up on JRose posts...
    This is very good news. We use this technique in JRuby and have worried about whether it might constitute a performance bottleneck. But our independent testing and your numbers seem to confirm it's pretty far down the list of things to worry about.
    Given the fact that this performs very well, I see no reason for Java closures not to support non-local flow control. It seems like a no-brainer now.
  • John Cowan Wednesday, November 28, 2007

    What about overriding the fillInStackTrace method in a subclass of Throwable with a body that does nothing: does that speed things up?

  • John Rose Thursday, November 29, 2007

    John: Yes, removing fillInStackTrace also speeds things up. You get a nice, simple null stack trace.

    The clone technique captures a stack trace at the original creation site, which may (or may not) be desirable.

    Another way of controlling the cost of stack traces would be to have a way to fill in an abbreviated stack trace (say, just one frame) and constant-fold it in the JIT. That would provide another balance point between speed and debuggability. But that's an RFE for another day.

Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.