tailcalls meet invokedynamic
By John.Rose-Oracle on Nov 25, 2009
dynamic tail-call sites
Of course, the tail-call prefix should apply to invokedynamic as well as to the other invokes. The most recent design of invokedynamic has been kept clean enough to accommodate this. (The version of 2008 did not, since it allowed the bootstrap method to fulfill the initial execution of an invokedynamic site; this would have been very awkward to make tail-recursive.) The semantics of invokedynamic are (a) link the call site if necessary to create a reified call site object, (b) extract the current target method handle from the call site, (c) invoke the method handle as if by an invokevirtual instruction. To make this a tail call, only step (c) needs to be adjusted, and in the obvious way.
combinators which call method handles
There are a number of combinators that take an existing method handle M and create a new one M′ that calls the previous M. There are a number of cases where, if a caller invokes M′ as a tail call, he also rightly expects that M will be invoked as a tail call. In other words, certain adaptations of M should not spoil its tail-callability. Let us say in the case of such a combinator C, that C is tail-call transparent to its argument M. In Scheme,
APPLY is tail-call transparent to its first argument, while
FOR-EACH is not.
The cases appear to be something like this:
convertArguments— This one should be tail-call transparent, since all it does is adjust one method handle to an equivalent method handle of a different static type. There is a problem with return values, though; see next section.
invokeVarargs(etc.) — Any flavor of method handle invoker should be tail-call transparent. Many of them have the same problems with return type conversion as does
genericInvoker(etc.) — These are type-specific versions of the statically typed invokers, presented in the API form of method handle factories. They have the same issues as statically typed invokers.
spreadArguments(etc.) — Any kind of argument motion should clearly be tail-call transparent. Note that this requires somethign like a “stretchy” frame for the initiating caller, which can be opened up to hold larger argument lists than were sent by the initiator (the original non-tail caller). This problem is generic to most tail call mechanisms, so we can assume the JVM will have stretchy frames in the right places. (What is a good term for these frames?)
foldArguments— All of these combinators call a potentially complex transformation on the outgoing arguments, and then call the target M. They are tail-call transparent with respect to their first argument (the ultimate target M) but not with respect to any other method handles that are used to transform the argument list. (
collectArgumenthas an implicit call to an array constructor method; maybe we could make this optionally explicit.)
guardWithTest— This guy is an if-then-else; he clearly has to be tail-call transparent with respect to the second and third arguments, but cannot be with respect to the first, which is the predicate to evaluate.
catchException— You cannot seriously expect (I claim) to tail-call a method and simultaneously catch exceptions around it. But this guy should be tail-call transparent in the exception handler argument.
return value casts
MethodHandles.convertArguments combinator makes up the difference between a caller&rsqo;s and callee’s idea of a method’s static type; the callee specifies M and the caller matches the adapter M′. This is crucial on the JVM, which is mostly statically typed. If the return value types differ enough, then M′ might need to apply a cast, box, or unbox operation to the return value of M. This appears to spoil tail-calling, since M′ must “wait around” for M to complete in order to apply the return transformation. Do we want to force such a thing to look like a tail call?
I think the answer is yes. Note that the pending transformation is restricted to be a chain of casts, boxings, and unboxings. Such a chain can be represented in an approximately finite space. (Or so I claim, without providing all the details. It’s an exercise for the reader!) Thus, even if a loop is building up an infinite number of pending return value conversions, they can all be collapsed into a small record at the stable edge of the control stack. (By that I mean in the execution frame of the initiating caller, with a non-tail call, the problematic series of looping tail calls.) To do this will require some special pleading in the JVM, but I think it is a worthwhile investment.
arbitrary return value transformations
Suppose we support an arbitrary chain of low-level type conversions. Do we also support user-specified transformations on the return value, such as turning an object to a string or a string to a number? This immediately dispels any pretense of finite space. But we have garbage collectors to help us with infinite space. Perhaps it would be reasonable to add a combinator
transformReturnValue which tail-call transparent in both arguments.
It would work like this: First, a stretchy frame is lazily created to cache any pending return value transformations (including built-in ones like casting). Next, the return value transformer is linked into a heap-allocated list rooted in the frame. Then, the second argument is tail-called (keeping the stretchy frame, of course). When that call finally returns, the head element of the return value transformation list is popped and tail-called in its turn. Eventually, if the list is ever exhausted, the stretchy frame returns the last return value to its caller. This pattern seems useful and natural, as a complement to the regular stack-based recursion. I don’t think it can be coded at the user level, so maybe it deserves to be a VM-level mechanism.
It is possible to use
foldArguments to implement a looping state machine. The idea is to apply a variable, possibly infinite series of method handles (functions) to a fixed set of arguments (perhaps a state control block). The loop is initiated by calling
foldArguments on a target of
MethodHandle.invoke itself (any form), and require the fold function to look at the arguments and return a successor function to be the next one to look at the arguments. The successor function will either be the final function in the chain, which will return to the initiator. Or, if the successor function is another instance of
foldArguments, the process continues. If this is not to blow the stack, it requires tail calls within the
foldArguments combinator. Unusually, this use of tail calling does not require the user to issue an explicit tail call instruction, so it is plausible to require that this looping combinator pattern work even in the initial version of JSR 292, which lacks user-written tail calls.
This also raises the question of what is a more natural form of “Y combinator” for the JVM. If we had multiple-value return and/or tuples the options would be a little nicer. Any suggestions out there?
The combination of looping combinators and pending return value transforms may provide efficient ways to express generators. Note that a tree-walking generator, if it is not to blow the stack, has to do something like maintain a pending visit list, which maps nicely (I think) to a low-level return value transformer list.
bytecoded state machines
Dizzy yet? Strap in now; we’re going over the top… An irreducible use case (i.e., a use case without practical workarounds) for tail calls is machine-coded state machines. For this pattern, we can use tail calls to perform computed jumps from one state to another, when the successor state cannot simply be represented as a branch target. The nice thing about this pattern is that when successor states are few and static, conditional and unconditional branches are efficient, and not every transition needs to be a computed jump. This sort of partly-static control flow can be compiled pretty well.
An advanced example of this pattern is the code generated by PyPy, which is a dynamically growing set of machine-level basic blocks. Each block terminates with a computed goto (or switch, not sure which). This could be represented naturally in the JVM with a tail-call to a computed next block pointer. As an extra wrinkle, the switches are open-ended and can grow new successor cases. (This is how new traces are linked in.) The JVM way to express this, probably, is using invokedynamic to compute the successor method handle, so that the successor logic is patchable.
Where does this take us? Well, think of a tail-call as a providing a way for a block of code to replace itself with another block of code. Or, as a way for a method call to have a multiplicity of dynamically computed successors. I think a natural complement to this is to allow a method to have a multiplicity of dynamically computed predecessors also. Though it is a loose analogy, it suggests to me that tail calls would synergize well with bytecoded methods with multiple entry points. In this way, an executable state machine could be coded up as a bytecoded control flow graph, with both exit and entry points for the parts which are not statically definable. Method handles could be formed not only on the first bytecode of the method, but also on other (well-defined and verifiable) entry points. This would allow control to leave the state machine via a tail call, and re-enter it via a secondary entry point. Any type-state required (in locals) at that entry point would be verifiably related to the signature of the secondary entry point. I think it would be a slick way to build (say) compiled regular expression scanners, in a mostly non-recursive format. The occasional embarrassments like backtracking could be handled in a way decoupled from the basic state machine, and (particularly) from the host control stack.
Also, maybe, the type-state as of an exit from the state machine (i.e., a tail call) could be reified (opaquely for security) and passed along to the tail call itself. (I guess this is a kind of bounded continuation.) Later on, whenever the state machine needs to return to the source of the tail call, it could wake up the reified type-state and resume the call. This would be done without holding onto resources on the host control stack, and could be done more than once (for things like backtracking) on a given type-state.