    Thursday, April 17, 2008

method handles in a nutshell

By: John Rose | Architect
The JVM prefers to interconnect methods via static reference or
dispatch through a class or interface. The Core Reflection API lets
programmers work with methods outside these constraints, but only
through a simulation layer that imposes extra complexity and execution
overhead. This note gives the essential outlines of a design for
method handles, a way to name and interconnect methods without
regard to method type or placement, and with full type safety and native
execution speed. We will do this in three and a half swift movements...

1. Direct method handles


Given any method M that I am able to invoke, the JVM provides me a way
to produce a method handle H(M). I can use this handle later on, even
after forgetting the name of M, to call M as often as I want.
Moreover, if I provide this handle to other callers, they also can
invoke M through the handle, even if they do not have access rights to
call M by name. If the method is non-static, the method handle always
takes the receiver as its first argument. If the method is virtual or
interface, the method handle performs the dispatch on the receiver.

A method handle will confess its type reflectively, as a series of
Class values, through the type operation.

In pseudo-code:

MHD h1 = H(Object.equals);
MHD h2 = H(System.identityHashCode);
MHD h3 = Hs(String.hashCode);
assert h1.type() == SIG[(Object,Object)boolean];
assert h1.invoke(r1,a1) == r1.equals(a1);
assert h2.invoke(a2) == System.identityHashCode(a2);
assert h3.invoke(r3) == r3.invokespecial:String.hashCode();

The actual name of the type MHD will be given shortly.
The actual API for H and Hs is uninterestingly straightforward, and
may be found at the end with the other details.
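For a concrete feel, here is a minimal sketch of direct handles. It assumes the java.lang.invoke API that later shipped in Java 7, whose names differ from the java.dyn sketch in this note; the class DirectHandles and its helper methods are hypothetical names chosen for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+), not the java.dyn sketch.
final class DirectHandles {
    // Handle to the virtual method Object.equals; the receiver becomes argument 0.
    static MethodHandle equalsHandle() {
        try {
            return MethodHandles.lookup().findVirtual(
                Object.class, "equals", MethodType.methodType(boolean.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Handle to the static method System.identityHashCode.
    static MethodHandle identityHashHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                System.class, "identityHashCode", MethodType.methodType(int.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes through the handle; dispatch on the receiver happens inside the handle.
    static boolean callEquals(Object receiver, Object arg) {
        try {
            return (boolean) equalsHandle().invoke(receiver, arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static int callIdentityHash(Object o) {
        try {
            return (int) identityHashHandle().invoke(o);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(equalsHandle().type());      // the handle reports its own type
        System.out.println(callEquals("foo", "foo"));
    }
}
```

Note how the type reported for the equals handle has two Object parameters: the receiver has been prepended to the declared argument list.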

To complete the low-level access (and fill a gap in the Core
Reflection API), there is a variation Hs(M) which forces static
linkage just like an invokespecial instruction, and is
allowed only if I have the right to issue an
invokespecial instruction on M.
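The Hs variation can be sketched as follows, again assuming the java.lang.invoke API that shipped later (its findSpecial call plays the role of Hs here). SpecialLinkage is a hypothetical class that overrides hashCode and then reaches the inherited Object.hashCode without virtual dispatch, much as super.hashCode() would.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class SpecialLinkage {
    @Override
    public int hashCode() {
        return 42; // the override that a virtual call would select
    }

    // Statically linked call to Object.hashCode, bypassing the override above.
    // Permitted only because the lookup is created inside SpecialLinkage itself,
    // which is the class entitled to issue the equivalent invokespecial.
    static int inheritedHashCode(SpecialLinkage receiver) {
        try {
            MethodHandle h = MethodHandles.lookup().findSpecial(
                Object.class, "hashCode", MethodType.methodType(int.class), SpecialLinkage.class);
            return (int) h.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        SpecialLinkage s = new SpecialLinkage();
        System.out.println(s.hashCode() + " vs " + inheritedHashCode(s));
    }
}
```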

From the JVM implementor’s point of view, there are probably
three or four distinct subclasses of direct method handle,
corresponding to the distinct varieties of invoke instruction.
To round things out, one kind of method handle should work for
invoking a method handle itself. These are low-level concerns,
which hide nicely behind the H (and Hs) operator described above.

2. Invoking method handles


Given a method handle H, I can invoke it by issuing an
invokeinterface bytecode against it. The signature I use
must exactly match the original signature of the target method. (Even
beyond the spelling, the linked meaning of class names must be the
same, in the argument and return types.) The method name I use must
always be invoke (not the name of the target method).

In pseudo-code:

MHI h1 = ...;
h1.invoke(a1...)

The type MHI is a special interface type known to the JVM.
(Its actual name will be given shortly.)

MHI functions as a marker interface to tell the JVM that this
occurrence of the invokeinterface bytecode must be treated
specially, different from all other interface invocations. For one
thing, normal JVM linking rules cannot apply, because the signature of
the call site relates to the target method, not to the marker
interface.
This kind of call site works on direct method handles (type MHD)
created in part 1 above. In a moment we will drop the other shoe
and observe that it works on other types of method handles.

The invokeinterface instruction is uniquely suited for
this sort of JVM extension, because the rules for bytecode
verification allow any object to serve as the receiver of an interface
invocation.
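The exact-match rule can be illustrated with a small sketch. It assumes the java.lang.invoke API that shipped later, where the call is spelled invokeExact and compiles to an ordinary invokevirtual on MethodHandle rather than the invokeinterface proposed here; ExactInvocation is a hypothetical class name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class ExactInvocation {
    // Handle of type (String)int for String.length.
    static MethodHandle lengthHandle() {
        try {
            return MethodHandles.lookup().findVirtual(
                String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The call site's signature is (String)int, an exact match for the handle.
    static int exactCall(String s) {
        try {
            return (int) lengthHandle().invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // The call site's signature is (Object)Object, a near miss, so the call is rejected.
    static boolean nearMissIsRejected(String s) {
        try {
            Object ignored = lengthHandle().invokeExact((Object) s);
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(exactCall("hello"));
        System.out.println(nearMissIsRejected("hello"));
    }
}
```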

3. Adapting method handles


The type MHI provides a very flexible jumping-off point for the
bytecodes of one method to call any other method, of any given
signature. The next question is whether the calling method and
receiving method have to agree exactly on the signature, and the
answer is “no”. This brings us to the third and final
major design point, of adapting method calling sequences.

The most important case of adaptation is partial invocation
(sometimes known as currying or binding).
A direct method handle by itself is really quite boring
because, unlike nearly everything else in an object-oriented
system, it is pure code, with no data to modify its meaning.

Thus, given a method handle and some arguments for it, the JVM will
give me a partial invocation of that method handle, which is
a new method handle that remembers those arguments, and, when
invoked on the remaining arguments, will invoke the original method
handle with the grand total set of arguments.

At the very least, the JVM is willing to let me specify the first
argument R of a virtual or interface method handle H(M), because that
lets it perform method dispatch when the handle is created, and hand
me back a method handle Adapt(H(M),R) that not only remembers the
argument R, but has also pre-resolved the method dispatch R.M.
This special case of partial invocation, sometimes called “bound
method references”, is enough of a hook to let programmers
introduce the usual object-oriented flexibilities into method handles.

In pseudo-code:

MHD h1 = H(Object.equals);  // SIG[(Object,Object)boolean]
MHB h2 = Bind(h1, (Object)"foo");
assert h2.type() == SIG[(Object)boolean];
assert h2.invoke(a2) == "foo".equals(a2);

The type MHB stands for a bound method reference. (Please wait a
moment for its actual spelling.)
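As a sketch of a bound method reference, the following assumes the java.lang.invoke API that shipped later, where binding the first argument is spelled bindTo. BoundHandle and its helpers are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class BoundHandle {
    // Starts from Object.equals, type (Object,Object)boolean, and binds the receiver.
    static MethodHandle equalsBoundTo(Object receiver) {
        try {
            MethodHandle eq = MethodHandles.lookup().findVirtual(
                Object.class, "equals", MethodType.methodType(boolean.class, Object.class));
            return eq.bindTo(receiver); // remaining type: (Object)boolean
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Supplies the one remaining argument.
    static boolean test(MethodHandle bound, Object arg) {
        try {
            return (boolean) bound.invoke(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle h2 = equalsBoundTo("foo");
        System.out.println(h2.type());
        System.out.println(test(h2, "foo"));
    }
}
```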

3.5 Further adaptation


As long as we are messing with arguments, there is a fairly
unsurprising range of other adaptations that arise naturally from the
richness of JVM signatures, and the conversions that apply between
various data types. (The details of varargs and reflective invocation
also bear on this design.)

Specifically, given two method signatures (A)T and (A')T', and a
method handle H(M) of type (A)T, there is a library routine which will
create me a new method handle H' = Adapt(H(M), (A')T'). It is my
responsibility to help the library routine match up the corresponding
arguments of the two signatures, to direct it to drop unneeded
arguments in A', to supply preset values for arguments in A missing in
A' (this is where partial invocation comes into the general picture),
and to tell it of the presence of varargs in either signature. The
library is happy to insert casts, primitive conversions, and boxing
(or unboxing) to make the arguments match up completely.

Here are some pseudo-code examples:

MHD h1 = H(String.concat);  // SIG[(String,String)String]
MHA h2 = Adapt(h1, SIG[(String,String)String], $1, $0);
MHA h3 = Adapt(h1, SIG[(String)String], $0, $0);
MHA h4 = Adapt(h1, SIG[(String)String], $0, ".java");
assert h2.invoke(a,b) == b.concat(a);
assert h3.invoke(c) == c.concat(c);
assert h4.invoke(c) == c.concat(".java");
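The three pseudo-code adaptations can be approximated with separate combinators, assuming the java.lang.invoke API that shipped later, which has no single Adapt routine. In this sketch permuteArguments covers reordering and duplication, insertArguments covers preset values, and asType covers casts and conversions; the class Adapters and its method names are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class Adapters {
    // h1: String.concat, type (String,String)String.
    static final MethodHandle CONCAT = find();

    private static MethodHandle find() {
        try {
            return MethodHandles.lookup().findVirtual(
                String.class, "concat", MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Like h2: swaps the two arguments, so swapped(a, b) computes b.concat(a).
    static MethodHandle swapped() {
        return MethodHandles.permuteArguments(
            CONCAT, MethodType.methodType(String.class, String.class, String.class), 1, 0);
    }

    // Like h3: duplicates the single argument, so doubled(c) computes c.concat(c).
    static MethodHandle doubled() {
        return MethodHandles.permuteArguments(
            CONCAT, MethodType.methodType(String.class, String.class), 0, 0);
    }

    // Like h4: presets the second argument, so withSuffix(s)(c) computes c.concat(s).
    static MethodHandle withSuffix(String suffix) {
        return MethodHandles.insertArguments(CONCAT, 1, suffix);
    }

    // Conversion only: retypes to (Object,Object)Object by inserting casts.
    static MethodHandle generic() {
        return CONCAT.asType(MethodType.methodType(Object.class, Object.class, Object.class));
    }

    // Convenience caller that boxes arguments, used here only to exercise the adapters.
    static Object call(MethodHandle h, Object... args) {
        try {
            return h.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(swapped(), "a", "b"));
        System.out.println(call(doubled(), "c"));
        System.out.println(call(withSuffix(".java"), "Foo"));
    }
}
```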

That is a longish step beyond bound method references, but I believe
the sweet spot of the design will supply a flexible set of method
signature adaptations (including currying), and let JVM implementors
choose how much of that the JVM wants to take responsibility for.

At a minimum, bound method references must be special-cased by the
JVM, but everything else could be supplied by a Java library (one
which is willing to dynamically code-generate many of its adapter
methods).

At a maximum, the JVM could supply a Swiss Army Knife combinator which
interpretively handles all possible argument wrangling. This is
probably the right way to go for HotSpot, since the HotSpot JIT is as
well suited for optimizing complex adapters as simple ones, and having
the complex ones appear to the compiler as single intrinsics is no big
deal.

Breaking the suspense: And the name of the winner is...


So we have four different types floating around:
  • MHD - a direct handle to a user-requested method (either virtual or static)
  • MHI - the magic type which warns the JVM of a method handle call site
  • MHB - a bound method handle, which remembers the method receiver
  • MHA - a more complex adapted method handle

I can see no particular benefit in distinguishing all these types in
an API design. Therefore, I believe the proper spelling for all these
types is something all-encompassing: java.dyn.MethodHandle.
Clearly there will be other types under the covers, such as the
concrete types chosen by the JVM for specific direct method handles
(MHD), or various implementation classes of adapted methods (MHB,
MHA). But there is no reason to distinguish them to the user.

However, one specific case of bound method handles is important to
consider from the user’s viewpoint. If a receiver object R has
a public method (in a public API type) already named
invoke, with a signature of (S)T, then R is already
looking very much like a bound method handle for its own
invoke method, with signature (S)T.

For completeness of exposition, let’s give this kind of
non-primitive method handle its own informal type name:

  • MHJ - a Java object that implements MethodHandle and a type-consistent invoke operation

So, at the risk of adding a chore to the JVM implementor’s list,
I think an object of such a type (MHJ) should serve (uniformly in the
contexts described above) as a method handle. (It may be necessary
to ask that R implement the marker interface and the
type method; but that is something the system could also
figure out well enough on its own.) I admit that this is not a
necessary feature, but it could cut in half the number of small
method-like objects running around in some systems.
And the MHA implementation above probably requires an MHJ anyway.
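The idea that an object with a public invoke method already looks like a bound handle can be sketched as below. This assumes the java.lang.invoke API that shipped later, in which such an object is not itself a method handle but can be turned into one with Lookup.bind; LengthPlus is a hypothetical receiver type invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class InvokableObject {
    // A plain Java object whose public invoke method has signature (String)int.
    public static final class LengthPlus {
        private final int base;

        public LengthPlus(int base) {
            this.base = base;
        }

        public int invoke(String s) {
            return base + s.length();
        }
    }

    // Produces a handle of the given type, bound to the receiver's own invoke method.
    static MethodHandle asHandle(Object receiver, MethodType type) {
        try {
            return MethodHandles.lookup().bind(receiver, "invoke", type);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int callInt(MethodHandle h, String arg) {
        try {
            return (int) h.invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle h = asHandle(new LengthPlus(10), MethodType.methodType(int.class, String.class));
        System.out.println(callInt(h, "abc"));
    }
}
```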

Background: How did we get here?


One of the biggest puzzles for dynamic language implementors on the
JVM, and therefore for the JSR 292 (invokedynamic) Expert Group, is
how to represent bits of code as small but composable units of
behavior. The JVM makes it easy to compose objects according to fixed
APIs, but it is surprisingly hard to do this from the back end of a
compiler, when (potentially) each call site is a little different from
its neighbors, and none of them match some fixed API. The missing
link is an object which will represent a chunk of callable behavior,
but will not require an early commitment to a fixed calling sequence.
In theory-language, we want an object whose API is polymorphic over
all possible method signatures, so the compiler (and runtime call site
linker, in turn) can manage calls in a common framework, not one
framework per signature.

Put another way, we cannot represent all callees as
Runnable or Callable, because fixed
interfaces like those serve just a subset of all interesting call
signatures. APIs which attempt to represent all possible calls,
notably Java’s Core Reflection API, simulate all signatures by
boxing arguments into an array, but this is a simulation (with
telltale overheads) rather than a native JVM realization of each
signature.

We know signature polymorphism is powerful, from our experience with
many dynamic and functional languages. (For an old example, consider
the Lisp APPLY function, which is an efficient but universal call
generator.) Integrating such polymorphism into the Java language is
challenging; that’s why the function types in Neal
Gafter’s closures proposal are a significant portion of the
specification.

Happily, it is a simpler matter to integrate signature polymorphism
into the JVM. As part of the JSR 292 process, I have been worrying
about this for some time. The result is the present story of method
handles which (a) JVMs can implement efficiently, which (b) are useful
to language backends, and which (c) have a workable Java API. That
last is actually the hardest, which is why I have not given it yet.
(See previous paragraph.)

Before giving the API, I want to emphasize a few more points. First,
method handles (per se) are completely stateless and opaque. They
self-report their signature (S)T (via a type operation
on MethodHandle) but they reveal nothing else about their
target. They do not perform any of the symbol table queries supplied
by the Core Reflection API.

Every native call site for a method handle is hardwired with a
particular signature. Compiler writers have every right to expect
that, if the target method has a similar signature, the call will have
only a few instructions of overhead. Likewise, a method
handle’s signature is intrinsic to the handle, and completely
rigid. Calls to near-miss signatures will fail, as will violations of
class loader naming consistency.

Besides signature simulation, one serious overhead in the Core
Reflection API is the requirement that, on every call to a reflected
method, the JVM look at the caller’s identity and perform an
access check to make sure that he is not calling someone else’s
private method. The method handle design respects all such access
checks, but performs them up front at handle creation, where
(presumably) they are more affordable. But you can publish a handle
to your own private method, if you choose.
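A minimal sketch of the up-front access check, assuming the java.lang.invoke API that shipped later: the hypothetical class Vault creates a handle to its own private method, where the check happens once, and the equally hypothetical Outsider can then call through that handle even though it cannot look the method up by name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo classes; assumes java.lang.invoke (Java 7+).
final class Vault {
    private static String secret(String who) {
        return "hello, " + who;
    }

    // The access check runs here, once, against Vault's own rights.
    static MethodHandle publish() {
        try {
            return MethodHandles.lookup().findStatic(
                Vault.class, "secret", MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class Outsider {
    // Calls the private method through the published handle.
    static String callThrough(MethodHandle h, String who) {
        try {
            return (String) h.invokeExact(who);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // A lookup created in Outsider is refused when it asks for Vault's private method by name.
    static boolean canLookUpDirectly() {
        try {
            MethodHandles.lookup().findStatic(
                Vault.class, "secret", MethodType.methodType(String.class, String.class));
            return true;
        } catch (IllegalAccessException refused) {
            return false;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callThrough(Vault.publish(), "world"));
        System.out.println(canLookUpDirectly());
    }
}
```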

One use case (which I have used to test the quality of this design) is
whether it can be used to re-implement the invoke
functionality in the Core Reflection API, for better speed and code
compactness. This has long been a sore spot for language implementors
(for reasons detailed above). This is one reason I have included varargs
in the competency of the method adaptation API.
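One way to picture the reflective-invoke use case is the sketch below, which assumes the java.lang.invoke API that shipped later. It converts a java.lang.reflect.Method into a handle with unreflect, then adapts it to the boxed (Object[])Object calling convention with asSpreader and asType. ReflectiveBridge and its methods are hypothetical names, and this is an illustration of the idea, not how Core Reflection is actually implemented.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class ReflectiveBridge {
    // Adapts any accessible method to type (Object[])Object.
    // For a non-static method the receiver is element 0 of the array.
    static MethodHandle generic(Method m) throws IllegalAccessException {
        MethodHandle mh = MethodHandles.lookup().unreflect(m); // access check happens here
        return mh.asSpreader(Object[].class, mh.type().parameterCount())
                 .asType(MethodType.methodType(Object.class, Object[].class));
    }

    // Reflection-style entry point built on the adapted handle.
    static Object call(Method m, Object... receiverAndArgs) {
        try {
            return generic(m).invokeExact(receiverAndArgs);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Example target: String.concat(String).
    static Method concatMethod() {
        try {
            return String.class.getMethod("concat", String.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(concatMethod(), "foo", "bar"));
    }
}
```

In a real system the adapted handle would be cached per Method, so that the access check and adapter construction are paid once rather than on every call.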

The calling sequence for a method handle (in part 2 above) will be
approximately as fast as today’s interface invocations.
Searching for an invoke method in a receiver is the same
sort of task as searching for an interface (and its associated
“vtable”, if you use such things). The search can be sped
up by the usual sorts of pre-indexing. A JVM-managed method handle
will advertise its signature prominently in its header, so that a
pointer equality check (remember, signature agreement is exact) is all
that needs to happen before the caller jumps through a hardware-level
function address.

Details and a hasty exit


Finally, here is a sketch of the API:

package java.dyn;

public interface MethodHandle /*>*/ {
    // T type(); public R invoke(A...);
    public MethodType type();
}

public interface MethodType {
    public Class parameterType(int num); // -1 => return type
    public int parameterCount();
}

public class MethodHandles {
    public static MethodHandle
        findStatic(Class defc, String name, MethodType type);
    public static MethodHandle
        findVirtual(Class defc, String name, MethodType type);
    public static MethodHandle
        findSpecial(Class defc, String name, MethodType type);
    public static MethodHandle
        unreflect(java.lang.reflect.Method m);
    public static MethodHandle
        convertArguments(MethodHandle mh, MethodType newType);
    public static MethodHandle
        insertArgument(MethodHandle mh, Object value);
    ...
    // The whole enchilada:
    public static MethodHandle
        adaptArguments(MethodHandle mh, MethodType newType,
                       String argumentMovements, Object values);
}
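To show what querying a handle's type looks like in practice, here is a small sketch assuming the java.lang.invoke API that shipped later. It keeps parameterType and parameterCount from the sketch above but exposes the return type through a separate returnType method instead of the -1 index convention. TypeInspection is a hypothetical class name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; assumes java.lang.invoke (Java 7+).
final class TypeInspection {
    // Renders a handle's signature using only the MethodType accessors.
    static String describe(MethodHandle h) {
        MethodType t = h.type();
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < t.parameterCount(); i++) {
            if (i > 0) {
                params.append(", ");
            }
            params.append(t.parameterType(i).getSimpleName());
        }
        return "(" + params + ") -> " + t.returnType().getSimpleName();
    }

    // Example handle: String.concat, with the receiver as parameter 0.
    static MethodHandle concat() {
        try {
            return MethodHandles.lookup().findVirtual(
                String.class, "concat", MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(concat())); // prints "(String, String) -> String"
    }
}
```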

That’s it, in a nutshell. Perhaps a rather large coconut shell.
Actually, quite small, if you are used to Unix shells.

You will have noticed that there is no way to call these guys from
Java code, unless you assemble yourself a class file around the
required invokeinterface. It is simple enough to create
a Java API for calling method handles. Getting performance beyond the
reflective boxed-varargs style of calling is a little messier, but
doable. Dynamic language implementors solve this sort of thing as
they fight to remove simulation overheads from their system. Given
closures in Java, there would be nicer bridges for interoperability, to say
nothing of implementing closures on top of method handles.

But the point is not calling or using these things from Java; the
point is using them, down near the metal, to assemble the next 700
witty and winsome programming languages.

Comments ( 6 )
  • Neale Friday, April 18, 2008

    At the risk of sounding like I don't get out much, I'm really pleased to see the progress being made on JVM, or should I say MLVM, changes.

    As someone who uses one language evolution of Java that leverages the VM, AspectJ, I think that the VM changes are *the* crucial stream of work on Java 7, especially where the VM forces less efficient approaches.

    Keep up the great work!


  • Andrea Francia Friday, April 18, 2008

    Compiler error: You use MethodHandle.getType() in the pseudo code but you have defined MethodHandle.type() in the API sketch.


  • John Rose Friday, April 18, 2008

    Neale: Thanks. Working on it...

    Andrea: Grazie. I simplified getType => type.

    There's some conversation on this over at http://groups.google.com/group/jvm-languages/t/f8df67386ad3c17d .


  • Mayur Patel Monday, October 6, 2008

    Great Work.

    Are you folks also considering a way to extend classes in Java the way "prototype" extends a class in JavaScript?


  • Emmanuel CASTRO Wednesday, November 4, 2009

    Is there any bridge between the old Method class and the new MethodHandle?

    I understand that while Methods allow users to get information about method that they can't invoke, MethodHandle cannot.

    I suppose such a bridge is easy to build (just take the Class, the method name and the parameter types). Am I wrong?


  • John Rose Tuesday, November 10, 2009

    Mayur: We have some ideas on the back burner; hopefully I'll have time to blog them some day.

    Emmanuel: The bridge is java.dyn.MethodHandles.unreflect, which takes a java.lang.reflect.Method, checks it against the caller's access rights, and returns a method handle.

