Autonomous Methods for the JVM

Byte-coded methods in the Java Virtual Machine are perfectly suited for their main role today, which is to implement (more or less directly) methods defined in the Java Language, version 1.0.

Java classes and methods do not directly implement free-standing chunks of code found in other languages, such as function pointers, closures, delegates, or extension methods. As in the case of inner classes, they can do a reasonable job of implementing such design patterns. But there are numerous overheads, in CPU cycles, memory, load time, application size, and (most importantly for language implementors) architectural complexity.

Because the JVM is a highly successful investment in efficient and robust support for byte-coded methods, it is worthwhile to look at teasing apart JVM methods from JVM classes. If this can be done well, the strengths of the JVM can be applied to programming tasks (scripting, functional programming) beyond the current scope of Java.

So, here is a design sketch for autonomous methods (AM) in the JVM. It is not complete, but should be suggestive of new ways to support languages beyond Java. The names Autonomous Methods and AM are pretty ugly, but all the good names for such things seem to be taken; if a proposal such as this catches on, it will naturally have to hijack a good name.

What's in an AM

An AM has an optional name, a type signature, and an optional receiver. These work the same as for a regular "class method" (CM). Unlike a CM, the receiver type of a non-static AM is part of the AM's signature, and can be any type. Allowed method modifiers are 'static', 'strict', and 'synchronized'. An AM is always 'public'. The newer modifiers 'synthetic', 'bridge', and 'varargs' are also allowed. Other modifiers are reserved for future use.

An AM is associated with a class, even though it is not defined when that class is loaded. The AM has the same access rights as any CM in the associated class, and this is the only effect of the AM's association with that class. An AM's name need not be unique. If you succeed in creating an AM on a sealed class (like 'java.lang.String') you can access its private fields (like 'String.value'). This is not a security hole; see below.

The decoupling of receiver type from access rights removes a restriction against language methods like Ruby's String.squeeze [1] which really ought to look as much as possible like a Java method. An AM on an interface receiver type amounts to a generic function over that interface.

A useful term is in order: Define the "effective signature" of a method as that method's signature, with the receiver type prepended to the argument list, if the method is non-static.
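As an illustration only: the java.lang.invoke API that later shipped with JSR 292 is not the AM design sketched here, but its method types follow the same convention, so it can show what an effective signature looks like. The class name EffectiveSignatureDemo and its helper methods are made up for this sketch.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class EffectiveSignatureDemo {
    /** Type of a handle on the non-static String.length(): the receiver type comes first. */
    static MethodType ofStringLength() {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            return mh.type();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Type of a handle on the static Integer.parseInt(String): nothing is prepended. */
    static MethodType ofParseInt() {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findStatic(Integer.class, "parseInt",
                            MethodType.methodType(int.class, String.class));
            return mh.type();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("String.length   : " + ofStringLength());
        System.out.println("Integer.parseInt: " + ofParseInt());
    }
}
```

Both handles end up with one String parameter and an int result, even though the declared signature of String.length() has no parameters at all.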

An AM may also contain values of some of its arguments, which are called "pre-applied arguments". That is, AMs support currying. This is a simple and flexible basis for all kinds of closures. Unlike inner classes, it does not require an associated class to hold the data.

Other than the information exposed by java.lang.reflect.Method, AMs are totally opaque. Specifically, there is no way to find out whether an AM has pre-applied arguments, or what its bytecodes are, etc. The JVM hides such details to remain free to use dirty tricks for high performance.

Naming an AM

Methods are reified using the empty (marker) interface Function (in package java.lang). (So are CMs, for that matter.) There are interconversion methods between Function and java.lang.reflect.Method, which allow (via the latter) reflective invocation.

There is no way (except perhaps in a debugger) to retrieve the set of AMs associated with a given class. If an AM reference is dropped, the AM can be garbage-collected individually.

When querying via reflection, the containing class of an AM is its associated class. AMs can be distinguished from other methods by the fact that they do not appear on their associated class's list of methods.

Defining a new AM

An AM can be defined in one of four ways: 1. by loading its bytecodes, 2. by executing a 'newmethod' bytecode, 3. by renaming or retyping a pre-existing method, or 4. by pre-applying arguments to a pre-existing method. (Case 1 could be subsumed by case 2, or perhaps case 2 is not necessary.)

To take these in turn:

1. Loading. The 'loadMethod' method on the TBD classloader API works like 'loadClass', except that it accepts a byte array containing a variant of the classfile format. This variant defines a single method in the context of a single class, the associated class. The signature specified in the class file is the method's "effective signature" (it includes any receiver type).

The 'loadMethod' call also accepts an optional array of objects, which are pre-applied to the resulting method. (See case 4.) The pre-applied arguments are associated with the resulting AM, and their types are removed from its signature.

The JVM preserves security by refusing to let untrusted code load into sensitive packages, like 'java.lang'. It also requires that a class must grant permission to load new AMs associated with it; in untrusted code, the 'loadMethod' call must originate from the associated class itself. This is why AMs do not provide an attack on private fields like 'String.value'.

2. Allocating. The 'newmethod' bytecode has the following operand fields: a set of modifier bits, an optional Signature (Utf8) reference S0, a NameAndType CP reference (N, S1), and a bytecode index BCI (referenced by offset, as with a branch). It pushes on the stack a new AM object with properties as follows.

The name of the new method is N. The associated class is that of the current method. The modifier bits are as given. The byte-codes are located within the body of the current method, at the given BCI. Bytecodes reached from that BCI must be unreachable except via 'newmethod' instructions referring to that BCI.

If S0 has one or more arguments A, the new method will pre-apply arguments of those types, popping them from the current stack.

The signature of the new method is S1, with the receiver type removed (if the 'static' modifier is absent).

3. Renaming. Library methods allow the name, modifiers, receiver type, or signature of a method to be changed. (The existing method is untouched; a new AM is returned.) If the signature is changed, casting or auto-boxing may be included to ensure low-level type safety. This is a non-primitive part of the proposal, in that the library methods probably can be implemented without help from the JVM. It is an elementary facility for creating adapter (or 'bridge') methods.

Converting a non-static method to a static AM moves the original's receiver type into the new AM's signature (in the primary position). Converting a static method to a non-static AM moves the original's first argument from its signature to its receiver type.
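For a rough feel of what retyping with inserted casts and boxing might look like, here is a sketch using MethodHandle.asType from the later java.lang.invoke API. That API is a stand-in, not the library routines proposed above; RetypeDemo and callGeneric are names invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class RetypeDemo {
    static final MethodHandle LENGTH;   // effective signature (String)int
    static final MethodHandle GENERIC;  // retyped view (Object)Object

    static {
        try {
            LENGTH = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        // The original handle is untouched; asType returns a new handle that
        // casts the argument to String and boxes the int result.
        GENERIC = LENGTH.asType(MethodType.methodType(Object.class, Object.class));
    }

    /** Calls the retyped handle with an untyped receiver. */
    static Object callGeneric(Object receiver) {
        try {
            return GENERIC.invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;   // e.g. ClassCastException if the receiver is not a String
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(LENGTH.type() + " retyped to " + GENERIC.type());
        System.out.println(callGeneric("hello"));
    }
}
```

The retyped handle is the kind of adapter a dynamic language runtime would want when it only knows its values as Object.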

4. Pre-applying. The JVM supplies a library routine 'preApply' for associating a method with a group of one or more arguments. The routine returns a new AM, which when invoked on the remaining arguments, passes all the arguments to the original method and executes it. This concept is called "currying a function" in the functional programming community [2]. The new AM has the same name, modifiers, receiver type, and associated class as the original. Its signature is smaller, missing one argument type for each pre-applied argument.

Thus, the invocation of the original method happens in two steps, each with its own set of arguments. The second set of arguments can be empty, or can consist solely of the original method's receiver. The first set of arguments is carried around in the structure of the intermediate AM; this set of values is not visible to any API (except perhaps debuggers). The intermediate method can be invoked any number of times.

Pre-applied arguments are added to the end of the argument list. Looked at another way, the types of pre-applied arguments are removed from the end of the type signature of the original method.
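Pre-applying to the end of the argument list can be approximated today with MethodHandles.insertArguments from java.lang.invoke. The sketch below assumes that API as a stand-in for 'preApply'; the method 'wrap' and the class PreApplyDemo are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class PreApplyDemo {
    /** The original three-argument method. */
    static String wrap(String body, String open, String close) {
        return open + body + close;
    }

    /** Pre-applies the last two arguments, leaving a handle of type (String)String. */
    static MethodHandle brackets() {
        try {
            MethodHandle wrap = MethodHandles.lookup().findStatic(
                    PreApplyDemo.class, "wrap",
                    MethodType.methodType(String.class,
                            String.class, String.class, String.class));
            // Position 1 is the second parameter, so "[" and "]" fill the tail.
            return MethodHandles.insertArguments(wrap, 1, "[", "]");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Second step of the invocation: supply the remaining argument. */
    static String call(MethodHandle preApplied, String body) {
        try {
            return (String) preApplied.invokeExact(body);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle b = brackets();
        // The intermediate handle can be invoked any number of times.
        System.out.println(call(b, "x"));
        System.out.println(call(b, "y"));
    }
}
```

The bound values travel inside the intermediate handle and are not visible through its type, which only lists the one remaining String parameter.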

A receiver argument (in a non-static method) cannot be pre-applied, unless the method is first converted to a static method, which turns the receiver argument into a plain argument.

(Note: There seem to be use cases for currying at either end of the argument list. Traditional currying pre-applies to the front, but pre-applying to the end seems slightly more suitable to existing JVM practices. Supporting either end in the JVM allows libraries to support the other end. Libraries must handle more complex argument shuffling anyway.)

A special case is "receiver pre-application", which consists of taking an object with a specified method (such as the method of a one-method interface) and turning it into an AM which is a proxy for that method on that object. This is likely to be common in practice, and may merit a special library function of its own.
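Receiver pre-application has a close analogue in MethodHandle.bindTo from java.lang.invoke. The sketch below uses that API as an assumption, not as the proposed library function; the Greeter interface and ReceiverBindDemo class are invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** A hypothetical one-method interface. */
interface Greeter {
    String greet(String name);
}

final class ReceiverBindDemo {
    /** Turns (object, its single method) into a free-standing handle of type (String)String. */
    static MethodHandle bind(Greeter g) {
        try {
            MethodHandle greet = MethodHandles.lookup().findVirtual(
                    Greeter.class, "greet",
                    MethodType.methodType(String.class, String.class)); // (Greeter,String)String
            return greet.bindTo(g); // the receiver is now carried inside the handle
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String call(MethodHandle bound, String name) {
        try {
            return (String) bound.invokeExact(name);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Greeter polite = n -> "Hello, " + n;
        System.out.println(call(bind(polite), "World"));
    }
}
```

The caller of the bound handle no longer needs to know that a Greeter object is involved at all.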

Invoking an AM

An AM can be invoked reflectively by converting it to a Method and using Method.invoke.

There is a library function for wrapping any one-method interface around a suitably matching method, creating a 'proxy' object. (There is already a Proxy facility like this in Java, except that it cannot handle unboxed arguments.) In this way, bare methods can be adapted to arbitrary one-method interfaces. Combined with the inverse conversion (receiver pre-application), AMs can mediate API adjustments efficiently, without the argument boxing required by present mechanisms in 'java.lang.reflect'.
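The inverse direction, wrapping a one-method interface around a bare method, also exists in the later java.lang.invoke API as MethodHandleProxies.asInterfaceInstance. The sketch below assumes that call as a stand-in for the library function described above; ProxyDemo and its 'add' method are hypothetical, and whether boxing is actually avoided is left to the implementation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleProxies;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntBinaryOperator;

final class ProxyDemo {
    /** A bare method with primitive arguments. */
    static int add(int a, int b) {
        return a + b;
    }

    /** Adapts the bare method to the one-method interface IntBinaryOperator. */
    static IntBinaryOperator asOperator() {
        try {
            MethodHandle add = MethodHandles.lookup().findStatic(
                    ProxyDemo.class, "add",
                    MethodType.methodType(int.class, int.class, int.class));
            return MethodHandleProxies.asInterfaceInstance(IntBinaryOperator.class, add);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IntBinaryOperator op = asOperator();
        System.out.println(op.applyAsInt(2, 3));
    }
}
```

Code that only knows about IntBinaryOperator can now call 'add' without any reference to ProxyDemo.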

An AM can be invoked by an 'invokeinterface'. The AM itself is the receiver object, typed as a Function, and the name and calling signature must exactly match the AM's name and effective signature. The verifier allows a loophole here for the Function interface, and the JVM will dynamically check the AM's signature and receiver type, throwing a linkage error in the case of a mismatch.

Nameless AMs are not callable in this way. (Or perhaps they should be omni-callable.)

So far, this is enough to amply support functional programming styles. Note that the caller must know to push both the method and the receiver on the stack. The caller must be aware that there is a method explicitly involved in the call, not just a receiver, message name, and arguments.

So the design so far does not support dynamic object-oriented languages. They require a more seamless integration of object methods with AMs. There must be a way for the caller to push the receiver and the arguments on the stack and hope for the best, with some sort of safety net if the message is not properly received. In fact, the safety net must allow some way for the system to adapt the call site (dynamically) to new types as they appear. Finally, performance requires the call site to include some sort of "fast path" for frequently occurring receiver types.

Dynamic Invocation

This leads us to another new bytecode. The generic name would be 'invokedynamic', but let's call it 'invokemethod'. The 'invokemethod' bytecode works like 'invokeinterface' above, but its behavior is more interesting if the AM or receiver fails to have a legal type and signature. (In fact, a new bytecode is not needed if we extend the behavior of 'invokeinterface' in the sole case of the Function interface.)

The 'invokemethod' bytecode has two operands: a NameAndType CP reference, and a second, arbitrary CP reference. It pops from the stack a method, a receiver, and zero or more arguments, as dictated by the CP signature. The method must be a Function or null; the receiver can be any reference.

Here are the specific cases. There are three ways for the call to succeed:

1. If the method argument is null, and receiver argument is non-null, and it has a (normal) method of the given name and calling signature, and that method is accessible, the call goes through, to the receiver. The method argument is ignored. This allows transparent access to ordinary Java APIs as a simple default.

2. If the method argument is non-static, and the receiver argument is reference-convertible to the method's receiver type, and the calling signature matches the method signature, the call goes through, to the method argument. (Note that an AM's receiver type can be java.lang.Object, so this is the universal case.) This is how a non-standard method (like Ruby's 'String.squeeze') could be invoked on a JVM-native object (like a 'java.lang.String').

3. If the method argument is static, and the calling signature matches the method signature, the call goes through, to the method argument. The receiver argument is ignored, and (as with reflection) is conventionally specified as null, if the caller knows it is a static call.

Typically, the method argument will be loaded out of some sort of call-site cache; it may be simply a static variable in the enclosing class. The method argument sort of stands there, waiting to be of use if a call needs help.

Case 1 is the optimistic case, where the receiver can take the call without help. Note that it is reasonable for a call-site cache to be initialized to a null reference, and this will work as long as the receiver actually handles the intended method.

Cases 2 and 3 could support a call site pre-initialized to use an AM which implements a complicated call handler, to perform language-specific dynamic loading and linking, or other bookkeeping. Or, these cases can support a "fast path mechanism" where the AM optimistically checks the receiver (and/or arguments) for expected types and then directly calls another method suited to those types. Note that the behavior of a single call site can evolve over time, simply by using different method arguments.
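A fast path of this kind can be sketched with MethodHandles.guardWithTest from the later java.lang.invoke API: test the receiver's type, then dispatch to a specialized method or to a general one. This is only an approximation of the idea, under the assumption that method handles play the role of AMs; FastPathDemo, 'fast', and 'slow' are invented names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FastPathDemo {
    /** Specialized path: only correct for String receivers. */
    static String fast(Object o) {
        return "fast: String of length " + ((String) o).length();
    }

    /** General path: works for any receiver. */
    static String slow(Object o) {
        return "slow: " + o;
    }

    /** Builds a handle that checks the receiver type and picks a path. */
    static MethodHandle callSite() {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodType t = MethodType.methodType(String.class, Object.class);
            // String.class.isInstance(receiver), as a handle of type (Object)boolean
            MethodHandle isString = l.findVirtual(Class.class, "isInstance",
                    MethodType.methodType(boolean.class, Object.class))
                    .bindTo(String.class);
            return MethodHandles.guardWithTest(isString,
                    l.findStatic(FastPathDemo.class, "fast", t),
                    l.findStatic(FastPathDemo.class, "slow", t));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String call(MethodHandle site, Object receiver) {
        try {
            return (String) site.invokeExact(receiver);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle site = callSite();
        System.out.println(call(site, "abc"));
        System.out.println(call(site, 42));
    }
}
```

Swapping in a different guarded handle later is what lets one call site evolve as new receiver types show up.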

Failed Dynamic Invocation

If none of the above cases succeed, the call fails. All is not lost, because at this point the language implementor (who compiled this thing in the first place) gains control, and can determine present and future outcomes.

We assume that the containing class has been defined with an attribute called 'FailedCallHandler', which contains a CP reference to a class. When a call fails, the JVM resolves this reference to a class K and ensures it implements the following interface in java.lang:

    interface FailedCallHandler {
        Function failedCall(Method caller, String dope, int bci,
                Function failure, String name, String signature,
                Object receiver, Object... args);
    }

The dope comes from the second operand to the bytecode instruction, the Utf8 reference. It is otherwise unused and uninterpreted by the system. It is useful to language implementors for encoding intentions about the call. The dope, name, and signature strings are interned.

The failure and receiver arguments are the first and second operands popped from the stack.

The result returned (which can be null) is used as a new method argument, and the process retried, as many times as necessary (or until the cows come home).

The JVM is free (but not required) to memoize values returned by failedCall for the same call site, the identical failing method, and subtypes of the receiver type. (This needs more thought.)
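The retry protocol can be emulated in plain Java to see how the pieces fit. The sketch below is not the proposed JVM mechanism: it assumes method handles in place of AMs, cuts the handler interface down to three parameters, bounds the number of retries, and uses the made-up names RetryingCallSite, Handler, and linkByName.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class RetryingCallSite {
    /** Hypothetical, cut-down stand-in for the FailedCallHandler interface. */
    interface Handler {
        MethodHandle failedCall(MethodHandle failure, String name, Object receiver)
                throws Exception;
    }

    private final String name;
    private final Handler handler;
    private MethodHandle cached;  // plays the role of the method argument; null until linked
    int handlerCalls;             // exposed only so the sketch can be observed

    RetryingCallSite(String name, Handler handler) {
        this.name = name;
        this.handler = handler;
    }

    /** Tries the cached method; on a mismatch asks the handler for a new one and retries. */
    Object invoke(Object receiver) {
        try {
            for (int attempt = 0; attempt < 4; attempt++) {  // bounded, unlike the proposal
                MethodHandle m = cached;
                if (m != null && m.type().parameterType(0).isInstance(receiver)) {
                    return m.invoke(receiver);
                }
                handlerCalls++;
                cached = handler.failedCall(m, name, receiver);
            }
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        throw new IllegalStateException("call site '" + name + "' could not be linked");
    }

    /** Example handler: link to the receiver class's public no-argument int method. */
    static MethodHandle linkByName(MethodHandle failure, String name, Object receiver)
            throws ReflectiveOperationException {
        return MethodHandles.publicLookup()
                .findVirtual(receiver.getClass(), name, MethodType.methodType(int.class));
    }

    public static void main(String[] args) {
        RetryingCallSite site = new RetryingCallSite("size", RetryingCallSite::linkByName);
        java.util.ArrayList<String> list = new java.util.ArrayList<>();
        list.add("a");
        System.out.println(site.invoke(list) + " after " + site.handlerCalls + " handler call(s)");
        System.out.println(site.invoke(list) + " after " + site.handlerCalls + " handler call(s)");
    }
}
```

The second call on the same receiver type reuses the cached handle, which is the memoization the JVM would be free to perform on its own.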

It is left as an exercise for the reader how to build a high-performance, natively executing dynamic language on top of a JVM extended this way. I believe it is quite possible.

References:
[1] http://www.rubycentral.com/book/ref_c_string.html#String.squeeze
[2] http://en.wikipedia.org/wiki/Currying

Comments:

John,

Do you think Autonomous Methods will be able to access private object state as well as private class state?

Posted by Peter firmstone on October 14, 2009 at 08:20 AM PDT #

Peter: Yes, if the creator of the autonomous method has access rights in the first place. This is how method handles work in JSR 292. Method handles are a variation on the idea of autonomous method. See my VMIL paper on this and many other aspects of method handles: http://blogs.sun.com/jrose/entry/vmil_paper_on_invokedynamic

Posted by John Rose on November 10, 2009 at 07:47 AM PST #

About

John R. Rose

Java maven, HotSpot developer, Mac user, Scheme refugee.

Once Sun and present Oracle engineer.
