interface injection in the VM

Or, how to teach an old dog new tricks.

Introduction

“Self-modifying code...” used to be a phrase always uttered (by us hackers) with tones of both admiration and dread. Admiration, because there are stories from the earliest days of the stored-program computer of how impossibly clever programmers would make their code flip state with perfect grace, simply by modifying (as data) an instruction it was about to execute. Dread, because when we tried the graceful flip on our own, the usual result was... less graceful. Painful, actually. Yet many of us have a self-modifying code story, somewhere back in time, that we view with pride, perhaps like the bowler’s perfect game, or the golfer’s hole-in-one.

Is self-modifying code still an object of fear? It goes in and out of fashion. Operating systems and VMs are required to support it (always, in the loader). Aspect-oriented programming has made a cottage industry of it, and I haven’t heard the horror stories yet, nor the backlash, that sometimes turns such things from a hip, edgy exploration, into a firearm on the playground. Though a longtime practitioner (I’m a JVM nerd), I still fear it, and when I hear customers ask for an API to edit classes in the JVM, I always reach for an alternative, a prescription substitute for the illegal substance. Interface injection is a good substitute for a surprisingly wide range of use cases; perhaps it can handle your use case for self-modifying code also.

(And if not, I will reach for yet more substitutes. Most any state change in your program could be modeled as self-modifying code, even a variable rebinding. But editing loaded classes is a very powerful measure, liable to disastrous consequences from even small mistakes, and very hard to implement efficiently in the poor JVM which is loading the new bytecode. Not only do you have to load the new code, you have to undo the relevant effects of the old code, and there is always the temptation to “diff” the old and new versions, so as to avoid undoing and redoing everything. But diff-patching something that complex leads you down a long path of painful bugs.)

Ramble over. Now to business...

Interface injection is additive

Interface injection (in the JVM) is the ability to modify old classes just enough for them to implement new interfaces which they have not encountered before. Here are the key design points, in brief:
  • When an interface is injected into a class, any unimplemented methods must be supplied at the same time.
  • If a method is injected with an interface, it is not accessible except via that interface itself. (It does not alter or interfere with virtual dispatch or name linkage.)
  • When an interface is injected into a class, it is visible (via normal runtime type checking) on all instances of that class, whether pre-existing or created later.
  • If a type check ever finds that a given class does not implement an interface, that interface cannot later be injected into the class.
  • Every injectable interface is equipped with a static injector method, which is solely responsible for managing the binding of that interface to any candidate class.
  • For any given class and injectable interface, the injector method is called the first time the question arises whether the class implements the interface. (This could be an invokeinterface, an instanceof, or a reflective operation.) Just before the decision, the class is called an injection candidate.
  • A class can be a candidate at most once. The decision made at that point is final. (Except maybe for power tools like debuggers, etc.)
  • If the injector method must supply missing implementations of interface operations (this is the general case), they are supplied as a collection of method handles.
  • The decision made by the injector method (no, or yes with needed methods) is final.
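The design points above can be modeled in ordinary Java, as a rough sketch. Nothing here is a real JVM API: `InjectionTable`, `implementedBy`, `method`, and `injectOrdered` are hypothetical names, and a real implementation would live inside the JVM's type checks rather than in a user-level map. The sketch only shows the protocol: the injector is consulted the first time the question arises, it answers "no" or "yes with method handles", and the answer is never revisited.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical user-level model of one injectable interface.
final class InjectionTable {
    // The injector answers "no" (empty) or "yes" with handles keyed by method name.
    private final Function<Class<?>, Optional<Map<String, MethodHandle>>> injector;
    // One final decision per candidate class.
    private final Map<Class<?>, Optional<Map<String, MethodHandle>>> decisions =
            new ConcurrentHashMap<>();
    int injectorCalls = 0; // demonstration counter only, not thread-safe

    InjectionTable(Function<Class<?>, Optional<Map<String, MethodHandle>>> injector) {
        this.injector = injector;
    }

    // Models an instanceof check against the injectable interface.
    boolean implementedBy(Class<?> c) {
        return decide(c).isPresent();
    }

    // Models invokeinterface: fails if the decision was "no" or the method is missing.
    MethodHandle method(Class<?> c, String name) {
        return decide(c)
                .map(methods -> methods.get(name))
                .orElseThrow(() -> new IncompatibleClassChangeError(c.getName()));
    }

    // The first query makes the class a candidate; later queries reuse the decision.
    private Optional<Map<String, MethodHandle>> decide(Class<?> c) {
        return decisions.computeIfAbsent(c, k -> {
            injectorCalls++;
            return injector.apply(k);
        });
    }
}

final class InjectionDemo {
    // Hypothetical injector: says "yes" for String only, supplying its compareTo.
    static Optional<Map<String, MethodHandle>> injectOrdered(Class<?> candidate) {
        if (candidate != String.class) {
            return Optional.empty();
        }
        try {
            MethodHandle cmp = MethodHandles.publicLookup().findVirtual(
                    String.class, "compareTo",
                    MethodType.methodType(int.class, String.class));
            return Optional.of(Map.of("compareTo", cmp));
        } catch (ReflectiveOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Throwable {
        InjectionTable ordered = new InjectionTable(InjectionDemo::injectOrdered);
        System.out.println(ordered.implementedBy(String.class));
        System.out.println(ordered.implementedBy(Thread.class));
        int order = (int) ordered.method(String.class, "compareTo").invoke("a", "b");
        System.out.println(order < 0);
    }
}
```

Asking about `String` a second time does not call the injector again, which is the "candidate at most once" rule in miniature.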

For example, the Java class String implements the useful interface Comparable. This interface was not present in 1.1, but was added in 1.2, as a compatible extension to the 1.1 API. (This is a nice thing about interfaces: They coexist nicely. Java super-classes are by contrast territorial; there can only be one per class.) As it happens, the compareTo method was pre-existing in 1.1.

Suppose, for argument’s sake, that Java did not have the notions of comparability and sorted collections. With interface injection, a language runtime could add these notions (for its own use) in a modular way. It would define (in its own package, not in java.lang or java.util) the relevant interface and collection types. The language runtime would then define an injector method which knows about all standard classes (like String and Integer) to which the language wants to assign its idea of comparability.

When the program starts to put system-defined types (like String) into a sorted collection, there are type checks or interface invocations against the injectable interface. This leads to decisions about injecting the new interface to the old classes. The feel of it (though not all the details) is like the early linkage phase that Java programs go through when they first execute their symbolic references. The system types that the language runtime cares about are retrofitted with the needed interface, and Java strings coexist smoothly with language-specific collections.

Perhaps the language runtime handles an unexpected candidate class by inspecting it, looking for a compareTo method that it understands. This pattern matching is open-ended, limited only by the imagination (and good taste) of the language designer.
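One way such an inspection might look, using the existing `java.lang.invoke` API: the sketch below probes a candidate class `C` for a public `int compareTo(C)`. The class name `CompareToProbe` and the choice of pattern are assumptions for illustration; a real injector could match on anything it likes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Optional;

final class CompareToProbe {
    // Returns a handle of type (Object, Object)int if the candidate class C
    // has a public `int compareTo(C)`; otherwise returns empty.
    static Optional<MethodHandle> findCompareTo(Class<?> candidate) {
        try {
            MethodHandle exact = MethodHandles.publicLookup().findVirtual(
                    candidate, "compareTo",
                    MethodType.methodType(int.class, candidate));
            // Erase to a generic shape so one call site can serve every accepted class.
            return Optional.of(exact.asType(
                    MethodType.methodType(int.class, Object.class, Object.class)));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return Optional.empty(); // the injector would answer "no" here
        }
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle cmp = findCompareTo(String.class).get();
        int order = (int) cmp.invokeExact((Object) "apple", (Object) "banana");
        System.out.println(order < 0);
        System.out.println(findCompareTo(Thread.class).isPresent());
    }
}
```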

Fast forward to the present: We already have Comparable, but think of an interface that some non-Java language needs. A simple example would be a different flavor of serialization, like Smalltalk’s inspect operation, which can output a more or less human-readable Smalltalk program for reconstituting the object. Now that the JVM world is not all about Java, it is no longer possible for the person writing in java.util to reach over and add a few lines of code to each system type in java.lang. But interface injection can do this, without the need to introduce new code into the system classes.

Consider the current crop of dynamic languages (Groovy, JRuby, Jython, etc.). Most or all of them have some sort of master interface that provides access to their dynamic lookup mechanisms (aka. their metaobject protocol). For example Groovy has this master interface:

package groovy.lang;
public interface GroovyObject {
    MetaClass getMetaClass();
    ...
}
(By the way, thanks to Guillaume Laforge, for raising this example at Charlie Nutter’s excellent “Road to Babel” session today at Moscone. This blog post is for all the groovy people...)

The GroovyObject interface has other methods for picking apart the object via methods and properties, but the getMetaClass operation is the only one really necessary, since the other operations (like getProperty) can just as well be placed on the metaclass object. This is the style of coding that HotSpot uses internally, in its C++ code, and is close to the style of code that Attila Szegedi has adopted in his admirable dynalang project.

Interface injection does not simplify Groovy’s task of deciding how to bind new Groovy methods to old types. (There are lots of them, like java.lang.String.tokenize. And java.lang.String.execute will never be a real Java method, since it executes the string as Groovy code!) But interface injection radically simplifies the process of operation dispatch, since a Java string can be linked to its Groovy metaclass in one call to getMetaClass.

The current system must (in the general case) fetch the string’s Java class (using getClass) and somehow do a table lookup on that class to find the Groovy metaobject. This table lookup defeats JIT optimizations, since the JIT cannot reasonably know the contents of the ad hoc type mapping table. But the JIT routinely optimizes interface calls; it can ask the JVM whether the string implements the GroovyObject interface. If it then inlines the getMetaClass call, it has an excellent chance of inlining the ultimate call to tokenize or execute or whatever.
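The two dispatch paths can be contrasted in a small sketch. `Meta`, `HasMeta`, and `MetaLookup` are hypothetical stand-ins, not the real Groovy types, and the side table is only a guess at the general shape of what a language runtime does today.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for a language's metaobject and its master interface.
interface Meta {
    String describe();
}

interface HasMeta {
    Meta getMeta();
}

final class MetaLookup {
    // Ad hoc side table from Java class to metaobject; its contents are opaque to the JIT.
    private static final Map<Class<?>, Meta> TABLE = new ConcurrentHashMap<>();

    static void register(Class<?> c, Meta m) {
        TABLE.put(c, m);
    }

    static Meta metaOf(Object o) {
        if (o instanceof HasMeta) {
            // Interface call: a type check plus a call the JIT can inline.
            // With injection, system types like String could take this path too.
            return ((HasMeta) o).getMeta();
        }
        // Fallback used today for system types: getClass plus a table lookup.
        return TABLE.get(o.getClass());
    }
}
```

For example, after `MetaLookup.register(String.class, () -> "string-meta")`, the call `MetaLookup.metaOf("x")` goes through the table, while any object whose class implements `HasMeta` takes the interface path.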

Making it fast

This design is not slow, although it has a clunky bootstrapping behavior. (Perhaps there is a way to make it more declarative, at least in common use cases...) For example, the method handles supplied by the injector method can be almost as directly invoked as normal predefined interface methods.

JVMs use a variety of indexing structures to organize classes and their interface methods. Typically, there is a runtime lookup, often involving a short search, when invokeinterface executes. This search finds the interface in the receiver object’s class, and then loads the method from some sort of display of that interface’s methods in the receiver class.

If virtual calls use a long array of method pointers called a vtable, then interface calls may well use a short array of method pointers (specific to one interface only) called an itable. The invocation operation first finds the itable in the receiver class, and then jumps through the relevant method slot. This is how HotSpot works; it is not unusual, but as I said there are many variations on this theme. The irreducible minimum is a quick search or test, and a jump.

Anyway, the search or test has the potential of failing. The JVM (verifier or no) can present an invokeinterface instruction with an object which does not implement the desired interface. The result is something like an IncompatibleClassChangeError. The important point to notice is that the search is followed by a slow error-reporting path.

Interface injection can work smoothly in most any JVM by putting fallback logic into that slow path, between the fast lookup of predefined interfaces, and the final error report. If the normal search comes up with a little itable embedded in the receiver class (as in HotSpot), the fallback search can also come up with a little itable, linked after the fact into the receiver class. In essence, the interface lookup degrades, at worst, into a linked list search. But there are all the usual optimizations that can apply, since the JVM and JIT know it all.
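As a toy model of that lookup order (not HotSpot's actual data structures), the sketch below searches the predefined itables first, then a linked list of itables attached after the fact, and only then consults an injector before reporting the error. `Itable`, `ToyClass`, and the `Supplier`-based injector are all hypothetical simplifications.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Toy itable: identifies one interface; real itables also hold method pointers.
final class Itable {
    final Class<?> iface;
    Itable next; // link used only for itables attached after the fact

    Itable(Class<?> iface) {
        this.iface = iface;
    }
}

final class ToyClass {
    private final Itable[] predefined;   // laid out when the class was loaded
    private Itable injected;             // head of the list of injected itables
    private final Set<Class<?>> refused = new HashSet<>(); // final "no" decisions

    ToyClass(Itable... predefined) {
        this.predefined = predefined;
    }

    Itable lookup(Class<?> iface, Supplier<Itable> injector) {
        // Fast path: the normal search over predefined interfaces.
        for (Itable t : predefined) {
            if (t.iface == iface) return t;
        }
        // Fallback search: a linked list of itables added by injection.
        for (Itable t = injected; t != null; t = t.next) {
            if (t.iface == iface) return t;
        }
        // Slow path: ask the injector once, just before the error report.
        if (!refused.contains(iface)) {
            Itable fresh = injector.get();
            if (fresh != null) {
                fresh.next = injected;
                injected = fresh;
                return fresh;
            }
            refused.add(iface);
        }
        throw new IncompatibleClassChangeError("does not implement " + iface.getName());
    }
}
```

The injector runs at most once per interface here; a second lookup of the same interface is satisfied from the linked list, or fails immediately if the earlier answer was "no".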

Applications

Let’s quickly survey some of the applications of this design.

Metaobject protocols

This application was sketched above. Every language can have its own metaobject, and they can coexist, reusing the original system objects without modifying them.

Traits

Languages which define traits (the structural version of nominal interface types) can readily implement them, and most efficiently, on the JVM via interface injection. Each trait is called via an injectable interface, with a corresponding class (a nested class in the interface, I would say) which carries the trait implementation, as a set of static methods. Note that method handles allow you to mix and match static and non-static methods as long as the signatures line up.
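The point about signatures lining up can be checked with the existing method handle API. In this sketch, `TraitImpl.shoutLength` is a hypothetical trait-style static method that takes its receiver as an explicit first argument; its handle has the same type as the handle for the ordinary instance method `String.length`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class TraitImpl {
    // Trait-style implementation: the receiver is an explicit first argument.
    static int shoutLength(String self) {
        return self.toUpperCase().length();
    }
}

final class HandleShapes {
    // True if the static and the virtual handle have the same method type, (String)int.
    static boolean signaturesLineUp() {
        try {
            MethodHandle staticForm = MethodHandles.lookup().findStatic(
                    TraitImpl.class, "shoutLength",
                    MethodType.methodType(int.class, String.class));
            MethodHandle virtualForm = MethodHandles.publicLookup().findVirtual(
                    String.class, "length",
                    MethodType.methodType(int.class));
            // findVirtual prepends the receiver type, so both are (String)int.
            return staticForm.type().equals(virtualForm.type());
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signaturesLineUp());
    }
}
```

Because the two handles are interchangeable at the call site, an injector could supply either kind for the same interface method.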

Traits sound exotic, but they usually amount to utility methods on interfaces. With traits, most of the static methods in (say) java.util.Collections could be re-imagined as extension methods on the various collections interfaces. Actually, the injectable interfaces would be subtypes derived from the main interfaces.

Numeric towers

Numeric towers (as in Scheme) are difficult to engineer well, and exceedingly difficult to engineer in a modular way. The acid test of a numeric tower is whether you can add a new numeric type (like complex, or rational, or formal polynomials) without changing a line of code in the old types. Interface injection gives a hook for constructing doubly-dispatched binary operations (like addition) in terms of type-specific unary operations. For example, complex would define operations like addComplex, and inject them into previously defined number types like Integer.
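A rough sketch of the double dispatch, with injection simulated by a side table since the feature does not exist in today's JVM. The `Complex` class, the `ADD_COMPLEX` table, and the method names are assumptions for illustration; with real injection, the table entries would instead be `addComplex` methods reachable through an injected interface on `Integer`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Type-specific operation defined by the new numeric type itself.
    Complex addComplex(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    // Stand-in for injection: addComplex implementations attached to older types
    // without editing those types. Arguments are (receiver, complex operand).
    private static final Map<Class<?>, BiFunction<Object, Complex, Complex>> ADD_COMPLEX =
            new ConcurrentHashMap<>();

    static {
        ADD_COMPLEX.put(Integer.class,
                (self, c) -> new Complex(((Integer) self).doubleValue() + c.re, c.im));
        ADD_COMPLEX.put(Complex.class,
                (self, c) -> ((Complex) self).addComplex(c));
    }

    // Second dispatch: choose the implementation by the class of the other operand.
    Complex add(Object right) {
        BiFunction<Object, Complex, Complex> impl = ADD_COMPLEX.get(right.getClass());
        if (impl == null) {
            throw new IllegalArgumentException("no addComplex for " + right.getClass().getName());
        }
        return impl.apply(right, this);
    }
}
```

For instance, `new Complex(1, 2).add(3)` dispatches first on `Complex` and then on `Integer`, and neither `Integer` nor any other existing number class had to change.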

Virtual statics

Sometimes it is helpful to define a value which is shared by all objects of a given class, but which subclasses can override. It is as if a static variable were also declared virtual. A canonical example of this might be the sizeof operation, which gives the size in storage units of a class’s instances. If you are willing to ask instances (not the class itself) for the shared value, interface injection can be used to define constant-returning methods on relevant classes. This is actually just a generalization of the getMetaClass pattern. As in that case, if the JIT can predict the class of the receiver, it can constant fold the “static” constant, and optimize from there.
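The shape of such a "virtual static" is easy to show with ordinary interfaces. In this sketch the interface is implemented directly rather than injected, and `Sized`, `Point2D`, `Point3D`, and the byte counts are made-up names and values.

```java
// Hypothetical interface exposing a per-class constant through instances.
interface Sized {
    int sizeof();
}

class Point2D implements Sized {
    // Constant shared by all Point2D instances.
    @Override
    public int sizeof() {
        return 8;
    }
}

class Point3D extends Point2D {
    // The subclass overrides the "static" value.
    @Override
    public int sizeof() {
        return 12;
    }
}
```

If the JIT can predict that a receiver is a `Point3D`, a call to `sizeof()` can fold to the constant, which is the optimization described above.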

Conclusion

This feature is under consideration in the Da Vinci Machine Project, and some form of it may make its way into the JSR 292 standard. Remember my peeve about self-modifying code? The original form of JSR 292 includes a vague promise of considering features for class extension.
Comments:

_Surely_ this is something that should be done at compile-time, not at run-time? At least for 95% of cases, including your example. And the other 5% must be ugly enough to justify finding another way for the sake of maintainability.

Posted by Greg M on May 05, 2008 at 08:50 PM PDT #

Scala's combination of trait-based mixins (for static composition) and implicit conversions (for views of unchangeable libraries that permit separate compilation) are illustrative examples of how this can be done without changes in the VM. It should be noted, however, that in the former case, many kinds of changes that one might want to make to a trait will necessitate recompilation of classes to which it has been "mixed-in."

Posted by Alex on May 07, 2008 at 03:49 AM PDT #

oh, not interface.... it is redundant somehow...why not simply do it the CGLib way? no interface required.... or open up the runtime calling stack/compiler (oops, don't know the right name for it, maybe the new invocation method handler?)

or, will the new method handler be flexible enough to allow something more than AOP, like multiple inheritance, dynamic inheritance, 2 way relationship, exception resume?

Posted by john wang on November 07, 2008 at 06:28 AM PST #

Sounds a lot like Objective-C categories?

Posted by guest on May 02, 2012 at 02:09 PM PDT #

Dear Mr. John Rose! How do you think about this today? I could make very good use of such an extension in the JVM.
Currently I'm trying to resolve something I call the model composability problem (in a sense Ossher and Harrison exposed it in their subject-oriented programming paper).

Suppose a model M that builds up from data entities and polymorphic operations. Two programmers write extensions A,B to the model in a reusable fashion, that is, without modifying the source of M. The challenge is how to compose M, A and B into a model C.
I figured out a series of patterns but each one has its limitations either in terms of flexibility or performance.

One of my patterns uses method handles. It works on system classes but doesn't eliminate the use of a lookup map for dispatch.
Load-time class modification would be elegant but as far as I know it's not allowed on system classes as java.lang.String.

Interface injection as you described seems the best solution to me and possibly to writers of interpreters, compilers, and more generally model transformers.
Why is it good:
-it is pure java
-it stays in the strongly typed world
As a consequence it can be embedded into the core of the JVM and it can be really fast.
If this was a JSR I would vote on it.

Posted by Kálmán Kéri on July 22, 2012 at 08:00 PM PDT #

About

John R. Rose

Java maven, HotSpot developer, Mac user, Scheme refugee.

Once Sun and present Oracle engineer.
