• JVM
    December 15, 2010

Scheme in one class

John Rose
One of the secondary design goals of JSR 292 is to give dynamic language implementors freedom to delay bytecode generation until an opportune moment. In other words, we want to encourage mixed-mode execution, bouncing gracefully between interpreted and compiled code.

Here is some background: Dynamic language implementations generally have a choice between interpretation in a language specific form and compilation to a platform-defined executable. On the JVM, compilation means bytecode spinning. An interpreter is typically an AST walker, but it might also be a loop over a linearized array of tokens or bytecodes. This latter format is sometimes called "semi-compiled". Semi-compiled code often runs about twice as fast as an AST walker. Some dynamic languages gain an additional boost by unrolling the token stream dispatch into JVM bytecodes, so that each interpreter action becomes a JVM method call. Basically, the interpreter token stream is translated using simple context-free bytecode templates. This might allow the author to claim direct compilation to the JVM, but there is still a central interpreter runtime library mediating every operation. Such calls are often difficult for the JVM to optimize. I suspect most such compilation is premature.

More background: HotSpot itself is designed as a mixed-mode bytecode execution engine. Java applications start running under a bytecode interpreter, and eventually "warm up" into optimized native code. The JVM is free to collect loads of profile information before it commits itself to optimized native code. There is also a "deoptimization" facility, for backing out of executions which the optimized code is not prepared to handle. But the goal is to keep the Java program running at the hardware level, with relatively few trips to interpreter or runtime support. Similarly, the best compiled code for dynamic languages boils down the original user code into real JVM primitives. As with the JVM, getting this "boil-down" correct might require lots of profiling information gathered by the language runtime. Trace-based compilation is an example of such optimization, because the dynamic language has been coaxed to reveal its important types and control flow transfers, which the compiler (whether native or JVM) can exploit. Meanwhile, when bytecodes need to have late-bound or re-bindable semnatics, because optimization decisions must be revisited, invokedynamic provides flexible fall-backs.

In order to explore the proposition that JSR 292 allows implementors to delay bytecode spinning, I decided to build a small Scheme interpreter which could execute interesting programs without dynamically spinning bytecodes. I chose Scheme mainly because of its attractive simplicity (though it has grown over the years). I also want to pay homage to an old exploit called Scheme in One Definition, in which George Carrette coded up a Scheme interpreter in a single file of C code. It is clearly time for Scheme in One Class.

The current version of SIOC lives in a single class file of about 55Kb. It is a fragment of a Scheme interpreter, since it can only perform variable definitions and function calls. (I may fix this, by writing a tiny Scheme compiler in Scheme itself, and bootstrapping into a semi-compiled form. Not sure if it is worth while yet.) Yes, an interpreter that cannot execute lambda or even cond is pretty boring, but there are some interesting bits to point out.

The first interesting bit (for me at least) is the lack of inner classes. Since those compile to separate, fragmentary class files, the SIOC experiment has to avoid them. There is a point or two in the SIOC code where a Java programmer needs them, notably this place where I use Arrays.sort to sort a bunch of method handles:

MethodHandle[] mhv = ...;
// try to move more general types to the back of the list
Arrays.sort(mhv, C_compareMethodHandles);

Clearly, the definition of C_compareMethodHandles must be something like this:
Comparator C_compareMethodHandles = new Comparator() {
public int compare(MethodHandle o1, MethodHandle o2) {
return compareMethodHandles(o1, o2);

But in fact it uses a "SAM conversion" API in JSR 292 to manage the same thing without (visibly) creating a new class file:
MethodHandle MH_compareMethodHandles = (uninstructive details here...);
Comparator C_compareMethodHandles =
MethodHandles.asInstance(MH_compareMethodHandles, Comparator.class);

This API allows any method handle to "masquerade" as any SAM type, at the drop of a hat.

The second interesting bit is the mapping from Scheme types to JVM types. Because this is SIOC, wrapper types (like "SchemeInteger") are not welcome. (I distrust language-specific wrappers anyway, as a hindrance to interoperability. SIOC is partly an exercise in detecting wrappers as well as avoiding bytecode spinning.) Scheme strings, vectors, lists, booleans, characters, integers, and floats are modeled by Java strings, object arrays, lists, and the obvious primitive wrappers. Scheme procedures are modeled by JSR 292 method handles. (Symbols are modeled by SIOC instances. Something had to give, there. The mappings have various flaws, especially with lists.)

A third interesting bit is the use of symbol mangling to implicitly define Scheme procedure names. For example, the Scheme procedure write is defined in Java like this:

private void F_write(Object x, Object port) throws IOException {
unparse(x, toWriter(port), true);

(Here, unparse and toWriter are local static routines.) Note that this function is private. The JSR 292 reflective lookup methods make it easy for classes to define local functions for use as method handles. The Scheme interpreter handles symbol lookup (in part) by probing for such Java definitions.

A related aspect is the handling of overloading. This shows up in the definitions of arithmetic functions:

private static int SF_P(int x, int y) { return x + y; }
private static long SF_P(long x, long y) { return x + y; }
private static double SF_P(double x, double y) { return x + y; }

Here, the "SF_" prefix means "I am a static function implementing a Scheme procedure, and "P" is a mangling for the plus sign character. (I would have liked to use exotic names for this, but they are not in JDK 7.)
When these three definitions are processed by the interpreter, they are combined under a single dispatching method handle which examines the types of its arguments and calls the appropriate function. (Because this is a toy program, the overload resolution is not very good. A real system would use a real metaobject protocol.)

Overloading raises the question of variable arity. The Scheme write procedure noted above takes either one or two arguments. (The second argument defaults to Scheme's version of System.out.) In SIOC, the one-argument version is expressed by an additional Java overloading:

private void F_write(Object x) throws IOException { F_write(x, get("output")); }

When the Scheme interpreter looks up write, it finds both Java definitions, and overloads them into a single method handle. Later on, when the method handle is called (via invokeGeneric), the number of arguments, as expressed in the calling type (a java.dyn.MethodType), is handed to the method handle, which enables it to select the correct overloading.

A larger example of variable arity is the Scheme procedure list, which accepts any number of arguments of any type. In SIOC, I arbitrarily gave it specialized overloadings for up to three arguments:

private static Object SF_list(Object x) {
return new ArrayList(Arrays.asList(x));
private static Object SF_list(Object x, Object y) {
return new ArrayList(Arrays.asList(x, y));
private static Object SF_list(Object x, Object y, Object z) {
return new ArrayList(Arrays.asList(x, y, z));
private static Object SF_list(Object... xs) {
return new ArrayList(Arrays.asList(xs));

The JSR 292 feature which enables variable arity calls is relatively new, discussed last July at the JVM Language Summit. The issue is with the flexibility of MethodHandle.invokeGeneric. As defined earlier this year, a suitably defined method handle (like one of the SF_list above) could accept any type of arguments from any caller, as long as the number of arguments (the arity of the call) agreed with the type of the method handle (its intrinsic arity). This is usually fine for most applications, but when arguments are optional or procedures are variadic (as in Scheme write or print), there is a semantic gap. Without further work, one method handle cannot serve as Scheme's list procedure.

One workaround for this is would be to define a wrapper type SchemeProcedure which bundles together method handles of varying arity, and have Scheme call sites perform the unwrapping. This would be a performance hazard in the interpreter. Presumably compiled code would be able to "boil down" the wrapper into a correct method handle. Another problem with this workaround would be interoperability with other languages. Instead of passing a Scheme procedure to another language's call site, with the contract that invokeGeneric will sort out both type mismatches and arity mismatches, a more complex metaobject protocol is needed to narrow down the type of the Scheme procedure, before it leaves Scheme's control. This seems wrong to me, though further experiments are needed in this area.

In SIOC, wrapperless variable arity is built on top of a concept called "type handlers". A method handle may optionally be equipped with a type handler, which handles all type mismatches encountered via invokeGeneric. If a caller invokes a method handle on the wrong number of arguments, and the method handle has a type handler, the type handler is consulted with the caller's intended type, and given a chance to define the call.

Here is some background: When invokeGeneric is applied to a method handle, there is a negotiation (with or without type handlers) between the caller's invocation descriptor and the method handle's intrinsic type. (The descriptors may be represented as MethodType objects.) If the descriptors match exactly, the method handle is invoked through its main entry point, its "front door". If the descriptors do not match exactly, the method handle is (virtually) converted to the caller's desired type, and invoked in its converted form. (The JVM will probably do something more clever than creating a new method handle just for one call.) If there is no type handler, the caller and callee types must agree on arity, and the equivalent of MethodHandles.convertArguments is used to pairwise coerce the arguments and return values, if such coercion is possible according to certain rules (a variation of Java method invocation conversions).

If this overloading trickery applied only to Scheme functions, it would be only mildly interesting, since Scheme does not boast many overloaded functions. But there is a final interesting point in SIOC which I would like to observe, and that is its ability to call Java APIs. The same mechanism that allows the interpereter to resolve write and other Scheme names also applies to Java APIs, given suitable conventions for representing Java names as Scheme symbols. (In this matter of Java integration I give deep deference to other, much better JVM Lisps, notably Kawa, ABCL (added), and Clojure. The present exercise simply aims to show the usefulness of JSR 292 for getting at Java APIs.)

Specifically, there are a number of Scheme symbol resolution rules which jump down the Java rabbit-hole. Here are some examples:

$ $JAVA7X_HOME/bin/java -jar dist/sioc.jar -i
> (write "hello")
> (set! n (+ 2 2))
> (list n (+ 2.1 2.3))
(4 4.4)
> java.lang.String
class java.lang.String
> java.lang.String#concat
> (import 'java.io...)
import java.io as ...
> (define f (File#new "manifest.mf"))
> f
> (.getClass f)
class java.io.File
> (.getSimpleName (.getClass f))
> (File#exists f)
> (define p (FileReader#new f))
> (set! p (BufferedReader#new p))
> (write (BufferedReader#readLine p))
"Manifest-Version: 1.0"
> (quit)

Getting at such APIs, randomly and interactively, has been impossible before JSR 292, at least without statically or dynamically spinning bytecoded adapters. Now it works well in a small program. One of JSR 292's benefits should be much easier access to Java from dynamic languages.

Update: Several people have noted that reflection allows this also. I am guilty of HotSpot-centrism here, because on HotSpot reflection internally uses bytecode spinning to gain performance. But it is true that reflection has always allowed programs, like Kawa and ABCL, to make ad hoc access to interactively selected APIs. There is further discussion of this point at http://groups.google.com/group/jvm-languages/msg/4a4b1e23bb0b2e75. The thing that is new with JSR 292 is that method handles, especially when used with invokedynamic, optimize comparably to hand-spun bytecodes.

That's the last interesting point I have to show for now. The fuller support for a Scheme compiler (if I get around to it) will bring semi-compiled Scheme code into the mix, still in exactly one class. As a sort of cheat, the Scheme code of the compiler itself will be in a resource file associated with SIOC. The next step after that would be (finally) spinning bytecodes from the semi-compiled representation, after initial execution to settle out global variable bindings, variable types, etc. Perhaps at that point I can borrow a class file spinner (coded in Scheme) from somewhere else, and still keep everything to one class file, with some associated Scheme files.

A final caveat: Some of these JSR 292 API elements (such as type handlers and SAM conversion) may not make it into this year's definition, and JDK 7. This is all bleeding edge stuff. You can find a recent API draft here: http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/
The code itself is at http://kenai.com/projects/ninja/sources/sioc-repo/content/src/sioc/SIOC.java?rev=7

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.