5/09/2007, 2:00-3:00, Westin Civic CR
Attending:
John: JSR 292 needs insight into pain points implementing dynamic langs. Our basic thought: Method invocation in the JVM is too Java-specific. We probably need one more kind of invocation which is not type-bound, and allows systems to adapt to mismatches between caller and callee signatures. Other requirements may appear flowing from that basic fact.
Question: Are we missing any Big Idea for dynamic lang implementation?
Problem: Emulating Python call frames. There is a “current frame” API, which is infrequently used but costs on every call. (Cost includes a ThreadLocal.get.)
Possible solution: Use JVM’s debugging interface to get a bytecode-level view, and have Jython’s byte compiler save away its own local variable mappings. Questions: Would this work in the current JVM? Where are the pain points? Do we need adjustments to the JVM functionality? Or is the current API OK?
(Note that the JVM already has “vframes” and deoptimization to manage the correspondences between optimized frames and the bytecode level VM model. Need some sort of user-level vframes or deoptimization requests? Hopefully not.)
Implementation tricks:
Cross-language interoperability
Feels expensive and complex to wrap one method per class. (Sometimes we can batch several methods at once, but not always.) Charles N: Want to be able to have O(10**5) small dynamically generated methods. JRuby has a JIT which makes small classes => want small-scale method loading.
Current VM overheads for one-method classes:
Possible VM trick: Reuse generated methods by adding extra parameters and currying those. E.g., Object.get(int, extractList:{Object=>List}). The VM can simulate this with inner classes, or it might be worth building into the methods themselves, as they are called by invokedynamic.
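A rough sketch of the currying idea, simulated at the library level rather than in the VM. It assumes java.util.function lambdas (which postdate these notes) as a stand-in for inner classes. The names CurriedGet, LangList, and curry are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

class CurriedGet {
    // The single shared method body. The extra parameter says how to
    // view the receiver as a List (the "extractList" of the notes).
    static Object get(Object receiver, int index, Function<Object, List<?>> extractList) {
        return extractList.apply(receiver).get(index);
    }

    // Currying: fix extractList, leaving a (receiver, index) -> element method.
    // No new method body is generated per receiver format.
    static BiFunction<Object, Integer, Object> curry(Function<Object, List<?>> extractList) {
        return (receiver, index) -> get(receiver, index, extractList);
    }

    // Hypothetical wrapper type standing in for a language-level list object.
    static final class LangList {
        final List<Object> items;
        LangList(Object... items) { this.items = Arrays.asList(items); }
    }

    // Receiver is already a java.util.List: the extractor is a plain cast.
    static Object demoPlain() {
        BiFunction<Object, Integer, Object> getOnJavaList = curry(r -> (List<?>) r);
        return getOnJavaList.apply(Arrays.asList("a", "b", "c"), 1);
    }

    // Receiver is a wrapper: the extractor unwraps it first.
    static Object demoWrapped() {
        BiFunction<Object, Integer, Object> getOnLangList = curry(r -> ((LangList) r).items);
        return getOnLangList.apply(new LangList(10, 20, 30), 2);
    }
}
```

Both demo methods share the one get body. Only the small curried extractor differs, which is the part a VM could presumably bind without loading a whole class per method.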
Current implementations (Jython, JRuby, not Groovy) use wrapping in various places to manage variations in object format and calling sequence. Argument lists are wrapped in object arrays. Numbers are wrapped. System types (like List, String, Integer) are sometimes wrapped as “foreign pointers”. Stack frames are even wrapped! (To provide debuggability; very expensive.) Language implementors would prefer to avoid wrappers.
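To make the wrapping costs concrete, here is a minimal sketch contrasting a fully general calling sequence with a direct one. It is not taken from Jython or JRuby. GenericCallable, ADD_WRAPPED, and the other names are hypothetical.

```java
class WrapperDemo {
    // Fully general calling sequence: arguments travel in an Object[]
    // and numbers travel as boxed Integers.
    interface GenericCallable {
        Object call(Object[] args);
    }

    // Direct entry point: primitive ints, no allocation, no type checks.
    static int addDirect(int a, int b) {
        return a + b;
    }

    // Wrapped entry point for the same operation.
    static final GenericCallable ADD_WRAPPED = args -> {
        int a = (Integer) args[0];                // checked cast, then unbox
        int b = (Integer) args[1];                // checked cast, then unbox
        return Integer.valueOf(addDirect(a, b));  // box the result again
    };

    // What a caller pays on the wrapped path.
    static int viaWrapper(int a, int b) {
        Object[] args = { a, b };                 // allocate the array, box both ints
        return (Integer) ADD_WRAPPED.call(args);  // checked cast, then unbox
    }
}
```

The wrapped path does the same arithmetic but adds an array allocation, boxing, casts, and unboxing around it on every call.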
One other major need for wrappers: Languages usually have a “top type” so that all language operands can be verified as handling the common methods of the top type. (Ex: PyObject, which is a concrete class.) An invokedynamic opcode can relax this requirement, allowing a more Groovy-like direct treatment of foreign objects.
Some unwrapping is already done in language implementations:
Disadvantages of wrappers:
Dynamic languages cannot abandon all wrappers, because there usually must be a fallback to a fully general calling sequence.
Long-term goal is to be wrapperless. (Compare early Smalltalk and Java implementations with “handle tables”.)
Hard parts to emulate with wrapperless objects
Tricky bits:
Optimization goal: Make small integers (indexes) look like ints to the JIT. JVM ints get optimized better: range check elimination, subtype analysis, loop unrolling, etc. All primitive types get preferred treatment with native CPU registers and instructions.
(This is according to John, who is looking for comments and corrections.)
We want invokedynamic (like other invokes) to be as direct as possible in more than 99% of the calls. This allows your typical good JIT to do its tricks of optimistic receiver prediction, procedure integration, and (then) cross-call optimization.
Typical scenario: First use of invokedynamic instruction has an optimistically assigned caller signature. In the 80% case, this matches a signature directly on the receiver object and the call passes type checks. (Why? Because compile-time signatures are carefully chosen, based on actual usage patterns. Hopefully not just because great programmers are lucky.) In that 80% case, no further efforts are needed; the VM checks a class or two and branches directly.
In the 20% case, the invokedynamic instruction has to do something like dynamic linking. Each language has already planted a handler on its own call sites, probably a class-valued per-class attribute. The handler is invoked, and the dynamic lang runtime does something slow and complicated. It matches up the caller’s intended signature and the dynamic types of the arguments with the callee’s capabilities. The runtime handler then comes up with an adapter that accepts the caller’s signature, but then checks, shuffles, reformats, and coerces argument types to the callee’s preferred entry point signature. The adapter is passed down (as a sort of method pointer TBD) to invokedynamic, which remembers it. The call goes through, and so do the next 1000 calls, using the same adapter every time. When the VM’s JIT kicks in, it integrates the adapter along with all the other bytecodes.
Note that the caller can really win if it guesses rightly which arguments are strongly typed, e.g., as unwrapped ints. In the case of an out-call to a Java object, the dynamic language compiler can often guess the right type. (Maybe List.get(int).) In such cases, the caller’s signature and callee’s signature agree, and the only missing bit is a type check on the receiver. The invokedynamic instruction can determine that the matching method is present on the receiver type, and make the call. It can also cache (as is routine in JVMs) the winning type, and avoid the method lookup on subsequent calls.
If an unexpected receiver or argument ever shows up, the invokedynamic instruction notes the surprising types (or perhaps the adapter fails one of its checks) and the language runtime handler is invoked again, and produces a new adapter. Perhaps the JVM’s implementation of invokedynamic remembers both the new adapter and the previous one, or perhaps it only remembers the newest one. It’s a quality issue, not a correctness issue. In principle, the JVM could call the language’s runtime handler on every call, but that would be bad form.
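The linking and relinking flow described above can be simulated in plain Java. This is only a sketch under assumptions: the form of the "method pointer" is TBD in these notes, so the Adapter and Handler interfaces, the handlerCalls counter, and plusHandler are all hypothetical. This version remembers only the newest adapter.

```java
class DynamicCallSite {
    // An adapter accepts the caller's signature (receiver, arg) and
    // coerces as needed to reach some callee.
    interface Adapter {
        boolean accepts(Object receiver, Object arg);   // cheap type checks
        Object invoke(Object receiver, Object arg);
    }

    // The language runtime's handler: the slow path that builds an adapter.
    interface Handler {
        Adapter link(Object receiver, Object arg);
    }

    private final Handler handler;
    private Adapter cached;       // only the newest adapter is remembered
    int handlerCalls = 0;         // counts trips up to the language runtime

    DynamicCallSite(Handler handler) { this.handler = handler; }

    Object call(Object receiver, Object arg) {
        Adapter adapter = cached;
        if (adapter == null || !adapter.accepts(receiver, arg)) {
            // First call, or surprising types: ask the runtime for a new adapter.
            handlerCalls++;
            adapter = handler.link(receiver, arg);
            cached = adapter;
        }
        return adapter.invoke(receiver, arg);
    }

    // Hypothetical handler for a "plus" operation in some dynamic language.
    static Handler plusHandler() {
        return (receiver, arg) -> {
            if (receiver instanceof Integer && arg instanceof Integer) {
                return new Adapter() {
                    public boolean accepts(Object r, Object a) {
                        return r instanceof Integer && a instanceof Integer;
                    }
                    public Object invoke(Object r, Object a) {
                        return ((Integer) r) + ((Integer) a);
                    }
                };
            }
            return new Adapter() {   // fallback: string concatenation
                public boolean accepts(Object r, Object a) {
                    return r instanceof String;
                }
                public Object invoke(Object r, Object a) {
                    return r + String.valueOf(a);
                }
            };
        };
    }
}
```

With this handler, a site that sees a long run of Integer operands calls the handler once. A String receiver forces a relink, and because only one adapter is kept, a later Integer call relinks again. Remembering both adapters would avoid that last relink, which is the quality (not correctness) trade-off mentioned above.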
In the worst case, every single call is unique and has to be handled with a slow lookup through the language’s metaobjects. In such a case, the adapter passed back down to invokedynamic is probably something excruciatingly general, like the current generation of implementations. It is so general it never again calls up to the runtime handler, but is prepared to handle all calls on its own account. No optimizer will touch it. (The VM equivalent to this is called a “megamorphic call site”.)
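For contrast, a fully general adapter might look something like the following sketch, which uses java.lang.reflect as a stand-in for a language's metaobject lookup. GeneralAdapter and invokeByName are hypothetical names, and matching on name and arity alone is a deliberate simplification.

```java
import java.lang.reflect.Method;

class GeneralAdapter {
    // Slow, fully general path: search the receiver's public methods by name
    // on every call, with the arguments wrapped in an Object[].
    // Simplification: takes the first method matching name and arity.
    static Object invokeByName(Object receiver, String name, Object... args) {
        try {
            for (Method m : receiver.getClass().getMethods()) {
                if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                    return m.invoke(receiver, args);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no method " + name);
    }
}
```

Such an adapter accepts any receiver type, for example a String, an ArrayList, or a HashMap asked for isEmpty, but every call repeats the lookup and nothing is cached at the call site.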
http://pec.dev.java.net/nonav/compile/javadoc/pec/compile/multipledispatch/package-summary.html