JSR 292 Deep Dive Notes

John Rose

5/09/2007, 2:00-3:00, Westin Civic CR


  • John Rose (host, Hotspot VM)

  • Oti Humbel, Charlie Groves, Tobias Ivarsson (Jython)

  • Thomas Enebo, Charles Nutter (JRuby)

John: JSR 292 needs insight into pain points implementing dynamic langs. Our basic thought: method invocation in the JVM is too Java-specific. We probably need one more kind of invocation which is not type-bound and allows systems to adapt to mismatches between caller and callee signatures. Other requirements may flow from that basic fact.

Question: Are we missing any Big Idea for dynamic lang implementation?

Jython notes

  • generic call ops implemented on all PyObject (concrete)

  • callee looks at a sequence of desired types

  • Java objects are wrapped in Jython proxies

Problem: Emulating Python call frames. There is a “current frame” API, which is infrequently used but costs on every call. (Cost includes a ThreadLocal.get.)
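
The per-call cost mentioned above can be sketched as follows. This is a hypothetical mini version, not Jython's actual code; the names (PyFrame, FrameStack) are illustrative, but the shape — a ThreadLocal.get plus push/pop bookkeeping on every emulated call — matches the description.

```java
// Hypothetical sketch of the frame bookkeeping described above.
final class PyFrame {
    final PyFrame caller;
    PyFrame(PyFrame caller) { this.caller = caller; }
}

final class FrameStack {
    private static final ThreadLocal<PyFrame> current = new ThreadLocal<>();

    // Every emulated call pays this, even if nobody asks for the frame.
    static Object call(java.util.function.Function<PyFrame, Object> body) {
        PyFrame parent = current.get();   // ThreadLocal.get on every call
        PyFrame frame = new PyFrame(parent);
        current.set(frame);
        try {
            return body.apply(frame);
        } finally {
            current.set(parent);          // restore caller's frame on exit
        }
    }

    // The rarely used "current frame" API that forces all of the above.
    static PyFrame currentFrame() { return current.get(); }
}
```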

Possible solution: Use JVM’s debugging interface to get a bytecode-level view, and have Jython’s byte compiler save away its own local variable mappings. Questions: Would this work in the current JVM? Where are the pain points? Do we need adjustments to the JVM functionality? Or is the current API OK?

(Note that the JVM already has “vframes” and deoptimization to manage the correspondences between optimized frames and the bytecode level VM model. Need some sort of user-level vframes or deoptimization requests? Hopefully not.)

JRuby notes

Implementation tricks:

  • numbers: can unbox inside a method, rebox on out-calls

  • selector-table-index (callee + selector index) allows switch-based method selection

  • java objects are wrapped in Ruby proxies
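
The selector-table-index trick above can be sketched like this. The names (Selectors, RubyString, callMethod) are illustrative, not JRuby's real classes: the runtime assigns each method name a small integer, so generated classes can dispatch with a tableswitch instead of a hash lookup.

```java
// Hedged sketch of selector-table-index dispatch; names are illustrative.
final class Selectors {
    static final int LENGTH = 0, REVERSE = 1;   // assigned by the runtime
}

final class RubyString {
    final String value;
    RubyString(String value) { this.value = value; }

    Object callMethod(int selector) {
        switch (selector) {                     // switch-based method selection
            case Selectors.LENGTH:  return value.length();
            case Selectors.REVERSE: return new StringBuilder(value).reverse().toString();
            default: throw new IllegalArgumentException("method_missing");
        }
    }
}
```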

Cross-language interoperability

  • Charles N.: ideally Python sees Ruby strings as Python strings

  • List (either Ruby-list or generic) as receiver of Ruby append request:

Pain points in method generation

Feels expensive and complex to wrap one method per class. (Sometimes we can batch several methods at once, but not always.) Charles N: wants to be able to have O(10**5) small dynamically generated methods. JRuby has a JIT which makes small classes => want small-scale method loading.

Current VM overheads for one-method classes:

  • class naming (garbage names clog symbol table)

  • class loader (to get GC-ability, need lots of these too)

Possible VM trick: Reuse generated methods by adding extra parameters and currying those. E.g., Object.get(int, extractList:{Object=>List}). The VM can simulate this with inner classes, or it might be worth building into the methods themselves, as they are called by invokedynamic.
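
A rough simulation of that currying trick, using only inner classes (here, a lambda): one shared generic method takes an extra "which callee" parameter, and small curried objects bind it, so the runtime spins tiny closures instead of one generated class per wrapped method. The names (DynCall, Curried) are illustrative.

```java
// Sketch of the "extra parameter + currying" trick described above.
interface DynCall { Object invoke(Object receiver); }

final class Curried {
    // One shared implementation, parameterized by an extractor index,
    // standing in for many per-method generated classes.
    static Object get(int which, Object receiver) {
        java.util.List<?> list = (java.util.List<?>) receiver;
        switch (which) {
            case 0: return list.size();
            case 1: return list.isEmpty();
            default: throw new IllegalArgumentException("unknown extractor");
        }
    }

    // The VM could do this binding internally; an inner class simulates it.
    static DynCall curry(final int which) {
        return receiver -> get(which, receiver);
    }
}
```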

Wrappers vs. no wrappers

Current implementations (Jython, JRuby, not Groovy) use wrapping in various places to manage variations in object format and calling sequence. Argument lists are wrapped in object arrays. Numbers are wrapped. System types (like List, String, Integer) are sometimes wrapped as “foreign pointers”. Stack frames are even wrapped! (To provide debuggability; very expensive.) Language implementors would prefer to avoid wrappers.

One other major need for wrappers: Languages usually have a “top type” so that all language operands can be verified as handling the common methods of the top type. (Ex: PyObject, which is a concrete class.) An invokedynamic opcode can relax this requirement, allowing a more Groovy-like direct treatment of foreign objects.
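
The top-type pattern can be sketched as below. PyObject is the real Jython example; this mini version (LangObject, WrappedInt) is purely illustrative. Because every value extends the concrete root, the verifier can statically accept a dynamic call on any operand.

```java
// Sketch of the "top type" wrapper pattern; names are illustrative.
abstract class LangObject {
    // The common dynamic-call surface that the verifier sees on all operands.
    abstract Object invoke(String selector, Object... args);
}

final class WrappedInt extends LangObject {
    final int value;
    WrappedInt(int value) { this.value = value; }

    @Override Object invoke(String selector, Object... args) {
        if (selector.equals("+")) {
            return new WrappedInt(value + ((WrappedInt) args[0]).value);
        }
        throw new UnsupportedOperationException(selector);
    }
}
```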

Some unwrapping is already done in language implementations:

  • Jython unwraps argument lists by using a schema of call1, call2, etc.

  • JRuby and Rhino unwrap fixnums by doubling arguments.

  • Can unwrap values inside the compilation of a single method.

  • Groovy unwraps routinely, uses a marker interface for MOP extensions.
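
The call1/call2 arity schema can be sketched as follows, with illustrative names (DynObject, Adder): fixed small arities are separate entry points that a compiled callable can override directly, so the common cases avoid allocating an Object[], while the varargs form stays as the general fallback.

```java
// Sketch of arity-specialized entry points, as in Jython's call schema.
abstract class DynObject {
    abstract Object call(Object[] args);            // general, boxed path

    // Default arity entry points fall back to the general path.
    Object call0()                   { return call(new Object[0]); }
    Object call1(Object a)           { return call(new Object[] { a }); }
    Object call2(Object a, Object b) { return call(new Object[] { a, b }); }
}

final class Adder extends DynObject {
    @Override Object call(Object[] args) {          // slow general path
        return (Integer) args[0] + (Integer) args[1];
    }
    // A compiled callable overrides its arity directly: no array allocated.
    @Override Object call2(Object a, Object b) {
        return (Integer) a + (Integer) b;
    }
}
```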

Disadvantages of wrappers:

  • inefficient: extra indirections & type dispatches

  • hard for the JIT to unwrap (therefore harder to do classic C-style optimizations)

  • complex; each language has to re-invent a full suite

  • interoperability requires common wrappers or re-wrapping (real use case: Jython and Rhino)

Dynamic languages cannot abandon all wrappers, because there usually must be a fallback to a fully general calling sequence.

Long-term goal is to be wrapperless. (Compare early Smalltalk and Java implementations with “handle tables”.)

Hard parts to emulate with wrapperless objects

  • JRuby, Jython can add a method to an object (method added to new metaclass, wrapper metaclass changed)

  • some languages need to freeze objects (without changing object identity?!)

  • some need to taint values (even scalars, but an identity change is OK for those)

Handling numbers

Tricky bits:

  • generic arithmetic seems to require multiple dispatch

  • (or, cascaded single dispatch with anonymous intermediate selectors)

  • want fast paths for int-friendly operations like List.get, (x+1)

  • overflow semantics differ: go to long, double, BigInteger, etc.

  • need slow paths to prepare for rare events like overflow and unexpected operand types (e.g., taints)

Optimization goal: Make small integers (indexes) look like ints to the JIT. JVM ints get optimized better: range check elimination, subtype analysis, loop unrolling, etc. All primitive types get preferred treatment with native CPU registers and instructions.
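
A minimal sketch of such a fast path, assuming the overflow policy is "promote to BigInteger" (one of the options listed above; the class name Arith is illustrative): the hot path stays entirely in primitive ints, and the slow path handles the rare overflow.

```java
// Sketch of fast-path int addition with a slow path for overflow.
import java.math.BigInteger;

final class Arith {
    static Object add(int x, int y) {
        long r = (long) x + (long) y;       // widen to detect overflow cheaply
        if (r == (int) r) {
            return (int) r;                 // fast path: result fits an int
        }
        // slow path: rare event, promote per the language's semantics
        return BigInteger.valueOf(x).add(BigInteger.valueOf(y));
    }
}
```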

View from the VM and JIT

(This is according to John, who is looking for comments and corrections.)

We want invokedynamic (like other invokes) to be as direct as possible in more than 99% of the calls. This allows your typical good JIT to do its tricks of optimistic receiver prediction, procedure integration, and (then) cross-call optimization.

Typical scenario: First use of invokedynamic instruction has an optimistically assigned caller signature. In the 80% case, this matches a signature directly on the receiver object and the call passes type checks. (Why? Because compile-time signatures are carefully chosen, based on actual usage patterns. Hopefully not just because great programmers are lucky.) In that 80% case, no further efforts are needed; the VM checks a class or two and branches directly.

In the 20% case, the invokedynamic instruction has to do something like dynamic linking. Each language has already planted a handler on its own call sites, probably a class-valued per-class attribute. The handler is invoked, and the dynamic lang runtime does something slow and complicated. It matches up the caller’s intended signature and the dynamic types of the arguments with the callee’s capabilities. The runtime handler then comes up with an adapter that accepts the caller’s signature, but then checks, shuffles, reformats, and coerces argument types to the callee’s preferred entry point signature. The adapter is passed down (as a sort of method pointer TBD) to invokedynamic, which remembers it. The call goes through, and so do the next 1000 calls, using the same adapter every time. When the VM’s JIT kicks in, it integrates the adapter along with all the other bytecodes.

Note that the caller can really win if it guesses rightly which arguments are strongly typed, e.g., as unwrapped ints. In the case of an out-call to a Java object, the dynamic language compiler can often guess the right type. (Maybe List.get(int).) In such cases, the caller’s signature and callee’s signature agree, and the only missing bit is a type check on the receiver. The invokedynamic instruction can determine that the matching method is present on the receiver type, and make the call. It can also cache (as is routine in JVMs) the winning type, and avoid the method lookup on subsequent calls.

If an unexpected receiver or argument ever shows up, the invokedynamic instruction notes the surprising types (or perhaps the adapter fails one of its checks) and the language runtime handler is invoked again, and produces a new adapter. Perhaps the JVM’s implementation of invokedynamic remembers both the new adapter and the previous one, or perhaps it only remembers the newest one. It’s a quality issue, not a correctness issue. In principle, the JVM could call the language’s runtime handler on every call, but that would be bad form.

In the worst case, every single call is unique and has to be handled with a slow lookup through the language’s metaobjects. In such a case, the adapter passed back down to invokedynamic is probably something excruciatingly general, like the current generation of implementations. It is so general it never again calls up to the runtime handler, but is prepared to handle all calls on its own account. No optimizer will touch it. (The VM equivalent to this is called a “megamorphic call site”.)
