A Day with PyPy

Yesterday, Charles Nutter and I played host to a roving band of PyPy researchers. Their aim is to get people interested in their new techniques in dynamic language compilation, and they succeeded with us. I had a special goal to figure out how the JVM might be useful in their world.

Here are some bits from the conversation that stick in my memory...

When dynamic language implementors say “C speed” they are talking about the speed of the original C-coded interpreters of their language. The original technology is almost always an AST-walking interpreter, which is about the slowest way to go about it. By contrast, when JVM implementors say “C speed”, they are comparing a Java program with a transliterated C or C++ program, compiled, optimized, and statically linked. Those two standards are separated by two or three orders of magnitude.

The PyPy idea of promotion includes the idea of capturing runtime values for use by the compiler, by lazily growing a PIC (polymorphic inline cache). A PIC is a dynamic call site (or other branch point) which directs control flow based on selectors (often class IDs). It also keeps a record of those selectors as they appear. PyPy characterizes this as a “growable switch”. As each new selector appears, the PIC grows another branch edge, to lazily linked code which is appropriate to that selector. This is more or less classic. (HotSpot relies on interpreter profiles more than PICs, but most implementations of the forthcoming invokedynamic instruction will probably create PICs.)
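The growable-switch idea can be sketched in a few lines of Python. This is an illustrative model, not PyPy's implementation: the call site is an object whose branch table grows lazily, one entry per class seen, with a `compile_for_class` callback standing in for the JIT linking step (both names are mine, for illustration).

```python
# A call site modeled as a "growable switch" keyed on the receiver's class.
# Each new class seen at the site lazily grows another branch binding
# ("linking") the code appropriate to that selector.

class PolymorphicInlineCache:
    def __init__(self, compile_for_class):
        self.branches = {}                    # class -> linked code
        self.compile_for_class = compile_for_class

    def dispatch(self, receiver, *args):
        cls = type(receiver)
        target = self.branches.get(cls)
        if target is None:                    # cache miss: grow the switch
            target = self.compile_for_class(cls)
            self.branches[cls] = target
        return target(receiver, *args)

# Usage: a site whose "compiled" code just takes the receiver's length.
site = PolymorphicInlineCache(lambda cls: lambda recv: len(recv))
site.dispatch([1, 2, 3])   # first list seen: grows a branch for list
site.dispatch("abcd")      # first str seen: grows another branch
```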

Also, PyPy promotion involves lazy compilation, even at the basic block level. The code “appropriate” to a selector might be immediately linkable, or it might not yet exist. In the latter case the JIT creates the code, potentially in the full type and IR context of the caller, with the particular selector. That is how it is promoted: The JIT IR downstream of the switch can “see” the selector value as a compile-time constant.
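A toy way to see promotion, with Python source generation standing in for JIT code generation (again my own sketch, not PyPy's machinery): the selector value is baked into the generated code as a literal constant, so everything downstream of the switch can fold against it.

```python
# Sketch of promotion: when a new selector reaches the switch, code is
# generated for that particular value, so the "IR" downstream sees the
# selector as a compile-time constant (here, a literal in generated source).

def promote_and_compile(selector_class):
    # The selector (a class, in the PIC case) is folded into the generated
    # source as a constant, the way a JIT would fold it into its IR.
    src = ("def specialized(x):\n"
           "    return ('{name}', x)\n").format(name=selector_class.__name__)
    ns = {}
    exec(src, ns)
    return ns['specialized']

f_int = promote_and_compile(int)    # branch grown for int
f_str = promote_and_compile(str)    # branch grown for str
```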

Moreover, the generation does not stop with the end of the callee method (assuming the PIC was for a method call), but it continues with the next statement of the caller. In this way, the promoted selector value is available downstream of the call. The compilation is not just of the method call, but of the caller’s whole continuation from the point of the call. I think I want to call this kind of JIT a continuation JIT, because around here JITs do not use continuation passing style. (Other VMs do this—kudos.)

The cost of all this is frequent JIT runs, potential exponential code blow-up, and some trouble getting loops right. Another problem is the perennial trade-off between thoroughness and agility: The PyPy JIT runs most naturally on small pieces of code, with a correspondingly small window of context for the optimizer. To get global optimizations (such as loop transformation that HotSpot provides) PyPy code will need a second tier of recompilation, once the dynamic flow graph has settled down a bit. All in all, it is a very promising approach.

Since PyPy PICs do not return anywhere, the underlying platform really needs to supply a reliable tail-call optimization (TCO). We found some workarounds for the current set of JVMs, but the lack of TCO gets in the way of directly optimizing code produced by a continuation JIT... or a trace-based JIT.
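One classic workaround when the platform lacks TCO is a trampoline: tail calls return a thunk instead of calling directly, and a driver loop bounces on the thunks so the stack never grows. This Python sketch is only meant to show the shape of the trick; real workarounds on the JVM look different.

```python
# Trampoline: a substitute for tail-call optimization.  A tail call is
# expressed by returning a zero-argument thunk; the driver loop keeps
# invoking thunks until a non-callable (real) value comes back.

def trampoline(f, *args):
    result = f(*args)
    while callable(result):       # keep bouncing; constant stack depth
        result = result()
    return result

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)   # tail call, reified as a thunk
```

Calling `countdown(100000)` directly would blow the recursion limit; `trampoline(countdown, 100000)` runs in constant stack space.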

PyPy promotion does not apply only to the class ID of a message receiver (as with classic PICs). A promoted switch selector can be a byte loaded from a bytecode stream, which is the way they partial-evaluate their interpreter loop into a templating JIT. A selector could also be composed from two or more receiver classes, which is how one could do CLOS-style multiple dispatch (for generic binary arithmetic, etc.).
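The two-receiver case can be sketched by keying the branch table on a tuple of classes. The names below (`dispatch_add`, `compile_add`) are illustrative, and the "compiled" bodies are placeholders for what a real JIT would specialize.

```python
# A binary-arithmetic call site whose promoted selector is composed from
# *both* receiver classes, CLOS-style.

binary_add_site = {}    # (class, class) -> linked code

def compile_add(lhs_cls, rhs_cls):
    if lhs_cls is int and rhs_cls is int:
        return lambda a, b: a + b             # fast integer path
    return lambda a, b: float(a) + float(b)   # generic numeric path

def dispatch_add(a, b):
    key = (type(a), type(b))                  # the composed selector
    target = binary_add_site.get(key)
    if target is None:                        # grow a branch for this pair
        target = compile_add(*key)
        binary_add_site[key] = target
    return target(a, b)
```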

We talked about method replacement, especially the kind where a method is replaced by a better-optimized version. The invokedynamic instruction, combined with method handles, will provide some hooks for this. The HotSpot JVM can rewrite groups of stack frames (in both directions: optimizing and deoptimizing). We realized that first-class continuations in the JVM would allow Java code (in a dynamic language runtime) to express such transformations effectively. (Not surprising in hindsight, since my prototype version of call/cc on the JVM uses vframes, which is the mechanism HotSpot uses for deoptimization.) First-class continuations could also help with stack inspection operations and coroutines.

We talked about class splitting, in two cases: boxed int and Dictionary. In the latter case, you want a special system-designed subclass for (say) string keys only. In the former case, you want a special system-supported representation for small integer values which could be packed into an immediate field in a tagged pseudo-pointer.
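The tagged pseudo-pointer trick for small integers can be sketched with plain bit operations. The one-bit tag layout below is illustrative, not any particular VM's scheme: the low bit marks the word as an immediate integer, and the value lives in the remaining bits, so no heap box is needed.

```python
# A tagged immediate representation for small integers: low bit = tag,
# remaining bits = value.  (Bit layout is illustrative only.)

TAG_BITS = 1
INT_TAG = 1

def tag_int(value):
    return (value << TAG_BITS) | INT_TAG     # pack value into an immediate

def is_tagged_int(word):
    return (word & INT_TAG) == INT_TAG       # low bit set => immediate int

def untag_int(word):
    return word >> TAG_BITS                  # arithmetic shift recovers value
```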

The intermediate representation is a graph parsed from an RPython (restricted Python) program. RPython is a true subset of Python, but one which avoids the really dynamic features. To a first approximation, it is the part of Python with Java semantics: It can be statically typed. Another restriction is that its world of types is closed, so the compiler can enumerate the type hierarchy statically. (So-called application classes, loaded dynamically, need more metadata, an extra level of indirection away.)

As in any fully dynamic language, the RPython optimizer performs scalarization pretty eagerly. (Or if you like, it performs boxing lazily and unwillingly.) This is needed for unboxing the normal Python integer type, and also for decomposing things like argument lists. It doesn’t work so well (yet) outside the closed world of RPython.

Why does RPython make a good intermediate language? Partly because it is block-structured, typeable, easy to program in, has nice types like lists, and is a true subset of a rich parent language (Python). So far, Java is like that too. Probably Python would support little languages for specialized problem spaces like pattern matching or template generation, but perhaps the PyPy people feel that would be a programming luxury they cannot afford right now.

The main edge RPython has over Java as an intermediate language is its complete dynamic typing, which allows the same code to work on compile-time and run-time representations of objects. (It sounded like abstract interpretation to me, but they say it is more a species of partial evaluation.) There is a compile-time object space which mirrors the regular (R)Python object space.

I would think that doing abstract interpretation over compile-time booleans would work best when the if/then/else statements, etc., are represented with functions and closures. Then, as Yogi Berra said, “When you come to a fork in the road, take it.” A side effect of each branch would be generating the IR for that branch, with the boolean promoted to a constant. One of the trade-offs in RPython, though, is to eliminate closures. This simplifies variable analysis.
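A tiny sketch of "taking the fork," under my reading of the paragraph above (this is my own model, not PyPy's): the branches of an `if` are passed as closures, and when the condition is not a compile-time constant, the abstract interpreter runs both closures, each emitting IR with the boolean promoted to a constant on its branch.

```python
# Abstract interpretation of an if/then/else represented with closures.
# A compile-time-constant condition is folded; otherwise both forks are
# taken, and each closure's side effect is to emit IR for its branch.

def abstract_if(cond, then_fn, else_fn, ir):
    if cond is True or cond is False:     # compile-time constant: fold
        return then_fn(ir) if cond else else_fn(ir)
    ir.append(('branch_on', cond))        # runtime value: emit the branch
    then_fn(ir)                           # fork 1: boolean promoted to True
    else_fn(ir)                           # fork 2: boolean promoted to False
    return None

ir = []
abstract_if('runtime_flag',
            lambda out: out.append(('const_bool', True)),
            lambda out: out.append(('const_bool', False)),
            ir)
```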

They were excited to hear about our nascent multi-language VM project. I expect we can collaborate there on some experiments with invokedynamic.

All in all, it was an interesting and enjoyable day.

Comments:

Does Sun have any plans for Python on the JVM? Sun hires JRuby developers. What about Jython?

Posted by mef on November 13, 2007 at 11:12 PM PST #

"When dynamic language implementors say “C speed” they are talking about the speed of the original C-coded interpreters of their language. The original technology is almost always an AST-walking interpreter, which is about the slowest way to go about it."

"C speed" may have this meaning in the Ruby world, with thoughts of an interpreter done in the style of the old days of Tcl, but if you ask general Perl or Python people what "C speed" means, they'll tell you it is (or would be) the speed of their code rewritten in C, just like those JVM implementers.

Of course, some dynamic language implementers may compare their efforts to the speed of existing virtual machines (which typically aren't AST visitors), but it'd surprise me if they really regarded those VMs as running at "C speed", or (more correctly) running \*programs\* at "C speed", especially given the existence of a variety of compiler tools which convert restricted subsets of those languages to C or C++, such as RPython (which you mention) and ShedSkin.

Posted by Paul Boddie on November 14, 2007 at 11:40 PM PST #

Yes, please, first class continuations and first class closures of indefinite extent would be great. Closures of dynamic extent, if you can verify bytecode safety, or invalidate them cheaply at runtime as necessary, would also rock (such closures are good enough for depth-first backtracking algorithms and require only stack allocation).

Posted by Nico on November 15, 2007 at 01:31 AM PST #

For some time now I was under the impression that Sun has decided to neglect Python/Jython and its community in favor of Ruby/JRuby. This impression is also influenced by the lack of any Python/Jython support in Netbeans 6. I hope that Sun will be wiser and embrace the Python community as it has done with Ruby - there can be a lot of benefits for both sides.

Posted by Ze'ev on November 19, 2007 at 05:39 PM PST #

John, could you say more about your call/cc prototype, vframes and first class continuations for the JVM?

Posted by Steven Shaw on February 06, 2008 at 12:27 PM PST #

Dear "Mef," C-speed means to dynamic language implementors the same things it means to everyone else. You are confused. Yours truly, ...

Posted by Donny Viszneki on February 02, 2009 at 02:39 AM PST #

About

John R. Rose

Java maven, HotSpot developer, Mac user, Scheme refugee.

Once Sun and present Oracle engineer.
