Recent Posts


two thoughts about career excellence

I love Dickens, warts and all. Sometimes he is sententious, and (like the mediocre modern I am) at such points I am willing to listen non-ironically. This bit here struck me hard enough to stop and write it down:

I mean a man whose hopes and aims may sometimes lie (as most men's sometimes do, I dare say) above the ordinary level, but to whom the ordinary level will be high enough after all if it should prove to be a way of usefulness and good service leading to no other. All generous spirits are ambitious, I suppose, but the ambition that calmly trusts itself to such a road, instead of spasmodically trying to fly over it, is of the kind I care for. It is Woodcourt's kind.

(John Jarndyce to Esther Summerson, Bleak House, ch. 60)

Woodcourt is, of course, one of the heroes of the story. It is a heroism that is attractive to me.

Here is a similar idea, from the Screwtape Letters. In the satirically inverted logic of that book, the “Enemy” is God, the enemy of the devils but the author of good:

The Enemy wants to bring the man to a state of mind in which he could design the best cathedral in the world, and know it to be the best, and rejoice in the fact, without being any more (or less) or otherwise glad at having done it than he would be if it had been done by another.

(C.S. Lewis, Screwtape Letters, ch. 14)

Though I will be happy with a good Bazaar, I also dream of Cathedrals. Put whatever name you like on it, as long as I get some part in the fun of building a good one.



the isthmus in the VM

This is a good time to consider new options for a “native interconnect” between code managed by the JVM and APIs for libraries not managed by the JVM. Notably, Charles Nutter has followed up on his JVM Language Summit talk (video on this page) by proposing JEP 191, to provide a new foreign function interface for Java. To access native data formats (and/or native-like ones inside the JVM), there are several projects under way including David Chase’s data layout package, Marcel Mitran’s packed object proposal, and Gil Tene’s object layout project.

This article describes some of the many questions related to native interconnect, along with some approaches for solving them. We will start Project Panama in OpenJDK to air out these questions thoroughly, and do some serious engineering to address them, for the JDK.

Let us use the term native interconnect for connections between the JVM and “native” libraries and their APIs. By “native” libraries I simply mean those routinely used by programmers of statically compiled languages outside the JVM.

the big goal

I think the general, basic, idealistic goal is something like this: If non-Java programmers find some library useful and easy to access, it should be similarly accessible to Java programmers.

That ideal is easy to state but hard to carry out. The fundamental reason is simple: the languages are different. C++ programmers use the #include statement for pulling in APIs, but it would be deeply misguided to try to add #includes to the Java language. For more details on how language differences affect interconnection, see the discussion below.

Happily, this is not completely new ground, since managed languages (including Lisp, Smalltalk, Haskell,
Python, Lua, and more) have a rich history of support for native interconnect.

Most subtly, even if all the superficial differences could be adjusted, the rules for safe and secure usage of Java differ from those of the native, statically-compiled languages. There is a range of choices for ensuring that a native library gets safely used. The main two requirements are to make VM-damaging errors very rare, and (as a corollary) to make intentional attacks very difficult. We will get into more details below.

Besides safety, Java has a distinctive constellation of “cultural” values and practices, notably the features which provide safety and error management. So, the access to C APIs must be adapted to the client language (Java) by means of numerous delicate compromises and engineering choices to preserve not only the “look and feel” of Java expressions but also their deeper cultural norms.

By using the metaphor of culture, I don’t imagine a “Java way of life”, but I observe that there are “Java ways” of coding, which differ interestingly from other ways of coding. Cultural awareness becomes salient when cultures meet and mix.

Anyway, to get this done, we need to build a number of different artifacts, including Java libraries, JVM support mechanisms, tools, and format specifications. A number of possibilities are enumerated below.

why this is difficult

First, let’s survey some of the main challenges to full native interconnect.

syntax: Since the languages differ, Java user code for a native API will differ in syntax from the corresponding native user code, sometimes surprisingly. For example, Java 8 lambdas are very different in detail from C function pointers, although they sometimes have corresponding uses. Java has no general notions corresponding to C macros or C++ templates.

naming: Different languages have different rules for identifier formation, API scoping (packages vs.
namespaces), and API element naming. Languages even have differing kinds of names: Java has distinct namespaces for fields and methods, while C++ has just members.

data types: Basic data types differ. Booleans, characters, strings, and arrays differ between the languages. C++ uses pointers, sometimes for information hiding, sometimes for structurally transparent data. Java uses managed references, which always have some hidden structure (the object header). And so on. A user-friendly Java interconnect to a native API needs to adjust the types of API arguments and return values to reduce surprises.

storage management: Many native libraries operate through pointers to memory, and they provide rules for managing that memory’s lifetime. Java and native languages have very distinct tactics for this. Java uses garbage collection and C++ libraries usually require manual storage management. Even if C++ were to add garbage collection, the details would probably be difficult to reconcile. A safe Java interconnect to a native API needs to manage native storage in a way that cannot crash the JVM.

exceptions: As with storage management, languages differ in how they handle error conditions. C++ and Java both have exceptions, but they are used (and behave) in very different ways. For example, C++ does not mandate null pointer exceptions. C APIs sometimes require ad hoc polling for errors. A user-friendly Java interconnect to a native API needs a clear story for producing exceptions, which is somehow derived from the native library’s notion of error reporting.

other semantics: Java’s strings are persistent (used to be called “immutable”) while C’s strings are directly addressable character arrays which can sometimes change. (And C++ strings are yet another thing.)

performance: Code which uses Java primitives performs on a par with corresponding C code, but if an API exchanges information using other types, including strings, boxing or copying can cause performance “potholes”.
I expect that value types will narrow the gap eventually for other C types, but they are not here yet.

safety: I’m putting this last, but it is the most difficult and important thing to get right. It deserves its own list of issues, but the gist of it is that the JVM as a whole must continue to operate correctly even in the face of errors or abuse of any single API. The next section examines this requirement in detail.

safety first

The JVM as a whole must continue to operate correctly when native APIs are in use by various kinds of users.

no attacks from untrusted code: Untrusted code must not be allowed to subvert the correct operation of the JVM, even if it makes very unusual requests of native APIs available to it. This implies that many native APIs must be made inaccessible to untrusted code.

no privilege escalation from untrusted code: Untrusted users should not be able to access files, resources, or Java APIs via native APIs, if they would not already have access to them via Java code.

no crashes: It must be difficult for ordinary user code, and impossible for untrusted code, to crash the JVM using a native API. Native API calls which might lead to unpredictable behavior must be detected and prevented in Java code, preferably by throwing exceptions.
Pointers to native memory must be checked for null before all use, and discarded (e.g., set to null) when freed.

no leaks: It must be difficult or impossible for ordinary user code to use a native API to use memory or other system resources in a way that they cannot be recovered when the user code exits. Native resources must be used in a manner that is scoped.

no hangs: It must be difficult or impossible for ordinary user code to cause deadlocks or long pauses in system execution. Pauses for JVM housekeeping, like garbage collection, must not be noticeably lengthened because of waits for threads running native code.

rare outages: Even if code is partially or fully trusted, errors that might lead to crashes, leaks, or hangs must be detected before they cause the outage, almost always.

no unguarded casts: If privileged Java code must use cast-like operators to adjust its view of native data or functions, the casting must be done only after some kind of check has proven that the cast will be valid. This implies that native data and functions must be accessed through Java APIs that fully describe the native APIs and can mechanically check their use.

From these observations, it is evident that there are at least three trust levels that are relevant to native interconnect: untrusted, normal, and privileged.

Java enforces configurable security policies on untrusted code, using APIs like the security manager. This ensures that untrusted code cannot break the system (or elevate privileges) even if APIs are abused.

Normal code is the sort of code which can run in a JVM without a security manager set. Such code might be able to damage the JVM, using APIs like sun.misc.Unsafe, but will not do so by accident. As a practical way to reduce risk, we can search normal code for risky operations, which should be isolated, and review their use for safety.

I think many of the tricky details of native interconnect are related to this concept of privileged code.
Any system like the JVM that enforces safety invariants or access restrictions has trusted, privileged code that performs unsafe or all-access operations, such as file system access, on behalf of other kinds of code.

Put another way, privileged code is expected to be in the risky business. It is engineered with great care to conform to safety and security policies. It supports requests from non-privileged code (even untrusted code) after access checks on behalf of the requester. Privileged code needs maximum access to native APIs of the underlying system, and must use them in a way that does not propagate that access to other requesters.

engineering privileged wrapper code

In the present discussion, we can identify at least two levels of binding from Java code to native APIs: a privileged “raw access” to most or all API features, and a wrapped access that provides safety guarantees that match the cultural expectation of Java programmers. So let’s examine the process of engineering the wrapper code that stands between normal Java users and native APIs.

In current implementations of the JDK, native APIs are wrapped in hand-written JNI wrapper code, written in C. In particular, all C function calls are initiated from JNI wrappers. (There is plenty of other privileged code written both in Java and C++. Much Java code in packages under java.lang and sun is privileged in some way. Most of it is not relevant to the present subject.)

Ideally, wrapper code should be constructed or checked mechanically when possible. In the present system, the javah tool assists, slightly, in bridging between Java APIs and JNI code. JNI wrapper code is checked by the native C compiler. And that is about all. Surely Java-centered tools could do more.

On the other hand, as we saw above, bringing the languages together is hard. No tool can erase the cultural differences between Java and native languages. There will always be ad hoc adjustment to reduce or remove hazards from native APIs.
Such adjustments will usually be engineered by hand in privileged code, as they are today in JNI wrapper code.

We must ask ourselves, why bother to build new mechanisms for native interconnect when JNI wrappers already do the job? If manual coding will always be required, perhaps it is better to do the coding in the native language, where (obviously) the native APIs are most handy. In that case, there would be no need for Java code ever to perform unsafe operations. Isn’t this desirable?

I think the general answer is that we can improve on the trade-offs provided by the present set of tools and procedures. Specifically, by using more Java-centered tools and procedures, we can improve performance. Independently of performance, we can also decrease the engineering costs of safety.

better performance without compromising safety

Safety will always trade against performance, but, as Java has proven over its lifetime, it is possible with care to formulate and optimize safety checks that do not interfere unacceptably with performance.

Classic JNI performance is relatively poor, and some of the reasons are inherent in its design.
JNI wrappers are created and maintained by hand, which means that the JVM cannot “see into” them for optimizing them.

If the JNI wrappers were recoded in Java (or some other transparent representation) then the JVM could much better optimize the enforcement of safety checks. For example, a program containing many JNI calls could be reorganized as one which grouped the required safety checks (and other housekeeping) into a smaller number of common blocks of code. These blocks could then be optimized, amortizing the cost of safety checks across many JNI calls. Analogous optimizations of lock coarsening or boxing elimination are possible because all the operations are fully transparent to the JVM. By comparison, there is much unnecessary overhead around native calls today.

This sort of optimization is routine when the thing being called can be broken down into analyzable parts by the JIT compiler. But C-coded JNI wrappers are totally opaque to it. The same is currently true of the wrappers created by JNR, but they are regular enough in structure that the JIT can begin to optimize them.

In my opinion, a good goal is to continue opening up the representation of native API calls until the optimized JIT code for a native API call is, well, optimal. That is, it can and should consist of a direct call to the native API, surrounded by a modest amount of housekeeping, and all inlined and optimized with the client Java code.

Making this happen in the compiler will require certain design adjustments.
Specifically, the metadata for the native API must be provided in a form suitable for both the JVM interpreter and compiler. More precisely, it must support both execution by the JVM interpreter and/or first-level JIT, and also optimizing compilation by the full JIT. This implies that the native API metadata must contain some of the same kind of information about function and data shape that a C compiler uses to compile calls within C code.

lower engineering costs for safety

I also think that coding more wrapper logic in Java instead of C will provide more correctness at a lower engineering cost. Although wrapper code in C has the advantage of direct access to native APIs, the code itself is difficult to write and to review for correctness. C programmers can create errors such as unsafe casts in a few benign-looking keystrokes. C-oriented tools can flag potential errors, but they are not designed to enforce Java safety norms.

If direct access to C APIs were available to Java code, all other aspects of wrapper engineering would be simpler and easier to verify as correct. Java code is safer and more verifiable than C code. If written by hand, it is often more compact and simple than corresponding C code. Routine aspects of wrapper engineering could be specified declaratively, using specialized tools to generate Java code or bytecode automatically.

Whether Java wrapper code is created manually or automatically, it is subject to layers of safety checking (verifying and dynamic linking) that C code does not enjoy. And Java code (both source files and class files) can be easily inspected by tools such as FindBugs.

The strength of such an automated approach can be seen in the work noted by JEP 191, the excellent JNR project. For a quick look at a “hello world” type example from JNR, see Getpid.java. Although the emphasis of JNR is on function calling, integrated native interconnect to functions, data, and types is also possible.
Side note: My personal favorite example of automated language integration is an old project that integrated C++ and Scheme on Solaris. The native interconnect was strong enough in that system to allow full interactive exploration of C++ APIs using the Scheme interpreter. That was fun.

One way we can improve on the safe use of these prior technologies is to provide more mechanical infrastructure for reasoning about the safety of Java application components. It should be possible to create wrapper libraries that internally use unsafe native APIs but reliably block their users from accessing those APIs. To me this feels like a module system design problem. In any case, it must be possible to correctly label, track, review, and control both unsafe code and the wrapper code that secures it.

wrapper tactics

A likely advantage of Java-based wrappers is easier access to good engineering tactics for wrapping native APIs. Here are a few examples of such tactics:

exception conversion: Error reporting conventions specific to native languages or APIs can be converted to Java exceptions.

pointer handles: Native pointers which can or must be freed can be stored in Java wrapper objects which nullify the saved pointer when it is freed, and check for this state as needed.

wrapper objects: Native data can be encapsulated inside Java objects to mediate access by providing a safe view. The object can use an internal handle field to manage native lifetime. (Future wrapper values: In cases where stateless wrappers can do the job, value types are likely to provide cheaper encapsulation in the future.
This would be the case with primitive types not in Java, such as unsigned long or platform specific vectors. When native lifetime is not an issue, value types could also provide encapsulating views of native pointers, structs, and arrays.)

resource scoping: APIs which require critical sections or paired primitives can be mapped to the Java try-with-resources syntax or refactored into a callback driven style (using lambdas).

language feature mapping: Corresponding types and operations can usually be mapped according to simple conventional rules. For example, a C char* can usually be represented by a Java String object at an API boundary. (But, these mappings must be tunable on a case-by-case basis.)

static typing: The Java type system can represent a wide variety of type shapes.

design rule checking: Ad hoc usage rules for native APIs can be enforced as executable assertions in code wrapped around the unchecked native API.

interfaces: Every transfer of control or data into or out of a native API can (and should) be mediated through a Java interface. In this way fully abstract API shapes can be presented directly to the (unprivileged) end user without exposing sensitive implementations.

Most of these tactics can be made automatic or semi-automatic within a code generation tool, and apply routinely unless manually disabled. This will further reduce the need for tricky hand-maintained code.

Interfaces are particularly useful for expressing groups of methods, since they express (mostly) pure behavior rather than Java object implementation. Also, interfaces are easy to compose and adapt, allowing flexible application of many of the above tactics.

As used to represent an extracted native API, an interface would be unique to that API.
Uses of such interfaces would tend to be in one-to-one correspondence with their implementations. In that case JVMs are routinely able to remove the overhead of method selection and invocation by inlining the only relevant implementation.

questions to answer, artifacts to build

A native interconnect story will supply answers to a number of related questions:

How do we simplify the user experience for Java programmers who use C and C++ APIs? (The benchmark is the corresponding experiences of C and C++ programmers, as well as the experiences of today’s JNI programmers.)

What appropriate tools, APIs, and data formats support these experiences? Specifically, how is API metadata produced, stored, loaded, and used? How are native libraries named and loaded?

What appropriate JVM and JDK infrastructure works with native API elements (layouts, functions, etc.) from Java code (interpreter and JIT)?

How performant are calls and data access to native libraries? (Again, the benchmark is the corresponding experiences of C and C++ programmers, as well as the experiences of today’s JNI programmers.) enjoyed by their primary users (programmers of C, C++, Fortran, etc.).

What are the definite, reliable safety levels available for using native libraries from Java? This includes the question: What is the range of options between automatic, perhaps unsafe import, and engineered hand-adjustments?

What are the options for managing portability? This includes the use of platform-specific libraries, and a story for switching between platform-specific bindings and portable backup implementations.

Answering these questions affirmatively will require us to build some interesting technology, including discrete and separable projects to enable these functions:

  native function calling from JVM (C, C++)
  native data access from JVM or inside JVM heap
  new data layouts in JVM heap
  native metadata definition for JVM
  header file API extraction tools (see below)
  native library management APIs
  native-oriented interpreter and runtime
    “hooks”
  class and method resolution “hooks”
  native-oriented JIT optimizations
  tooling or wrapper interposition for safety
  exploratory work with difficult-to-integrate native libraries

Project Panama in OpenJDK will provide a venue for exploring these projects. Some of them will be closely aligned with OpenJDK JEPs, notably JEP 191, allowing the Project to incubate early work on them.

Other inspiration and/or implementation starting points include:

  the Java Native Runtime package and the libffi native call binder
  Java data layout packages
  JVM support for new layouts (IBM packed objects, Sun Labs Maxine hybrids, Arrays 2.0)
  metadata-based native API extractors (WinRT metadata)
  existing JVM infrastructure (class files, SA, JNI, sun.misc.Unsafe)

A native header file import tool scans C or C++ header files and provides raw native bindings for privileged Java code. Such tools exist already for other languages, and can get colorful names like SWIG or Groveller. For the present purposes, I suggest a simpler name like jextract. A high-quality implementation for Java could start with an off-the-shelf front end like libclang. It would apply Java-oriented rules (with hand-tunable defaults) and produce some form of metadata, such as loadable class files.

A toolchain that embodies many of these ideas could look something like this:

     /-----------|         /-----------|
    | stdio.h    |        | stdio.java |
    |------------|        |------------|
          |                     |
          v                     |
    |------------|              |
    | jextract   | <------------/
    |------------|
          |
          v
     /-----------|
    | stdio.jar  |         /------------|
    |------------|        | userapp.jar |
          |               |-------------|
          v                     |
    |------------|              |
    | jvm        | <------------/        /---------|
    |            | <--------------------| libc.dll |
    |------------|                      |----------|

The stdio.java file would contain hand-written adjustments to the raw API from the header file. The stdio.jar file would contain automatically gathered metadata from the header file, plus the results of compiling stdio.java. The contents of stdio.java could be straight Java code for the user-level API, but could also be annotations to be expanded
by a code generation step in the extraction process. The code in userapp.jar would access the features it needs from stdio.jar. The implementations of these interfaces would avoid C code as much as possible, so that the JVM’s JIT can optimize them suitably.

Side note: The familiar header file I am picking on is actually unlikely to need this full treatment. In a more typical case, a whole suite of header files would be extracted and wrapped.

For bootstrapping or pure interpretation, a minimum set of trusted primitives is required in the JVM to perform data access and function calls. These would be coded in C and also known to the JIT as intrinsics. They can be made general enough to implement once in the JVM, rather than loaded (as JNI wrappers are loaded today) separately for each native API. For example, JNR uses a set of fewer than 100 specially designed JNI methods to perform all native calls; these methods are collectively called jffi.

Building such toolchains will allow cheaper, faster commerce between Java applications and native APIs, much as the famous Panama Canal cuts through the rocky isthmus that separates the Atlantic and Pacific Oceans. Let’s keep digging.

Appendix: preserving Java culture

Let’s go back to the metaphor of culture as it applies to the world of Java programming. Here is a list of benefits about Java that programmers rely on, which any design for native interconnect must preserve. As a group, these features support a set of basic programming practices and styles which allow programmers great freedom to create good code. They can be viewed as the basis of a programming “culture”, peculiar to Java, which fosters safe, useful, performant, maintainable code.
Side note: This list contains many truisms and will be unsurprising to Java users. Remember that culture is often overlooked until two cultures meet. I am writing this list in hopes it will prove useful as a checklist to help analyze design problems with native interconnect, and to evaluate solutions. Also, I am claiming that the sum total of these items underlies a unique programming culture or ecosystem to Java, but not that they are individually unique to Java.

basic type safety: Pointers, integers, and floats must not be confused; conversions must be explicit and must preserve VM integrity. This applies to values of all kinds, in memory and elsewhere.

basic operation safety: Any basic VM operation either completes according to its specification, or produces a catchable exception. It cannot corrupt memory or any other VM state.

class safety: Pointer conversions must be explicit and checked. There are exceptions for conversion to a Java superclass (which is always safe), to a Java interface (which is always checked later at any use point), and to an erased generic type (which is checked implicitly).

storage lifetime safety: No block of memory can be accessed after it has been deallocated. This is why we have automatic storage management.

variable domain safety: There is no way to obtain “garbage” or indeterminately initialized values of any type (especially pointers, of course).

API type checking: Every use of an API, such as a method call, is fully type-consistent with its definition (such as a method definition). This requirement serves the earlier ones, of course; it shows up in detail in the operation of Java’s dynamic linkage rules.

late linking: All uses of names, including class, method, and field names, are resolved and access-checked not only at compile time but also at run time.
Separately compiled modules (classes) cannot observe the implementation details of other modules.

concurrency safety: Race conditions between threads can be prevented, or their effects can be predicted usefully, or (at worst) they cannot violate the other safety invariants.

error manifestation: Exceptional or erroneous conditions are not discarded. They are manifested as thrown exceptions, which will be caught and/or displayed.

access control: Non-public or otherwise restricted API points cannot be accessed except by their specified users. Access is enforced at all phases of compilation and execution. System internals cannot be touched except by highly trusted code.

appropriately concise: Typically, Java code does not pay for any of Java’s built-in safety features by unnecessary verbosity. Safe and sane practices are encouraged by simpler notations. The “semantic payload” of a bit of code is not obscured by any necessary ceremony. (But note next points.)

predictably explicit: Typically, complex or potentially expensive features of Java are made explicit by a visible syntax, such as a method call. (This point is in tension with the previous point, and reasonable people differ on the proper resolution.)

explicit types: Java code has reasonably strong static typing, with many types explicitly written in the source code. (Notably, declaration types are explicit on the left, despite type inference elsewhere.) This feature catches errors early and gives IDEs helpful context for each name.

transparent code: Programs are represented using bytecode, which automated tools can inspect, verify, and transform. User-written annotations can help guide these tasks. There are easy to use, open source implementations of offline processors for both source code and bytecode, as well as the VM itself.
Multiple good IDEs exist.

transparent data: Data can be inspected using reflection and other ubiquitous self-description machinery such as toString and debuggers. (Transparency of data is balanced with access control, of course.)

robust performance: With moderate programmer care and experience, simple single-threaded programs tend to not show surprising performance “potholes”, not even when they are composed together. Multi-threaded programs preserve and scale up throughput with additional CPUs, in the absence of algorithmic bottlenecks.

All of these benefits are familiar to Java programmers, perhaps even taken for granted. The corresponding benefits for a native language like C++ are often more complex, and require more work and care from the native programmer to achieve. A good native interconnect story will provide ways to reliably dispose of this work and care before it gets to the end user coding Java to a native API. This requires native APIs to be acculturated to Java by the artful creation of wrapper code, as noted above.
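
As a rough illustration of what such wrapper code might look like, here is a minimal sketch that combines three of the tactics listed under "wrapper tactics": exception conversion, pointer handles, and resource scoping. Everything in it is an assumption made for illustration. FakeNative is a hypothetical stand-in for a raw privileged binding (it only simulates C-style alloc and free in plain Java), and NativeBuffer and WrapperDemo are invented names, not part of JNR, JEP 191, or any Panama API. The sketch also ignores thread safety.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a raw, privileged native binding.
// A real binding would call into a native library; this one only
// simulates C-style allocation with C-style error codes.
final class FakeNative {
    private static final Map<Long, byte[]> HEAP = new HashMap<>();
    private static long next = 0x1000;

    // Returns 0 on failure, in the style of a C allocator.
    static long alloc(int size) {
        if (size <= 0) return 0L;
        long p = next;
        next += size;
        HEAP.put(p, new byte[size]);
        return p;
    }

    // Returns -1 if the pointer is not a live allocation, else 0.
    static int free(long p) {
        return HEAP.remove(p) == null ? -1 : 0;
    }

    static int liveBlocks() { return HEAP.size(); }
}

// Pointer handle: owns the raw address, nullifies it when freed, and
// checks for the freed state before every use. AutoCloseable provides
// resource scoping through try-with-resources.
final class NativeBuffer implements AutoCloseable {
    private long address;   // 0 means "already freed"

    NativeBuffer(int size) {
        address = FakeNative.alloc(size);
        // Exception conversion: a C-style failure code becomes a Java exception.
        if (address == 0L) {
            throw new IllegalArgumentException("alloc failed: size=" + size);
        }
    }

    long address() {
        if (address == 0L) throw new IllegalStateException("buffer already freed");
        return address;
    }

    @Override
    public void close() {
        if (address != 0L) {
            long p = address;
            address = 0L;   // discard the pointer before releasing it
            if (FakeNative.free(p) != 0) throw new IllegalStateException("free failed");
        }
    }
}

final class WrapperDemo {
    public static void main(String[] args) {
        NativeBuffer escaped;
        try (NativeBuffer buf = new NativeBuffer(64)) {
            escaped = buf;
            System.out.println("in scope, live blocks: " + FakeNative.liveBlocks());
        }
        System.out.println("after scope, live blocks: " + FakeNative.liveBlocks());
        try {
            escaped.address();   // a stale handle is rejected, not dereferenced
        } catch (IllegalStateException e) {
            System.out.println("use after free rejected: " + e.getMessage());
        }
    }
}
```

The point of the design is that user code never sees the raw address unchecked: a leaked handle fails with an ordinary Java exception instead of touching freed memory.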



value types and struct tearing

This note explains how a notion of “value types” for the VM should protect the integrity of that value type’s invariants, and points out important differences, in memory effects, between “struct-like” and “persistent” designs for values.

First, the running example

Consider this two-variable data structure, a pair of encapsulated integral coordinates, with a couple of access functions:

    class XY {
        private int x = 1, y = 0;
        public XY() { }  // make a blank one
        public boolean step(int p, int q) {
            int x1 = x + p, y1 = y + q;
            if (x1 == 0 && y1 == 0)  return false;
            x = x1; y = y1;
            return true;
        }
        public double measure() {  // inverse radius
            return Math.pow((double)x*x + (double)y*y, -0.5);
        }
        public void copyFrom(XY that) {
            this.x = that.x; this.y = that.y;
        }
    }

The reader function measure produces a real number, but will fail to do so if both values x, y are zero at the same time. The writer function step updates the two coordinates (incrementally, as if in a random walk) except when they would both be zero. The details are not very important; the main point is that the two values are mostly independent, but are coupled by an invariant that can be violated by an uncoordinated change to either value alone.

Off to the races

What could go wrong? Why, nothing, if one of the following conditions is satisfied:

If the object is confined to one thread only.

If multiple threads are updating the object, but some sort of synchronization is preventing calls to step and measure from overlapping.

If there is some assurance that an XY cannot change and has been safely published.
Stashing a copy of the object is the basic idea; that is why we might need methods like copyFrom.

Side note 1: Even a fully confined XY object could in principle be broken by an asynchronous interrupt which cancels an assignment to the y field, but such interrupts are not now part of the Java landscape.

Side note 2: General descriptions of these sharing options, and more, may be found in Chapter 3 of Java Concurrency in Practice.

The easy way to cover all the cases is to mark all the methods as synchronized. This is the design of the old Java class java.util.Vector, which synchronizes all of its access functions. Since synchronization is usually not free, this design choice pushes a cost onto all users. This cost can sometimes be reduced by using transactions under the hood, such as Intel supports, but since the language and bytecode set do not directly express the transactions, there is always a possibility that the JVM will have to create a real critical section, so the optimization is not reliable.

This is why many programmers (including those who designed newer standard classes like ArrayList) omit the synchronization and instead push the responsibility for locking onto the user. This allows users to balance the cost of locking against the danger of races.

The user of an ArrayList must take responsibility to avoid publishing a reference to the list without proper mutual exclusion between threads. Often this is easy enough.
If the data is shared, it is shared between threads that are programmed together under common design rules and tested well enough to catch simple bugs.

Insecurity complex

The responsibility of the user increases if the list will contain sensitive data, such as parameters to a privileged operation. In that case, the programmer must ensure that a reference to the shared list cannot leak to uncontrolled code, even if the program is abused in some way. Otherwise, an attacker may create or obtain a shared list reference, hand it to privileged code, and then at the same time mutate it. The mutations will appear to the victim code via data races. Most such mutations will be harmless, since the privileged code will validity-check the arguments and throw an exception. But with enough attempts from the attacker, a race can happen which puts the list into a state which confuses the privileged code into doing something unpredicted (except by the attacker). This kind of weakness, known as a TOCTTOU bug, is a risk whenever privilege checks are based on mutable data objects.

It is important to note that marking a variable as private does not protect against inconsistent updates from data races. In the simple example above, where all the state is private, a data race can create a disallowed state as follows:

    final XY victim = new XY();
    Thread racer = new Thread() {
        public void run() {
            for (int i=0;i<1e6;i++) {
                victim.step(-1, 1);
                victim.step(1, -1);
            }
        }
    };
    racer.start();
    for (int i=0;i<1e6;i++) {
        assert(victim.measure() <= 1e12) : i;
    }

This code can fail its assertion quickly, often in the first 100 tries. The victim is observed in a state where both fields are zero. Locally valid updates to single fields of the victim can cause this violation. The sequence of updates that leads the victim to that disallowed state, plus the existence of additional surprising states, is an exercise for the reader.
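
For comparison with the racy loop above, here is a minimal sketch of the “mark all the methods as synchronized” option mentioned earlier, applied to the struct-like class. The names SyncXY, SyncXYDemo and worstMeasure are hypothetical, and so is the way copyFrom takes the two locks one at a time; this is one possible arrangement, not a recommended design.

```java
// Sketch only: the struct-like XY with every accessor synchronized.
// SyncXY, SyncXYDemo and worstMeasure are hypothetical names.
class SyncXY {
    private int x = 1, y = 0;

    public synchronized boolean step(int p, int q) {
        int x1 = x + p, y1 = y + q;
        if (x1 == 0 && y1 == 0) return false;  // refuse the disallowed state
        x = x1; y = y1;
        return true;
    }

    public synchronized double measure() {  // inverse radius
        return Math.pow((double) x * x + (double) y * y, -0.5);
    }

    public void copyFrom(SyncXY that) {
        // Read the source under its lock, then write under ours.
        // The two locks are never held together, so this cannot deadlock.
        int tx, ty;
        synchronized (that) { tx = that.x; ty = that.y; }
        synchronized (this) { this.x = tx; this.y = ty; }
    }
}

class SyncXYDemo {
    // Runs the same racer as above against a synchronized victim and
    // returns the largest measure() value the observing thread saw.
    static double worstMeasure(int iterations) {
        final SyncXY victim = new SyncXY();
        Thread racer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                victim.step(-1, 1);
                victim.step(1, -1);
            }
        });
        racer.start();
        double worst = 0.0;
        for (int i = 0; i < iterations; i++) {
            worst = Math.max(worst, victim.measure());
        }
        try {
            racer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("worst measure = " + worstMeasure(1_000_000));
    }
}
```

With every access serialized, the observer can only see the states (1,0) and (0,1), so measure never leaves 1.0. The price is the locking cost on every call, which is exactly the trade-off discussed above.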

Side note: Since measure reads each field twice, it is allowed to pick up different values for the two reads of one field. This is rare, because optimizers tend to merge the reads, but it cannot be excluded. Thus, inconsistent reads of the same variable can be a source of bugs, and this is true even for one-field objects. We could call this a single-variable-double-read hazard.

The invariant on XY fails because race conditions make the two fields lose their mutual coherence. Let’s call this loss of coherence structure tearing, because it looks like the victim has been torn in parts and reassembled from other parts. In the analogous case of (non-volatile) 64-bit primitives on old 32-bit JVMs, the 64-bit value can come apart into independent 32-bit halves, which can race apart. This is not exactly the same as “word tearing” but is close enough (I think) to merit the term.

Persistence pays off

In our simple example, another way to protect the data is to formulate it using final variables, changing the design to be persistent.

Side note: These days we are recycling the term “persistent” as an alternative and refinement to the older term “immutable”. A problem with the older term is evident whenever you have a conversation that stalls on the audible similarity of phrases like “immutable object” and “a mutable object”. Mumblable “immutable” is semblable to “mutable”, say that ten times fast.

    class XY {
        final private int x, y;
        private XY(int x, int y) { this.x = x; this.y = y; }
        public static final XY BLANK = new XY(1, 0);
        public XY step(int p, int q) {
            int x1 = x + p, y1 = y + q;
            if (x1 == 0 && y1 == 0)  return null;
            return new XY(x1, y1);
        }
        public double measure() {  // inverse radius
            return Math.pow((double)x*x + (double)y*y, -0.5);
        }
        //XY copyFrom(XY that) { return that; }  // no special copy
    }

In contrast to this persistent version of XY, we can call the first version struct-like.
Despite the encapsulation of the fields, it interacts with memory like a C struct.

The persistent version does not suffer from races on its x and y variables. A persistent XY object state can be captured or published simply by moving a reference. There only needs to be one public “blank” uninitialized value, and the constructor itself is hidden inside the capsule. In exchange for this new stability, users must update their references whenever they call the updater function step.

Crucially, it is now impossible to observe an XY object in a disallowed state. The class has full control over its internal invariants, even in the face of deviously attacking racer threads.

There are still races possible, but they are on the references. And since updating a reference is one memory operation, there is no issue of multiple fields parting ways. The single-variable-double-read bug can still happen, rarely.

It seems there is always a cost for stability and security. In this case, the cost is allocating a new XY object for each distinct state. This is required because the class insists on creating a new XY object to represent each new position in the coordinate space. Put another way, the user is forbidden to use the optimization of re-using XY objects to represent multiple values over time. This is a reasonable restriction, since that optimization can lead to race conditions, as described above.

Side note: Since the XY constructor is private, the class might cache values like java.lang.Integer does, but this has its own sometimes surprising costs, since the caching logic can interfere with optimizations such as escape analysis. In any case, making the constructor private is a helpful move.

Where flattery gets us

But, suppose the JVM had an optimization to flatten instances of XY, in either the mutable or the persistent form. A flattened instance of XY would have both of its fields stored directly in some containing object (or array).
There would be no need to have a separate XY object on the heap to hold the fields. Of course, if references to XY were required for some uses, these references could be created temporarily and then discarded.

Obviously, this sort of thing is what we are calling value types. In a nutshell, a value type encapsulates a group of component values, and can be efficiently stored in any type of Java variable.

Would value types provide significantly better options for designing XY and similar classes than the current ones sketched above? The answer is a qualified “yes”.

Perhaps the biggest advantage would be better use of memory. Overhead for the distinct XY object (header+padding) would vanish. And code would access the x and y fields directly within a containing object, using static offset arithmetic rather than a dynamic and cache-busting pointer chase. This is most evident if many pairs are stored in an array: Hardware can get clever about sequential accesses to the fields.

The other main advantage would be better support for methods which pass and return composite values. A method could receive or return two or more values inside a single flattened value “on the stack” (or in registers, usually). Of course, locals could also contain flattened values. So complex numbers and vectors get practical, along with a host of other small but useful types.

There are more advantages that may accrue from value types, but this is not the place to go into the details.

Memory vs. method

The two main advantages (better use of memory and better method types) are at some tension with each other. If value types are conceived as a memory layout mechanism only, then their primary usage will be via a Java pointer.
The flattened realization in registers (in method types) will be a fiction to be uneasily maintained, at the cost of confusing various JVM optimizations.

On the other hand, value types will fail to deliver effective use of memory if they are conceived as only a clever way of loading registers (again, holding method arguments, locals, and return values). This is because their representation in other variables (fields and array elements) must live in memory, and as such is sensitive to quality of memory layout.

Naturally, we want both advantages, with good representations in both register-based and memory-based variables. To the extent the system materializes Java references that point to flattened values, those references must be easily and routinely optimized away. And such materialized references should be rare.

Therefore, a good design for value types needs a primary representation that is pointer-free, even if there is a secondary “boxed” representation which uses references.

Climbing into the capsule

We also want encapsulation, which means privacy of component fields and of methods. Any credible value type design for the JVM must allow a value type to restrict access and enforce invariants.

But encapsulation is incomplete, and therefore a dangerous illusion, if the protected invariants can be subverted by race conditions, since those race conditions are available to anyone who can perform a value type assignment.

This brings us back to the defects of the struct-like programming style for XY in the first example above.
As seen above, the Java memory model permits structure tearing, and requires the author of a class to clearly state its contract, allowing the user to take defensive action if the class does not adequately defend itself against races.

But I wish to use value types, someday, to contain security critical values, such as 96-bit timestamps or encapsulated native pointers. The persistent-style design of String is integral to its usability, outside of any container, for carrying security critical values. Struct-like value types will, I think, be difficult or impossible to secure to the same degree, because of their weaker encapsulation. They will have to be put inside a container to manage safe access (a bit like an ArrayList inside Collections.synchronizedList), and this will tend to cancel the advantages of flattening them in the first place.

Assignment as an act of violence

There are a few specific reasons the persistent design provides better encapsulation and safer APIs than the struct-like design.

First, since value types can be stored in all sorts of variables, not just memory locations, it will be extremely common to assign them from place to place.
Note that assignment is a JVM primitive (in the current JVM design) and cannot be customized by a class. This is of no consequence for existing Java APIs, but if it is simply generalized to a racy componentwise copy (as in C), then every assignment statement which operates on a value type will be at risk of structure tearing.

But, Java programmers do not expect assignments to corrupt any internal structure of the assigned quantity. (The sole exception of 64-bit primitives on 32-bit machines is widely neglected.) Allowing a value type assignment to racily disturb the underlying value’s invariants is likely to introduce a new and persistent family of bugs to Java programs.

Fixing this problem for struct-like types would require reifying the assignment operation as an explicit method which could then perform synchronization on both the source and destination of the copied value. The code to do this would be complex and prone to errors and deadlocks. In practice, designers of value types would punt on the problem, pushing it (with wide-eyed trust and hope, doubtless) to their users. Their users, meanwhile, would have to learn the conventional distinctions between safe and unsafe versions of types, for example choosing between FastString and ThreadSafeString.

Publishing via finals

Second, and more subtly, safe publication is one of the least well understood aspects of the Java memory model, but it is crucial to the safe sharing of any non-persistent type. Safe publication requires either accurate synchronization on the object containing the mutable state, or publishing the state via a final variable. Both of these options require levels of indirection which are convenient enough today, but would tend to vanish as data structures are flattened.

The simplest (and thus safest) extension of safe publication patterns to value types is to declare that their component fields are final, so that when a value is assigned, its components are automatically published individually.
This pattern comes out naturally from persistent-style values, but must be imposed on struct-like values.

In a sense, we are revisiting the old design decision in Java to make variables mutable by default. Recall that blank final fields were added in Java 1.1, enabling persistent-style types, and even today we don’t yet have frozen (immutable, persistent) arrays. It would be consistent with the oldest versions of Java to define the new composite types to be mutable unless requested otherwise.

But more modern forms of inter-thread communication have been created outside of that original model, avoiding synchronization and adding (via the JMM) safe publication semantics to final variables. Although it would be a stretch to say, “Java should have been designed with immutability as the default”, I believe that consistency with Java 1.0 conventions for mutability could reasonably be traded away in order to gain modern thread safety.

Persistence costs

Those are the reasons why I think the persistent design pattern is preferable over the struct-like pattern. That is true despite the difficulties of the persistent design, which I will describe next.

First, the stability provided by final variables is not free. In the current JVM, they may require memory fences in constructors. In a JVM supporting persistent-style value types, every assignment to a memory location (not to registers) may require a similar handshake with the memory system, to ensure that writes of the component values are safe publications.

Correctness vs. throughput?

Second, the atomicity provided by single-reference updates goes away if objects are flattened, and it must be recovered some other way. A store to a persistent-style value type must be all or nothing: It must ensure that other threads either see the whole store, or none of it. This will require another kind of handshake with the memory system, along the lines of the Intel transactions mentioned above, or an atomic multi-word store instruction.

Note that the example type XY fits in 64 bits, and so would be supported at no extra cost by all 64-bit processors. Processors which provide larger atomic vector store instructions will cheaply support any value that fits in their vectors. In the case of a jumbo value type which spills over multiple hardware vectors and/or cache lines, the JVM’s software will need to perform additional handshakes, perhaps including old fashioned boxing.

For sophisticated users who know what they are doing, a non-transactional store operation, with the possibility of structure tearing, needs to be provided. Because of tearing, the library would have to provide additional interlocks to ensure proper confinement, immutability, and so forth. In other words, the struct-like component-wise assignment operator will need to be made available as a privileged operation derived from the persistent pattern, and used only inside carefully designed concurrency-safe libraries.

This is the flip side of the necessity for providing persistent containers for struct-like values, for safe publication.
The key question, I think, is which should be the default (mutable or persistent) and under what circumstances the non-default mode should be supplied.

Notation, notation, notation

A third downside to the persistent style of values is notational. It seems there are times when you want to say “just change the imaginary component of this number to zero, please”. If a value type is willing to expose its components and accept component-wise updates, it seems harsh to require the user to create a new value from scratch:

    Complex c = ...;
    c = new Complex(c.re, 0.0);  // rebuild from scratch
    c = c.changeIm(0.0);  // maybe use a helper method
    c.im = 0.0;  // but what I meant was this

This can be viewed mainly as a matter of syntax sugar, but there is also a deep connection to the hardware here. A multi-component value type is realized either as bits in a block of memory words or as bits in a collection of live registers. It is fundamentally reasonable for the user to ask to change just one of those low-level components in isolation, assuming the library designer allows the operation.

At the JVM level, it should be possible to render the bytecodes of a component-wise update fairly directly and without confusion to a register or memory write (adding memory handshakes as needed). Reconstructing a new value from the ground up, just because one component shifted, might create a bunch of noisy intermediate representation which could distract the JIT compiler from more important optimization work.

Our race is now run

To conclude, I believe that implementation challenges of persistent-style values are manageable, and that the corresponding opposing trade-offs for struct-like values are much more difficult to control. In the end it comes down to a safe and clean user model versus direct compilation to memory instructions.
And when safety seems opposed to speed, I think we can agree that the JVM needs to lean towards designs that are safe by default, in the expectation that slowness is easier to fix than insecurity.

And in the end, I think we will get both safety and speed.
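
As a closing illustration of the persistent pattern in today’s Java, here is a minimal sketch in which the only shared mutable state is a single reference to an immutable pair. The names PersistentXY, PersistentXYDemo, stepOrKeep and worstMeasure are hypothetical, and the use of java.util.concurrent.atomic.AtomicReference is my own assumption for the sake of a small example; a volatile field would serve equally well for plain publication.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: a persistent-style pair published through one reference.
// PersistentXY, PersistentXYDemo and the helper names are hypothetical.
final class PersistentXY {
    private final int x, y;
    private PersistentXY(int x, int y) { this.x = x; this.y = y; }
    public static final PersistentXY BLANK = new PersistentXY(1, 0);

    public PersistentXY step(int p, int q) {
        int x1 = x + p, y1 = y + q;
        if (x1 == 0 && y1 == 0) return null;  // disallowed state is never built
        return new PersistentXY(x1, y1);
    }

    public double measure() {  // inverse radius
        return Math.pow((double) x * x + (double) y * y, -0.5);
    }
}

class PersistentXYDemo {
    // A rejected step (null) leaves the current value in place.
    static PersistentXY stepOrKeep(PersistentXY v, int p, int q) {
        PersistentXY next = v.step(p, q);
        return next != null ? next : v;
    }

    // Races the same updates as the struct-like example, but every update
    // swaps in a whole new value, so no reader can see a torn pair.
    static double worstMeasure(int iterations) {
        final AtomicReference<PersistentXY> victim =
            new AtomicReference<>(PersistentXY.BLANK);
        Thread racer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                victim.updateAndGet(v -> stepOrKeep(v, -1, 1));
                victim.updateAndGet(v -> stepOrKeep(v, 1, -1));
            }
        });
        racer.start();
        double worst = 0.0;
        for (int i = 0; i < iterations; i++) {
            PersistentXY snapshot = victim.get();  // read the reference once
            worst = Math.max(worst, snapshot.measure());
        }
        try {
            racer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("worst measure = " + worstMeasure(1_000_000));
    }
}
```

Here the racer and the observer still race, but only on the reference: each snapshot is a complete, valid pair, so measure stays at 1.0. Reading the reference once into a local also sidesteps the double-read hazard at the level of the reference. The cost, as noted above, is one allocation per distinct state.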



celestial harmony

Listening to music helps me concentrate at work. Music with words is too distracting, but all kinds of classical and jazz works for me, as well as various kinds of atmospheric sounds, natural or constructed.

Lately I’ve been enjoying audio transcriptions of electromagnetic field measurements taken by NASA’s Voyager probe, which I got from iTunes. (My son David, who has a degree in psychoacoustics, turned me on to this.) As has often been noted, the sounds of space are unearthly yet somehow natural, often pleasant and sometimes eerie. There is some sort of deep structure which our earth-trained senses can still respond to. I suppose it has something to do with auto-correlations and self-similar structures at multiple scales in both frequency and time domains.

Composers have sometimes chosen to create sounds like this to suggest “spacey” or other-worldly environments. For example, the NASA recordings sometimes remind me, vaguely, of the creepy electronic beeps and bumps heard in Forbidden Planet. The actual sounds from space are less dramatic, as one might expect. But even when they are a little creepy, the NASA sounds provide a very pleasant and unobtrusive background for me as I hack away on my code.

So why am I not working, but writing a blog entry instead? Well, starting at minute 12 of the track “Sphere of Io”, there are sounds which resemble a women’s choir singing (wordlessly) in rising tone clusters.
You can also find this on YouTube, where the tone clusters start at about 7:30.

Recently I have also been listening to Gustav Holst’s The Planets Suite. Planets, of course, is great listening for nerds, both for its own sake and because of the links to astronomy and also (via John Williams) to movie music. I love the first movement, “Mars, the Bringer of War”, for its rowdy energy, and have learned to appreciate the other movements also. But I find the final movement, “Neptune the Mystic”, to be frustratingly anticlimactic. It doesn’t stride triumphantly to a dramatic conclusion, but rather slowly fades out into a women’s choir, which sings (wordlessly) in rising tone clusters.

That’s what suddenly ripped my attention away from work: I heard Io doing a cover of Holst’s final fade-out. You can hear Holst’s choral fade-out at the end of “Neptune”, from about 5:42 onward in the Boston Pops recording. You can also hear it on YouTube, where the fade-out starts at about 6:02.

The most remarkable thing about this, I suppose, is that in reality Holst is not imitating Io (since he wrote it a century ago), nor is Io’s behavior patterned after Holst. Either the two of them are following a common pattern, or I am indulging in a common human behavior of seeing patterns in noise. I think both of the latter alternatives are true; we humans usually require some basic phenomenal structure to prompt us before we begin to see patterns. In this case, I think the basic structure has to do with slightly dispersed audio-range tones, modulated to wander on the 1-second scale, as I hinted above. (I wonder: If air-breathing extraterrestrials exist, would we enjoy their songs?
It seems likely to me at the moment.)

The most enjoyable thing for me about this is to contemplate the unity of physical laws as we experience them personally, and as they operate in unearthly places like Io. The proof of this unity made Isaac Newton a rock star and launched modern science, but it has been pondered since humans were human, and is still puzzling today. The ancients called it the music of the spheres, and so do I.



Monday at Microsoft Lang.NEXT

We are having a blast at Microsoft Lang.NEXT. For the record, I posted my talk about Java 8.

[Update 4/06] The videos are already coming out from Channel 9, including my talk on Java 8 (mostly about closures) and an update from Jeroen Frijters on hosting the JVM on .NET.

I recommend Martin Odersky’s keynote about new facilities for expression reflection in Scala. As befits a typeful language, the notation for what Lisp-ers think of as “backquote-comma” and F#-ers call code quotation is based on the type system. The amazing part (at least to me) is that there is no modification to the language parser: no special pseudo-operators or new brackets. I guess this is a generalization of C#’s special treatment of anonymous lambda expressions, added for LINQ.

Here is an example based on the talk, showing a macro definition and its use of a typeful expression template:

    def assert(cond: Boolean, msg: Any) = macro Asserts.assertImpl

    object Asserts {
      def raise(msg: Any) = throw new AssertionError(msg)
      def assertImpl(c: Context)
          (cond: c.Expr[Boolean], msg: c.Expr[Any]) : c.Expr[Unit] =
        c.reify( if (!cond.eval) raise(msg.eval) )
    }

The last code line is roughly comparable to a string template "if (!$cond) raise($msg)" or a Lisp backquote `(if (not ,cond) (raise ,msg)), but the Scala version is hygienically parsed, scoped, and typed. Note also the crucial use of path-dependent types (c.Expr) to allow compilers freedom to switch up their representations.

Speaking of C#, Mads Torgersen talked about new features in C# for asynchronous programming, the async and await keywords which go with the Task type pattern (which includes the varying notions of promise and future). You use async/await in nested pairs to shift code styles between coroutined (async) and blocking (awaited). The source notation is the same, but coroutined code is compiled using state machines. It is similar to (but internally distinct from) the older generator/yield notation.
To me it looks like a fruitful instance of the backquote-comma notational pattern.

Coroutines are a good way to break up small computations across multiple cores, so it is not surprising that they are a hot topic. Robert Griesemer’s talk on Go was a tour de force of iterative interactive development, in which he started with a serialized Mandelbrot image server and added 25% performance (on two cores: 200 ms to 160 ms) by small code changes to partition the request into coroutined tasks. Each task generated a line of the result image, so the adjustment was simple and natural. At a cost to interoperability, a segmented stack design allows tasks to be scheduled cheaply; fresh stacks start at 4 kilobytes. The use of CSP and a built-in channel type appears to guide the programmer around pitfalls associated with concurrent access to data structures. This is good, since Go data structures are low-level and C-like, allowing many potential race conditions.

Go includes a structural (non-nominal) interface type system which makes it simple to connect disparate data structures; the trick is that an interface-bearing data structure is accompanied by a locally created vtable. Such fine-grained interfacing reminds me of an oldie-but-goodie, the Russell programming language.

During Q/A I noted that their language design includes the “DOTIMES bug”, and their demo code required a non-obvious workaround to fix it. This was answered in the usual circular way, to the effect that the language spec implies the broken behavior, so users just have to live with it. (IMO, it is like a small land mine in the living room.) Happily for other programmers, C# is fixing the problem, and Java never had the problem, because of the final variable capture rule. Really, language designers, what is so hard about defining that each loop iteration gets a distinct binding of the loop variable?
Or at least emitting a diagnostic when a loop variable gets captured?

(By the way, the Java community is interested in coroutines also, and Lukas Stadler has built a prototype for us to experiment with. It seems to me that there is a sweet spot in there somewhere, with user-visible async evaluation modes and JVM-mediated transforms, that can get us to a similar place. As a bonus, I would hope that the evaluation modes would also scale down to generators and up to heterogeneous processing arrays; is that too much to ask? Perhaps a Scala-like DSL facility is the right Archimedean lever to solve these problems.)

Walter Bright and Andrei Alexandrescu presented cool features of the D language. A key problem in managing basic blocks is reliably pairing setup and cleanup actions, such as opening a file and closing it. C++ solves this with stack objects equipped with destructors, a pattern which is now called RAII (resource acquisition is initialization). As of 7, Java finally (as it were) has a similar mechanism, although it must be requested via a special syntax. Similarly, D has a special syntax (scope(exit)) for declaring cleanup statements inline immediately next to the associated setups. This is intriguingly similar to Go’s defer keyword, except that Go defers dynamically pile up on the enclosing call frame, while the D construct is strictly lexical. Also, D has two extra flavors of cleanup, which apply only to abnormal or normal exits. The great benefit of such things is being able to bundle setups and cleanups adjacently, without many additional layers of block nesting. D also ensures pointer safety using 2-word fat pointers. (This reminded me of Sam Kendall’s early work with Bounds Check C, which used 3-word fat pointers. D is a good venue for such ideas.)

In keeping with the keynote theme of quasi-reflective computation at compile time, D has a macro facility introduced with the (confusing) keyword mixin, and called “CTFE” (compile-time function execution).
Essentially, D code is executed by the compiler to extrude more D code (as flat text) which the compiler then ingests. The coolest part of all this is the pure attribute in the D type system, which more or less reliably marks functions which are safe to execute at compile time. There is also an immutable attribute for marking data which is safe for global and compile-time consumption.Here are some other tidbits gleaned from my notes:As data continues to scale up, Von Neumann machines and their data structures—arrays with peek and poke operations—are straining to keep up. The buzzword this week for well-behaved Big Data is monotonic. I am thinking that Java’s long awaited next array version should not look much like an array at all. (Rich Hickey, you are ahead of the pack here with Clojure.) Jeroen Frijters pointed out to me that embedding value type structs in arrays exposes them to A/B/A bugs. Glad I held off on that blog post...As the Von Neumann Memory continues to morph into the Non-Von Network, systems need to decouple the two ends of every query operation, the request from the response, even in the case of single memory loads. This is why asynchronous computations, coroutines, channels, etc. are important. The design problem is to give programmers sequential notations without requiring the hardware to serialize in the same order. Yes, this has to add “magic” to the language; either we do that or (as Mads pointed out) set programmers the impossible task of expressing all desequentializations as explicit callbacks or continuations. The question is which magic will be less surprising and more transparent. Also: which magic will scale into the future, as new programmers are trained on less rigidly sequential styles. (Hint: The future does not contain the statement x = x+1.)On the other end of the programming scale are the transparently partitionable bulk operations, such as supplied by databases and mathematical systems (from APL to Julia). 
These hide elements of the computation invisible so they can be efficiently scheduled. Even beyond that are the Big Ideas of system modeling and logic programming, which were discussed in several talks.Still, programming tasks will apparently always use collections, at least in the sense of increasingly aggregated data values. This is why we are not done exploring the design space occupied by Java collections, LINQ, etc. It seems to me that much of the brain-power raditionally devoted to language design ought to be pointed at aggregate computation design, whether or not that requires new languages. Martin Odersky’s comment was that designing libraries is just as hard as designing a language. Also, John Cook notes that, from the point of view of the user (the non-programmer domain specialist), R has a language, rather than R is a language. What R also has is really useful arrays and all the specialist tools in easy reach.Kunle Olukotun’s talk showed how to design aggressively for parallelism without inventing a whole new language. The key is DSLs, domain specific languages, for which Scala provides a fertile testbed. (Perhaps this is a good way, in the future, to put specialist tools in easy reach?)Here is important design point about languages: Matlab and R style numeric arrays are very much alive and well. John Cook grounded our considerations with a presentation of how the R language is used, and the Julia project team was present to talk about their efforts to support numerics.The Julia folks have tackled the hard problem of doing numeric stacks in a modular way. The problem is harder than it looks; it requires symmetric dispatcing of binary operations and normalizing representations composed from independent types. For example, what happens when you divide two gaussian integers (Complex[Int]), do you get a Complex[Rational] and if so, why? What happens when you replace Complex and Rational by types defined by non-communicating users? 
Apparently their framework can handle such things.

A bunch of CS professors are cooperatively designing their own pedagogical language, Grace. Besides their sensitivity to theoretical concerns, this team uniquely brings to the table a limitless supply of experimental subjects, also known as “students”. Good luck, fellows! (Really! Even if I had my irony fonts installed, I wouldn’t use them here.)

The new Windows Runtime (WinRT) system API emphasizes asynchronous programming with tasks. I guess we programmers are being weaned off of threads; at least, it is high time, since threads are bulky compared to virtualized CPUs. WinRT also provides much richer and more flexible management of component API metadata, repurposing CLR metadata formats to interconnect native, managed, and HTML5 code. Compilers (e.g., F#; there was a talk on this) are being modified to read the metadata on the fly, instead of requiring build-time binding generators.

Great quote on Moore’s End: “Taking a sabbatical is no longer an option for accelerating your program.” (Andrei Alexandrescu)

There was an awesome (though hard to understand) demo of the Roslyn project, in which the Visual IDE language processors are opened up to allow (amazingly) easy insertion of code analysis and transforms.

This year the conference was called Lang.NEXT instead of Lang.NET, in order to welcome “non-managed” languages like C++. There was a good panel on native languages. The end result of having the managed and native folks talk, I think, was that the bright line between managed and native became dimmer and more nuanced. This may have defused some latent partisanship. In any case, for those who believe that managed language JITs are inherently poor creatures devoid of heroics, please see the HotSpot architecture wiki. 
The HotSpot JVM is described in various papers (though there are not enough of them); Arnold Schwaighofer’s thesis provides a recent medium-length account.I am grateful to the organizers for making a lively and comfortable conference. They have set a high bar for such events. We at Oracle will do our best to reciprocate at this year’s JVM Language Summit this summer.



the OpenJDK group at Oracle is growing

The OpenJDK software development team at Oracle is hiring. To get an idea of what we’re looking for, go to the Oracle recruitment portal and enter the Keywords “Java Platform Group” and the Location Keywords “Santa Clara”. (We are a global engineering group based in Santa Clara.) It’s pretty obvious what we are working on; just dive into a public OpenJDK repository or OpenJDK mailing list.

Here is a typical job description from the current crop of requisitions:

The Java Platform group is looking for an experienced, passionate and highly-motivated Software Engineer to join our world class development effort. Our team is responsible for delivering the Java Virtual Machine that is used by millions of developers. We are looking for a development engineer with a strong technical background and thorough understanding of the Java Virtual Machine, Java execution runtime, classloading, garbage collection, JIT compiler, serviceability and a desire to drive innovations.

As a member of the software engineering division, you will take an active role in the definition and evolution of standard practices and procedures. You will be responsible for defining and developing software for tasks associated with the developing, designing and debugging of software applications or operating systems.

Work is non-routine and very complex, involving the application of advanced technical/business skills in area of specialization. Leading contributor individually and as a team member, providing direction and mentoring to others. BS or MS degree or equivalent experience relevant to functional area. 7 years of software engineering or related experience.



value types in the vm

Or, enduring values for a changing world.

Introduction

A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types.

This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. 
We will look at a few possibilities for these new features.

[Posted 3/21/2012, updated as marked 3/24/2012 and 3/05/2014.]

An Axiom of Value

In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation.

Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality.

A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as a value-unsafe operation.

A value type may have methods, but such methods must not change the components of the value. 
It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type.

Representations of Value

Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables:

    /*Complex x = Complex.valueOf(1.0, 2.0):*/
    double x_re = 1.0, x_im = 2.0;

These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers:

    double distance(/*Complex x:*/ double x_re, double x_im,
                    /*Complex y:*/ double y_re, double y_im) {
        /*Complex z = x.minus(y):*/
        double z_re = x_re - y_re, z_im = x_im - y_im;
        /*return z.abs():*/
        return Math.sqrt(z_re*z_re + z_im*z_im);
    }

A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields.

(A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.)

The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ‘array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) 
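To make the double-length array workaround concrete, here is a minimal sketch in today's Java. The class name InterleavedComplexArray and its methods are hypothetical, chosen only for illustration; the layout (real part at index 2*i, imaginary part at index 2*i+1) is one assumed convention among several possible ones.

```java
/** Sketch: an "array of complex" faked with a double-length array of double. */
final class InterleavedComplexArray {
    // Assumed layout: element i keeps re at data[2*i] and im at data[2*i + 1].
    private final double[] data;

    InterleavedComplexArray(int length) {
        data = new double[2 * length];
    }

    int length() { return data.length / 2; }

    void set(int i, double re, double im) {
        data[2 * i] = re;
        data[2 * i + 1] = im;
    }

    double re(int i) { return data[2 * i]; }
    double im(int i) { return data[2 * i + 1]; }

    /** Norm of the difference of elements i and j, computed on unboxed components only. */
    double distance(int i, int j) {
        double zRe = re(i) - re(j), zIm = im(i) - im(j);
        return Math.sqrt(zRe * zRe + zIm * zIm);
    }

    public static void main(String[] args) {
        InterleavedComplexArray a = new InterleavedComplexArray(2);
        a.set(0, 1.0, 2.0);
        a.set(1, 4.0, 6.0);
        System.out.println(a.distance(0, 1)); // a 3-4-5 triangle
    }
}
```

The trick depends on both components having the same primitive type. A value with, say, a long and a reference component has no single homogeneous backing array to hide in, which is the limitation noted in the parenthesis above.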
By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ‘real’ JVM type should have a story for returns, arrays, and API interoperability.The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs.Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions.Let’s Start CodingNow it is time to look at some real code. 
Here is a definition, written in Java, of a complex number value type.

    @ValueSafe
    public final class Complex implements java.io.Serializable {
        // immutable component structure:
        public final double re, im;
        private Complex(double re, double im) {
            this.re = re; this.im = im;
        }
        // interoperability methods:
        public String toString() { return "Complex("+re+","+im+")"; }
        public List<Double> asList() { return Arrays.asList(re, im); }
        public boolean equals(Complex c) {
            return re == c.re && im == c.im;
        }
        public boolean equals(@ValueSafe Object x) {
            return x instanceof Complex && equals((Complex) x);
        }
        public int hashCode() {
            return 31*Double.valueOf(re).hashCode()
                    + Double.valueOf(im).hashCode();
        }
        // factory methods:
        public static Complex valueOf(double re, double im) {
            return new Complex(re, im);
        }
        public Complex changeRe(double re2) { return valueOf(re2, im); }
        public Complex changeIm(double im2) { return valueOf(re, im2); }
        public static Complex cast(@ValueSafe Object x) {
            return x == null ? ZERO : (Complex) x;
        }
        // utility methods and constants:
        public Complex plus(Complex c)  { return new Complex(re+c.re, im+c.im); }
        public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); }
        public double abs() { return Math.sqrt(re*re + im*im); }
        public static final Complex PI = valueOf(Math.PI, 0.0);
        public static final Complex ZERO = valueOf(0.0, 0.0);
    }

This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all final. 
(I.e., the type is immutable.)From the supertype Object, all public non-final methods are overridden.The constructor is private.Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types:One or more factory methods are responsible for value creation, including a component-wise valueOf method.There are utility methods for complex arithmetic and instance creation, such as plus and changeIm.There are static utility constants, such as PI.The type is serializable, using the default mechanisms.There are methods for converting to and from dynamically typed references, such as asList and cast.The RulesIn order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist.A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be final, and all its constructors must be private.Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example:@ValueSafepublic interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? 
extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString();}//@ValueSafe inherited from supertype:public final class Complex implements ValueType { ...The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types.Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following:The type of E is a value-safe type.E names a field, parameter, or local variable whose declaration is marked @ValueSafe.E is a call to a method whose declaration is marked @ValueSafe.E is an assignment to a value-safe variable, field reference, or array reference.E is a cast to a value-safe type from a value-safe expression.([Added 3/24/2012:]The issue of null pollution is discussed below.)E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe.Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions.[Added 3/24/2012:]A cast from an arbitrary value to a value-safe type is almost value-safe, except for the possibility of a null operand, which will cause a null pointer exception whenever the value is converted to an unboxed representation.A value-safe expression may not be the subject of a value-unsafe operation. 
In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type.

In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two values of a value type will be distinguishable as long as their component values are equal.

More Code

To illustrate these rules, here are some usage examples for Complex:

    Complex pi = Complex.valueOf(Math.PI, 0);
    Complex zero = pi.changeRe(0);  //zero = pi; zero.re = 0;
    ValueType vtype = pi;
    @SuppressWarnings("value-unsafe")
      Object obj = pi;
    @ValueSafe Object obj2 = pi;
    obj2 = new Object();  // ok
    List<Complex> clist = new ArrayList<Complex>();
    clist.add(pi);  // (ok assuming List.add param is @ValueSafe)
    List<ValueType> vlist = new ArrayList<ValueType>();
    vlist.add(pi);  // (ok)
    List<Object> olist = new ArrayList<Object>();
    olist.add(pi);  // warning: "value-unsafe"
    boolean z = pi.equals(zero);
    boolean z1 = (pi == zero);  // error: reference comparison on value type
    boolean z2 = (pi == null);  // error: reference comparison on value type
    boolean z3 = (pi == obj2);  // error: reference comparison on value type
    synchronized (pi) { }  // error: synch of value, unpredictable result
    synchronized (obj2) { }  // unpredictable result
    Complex qq = pi;
    qq = null;  // possible NPE; warning: "null-unsafe"
    qq = (Complex) obj;  // warning: "null-unsafe"
    qq = Complex.cast(obj);  // OK
    @SuppressWarnings("null-unsafe")
      Complex empty = null;  // possible NPE
    qq = empty;  // possible NPE (null pollution)

The Payoffs

It follows from this that either the JVM or the Java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. 
Non-static methods on value types can be transformed into static methods which take the components as value parameters.

Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidentally unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are a tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible to exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization.

Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables.

There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above.

[Added:] The introduction of nulls into value-safe variables and expressions can be called null pollution, by analogy with the concept of heap pollution caused by unchecked generic type conversions. 
A polluting null value may propagate through value-safe variables and expressions, until someone unboxes it into a value type, at which point it will cause a null pointer exception. For this reason alone, a cast from a non-value-safe expression is not value-safe; all non-null casts are useful and safe. It might be reasonable, at some point, for a cast to a value-safe type to deal specially with nulls, either forcing a NPE, or substituting a box of zero values (if the cast is to a concrete type). I think experimentation is needed to decide the right answer here.Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches:public interface java.util.List<@ValueSafe T> extends Collection<T> { ...public interface java.util.List<T extends Object|ValueType> extends Collection<T> { ...(The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.)With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”.But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. 
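For a concrete instance of this pain point in today's Java, consider the standard library method BigInteger.divideAndRemainder, which hands back its two results in a two-element array. The sketch below wraps it; the class and method names (MultipleReturnToday, divMod) are hypothetical, and only divideAndRemainder and valueOf are real JDK APIs.

```java
import java.math.BigInteger;

/** Sketch: how two results are returned in current Java, via a heap-allocated array. */
final class MultipleReturnToday {
    /**
     * Returns a fresh array {quotient, remainder}.
     * The caller has to remember which slot holds which result;
     * nothing in the return type records that.
     */
    static BigInteger[] divMod(long dividend, long divisor) {
        return BigInteger.valueOf(dividend)
                         .divideAndRemainder(BigInteger.valueOf(divisor));
    }

    public static void main(String[] args) {
        BigInteger[] qr = divMod(17, 5);
        System.out.println("q=" + qr[0] + " r=" + qr[1]);
    }
}
```

The array is mutable, has identity, and carries no static information about its length or the roles of its elements, which is roughly the opposite of what a value type would offer for the same job.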
The lack of multiple return values has a strong distorting effect on many Java APIs.

Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems sometimes have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know whether an object’s identity is important. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside.

Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data.

Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains.

It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. 
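The covariance in question can be seen in a few lines of current Java. This is a sketch of existing behavior only, not of any proposed mechanism; the class and method names (CovariantArrayDemo, storeRejected) are hypothetical.

```java
/** Sketch: Java arrays are covariant, so every reference store is checked at run time. */
final class CovariantArrayDemo {
    /** Stores element through an Object[] view and reports whether the JVM rejected it. */
    static boolean storeRejected(Object[] view, Object element) {
        try {
            view[0] = element;      // checked against the array's real element type
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Integer[] boxed = new Integer[1];
        Object[] view = boxed;      // legal: Integer[] is a subtype of Object[]
        System.out.println(storeRejected(view, "not an Integer"));
        System.out.println(storeRejected(view, 42));
    }
}
```

An Object[] view assumes that each slot holds one reference. An array whose slots hold unboxed components has no such slot to expose, which is one way to read the remark that value-safe arrays would not be convertible to Object[].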
Certain connection points, such as Arrays.copyOf and System.arraycopy, might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements.

Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays.

Implicit Method Definitions

The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated.

Java has a rule for implicitly defining a class’s constructor, if it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type:

    public class Complex implements ValueType {  // implicitly final
        public double re, im;  // implicitly public final
        //implicit methods are defined elementwise from the fields:
        //  toString, asList, equals(2), hashCode, valueOf, cast
        //optionally, explicit methods (plus, abs, etc.) would go here
    }

In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>.

Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly:

    public @ValueType class Complex { ... 
// implicitly final, implements ValueType(But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.)Implicitly Defined Value TypesSo far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning.But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples.For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows:public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2;}Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields.The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. 
(Specifics of naming and name mangling need some tasteful engineering.)Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day.Other Use CasesBesides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses.Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ‘header’ information.Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ‘header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. 
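As an illustration of this header-plus-tail layout, here is a minimal sketch of a bit vector whose first 64 bits live in a header field and whose tail is a null reference until a higher bit is needed. Everything here (the name SmallBitVector, the 64-bit header size, the long[] tail) is an assumption made for the example, and it is written as an ordinary immutable class in today's Java rather than as a true value type.

```java
/** Sketch: an immutable bit vector with an inline header and an optional tail. */
final class SmallBitVector {
    private final long header;  // bits 0..63, held directly
    private final long[] tail;  // bits 64 and up, or null; never written after construction

    private SmallBitVector(long header, long[] tail) {
        this.header = header;
        this.tail = tail;
    }

    static final SmallBitVector EMPTY = new SmallBitVector(0L, null);

    /** Returns a new value with the given bit set; this value is unchanged. */
    SmallBitVector withBit(int index) {
        if (index < 0) throw new IllegalArgumentException("negative index");
        if (index < 64) return new SmallBitVector(header | (1L << index), tail);
        int word = (index - 64) >> 6;
        int oldLen = (tail == null) ? 0 : tail.length;
        long[] newTail = new long[Math.max(oldLen, word + 1)];
        if (tail != null) System.arraycopy(tail, 0, newTail, 0, oldLen);
        newTail[word] |= 1L << (index & 63);  // fresh copy, not yet shared
        return new SmallBitVector(header, newTail);
    }

    boolean get(int index) {
        if (index < 0) return false;
        if (index < 64) return (header & (1L << index)) != 0;
        int word = (index - 64) >> 6;
        return tail != null && word < tail.length
                && (tail[word] & (1L << (index & 63))) != 0;
    }

    /** True when the whole value is carried by the header alone. */
    boolean fitsInHeader() { return tail == null; }
}
```

Component-wise equals and hashCode are omitted to keep the sketch short; a real version would have to normalize the tail (for example, trailing zero words) before comparing.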
Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array.Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user.This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double).My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. 
Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values.

Similar to String, we could build an efficient multi-precision decimal type along these lines:

    import java.math.BigInteger;

    // (methods required by ValueType are assumed to be implicit, as above)
    public final class DecimalValue implements ValueType {
        private final long header;
        private final BigInteger digits;
        private DecimalValue(long header, BigInteger digits) {
            this.header = header; this.digits = digits;
        }
        public static DecimalValue valueOf(int value, int scale) {
            assert(scale >= 0);
            return new DecimalValue(((long)value << 32) + scale, null);
        }
        public static DecimalValue valueOf(long value, int scale) {
            if (value == (int) value)
                return valueOf((int)value, scale);
            return new DecimalValue(-scale, BigInteger.valueOf(value));
        }
    }

Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed.

(Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.)

Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic.

Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. 
This is probably only worthwhile if the unboxing arrays can be packed with such values.

More Daydreams

A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, an Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception.

    @ValueSafe
    public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { ...

    public class Complex implements ValueType {
        // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe
        ...

It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code.

We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to the unboxed representation.
For example:

    complex pi = complex.valueOf(Math.PI, 0);
    Complex boxPi = pi;  // convert to boxed
    myList.add(boxPi);
    complex z = myList.get(0);  // unbox

Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types.

As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure? These considerations must wait for another day and another note.

More Work

It seems to me that a good sequence for introducing such value types would be as follows:

1. Add the value-safety restrictions to an experimental version of javac.
2. Code some sample applications with value types, including Complex and DecimalValue.
3. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications.
4. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes.

A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing.

A similar investigation should be applied (concurrently) to array types.
In this case, it seems to me that the starting point is in the JVM:

1. Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles.
2. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications.
3. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes.

That’s enough musing from me for now. Back to work!

[Added:]

Actually, here’s a little more context for this note, with some acknowledgements: The need for immutable objects has been known to Java designers from the beginning, as can be seen from the design of the String class, and the early introduction of blank finals (in which I participated, back in the day). The numeric community (remember Java Grande?) has kept Java’s designers aware from Day One of the need for tuple-like or struct-like types, starting with complex numbers, and of the need for more flexible control of array layout. It has been a perennial puzzle, to us Java folk, how to reconcile the highly reference-like nature of Java values with the use cases for value types.

This note presents the results of my own puzzling on this topic for (yes) 15 years, along with some proposals for finishing the puzzle. The work on this has been informed by the thoughts of the original authors of Java, by very old language designs (think APL in the ’70s and functional programming work in the ’80s), and by current language designs (think C# and Scala). This note is also partial fulfillment of a promise to many colleagues over the years to propose a suitable solution.
As a compiler (JIT) geek, I have seen many hopefully “automagic” scalarization or unboxing optimizations strain and fail to achieve their promise. I have concluded that user advice is necessary. (This is also, I think, the tacit conclusion of decades of research into numeric loop auto-parallelization, but I’m not an expert in that.)

This topic, viewed historically, is so huge that I have not even tried to adequately link all the relevant web pages. It is easy to Google them up by the hundreds. However, I would like to acknowledge my discussions with the JavaFX graphics library designers, including Per Bothner, who detailed to me their frustrations in programming in Java to modern GPUs, a series of brick wall encounters which are a telling repeat of the frustrations of the numerics community. I also appreciate the eloquent plea for JVM value types by Ben Hutchison, almost four years ago.

So, who wants to hack up an explicitly unboxing javac and/or JVM? I offer the mlvm repository as a home for the work.

[Added 2/05/2014:]

Martin Buchholz recently dug up a set of design notes by James Gosling from the aforesaid 15 years ago. It is no accident that they have a resemblance to the current note. Watch the OpenJDK project, including the JEP page, for more information on value objects and other initiatives.

[Added 2/07/2014:]

Sam Umbach pointed out some inconsistencies in the use of the private access modifier; thanks Sam! Some of them came from the fact that I couldn’t decide (while writing the post) whether value type fields should always be public. On the “pro” side, the value would be structurally transparent, a mainly esthetic advantage. On the “con” side, the value would be insecure. Result: Values should be allowed to have private fields, and the amended post reflects this everywhere. I also fixed some broken links, due to my misuse of Markdown. (And Markdown and Smartypants are lovely things.)
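To make one of the fallbacks mentioned in this note concrete: an array of Complex can be faked on a current JVM with a double-length array of double. The following is only a minimal sketch of that idea; the class name InterleavedComplexArray and its methods are hypothetical, invented for illustration, and are not part of any JDK or proposal.

```java
/** Hypothetical helper: a logical array of complex numbers kept unboxed in one double[]. */
final class InterleavedComplexArray {
    // Layout assumption: re0, im0, re1, im1, ... (real and imaginary parts interleaved).
    private final double[] data;

    InterleavedComplexArray(int length) {
        data = new double[2 * length];
    }

    int length() {
        return data.length / 2;
    }

    double re(int i) {
        return data[2 * i];
    }

    double im(int i) {
        return data[2 * i + 1];
    }

    void set(int i, double re, double im) {
        data[2 * i] = re;
        data[2 * i + 1] = im;
    }

    /** Multiplies element i by element j and stores the product back at index i. */
    void multiplyInPlace(int i, int j) {
        double a = re(i), b = im(i), c = re(j), d = im(j);
        set(i, a * c - b * d, a * d + b * c);
    }

    public static void main(String[] args) {
        InterleavedComplexArray zs = new InterleavedComplexArray(2);
        zs.set(0, 0.0, 1.0);       // the imaginary unit i
        zs.set(1, 0.0, 1.0);       // i again
        zs.multiplyInPlace(0, 1);  // i * i
        System.out.println(zs.re(0) + " + " + zs.im(0) + "i");
    }
}
```

No Complex object is allocated per element here, which is the point of the trick. The cost is that callers see indices and pairs of doubles rather than a value type, and a two-field value with mixed component types (such as DecimalValue) has no equally simple encoding.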



sweet numerology keyfob

(...Being a ramble through mnemonics, phonetics, information theory, pi, absolute zero, and cosmology, also touching on rutabagas and conches.)

Sometimes we need to commit a number to short-term memory, such as a street address or parking garage berth. More important numbers (for phones or credit cards, PINs or birthdays) call for long-term memorization, although smart phones increasingly fill that gap. But even with a smart phone it is pleasantly useful to have the option to use one’s own brain for such things. To do this, I rely on something I call the Sweet Numerology Keyfob. Before I explain that outlandish phrase, here is some background...

Digital data is hard to memorize because digits are abstract, while our human memory works best on concrete data such as images or meaningful phrases. There are various systems to encode digits in this way, so as to make them “stick” better in the memory. The simplest systems to learn replace each digit by a word, and produce a sentence corresponding to the original sequence of digits. Here is an example where each digit N is represented by a word with N letters:

    How I wish I could recollect pi!
    3-- 1 4--- 1 5---- 9-------- 2-

Count the letters in each word and you’ll get a morsel of pi. There are many fine examples of this technique on Wikipedia’s Piphilology page.

But one digit per word is a waste of words. Here’s why, in terms of basic information theory: To encode a random stream of decimal digits, we will need a minimum of about 3.3 bits per digit. (Are the digits of pi really random? Probably, but nobody knows for sure.) Meanwhile, as shown by various statistical studies and human tests, English words carry 6-12 bits of information. This suggests that each word should be able to carry not one but two or three digits of “payload”. In fact, it is possible to do just that.
With a small amount of practice, it is easy (and fun, in a nerdy way) to learn how.

(Here is more on the information content of English: Shannon’s pioneering study measured an upper bound of 2.3 bits per letter and speculated on an actual value near 1 bit. Experiments with human prediction have refined the upper bound to 1.2. The theory is fascinating. A small study of Latin and Voynichese texts shows those languages exhibit statistics consistent with 9-10 bits per word, similar to Shannon’s upper bound. Note that information content is not constant but depends on factors like style and genre; striking, artificially constructed mnemonic phrases will probably have higher entropy than narrative text.)

For representing digits by words, I use a system I learned from Kenneth Higbee’s wonderful book, Your Memory: How It Works and How to Improve It. I call it the Sweet Numerology Keyfob, but Higbee calls it the Phonetic System; it is also referred to (for no apparent reason) as the Major System. The technique is about 300 years old.

The idea is to make sounds (not letters or words) correspond to digits. In order to give just the right amount of creative freedom, vowel sounds are ignored; only consonants are significant. The following words all have the same sequence of consonant sounds, and therefore represent the same number: meteor, meter, metro, mitre, motor, mutter, amateur, emitter. The consonants are MTR, and the number happens to be 314. Because this code is perceived phonetically, spellings with a double T still contribute a single digit, because to the ear there is only one T sound. Depending on how you pronounce it, a word like hattrick or rattail probably contributes a pair of 1 digits, one for each T.
My point is that your ear, not your eye, will be the judge.

Here is a table relating the digits to some of the consonants:

    digit   sound   examples
    0       S       sea, see, sigh, so, sue, ass, ice
    1       T       tea, tee, tie, toe, to, at, oat
    2       N       neigh, knee, nigh, no, new, an, in
    3       M       may, me, my, mow, moo, am, ohm
    4       R       ray, raw, awry, row, rue, are, oar
    5       L       lay, lee, ally, low, Louie, ale, Ollie
    6       SH      she, shy, show, shoe, ash, ashy
    7       K       key, Kay, echo, coo, icky, auk
    8       F       fee, fie, foe, foo, if, off
    9       P       pea, pie, Poe, ape
    (none)  AEIOU   ah, eh?, I, oh!, you

As you can see from these examples, silent letters are ignored, like K in knee and GH in night. Note that the letter combination SH is a single sound, and is used to encode a digit (6).

The point of giving so many examples in the right-hand column is to suggest the range of choices available for each digit. As with the “number of letters per word” code used for pi above, for any given number there will be many words to choose from when deciding on a mnemonic phrase. Longer words with more consonants can be used to encode multiple digits, giving even more choices.

Using phonetics instead of letters makes it possible to use the system without attending to the written form of a word. In practice, this makes encoding and decoding easier and more reliable.

There are only a few other rules to know. First, three sounds are ignored even though they can be classified as consonants. They are W, H, and Y. (You may well ask, “why”?) Thus, the phrase happy hippo whip counts as 999. What about the other consonants? They are grouped with the ones already mentioned above, according to similarity of sound. Any consonant which vibrates the vocal cords is treated like the corresponding consonant which does not, so a hard G goes with its fraternal twin K, etc.
Also, the various sounds spelled TH are grouped with T, which differs from the treatment of SH and CH.

(Technically, a voiced consonant is treated as if it were unvoiced, and a fricative consonant is treated as if it were a plain stop, if it does not already have a role in the encoding.)

Here is an updated table. It is helpful to read the examples aloud. (But don’t let your family overhear.)

    digit   sound            examples
    0       S/Z              sue, zoo, ease, wheeze
    1       T/D/TH           two, due, the, aid, weighed, mayday
    2       N                no, enough, whine, anyway, Hawaiian
    3       M                mow, yam, home, Omaha
    4       R                row, war(*), airway(*)
    5       L                low, wool, hollow
    6       SH/CH/ZH/J/(G)   show, Joe, chew, huge, awash
    7       K/G(hard)        coat, goat, wacky, wig
    8       F/V              few, view, half, heavy
    9       P/B              pie, buy, hope, oboe
    (none)  AEIOU            ah, eh?, I, oh!, you
    (none)  WHY              why, how, highway

(* 3/24 Note: See guest comments below on non-initial R, etc. Remember that hard and soft G are very different sounds!)

Although the tables above may seem daunting, their core information is simple and easy to learn, if you use your ear instead of your eye. There are many ways to remember the basic sequence of ten letters, but I like to tie it up in a phrase which both describes and encodes the technique:

    SweeT Nu-MeR-oLoGy Key-FoB
    0---1 2--3-4--5-6- 7---8-9

(I coined this phrase when I read Higbee’s book, in 2002.)

Now let’s use this system to encode pi, using about the same size phrase as above:

    How I wish I could recollect pi!
    3-- 1 4--- 1 5---- 9-------- 2-

    Moderately pinch Lemuel phobic.
    3-1-4-1-5- 9-26- 5-3--5 8--9-7

Both phrases are about the same length, but the second one encodes twice as many digits of pi. Not surprisingly, because it is more compressed, the second phrase sounds more arbitrary. Still, it may be easy for some of us to remember, especially if you have a mental image for someone named Lemuel.

As with any such method, there are many alternative phrases encoding the same payload digits, and the user should pick whichever option works best.
Here are some other possibilities, also encoding the first 14 digits of pi. Because they are longer, there is more freedom in composing them, so they might be (in some sense) slightly more logical:

    Meteorite lupine, chew a lame wolfback.
    3-1--4-1- 5-9-2-  6--- - 5-3- --589-7-

    Motherhood, help any shallow male phobic.
    3-1--4---1  --59 -2- 6--5--- 3-5- 8--9-7.

    My tired lip now shall my laugh pique.
    3- 1-4-1 5-9 2-- 6--5- 3- 5--8- 9-7--

Fans of Twilight or Woody Allen might find the first or second more meaningful, respectively. I used a computer to help me build all but the last phrase. The last phrase is an example of a typical off-the-cuff product; it is as short as the others but not as vivid.

As with most compression systems, the more work you put in, the better results you get. In this case, the 47 bits required to encode 14 decimal digits are distributed through 4-8 words, at an average of about 6-12 bits per word. Remember that words carry 6-12 bits each, so we are using nearly all of those carrier bits to carry payload. The channel known as Grammatical English provides the bandwidth, and we are using it efficiently.

It can be difficult to get “home run” matches like moderately for 31415, but with a little practice, phrases which combine only two or three digits per word are easy to build quickly without mechanical assistance. Memorizing a few digits using short words is the most common way I use this technique. Less commonly, I work hard to get a good vivid encoding for a number I care about, such as a family birthday or phone number.

You will notice that there is no clever tie-in to pi itself in these phrases. This is a natural effect of the high compression level of the encoding. There is not much “slack” for an author to choose words that are relevant to the subject. Although this might seem to be a defect of the phonetic system, it is easy to compensate for, by mentally adding a tie-in.
If you are imagining Lemuel getting pinched, give him a pie in the face also, and you are done. This extra mental step is more than compensated by the doubled efficiency of the encoding itself.

Once I got started it was hard to stop. Eventually I got the following quasi-story which encodes the first 51 digits of pi in 20 words:

    Immoderately, punchily, my wolfpack bowmen may free Ginger,
    --3-1-4-1-5-  9-26--5-  3- --589-7- 9--3-2 3-- 84-- 6-26-4

    maim via ammo hangable, sniff heavy rutabaga;
    3--3 8-- -3-- --27-95-  02-8- ---8- 4-1-9-7-

    the sheep imbibe my accolades.
    1-- 6---9 -39-9- 3- -7--5-1-0

I can’t say why my bowmen would want to sniff root vegetables; maybe they have swine DNA. It does not have to make complete sense, as long as it is vivid and coherent enough to memorize as something that can be repeated back to oneself. The point of encoding digits this way is that most people (including me) have a much easier time memorizing 20 spoken words than 51 abstract digits.

That is enough pi for now. The occasion of this post (which is getting long) was a different and much simpler constant. It is used to define the Kelvin and Celsius scales with reference to the triple point of water. The magic number is 273.16, which is (by definition, and exactly) the Kelvin-scale temperature of water at its triple point. This number, along with its close neighbor 273.15, defines both the Kelvin and Celsius scales. In particular, the temperature we call “absolute zero” is -273.15 Celsius. Because the freezing point of water varies slightly depending on pressure, the Kelvin and Celsius scales are defined relative to the triple point, which is very slightly warmer than normal freezing. This hundredth of a degree only makes a difference to scientists, but it is nice to know.

Although I find the number 273.16 interesting, it tends to slip my mind.
Today I made a mnemonic to nail it in there:

    Enigma dish.
    -2-73- 1-6-

After all, if you were presented with a dish of triple-point water, you would have trouble telling whether it were liquid, solid, or gas: A mystery! (You’d also be inside a vacuum chamber or outside in the stratosphere, but that’s irrelevant.) And if a British Enigma machine were dropped on it, it would break. Ludicrous images are memorable ones; at least, it works for me. Similarly, the freezing point of water at normal pressures, and the exact zero point of the Celsius scale, is 273.15. Possible phrases for that are (for taxpayers) income toll or (for dystopians) nuke ’em daily.

And there is a bonus here, a convenient coincidence. (Not quite a mathematical coincidence.) The digits 273 also present the temperature of the cosmos to three significant figures. More specifically, the temperature of the cosmic microwave background radiation has been measured as 2.726 degrees Kelvin. Rounding gives 2.73K, or 1% of the freezing point of water (273K).

The actual ratio is not 1/100 but rather more like 10/1002. Still, it is interesting (to me at least) to note that the cold of space is two orders of magnitude colder than a cold day in most parts of the Earth. It is also interesting to think that this ratio (10/1002) is probably the same everywhere in the cosmos. It is slightly more impressive, however, to people who use base 10.

Although I expect to use 273 for both purposes, the more precise value of the CMBR is of course worth remembering. It is, after all, one of the most important physical discoveries of the late twentieth century. The CMBR is (to me) like the sound of the sea, unvarying across aeons. Folklore says (erroneously and/or poetically) that you can hear the sea in a shell, for example, in a conch shell. I suggest that some part of the CMBR can also be found there:

    In a conch.
    -2 - 7-26-
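For readers who like to check such mnemonics mechanically, here is a minimal sketch of the decoding direction of the phonetic system, plus the bits-per-digit arithmetic used earlier. It makes two loud assumptions: the caller supplies the consonant sounds already transcribed by ear (spelling-to-sound conversion is not attempted), and the token names ("SH", "TH", "G" for hard G, "J" for soft G, and so on) are my own convention for this example. The class PhoneticDigits is hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical decoder for the phonetic digit code, working on sound tokens, not spellings. */
final class PhoneticDigits {
    private static final Map<String, Character> DIGIT_OF_SOUND = new HashMap<>();

    static {
        // Sound groups follow the second table in the post.
        register('0', "S", "Z");
        register('1', "T", "D", "TH");
        register('2', "N");
        register('3', "M");
        register('4', "R");
        register('5', "L");
        register('6', "SH", "CH", "ZH", "J"); // soft G should be passed in as "J"
        register('7', "K", "G");              // "G" here means hard G only
        register('8', "F", "V");
        register('9', "P", "B");
    }

    private static void register(char digit, String... sounds) {
        for (String sound : sounds) {
            DIGIT_OF_SOUND.put(sound, digit);
        }
    }

    /** Maps each sound token to its digit; vowels, W, H, Y and unknown tokens are skipped. */
    static String decode(String... sounds) {
        StringBuilder digits = new StringBuilder();
        for (String sound : sounds) {
            Character digit = DIGIT_OF_SOUND.get(sound.toUpperCase(Locale.ROOT));
            if (digit != null) {
                digits.append(digit.charValue());
            }
        }
        return digits.toString();
    }

    /** Information needed for a run of random decimal digits: log2(10) bits per digit. */
    static double payloadBits(int digitCount) {
        return digitCount * (Math.log(10) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(decode("M", "T", "R"));            // meteor, motor, ...
        System.out.println(decode("N", "G", "M", "D", "SH")); // "Enigma dish"
        System.out.println(Math.ceil(payloadBits(14)));       // bits for 14 digits, rounded up
    }
}
```

The sketch only goes from sounds to digits, which is the easy direction. Going from digits to a memorable phrase is the creative part, and nothing here attempts it.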



suppressing warnings this month

The OpenJDK community just had a nice sprint cleaning up warnings in JDK code. I did my bit; it was fun.

Two questions came up during reviews, one about the equivalence of wildcards and named type parameters, and one about the best way to handle creation of an array of generic element types.

Here's the quick summary of how to write arrays of generics:

    @SuppressWarnings("unchecked")  // array creation must have wildcard
    List<String>[] lss = (List<String>[]) new List<?>[1];

In the common case where the desired type parameter is wild, there is no need to suppress warnings:

    List<?>[] lqs = new List<?>[1];
    Class<?>[] cs = new Class<?>[1];

Note that every use of @SuppressWarnings should have a comment. Also, every @SuppressWarnings should be placed on the smallest possible program element, usually a local variable declaration.

Here's an example of removing a deprecation warning. A typical deprecation warning is:

    Foo.java:876: warning: [deprecation] String(byte[],int) in String has been deprecated
    String x = new String(new byte[0], 0);
               ^

Here is the corresponding fix:

    @SuppressWarnings("deprecation")  // String(byte[],int) in String has been deprecated
    // and this is actually useful because the fremdish preskittler was exsufflated
    String x = new String(new byte[0], 0);

The clarifying comment is a useful way to record any extra information (if known) about why the deprecated method is being used.
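The array-of-generics advice above can be exercised in a small self-contained program. This is only a sketch: the class GenericArrays and the helper newStringLists are hypothetical names chosen for the example, not JDK code. It shows the annotation sitting on a single local variable declaration, with its explanatory comment, so the rest of the method is still checked normally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example class; the names are illustrative only. */
final class GenericArrays {
    /** Creates an array of empty List<String> objects, confining the unchecked cast to one local. */
    static List<String>[] newStringLists(int length) {
        @SuppressWarnings("unchecked") // array creation must have wildcard; the fresh array is not yet shared
        List<String>[] lists = (List<String>[]) new List<?>[length];
        for (int i = 0; i < length; i++) {
            lists[i] = new ArrayList<>();
        }
        return lists;
    }

    public static void main(String[] args) {
        List<String>[] lss = newStringLists(2);
        lss[0].add("hello");
        // When the element type parameter is wild, no cast and no suppression are needed.
        List<?>[] lqs = new List<?>[1];
        System.out.println(lss[0] + " " + lss[1] + " " + lqs.length);
    }
}
```

Compiling this with `javac -Xlint:all` should report no unchecked warning, because the only unchecked operation is the one covered by the annotation.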



on coding style

I vastly prefer coding to discussing coding style, just as I would prefer to write poetry instead of talking about how it should be written.Sometimes the topic cannot be put off, either because some individual coder is messing up a shared code base and needs to be corrected, or (worse) because some officious soul has decided, "what we really need around here are some strongly enforced style rules!"Neither is the case at the moment, and yet I will venture a post on the subject.The following are not rules, but suggested etiquette.The idea is to allow a coherent style of coding to flourish safely and sanely, as a humane, inductive, social process.Maxim M1: Observe, respect, and imitate the largest-scale precedents available. (Preserve styles of whitespace, capitalization, punctuation, abbreviation, name choice, code block size, factorization, type of comments, class organization, file naming, etc., etc., etc.)Maxim M2: Don't add weight to small-scale variations. (Realize that Maxim M1 has been broken many times, but don't take that as license to create further irregularities.)Maxim M3: Listen to and rely on your reviewers to help you perceive your own coding quirks. (When you review, help the coder do this.)Maxim M4: When you touch some code, try to leave it more readable than you found it. (When you review such changes, thank the coder for the cleanup. When you plan changes, plan for cleanups.)On the Hotspot project, which is almost 1.5 decades old, we have often practiced and benefited from such etiquette. The process is, and should be, inductive, not prescriptive. An ounce of neighborliness is better than a pound of police-work.Reality check: If you actually look at (or live in) the Hotspot code base, you will find we have accumulated many annoying irregularities in our source base. I suppose this is the normal condition of a lived-in space. 
Unless you want to spend all your time polishing and tidying, you can't live without some smudge and clutter, can you?Final digression: Grammars and dictionaries and other prescriptive rule books are sometimes useful, but we humans learn and maintain our language by example not grammar. The same applies to style rules. Actually, I think the process of maintaining a clean and pleasant working code base is an instance of a community maintaining its common linguistic identity. BTW, I've been reading and listening to John McWhorter lately with great pleasure.(If you end with a digression, is it a tail-digression?)



Feynman's inbox

Here is Richard Feynman writing on the ease of criticizing theories, and the difficulty of forming them:

The problem is not just to say something might be wrong, but to replace it by something — and that is not so easy. As soon as any really definite idea is substituted it becomes almost immediately apparent that it does not work.

The second difficulty is that there is an infinite number of possibilities of these simple types. It is something like this. You are sitting working very hard, you have worked for a long time trying to open a safe. Then some Joe comes along who knows nothing about what you are doing, except that you are trying to open the safe. He says ‘Why don’t you try the combination 10:20:30?’ Because you are busy, you have tried a lot of things, maybe you have already tried 10:20:30. Maybe you know already that the middle number is 32 not 20. Maybe you know as a matter of fact that it is a five digit combination… So please do not send me any letters trying to tell me how the thing is going to work. I read them — I always read them to make sure that I have not already thought of what is suggested — but it takes too long to answer them, because they are usually in the class ‘try 10:20:30’.

(“Seeking New Laws”, page 161 in The Character of Physical Law.)

As a sometime designer (and longtime critic) of widely used computer systems, I have seen similar difficulties appear when anyone undertakes to publicly design a piece of software that may be used by many thousands of customers. (I have been on both sides of the fence, of course.) The design possibilities are endless, but the deep design problems are usually hidden beneath a mass of superfluous detail.

The sheer numbers can be daunting. Even if only one customer out of a thousand feels a need to express a passionately held idea, it can take a long time to read all the mail. And it is a fact of life that many of those strong suggestions are only weakly supported by reason or evidence.
Opinions are plentiful, but substantive research is time-consuming, and hence rare.

A related phenomenon commonly seen with software is bike-shedding, where interlocutors focus on surface details like naming and syntax… or (come to think of it) like lock combinations.

On the other hand, software is easier than quantum physics, and the population of people able to make substantial suggestions about software systems is several orders of magnitude bigger than Feynman’s circle of colleagues. My own work would be poorer without contributions, sometimes unsolicited, sometimes passionately urged on me, from the open source community.

If a Nobel prize winner thought it was worthwhile to read his mail on the faint chance of learning a good idea, I am certainly not going to throw mine away.

(In case anyone is still reading this, and is wondering what provoked a meditation on the quality of one’s inbox contents, I’ll simply point out that the volume has been very high, for many months, on the Lambda-Dev mailing list, where the next version of the Java language is being discussed. Bravo to those of my colleagues who are surfing that wave.)

I started this note thinking there was an odd parallel between the life of the physicist and that of a software designer. On second thought, I’ll bet that is the story for anybody who works in public on something requiring special training. (And that would be pretty much anything worth doing.) In any case, Feynman saw it clearly and said it well.



JSR 292 support in b136

The OpenJDK7 build b136 was recently released for download. This is an important step forward for JSR 292 because the JVM now supports the new package name, java.lang.invoke. Up until recently, the package has been java.dyn, but we changed the name just before the JSR 292 Public Review.To minimize test failures, we are using a multi-phase process for the API changes:b130: 7017414 in langtools: Release javac which supports both java.dyn and java.lang.invoke. This allows us to compile either version of JDK code for JSR 292.b135: 6839872, 7012648 in JVM: Release JVM which supports both java.dyn and java.lang.invoke. This allows us to run either version of JDK code for JSR 292.b136: 6839872, 7012648 in JDK: Release JDK which supports only java.lang.invoke. This is the JDK code for JSR 292. Here is a preview from mlvm.b137 (approx): 6817525 in JVM: Turn on JSR 292 functionality by default. This will allow the JVM to support JSR 292 “out of the box.”b137 (approx): 6981791 in JVM: Release cleaned-up JVM, purging all java.dyn references. Will also include rename support for MethodHandle.invoke.b137 (approx): 7028405 in langtools: Release cleaned-up javac, purging all java.dyn references.As the API slowly adjusts under the Public Review, there will be a few additional changes. Here are the ones which are planned or at least under consideration:rename MethodHandle.invokeGeneric to MethodHandle.invoke (the EG has decided on this one)rename or remove the wrapper method MethodHandles.asInstance (possible rename is MethodHandleProxies.asSingleMethodInterface)Possible finality of some classes and methods (to inhibit subclassing and overrides). Might affect SwitchPoint, ConstantCallSite, other call sites, ClassValue.get, ClassValue.remove, etc.Allow ConstantCallSite subclasses to self-bind (i.e., construct with an implicit mh=mh.bindTo(this)).Add non-varargs overloadings to some varargs methods (for efficiency on simple systems). 
(Could affect insertArguments, dropArguments, filterArguments. Cf. methodType factories.)

There have been very good discussions about JSR 292 on the mlvm-dev and jvm-languages mailing lists, as well as numerous comments from other sources. Since the Public Review, I have updated the working draft in the OpenJDK sources. You can see the current javadoc for JSR 292 via the JDK 7 download page. Here are the components of the JDK7 documentation:

- the java.lang.invoke package: http://download.java.net/jdk7/docs/api/java/lang/invoke/package-summary.html
- the ClassValue class (a distant cousin to ThreadLocal): http://download.java.net/jdk7/docs/api/java/lang/ClassValue.html
- the BootstrapMethodError class: http://download.java.net/jdk7/docs/api/java/lang/BootstrapMethodError.html
- a simple introduction to the JSR 292 API: http://download.java.net/jdk7/docs/technotes/guides/vm/multiple-language-support.html

We will continue to update the documentation during and shortly after the Public Review period. Please continue to experiment with the APIs and to share your experiences.

(Note to JDK7 port users: The link to the preview JAR for b136, in the first list above, allows you to run the current JSR 292 under the b135 JVM. This is a spin from the mlvm patch repository of just the JSR 292 classes. If you put it on your boot class path on a b135 JVM, and if you are daring and lucky, you can preview the b136 functionality. This may be useful if you have a b135-level JVM in one of the porting projects, such as the BSD port.)



JSR 292 formal Public Review

If all goes well, there will be a sixty-day formal Public Review of the specification of JSR 292. I expect this to start in about ten days.

(Update: The Public Review period is 2/16/2011 through 4/18/2011. The JDK7 implementation appeared in b136.)

For almost two years, I have been keeping an updated preview of the specification here. This is simply a spin of the Javadoc derived from the current patch set in the Da Vinci Machine Project. I have also frozen a copy of this preview, as of the Public Review, here.

This has been a long time coming. In some ways, it is sad that it has taken so long. (Four years, yikes!) On the other hand, some very exciting improvements have taken place in recent months, since last year's JVM Language Summit. JSR 292 has not slowed the JDK 7 process, but where JDK 7 has been slowed for other reasons (such as the Oracle acquisition) I have aspired to use the extra time to make JSR 292 even more mature. In that light, I am very glad that the specification has benefited from those extra months of development. The first working implementation of August, 2008 was pretty good (as an API design), but what we have today is far better.

Here are some recent major changes leading up to Public Review:

- The package for JSR 292 is no longer java.dyn. It is java.lang.invoke.
- There is a much simpler variable-arity hook. The withTypeHandler method is gone; now we have asVarargsCollector. Varargs method handles are fully integrated with Java's "varargs" feature.
- There is clear documentation about how compilers are to recognize that the invokeExact and invokeGeneric methods require special processing.
- The process for decoding a SAM object into a method handle is based on static methods instead of an interface.
- MethodType is serializable, like Class.
- There is a documented SecurityManager API for checking reflective method handle lookups.
- Exceptions produced by method handle lookups are documented and standardized.

And here are a few of the many changes in presentation:

- There are @throws specifications for all the corner cases of bad arguments.
- CallSite and its key methods (getTarget, setTarget, dynamicInvoker) are all abstract.
- The introductory texts for the major classes have been reworked and the examples rechecked.
- Obsolete, deprecated, and provisional features have been removed. There are some "historic notes" to acknowledge their previous presence.
- The term "signature" has been replaced by the term "type descriptor" in most places, for compatibility with the JLS. (Exception: The special term "signature polymorphic" is retained.)

A complete (verbose) account of the differences since my last announcement may be found here.

As noted in a recent announcement, this specification is not fully implemented yet. Specifically, the package renaming will take a few weeks of maneuvering; this process has already begun with a modification to javac.
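The variable-arity hook mentioned in the list of major changes can be sketched roughly as follows. This is only an illustration, not code from the specification: the class name VarargsCollectorDemo and the helper listOf are made up for the example, and it is written against the java.lang.invoke names as they finally shipped in JDK 7 (so it calls invoke rather than invokeGeneric).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.List;

public class VarargsCollectorDemo {
    // Hypothetical fixed-arity target: takes exactly one Object[] argument.
    static List<Object> listOf(Object[] xs) {
        return Arrays.asList(xs);
    }

    // Looks up listOf and converts it into a variable-arity method handle.
    public static MethodHandle variadicList() throws ReflectiveOperationException {
        MethodHandle fixed = MethodHandles.lookup().findStatic(
                VarargsCollectorDemo.class, "listOf",
                MethodType.methodType(List.class, Object[].class));
        // Trailing arguments at each call are collected into an Object[].
        return fixed.asVarargsCollector(Object[].class);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle list = variadicList();
        // The same handle accepts calls of different arities.
        List<?> three = (List<?>) list.invoke(1, "two", 3.0);
        List<?> none = (List<?>) list.invoke();
        System.out.println(three + " " + none);
    }
}
```

The caller never names a handler; the arity negotiation happens inside invoke, which is roughly what makes this hook simpler than withTypeHandler.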



123, go!

As of today, 12/23, the OpenJDK 7 project just released build b123. This build contains many changes, including an up-to-date version of the JSR 292 API. If you have been experimenting with JSR 292, you’ll want to try this build. See the javadoc for all the gory details.

Here are some of the differences you might notice:

- The class CallSite is now abstract, causing an InstantiationError if you try to make one directly, or a problem with a super call if you make a subclass. This breaks a bunch of old code, including mine. The Expert Group decided to refactor CallSite into three subclasses: MutableCallSite, ConstantCallSite, and VolatileCallSite. To fix your code, replace CallSite by one of the subclasses, probably MutableCallSite.
- The invokedynamic instruction should point to a constant pool entry of type 18, not 17 (a tag that is now deprecated) or 12 (the name-and-type required by the Early Draft Review). The first component of this constant pool entry must be an index into the new attribute BootstrapMethods. For the moment, tag 17 is supported, under the JVM option -XX:+AllowTransitionalJSR292.
- The class java.dyn.Linkage is deprecated and going away. In particular, registerBootstrapMethod is going away, because the class file format now supports static, per-instruction registration, via the BootstrapMethods attribute.
- The class java.dyn.JavaMethodHandle is gone. Use Lookup.findVirtual (cached in a private static final) and MethodHandle.bindTo to convert a random object to a method handle. Use MethodHandles.asInstance to convert a method handle to a random SAM interface.
- Direct syntax support for generating invokedynamic instructions in Java code will not be part of JDK 7. (Sorry! Use indify instead.) Exotic identifiers are also out for now, as is the annotation-based BSM specification. We will continue to experiment with these things in the mlvm repository, and some may be adopted into the Lambda project. In fact, we hope that Lambda will integrate method handles somehow, at least providing a way to construct a method handle from a literal expression like Object#toString.
- Some of the documented argument conversions for method handles are not well supported, especially those involving unboxing or boxing, or primitive widening or narrowing. You can run into this easily by passing a primitive value to invokeGeneric. As a workaround, try casting the offending primitive to Object at the call site. This is bug 6939861.
- Some of the method handle combinators (a.k.a. transformers) will fail if you work with more than about 10 arguments at a time. This is bug 6983728.
- Your item here: If you notice other changes that disturb your code, please let me know, so I can adjust this list.

The indify tool has been updated to support the new formats. Its default output uses non-transitional formats. An earlier version of indify also appears in the unit test code for OpenJDK.

It is still the case that you must pass the following options to the JVM to use JSR 292: -XX:+UnlockExperimentalVMOptions -XX:+EnableInvokeDynamic.

Here are a few more changes that are very likely in the near future:

- The JVM flags -XX:+EnableInvokeDynamic and -XX:+EnableMethodHandles will be turned on by default. (This is tracked by bug 6817525.)
- The JVM flag -XX:+AllowTransitionalJSR292 will be turned off by default, causing constant pool tag 17 to be rejected.
- The class java.dyn.Switcher will be renamed java.dyn.SwitchPoint.
- The constant and identity functions of java.dyn.MethodHandles may change, losing the overloadings which take MethodType arguments.
- The variable-arity “hook” for MethodHandle, withTypeHandler, is controversial among JVM implementors and may be adjusted or deleted. (To see what variable arity means for method handles, see the method arityOverload around line 1423 in SIOC.java.)
- Any remaining stray sun.dyn.* names in the API are going away, of course. MethodHandle will not have a superclass. (Nor will it have user-callable constructors for building subclasses.)
- A few items are marked “provisional” in the javadoc; these could still change also.

At year’s end I am deeply grateful to the JSR 292 Expert Group, intrepid early adopters (Dan H., Rémi F., Charlie N., Fredrik O., Jim L., Sundar!), mad scientists (Thomas W., Lukas S., Hiroshi Y.!), conference propagandists (Christian T., Brian G., Alex B., Chanwit K.!) and the whole MLVM community for their enthusiastic, deeply technical contributions to the future of the JVM. Thank you all for the excellent journey so far.

Please accept my personal warm best wishes for a merry Christmas and a happy New Year.
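For anyone updating old code, two of the migration notes in the list of differences (use MutableCallSite in place of a bare CallSite, and use Lookup.findVirtual plus MethodHandle.bindTo in place of JavaMethodHandle) might look something like the sketch below. It is an illustration only: the class name CallSiteDemo is invented, and the code is written against java.lang.invoke, the package name adopted after b123, rather than the java.dyn package that b123 itself ships.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class CallSiteDemo {
    public static String run() throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();

        // findVirtual prepends the receiver: the handle type is (String,String)String.
        MethodHandle concat = lookup.findVirtual(String.class, "concat",
                MethodType.methodType(String.class, String.class));

        // bindTo fixes the receiver, turning an object into a (String)String handle.
        MethodHandle hello = concat.bindTo("hello, ");
        MethodHandle goodbye = concat.bindTo("goodbye, ");

        // CallSite is abstract, so pick a concrete subclass.
        MutableCallSite site = new MutableCallSite(hello);
        MethodHandle invoker = site.dynamicInvoker();

        String first = (String) invoker.invokeExact("world");
        site.setTarget(goodbye);  // the invoker follows the new target
        String second = (String) invoker.invokeExact("world");
        return first + " / " + second;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(run());
    }
}
```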



Scheme in one class

One of the secondary design goals of JSR 292 is to give dynamic language implementors freedom to delay bytecode generation until an opportune moment. In other words, we want to encourage mixed-mode execution, bouncing gracefully between interpreted and compiled code.

Here is some background: Dynamic language implementations generally have a choice between interpretation in a language-specific form and compilation to a platform-defined executable. On the JVM, compilation means bytecode spinning. An interpreter is typically an AST walker, but it might also be a loop over a linearized array of tokens or bytecodes. This latter format is sometimes called "semi-compiled". Semi-compiled code often runs about twice as fast as an AST walker. Some dynamic languages gain an additional boost by unrolling the token stream dispatch into JVM bytecodes, so that each interpreter action becomes a JVM method call. Basically, the interpreter token stream is translated using simple context-free bytecode templates. This might allow the author to claim direct compilation to the JVM, but there is still a central interpreter runtime library mediating every operation. Such calls are often difficult for the JVM to optimize. I suspect most such compilation is premature.

More background: HotSpot itself is designed as a mixed-mode bytecode execution engine. Java applications start running under a bytecode interpreter, and eventually "warm up" into optimized native code. The JVM is free to collect loads of profile information before it commits itself to optimized native code. There is also a "deoptimization" facility, for backing out of executions which the optimized code is not prepared to handle. But the goal is to keep the Java program running at the hardware level, with relatively few trips to interpreter or runtime support. Similarly, the best compiled code for dynamic languages boils down the original user code into real JVM primitives. As with the JVM, getting this "boil-down" correct might require lots of profiling information gathered by the language runtime. Trace-based compilation is an example of such optimization, because the dynamic language has been coaxed to reveal its important types and control flow transfers, which the compiler (whether native or JVM) can exploit. Meanwhile, when bytecodes need to have late-bound or re-bindable semantics, because optimization decisions must be revisited, invokedynamic provides flexible fall-backs.

In order to explore the proposition that JSR 292 allows implementors to delay bytecode spinning, I decided to build a small Scheme interpreter which could execute interesting programs without dynamically spinning bytecodes. I chose Scheme mainly because of its attractive simplicity (though it has grown over the years). I also want to pay homage to an old exploit called Scheme in One Definition, in which George Carrette coded up a Scheme interpreter in a single file of C code. It is clearly time for Scheme in One Class.

The current version of SIOC lives in a single class file of about 55Kb. It is a fragment of a Scheme interpreter, since it can only perform variable definitions and function calls. (I may fix this, by writing a tiny Scheme compiler in Scheme itself, and bootstrapping into a semi-compiled form. Not sure if it is worth while yet.) Yes, an interpreter that cannot execute lambda or even cond is pretty boring, but there are some interesting bits to point out.

The first interesting bit (for me at least) is the lack of inner classes. Since those compile to separate, fragmentary class files, the SIOC experiment has to avoid them.
There is a point or two in the SIOC code where a Java programmer needs them, notably this place where I use Arrays.sort to sort a bunch of method handles:

    MethodHandle[] mhv = ...;
    // try to move more general types to the back of the list
    Arrays.sort(mhv, C_compareMethodHandles);

Clearly, the definition of C_compareMethodHandles must be something like this:

    Comparator<MethodHandle> C_compareMethodHandles = new Comparator<MethodHandle>() {
        public int compare(MethodHandle o1, MethodHandle o2) {
            return compareMethodHandles(o1, o2);
        }
    };

But in fact it uses a "SAM conversion" API in JSR 292 to manage the same thing without (visibly) creating a new class file:

    MethodHandle MH_compareMethodHandles = (uninstructive details here...);
    Comparator<MethodHandle> C_compareMethodHandles =
        MethodHandles.asInstance(MH_compareMethodHandles, Comparator.class);

This API allows any method handle to "masquerade" as any SAM type, at the drop of a hat.

The second interesting bit is the mapping from Scheme types to JVM types. Because this is SIOC, wrapper types (like "SchemeInteger") are not welcome. (I distrust language-specific wrappers anyway, as a hindrance to interoperability. SIOC is partly an exercise in detecting wrappers as well as avoiding bytecode spinning.) Scheme strings, vectors, lists, booleans, characters, integers, and floats are modeled by Java strings, object arrays, lists, and the obvious primitive wrappers. Scheme procedures are modeled by JSR 292 method handles. (Symbols are modeled by SIOC instances. Something had to give, there. The mappings have various flaws, especially with lists.)

A third interesting bit is the use of symbol mangling to implicitly define Scheme procedure names. For example, the Scheme procedure write is defined in Java like this:

    private void F_write(Object x, Object port) throws IOException {
        unparse(x, toWriter(port), true);
    }

(Here, unparse and toWriter are local static routines.) Note that this function is private. The JSR 292 reflective lookup methods make it easy for classes to define local functions for use as method handles. The Scheme interpreter handles symbol lookup (in part) by probing for such Java definitions.

A related aspect is the handling of overloading. This shows up in the definitions of arithmetic functions:

    private static int SF_P(int x, int y) { return x + y; }
    private static long SF_P(long x, long y) { return x + y; }
    private static double SF_P(double x, double y) { return x + y; }

Here, the "SF_" prefix means "I am a static function implementing a Scheme procedure", and "P" is a mangling for the plus sign character. (I would have liked to use exotic names for this, but they are not in JDK 7.)

When these three definitions are processed by the interpreter, they are combined under a single dispatching method handle which examines the types of its arguments and calls the appropriate function. (Because this is a toy program, the overload resolution is not very good. A real system would use a real metaobject protocol.)

Overloading raises the question of variable arity. The Scheme write procedure noted above takes either one or two arguments. (The second argument defaults to Scheme's version of System.out.) In SIOC, the one-argument version is expressed by an additional Java overloading:

    private void F_write(Object x) throws IOException { F_write(x, get("output")); }

When the Scheme interpreter looks up write, it finds both Java definitions, and overloads them into a single method handle. Later on, when the method handle is called (via invokeGeneric), the number of arguments, as expressed in the calling type (a java.dyn.MethodType), is handed to the method handle, which enables it to select the correct overloading.

A larger example of variable arity is the Scheme procedure list, which accepts any number of arguments of any type. In SIOC, I arbitrarily gave it specialized overloadings for up to three arguments:

    private static Object SF_list(Object x) {
        return new ArrayList(Arrays.asList(x));
    }
    private static Object SF_list(Object x, Object y) {
        return new ArrayList(Arrays.asList(x, y));
    }
    private static Object SF_list(Object x, Object y, Object z) {
        return new ArrayList(Arrays.asList(x, y, z));
    }
    private static Object SF_list(Object... xs) {
        return new ArrayList(Arrays.asList(xs));
    }

The JSR 292 feature which enables variable arity calls is relatively new, discussed last July at the JVM Language Summit. The issue is with the flexibility of MethodHandle.invokeGeneric. As defined earlier this year, a suitably defined method handle (like one of the SF_list above) could accept any type of arguments from any caller, as long as the number of arguments (the arity of the call) agreed with the type of the method handle (its intrinsic arity). This is usually fine for most applications, but when arguments are optional or procedures are variadic (as in Scheme write or print), there is a semantic gap. Without further work, one method handle cannot serve as Scheme's list procedure.

One workaround for this would be to define a wrapper type SchemeProcedure which bundles together method handles of varying arity, and have Scheme call sites perform the unwrapping. This would be a performance hazard in the interpreter. Presumably compiled code would be able to "boil down" the wrapper into a correct method handle. Another problem with this workaround would be interoperability with other languages. Instead of passing a Scheme procedure to another language's call site, with the contract that invokeGeneric will sort out both type mismatches and arity mismatches, a more complex metaobject protocol is needed to narrow down the type of the Scheme procedure, before it leaves Scheme's control.
This seems wrong to me, though further experiments are needed in this area.

In SIOC, wrapperless variable arity is built on top of a concept called "type handlers". A method handle may optionally be equipped with a type handler, which handles all type mismatches encountered via invokeGeneric. If a caller invokes a method handle on the wrong number of arguments, and the method handle has a type handler, the type handler is consulted with the caller's intended type, and given a chance to define the call.

Here is some background: When invokeGeneric is applied to a method handle, there is a negotiation (with or without type handlers) between the caller's invocation descriptor and the method handle's intrinsic type. (The descriptors may be represented as MethodType objects.) If the descriptors match exactly, the method handle is invoked through its main entry point, its "front door". If the descriptors do not match exactly, the method handle is (virtually) converted to the caller's desired type, and invoked in its converted form. (The JVM will probably do something more clever than creating a new method handle just for one call.) If there is no type handler, the caller and callee types must agree on arity, and the equivalent of MethodHandles.convertArguments is used to pairwise coerce the arguments and return values, if such coercion is possible according to certain rules (a variation of Java method invocation conversions).

If this overloading trickery applied only to Scheme functions, it would be only mildly interesting, since Scheme does not boast many overloaded functions. But there is a final interesting point in SIOC which I would like to observe, and that is its ability to call Java APIs. The same mechanism that allows the interpreter to resolve write and other Scheme names also applies to Java APIs, given suitable conventions for representing Java names as Scheme symbols. (In this matter of Java integration I give deep deference to other, much better JVM Lisps, notably Kawa, ABCL (added), and Clojure. The present exercise simply aims to show the usefulness of JSR 292 for getting at Java APIs.)

Specifically, there are a number of Scheme symbol resolution rules which jump down the Java rabbit-hole. Here are some examples:

    $ $JAVA7X_HOME/bin/java -jar dist/sioc.jar -i
    ;; SIOC
    > (write "hello")
    "hello"
    > (set! n (+ 2 2))
    > (list n (+ 2.1 2.3))
    (4 4.4)
    > java.lang.String
    class java.lang.String
    > java.lang.String#concat
    concat(java.lang.String,java.lang.String)java.lang.String
    > (import 'java.io...)
    import java.io as ...
    > (define f (File#new "manifest.mf"))
    > f
    manifest.mf
    > (.getClass f)
    class java.io.File
    > (.getSimpleName (.getClass f))
    "File"
    > (File#exists f)
    #t
    > (define p (FileReader#new f))
    > (set! p (BufferedReader#new p))
    > (write (BufferedReader#readLine p))
    "Manifest-Version: 1.0"
    > (quit)

Getting at such APIs, randomly and interactively, has been impossible before JSR 292, at least without statically or dynamically spinning bytecoded adapters. Now it works well in a small program. One of JSR 292's benefits should be much easier access to Java from dynamic languages.

Update: Several people have noted that reflection allows this also. I am guilty of HotSpot-centrism here, because on HotSpot reflection internally uses bytecode spinning to gain performance. But it is true that reflection has always allowed programs, like Kawa and ABCL, to make ad hoc access to interactively selected APIs. There is further discussion of this point at http://groups.google.com/group/jvm-languages/msg/4a4b1e23bb0b2e75. The thing that is new with JSR 292 is that method handles, especially when used with invokedynamic, optimize comparably to hand-spun bytecodes.

That's the last interesting point I have to show for now. The fuller support for a Scheme compiler (if I get around to it) will bring semi-compiled Scheme code into the mix, still in exactly one class. As a sort of cheat, the Scheme code of the compiler itself will be in a resource file associated with SIOC. The next step after that would be (finally) spinning bytecodes from the semi-compiled representation, after initial execution to settle out global variable bindings, variable types, etc. Perhaps at that point I can borrow a class file spinner (coded in Scheme) from somewhere else, and still keep everything to one class file, with some associated Scheme files.

A final caveat: Some of these JSR 292 API elements (such as type handlers and SAM conversion) may not make it into this year's definition, and JDK 7. This is all bleeding edge stuff. You can find a recent API draft here: http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm/

The code itself is at http://kenai.com/projects/ninja/sources/sioc-repo/content/src/sioc/SIOC.java?rev=7
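As a rough illustration of the SAM conversion trick used for sorting method handles in SIOC, here is a minimal sketch that runs on a released JDK. It is not taken from SIOC.java. It assumes the API as it eventually shipped in JDK 7, where the conversion is MethodHandleProxies.asInterfaceInstance rather than MethodHandles.asInstance, and the class name SamConversionDemo and the arity-based ordering are stand-ins invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandleProxies;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Comparator;

public class SamConversionDemo {
    // Stand-in ordering (an assumption): handles with fewer parameters sort first.
    private static int compareMethodHandles(MethodHandle a, MethodHandle b) {
        return Integer.compare(a.type().parameterCount(), b.type().parameterCount());
    }

    // Sorts three handles through a Comparator made from a method handle,
    // with no inner class in the source, and returns their arities in order.
    public static int[] sortedArities() throws ReflectiveOperationException {
        MethodHandle cmp = MethodHandles.lookup().findStatic(
                SamConversionDemo.class, "compareMethodHandles",
                MethodType.methodType(int.class, MethodHandle.class, MethodHandle.class));

        // The method handle "masquerades" as a Comparator.
        @SuppressWarnings("unchecked")
        Comparator<MethodHandle> byArity =
                MethodHandleProxies.asInterfaceInstance(Comparator.class, cmp);

        MethodHandle[] mhv = {
            cmp,                                       // two parameters
            MethodHandles.identity(Object.class),      // one parameter
            MethodHandles.constant(Object.class, "x")  // no parameters
        };
        Arrays.sort(mhv, byArity);

        int[] arities = new int[mhv.length];
        for (int i = 0; i < mhv.length; i++) {
            arities[i] = mhv[i].type().parameterCount();
        }
        return arities;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(sortedArities()));
    }
}
```

The source file still defines only one class. Whether the runtime creates a helper class behind the scenes is an implementation detail, which matches the "(visibly)" qualifier in the post.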



a modest tool for writing JSR 292 code

An earlier version of JSR 292 included some syntax support in the Java language for issuing invokedynamic instructions. This is no longer the case in JDK 7. Such support, if it shows up at all, will be dependent on Project Lambda, and that is after JDK 7.

(By the way, for a larger example of the syntax support in action, check out this hand-transformed benchmark code for Rhino. This was part of the exploratory work described in my JavaOne talk.)

Well, that’s the bad news. But here’s a bit of good news: I have hacked together a classfile transformer, named “indify”, which is able to pattern-match some byte-compiled Java programs, and transform them into equivalent programs which use JSR 292 bytecode features. It can be used to generate invokedynamic instructions, including those with the new bootstrap method argument format. The transformer can also generate “ldc” instructions for MethodHandle and MethodType constants. (Such constants are a little-known feature of JSR 292.)

The code for “indify” is in a Mercurial repository on Kenai.com. It is all in a single file of less than 1500 lines. (Update: More recent versions are slightly larger.)

Of course, the creation of semantics-preserving program transformations is often difficult. And in this case, the transformation is slightly uphill, more like a decompiler than a compiler. Lower-level expressions are turned into simpler, higher-level JSR 292 operations.

To make the project workable and small, I cut corners. The tool transforms only a limited set of Java programs. In order to make it work for you, you have to prepare stylized, stereotyped Java programs, which make it very clear where you are expecting to perform the transformations.

The javadoc in the source code gives some rules for preparing your code. There is also a small example file. The basic trick works like this. Suppose you have a method that you want to create a method handle constant on, like this one:

    public static Integer adder(Integer x, Integer y) { return x + y; }

Getting the method handle requires an expression that looks like this:

    lookup().findStatic(Example.class, "adder",
        fromMethodDescriptorString("(Ljava/lang/Integer;Ljava/lang/Integer;)Ljava/lang/Integer;", null));

Suppose you want to transform that expression into a simple method handle constant. Perhaps in Java with closures you'll be able to write #adder but for now, put the noxious expression into a small private method like this one:

    private static MethodHandle MH_adder() throws NoAccessException {
        return lookup().findStatic(Example.class, "adder",
            fromMethodDescriptorString("(Ljava/lang/Integer;Ljava/lang/Integer;)Ljava/lang/Integer;", null));
    }

The name must begin with the string MH_; the rest of the name is arbitrary. Call the function wherever you like (within the same file). Compile the Java code, and run “indify” on the resulting class file. The constant implied by MH_adder will be appended to the class file constant pool, and all such calls to the method will be transformed into ldc instructions.

The pattern for invokedynamic is more complex, and builds on top of simpler method handle and method type constant expressions. Here is an example:

    private static MethodHandle INDY_tester() throws Throwable {
        CallSite cs = (CallSite) MH_mybsm().invokeWithArguments(lookup(), "myindyname",
            fromMethodDescriptorString("(Ljava/lang/Integer;Ljava/lang/Integer;)Ljava/lang/Integer;", null));
        return cs.dynamicInvoker();
    }

This code assumes that your bootstrap method is provided by a private constant method MH_mybsm. The factored method INDY_tester calls the bootstrap method on the standard arguments (class scope, name, and type), obtains the call site, and returns the site’s invoker method handle. (This is a late-bound alias for the target, which follows the call site target as it changes.) The caller of INDY_tester is required to immediately invoke the result, like this:

    System.out.println((Integer) INDY_tester().invokeExact((Integer)40,(Integer)2));

That invokeExact call is part of the built-in semantics of invokedynamic. The “indify” tool transforms the two calls INDY_tester and invokeExact into a single invokedynamic instruction. The static linkage information for this instruction (which is obtained by inspecting the contents of INDY_tester) is appended to the class file constant pool, as in the previous example.

Here is a pre-built distribution. For now, you should throw the switch “--transitionalJSR292”, to make it emit the older format (tag 17) of CONSTANT_InvokeDynamic. (The newer format, tag 18, is not widely available yet.) [12/23 Update: Build b123 of OpenJDK 7 supports tag 18, and contains a version of indify.] To find out about the bytecode instructions, you can always look here.

For large scale use, this tool is a non-starter. For medium sized projects, it is at best a dead end. Real language implementations will need to use a real bytecode backend, like ASM. A small tool like “indify” is not robust or user-friendly enough for medium to large jobs. And, I’m not planning on supporting it, or developing it much further. Actually, I would be sad to see it get much bigger or more complex.

But “indify” may be useful for small inputs, in cases where the result of the transformation is easy to verify. For example, if you are experimenting with small bytecode examples, or writing or teaching about them, this is a way to make your work be readable as Java code, instead of pseudocode or bytecode assembly language.
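To see the MH_ and INDY_ patterns working together before any transformation, here is a self-contained sketch that runs as ordinary Java. It makes several assumptions that are not in the post: the class name IndifyPatternDemo, the body of the bootstrap method mybsm (which simply links every call site to adder), and the use of the final JDK 7 exception types in place of NoAccessException. It makes no claim that indify accepts this exact file; it only shows what the untransformed code computes.

```java
import static java.lang.invoke.MethodHandles.lookup;
import static java.lang.invoke.MethodType.fromMethodDescriptorString;
import static java.lang.invoke.MethodType.methodType;

import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class IndifyPatternDemo {
    private static final String ADDER_DESC =
            "(Ljava/lang/Integer;Ljava/lang/Integer;)Ljava/lang/Integer;";

    public static Integer adder(Integer x, Integer y) { return x + y; }

    // Hypothetical bootstrap method: ignores the name and links to adder.
    private static CallSite mybsm(MethodHandles.Lookup caller, String name, MethodType type)
            throws ReflectiveOperationException {
        return new ConstantCallSite(caller.findStatic(IndifyPatternDemo.class, "adder", type));
    }

    // MH_ pattern: a private method whose body describes one method handle constant.
    private static MethodHandle MH_mybsm() throws ReflectiveOperationException {
        return lookup().findStatic(IndifyPatternDemo.class, "mybsm",
                methodType(CallSite.class, MethodHandles.Lookup.class,
                           String.class, MethodType.class));
    }

    // INDY_ pattern: run the bootstrap method by hand and return the site's invoker.
    private static MethodHandle INDY_tester() throws Throwable {
        CallSite cs = (CallSite) MH_mybsm().invokeWithArguments(lookup(), "myindyname",
                fromMethodDescriptorString(ADDER_DESC, null));
        return cs.dynamicInvoker();
    }

    // The result of INDY_tester is invoked immediately, as the pattern requires.
    public static Integer run() throws Throwable {
        return (Integer) INDY_tester().invokeExact((Integer) 40, (Integer) 2);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(run());
    }
}
```

Run untransformed, every call to run() repeats the bootstrap work; the point of the transformation described above is to replace that with a single invokedynamic instruction.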



larval objects in the VM

Or, the ascent from squishy to crunchy.

I want to talk about initialization of immutable data structures on the JVM. But first I want to note that, in good news for Latin lovers (and those with Latin homework), Google has made Latin translation available ad stupor mundi.

That has nothing to do with Java, but it reminds me of a Java design pattern that deserves to be unmasked and recognized: The larval stage. The idea (which will be instantly familiar to experienced programmers) is that every Java data structure goes through a construction phase during which it is incomplete, and at some point becomes mature enough to publish generally. This pattern is most readily seen in Java constructors themselves. The this value during the execution of an object’s constructor in general contains uninitialized fields. Until the constructor completes normally, the object may be regarded as in a larval stage, with its internal structure undergoing radical private changes. When the constructor completes, the object may be viewed, by contrast, as having entered a permanent adult stage. In that stage, the object is “ready for prime time” and may be exposed to untrusted code, published to other threads, etc.

This pattern is so important that we amended the 1.1 version of Java (here is an old link to my paper) to include the concept of blank finals to enable the adult stage of an object to be immutable. Later on, the JSR 133 Expert Group enhanced the Java memory model to guarantee that larval stages of immutable objects could not be made unintentionally visible to other threads. The result is that immutable Java objects (starting with Integer and String) can be easily defined and safely used even in massively multi-threaded systems. Especially on such systems, immutable data structures are of great importance, because they allow threads to communicate using the basic capabilities of the multi-processor memory system, without expensive synchronization operations. (This isn’t the best imaginable way for processors to communicate, but it is the communication channel to which chip designers give the most attention. The Von Neumann haunting appears to be permanent.)

Something is missing, though. The support I just described applies only to objects which are able to enter the adult stage when their constructor completes. This means that the complete information content of an object must be supplied as arguments (or some other way) to the constructor. If an immutable object’s contents are built up incrementally in a variable number of optional steps, the construction of the object is better expressed using a builder pattern. In this pattern, a builder object acts as a front-end or mask for the object under construction. The builder object has an API which accepts requests to add information content to a larval object (which may or may not actually exist yet), and is eventually asked to unveil the adult object. The first instance of this pattern in Java was the trusty StringBuffer, which collects append requests and eventually produces the completed string in response to the toString request. More recently, Google has made an admirable investment in builder-based APIs for immutable collections of various sorts.

Still, from my viewpoint as a JVM tinkerer, something else is missing. It seems to me that the most natural way to express the creation of an immutable object is what I see when I read the machine code for creating an Integer or String: First you allocate a blank object, then you fill it in, then you publish it. Just before you publish it, you take whatever steps needed to make sure everybody will agree on its contents. (This is sometimes called a fence, which is a subject of multiple posts.) After you publish the object, nobody changes it ever. (Well, in some cases, maybe there’s an end to the epoch. But that is another story.)
The only changes ever made to that block of memory are performed by the garbage collector, until (and unless) the block gets reused for another object.In other words, using the larval/adult metaphor for this pattern, the object building process starts out with an incomplete larval object, which is masked behind (or cocooned within) a module (wish I could say monad here) that sets up the object, and when the object is mature, eventually hatches it out into the open as an adult. The organization of the code which sets up the object is not in general as simple as a constructor body.In order to make this work better in more cases, I want to give the builder object fully privileged access to the object in its larval state. And I want to these full privileges to extend to the initialization of finals, a privilege which is currently given only to constructors. The increased flexibility means that the final fields will be initialized by multiple blocks of code. The blocks of code may even repeatedly initialize finals (as the underlying array in a StringBuffer is subject to repeated extension or truncation). The blocks of code may be invoked by untrusted code (via the builder API, not directly). Eventually the builder declares that the object is done. Just before it publishes the object, it flushes all the writes. The flush sometimes appears as a memory fence operation in the machine code. This part is especially problematic, since the current Java and JVM specifications only guarantee correct flushing of writes for variables used to reach final fields. The guarantees for non-final fields and array elements are weaker, and the interactions are subtle.What would this look like in an amended Java and JVM? Maybe non-public access to finals could be relaxed to allow writing, so that if a builder object has privileged access to do so, it can write the finals of a larval object. 
There would have to be an explicit “hatching” step, I think, to clearly mark the pre-publication point at which writing must stop and memory must be flushed. One or more new keywords could be used to indicate statically which finals are writable, which methods or classes can do the writing, and where the writing must stop. There is probably a way to express it all without keywords, too, or a combination of keywords (“volatile final”, for those looking to recoil from rebarbative syntax). The surface syntax is less important than the pattern. The pattern must prevent you from applying a larval operation to an adult object, both in the language and in the JVM. That might be an innocent mistake, or a deliberate attack; in either case it must be provably impossible. The important thing to recognize is that there are separate larval and adult sets of operations (APIs) and only the larval ones are allowed to change finals.

But a static pattern cannot ensure such safety, unless we allow another new thing. That is some kind of type-state or changeable class, which expresses the transition from larval to adult stage. A direct and flexible way of making this distinction would be to allow a Java object to have two types over its lifetime, a larval type with an extended set of initialization operations, and an adult type. The type change operation from larva to adult would be a low-level JVM operation which would do several things:

- mark the object permanently as adult
- forbid all future attempts to invoke methods classified as larval
- forbid all future changes to finals
- flush all pending memory changes relevant to the object

In all cases to date, the de facto larval API has privileged elements, like the package-private String constructor used by StringBuffer. Most uses of the larval type-state I am suggesting would probably continue to be restricted to the internals of a specific module.
Explicit larval objects would tend to allow builder objects to be simpler, since the builder could drop information into the larval object as soon as it is known, rather than (as at present) keep shadow copies until the constructor is finally invoked.

With better protection, based on type-state, it might be reasonable in some cases to make larval APIs public. For a large-scale example, a transactional database might support a public API for creating a larval view of a new version of the database, which would allow free mutation of the database contents. The adult form would be a read-only view of some other version.

In the small scale (which is more typical), when a Java API designer creates a tuple-like class for something like complex numbers, there are always conflicting impulses between making the structure immutable, so that it can be safely used across threads, or else making the structure mutable (and often with public fields), so that the objects can be used directly as scratch variables in a computation. If the choice is made for mutability, every object must be defensively copied before it is handed to untrusted code. If the choice is made for immutability, a fresh object must be made for every intermediate value of a computation. Years ago, the tilt was towards mutability, probably under the assumption that objects were expensive to allocate. This tilt might be visible in the choice to make Java arrays only mutable, and in the mutability of the Date class. (In the worst case there are mostly-immutable data structures, such as java.lang.reflect.Method, which has one mutable bit.) For Complex, there are mutable sketches floating around the net, but the Apache Commons design is immutable.
What I want to point out here is that, if we had type-state with control of mutability, programmers would get both from two stages of the same type: The larval form could have public mutable fields, useful as temporaries, while the adult form would be immutable and safe to throw around. Defensive copying would be rare to nonexistent, and failures to copy could be detected by checks on type-state.

All this begs the question of what type-state looks like in the JVM. That is a discussion for another day, but I will drop a hint, with another biological metaphor: If a class is a standard unit of taxonomy, what would we say if we had to suddenly distinguish objects of the same class? Well, we would have to invent a subsidiary unit of taxonomy, such as the species. There are several potential uses of such a refined distinction: Storing the type parameters for reified generics, tracking life cycle invariants (as here with larva vs. adult), and optimizing prototype based languages. At the JVM level, all these things are an extra indirection between object and class, and could be formalized and made available to the library implementor.

A final word about terminology: The term “larva” comes from a Latin word which can mean a mask; a “larvatus” is something that is masked. Creepily, the word also refers to witches, skeletons, and (as with modern languages) worms. I suppose that if we invent larval data structures, the name will remind us to keep them well covered, at least until their little exoskeletons have hardened. More specifically, unfinished data structures should be carefully masked. Or, as an ancient Roman software engineer might have remarked, Collectio imperfectum larvatum sit.
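For readers who want something concrete to poke at, here is a minimal sketch of the larval/adult idea in today's Java. It is only an emulation: the class name LarvalComplex, the hatch method, and the runtime flag are all invented for illustration, and the fields cannot be declared final because the language does not (yet) allow the relaxed access discussed above. The type-state check happens at run time, not statically, and the volatile flag is a stand-in for the JVM-level flush, not a replacement for it.

```java
// Hypothetical sketch: emulates larval/adult type-state with a runtime flag.
// Nothing here is a real or proposed JDK API; all names are invented.
final class LarvalComplex {
    // In the design sketched in the post these would be finals that are
    // writable only while the object is larval.
    private double re, im;

    // The type-state bit. The volatile store in hatch() orders the earlier
    // field writes only for readers that later read this flag.
    private volatile boolean adult;

    // Larval operation: overwrite the contents.
    LarvalComplex set(double re, double im) {
        checkLarval();
        this.re = re;
        this.im = im;
        return this;
    }

    // Larval operation: use the object as a scratch accumulator.
    LarvalComplex add(LarvalComplex other) {
        checkLarval();
        this.re += other.re;
        this.im += other.im;
        return this;
    }

    // The explicit "hatching" step: after this, larval operations fail.
    LarvalComplex hatch() {
        adult = true;
        return this;
    }

    // Adult (read-only) operations.
    boolean isAdult() { return adult; }
    double re() { return re; }
    double im() { return im; }

    private void checkLarval() {
        if (adult) {
            throw new IllegalStateException("larval operation applied to an adult object");
        }
    }
}
```

A caller would build up a value with set and add, call hatch, and only then hand the object to other code. Any later call to set or add throws, which is the run-time shadow of the "provably impossible" guarantee the real feature would need.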

Or, the ascent from squishy to crunchy. I want to talk about initialization of immutable data structures on the JVM. But first I want to note that, in good news for Latin lovers (and those with Latin...


JavaOne in 2010

This week I gave two talks at JavaOne. Here are my slides.

On Monday Brian Goetz and I talked (again, as in 2009) about the future of the JVM in One VM, Many Languages. The room was full of attentive listeners. I gave an update on the theory and practice of invokedynamic and JSR 292.

On Tuesday evening, I talked about an experiment I did last December combining Rhino with invokedynamic. The talk was Great Thundering Rhinos! (an expedition into JavaScript optimization). That time, the hour was late and the room was not at all full. But the listeners were still attentive. (Thanks, guys!)

For the record, the current draft of invokedynamic (soon to become final) is always available here:

http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm

Introduction to One VM: The Java Virtual Machine (JVM) has, in large part, been the engine behind the success of the Java programming language. The JVM is undergoing a transformation: to become a Universal VM. In years to come, it will power the success of other languages too.

Introduction to Thundering Rhinos: JavaScript presents difficult implementation obstacles. JDK 7 provides an excellent toolkit for implementors. This is a case study of optimizing Rhino with invokedynamic.

The above links are to HTML renderings of the presentations. PDF renderings are also available for One VM and Thundering Rhinos in the HTML presentation of the first part of this sentence.



advice from a master teacher

My father Greig Rose was a college science teacher for about 35 years, mostly at West Valley College. He also served on faculty at West Point. As a youngster I learned the genetic code watching him teach the cadets. Later on, interesting classes were routinely available to me. Growing up in an educator's house was a privilege and a blessing. It is one of the main reasons I have been directly involved in my own children's education, to the point of home schooling and volunteer teaching.

Tonight my father and I were talking about education, with the end of the summer and fall classes coming on. He is glad to be retired, but he is still a teacher at heart. I asked him, “What is your best advice to teachers?” Here is the answer he gave, as well as I can reconstruct it.

“Listen” is the first word. Listen to your students with your eyes and ears. Understand how they are approaching the class, and whether they are understanding the lesson. If you are sending a message, but they are not receiving, no communication is happening. Ask questions.

Aim for self-education, and model it. Show them how to learn for themselves. In a typical class, you will teach them a few cardinal facts of subject matter, and show them ways to fill in everything else later. Encourage questions. Be willing to say “I don't know; let's find out”.

Allow a little chaos into the classroom, to make room for conversation and discovery. Tightly scripted lesson plans do not work. On the other hand, know where you are going. Have clear class objectives and lesson plans, and steer the interactive conversations back to the class objectives.

To make students accountable for the required reading, give take-home quizzes to be turned in at the beginning of each week. Make each quiz from a handful of simple write-in questions drawn from the text. Give the quizzes significant weight, as a group.
Allow students to use any resources to answer the questions, but do it in a way that makes reading the text the easiest way to get it done. Allow students to work together on the quizzes. Study groups are good, as long as they are not too large. If this is done well, nearly all students will put in the work and gain nearly all the points. Then, in your lectures, you can assume the basic reading work has been accomplished.

For science classes, know that you will be teaching a field that changes each year. The internet is a good source for new information, better than the paper journals of yesteryear. For classic humanities, what is new each year is the teacher's deepening understanding of the subject matter.

When tackling a difficult text, as in a humanities class, use a three-phase process: First observe, then interpret, then apply. (To me, this reflects the phases of the classical Trivium: grammar, logic, rhetoric; or, facts, ideas, actions.) This three-phase process works both for individual study and for discussion.

Of course, every subject is new to each new student. Listen to them and help them discover.



after the deluge: how to tidy an overflow

Joe Darcy has posted a fine note on the problem of managing integer overflow in Java. This was raised again last week at the JVM Language Summit. Joe’s blog has already attracted some good comments. Here is my comment, which has inflated itself into a blog entry of its own.

The problem is to take two machine words, do a common arithmetic operation on them, and obtain the arithmetically correct results. In particular, it means that at least one bit must be collected besides the machine word of non-overflowing result data. For example (as Joe notes), the extra bit required to report overflow could be implied by an exceptional return. In any case, there are a couple of degrees of freedom in this problem: What we will use the extra bit(s) for, and how to deliver it (or them). After examining a couple of half measures, I will enumerate about ten ways to return normally with a pair of machine words.

Use Cases

If overflow is expected to be rare (at most 10**-4, say) then an argument-bearing exception is reasonable and harmless. That is, the intrinsic that performs add (or multiply) can be specified to throw an exception that contains “the rest of the story”, if that story does not need to be told often. Joe suggests putting helpful information into the exception to help slow-path code. I think the slow path is likely to be able to “start from scratch”, so exception-carried information is likely to have only a diagnostic value.

If overflow is somewhat common (10**-2 or more, say), a pre-constructed exception is an option to consider. The catch for such an exception amounts to an alternate return point, which would contain slow-path code to compute the desired results in a robust way. The pre-constructed exception would have no information content.

If the overflowing operation is being used to build multi-precision math, overflow will be more common, even pushing 50% on random-looking inputs.
(Considerations like Benford’s law suggest that random-looking inputs are paradoxically rare. Bignum addition is not likely, in practice, to create a bignum longer than its inputs, so the overflow condition will be rare on one out of the N adds in the bignum operation.) If overflow is common (more than 10%), it is important to process the requested operation as fast (or nearly as fast) as the non-overflowing case.

The last use case is the demanding one. It seems to require that the machine code generated in the relevant inner loop be nearly as good as an assembly programmer would write, using any available CPU-specific overflow indicator. This takes us, oddly enough, to the question of API design.

API Design Cases

Various source code patterns are compatible with various optimizations. APIs which involve exceptions can be partially optimized, but not (reliably) down to the single instructions required for really tight bignum loops. So let’s talk about APIs for delivering the extra result bits on the fast path. For ease of reference in subsequent commenting on Joe’s blog, I will make a link for each type of API.

Throw to Slow Path

For the record, here is a fragment from Joe’s example which shows the previously mentioned technique of reporting overflow with an exception:

    static int addExact(int a, int b) throws ArithmeticException { ... }
    ...
    int z;
    try { z = addExact(a, b); }
    catch (ArithmeticException ignore) { ... }

Null to Slow Path

Besides the half-measure of a slow path reached by an exception, there is another ingenious trick that Joe considers and rejects: Report the exceptional condition by returning null instead of a boxed result. The effect of this is to extend the dynamic range of the operand word (32 or 64 bits) by one more code point, a sort of NaN value. Since JVMs are very good at optimizing null checks, and (more recently) good at optimizing box/unbox operations, this trick might be easier to optimize than an equivalent throw-based trick.
But it still feels like a trick, not a real API. Also, it cannot deliver additional result bits to the fast path.

Longs as Int Pairs

Another option would be to work only with 32-bit ints and forget about longs. Then the API could stuff two ints into a long as needed. The code for addExact becomes trivial: return (long)a+b. Then, for the cases where overflow is rare and simply needs detection, it is a relatively simple pattern-matching optimization to extract the overflow bit from an expression like z!=(int)z. This little comparison could be intrinsified, perhaps, for greater likelihood of recognition, but it is already probably simple enough.

This technique requires stereotyped shift-and-mask expressions to pack and unpack the value pairs. (The expressions contain characteristic 32-bit shift constants.) Such expressions are probably optimized in all JVM compilers. Therefore, it is probably best, at present, to code bignums in radix 2**32. (Or, if signed multiplication is a problem, perhaps radix 2**31 is better.)

Although the code of such things is really simple, it is not so simple as to prevent inventive programmers from creating alternative formulations which the optimizer doesn’t expect and can’t optimize. So to make this hack optimize reliably, it is worth defining standard constructors and selectors between int-pairs and longs, and encouraging programmers to use that API.

    static long intPair(int a, int b) { return ((long)a << 32) + (b & (-1L >>> 32)); }
    static int lowInt(long ab) { return (int)(ab >> 0); }
    static int highInt(long ab) { return (int)(ab >> 32); }

I have been unspecific about how many bits are really flying around. The simplest case is 32 bits of operand, and one extra bit of overflow. The pattern just presented has 32 bits of operand and 64 bits of result. A more general case is 64 bits of operand and 128 bits of result. Any exact (overflow-handling) arithmetic routine needs to return a logical tuple of (long,long).
This case resists half-measures that report state but not information, or that work for ints but not longs. For the rest of this note, let us consider the problem of returning a 128 bit result.

Multiple Return Values

You may have guessed already that my favorite (future-only) realization of this would be multiple return values. The intrinsics would look something like this:

    static (long,long) multiplySignedExact(long a, long b) { ... }
    static (long,long) multiplyUnsignedExact(long a, long b) { ... }
    static boolean signedOverflow(long a, long b) { return (b != (a>>63)); }
    static boolean unsignedOverflow(long a, long b) { return (b != 0); }
    ...
    (long z0, long z1) = multiplySignedExact(a, b);
    if (signedOverflow(z0, z1)) { ... }

The idiom for detecting overflow is simple enough that it could be detected by simple pattern matching in the optimizer. Making it into an intrinsic (as noted before) helps programmers to use patterns which the compiler is expecting. The overflow detection intrinsic itself does not need any special magic. The 128-bit multiply intrinsic will almost certainly be special-cased by an optimizing dynamic compiler.

If your eyes are still stinging from reading the pseudo-code above, I don’t need to remind you that we don’t have tuples or multiple value returns right now. But there are alternatives for wrapping up the return data. The sad story is that none of the alternatives is a compelling win.

Arrays as Value Pairs

A simple expedient is to use a two-element array to hold the two returned values. Escape analysis (EA) is present in modern optimizing compilers. (See this example with JRockit.) One could hope that the arrays would “just evaporate”. But EA patterns are fragile; a slight source code variation can unintentionally destroy them.
Most ameliorations by the programmer, such as attempting to cache the array for reuse, will break the real optimization.

Wrappers as Value Pairs

The same point goes for a specially constructed wrapper type modeled on the standard wrappers (like Long) but with multiple fields (think LongPair). This wrapper type would be immutable and therefore less liable to abuse, but its identity as a Java object would still be an obstacle to full optimization.

Value-like Pairs

If we add value types to the JVM, with relaxed identity semantics, a value-like LongPair would play better with EA, since pair values could be destroyed and reconstituted at will by the optimizer. Alas, that also is in the future.

Return by Reference

Some Java API designers have tried the C-like expedient of passing a pointer to a result variable, as an extra parameter. As far as I know, the experience with this tactic is not satisfactory. (Such a buffer in Java is usually a one-element array.) Buffer re-use is a continual temptation to the programmer, and a downfall to EA. Returning a value by poking it into a mailbox is not something Java optimizers are looking for. If the whole idiom needs to boil down to an instruction or two, allocating an array is a hazard to optimization.

    static long multiplySignedExact(long a, long b, long[] z1a) { ...; z1a[0] = z1; return z0; }
    ...
    long[] z1a = {0};
    long z0 = multiplySignedExact(a, b, z1a), z1 = z1a[0];
    if (signedOverflow(z0, z1)) { ... }

Return by Thread Local

Another possible mailbox for delivering a second result would be a thread-local variable. The abstract model is like that of continuation-passing machines, in which multiple arguments and multiple return values are passed through a set of virtual registers.
Java (pre-tuples) only provides access to one such register for return values, but additional registers could be modeled as thread-local values.

The optimization hazard here is that the thread local values are too widely visible, and it is hard to analyze their use in live ranges, as one can do with registers. (Specifically, they can be reached via pointers!) So the optimizer has to be careful working with side effects to such things. Still, I think this is a promising area of research.

Static Single Assignment

The idea of thread-locals is interesting, since they are almost structured enough to optimize as if they were registers. (This is counter-intuitive if you know that they are implemented, at least in the interpreter, via hash table lookups.) Perhaps there is a subclass of ThreadLocal that needs to be defined, with a restricted use pattern that can be optimized into simple register moves. (Such a standard restricted use pattern is static single-assignment, used in HotSpot.) If, after intrinsics are expanded, the second return value looks like a register (and nothing more) then the optimizer has full freedom to boil everything down to a few instructions.

Return to an Engine Field

So far we have supposed that the arithmetic methods are all static, and this is (IMO) their natural form. But it could be argued that arithmetic should be done by an explicit ArithmeticEngine object which somehow encapsulates all that goodness of those natural numbers. If you can swallow that, there is an unexpected side benefit: The engine is an acceptable (I cannot bring myself to say natural) place to store the second return value from intrinsics that must return one.

    /*non-static*/ long multiplySignedExact(long a, long b) { ... this.highWord=z1; return z0; }
    ...
    ArithmeticEngine ae = new TitanArithmeticEngine();
    long z0 = ae.multiplySignedExact(a, b), z1 = ae.getHighWord();
    if (signedOverflow(z0, z1)) { ... }

With some luck in EA, the engine field (highWord above) might get scalarized to a register, and then participate in SSA-type optimizations.

Continue in a Lambda

With closures (coming to a JDK7 near you...) you could also drop to continuation-passing style (CPS) to receive the two result words from the intrinsic:

    static long multiplySignedExact(long a, long b, LongReceiver z1r) { ... z1r.receive(z1); return z0; }
    ...
    long z1 = 0;
    long z0 = multiplySignedExact(a, b, #(long z)->{z1=z});
    if (signedOverflow(z0, z1)) { ... }

Optimizing this down to an instruction or two requires mature closure optimization, something we’ll surely get as closures mature. On the other hand, we will surely lack such optimizations at first.

Those are a lot of choices for API design! I encourage you to add your thoughts to Joe’s blog.
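To make a couple of these options concrete, here is a small runnable sketch of the “Longs as Int Pairs” and “Arrays as Value Pairs” patterns. The class name ExactPairs and the method names addWide, overflowed, and multiplySignedWide are my own inventions for illustration, not a proposed API. The sketch assumes a JDK that provides Math.multiplyHigh (Java 9 or later) to obtain the high 64 bits of the signed product.

```java
// Hypothetical sketch of two of the API shapes discussed above.
// Assumes Java 9+ for Math.multiplyHigh; all names are invented.
final class ExactPairs {
    // Pack two ints into one long: a becomes the high word, b the low word.
    static long intPair(int a, int b) {
        return ((long) a << 32) | (b & 0xFFFFFFFFL);
    }
    static int lowInt(long ab)  { return (int) ab; }
    static int highInt(long ab) { return (int) (ab >> 32); }

    // 32-bit add with a 64-bit result: cannot overflow the long.
    static long addWide(int a, int b) { return (long) a + b; }

    // The wide sum fits in an int exactly when narrowing loses nothing.
    static boolean overflowed(long z) { return z != (int) z; }

    // 64x64 -> 128-bit signed multiply, returned as {low word, high word}.
    // This is the "arrays as value pairs" shape, with its EA hazards.
    static long[] multiplySignedWide(long a, long b) {
        return new long[] { a * b, Math.multiplyHigh(a, b) };
    }

    // The 128-bit result fits in 64 bits exactly when the high word is
    // the sign extension of the low word.
    static boolean signedOverflow(long z0, long z1) {
        return z1 != (z0 >> 63);
    }
}
```

A caller would write something like long[] z = ExactPairs.multiplySignedWide(a, b) and then test ExactPairs.signedOverflow(z[0], z[1]) before trusting z[0] alone. Whether the array allocation evaporates is, as noted above, up to escape analysis.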



an experiment with generic arithmetic

Most computer languages let you add two numbers together. This is harder than it looks, since numbers come in a variety of formats, starting with fixed-point vs. floating-point. The logic for numeric addition has to classify its operands, perhaps coerce them to a common format, and then switch to the correct algorithm for that format. Dynamic languages have to do this all at runtime, without keeping the user waiting too long.

(By the way, structuring this problem is much more than an exercise in classic object-oriented programming, since arithmetic operations have symmetries which cannot be modeled by sending a request to an encapsulated method in the left-hand operand. Some sort of double dispatch is required, which is an interesting problem, which I won't go into. Acid test: Design your generic arithmetic framework so that it simulates Java arithmetic, but allows users to add new number classes, starting with BigInteger and BigDecimal. Now test it by adding rational numbers and complex numbers, as two separate and independent modules. If that was easy, add units or formal polynomials. Don't forget that your users are expecting to perform arithmetic on integer iteration variables of their inner loops, and they refuse to declare a static type for it.)

The first trick of optimizing generic arithmetic is to assume that each particular operation (let's say it is an addition) is going to apply to a limited range of operand formats, usually a single format, usually a machine word (boxed and tagged neatly, in an object). Regardless of how many exotic number types are running around globally, each local neighborhood is pretty boring. The theory of invokedynamic is that each operation (a call site in the JVM bytecodes) is separately linked by a bootstrap method. Even if two sites look just the same, each one gets its own bootstrap process.
This allows (but does not require) the invokedynamic linkage logic (which is part of the language runtime, not the JVM) to assign separate profile points to each individual operation. Invokedynamic allows the JVM to separately customize and optimize each arithmetic operation, according to its operand types and context.

To put this theory into practice, I tried a first experiment combining invokedynamic with Kawa, a mature implementation of Scheme on the JVM. Kawa adds dynamically typed numbers together with a routine called AddOp.$Pl. As suggested above, it takes two object references, classifies them as numbers, converts them to a common type (Integer, Double, whatever) and adds them together according to that common type.

You might expect that the classification logic is expensive, and it is, compared to the actual cost of adding two machine words. The main cost, however, appears to be the allocation of intermediate values on the heap. In arbitrary approximate units, the cost of classifying operand objects is 1 to 3, the cost of adding their "payloads" is less than 0.1, and the cost of boxing the result is 5.

The best way to speed up such computations, therefore, is to elide the allocations, either by some interprocedural analysis or handshake that keeps the payloads in registers or on the stack, or by encoding small payloads in pseudo-pointers. Failing that, we are always working to make GC faster. I suppose JVM engineers need to do all three...

Meanwhile, there is the interesting problem of classification. The range of costs noted above is affected by the success or failure of type profiling. JVMs and other dynamically typed systems usually keep track of operand types (in many clever ways) so that they can simplify and specialize their execution.
Specifically, although the Kawa routine AddOp.$Pl is ready to handle numbers of all types, if it is presented with only (say) Integer operands, the HotSpot JVM will notice that the internal classification tests never observe anything but that one type, and it will adjust the machine code for that routine to work well for that type, and "bail out" to an expensive recompilation if the assumption changes. If this tactic works, the total cost of adding two Integers together is 6 units. Otherwise, it is 8 units.

Using invokedynamic, we can add to the JVM's profiling information. The trick is to have the bootstrap method link an invokedynamic site with a method handle that observes operand types, briefly records them, and then specializes itself to a new method handle that works better for the previously observed operand types. My experiment (above) is very simple: The call site observes the type of one operand of one call, and spins an "if/then/else" which says, "as long as future operands are also of that type, cast to that type, and call AddOp.$Pl".

The effect of the extra cast is to send specialized type information into the AddOp.$Pl routine, which enables it to fold up, about the same as if the JVM's global type profile had succeeded. In essence, invokedynamic works as a front-end to the AddOp.$Pl routine, helping it to classify its operands. The overhead of the invokedynamic instruction is about 1 unit (which is higher than it should be, but that's young technology for you). But, the extra profiling prevents the worst case classification cost of 3 units, keeping it down to 1.

(The comment at the top of the benchmark has more information about how these numbers were obtained. Full disclosure: This experiment was run on the latest mlvm experimental build. Also, the JVM's inlining heuristics, designed circa 1999, had to be twisted around manually in order to keep the JIT interested in processing the program in large enough units.
To get valid system-level results, we'll have to engineer better inlining tendencies into the JVM's logic.)

This result is encouraging for a few reasons. First, under the realistic assumption of a polluted type profile, invokedynamic can reduce the cost of generic arithmetic by double-digit percentages. That margin will improve as JVMs improve the quality of invokedynamic code generation. See the upcoming paper by Thalinger & Rose in PPPJ 2010 for more observations on this point.

Second, this particular experiment required no change at all to the Kawa runtime routine. The new JVM features can be used in an evolutionary way. But deeper changes can get better results. For example, dynamic languages sometimes have "metaobject protocols" which allow the dynamic type system to communicate (in a modular way) with the compiler. A good MOP can encode optimization decisions about generic arithmetic sites, and invokedynamic provides a clean way to plug those decisions into the JVM's generated code.

Third (and most important) is the fact that the temporary allocation, although it is by far the highest cost, is also vulnerable to local analysis. After a type-customized version of AddOp.$Pl is inlined at an invokedynamic site, it becomes an exercise in escape analysis (or fixnums; take your pick) to remove the garbage collector from the picture. This is a work in progress for us JVM folks...
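The self-specializing call site described in this post can be sketched with the java.lang.invoke API as it eventually shipped (MutableCallSite, MethodHandles.guardWithTest, MethodHandle.asType). This is a reconstruction for illustration only, not the code from the experiment: the class ProfilingAddSite, its relinks counter, and the genericAdd routine (a stand-in for Kawa's AddOp.$Pl) are all assumptions of mine, and the site is driven by hand through an add helper, not linked from an invokedynamic instruction by a bootstrap method.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch of a call site that profiles one operand type and
// then relinks itself to an if/then/else guarded by that type.
final class ProfilingAddSite extends MutableCallSite {
    private static final MethodType SITE_TYPE =
            MethodType.methodType(Object.class, Object.class, Object.class);
    private static final MethodHandle GENERIC, OBSERVE, IS_INSTANCE;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            GENERIC = lookup.findStatic(ProfilingAddSite.class, "genericAdd", SITE_TYPE);
            OBSERVE = lookup.findVirtual(ProfilingAddSite.class, "observe", SITE_TYPE);
            IS_INSTANCE = lookup.findVirtual(Class.class, "isInstance",
                    MethodType.methodType(boolean.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int relinks = 0; // counts how often the site specialized itself

    ProfilingAddSite() {
        super(SITE_TYPE);
        setTarget(OBSERVE.bindTo(this)); // first call goes to the profiler
    }

    // Stand-in for a generic runtime routine (not the real Kawa code).
    static Object genericAdd(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            return (Integer) a + (Integer) b;
        }
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }

    // Observe the first operand's type once, then install the guard.
    Object observe(Object a, Object b) {
        Class<?> seen = a.getClass();
        MethodHandle test = IS_INSTANCE.bindTo(seen);
        // "Cast to that type, and call the generic routine."
        MethodHandle fast = GENERIC
                .asType(MethodType.methodType(Object.class, seen, Object.class))
                .asType(SITE_TYPE);
        setTarget(MethodHandles.guardWithTest(test, fast, GENERIC));
        relinks++;
        return genericAdd(a, b);
    }

    // Convenience wrapper so callers need not handle Throwable.
    Object add(Object a, Object b) {
        try {
            return getTarget().invoke(a, b);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling add(1, 2) on a fresh site runs the profiling path and relinks; later Integer operands pass the guard and take the cast-then-call path, while an operand of another type simply falls through to the generic routine. Whether the cast actually helps the JIT fold the callee is exactly the empirical question the experiment above was probing, and this sketch makes no claim about it.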



JVM Language Summit agenda is posted

Samuel Johnson (the dictionary writer) once observed, “Language is the dress of thought.” This is true enough, and I suppose it implies that the JVM Language Summit is a clothing show. (Should we meet in Paris?) More boldly, Ludwig Wittgenstein wrote, “The limits of my language mean the limits of my world.” In those terms, the Summit is about building the world, at least those parts that run on silicon, and as that world becomes multi-core, our languages are becoming a limiting factor. In any case, for me, the Summit is a chance to visit, chat, and dream big with you fellow world builders (or tailors) out there.

Next month, we are once again gathering a room full of language and VM developers at Oracle’s Sun Santa Clara Campus, for the three-day meeting known as the JVM Language Summit. This year (like before) we have an excellent set of speakers lined up. I have just posted the agenda.

If you missed the call for speakers, but are a VM or language implementor who wants to have an extended chat with your colleagues about where the JVM is going, consider coming to Santa Clara on Jul 26-28. There are still some participant slots available. Space is limited, because this is designed as a one-room event. If you can’t come, you may also enjoy the recorded talks from previous years, and those from this year, since we are planning to record them.

P.S. Samuel Johnson was quoting a well-known Roman language researcher, who elsewhere noted that language design is almost a solved problem, as follows: “Atqui plerosque videas haerentes circa singula et dum inveniunt et dum inventa ponderant ac dimetiuntur.” (“When our language is good Latin, significant, elegant and aptly arranged, why should we labor for anything more?” Institutio Oratoria, VIII.0.30, pub. A.D. 95.) Luckily for us aficionados, language design continues. (To all Quintilian fans: Yes, I took him out of context. Sorry.)



tailcalls meet invokedynamic

It is Autumn, and tail recursion is in the air. A JSR 292 colleague, Samuele Pedroni, just raised the question of how invokedynamic might interact with a tail call feature in the JVM. Here are a few thoughts on that subject, which I hope will provoke further discussion. The first few are (I think) not very controversial, but I try to get a little crazy toward the end. Where, do you suppose, is the boundary?

dynamic tail-call sites

Of course, the tail-call prefix should apply to invokedynamic as well as to the other invokes. The most recent design of invokedynamic has been kept clean enough to accommodate this. (The version of 2008 did not, since it allowed the bootstrap method to fulfill the initial execution of an invokedynamic site; this would have been very awkward to make tail-recursive.) The semantics of invokedynamic are (a) link the call site if necessary to create a reified call site object, (b) extract the current target method handle from the call site, (c) invoke the method handle as if by an invokevirtual instruction. To make this a tail call, only step (c) needs to be adjusted, and in the obvious way.

combinators which call method handles

There are a number of combinators that take an existing method handle M and create a new one M′ that calls the previous M. There are a number of cases where, if a caller invokes M′ as a tail call, he also rightly expects that M will be invoked as a tail call. In other words, certain adaptations of M should not spoil its tail-callability. Let us say in the case of such a combinator C, that C is tail-call transparent to its argument M. In Scheme, APPLY is tail-call transparent to its first argument, while FOR-EACH is not.

The cases appear to be something like this:

convertArguments: This one should be tail-call transparent, since all it does is adjust one method handle to an equivalent method handle of a different static type. There is a problem with return values, though; see next section.

invokeExact, invokeGeneric, invokeVarargs (etc.): Any flavor of method handle invoker should be tail-call transparent. Many of them have the same problems with return type conversion as does convertArguments.

exactInvoker, genericInvoker (etc.): These are type-specific versions of the statically typed invokers, presented in the API form of method handle factories. They have the same issues as statically typed invokers.

insertArguments, dropArguments, permuteArguments, spreadArguments (etc.): Any kind of argument motion should clearly be tail-call transparent. Note that this requires something like a “stretchy” frame for the initiating caller, which can be opened up to hold larger argument lists than were sent by the initiator (the original non-tail caller). This problem is generic to most tail call mechanisms, so we can assume the JVM will have stretchy frames in the right places. (What is a good term for these frames?)

collectArguments, filterArguments, foldArguments: All of these combinators call a potentially complex transformation on the outgoing arguments, and then call the target M. They are tail-call transparent with respect to their first argument (the ultimate target M) but not with respect to any other method handles that are used to transform the argument list. (collectArguments has an implicit call to an array constructor method; maybe we could make this optionally explicit.)

guardWithTest: This guy is an if-then-else; he clearly has to be tail-call transparent with respect to the second and third arguments, but cannot be with respect to the first, which is the predicate to evaluate.

catchException: You cannot seriously expect (I claim) to tail-call a method and simultaneously catch exceptions around it. But this guy should be tail-call transparent in the exception handler argument.

return value casts

The MethodHandles.convertArguments combinator makes up the difference between a caller’s and callee’s idea of a method’s static type; the callee specifies M and the caller matches the adapter M′. This is crucial on the JVM, which is mostly statically typed. If the return value types differ enough, then M′ might need to apply a cast, box, or unbox operation to the return value of M. This appears to spoil tail-calling, since M′ must “wait around” for M to complete in order to apply the return transformation. Do we want to force such a thing to look like a tail call?

I think the answer is yes. Note that the pending transformation is restricted to be a chain of casts, boxings, and unboxings. Such a chain can be represented in an approximately finite space. (Or so I claim, without providing all the details. It’s an exercise for the reader!) Thus, even if a loop is building up an infinite number of pending return value conversions, they can all be collapsed into a small record at the stable edge of the control stack. (By that I mean in the execution frame of the initiating caller, the one that started the problematic series of looping tail calls with a non-tail call.) To do this will require some special pleading in the JVM, but I think it is a worthwhile investment.

arbitrary return value transformations

Suppose we support an arbitrary chain of low-level type conversions. Do we also support user-specified transformations on the return value, such as turning an object to a string or a string to a number? This immediately dispels any pretense of finite space. But we have garbage collectors to help us with infinite space. Perhaps it would be reasonable to add a combinator transformReturnValue which is tail-call transparent in both arguments.

It would work like this: First, a stretchy frame is lazily created to cache any pending return value transformations (including built-in ones like casting). Next, the return value transformer is linked into a heap-allocated list rooted in the frame. Then, the second argument is tail-called (keeping the stretchy frame, of course). When that call finally returns, the head element of the return value transformation list is popped and tail-called in its turn. Eventually, if the list is ever exhausted, the stretchy frame returns the last return value to its caller. This pattern seems useful and natural, as a complement to the regular stack-based recursion. I don’t think it can be coded at the user level, so maybe it deserves to be a VM-level mechanism.

looping combinators

It is possible to use foldArguments to implement a looping state machine. The idea is to apply a variable, possibly infinite series of method handles (functions) to a fixed set of arguments (perhaps a state control block). The loop is initiated by calling foldArguments on a target of MethodHandle.invoke itself (any form), and requiring the fold function to look at the arguments and return a successor function to be the next one to look at the arguments. The successor function may be the final function in the chain, which will return to the initiator. Or, if the successor function is another instance of foldArguments, the process continues. If this is not to blow the stack, it requires tail calls within the foldArguments combinator. Unusually, this use of tail calling does not require the user to issue an explicit tail call instruction, so it is plausible to require that this looping combinator pattern work even in the initial version of JSR 292, which lacks user-written tail calls.

This also raises the question of what is a more natural form of “Y combinator” for the JVM. If we had multiple-value return and/or tuples the options would be a little nicer. Any suggestions out there?

generators, anyone?

The combination of looping combinators and pending return value transforms may provide efficient ways to express generators. Note that a tree-walking generator, if it is not to blow the stack, has to do something like maintain a pending visit list, which maps nicely (I think) to a low-level return value transformer list.

bytecoded state machines

Dizzy yet? Strap in now; we’re going over the top… An irreducible use case (i.e., a use case without practical workarounds) for tail calls is machine-coded state machines. For this pattern, we can use tail calls to perform computed jumps from one state to another, when the successor state cannot simply be represented as a branch target. The nice thing about this pattern is that when successor states are few and static, conditional and unconditional branches are efficient, and not every transition needs to be a computed jump. This sort of partly-static control flow can be compiled pretty well.

An advanced example of this pattern is the code generated by PyPy, which is a dynamically growing set of machine-level basic blocks. Each block terminates with a computed goto (or switch, not sure which). This could be represented naturally in the JVM with a tail-call to a computed next block pointer. As an extra wrinkle, the switches are open-ended and can grow new successor cases. (This is how new traces are linked in.) The JVM way to express this, probably, is using invokedynamic to compute the successor method handle, so that the successor logic is patchable.

Where does this take us? Well, think of a tail-call as providing a way for a block of code to replace itself with another block of code. Or, as a way for a method call to have a multiplicity of dynamically computed successors. I think a natural complement to this is to allow a method to have a multiplicity of dynamically computed predecessors also. Though it is a loose analogy, it suggests to me that tail calls would synergize well with bytecoded methods with multiple entry points. In this way, an executable state machine could be coded up as a bytecoded control flow graph, with both exit and entry points for the parts which are not statically definable. Method handles could be formed not only on the first bytecode of the method, but also on other (well-defined and verifiable) entry points. This would allow control to leave the state machine via a tail call, and re-enter it via a secondary entry point. Any type-state required (in locals) at that entry point would be verifiably related to the signature of the secondary entry point. I think it would be a slick way to build (say) compiled regular expression scanners, in a mostly non-recursive format. The occasional embarrassments like backtracking could be handled in a way decoupled from the basic state machine, and (particularly) from the host control stack.

Also, maybe, the type-state as of an exit from the state machine (i.e., a tail call) could be reified (opaquely for security) and passed along to the tail call itself. (I guess this is a kind of bounded continuation.) Later on, whenever the state machine needs to return to the source of the tail call, it could wake up the reified type-state and resume the call. This would be done without holding onto resources on the host control stack, and could be done more than once (for things like backtracking) on a given type-state.
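
To make the combinator discussion above a little more concrete, here is a minimal sketch of a few adapters M′ wrapping a target M. It is written against java.lang.invoke, the API as it later shipped in the JDK, which is an assumption on my part since the post describes an earlier draft; filterReturnValue stands in as the closest shipped relative of the speculative transformReturnValue. The class and the methods join, isShort, and shout are hypothetical names for illustration. Nothing here makes a call a tail call; it only shows which handle calls which.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CombinatorSketch {
    // Hypothetical target M: concatenates two strings.
    static String join(String a, String b) { return a + b; }

    // Hypothetical predicate, used as the first argument of guardWithTest.
    static boolean isShort(String a, String b) { return a.length() < 3; }

    // Hypothetical fallback with the same type as M.
    static String shout(String a, String b) { return (a + b).toUpperCase(); }

    private static MethodHandle find(String name, MethodType type)
            throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(CombinatorSketch.class, name, type);
    }

    static String demo() {
        try {
            MethodType ss = MethodType.methodType(String.class, String.class, String.class);
            MethodHandle m = find("join", ss);
            MethodHandle fallback = find("shout", ss);
            MethodHandle test = find("isShort",
                    MethodType.methodType(boolean.class, String.class, String.class));

            // Argument motion: M' supplies a leading argument, then calls M last.
            MethodHandle bound = MethodHandles.insertArguments(m, 0, "tail-");
            String r1 = (String) bound.invokeExact("call");

            // If-then-else: the predicate must return to the adapter,
            // but the chosen branch is the last thing the adapter calls.
            MethodHandle guarded = MethodHandles.guardWithTest(test, m, fallback);
            String r2 = (String) guarded.invokeExact("ab", "cd");
            String r3 = (String) guarded.invokeExact("abc", "d");

            // Return value transformation: the adapter must wait for M to
            // return before applying String.length to the result.
            MethodHandle length = MethodHandles.lookup().findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
            MethodHandle measured = MethodHandles.filterReturnValue(m, length);
            int r4 = (int) measured.invokeExact("ab", "cde");

            return r1 + " " + r2 + " " + r3 + " " + r4;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the first two cases the call to M is the adapter's final action, which is why they are natural candidates for tail-call transparency; the last case is the one that needs the pending-transformation list.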



Thursday at the Summit

Start with one generously sized conference room. Fill it to capacity with language and VM implementors. Stir vigorously for three days, folding in talks and workshops in successive layers. Garnish with professional videography. Yield: One JVM Language Summit, about 80 servings.

It's been wonderful so far, and I'm looking forward to the final day tomorrow. You can see what we've been talking about by clicking on the talk links in the agenda; most of the slide decks are uploaded there. Yesterday I was personally inspired to see how Charlie Nutter and Attila Szegedi are experimenting with invokedynamic and the rest of JSR 292. Later in the day we had a workshop on JSR 292, in which actual and potential customers and implementors of invokedynamic gave high-volume, high-quality comments on the remaining design questions (see the proceedings wiki for some). I also enjoyed David Pollak's presentation on the internals of Scala, a language I dream of retiring to when I'm done coding VMs in C++.

Many of the talks are encouraging us to "think different" about the problems of organizing complex software systems on managed runtimes on gigantic machines. Two big contributions to this quest today were Erik Meijer's talk on his non-blocking "reactive" framework, and Rich Hickey's keynote. Rich (like Erik last year) encouraged us to rethink how we use memory and data structures, and particularly get much tighter control over mutability in our data structures.

On another personal note, I was delighted to see the use Rich made of classical philosophy in the writings of Alfred North Whitehead. On about slide 16 I was struck by the observation that "all things flow". You see, on Monday I was teaching Plato's Theaetetus to a classroom full of bright high schoolers (including my son Bob), in the course of which I had written on the board "all things flow" (παντα ρει, specifically), as an early problem with which Socrates and his successors struggled. I like the great books, and I like software engineering, and so it's a real treat when they synergize!

The problem of "flow" or "flux" was vigorously posed 2500 years ago by Heraclitus, who liked to speak in riddles about rivers and flow and The Word and The Way. The problem is that, if you can't step into the same river twice, the identity of the river itself dissolves into a series of isolated events or perceptions. Rich's observation is that this old problem nicely describes some of the root problems of the object-oriented model: You can try to make an object have a stable identity over time, but as you allow state transitions to pile up with little structure, you begin to risk its integrity (or the integrity of your use of it).

So the next ancient "paradigm" after flux was stability. Parmenides and others (like the famous Zeno of the motion paradoxes) said that change is not real (a tragic maya-illusion for us commoners) but that real reality must be eternal and stable. Plato inclined towards this position too, hence the proverbial "platonic world" of ideas; he accepted the world of change as a poor imitation of the pure eternal world.

There in fact is a similar "paradigm 2.0" for software systems: Pure functional programming tries to remove the embarrassing problem of flux by working in a timeless, stateless reality of immutable values. Want to set element #59 of an array? No problem, but first you have to build, beg, or borrow a copy of the whole array, one which has "always had" the desired value at #59. It's hard to get the hardware people to build us memory that can do that in one step, like our old fluxy "set and forget" arrays.

But we know from common sense (that trusty yet abused helper) that we live somewhere between the fluxy world of Heraclitus and the timeless world of Parmenides. In the days of Plato, it was his student Aristotle who balanced the claims of the two accounts, picking a middle way between the two. In short, the formal ideas we perceive and know inhere (in a way real though bound) in the mutable, moving reality around us. Object identities (what he called "substances") are real, though temporal. If Aristotle had marketers like we do, he might have called his account a hybrid paradigm (more Greek, actually, though to him "hybrid" would mean "having hubris") of flux and form. I think it's likely the story will end similarly for us, in our professional difficulties with software, with some sort of middle way.

Just to speculate wildly, maybe those hybrid functional-and-object-oriented languages (like Scala and F#) can provide a sort of dictionary and grammar of useful terms and constructs. These would somehow describe computations as immutable in most places, and manageably mutable in those few aspects needed to model real-world transitions. (Or to implement localized compute engines, like a quick-sorter.) Compared to Scala and F#, Rich's Clojure language and data structures provide a more explicitly limited set of patterns for managing flux in programming. It's growing to be a familiar story these days... but today the connection to my dead old Greek friends was very fresh to me. I never before dreamed of a connection between Heraclitus and Smalltalk, or Haskell and Zeno. Thanks, Rich!

A final note, firmly back in the 21st century: The hardware people continue to build us faster and bigger Von Neumann machines, with a fiction of globally knowable, infinitely mutable memory. It's a combination of flux and stability: The memory cells are notoriously fluxy (more each year as new processors are piled into the designs), but their addressing scheme is changeless. It's an odd model, since in reality modern memory systems are extremely busy changing the addresses under our noses (that's what caches do) and running highly complex network protocols (totally invisible to us software folks) to keep up the fiction of a fixed global sequence of cells. The changeless addressing scheme makes it hard to talk about sharing and unsharing chunks of data, as they flow (that word again!) from processor to processor. On the other hand, the radical changeableness of each and every memory cell (by any processor in the global world) makes every load from memory at least slightly uncertain. Some of the worst failures we call "pointer stomps", and they amount to dipping into the same river twice and getting toxic waste the second time. For most of our data structures, most of the time, the radical mutability of their component memory cells is a potential liability. Because their cells are located in a single place in the global memory order, it is often difficult to send them to all the places they are needed. As I said earlier today, maybe the problem with today's global, mutable memory systems is that they are global and mutable. What might a mostly-immutable, mostly-local memory system look like?



only 36 shopping days until the JVM Language Summit

If you (like me) are someone who actually enjoys contemplating the details of how languages turn into bytecodes and thence into wicked-fast machine code... If you lose sleep wondering about the joint future of programming languages and managed runtimes (especially the JVM)... If you think VM and language designers can save the world from a dystopian future of multi-core computers with no software to run on them... Please read the enclosed Call for Participation!

=== CALL FOR PARTICIPATION -- JVM LANGUAGE SUMMIT, September 2009 ===

http://jvmlangsummit.com/

Dear colleague;

We are pleased to announce the 2009 JVM Language Summit, to be held [...] campus on September 16-18, 2009. Registration is now open for speaker submissions (presentations and workshops) and general attendance.

The JVM Language Summit is an open technical collaboration among language designers, compiler writers, tool builders, runtime engineers, and VM architects. We will share our experiences as creators of programming languages for the JVM and of the JVM itself. We also welcome non-JVM developers on similar technologies to attend or speak on their runtime, VM, or language of choice.

The format this year will be slightly different from last year. Attendees told us that the most successful interactions came from small discussion groups rather than prepared lectures, so we've divided the schedule equally between traditional presentations (we're limiting most presentations to 30 minutes this year) and "Workshops". Workshops are informal, facilitated discussion groups among smaller, self-selected participants, and should [...] the subject matter. There will also be impromptu "lightning talks".

We encourage speakers to submit both a presentation and a workshop; we will arrange to schedule the presentation before the workshop, so that the presentation can spark people's interest and the workshop will allow those who are really interested to go deeper into the subject area. Workshop facilitators may, but are not expected to, prepare presentation materials, but they should come prepared to guide a deep technical discussion.

The Summit is being organized by the JVM Engineering team; no managers or marketers involved! So bring your slide rules and be prepared for some seriously geeky discussions.

The registration page is now open at:

http://registration.jvmlangsummit.com/

If you have any questions, send inquiries to inquire@jvmlangsummit.com.

We hope to see you in September!



Tuesday at JavaOne

It has been a busy day, of course. For those interested, I have posted the talk Brian Goetz and I gave on the Da Vinci Machine Project. We divided our attention between a “grand vision” of what VMs are and where they are going, and the exciting particulars of how invokedynamic and method handles work. (Warning: You’ll find lots of transition slides in the deck; I have not had time to condense them back to their combined versions.)

By contrast, yesterday’s talk at CommunityOne was more of a call for participation in the project, with a little motivational logic and history thrown in. The maximum amount of gory detail will be in Thursday’s talk called JSR 292 Cookbook.

Although I spent much of the day preparing for my own presentation, I rounded out the evening by attending some other talks. I was edified by a vigorous (if at times skeptical) discussion of interface injection, moderated by Tobias Ivarsson, who is prototyping it. The Collections Connection talk by Google folks was fun, and I was drooling over the new data structures, but I wonder whether they talked to Rich Hickey and thought about pulling in some of his finely crafted functional data structures. Google is using the builder pattern to create immutable values; I wish we knew how to integrate this into Java, since there are so many use cases for it, starting with formatted strings. (In this vein F# is working on computation expressions; I wonder what Scala has?)

The day ended with a rousing interaction with HotSpot JVM customers at the Meet the HotSpot Team BOF. Too few engineers and too many great (even urgent) ideas. I am glad the JDK is open-source now, so the community can influence the engineering priorities by voting with code.



a beautiful life

Just about 24 hours ago, I was working on invokedynamic compilation in my hotel room with colleague Christian Thalinger. I have since learned that around that time, my dear grandmother, wise and loving to the end, slowly breathed her last and went to be with her Maker.

I suppose this is a strictly personal event, but as beautiful things deserve to be shared, I would like to provide a glimpse of what this woman was like, in a photo of her and me taken 18 months ago.

That picture shows Grandma Ev's authentic, habitual brightness of expression. It shone brightly even on days when the mortal machinery was breaking down. "Don't ever get old", she'd say then, with a gentle smile. In her life, hardship was met with integrity and grace, and deep-set habits of peacemaking and service were always at work building up what had been torn down. Her family and friends found her to be an earnest and perceptive encourager, ready to celebrate any of our well-doings.

As her systems slowed to a halt, she remained the woman she has been for 90-odd years, the woman who gave me some of my first lessons in kindness. Those at her bedside say that as breath for speech became scarce, she continued to bless her family with assurances of love, with smiles, kisses, and mouthed words of comfort.

Ev's signature motto, given as an answer to "how are you doing?" has long been a cheerful "I'm Happy on the Way", meaning specifically the way of Jesus. Grandma, your way has been a gift to many, and has brought you to earthly completion with the honor of family and friends. May such happiness be in our way also. Thanks for an example worth following.



JSR 292 support in javac

In order to work with dynamic types, method handles, and invokedynamic I have made some provisional changes to javac as part of the Da Vinci Machine Project. The mlvm wiki has a full description for Project COIN. It is most desirable, of course, to program invokedynamic call sites as Java expressions, not just ASM code, and that's what those langtools patches are for.

The essential features are four:

1. The type java.dyn.Dynamic will accept any method call and turn it into an invokedynamic instruction, and the full range of such instructions can be spelled from Java code.
2. The type java.dyn.MethodHandle will accept any argument and return types for a call to the method named invoke, which means that Java code can spell the full range of method handle invocations.
3. The full range of bytecode names acceptable to the JVM can be spelled from Java code, using an exotic identifier quoting syntax.
4. The type java.dyn.Dynamic serves as a bare reference type: Anything implicitly converts to it, and it can be cast to anything, but it is not a subtype of java.lang.Object. Its methods, of course, are those from point #1, so it is handy for forming invokedynamic calls.

The rationale is pretty simple: If we put some minimal support into Java for defining and using names in other languages, then Java can be used as a system programming language for implementing them. Otherwise, the system programming language will be assembled bytecode. (I like to visit ASM, but don't want to live there.) If that piques your interest (and if you read my blog, I suppose it might) do check out the wiki page.

Since I use NetBeans, I've also adapted the NetBeans Java frontend (so it won't keep putting red squigglies under code I know is correct). For those adventurous souls who are already willing and able to hack their NetBeans (and take all the pertinent risks!), here is a JAR to replace the corresponding one in the bowels of NetBeans 6.5. Obviously, be sure to place it only in a scratch copy of NetBeans that you can afford to burn. If you don't know what I'm talking about, you are lucky, and please don't hack your NetBeans!
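
As an illustration of the second feature above, here is a minimal sketch of a method handle call whose argument and return types are taken from the call site itself. It assumes the API in the form it later shipped, java.lang.invoke.MethodHandle with invokeExact and invoke, and not the provisional java.dyn types described in this post; the class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class InvokeSketch {
    // invokeExact: the call site below compiles to the descriptor
    // (String,String)String, which must match the handle's type exactly.
    static String viaInvokeExact() {
        try {
            MethodHandle concat = MethodHandles.lookup().findVirtual(
                    String.class, "concat",
                    MethodType.methodType(String.class, String.class));
            return (String) concat.invokeExact("invoke", "dynamic");
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // invoke: the call site type is (int,int)Object, which differs from the
    // handle's (int,int)int, so the return value is boxed on the way out.
    static int viaInvoke() {
        try {
            MethodHandle max = MethodHandles.lookup().findStatic(
                    Math.class, "max",
                    MethodType.methodType(int.class, int.class, int.class));
            Object boxed = max.invoke(3, 7);
            return (Integer) boxed;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaInvokeExact() + " " + viaInvoke());
    }
}
```

The point of the sketch is that javac does not check these calls against a fixed method signature; it records whatever types appear at the call site, which is the behavior the patch adds for the method named invoke.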



PyCon VM summit

I was at the PyCon VM summit; it was great! There were about 20 talks (in the 10-20 minute range). Since "Sun" comes late in the alphabet, I had the pleasure of watching my fellow summiteers go first. When everybody was good and tired, I gave a presentation on the Da Vinci Machine Project. Here are some notes on a few of the other talks...

Microsoft .NET DLR: Rolling along nicely; they have fixed a number of performance problems since last year, and have matured their dynamic language support, as showcased in IronPython. Call-site caching is the name of the game this year, on DLR, the JVM (that's what invokedynamic does), and the rush of new high-performance JavaScript implementations. DLR folks also enjoy working with .NET expression trees, very rich run-time type information, reified generics, iterators (generator-like things). They aren't fooling with continuations, at least this year. They have apparently sped up tail-call, which used to be much slower than regular calls.

There is a fork of CPython at Google called "Unladen Swallow" (I can't stop thinking of how Pythons eat; aghh). Its goal is to get 5x speedup in one year, by the application of google-osity to pythonic performance problems. BTW, like Ruby, new Python implementations are challenged by the need to either rewrite or cope with existing giant libraries written in C. This would be a huge problem for Java, as the JVM has all sorts of strong execution invariants that C code would break, were it not for the firewall of JNI.

JRuby (yes, Charlie and Tom were there) is able to punt some of its performance problems to the JVM's JIT, GC, and general multiprocessor support. Their challenge is reimplementing Ruby without a specification; this also is a characteristic problem of the new crop of languages. (I'm now even more grateful for the excellent early work on specs. in the Java ecosystem.) They are running a Ruby interpreter, which then JITs (after a short warmup) to JVM bytecodes, which (they hope) the JVM eventually JITs to machine code. Like Python (and Java) they need a foreign function interface which is fast and easy to use.

Meanwhile, Jython has very simple bytecode translation; hopefully both they and JRuby will make more use of invokedynamic by JavaOne.

PyPy continues to be a fascinating exercise in bootstrapping. Their toolchain generates, from a factored group of executable specifications, the whole Python interpreter and GC, which makes it easy (when it works) to fold in read/write barriers, security checks, trace JITs, and other exotica. They run fully stackless. (A number of people talked about stackless execution. Part of the attraction is easy support for generators and perhaps other coroutines; part of the attraction is an inverse repulsion from the C stack, which is hard to control.)

Rubinius is a reimplementation of Ruby in the style of Smalltalk 80, including dead-simple bytecodes and those cute backtracking primitives. (ST80 primitives can fail on arguments they don't want to handle, causing the VM to run the associated bytecoded method.) It also is stackless. It has an "Immix" GC (regional, kind of like OpenJDK's G1). As with a number of these systems, a JIT seems imminent... though JITs always take longer.

There were other cool VMs, including Spider Monkey (trace-based from Mozilla), Parrot (continuation-based stackless), MagLev, and Factor. After the presentations, we hung around the room all day and talked ourselves to exhaustion, then went and ate and talked more. It was a memorable day!



bloom filters in a nutshell

Bloom filters are a charming data structure. A predecessor idea dates from pre-electronic days, when decks of Hollerith punch cards would be queried in the 1940s by inserting a rod into one of M holes along the top of the deck. Only cards with notches at the selected position would then drop out of the deck for further processing. Repeating the procedure K times would select only cards with notches at all K positions. A false drop occurs when a card happens to have the required K notches, but not for the reason it is being sought.

Similarly, a Bloom filter (dating from 1970) is an array of M bits which is queried at K quasi-randomly selected positions p_k (k < K). If all of the bits are set, then the query returns positive, indicating that someone has already visited the array, setting the bits at all the positions p_k. If this happened by chance, perhaps because of several independent visits, we get an error called a false positive or (with a nod to tradition) a false drop.

The filters are quite simple, but the math is a little slippery until you get the right grip on it. Here’s the way I like to grab it, presented in case it helps anyone else.

First of all, the bottom line: Size your Bloom filter to contain NK bits (N keys times K probes per key), plus an overhead of 44%. Put another way, for an error rate of ε, allocate lg(1/ε)·lg(e) bits for each key you intend your filter to hold.

Loading a Bloom filter is much like spraying bits randomly into a bit-array target. If the bits land randomly, the key measure of what happens is the number of sprayed bits per bit of target. Call this density ρ (“rho”).

An easy analysis shows that any given bit in the array has dodged all the “bullets” with a probability of P_0 = exp(-ρ). (Each of the ρM bullets misses a given bit with probability 1-1/M, and (1-1/M)^(ρM) approaches exp(-ρ) for large M.)

Sanity tests on this formula work nicely: ρ=0 ⇒ P_0=1 (no bullets), and ρ=large ⇒ P_0=small.

Interesting cases arise when you are attempting to fill the array half full. (Why half full? That’s when the array is at maximum entropy, when the most information can be packed into it. As a programmer, I want each array bit to have close to a 50-50 probability of being 0 or 1. Otherwise, I think I’m wasting memory space.)

For our first try, let’s spray half as many bits as are in the array. But ρ=1/2 ⇒ P_0=0.61. That is, I only hit 39% of the target. There are too many zeros, because some target bits got hit more than once!

In order to hit exactly half of the array targets, we have to choose a larger ρ such that P_0=0.5. Solving gets us ρ_50 = ln(2) = 1/lg(e) = 0.69. Notice that we haven’t said anything about target size at all. It is the density that counts.

For Bloom filters, each bit is sprayed by one hash function for one key. (A hash function produces a quasi-random index less than M given the bits of a key. The key can be of any size. Typical hash functions select bunches of bits from the key and XOR them together, to produce each bit of hash.) So if you have K hash functions times N keys, and a bit-table of size M, then ρ=NK/M.

(The choice of K is proportional to the log of the error rate you are shooting for. If you ask K questions, the chance of the Bloom filter “guessing” all K answers correctly, and spoofing your key, is exponential in K. It is 1/2^K if the filter is tuned to a 50-50 distribution of bits.)

Repeating the failed experiment above, if you were to try for a half-full target, you would size it as M=2NK, and spray in your NK bits. But instead of a 50-50 distribution, you’d get a bunch of extra zeroes.

That in turn would drive down your error rate (which would be nice) but it would waste memory to store all those extra zeroes. The math (not given here) says that you could get the same error rate with a smaller M by increasing K. (Actually, in real applications, K must be an integer, so K+1 might overshoot the sweet spot.)

It turns out that the optimum size for M, if you are allowed to vary K but preserve the error rate, is the size which fills the target array with nice clean 50-50 bits. Our programmer’s intuition about entropy (mentioned above) is vindicated.

So, solving for a 50-50 bit distribution in terms of NK gives M = NK/ρ_50 = NK·lg(e) = NK·1.44.

There are many more tricks and variations on Bloom filters. One brief introduction is at cap-lore.com and there’s lots of stuff at Wikipedia. The nicest presentation of the math I have seen is Broder & Mitzenmacher. Enjoy!
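The sizing rule above can be turned into a few lines of Java. What follows is only a minimal sketch, not a tuned implementation: the class name BloomSketch, the choice to round K up to an integer, and the double-hashing scheme built on hashCode() are all assumptions made for illustration, not part of the analysis above.

```java
import java.util.BitSet;

// Hypothetical illustration of the sizing rule M = N*K*lg(e).
class BloomSketch {
    static final double LG_E = 1.0 / Math.log(2.0);   // lg(e), about 1.44

    // Probability that a given target bit dodges every bullet at density rho.
    static double missProbability(double rho) {
        return Math.exp(-rho);
    }

    // K = lg(1/epsilon); rounding up to an integer is an assumption of this sketch.
    static int hashCount(double epsilon) {
        return (int) Math.ceil(Math.log(1.0 / epsilon) * LG_E);
    }

    // M = N*K*lg(e): N*K sprayed bits plus the 44% overhead.
    static int tableBits(int n, int k) {
        return (int) Math.ceil((double) n * k * LG_E);
    }

    private final BitSet bits;
    private final int m;
    private final int k;

    BloomSketch(int expectedKeys, double epsilon) {
        this.k = hashCount(epsilon);
        this.m = tableBits(expectedKeys, k);
        this.bits = new BitSet(m);
    }

    // i-th probe position for a key. Deriving K positions from one hashCode()
    // by double hashing is an assumed shortcut, not a recommendation.
    private int position(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object key) {
        for (int i = 0; i < k; i++) {
            bits.set(position(key, i));
        }
    }

    // True if all K probed bits are set (possibly a false drop).
    boolean mightContain(Object key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Fraction of bits set; should hover near one half when the filter is full.
    double fillFraction() {
        return bits.cardinality() / (double) m;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1000, 0.01);
        for (int i = 0; i < 1000; i++) {
            filter.add("key" + i);
        }
        System.out.println("K = " + filter.k + ", M = " + filter.m);
        System.out.println("fill fraction = " + filter.fillFraction());
        System.out.println("miss probability at rho = 1/2: " + missProbability(0.5));
    }
}
```

Running main after loading the expected number of keys is a quick way to see whether the fill fraction lands near the 50-50 target; how close it gets depends on how random the assumed hash positions really are.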



simple Java linkage: an invokedynamic apéritif

As is abundantly documented elsewhere (and in this blog), the JVM performs all of its method calls with a suite of four bytecode instructions, all of which are statically typed in all arguments and return values. The object-oriented aspects of Java are served by the JVM’s ability to dynamically select a resolved method based not only on the static type of the call, but also on the dynamic type of the receiver (the first stacked argument).

Here is a quick summary of the invocation instructions: Both invokestatic and invokespecial resolve the target method based only on static types at the call site. They differ in that invokestatic is the only receiverless invocation instruction. Both invokevirtual and invokeinterface resolve the target method based also on the dynamic type of the receiver, which must be a subtype of its static type. They differ in the static type of the receiver; only invokeinterface accepts a receiver of an interface type.

JSR 292 is adding a fifth invocation instruction, invokedynamic. Like all the other instructions, it is statically typed. Like invokestatic, it is receiverless. What is new is that an invokedynamic instruction is dynamically linked (and even re-linked) under program control. There are many applications for such a thing, and in this blog I will be giving “recipes” to demonstrate some of them. For today, here is a light aperitif showing how invokedynamic could be used to simulate the other invocation instructions. This of course is relatively useless as-is, but it is an apt demonstration that invokedynamic can be used as a building block for more complicated call sites that include the standard JVM invocation behaviors as special cases. (Caution: This blog post is for people who enjoy their bytecodes full strength and without mixers.)

Here is a code fragment that creates a File and performs two calls on it, an invokevirtual call and an invokedynamic:

    java.io.File file = ...;
    String result1 = file.getName();
    String result2 = java.dyn.Dynamic.<String>getName(file);

The static type of both calls is exactly the same. Their symbolic descriptors differ, but only because the first (and only) argument is explicit in the second call, but implicitly determined by the symbolic method reference in the first call. Here is the disassembled bytecode:

    $ MH=$PROJECTS/MethodHandle/dist/MethodHandle.jar
    $ LT=$DAVINCI/sources/langtools
    $ cd $PROJECTS/InvokeDynamicDemo
    $ $LT/dist/bin/javac -target 7 -d build/classes -classpath $MH src/GetNameDemo.java
    $ $LT/dist/bin/javap -c -classpath build/classes GetNameDemo
    ...
     26: aload_1
     27: invokevirtual #6; //Method java/io/File.getName:()Ljava/lang/String;
     30: astore_2
    ...
     38: aload_1
     39: invokedynamic #9, 0; //NameAndType getName:(Ljava/io/File;)Ljava/lang/String;
     44: astore_3
    ...

Since invokedynamic is dynamically linked under program control, there is no guarantee in the code above that the second invocation does the same thing as the first. In order to provide the required semantics, an invokedynamic instruction requires a bootstrap method to help it link itself. In the present recipe, the required bootstrap method splits neatly into a link step and a continuation step, and looks like this:

    private static Object bootstrapDynamic(CallSite site, Object... args) {
        MethodHandle target = linkDynamic(site);
        site.setTarget(target);
        return MethodHandles.invoke_1(target, site, args);
    }

The link step, handled by the linkDynamic and setTarget statements, ensures that the call site is supplied with a non-null target method. The continuation step (the third statement) simply invokes the target method on the given arguments.

The middle statement (with setTarget) installs the target on the call site, so that if that particular invokedynamic instruction is ever executed a second time, the JVM itself will execute the target method on the stacked arguments, without a trip through the bootstrap method. This is why we say that invokedynamic is dynamically linked by the bootstrap method, not just dynamically interpreted. If the setTarget call were left out, the program would perform the same operations, but interpretively, with the linkage step performed every time.

The interesting part of this example is the linkage routine itself. It is handed a call site with a specific name and resolved type descriptor, and is expected to produce a target method to fully implement the call site. In this present example, that consists of deciding which virtual method to call, and asking the JVM for a handle on it:

    private static MethodHandle linkDynamic(CallSite site) {
        String name = site.name();
        MethodType type = site.type();  // static type of call
        Class<?> recvType = type.parameterType(0);
        MethodType dropRecvType = type.dropParameterType(0);
        MethodHandle target = MethodHandles.findVirtual(recvType, name, dropRecvType);
        if (target == null) {
            throw new InvokeDynamicBootstrapError("linkage failed: "+site);
        }
        return target;
    }

In this example, the value of name will be the method name supplied at the call site, "getName", and the value of type will be the method type resolved from the symbolic descriptor (Ljava/io/File;)Ljava/lang/String; (as found in the bytecodes). The first (and only) argument type is dropped and used as the class to search for the matching method.

This provides a faithful emulation of the invokevirtual bytecode. The other bytecodes could also be emulated by small variations. For example, since findVirtual works for interface types as well, if the invokedynamic call had stacked a first argument of an interface type, the end result would have been the same as the corresponding invokeinterface call. To emulate an invokespecial or invokestatic call, the call to findVirtual would change to findSpecial or findStatic, respectively.

Since one bootstrap method serves an entire class, with any number and variety of invokedynamic call sites, the method names at the call sites must in some way encode not only the target method name, but also other information relevant to specifying the target. In the case of invokestatic, since the intended target class is not represented anywhere within the descriptor (there is no stacked argument of that class), the target class must also be encoded. Here are some examples emulating additional types of calls, with the linkDynamic logic left as an exercise to the reader:

    String result3 = java.dyn.Dynamic.<String>toString((CharSequence)"foo"); // invokeinterface
    String result4 = java.dyn.Dynamic.<String>#"static:java\\|lang\\|Integer:toHexString"(123); // invokestatic

Using invokedynamic merely to emulate the other instructions does not have a direct use, unless the linkDynamic logic is varied in interesting ways. But that is the point: There is no end to such variations. Here are some of the degrees of freedom:

- use the call site name as a starting point but link to a method of a different name (e.g., "GET-NAME" links to "getName")
- use the call site type as a starting point but link to a method of a different type, via a method handle adapter (e.g., supply a default value for a missing argument)
- combine some or all of the actual arguments into an array or list, and pass them as a unit to the target method, via an adapter
- use the actual, dynamic types of any or all reference arguments to dispatch to a variable target method (e.g., implement multiple dispatch for generic arithmetic)

For the curious, I have uploaded a NetBeans project which presents the code fragments mentioned above. It will not run without an updated JVM built from patching in the Da Vinci Machine Project. To track that project, please join the mlvm-dev mailing list.

There is one final detail which commenters have asked about. The JVM needs to be informed where to find the bootstrap method for any given class containing an invokedynamic instruction. This is done by a static initializer in the class itself:

    static {
        MethodType bootstrapType = MethodType.make(Object.class, CallSite.class, Object[].class);
        MethodHandle bootstrapDynamic = MethodHandles.findStatic(GetNameDemo.class, "bootstrapDynamic", bootstrapType);
        Linkage.registerBootstrapMethod(GetNameDemo.class, bootstrapDynamic);
    }

Watch this space for more invokedynamic recipes. Next up, Duck Typée à la Invokedynamic.
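The fragments above use the draft java.dyn API and need a patched JVM. For readers who want something that runs on a stock JDK, here is a minimal sketch of the same link-then-invoke idea written against java.lang.invoke, the package in which JSR 292 shipped in Java 7. Treat it as an approximation under stated assumptions rather than a translation of the recipe: the class name LinkSketch and its helper methods are hypothetical, and a MutableCallSite driven by ordinary Java code stands in for a real invokedynamic instruction and its bootstrap method.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch; names here are illustrative, not from the original recipe.
class LinkSketch {
    // Analogue of linkDynamic: the first parameter type of the call site
    // is dropped and used as the class to search for the virtual method.
    static MethodHandle linkVirtual(String name, MethodType type)
            throws ReflectiveOperationException {
        Class<?> recvType = type.parameterType(0);
        MethodType dropRecvType = type.dropParameterTypes(0, 1);
        return MethodHandles.lookup().findVirtual(recvType, name, dropRecvType);
    }

    // Link step (find a target, install it with setTarget), followed by the
    // continuation step (invoke through the now-linked call site).
    static String callThroughSite(String name, Class<?> recvType, Object receiver) {
        try {
            MethodType type = MethodType.methodType(String.class, recvType);
            MutableCallSite site = new MutableCallSite(type);
            site.setTarget(linkVirtual(name, type));
            return (String) site.dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Emulating invokestatic: the target class must be named explicitly,
    // because no stacked argument carries it.
    static String toHexViaHandle(int value) {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    Integer.class, "toHexString",
                    MethodType.methodType(String.class, int.class));
            return (String) target.invoke(value);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        // invokevirtual-style call on a class type
        System.out.println(callThroughSite("getName", java.io.File.class,
                new java.io.File("demo.txt")));
        // invokeinterface-style call: findVirtual also accepts interface types
        System.out.println(callThroughSite("toString", CharSequence.class, "foo"));
        // invokestatic-style call
        System.out.println(toHexViaHandle(123));
    }
}
```

The shape is the same as in the recipe: the receiver type comes from the first parameter of the call site type, the handle is looked up once, and later calls go straight through the installed target.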



the date of Christmas comes more from Easter than the winter solstice

Christmas can be enjoyed as a much needed vacation day, a cheery cultural pageant, or a profound spiritual observation. For my part, I’ll take generous helpings of each. As it is a widely shared holiday, the first question is where to put it on the calendar. Thanks to Julius Caesar and his calendrical reforms, and to their enthusiastic adoption by the early Christian Church, we possess a clear date for Christmas.

Because it is based on the solar calendar, there are none of the lunar uncertainties associated with many other pre-modern holidays. The specific solar date we know as December 25 can be found proposed in writings from the early 200s. But why did the eventual consensus settle on that date? Accounts vary, and it is a curious mystery. Nobody claims that the date was written on a Bethlehem birth certificate. There is no such document, and if there had been, the date would have been expressed as a lunar date from the ancient Jewish calendar. I think our solar date is equal parts historic reconstruction, arbitrary convention, and high art.

The best web page I’ve seen on the origin of the December 25 date for Christmas is this one by David Bennett. Bennett recounts several plausible theories, and debunks some popular ones, notably that ancient Christians somehow shot themselves in the foot by co-opting one or more pagan holidays.

Bennett’s article describes two biblical lines of evidence (known of course to the ancients) that indicate that Mary became pregnant some time in March. This underlies the traditional celebration of the Annunciation on March 25. Add nine months’ gestation, and you get the traditional celebration of Jesus’ birth.

Apart from historical truth, I think it is fitting in these days to remember that the historic Church has reckoned the Incarnation as beginning at Christ’s conception, even though the most visible celebration of it is tied to his birth. The respect Christians have for the unborn Christ (which is biblical: see Elizabeth’s greeting to Mary) is no small part of the Church’s traditional aversion to abortion.

I personally incline toward William Tighe’s theory that the early Church assigned the Annunciation date to the conventional estimate of Jesus’ death date, March 25. (The ancients were trying unsuccessfully to refer to the historic Passover date; see Tighe’s Touchstone article.) This is all conventional, but when data is lacking, a celebration requires some such convention. It’s not a stretch given that real historical data places the Annunciation some time in March, and the Crucifixion some time in March or April.

Doubly identified with both conception and crucifixion, March 25 can prompt both deep sorrow and great joy. A few years ago when Easter fell near March 25, a Catholic Eastern Rite liturgy I went to recognized Jesus’ incarnation and death in one remembrance. It was powerful. My point here is that a proper understanding of Christmas in the calendar points not only back to the Annunciation but also forward to the Crucifixion. As some carols point out, the little Babe was born to die.

In any case, given the conventional meaning of March 25, December 25 is a simple corollary, and the proximity to the winter solstice is not man’s invention but a humanly unintended consequence. If so, the link of Christmas to the rebirth of the Sun, like that of Easter to the festival of Passover, is best understood as a creative flourish of divine poetry.

Christmas blessings,
John Rose



the view from the Summit

Last week some Sun engineers and I invited our colleagues over to Sun Santa Clara to talk about the future of VMs and languages. We called the meeting the JVM Language Summit. It was a blast; I’d like to tell you why, and what I learned.

Pizza with extra MOP

I’ll give the technical bits first, then some non-technical comments.

Here are my top-level takeaways:

- The invokedynamic design is sound, but the exposition needs more work.
- The synergy of JSR 292 with Attila Szegedi’s MOP looks very promising.
- Interface injection is going to be helpful to a lot of people, and it is not hard to implement (on top of method handles).
- Tailcall and value types will never go away. We have to plan for them.
- Unless we do this sort of innovation on the JVM, crucial multicore research will move elsewhere.
- We have to do this again next year.

The JVM change laundry list

We encouraged the speakers to talk about future directions, especially at the interface between VM and language runtime. We JVM people wanted to hear the “pain points” of the language people. There was a lot to hear!

If you have read the proposed JVM changes in my blog, you know I went primed to hear certain requests for JVM changes. Unsurprisingly, I heard them. Here is my take on what people said about JVM futures, both expected and unexpected...

- too many little classes: Iulian Dragos complained of a “class explosion” from closure creation in Scala, leading to slow startups. Paul Phillips notes that the Scala compiler creates dozens of tiny classes for every “real” source class. Charlie Nutter has spent lots of time working around the problem of adapter generation. Both Paul and Charlie asked for a better way to package such auxiliary classes, in some sort of multi-class file format. An especially annoying problem, faced when dynamic languages call Java APIs, is the choice between slow reflection and fast Java method invokers, where the latter require too many little invoker classes. Method handles should help with most of these concerns; Charlie has found they tend to make his workarounds go away. Similarly, Rob Nicholson expects to use JSR 292 features to simplify library and method calls in PHP. (I hope the JVM can also adopt a multi-class package format, some day soon.)

- new method linkage or dispatch: The purpose of invokedynamic is to build compact, optimizable call sites with programmer-defined semantics. There were a number of places where this might be useful. Fortress has to support dynamic multiple (multi-argument) dispatch. Chris Dutchyn, on the other hand, argued that it was a mistake in Java’s design not to perform overload resolution at runtime; he exhibited a modified JVM that rectifies that original design flaw. More conservatively, Attila Szegedi showed how his metaobject protocol (MOP) can supply dynamic overload resolution logic on top of the existing JVM; combining that with invokedynamic and Dutchyn’s algorithm might be the best way for dynamic languages to access Java APIs.

- tagged primitives (aka fixnums): A number of languages are noticing that their numeric (or character) operations cannot always be optimized down to primitives, and some point out that tagged references can help remove the allocations and indirections required by boxed numbers. Rich Hickey asked for fixnums for Clojure. Cliff Click studied integer loops in several languages, and found excessive boxing in Jython and Rhino (JavaScript); he reports that escape analysis might help, but notes that the complexity of Integer.valueOf caching impedes the optimizer. Real fixnums would be a robust fix to this problem, since they would make Integer.valueOf trivial. Unfortunately, they also complicate paths in nearly every module of the JVM.

- guaranteed tail call optimization: Tail calls on the JVM are an old wish, going back to the early days of Java (e.g., with Kawa Scheme). Doing this right is not easy and seemingly never urgent, so the JVM has not yet supported this. Requestors included Clojure, Scala, and Fortress. Along with continuations, tail calls are a key checkoff item in order for the functional community to become more interested in the JVM. (Immutable data structures appear to be a third key item; Rich Hickey is providing good leadership on that front.)

- interface injection: There was a lot of talk about interface injection; it seemed to be a significant new degree of freedom for many of the language implementors. David Chase mentioned it in his Fortress slides, and there was a breakout session to discuss it. I got excited enough to start cutting code for it...

- continuations: Fortress has challenges with work-stealing deadlocks that arise between callers and callees in the same thread. Introspective continuations may give some extra traction on such problems. Functional programming languages usually claim to want continuations, though they often make do with weakened substitutes. I think the best use case for large-scale continuations on the JVM is self-reorganizing threads; small-scale continuation-like structures might also be the right way to make coroutines (generators, fibers, etc.).

- interpreter vs. bytecodes: Most high-level languages (e.g., JRuby, Jython, Fortress) start out running on some sort of AST interpreter. As they mature, and as they aspire to better performance, implementors start to think about byte-compiling them. At this point, the flexibility and compactness of invokedynamic should be a big help. By the way, Scala cuts across this trend, by decompiling library bytecodes up into ICode (their IR), to assist with non-local optimization.

- byte compiler vs. JIT: JRuby does delayed byte-compilation, playing the same games as HotSpot, except on the virtual metal. They have noticed that this sometimes confuses the JVM JIT, which assumes that an application’s bytecode working set is stable. Jython would also benefit from a way to deoptimize from fast bytecodes back to a more flexible (but slower) prior representation, in which stack frames are fully reified. Doing this gracefully is an open problem.

- naked native methods: JRuby needs Posix calls; the Java APIs do not supply many of them (a “glaring” flaw). The right answer is probably not to wrap all the missing Posix calls in Java, but rather to build a lower-level native call facility into the JVM.

- strange arrays: Clojure has some wonderful functional-style collection data types which could be tuned even more if the JVM offered invariant arrays, arrays with program-defined header fields, or arrays of tuples. Fortress (see a pattern?) made a request for arrays of value types.

- a fast numeric tower: The Java world needs a flexible, performance-tuned numeric tower implementation. (Clojure mentioned this as a pain point; presumably the other Lisps could use it also.) Kawa, which is admirably factored software, has such a thing. Probably the right way to present it now is via a MOP and invokedynamic, which would allow maximum flexibility and optimizability. Brian Goetz points out that this might also need value types to get right, so a number can live in two registers (the fast version and the slow version, like Rhino does). Note that arithmetic is a poster child for optimized dynamic multiple dispatch. Cliff’s talk noted that the JIT code for Clojure (while generally good) suffered from overflow checks; this sort of thing has to play well with the JIT, which means the JVM manufacturers should pay attention to optimizing the library, when we build it.

- miscellaneous, persistent ideas: Fortress could also use floating point operations with the full panoply of rounding modes, and a way to profile subprograms (e.g., to pick the fastest). Clojure asked for a Rational type.

JVM earthquakes

Some ideas promised to stretch the fundamentals of the JVM in unpredictable ways.

Stresses on Java’s heap model include immutability, transactions, and reified types. Fortress wants transactions; it is hard to say how to mix this uniformly into the heap. They are also worried about the problem of “double tagging” objects with both a JVM-level erased type and their own types. (BTW, there were some favorable comments, from those who had worked with both JVM and CLR, about the simplicity of working with the JVM’s erased types, as opposed to the unerased type parameters of CLR, which tend to get in the way of dynamic language code generation.) Finally, Erik Meijer challenged us to design systems which abstract over (and thus isolate) all side effects, including seemingly innocent ones like cloning an object.

Threads are also stressed by the new languages. In Fortress, they do not always mix smoothly with workstealing parallelism or with coroutines required by complex comprehensions. Ideally, there should be a way to break a computation into “fibers” which can be created and completed in a few instructions. Parrot (a continuation-based VM) may be able to offer insight into this. Clojure offers a concept of agent which clearly maps to JVM threads, but may require something finer if it is to scale well.

Neal Gafter explained how, as the JVM supports new languages, the Java APIs risk being left behind on the wrong side of a semantic gap, between Java (circa 1997) and the consensus features of the new languages, all of which include some sort of lambda expressions (closures). They also often include richer dynamic dispatch, proper tail calls, exotic identifiers, and continuations. This could be a stressful change, if it requires significant retrofitting of Java APIs.

We can do this: A similar stressful change was the retrofitting of the standard Java libraries to generics, and I think that worked out smoothly enough.

We (in the Da Vinci Machine project incubator) are working on some of the technically more difficult JVM changes, including tail calls, continuations, and extended arrays. As noted above, fixnums also look technically difficult, because they touch just about every corner of the JVM, so (as far as I know) nobody is looking at them. Some changes that look difficult today may turn out to be practical, once somebody has applied enough thought and experimentation.

JSR 292

Naturally, I heard (and was responsible for) lots of talk about the JSR 292 effort, invokedynamic, and other Da Vinci Machine subprojects. One breakout session was titled “Invokedynamic: the details”. Here's a picture of our confabulation:

[Picture: The tale of invokedynamic]

From left to right are yours truly, Jochen Theodorou (Groovy), Rob Nicholson (Project Zero PHP), Tom Enebo and Charlie Nutter (both of JRuby), Fredrik Öhrström (JRockit), Rémi Forax (JSR 292 backport), and Attila Szegedi. And here are Rémi and Jochen writing up method call scenarios:

[Picture: Parallel call site examples]

One of the more useful hints I got from the meetup was about the pitfalls of expressing invokedynamic as just another sort of interface invocation. (We do this to avoid breaking static analysis tools, and because invokeinterface has a four-byte operand field, not because invokedynamic and invokeinterface have similar semantics.) The exposition of the JSR 292 Early Draft needs to be adjusted to make it clear that invokedynamic has no predefined receiver, that all stacked arguments are treated equally. Thanks, Fredrik.

It seems clear that, as the invokeinterface encoding is a workaround for static tools, when in the future we do a separately motivated verifier-breaking change (say, for tailcalls or tuples), it will be wise to revisit the encoding of invokedynamic. I suppose we will want to adopt the unused invoke bytecode for a clean, receiverless invokedynamic, and deprecate the old compatible encoding. But not until we have enough changes planned to warrant a verifier-breaking change. And even now, it seems right to express dynamic calls in pseudo-Java using static syntax:

    Dynamic.greet("hello", "world", 123);
    // => ldc "hello"; ldc "world"; ldc 123;
    //    invokeinterface Dynamic.greet(String,String,int)Object

Finally, it was really good to stand up and defend (and debug) the design in front of the people who will be using it. It got chewed on for a good long while...

[Picture: Whiteboard aftermath]

So, how did it all go?

Overall, people seemed very happy with the Summit. I know I was.

Our goal was to fill Sun’s largest classroom (90 seats) with engineers and researchers.
Result: There were about 75 registered attendees, rounded out by a number of local Sun engineers attending part time.

Since it needed to be a small meeting, we wanted a true summit, of lead designers and implementors working on JVM languages and related technologies.
Result: Wonderfully, our visitors were a who’s who in that world, including two out of three original Java Language Spec. authors, JVM team members from Sun (HotSpot, Maxine, Monty), IBM (J9), Oracle (JRockit), and Azul, senior CLR designers, and key engineers from an astonishing variety of JVM languages. We even saw a Parrot.

We wanted lots of conversation, for the designers to learn from each other and tune up their plans for the future.
Result: Except during the talks, the classroom (plus all available breakout spaces) was full of the buzz of intense technical conversations. Many people stayed late trying to finish the last chat, even after the food was gone and it was time to go home. On the comment cards, the highest score (4.9 out of 5) went to the category “quality of conversations you had”.

We wanted to be good hosts, and didn’t want logistics to get in the way of the conversation.
Result: For a conference organized solely by engineers, it was pretty good, and we think we can make it better next time. Actually, we had the good fortune of enlisting a project manager, who helped us keep our actions coherent. (Thanks, Penni!) The comment cards were both encouraging (4.4 out of 5, our lowest category) and helpful in detail.

We wanted to capture much of the discourse, so that more people could share in the talks than would fit in the Sun classroom.
Result: Most of the slides, as well as interesting extra information, are captured as informal proceedings in the Summit wiki. Have a look! (I did learn that a wiki is a reasonable way to assemble a conference proceeding, but a bad medium for interaction. Note that all the Talk pages are empty, and the OpenSpaces organization moved onto a physical bulletin board.)

Best of all, InfoQ.com has professionally recorded the prepared talks, and has promised to put all of them up on the web, some nicely edited with slides, and others posted as raw video. We will have to wait for this, though. (Ask them when!)

P.S. Thanks, Oleg Pliss, for the great pictures! (The not-great ones are frames from my video camera.)

[Picture: Laptop – 1, Rose & Goetz – 0]



Happy International Invokedynamic Day!

I have been working furiously this summer, patching the OpenJDK HotSpot JVM for the JSR 292 implementation of dynamic invocation.

In the wee hours of this morning, the JVM for the first time processed a full bootstrap cycle for invokedynamic instructions, linking the constant pool entries, creating the reified call site object, finding and calling the per-class bootstrap method, linking the reified call site to a method handle, and then calling the linked call site 999 more times through the method handle, at full speed. The method names mentioned by the caller and the callee were different, though the signatures were the same. The linkage was done by random, hand-written Java code inside the bootstrap method.

The Email thread of the announcement is truly international, since Guillaume Laforge celebrated by sending virtual champagne.

The example code is included in the Email, and also posted (as a truly rebarbative test in a NetBeans project) with the patches. As for the JVM code, it only works on x86/32; the next step is to move the assembler code into the right files, and finish the support for x86/64 and SPARC.

Happy International Invokedynamic Day!

(And by a curious anagrammatic permutation of letters, it could also be International Davinci-Monkey Day. My co-workers, who watched me pounding on my keyboard all summer, claim to see some significance in this.)



with Android and Dalvik at Google I/O

Invited by some friends at Google, I went to Google I/O this week to find out about Android, and specifically their Java story. I went to a few talks and had some excellent chats with various colleagues.The top ten things I learned about Android and the Dalvik VMAndroid is a slimmed down Linux/JVM stack. They rewrote libc to be 200Kb, redoing speed-vs.-space optimizations, and throwing out C++ exceptions and C-level wide char support. (As far as the JVM is concerned, this reminds me of recent work we have done to “kernelize” HotSpot on Windows.)A special strength of the platform is their attention to detail about reducing the cost of private pages. Many pages are read-only mapped from files, which means they can be dropped from RAM at a moment’s notice. Many other pages are shared with only rare use of copy-on-write (e.g., between JVMs). (This is similar to the work we have done on HotSpot with class data sharing.)The first and main reason they give for using Harmony instead of OpenJDK is the GNU license (GPL). Cell phone makers want to link proprietary value-add code directly into the system (into JVM-based apps. and/or service processes), and they do not want to worry about copyleft. Perhaps there is some education needed here about the class path exception. (I know I don’t understand it; maybe they don’t either. And, their license wonks appear to have a well-considered preference for Apache 2 over GPL+CPE.)The VM they have for Android 1.0 is very basic: A “malloc-like” heap, interpreter only. This means they still have many build-vs-buy decisions to make. I told them they should adopt Hotspot’s first-level JIT (C1), that we should work together on kernelization, and that it is time for a classfile format update anyway.Key reasons against using JVM bytecodes are interpreter complexity and dirty page footprint. 
The Dalvik bytecode design executes Java code with less power (fewer CPU and memory cycles) and with more compact linkage data structures (their constant pool replacement looks like that of Pack200, and reminds me of some recent experiments with adapting the JVM to load Pack archives directly).

The VM uses “dex” files like Java cards use their own internal instruction sets. The tool chain does use class files, but there is a sizeable (70K LOC) tool called “dx” that cooks JARs into DEX assemblies. The dex format is loaded into the phone, which then verifies and quickens the bytecodes and performs additional local optimizations.

Something like the dx tool can be forced into the phone, so that Java code could in principle continue to generate bytecodes, yet have them be translated into a VM-runnable form. But, at present, Java code cannot be generated on the fly. This means Dalvik cannot run dynamic languages (JRuby, Jython, Groovy). Yet. (Perhaps the dex format needs a detuned variant which can be easily generated from bytecodes.)

The “dx” tool turns classfiles into SSA and then (after reorganization) to dex files. However, optimizations are missing. Loop invariants must still be pulled up manually by programmers. The dex format is not known to be easily JIT-able; however, the designers have given some thought to making it so. (Probably the dex format needs some work in this direction. Let’s do that, and standardize on it!)

The dex format has the usual merged, typed constant pool with 32-bit clean indexes. (Cf. Pack200.) This work is likely to stimulate the Java world to update the classfile format standard in that direction. Hopefully we can do this in a way that benefits much of the ecosystem.

People are thankful to Sun for past stewardship of Java, but are not seeing much guidance from Sun toward the future. At least, that is the story I heard. 
Whether that story is mere perception or an actual leadership vacuum, our actions with OpenJDK need to reverse it.

Bonus: The view from Yahoo

I also met Sam Pullara, and had a good long chat with him about (among other things) what Yahoo would like to see JVMs do better. Here are my notes:

about NIO
Customers want simplified non-stateful buffers; buffer.flip is inscrutable.
Customers want poll not select. That is, when the data is ready, give it to the listener without an additional fetch, or else when the listener wakes up the data is likely to be gone again (swapped out).

about runaway JVMs
Need a way to watch for runaway memory and/or CPU usage.
Customers want to kill a whole VM that goes off the rails. (Perhaps applets also.)
I noted that TLAB fill events are the right place to cut in a per-thread allocation monitor.
JMX is really helpful here. It can report memory threshold events.
So let’s add thread-specific ones, and CPU threshold events.

JVMs as sandboxes for bad old code
Customers are eager to use Java VMs as containers for sandboxing old C libraries. These libraries are often non-reentrant and may age badly (crash, run out of memory) after heavy use. But server systems need to run multiple instances of them, in order to scale across HW resources. It is not enough just to load one into your JVM and wrap a thread around it.
Customers are sometimes desperate enough to use the “nested VM” hack (JVM interprets MIPS code generated by gcc). Surely there is an opportunity here!
So why (I am asked) is there no action on isolates? They look to the customer like a no-brainer (like Android’s zygote). In any case, it is clear to me we need better plumbing and monitoring for isolating such old code. 
A better story could look like:
Keep pre-warmed JVM ready to form new isolates.
When we want to start up another instance of a C library, we fork a new isolate.
On the client VM side, we have some nice light RPC binding.
On the service VM side, we have swig-generated tight (unsafe) binding to the C library.
The service instance is monitored and can die or be killed if it misbehaves.

Crunchy search goodness

Sam showed me a cool Yahoo search plugin for sifting between versions of Java APIs. Here is an example query for hashmap, which probably does not work on your browser until the plugin is installed.

(Thanks, Sam and Dan, for sending some corrections. Any remaining errors are still my fault.)
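For readers who have not used the JMX memory threshold events mentioned in the notes on runaway JVMs, here is a minimal sketch of how they can be armed with the standard java.lang.management API. The class name RunawayWatch and its method names are made up for illustration, and the 90% threshold is an arbitrary choice. The per-thread CPU reading at the end is only a polling counter; it is not the CPU threshold event the notes wish for.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical helper, not part of any library.
final class RunawayWatch {
    /** Arms a usage threshold on each heap pool that supports one; returns how many were armed. */
    static int armHeapThresholds(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();      // may be null for an invalid pool
            if (usage == null || usage.getMax() <= 0) {
                continue;                             // max is -1 when undefined
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed++;
        }
        return armed;
    }

    /** Registers a listener that reports threshold crossings on stderr. */
    static void listen() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                System.err.println("memory threshold exceeded: " + notification.getMessage());
            }
        };
        // The platform MemoryMXBean is documented to be a NotificationEmitter.
        ((NotificationEmitter) memory).addNotificationListener(listener, null, null);
    }

    /** CPU time used by the calling thread in nanoseconds, or -1 if unsupported. */
    static long currentThreadCpuNanos() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.isCurrentThreadCpuTimeSupported() ? threads.getCurrentThreadCpuTime() : -1L;
    }

    public static void main(String[] args) {
        listen();
        System.out.println("heap pools armed: " + armHeapThresholds(0.9));
        System.out.println("cpu nanos so far: " + currentThreadCpuNanos());
    }
}
```

A monitor thread that polls the CPU counter and compares deltas is one way to approximate the missing CPU threshold events, at the cost of the polling itself.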



the golden spike

Part 1: the road to Babel In the Java cosmos we can reckon time in terms of JavaOne conferences. For programming languages on the JVM, the just-finished epoch has seen much progress, and the next epoch looks even better. Here is some of the progress that I am excited about, after bouncing around at JavaOne:Started just in the last year, the JVM Languages Google group has carried dozens of in-depth conversations between language implementors. (Thanks, Charlie!) It has been a great place to air new ideas and sort out complicated design questions.After about 20 months of hard work, JRuby (JVM implementation of Ruby) is reporting competitive performance with the original C-based Ruby, strengthening hopes that, (in Tim Bray’s words as posted by Frank Sommers), “If we arrange for JRuby to be compiled into Java bytecodes, it’ll be running on the JVM, which is one of the world’s most heavily-optimized pieces of software. So JRuby might end up having a general performance advantage.”In the JavaOne talks, the Jython and Groovy projects are reporting similar efforts to slim down their runtime overheads. They lag behind JRuby in this, but they are also making use of JRuby’s hard-won experience.This week, I enjoyed in-depth conversations with key developers of all three languages. I found they are enthusiastically determined to remove layers of overhead between their languages and the JVM they run on. With JRuby in the lead, they are stripping excess dispatching logic, and getting close to the point where the remaining obstacles to speed are a problem not with their system, but with the JVM. I think we all hope that, when the dust clears in a year or two, nearly Java-like performance will be a normal experience for optimized dynamic languages.Language developers are working towards common runtime support libraries on the JVM, to share coding effort, and (more importantly) to embody best practices for structuring the lower levels of the language runtime. 
In his CommunityOne talk “The Road to Babel”, Charlie Nutter eloquently raised this flag, and offered to contribute code from JRuby. And Attila Szegedi has been working steadily on metaobject infrastructure for JVM languages.

At JavaOne I saw various impressive demos of workstation-like Java stuff running on cell phones. It is clear that mobile device software is in a confused scrum including iPhone, JavaFX, Java ME, and Android, and also is, this year, in a race to the top. The goal of the race is still undiscovered, but doubtless will include a mix of scripting, dynamic languages, and virtual-machine based runtimes. I expect Java will be part of the winning mix.

On the pavilion floor I ran into David Chase and Christine Flood of the Fortress Project. They are mostly done with language design and interpreter implementation, and are thinking about compilation to the JVM. Disregarding the desperately cool stuff about parallelism, modern types, DSLs, and mathematical notation, it seemed like any other fledgling interpreted language ready to mature into a compiled one. David, Christine, Bernd Mathiske and I had a great talk about compiling overloading, traits, continuations, and other tricky bits to the JVM, including possible JVMs of the near future. The JVM will be there to help Fortress lead the multicore revolution.

Interlude: Railroad history

In other news, today (May 10th) is National Train Day. On this day 139 years ago, a ceremonial golden spike was driven at Promontory Summit, Utah, joining the Central Pacific and Union Pacific railways into a single transcontinental line. The continent was very suddenly smaller, because people, goods, and mail could be moved more quickly from coast to coast, a task which was previously done with wagons, horses, and boats. 
In the years leading up to this watershed event, two railroads were built, with great difficulty and ingenuity, from each coast, to meet at Promontory Summit.There is a Promontory Summit and a golden spike in our future also: I mean, in the future of us JVM and language geeks. I have been describing efforts of the wild and wooly dynamic language types to forge their way eastward, from the sunny shores of La-La Language Land across the forbidding peaks of metaobject-based optimizations.Meanwhile, we JVM developers have been working our way west from the drab industrial zones of profile-directed optimization, loop transformation, code generation, cycle counting, and the SpecJBB benchmarks. We have always held the early Smalltalk and Lisp implementors in reverence; some of our technology even ran Smalltalk before it ran Java. How pleasant it would be to prove that the JVM has what it takes to run not only Java but also Smalltalk and its many hyperactive little brothers.Language developers, working to bring their systems down to the virtual metal of the Java virtual machine, will find their last mile blocked by JVM features they cannot work around. But as they reach that point, they will find the JVM engineers have been working to make the JVM open to the passage of new codes. We JVM geeks are reshaping the virtual metal of the JVM to remove restrictions peculiar to Java, to make it accept the new shapes soon to be produced by the dynamic language compilers. Part 2: making hard things as easy as the easy things JVM optimization tricks (here are some of HotSpot’s) are of course occasioned by problems in Java programs. 
But the tricks are mostly the same (though updated) as those which streamline the machine code for Smalltalk and other dynamic languages.

For example, in my own work on HotSpot, I refactored parts of the system to unify the compilation of corresponding reflective and non-reflective operations, so that the same code generation paths handle both cases, for operations like object creation, type testing, and bulk copying. Reflective operations fold down to the same code as their regular counterparts if the class parameters are constant, and the reflective overheads are still small when the classes are non-constant. This is sometimes useful for Java. More to the point, it is always useful for dynamic languages.

In the JVM, the crucial action is usually around call sites, and here the road is almost blocked to dynamic languages. The syntax of JVM call sites (in the bytecodes) favors Java and disfavors any call which does not follow the rules of Java’s static typing system. To get past this point, dynamic language implementors are forced to load their belongings into the overland wagon train, fitting all their objects into a limited set of implementation classes (language-specific wrappers) whose Java types express all the operations needed by the languages. Or, they choose the reflective water route, taking a slow boat through the general Core Reflection API offered by the JVM. (Some build their own speedboats, crafting their own somewhat faster reflection mechanisms, but none approach the speed of normal method calls, which are the JVM’s natural element.) At this point, some travellers may give up the journey toward native code, and go back home to tree-walking interpreters. (Yo, those are palm trees, dude.)

But the golden spike will be driven when the language developers have done all they can, and the JVM has removed its restrictions against calls outside the Java type system. 
More generally, the express trains from La La Land to industrial-strength machine code will start running when JVM languages are compiling to the virtual metal, and the virtual metal no longer has unnecessary restrictions inherited from the JVM’s single-language days. Part 3: the JVM breaks out This leads me to a second list of observations which I am happy to make this week, about the reshaping of the JVM in the last year.As Charlie Nutter reports, the current state of the JVM art is delivering great benefits to languages beyond Java. There is no better starting point than happy customers who will be even happier with a better product.The OpenJDK open-sourcing of the HotSpot JVM is finally, really open for business. I know we announced it last JavaOne, but now the public repository is the live, primary copy of the code. If a Sun engineer like me fixes a bug, you will see the fix within minutes of integration, in a public repository like the one my workgroup uses. Now anybody can contribute to the HotSpot JVM. Crucially, this means no more waiting for Sun (or IBM or BEA) business priorities to line up with your pet project: You can change JVM history by adding to the JVM. Vote with Code!Encouraged by co-workers, and by a sense that the time had come at last, I opened the Da Vinci Machine Project, an incubator for new language support in the JVM, including both crazy and sober ideas. (I will let you decide which is which.) It is built on, and part of, the OpenJDK project. I posted the first patch (for anonymous classes) in January, and saw people building it almost immediately. I especially liked the early adopter comment, “New toys!” Actually the JVM is full of great toys; we are now unlocking the toybox to kids on the non-Java playgrounds.Several non-Sun developers (thanks, Remi, Lukas, Arnold, Selvan) have already contributed code or design ideas to the project. 
One contributor is about to commit fully working runtime support for continuations.

Charlie Nutter, pioneer that he is, has begun the experiment of integrating the anonymous classes facility into JRuby.

The ever-bubbling blogosphere has blessed the project with buzz, much of it curious and friendly. There have also been three conference presentations, plus numerous favorable mentions at JavaOne. (No, I did not present at this JavaOne. Next year.)

The JSR 292 Expert Group met at JavaOne, with face-to-face representation of three competing JVMs (HotSpot, J9, JRockit) and two languages (Groovy, Jython). The EG has been hashing out details of the invokedynamic instruction, and has released (to the JCP) its first Early Draft Review.

In the last year I have shared (in this blog) many long-held aspirations for the JVM, concerning tail calls, tuples, anonymous classes, exotic names, fixnums, method handles, continuations, interface injection, and dynamic invocation. None of these ideas is new, and my personal involvement with them goes way back to my pre-Java days (Common Lisp, and the cute little Scheme VM I left behind). These ideas are old and new again; their time is coming in the mainstream world of the JVM. In the next year or two, I hope to see each of them at least lightly incubated, if not cooked to perfection and served on a JSR.

I think we are on the right track here, letting the JVM grow independently of the Java language. (The language, if it has room to grow, will catch up.) James Gosling expressed a similar sentiment at his February Java Users Group talk “The Feel of Java, Revisited”, when he said he sometimes felt more interested in the future of the JVM than that of the Java language. “I don’t really care about the Java language. All the magic is in the JVM specification.” (Yes, I think that is hyperbolic. No, he is not abandoning Java.) 
I love both Java and the JVM, and I am pushing on the JVM this year.It has been a great year to be a JVM languages geek, and the coming year looks even better. With an incubator repository open, a draft standard in flight, and language implementors eager to collaborate, we are ready to cut some serious JVM code. Watch this space!



dynamic invocation in the VM

Or, who will message the messengers?

Introduction

For several years now, JSR 292 has promised an invokedynamic instruction in one form or another. The problem has been with picking the one form that simultaneously enables a good range of use cases, addresses several architectural challenges in the JVM, and can be optimized by a variety of commercial JVMs. It has been a restless search for “one bytecode to rule them all”.

The EG has decided to propose an answer, in the form of an Early Draft Review, which is (to me at least) surprisingly simple to specify and implement. It does not even introduce a change to the bytecode format or verifier, yet it provides a hook which refers all important decisions at a dynamic call site out of the JVM and into Java code. This note builds on a previous blog entry, giving more concrete details and use cases.

The previous blog entry promised an EDR “in a few weeks”; it will have been twenty-one when the JCP releases the EDR next week. The internal reason for the delay was our sense that early versions of the design tried to do too much in one monolithic bytecode. The current design, serving the same set of use cases and requirements, is refactored by heavy use of method handles, which greatly reduces complexity and clarifies the various roles of language implementors and the JVM.

Requirements

Why add another invoke bytecode? The answer is that call sites (instances of invoke bytecodes) are useful, and yet the existing formulas for invocation are tied so closely to the Java language that the natural capabilities of the JVM are not fully available to languages that would benefit from them. 
The key restrictions are:
the receiver type must conform to the resolved type of the call site
there is no generic way to create adapters around call targets (a corollary of the previous point)
the call site must link, which means the resolved method always pre-exists
the symbolic call name is the name of an actual method (a corollary of the previous point)
argument matching is exact with no implicit coercions (another corollary)
linkage decisions cannot be reversed (although optimization decisions change, invisibly)

Dynamic language implementors expend much time and effort working around these limitations, simulating generic calls in terms of JVM invoke bytecodes constrained by the Java language. Although individual language requirements differ in detail, the following generic requirements seem (to me) to be representative:
the receiver can be any object, not just one created by a specific language runtime
call site linkage is under the complete control of the runtime
call site linkage can change over time
type checking of receivers (and all other arguments) is under runtime control
there are generic ways for the runtime to build wrappers, ways that work for all relevant descriptors
despite all this, a call site makes a direct call to its target method

There is another set of requirements, too, which is about the practical problems of introducing new bytecode behavior into a large, mature Java ecosystem, one with many independent vendors and creators of JVMs, tools, and libraries. 
The new facilities must be as backward compatible as possible, in all dimensions:
the new bytecode behaviors must have a simple and precise specification
tools that manipulate bytecodes must be minimally disrupted
the new behaviors must be reasonably simple to implement, across the range of JVM implementation styles
existing JVMs must be able to readily process them, with good code optimization

Solution: The linkage state is a method handle

Our solution to these requirements is in three steps. First, we factor out method handles as a simple and generic way of managing methods (arbitrary JVM methods) as units of behavior, which are (as methods should be) directly callable. Second, we define an invokedynamic instruction with one machine word of linkage state, a handle to the call site’s target method. Third, we define a set of core Java APIs for managing linkage state and creating the target method handles for call sites, taking care that these APIs can present the right optimization opportunities to JVMs that wish to exploit them.

Step one: method handles

The first step, of creating method handles, is described in my previous post on the subject. The method handle lifecycle requires new APIs for creating, adapting, and calling them. 
The key features of method handles are:
a call to a method handle directly calls the wrapped method
method handle invocation can potentially include the receiver-based dispatch behavior of invokeinterface and invokevirtual
method handle invocation can potentially include access privileges enjoyed by the creating class (including invokespecial)
method handles can invoke both static and non-static methods
there is an API to bind a method handle to a receiver, creating a bound method handle
there is an API to adapt a method handle to include changes in argument and return type and value
an invokeinterface bytecode on a method handle receiver has extended linkage behavior, allowing any descriptor (call signature) to be paired with the symbolic name invoke
when a method handle’s invoke method is called, the resolved descriptor exactly matches the receiver’s method handle type

(Note: In this account, the term resolved descriptor means the information in the symbolic descriptor, with all class names resolved with respect to the class enclosing the call site. I might have included that term in the original account of call site anatomy!)

This is the most complex part of the invokedynamic design, but it is also the most boring part, because every functional language includes the same sort of function types and their associated operations. The least boring part is where method handles touch (and therefore provide direct handles to) pre-existing JVM capabilities, notably interface and virtual invocation.

Once we have a firm foundation for working generically with methods as directly callable units of behavior, the remaining steps are relatively easy.

Step two: dynamic linking

The invokedynamic instruction is almost identical to any invokeinterface instruction, in that it has a symbolic method name and descriptor, and can process any type of non-null receiver. (Note that the JVM verifier does not make a static requirement on the receiver of an interface call.) 
However, an invokedynamic instruction does not specify a particular receiver in which the method name and descriptor are linked.

Indeed, an invokedynamic instruction specifies no particular interface type against which to link. Syntactically, it is identical to an invokeinterface instruction, except that its symbolic interface is the dummy type java.dyn.Dynamic. (This interface is defined in the JVM, but it has no methods, and nobody ever needs to implement it.) The verifier passes such instructions without objection, in all past and future JVMs. The same is true for any other tool (such as Pack200) that is less restrictive than the verifier about the bytecodes that it will process.

Each instance of an invokedynamic instruction is associated with a hidden variable, called the target method, which encodes that call site’s linkage state. This variable is managed by the JVM but not visible as a named variable in any Java class. Its type is a method handle, but it starts out null, which means the site is not yet linked. When the site is linked, any call to that site is equivalent to a call to the target method handle.

Thus, an invokedynamic works very much like a normal method handle invocation site, with the same generality of calling sequences (descriptor polymorphism). The argument and return type matching rules are the same: The resolved descriptor of the call site must exactly match the target method’s type. 
But there are these differences from explicit method handle invocation:
the called method (the explicit receiver with non-dynamic invocation) comes from the linkage state word
the symbolic method name can be any constant string whatsoever
there is an unlinked state which causes special processing of the call site
the JVM may have more opportunity to constant fold and/or inline the linkage state, if it appears to be stable

Step three: target method management

Beyond the basic APIs for managing method handles, there are specific APIs for managing linkage state of call sites. The most important API pertains to bootstrap methods, which I have previously discussed. Other APIs have to do with the specification of bootstrap methods, bulk invalidation of multiple dynamic call sites, and optimizable combinators which produce useful target method handles.

There is not room in this blog for full treatment of these APIs; I will merely sketch them. More details are given in the JSR 292 EDR, which is coming out next week. (The ‘E’ in ‘EDR’ stands for early. It is still early.) Full details will evolve over time, and will be easily observable as javadoc comments on the reference implementation (RI) in the Da Vinci Machine repository forest, and will of course be duly separated out from the RI into the final JSR 292 specification.

To continue... Each class containing invokedynamic instructions must also specify a bootstrap method. This specification happens either via a classfile attribute, or else an explicit registration call.

The bootstrap method is itself a method handle, whose signature is universal in about the same way that reflective invocation is universal: The arguments are boxed into an object array, and primitive return values are unboxed. (Unlike reflective invocation, any thrown exceptions are left unwrapped.) 
In effect, the JVM allocates one additional machine word per class to hold the method handle of the bootstrap method for that class.

When an unlinked dynamic call site is executed, the bootstrap method is called. It receives the following information:
the outgoing arguments in an array, with all primitives boxed
static information, of type java.dyn.StaticContext, about the context of the call: caller class, method name, resolved descriptor, etc.
a capability object, of type java.dyn.CallSite, which supports getting and setting of the call site’s target method (linkage state)

First of all, the bootstrap method is responsible for fulfilling the current call. It may ignore the CallSite value, or it may store it in a table somewhere, or it may immediately compute an appropriate target method (as a method handle) and use the CallSite.setTarget method to link the call site.

The extra information (beyond the arguments themselves) reifies (or makes real to Java) the call site itself. Most of this reification is merely informative (introspective), but the CallSite, crucially, lets the language runtime change the reified call site’s target method. The bootstrap method can be viewed as a call for help from the call site itself, a messenger temporarily incapable of delivering a message, sending a meta-message to the language runtime. The answer to the call for help is the setTarget, a meta-message back to the messenger. When linkage is complete, the messenger disappears again, and the call site may again be viewed as a simple message, which calls the target method.

By the way, passing a null to setTarget will immediately unlink the corresponding call site, restoring it to its original linkage state. Passing it another method handle will immediately change the linkage of the call site. 
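To make the setTarget idea concrete, here is a small sketch of relinking a call site. It does not use the draft java.dyn API described in this post; it uses java.lang.invoke, the package in which this design eventually shipped with Java 7, so details differ from the EDR. In particular, the shipped MutableCallSite rejects a null target, so there is no null "unlinked" state to restore. The class RelinkDemo and its methods are hypothetical names for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class; only the java.lang.invoke calls are real API.
final class RelinkDemo {
    static String hello(String name) { return "hello, " + name; }
    static String goodbye(String name) { return "goodbye, " + name; }

    /** Looks up one of the static methods above as a handle of type (String)String. */
    static MethodHandle find(String methodName) throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                RelinkDemo.class, methodName,
                MethodType.methodType(String.class, String.class));
    }

    /** Calls through one call site twice, changing its target in between. */
    static String[] run() throws Throwable {
        MutableCallSite site = new MutableCallSite(find("hello"));
        // The dynamic invoker always calls whatever target the site currently has.
        MethodHandle invoker = site.dynamicInvoker();
        String first = (String) invoker.invokeExact("world");
        site.setTarget(find("goodbye"));   // relink: same site, new behavior
        String second = (String) invoker.invokeExact("world");
        return new String[] { first, second };
    }

    public static void main(String[] args) throws Throwable {
        for (String line : run()) {
            System.out.println(line);      // prints "hello, world" then "goodbye, world"
        }
    }
}
```

The new target must have exactly the call site's type, which mirrors the exact descriptor matching rule described earlier in this post.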
Interactions with the memory model have to be nailed down, but the setTarget call will probably have a similar effect (regarding memory order) to setting either a plain static variable, or a volatile one.

Performance considerations

All this will be disappointingly slow if the JIT is not able to process dynamic call sites with optimizations similar to those it applies to other call sites. Because the linkage state is a target method, it can be directly invoked, the same as with a normal method handle call.

Because the linkage state is a single reference, it is relatively simple for a non-optimizing JIT to compile a dynamic call site with a pluggable pointer, not too different from the old monomorphic call pattern. The pointer handles the current target method, or perhaps jumps through a signature adapter to the bootstrap method. The bootstrap adapter can be generated by brute force in the JVM, or (more likely) by an up-call to let Java code do the heavy lifting.

The linkage state is not an exposed static variable, as in the case of pure Java simulations of dynamic calls. Having it be hidden means that the JIT can conspire with the JVM to do a deep analysis on the structure of the target method. Getting to the bottom of this analysis requires one more thing: The target method handle, if it is more complex than a simple direct method reference, must still be simple enough for the JIT to analyze (as a compile-time constant object) and “see through to the bottom”.

In particular, if the target method is an adapter which performs type tests and dispatches to one of several ultimate targets, that dispatch logic must be fully foldable. Who is responsible that this happens? First, the JVM implementor must provide the APIs which create such cascading, adapting method handles, and make them have a format which is transparent to the JIT. 
(This is a JVM-specific task.)

Second, the dynamic language implementor should use those APIs which the JIT is able to optimize, rather than using custom versions unknown to the JVM implementor. (There is a style of higher-order function called a combinator which I think will be useful, but this blog entry is already long enough.) Nevertheless, sometimes the Java coders lead while the JIT follows as best it can. We will sort this out during the EDR period, as we prototype.

Applications

Here are a few of the applications of this design:
simple one-time linkage (bootstrap method finds a handle, installs it as final target, and calls it in tail position)
call to Java method (bootstrap method emulates JVM linkage, does one-time linkage)
call to pseudo-Java extension method (bootstrap method finds extension method, does one-time linkage)
monomorphic inline cache (bootstrap method installs an optimistic target, with a type test and fall back to bootstrap again)
polymorphic inline cache (like previous, but bootstrap method re-balances a decision tree each time a new type is seen)
call site warmup (bootstrap method increments a count and gathers type info. until call site is mature for optimization, then installs a well-crafted target method)
traits (like polymorphic inline cache, but method lookup does pattern matching and method instantiation)
pattern matcher (decision tree within target method has patched-in recognizers, can grow over time as new cases appear)
PyPy-style block-oriented JIT (each branch point is a patchable invoke site, adds more compiled blocks as new types are seen; requires tailcall also)

All of these applications (patterns, really) can be coded up today as pure Java patterns. In fact, if the invokedynamic feature is to be successful, the same APIs I described above as belonging to upgraded JVMs will also have to be implemented, backward compatibly and more expensively, on existing JVM releases. 
The fact that this is possible does not remove the need to extend the JVM architecture.

Such backward-compatible code patterns are really simulations, not direct encodings, because the unit of behavior cannot be represented today as a simple method, named via a method handle. In the backward compatible code, a unit of behavior will be represented by a method, wrapped in a class, named by a throwaway name, loaded in a throwaway classloader (if GC is required), installed in the JVM’s internal dictionary, implementing a signature-specific interface, and finally stored as a target method in a static variable (or worse, in a table of some sort). Everything I just described beyond the method handle is the overhead required by a simulation of method handles in existing systems. By integrating method handles into the JVM, then building invokedynamic on method handles, we get a new, more direct way of accessing the power of modern JVMs for dynamic code.
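As an illustration of the monomorphic inline cache pattern listed among the applications, here is a sketch written against java.lang.invoke, the API in which this design later shipped (Java 7), rather than the draft java.dyn API described in this post. The slow path plays the role of the bootstrap method: it picks a target for the receiver class it sees, guards it with a type test, and falls back to itself on a miss. Everything named InlineCacheDemo, slowPath, isExactly, describeInteger, and describeOther is hypothetical; the combinator guardWithTest is the real library method.

```java
import static java.lang.invoke.MethodType.methodType;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo of a monomorphic inline cache; assumes non-null receivers.
final class InlineCacheDemo {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodHandle SLOW;
    private static final MethodHandle IS_EXACTLY;
    static {
        try {
            SLOW = LOOKUP.findVirtual(InlineCacheDemo.class, "slowPath",
                    methodType(String.class, Object.class));
            IS_EXACTLY = LOOKUP.findStatic(InlineCacheDemo.class, "isExactly",
                    methodType(boolean.class, Class.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final MutableCallSite site =
            new MutableCallSite(methodType(String.class, Object.class));
    private final MethodHandle invoker = site.dynamicInvoker();
    private int misses;   // how many times the slow path ran

    InlineCacheDemo() {
        site.setTarget(SLOW.bindTo(this));   // start "unlinked": every call misses
    }

    // The two "language-level" behaviors the cache can link to.
    static String describeInteger(Integer i) { return "int " + i; }
    static String describeOther(Object o) { return "other " + o; }

    static boolean isExactly(Class<?> expected, Object receiver) {
        return receiver != null && receiver.getClass() == expected;
    }

    /** Miss handler: choose a target for this receiver class, then relink the site. */
    String slowPath(Object receiver) throws Throwable {
        misses++;
        Class<?> seen = receiver.getClass();
        boolean isInt = (seen == Integer.class);
        MethodHandle target = LOOKUP.findStatic(InlineCacheDemo.class,
                isInt ? "describeInteger" : "describeOther",
                methodType(String.class, isInt ? Integer.class : Object.class))
                .asType(site.type());        // adapt to (Object)String
        // if (receiver is exactly 'seen') target(receiver) else slowPath(receiver)
        site.setTarget(MethodHandles.guardWithTest(
                IS_EXACTLY.bindTo(seen), target, SLOW.bindTo(this)));
        return (String) target.invokeExact(receiver);
    }

    String call(Object receiver) throws Throwable {
        return (String) invoker.invokeExact(receiver);
    }

    int misses() { return misses; }

    public static void main(String[] args) throws Throwable {
        InlineCacheDemo cache = new InlineCacheDemo();
        System.out.println(cache.call(1));     // miss: links for Integer
        System.out.println(cache.call(2));     // hit: guard passes, no slow path
        System.out.println(cache.call("x"));   // miss: relinks for String
        System.out.println("misses = " + cache.misses());
    }
}
```

Because the guard and both branches are ordinary method handles built from library combinators, a JIT that understands those combinators has a chance to fold the type test and inline the chosen target, which is the point made under the performance considerations.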



JSR 292 meeting at JavaOne 2008

Hello, JSR 292 observers and language implementors! The JSR 292 Expert Group met today at JavaOne. There were representatives from three major JVMs and two dynamic languages (Groovy, Jython). Here are some of my notes from that meeting. I hope you find them interesting.

The EDR for invokedynamic has been given to the JCP. It is a milestone!

Grouping of JSR 292 features
Current EDR is invokedynamic only (with method handles required for support)
Adjustments during the 90-day EDR period will not add unrelated features
Other features are likely (class modification of some sort) but will be independent
When the final spec. is presented, it will include whichever features are ready.

Relation to Da Vinci Machine Project
http://openjdk.java.net/projects/mlvm/
interesting experiments with anonymous classes, interface injection, continuations, etc.
JSR 292 EG controls whether these become an EDR on the way to standardization
community controls whether they get tried out
the EG produces specification only, not code (JVM implementors insist on this!)

Adoption of JSR 292 features depends on...
Demonstrated usefulness to language implementors (must integrate & demo. a POC)
Optimizability by JVM implementors (must think it through to the instruction level)

We will do this work this summer and reconvene (with more people) at the JVM Language Summit.

Technical Discussion of Draft Design

JVM to language implementors: Do dynamic languages really need performance?
As long as dynamic languages are the 5% scripting portion of an app., they do not.
If they get parity with Java, they can grow from 5% to 50%.
If they get enough performance to beat the original C implem., they can become platform of choice.
For any given language (e.g., Python) if the pure (non-C) libraries get parity with C, lots of things get easier.

What do language implementors want?
Short list: Invokedynamic, method handles, continuations.
JVM implementors are nervous about continuations (Da Vinci M. experiments ongoing)

Method handles and invokedynamic
Method handles look good to all parties; a nicely balanced design.
They give direct (JVM-native, non-reflective) access from caller to a method.
Multiple use cases, for multiple languages. (Discussed in detail for Jython and Groovy.)
The JVM managed state for invokedynamic is one word only, a single method handle.
Simple for JVMs. Caching & receiver guard logic is responsibility of language implementors.
We discussed a few JVM-centric questions about method handles and invokedynamic.



interface injection in the VM

Or, how to teach an old dog new tricks.

Introduction

“Self-modifying code...” used to be a phrase always uttered (by us hackers) with tones of both admiration and dread. Admiration, because there are stories from the earliest days of the stored program computer of how impossibly clever programmers would make their code flip state with perfect grace, simply by modifying (as data) an instruction it was about to execute. Dread, because when we tried the graceful flip on our own, the usual result was... less graceful. Painful, actually. Yet many of us have a self-modifying code story, somewhere back in time, that we view with pride, perhaps like the bowler’s perfect game, or the golfer’s hole-in-one.

Is self-modifying code still an object of fear? It goes in and out of fashion. Operating systems and VMs are required to support it (always, in the loader). Aspect oriented programming has made a cottage industry of it, and I haven’t heard the horror stories yet, nor the backlash, that sometimes turns such things from a hip, edgy exploration, into a firearm on the playground. Though a longtime practitioner (I’m a JVM nerd), I still fear it, and when I hear customers ask for an API to edit classes in the JVM, I always reach for an alternative, a prescription substitute for the illegal substance. Interface injection is a good substitute for a surprisingly wide range of use cases; perhaps it can handle your use case for self-modifying code also.

(And if not, I will reach for yet more substitutes. Most any state change in your program could be modeled as self-modifying code, even a variable rebinding. [partial edit deleted] But it is a very powerful measure, liable to disastrous consequences from even small mistakes, and very hard to implement efficiently in the poor JVM which is loading the new bytecode. 
Not only do you have to load the new code, you have to undo the relevant effects of the old code, and there is always the temptation to “diff” the old and new versions, so as to avoid undoing and redoing everything. But diff-patching something that complex leads you down a long path of painful bugs.)

Ramble over. Now to business...

Interface injection is additive

Interface injection (in the JVM) is the ability to modify old classes just enough for them to implement new interfaces which they have not encountered before. Here are the key design points, in brief:

When an interface is injected into a class, any unimplemented methods must be supplied at the same time.
If a method is injected with an interface, it is not accessible except via that interface itself. (It does not alter or interfere with virtual dispatch or name linkage.)
When an interface is injected into a class, it is visible (via normal runtime type checking) on all instances of that class, whether pre-existing or created later.
If a type check ever finds that a given class does not implement an interface, that interface cannot later be injected into the class.
Every injectable interface is equipped with a static injector method, which is solely responsible for managing the binding of that interface to any candidate class.
For any given class and injectable interface, the injector method is called the first time the question arises whether the class implements the interface. (This could be an invokeinterface, an instanceof, or a reflective operation.) Just before the decision, the class is called an injection candidate.
A class can be a candidate at most once. The decision made at that point is final. 
(Except maybe for power tools like debuggers, etc.)
If the injector method must supply missing implementations of interface operations (this is the general case), they are supplied as a collection of method handles.
The decision made by the injector method (no, or yes with needed methods) is final.

For example, the Java class String implements the useful interface Comparable. This interface was not present in 1.1, but was added in 1.2, as a compatible extension to the 1.1 API. (This is a nice thing about interfaces: They coexist nicely. Java super-classes are by contrast territorial; there can only be one per class.) As it happens, the compareTo method was pre-existing in 1.1.

Suppose, for argument’s sake, that Java did not have the notions of comparability and sorted collections. With interface injection, a language runtime could add these notions (for its own use) in a modular way. It would define (in its own package, not in java.lang or java.util) the relevant interface and collection types. The language runtime would then define an injector method which knows about all standard classes (like String and Integer) to which the language wants to assign its idea of comparability.

When the program starts to put system-defined types (like String) into a sorted collection, there are type checks or interface invocations against the injectable interface. This leads to decisions about injecting the new interface to the old classes. The feel of it (though not all the details) is like the early linkage phase that Java programs go through when they first execute their symbolic references. 
The system types that the language runtime cares about are retrofitted with the needed interface, and Java strings coexist smoothly with language-specific collections.

Perhaps the language runtime handles an unexpected candidate class by inspecting it, looking for a compareTo method that it understands. This pattern matching is open-ended, limited only by the imagination (and good taste) of the language designer.

Fast forward to the present: We already have Comparable, but think of an interface that some non-Java language needs. A simple example would be a different flavor of serialization, like Smalltalk’s inspect operation, which can output a more or less human-readable Smalltalk program for reconstituting the object. Now that the JVM world is not all about Java, it is no longer possible for the person writing in java.util to reach over and add a few lines of code to each system type in java.lang. But interface injection can do this, without the need to introduce new code into the system classes.

Consider the current crop of dynamic languages (Groovy, JRuby, Jython, etc.). Most or all of them have some sort of master interface that provides access to their dynamic lookup mechanisms (aka. their metaobject protocol). For example Groovy has this master interface:

    package groovy.lang;
    public interface GroovyObject {
        MetaClass getMetaClass();
        ...
    }

(By the way, thanks to Guillaume Laforge, for raising this example at Charlie Nutter’s excellent “Road to Babel” session today at Moscone. This blog post is for all the groovy people...)

The GroovyObject interface has other methods for picking apart the object via methods and properties, but the getMetaClass operation is the only one really necessary, since the other operations (like getProperty) can just as well be placed on the metaclass object. 
This is the style of coding that HotSpot uses internally, in its C++ code, and is close to the style of code that Attila Szegedi has adopted in his admirable dynalang project.Interface injection does not simplify Groovy’s task of deciding how to bind new Groovy methods to old types. (There are lots of them, like java.lang.String.tokenize. And java.lang.String.execute will never be a real Java method, since it executes the string as Groovy code!) But interface injection radically simplifies the process of operation dispatch, since a Java string can be linked to its Groovy’s metaclass in one call to getMetaClass.The current system must (in the general case) fetch the string’s Java class (using getClass) and somehow do a table lookup on that class to find the Groovy metaobject. This table lookup defeats JIT optimizations, since the JIT cannot reasonably know the contents of the ad hoc type mapping table. But it routinely optimizes interface calls; the JIT can ask the JVM whether the string implements the GroovyObject interface. If it then inlines the getMetaClass call, it then has an excellent chance of inlining the ultimate call to tokenize or execute or whatever. Making it fast This design is not slow, although it has a clunky bootstrapping behavior. (Perhaps there is a way to make it more declarative, at least in common use cases...) For example, the method handles supplied by the injector method can be almost as directly invoked as normal predefined interface methods.JVMs use a variety of indexing structures to organize classes and their interface methods. Typically, there is a runtime lookup, often involving a short search, when invokeinterface executes. 
This search finds the interface in the receiver object’s class, and then loads the method from some sort of display of that interface’s methods in the receiver class.

If virtual calls use a long array of method pointers called a vtable, then interface calls may well use a short array of method pointers (specific to one interface only) called an itable. The invocation operation first finds the itable in the receiver class, and then jumps through the relevant method slot. This is how HotSpot works; it is not unusual, but as I said there are many variations on this theme. The irreducible minimum is a quick search or test, and a jump.

Anyway, the search or test has the potential of failing. The JVM (verifier or no) can present an invokeinterface instruction with an object which does not implement the desired interface. The result is something like an IncompatibleClassChangeError. The important point to notice is that the search is followed by a slow error-reporting path.

Interface injection can work smoothly in most any JVM by putting fallback logic into that slow path, between the fast lookup of predefined interfaces, and the final error report. If the normal search comes up with a little itable embedded in the receiver class (as in HotSpot), the fallback search can also come up with a little itable, linked after the fact into the receiver class. In essence, the interface lookup degrades, at worst, into a linked list search. But there are all the usual optimizations that can apply, since the JVM and JIT know it all.

Applications

Let’s quickly overview some of the applications of this design.

Metaobject protocols

This application was sketched above. Every language can have its own metaobject, and they can coexist, reusing the original system objects without modifying them.

Traits

Languages which define traits (the structural version of nominal interface types) can readily implement them, and most efficiently, on the JVM via interface injection. 
Each trait is called via an injectable interface, with a corresponding class (a nested class in the interface, I would say) which carries the trait implementation, as a set of static methods. Note that method handles allow you to mix and match static and non-static methods as long as the signatures line up.Traits sound exotic, but they usually amount to utility methods on interfaces. With traits, most of the static methods in (say) java.util.Collections could be re-imagined as extension methods on the various collections interfaces. Actually, the injectable interfaces would be subtypes derived from the main interfaces. Numeric towers Numeric towers (as in Scheme) are difficult to engineer well, and exceedingly difficult to engineer in a modular way. The acid test of a numeric tower is whether you can add a new numeric type (like complex, or rational, or formal polynomials) without changing a line of code in the old types. Interface injection gives a hook for constructing doubly-dispatched binary operations (like addition) in terms of type-specific unary operations. For example, complex would define operations like addComplex, and inject them into previously defined number types like Integer. Virtual statics Sometimes it is helpful to define a value which is shared by all objects of a given class, but which subclasses can override. It is as if a static variable were also declared virtual. A canonical example of this might be the sizeof operation, which gives the size in storage units of a class’s instances. If you are willing to ask instances (not the class itself) for the shared value, interface injection can be used to define constant-returning methods on relevant classes. This is actually just a generalization of the getMetaClass pattern. As in that case, if the JIT can predict the class of the receiver, it can constant fold the “static” constant, and optimize from there. 
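The injector protocol described earlier (the injector is asked once per candidate class, its answer is final, and a positive answer supplies method handles) can be emulated on current JVMs. The sketch below does so with `ClassValue` and java.lang.invoke, both real APIs that shipped after this post was written. It is only an illustration of the protocol: the class name, the `order` operation, and the `injectorCalls` counter are hypothetical, and this is exactly the kind of side-table lookup that real interface injection is meant to replace.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Optional;

public class InjectionSketch {
    // Counts how often the injector is consulted (hypothetical instrumentation).
    static int injectorCalls = 0;

    // Hypothetical injector for an "Ordered" interface with one operation,
    // order(Object self, Object other) -> int. It inspects the candidate class
    // for a public compareTo(SameClass) method and, if found, supplies it as a
    // method handle. An empty result means "no".
    static Optional<MethodHandle> injector(Class<?> candidate) {
        injectorCalls++;
        try {
            MethodHandle mh = MethodHandles.publicLookup().findVirtual(
                    candidate, "compareTo", MethodType.methodType(int.class, candidate));
            // Adapt (C, C)int to the uniform signature (Object, Object)int.
            return Optional.of(mh.asType(
                    MethodType.methodType(int.class, Object.class, Object.class)));
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return Optional.empty();
        }
    }

    // Per-class cache of the injector's decision. In a single-threaded run the
    // value is computed once per class, which models "the decision is final".
    static final ClassValue<Optional<MethodHandle>> DECISION =
            new ClassValue<Optional<MethodHandle>>() {
                @Override
                protected Optional<MethodHandle> computeValue(Class<?> type) {
                    return injector(type);
                }
            };

    // Emulates the type check (instanceof) against the injectable interface.
    static boolean isOrdered(Object o) {
        return DECISION.get(o.getClass()).isPresent();
    }

    // Emulates an interface call through the injected method.
    static int order(Object a, Object b) {
        MethodHandle mh = DECISION.get(a.getClass())
                .orElseThrow(() -> new ClassCastException(a.getClass() + " is not Ordered"));
        try {
            return (int) mh.invoke(a, b);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Here `String` and `Integer` become "Ordered" without any change to their code, while a plain `Object` is rejected once and stays rejected.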
Conclusion This feature is under consideration in the Da Vinci Machine Project, and some form of it may make its way into the JSR 292 standard. Remember my peeve about self-modifying code? The original form of JSR 292 includes a vague promise of considering features for class extension.



continuations in the VM

Or, how to finish a job twice.Or, anything worth starting is anything worth starting is anything worth starting is anything worth starting is ... Introduction A continuation, simply put, is a reference to the rest of some program P, as of some given point in the midst of P. It is an interesting design problem to introduce continuations into the JVM. I don't know of a full design for JVM continuations, yet, but it's possible to observe both the easy and the hard parts, and to survey some of the reasons we should care.More specifically, in a language with statements and procedure calls, a continuation at a given statement M.S within a procedure M consists of an indication of the statement after S. If M was called from another procedure N at a statement N.T, then the continuation also includes an indication of the statement after N.T, and so on down to some primal call. Adding expressions complicates the account superficially; it is as if the expressions were broken up (with temporaries) so that each expression contains at most one call statement, and the call statement (plus assignment of its value) is the last action in the expression.The structure of a continuation, therefore, is much like the backtrace you see in a debugger. Unlike a debugger, continuations are not generally created between any two instructions, but at predetermined points. As with a debugger, you can issue a command to continue from a continuation. As a decisive difference from debuggers, a continuation is not literally a suspended computation state, but is rather a recorded copy (or virtual copy) of one. Your application can continue on its way, pushing and popping the control stack, perhaps recording new continuations. At some later point, if a saved continuation is resumed, the application drops what it is currently doing and continues from that saved point... 
with the old backtrace.

A method that resumes a continuation abandons its future execution, and the future of each of its callers, all the way back to some boundary state that the continuation and the caller share in common. This is very similar to a non-local jump: Throwing an exception makes the current method exit, and forces its callers also to abandon their normal executions. The boundary state in this case is the catch block which receives the exception. (We will talk about try/finally blocks later.) What is different about continuations is that they can “throw” downward and sideways also. Resuming a continuation in general unwinds the stack back to a shared state and then restores a bunch of stack frames.

It is almost as if the saved continuation were being held by a suspended thread somewhere, and when it is invoked, the current thread gives up all claim to its future, passing all execution responsibilities to the suspended thread. This allows continuations to implement (or simulate) coroutining structures, where method entry and exit are not rigorously paired bracket operations, manageable by a control stack. Continuations are not equivalent to coroutines (or generators), because a coroutine can yield control to another coroutine without losing its own future; the other coroutine eventually yields control back. But the difference is not huge, since you can always record a new continuation and save it somewhere for the next yield, just before invoking an old one. In fact, threading packages (“green” style) have been built on top of continuations. Building coroutines and threads from continuations is actually efficient if stacks can somehow be switched by altering a few registers. This becomes straightforward if stack frames are allocated on a heap (at which point we start calling them activation frames). 
It is more painful on ordinary C-style stacks, where resumption requires overwriting the control stack with new bits.You may have noticed that continuations can simulate an interesting variety of control flow patterns. In fact, they seem to be a universal primitive for building single-threaded programming constructs. Since Sussman and Steele’s famous Lambda Papers and specifically Guy Steele’s Rabbit thesis, optimizing compilers use the continuation passing style to represent and transform all kinds of (localized) control flow. In a compiler’s intermediate representation, a continuation can reify (make concrete) the control flow of a program as just another data value. The result is such flexibility and precision that no other representation of control flow is necessary. For example, an if statement turns into an expression which selects one of two continuations and jumps into it. The actual selection involves no control flow, so all the control flow happens via the continuations themselves. Continuations native to the JVM Most runtime systems provide only limited access to the control stack or to the intended future of a computation. Typically, the high water mark is some ability to abort a computation back to some cut point (a longjump or exception throw). Some languages also allow a certain amount of stack introspection. The JVM was designed with such features, but not with full continuations in mind. It is time to consider how to add them in, now that the JVM is being used as the substrate for a variety of languages. There is a very good thread on the JVM languages group, CPS languages in Java which discusses various tactics for emulating continuations on the JVM. In this note we will think about low-level extensions to the JVM itself, to remove the need for difficult workarounds.The essential primitive operations read and write the control stack as a whole. An API which is sufficient for this would include a copyStack method and a resumeStack method. 
Because of the security loopholes in these operations, they must be highly privileged. In a prototype, I have placed them in an internal class called sun.misc.Unsafe, which already supplies risky operations like reading and writing bytes to arbitrary addresses. The API looks like this:

    // low-level hook, to be wrapped in one or more safe APIs:
    class sun.misc.Unsafe;
    public native Object copyStack(Object context, CopyStackException ex) throws CopyStackException;
    public native /*unreached*/ void resumeStack(Object stack, Object value, Throwable exception);
    public static class CopyStackException extends RuntimeException {
        /** The JVM state, in an implementation dependent format. */
        protected Object stack;
    }
    public native void doCopyStackContext(Object context, Runnable r);

A call to copyStack captures a snapshot of the current thread, places it into the given exception, and throws that exception. The caller may choose to catch it immediately, or let someone up the call-chain catch it. The normal return value (typed as a simple object) does not come into play at first. If and whenever the captured thread snapshot is resumed, the function returns normally, with whatever value the resumer provides, or (again) throws an exception, with whatever throwable the resumer provides. This can happen any number of times. Thus, we distinguish between the initial call to copyStack, and any number (zero or more) of returns from copyStack.

If the context argument to copyStack is null, the captured thread state specifies all frames of the current thread. Otherwise, the context must correspond to a live call (in the same thread, not yet returned) to doCopyStackContext. The captured JVM state then specifies only stack frames which are younger than that call to doCopyStackContext. It is an error for a resumed computation to attempt to return to that call to doCopyStackContext. The context is stored into the supplied CopyStackException, which must not already have a stored context. 
(That is, its stack field must be null.)

The basic idea is simple and powerful: An application can control its own continuation on the JVM by reading and writing its own thread stack, or recent sections of its thread stack.

A stack of difficulties

As a wise webslinger once said, with great power comes great responsibility. A corollary in this case is that great power can blow a limb off. That is why the above primitives are in a class named “unsafe” with limited access.

For example, try/finally blocks are usually enforcing a program invariant that depends on paired brackets. The finally clause cleans up something (a close bracket) set up earlier (as an open bracket) near the try. When continuations are introduced, if a method exits twice (say, once to checkpoint, and once later after it is done), then the close bracket will happen twice, mismatching with a single open bracket. Languages with continuations deal with this by requiring the programmer to express a three-part initially/try/finally structure, so that when control re-enters the block, the initially action has a chance to re-assert the open bracket. Retrofitting this to the JVM will be challenging.

Locals in the JVM are part of the continuation. This means that when a program returns to an earlier state, the locals will assume their old values. This might be surprising. Languages in which continuations are native work with this fact by moving mutable variables to the heap (boxing them, in effect). In this way, their state is independent of the thread stack, and of any continuations built from it.

Uncontrolled introspection of thread state is a huge security loophole. If I am able to build an inspectable continuation, and I can convince a privileged routine to call me (as a callback or even handler), then I can use the continuation to look at the local variables of the privileged routine, perhaps revealing secrets like passwords. 
If I can invoke the continuation, I can also forge a clever copy (and here I use the word “forge” in its dark sense) of the privileged routine, and perhaps restart it on values of my choosing. The unsafe continuation API presented here allows all this, and it therefore badly needs to be tamed by a suitable security model.

Applications

Let’s quickly overview some of the applications of this design for continuations.

Reification

The JVM reveals the stack state as an explicit object, which (as we say) reifies the stack state (which was only implicit before). The object is placed by the JVM in CopyStackException.stack. (This is a protected field; the encapsulation prevents random exception handlers from accessing the state.) This object will be given an API which allows the application to inspect the state of the stack as a whole, and each stack frame. This includes locals, stack, monitors, current method, and BCI (bytecode index within method).

In the first prototype, the object happens to be an array of objects, starting with a byte array of serialized structure information. Indexes in the serialization call out to other elements of the object array. The format is non-opaque, and allows new JVM stack states to be created from whole cloth, supporting a low-level form of thread mobility. Previously captured states can be modified, supporting advanced performance tuning or configuration management, as HotSpot does internally at present. This simplicity is appealing, but requires stack frames to be parsed and unparsed into the external representation, always. A more tightly coupled representation could have more opaque handles to JVM data, with less representational distance to live stack frames. It could still be unparsed when the occasion arises.

Inspection

Given reification, it is then simple to write browsers and report generators which walk the stack and present its contents. 
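As a point of comparison, the standard library already offers a much weaker, read-only form of stack inspection. The sketch below builds a tiny report from `Thread.getStackTrace()`, a real API. It is not the reified continuation described here: it exposes only class names, method names, and line numbers, not locals, monitors, or BCIs, and nothing it returns can be resumed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class StackReportSketch {
    // Walks the current thread's stack and formats one line per frame,
    // youngest frame first.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            lines.add(frame.getClassName() + "." + frame.getMethodName()
                    + ":" + frame.getLineNumber());
        }
        return lines;
    }

    // Two extra frames so the report has something to show.
    static List<String> inner() { return report(); }
    static List<String> outer() { return inner(); }

    // Finds the position of a method of this class in a report, or -1.
    static int frameIndex(List<String> lines, String method) {
        String prefix = StackReportSketch.class.getName() + "." + method + ":";
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).startsWith(prefix)) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling `StackReportSketch.outer()` yields a report in which `report` appears before `inner`, which appears before `outer`, mirroring the debugger-style backtrace the post compares continuations to.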
Intercession

If the reification can be copied, adjusted, or forged from raw materials, then the resumeStack operation can put any contents whatever into the stack. If inspection is a read-only operation, intercession is the powerful update operation.

Migration

If a continuation can be forged from raw materials, it follows that, with suitable mechanisms for serialization, a computation can save itself to disk or network, and be picked up again in a different process or network node, to be continued there. There are big problems defining the edge of the computation, ripping it apart on serialization, and sewing it together again in the new location. But the problem is worth solving. (For a use case, consider that this is how the Second Life engine works.) Serializable reification provides the hook for relocating a much wider range of JVM computations.

Local migration is an interesting case also. As computers get larger and larger, the illusion of uniform memory gets fainter and fainter. Increasingly, threads will need to follow data. (When? Compare the bandwidth to move a thread with the bandwidth required to move the data it works on. It will sometimes be cheaper to ship the thread to the data than vice versa.) In this case, the movement could occur within a single space of managed objects, so serialization won’t be part of the migration sequence.

Generators

A generator is a small coroutine with a loop coupled to a caller’s loop in such a way that as the subroutine loop yields values, they are presented to the caller loop as an iteration space. Generators are often compiled in a continuation passing style, so that they can be run on the caller’s stack. This can put limitations on the code of a generator. With limited continuations on the JVM, a generator could be compiled to normal code that natively coroutines (by switching continuations) with the caller loop. 
The limited switching never leaves control of the block containing the loop, and this condition can be naturally represented by a doCopyStackContext block. The generator and client continuations will contain only a couple of frames. We should be able to arrange things so that the JIT can see the whole thing, unfold the continuations at compile time, and generate the normal straight-line code for producing and consuming each generated value.

Coroutines

More generally, scoped coroutines are subroutines which run inside a parent block, passing control back and forth, but never leaving the block (except to exit the overall computation). This also can be represented in the JVM by contextual continuations.

Proper continuations

The Scheme language has a specific function (call/cc) to form a handle to the program’s continuation. This handle is opaque; it can only be invoked, not inspected. It allows the thread’s entire computation to be reset back to the execution state (of the control stack) where the continuation was formed.

This has been used in the past to implement large-scale coroutines, which behave like monoprocessor threads (aka “green threads”). It is more commonly used in Scheme for generator-like control structures. (And for brain-cracking code puzzles. Enough said.)

User sessions

Third use case: Some programming frameworks for Java, notably RIFE, use continuations as a natural notation for web sessions and other intermittent or restartable processes. They currently do this by transforming Java bytecodes into a stackless format. There would be less need for bytecode rewriting if the JVM stack did not get in the way, and editable continuations seem the likely native expression in the JVM of such patterns.

Reoptimization

At the level of a language runtime, reified continuations can provide a hook for the language runtime to refactor the programmer’s computations on the fly. For example, if a method is never overridden, it can be called directly (and perhaps inlined). 
If during a long loop, the method is overridden by class loading, the code might need to be adjusted to call a dispatch routine. In general, this requires rewriting stack frames. HotSpot has done this for a decade; we call the backoff process deoptimization. The first prototype of copyStack took a day or two to write, because it just exposed the underlying reification mechanism (written in C++) needed for reading and writing stack frames.Beyond the problems of Java reoptimization, there is a big world of higher level optimizations. For example, a loop can be optimized very differently depending on the size of its iteration space; some language implementors may choose (like HotSpot itself) to wait until the application warms up, and then refactor it on the fly to (say) use a parallel algorithm for a large, important loop. What’s next? I have posted an initial version of JVM continuations in the mlvm project. HotSpot bug 6655643, “some dynamic languages need stack reification” tracks this line of investigation.Lukas Stadler (at Johannes Kepler University) has gone much farther, implementing resumption of continuations. He will be updating this patch soon. A recent update from Lukas gives an idea of where he is headed. I expect we can turn the power on, soon.
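Until native continuations arrive, the emulation tactic mentioned earlier in this post is continuation passing style. As a small, hedged illustration of that style in plain Java (the names are hypothetical, and this is not how the prototype works), the sketch below passes "the rest of the computation" as an explicit function value, and models an if statement as a selection between two continuations.

```java
import java.util.function.IntConsumer;

public class CpsSketch {
    // Direct style would be: int fact(int n) { return n <= 1 ? 1 : n * fact(n - 1); }
    // In continuation passing style the function never returns a value;
    // it hands the result to the continuation k instead.
    static void factCps(int n, IntConsumer k) {
        if (n <= 1) {
            k.accept(1);
        } else {
            // The new continuation multiplies by n, then continues with k.
            factCps(n - 1, r -> k.accept(n * r));
        }
    }

    // An "if" expressed as choosing one of two continuations and jumping into it.
    static void choose(boolean condition, Runnable thenK, Runnable elseK) {
        Runnable selected = condition ? thenK : elseK; // selection of a data value
        selected.run();                                // the only transfer of control
    }
}
```

Because the JVM has no tail calls, each continuation invocation here still grows the Java stack, which is one reason such emulations are awkward and a native mechanism is attractive.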



method handles in a nutshell

The JVM prefers to interconnect methods via static reference or dispatch through a class or interface. The Core Reflection API lets programmers work with methods outside these constraints, but only through a simulation layer that imposes extra complexity and execution overhead. This note gives the essential outlines of a design for method handles, a way to name and interconnect methods without regard to method type or placement, and with full type safety and native execution speed. We will do this in three and a half swift movements...

1. Direct method handles

Given any method M that I am able to invoke, the JVM provides me a way to produce a method handle H(M). I can use this handle later on, even after forgetting the name of M, to call M as often as I want. Moreover, if I provide this handle to other callers, they also can invoke M through the handle, even if they do not have access rights to call M by name. If the method is non-static, the method handle always takes the receiver as its first argument.
If the method is virtual or interface, the method handle performs the dispatch on the receiver. A method handle will confess its type reflectively, as a series of Class values, through the type operation.

In pseudo-code:

    MHD h1 = H(Object.equals);
    MHD h2 = H(System.identityHashCode);
    MHD h3 = Hs(String.hashCode);
    assert h1.type() == SIG[(Object,Object)boolean];
    assert h1.invoke(r1,a1) == r1.equals(a1);
    assert h2.invoke(a2) == System.identityHashCode(a2);
    assert h3.invoke(r3) == r3.invokespecial:String.hashCode();

The actual name of the type MHD will be given shortly. The actual API for H and Hs is uninterestingly straightforward, and may be found at the end with the other details.

To complete the low-level access (and fill a gap in the Core Reflection API), there is a variation Hs(M) which forces static linkage just like an invokespecial instruction, and is allowed only if I have the right to issue an invokespecial instruction on M.

From the JVM implementor’s point of view, there are probably three or four distinct subclasses of direct method handle, corresponding to the distinct varieties of invoke instruction. To round things out, one kind of method handle should work for invoking a method handle itself. These are low-level concerns, which hide nicely behind the H (and Hs) operator described above.

2. Invoking method handles

Given a method handle H, I can invoke it by issuing an invokeinterface bytecode against it. The signature I use must exactly match the original signature of the target method. (Even beyond the spelling, the linked meaning of class names must be the same, in the argument and return types.)
The method name I use must always be invoke (not the name of the target method).

In pseudo-code:

    MHI h1 = ...;
    h1.invoke(a1...)

The type MHI is a special interface type known to the JVM. (Its actual name will be given shortly.) MHI functions as a marker interface to tell the JVM that this occurrence of the invokeinterface bytecode must be treated specially, different from all other interface invocations. For one thing, normal JVM linking rules cannot apply, because the signature of the call site relates to the target method, not to the marker interface.

This kind of call site works on direct method handles (type MHD) created in part 1 above. In a moment we will drop the other shoe and observe that it works on other types of method handles.

The invokeinterface instruction is uniquely suited for this sort of JVM extension, because the rules for bytecode verification allow any object to serve as the receiver of an interface invocation.

3. Adapting method handles

The type MHI provides a very flexible jumping off point, for the bytecodes of one method to call any other method, of any given signature. The next question is whether the calling method and receiving method have to agree exactly on the signature, and the answer is “no”.
This brings us to the third and final major design point, of adapting method calling sequences.

The most important case of adaptation is partial invocation (sometimes known as currying or binding). A direct method handle by itself is really quite boring because, unlike nearly everything else in an object-oriented system, it is pure code, with no data to modify its meaning. Thus, given a method handle and some arguments for it, the JVM will give me a partial invocation of that method handle, which is a new method handle that remembers those arguments, and, when invoked on the remaining arguments, will invoke the original method handle with the grand total set of arguments.

At the very least, the JVM is willing to let me specify the first argument R of a virtual or interface method handle H(M), because that lets it perform method dispatch when the handle is created, and hand me back a method handle Adapt(H(M),R) that not only remembers the argument R, but has also pre-resolved the method dispatch R.M. This special case of partial invocation, sometimes called “bound method references”, is enough of a hook to let programmers introduce the usual object-oriented flexibilities into method handles.

In pseudo-code:

    MHD h1 = H(Object.equals); // SIG[(Object,Object)boolean]
    MHB h2 = Bind(h1, (Object)"foo");
    assert h2.type() == SIG[(Object)boolean];
    assert h2.invoke(a2) == "foo".equals(a2);

The type MHB stands for a bound method reference. (Please wait a moment for its actual spelling.)

3.5 Further adaptation

As long as we are messing with arguments, there is a fairly unsurprising range of other adaptations that arise naturally from the richness of JVM signatures, and the conversions that apply between various data types. (The details of varargs and reflective invocation also bear on this design.)

Specifically, given two method signatures (A)T and (A')T', and a method handle H(M) of type (A)T, there is a library routine which will create me a new method handle H' = Adapt(H(M), (A')T') of type (A')T'.
It is my responsibility to help the library routine match up the corresponding arguments of the two signatures, to direct it to drop unneeded arguments in A', to supply preset values for arguments in A missing in A' (this is where partial invocation comes into the general picture), and to tell it of the presence of varargs in either signature. The library is happy to insert casts, primitive conversions, and boxing (or unboxing) to make the arguments match up completely.

Here are some pseudo-code examples:

    MHD h1 = H(String.concat); // SIG[(String,String)String]
    MHA h2 = Adapt(h1, SIG[(String,String)String], $1, $0);
    MHA h3 = Adapt(h1, SIG[(String)String], $0, $0);
    MHA h4 = Adapt(h1, SIG[(String)String], $0, ".java");
    assert h2.invoke(a,b) == b.concat(a);
    assert h3.invoke(c) == c.concat(c);
    assert h4.invoke(c) == c.concat(".java");

That is a longish step beyond bound method references, but I believe the sweet spot of the design will supply a flexible set of method signature adaptations (including currying), and let JVM implementors choose how much of that the JVM wants to take responsibility for.

At a minimum, bound method references must be special-cased by the JVM, but everything else could be supplied by a Java library (one which is willing to dynamically code-generate many of its adapter methods).

At a maximum, the JVM could supply a Swiss Army Knife combinator which interpretively handles all possible argument wrangling.
This is probably the right way to go for HotSpot, since the HotSpot JIT is as well suited for optimizing complex adapters as simple ones, and having the complex ones appear to the compiler as single intrinsics is no big deal.

Breaking the suspense: And the name of the winner is...

So we have four different types floating around:

MHD - a direct handle to a user-requested method (either virtual or static)
MHI - the magic type which warns the JVM of a method handle call site
MHB - a bound method handle, which remembers the method receiver
MHA - a more complex adapted method handle

I can see no particular benefit in distinguishing all these types in an API design. Therefore, I believe the proper spelling for all these types is something all-encompassing: java.dyn.MethodHandle.

Clearly there will be other types under the covers, such as the concrete types chosen by the JVM for specific direct method handles (MHD), or various implementation classes of adapted methods (MHB, MHA). But there is no reason to distinguish them to the user.

However, one specific case of bound method handles is important to consider from the user’s viewpoint. If a receiver object R has a public method (in a public API type) already named invoke, with a signature of (S)T, then R is already looking very much like a bound method handle for its own invoke method, with signature (S)T.

For completeness of exposition, let’s give this kind of non-primitive method handle its own informal type name:

MHJ - a Java object that implements MethodHandle and a type-consistent invoke operation

So, at the risk of adding a chore to the JVM implementor’s list, I think an object of such a type (MHJ) should serve (uniformly in the contexts described above) as a method handle. (It may be necessary to ask that R implement the marker interface and the type method; but that is something the system could also figure out well enough on its own.)
I admit that this is not a necessary feature, but it could cut in half the number of small method-like objects running around in some systems. And the MHA implementation above probably requires an MHJ anyway.

Background: How did we get here?

One of the biggest puzzles for dynamic language implementors on the JVM, and therefore for the JSR 292 (invokedynamic) Expert Group, is how to represent bits of code as small but composable units of behavior. The JVM makes it easy to compose objects according to fixed APIs, but it is surprisingly hard to do this from the back end of a compiler, when (potentially) each call site is a little different from its neighbors, and none of them match some fixed API. The missing link is an object which will represent a chunk of callable behavior, but will not require an early commitment to a fixed calling sequence.

In theory-language, we want an object whose API is polymorphic over all possible method signatures, so the compiler (and runtime call site linker, in turn) can manage calls in a common framework, not one framework per signature.

Put another way, we cannot represent all callees as Runnable or Callable, because fixed interfaces like those serve just a subset of all interesting call signatures. APIs which attempt to represent all possible calls, notably Java’s Core Reflection API, simulate all signatures by boxing arguments into an array, but this is a simulation (with telltale overheads) rather than a native JVM realization of each signature.

We know signature polymorphism is powerful, from our experience with many dynamic and functional languages. (For an old example, consider the Lisp APPLY function, which is an efficient but universal call generator.) Integrating such polymorphism into the Java language is challenging; that’s why the function types in Neal Gafter’s closures proposal are a significant portion of the specification.

Happily, it is a simpler matter to integrate signature polymorphism into the JVM.
As part of the JSR 292 process, I have been worrying about this for some time. The result is the present story of method handles which (a) JVMs can implement efficiently, which (b) are useful to language backends, and which (c) have a workable Java API. That last is actually the hardest, which is why I have not given it yet. (See previous paragraph.)

Before giving the API, I want to emphasize a few more points. First, method handles (per se) are completely stateless and opaque. They self-report their signature (S)T (via a type operation on MethodHandle) but they reveal nothing else about their target. They do not perform any of the symbol table queries supplied by the Core Reflection API.

Every native call site for a method handle is hardwired with a particular signature. Compiler writers have every right to expect that, if the target method has a similar signature, the call will have only a few instructions of overhead. Likewise, a method handle’s signature is intrinsic to the handle, and completely rigid. Calls to near-miss signatures will fail, as will violations of class loader naming consistency.

Besides signature simulation, one serious overhead in the Core Reflection API is the requirement that, on every call to a reflected method, the JVM look at the caller’s identity and perform an access check to make sure that he is not calling someone else’s private method. The method handle design respects all such access checks, but performs them up front at handle creation, where (presumably) they are more affordable. But you can publish a handle to your own private method, if you choose.

One use case (which I have used to test the quality of this design) is whether it can be used to re-implement the invoke functionality in the Core Reflection API, for better speed and code compactness. This has long been a sore spot for language implementors (for reasons detailed above).
This is one reason I have included varargs in the competency of the method adaptation API.

The calling sequence for a method handle (in part 2 above) will be approximately as fast as today’s interface invocations. Searching for an invoke method in a receiver is the same sort of task as searching for an interface (and its associated “vtable”, if you use such things). The search can be sped up by the usual sorts of pre-indexing. A JVM-managed method handle will advertise its signature prominently in its header, so that a pointer equality check (remember, signature agreement is exact) is all that needs to happen before the caller jumps through a hardware-level function address.

Details and a hasty exit

Finally, here is a sketch of the API:

    package java.dyn;

    public interface MethodHandle /**/ {
        // T type();
        public R invoke(A...);
        public MethodType type();
    }

    public interface MethodType {
        public Class parameterType(int num); // -1 => return type
        public int parameterCount();
    }

    public class MethodHandles {
        public static MethodHandle findStatic(Class defc, String name, MethodType type);
        public static MethodHandle findVirtual(Class defc, String name, MethodType type);
        public static MethodHandle findSpecial(Class defc, String name, MethodType type);
        public static MethodHandle unreflect(java.lang.reflect.Method m);
        public static MethodHandle convertArguments(MethodHandle mh, MethodType newType);
        public static MethodHandle insertArgument(MethodHandle mh, Object value);
        ...
        // The whole enchilada:
        public static MethodHandle adaptArguments(MethodHandle mh, MethodType newType, String argumentMovements, Object values);
    }

That’s it, in a nutshell. Perhaps a rather large coconut shell. Actually, quite small, if you are used to Unix shells.

You will have noticed that there is no way to call these guys from Java code, unless you assemble yourself a class file around the required invokeinterface. It is simple enough to create a Java API for calling method handles.
Getting performance beyond the reflective boxed-varargs style of calling is a little messier, but doable. Dynamic language implementors solve this sort of thing as they fight to remove simulation overheads from their system. Given closures in Java, there would be nicer bridges for interoperability, to say nothing of implementing closures on top of method handles.

But the point is not calling or using these things from Java; the point is using them, down near the metal, to assemble the next 700 witty and winsome programming languages.
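For readers who want to try the ideas above on a current JDK, here is a minimal sketch. It does not use the java.dyn API sketched in this note; it assumes the java.lang.invoke package that ships with Java 7 and later, and the mapping from the pseudo-code operators (H, Bind, Adapt) onto findVirtual, bindTo, permuteArguments, and insertArguments is only an approximate correspondence chosen for illustration. The class name MethodHandleSketch is hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustration only: uses java.lang.invoke (JDK 7+), not the java.dyn sketch in the text.
class MethodHandleSketch {

    /** Builds handles resembling the pseudo-code examples and returns their results. */
    static String[] demo() {
        try {
            // Signature of String.concat without its receiver: (String)String.
            MethodType unary = MethodType.methodType(String.class, String.class);

            // Direct handle, like H(String.concat): type (String,String)String,
            // with the receiver as the leading argument.
            MethodHandle h1 = MethodHandles.lookup()
                    .findVirtual(String.class, "concat", unary);

            // Bound handle, like Bind(h1, "foo"): type (String)String.
            MethodHandle bound = h1.bindTo("foo");

            // Like Adapt(h1, SIG[(String,String)String], $1, $0): swap the arguments.
            MethodHandle swapped =
                    MethodHandles.permuteArguments(h1, h1.type(), 1, 0);

            // Like Adapt(h1, SIG[(String)String], $0, $0): duplicate the argument.
            MethodHandle doubled =
                    MethodHandles.permuteArguments(h1, unary, 0, 0);

            // Like Adapt(h1, SIG[(String)String], $0, ".java"): preset the last argument.
            MethodHandle suffixed =
                    MethodHandles.insertArguments(h1, 1, ".java");

            // invokeExact requires the call site's static types to match the
            // handle's type exactly, including the cast on the return value.
            return new String[] {
                (String) h1.invokeExact("foo", "bar"),
                (String) bound.invokeExact("bar"),
                (String) swapped.invokeExact("a", "b"),
                (String) doubled.invokeExact("ab"),
                (String) suffixed.invokeExact("Main"),
            };
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable.
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```

Note that every handle here has the single static type MethodHandle, whether it is direct, bound, or adapted, which matches the conclusion above that the user has no reason to distinguish MHD, MHB, and MHA.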



words of power from the ancient world

Hammering my way through a posting on the JVM and continuations, I realized again how oddly poetic are some of our terms of art. Some of them seem to have been with us from the dawn of the single-threaded stored-program computing machine. Terms like “continuation”, “closure”, “thunk”, even “call” and “loop” are metaphoric, evocative, polyvalent, elusive of final definition. What I mean is, they are poetic.

These terms name basic ideas and patterns that we, as programmers, do not know how to live without. We make up new terms for the old concepts, and endow the old terms with new meanings, but there seems to be a resonance among the old terms and meanings that was established early on by general consensus, as more than a simple convention.

I am thinking particularly of the words “continuation” and “closure”. They sound vaguely mathematical but they are not. It is hard to find a clean definition for them, even today. It seems to me that luminaries like Hoare, Landin, Strachey, Reynolds, and various MIT AI Lab researchers, busy trying to bridge the gaps between computer behavior and programmer reasoning, had to invent terms on the fly.

If a term had enough raw metaphorical power, it could take on some life. Some terms did not thrive: Continuations once kept company with program-closures and dumps. It did not matter in the long run whether a term was defined rigorously for the purpose of a paper, or whether (in a hallway conversation?) it was defined implicitly by usage. As Peter Landin says, “Such a borrowing from ordinary language can certainly bear several meanings.”

Meanwhile, in other hallways and papers, as the same basic idea was repeatedly rediscovered and reused, a good term could be reused, and could accumulate more useful definitions.
John Reynolds’ paper “The Discoveries of Continuations”, reprised I think in a 2004 talk, tells more of this story.

With closures, there seems to have been a point at which someone said, during a discussion of frustrating scoping bugs in Lisp, “you cannot have just program or just data, you must have a little record that combines the two; it is a ‘closure’ of the two”... and that was enough. The metaphor is that free variables are inconveniently open to the surprising effects of dynamic scoping, and must be closed in order to use safely. Joel Moses (cited in the previous link) tells how the AI Lab learned about non-dynamic scoping:

“A useful metaphor for the difference between FUNCTION and QUOTE in LISP is to think of QUOTE as a porous or an open covering of the function since free variables escape to the current environment. FUNCTION acts as a closed or nonporous covering (hence the term "closure" used by Landin). Thus we talk of "open" Lambda expressions (functions in LISP are usually Lambda expressions) and "closed" Lambda expressions.”

The paper goes on to record standard excuses we language implementors always give to our users, when we hand them warty, bug-bait semantics. They are, you wouldn’t know how to use it, and it’s too expensive anyway:

“It has been our experience that most LISP programmers rarely return closed Lambda expressions, and hence rarely encounter the full environment problem. There does not currently appear to be any way of solving the complete environment problem efficiently without losing efficiency whenever one accesses or modifies free variables.”

(We language users should always be ready to reply, we refrain from using it only because your implementation of it stinks, and go and figure out how to do it efficiently, please.)

Even from the depths of time in 1970, Moses harkens back even further:

“Peter Landin once said that most papers in Computer Science describe how their author learned what someone else already knew. This paper is no exception to that rule. My interest in the [lexical scoping] problem began while Landin, who had a deep understanding of the problem, visited MIT during 1966-67. I then realized the correspondence between [Lisp closures] and ISWIM's Lambda Closures.”

That final reference nicely closes the temporal loop on all of us: P.J. Landin, "The Next 700 Programming Languages", CACM, March 1966. That, of course, is Landin’s paper more or less accurately describing the (repetitive) course of programming language design through the present. In particular, we are still rediscovering closures, and the simple lexical scoping rules they were invented to support.

And by the way, we (that is, we language implementors) are still mangling this stuff. If your favorite language’s soi-disant closures still keep variables subject to dynamic scoping or refuse to compile a construct that reaches out to an enclosing scope, then that language still suffers from the Lisp 1.5 bugs that closures were introduced to fix. (Especially if it supplies some dynamically determined substitute for the unavailable static meaning, as with Smalltalk out-of-scope returns. Simple refusal is better, in my book, than bait-and-switch. That is why your uplevel references in Java are final.) It is true that our poetic terms of art collect new meanings as we go along, and moreover it is best if the new meanings can illuminate the old meanings, rather than obscure them by contradiction. So closures which don’t really close all the way don’t deserve to be called closures. Call them semi-closures, or inner classes with closure-like features.

Finally, and for what it is worth, here’s my vote that somebody who was there will give a talk or paper of “The Discoveries of Closures”.



fixnums in the VM

Or, the headless object rides again.

Introduction

Dynamic languages typically perform arithmetic using dynamically typed references to boxed numbers. Normally (at least in a JVM) a dynamically typed reference is a pointer to an object in the heap, whose header in turn contains the required dynamic type information. But language-specific implementations often use pseudo-pointers to represent a commonly used subset of numbers, with the actual bits of the pseudo-pointer carrying a value field “payload”. Also, because their relaxed typing encourages data structure reuse, dynamic languages typically feature a few small but ever-present types like Lisp’s cons cell which are overwhelmingly common in the heap. Again, language-specific implementations have historically provided “headerless” representations for these, in which the fields are stored in the heap but are not accompanied by a header.

The JVM can support fixnums and other headerless objects for the sake of these languages, and even for Java. The idea is to make ordinary object pointers (sometimes called oops) coexist with more specialized formats for headerless objects, which we will call iops and xops. The techniques are mature and well-known, and the overheads (of extra tag checking before pointer usage) can be controlled by optimizations already used widely in JVMs. In particular, the Hotspot JVM could cleanly represent fixnums and other immediate data types by extending its oop type.

Background

In Hotspot, a typical mature JVM, an object header is two words, a mark and a klass. (That ‘k’ is not a typo, but a dodge to avoid a reserved keyword in C++.) The mark is used for bookkeeping of operations like synchronization and identity hash code. The klass is a full machine pointer, to a metadata object which describes the object’s layout, methods, and other type information. A dynamic typing operation must first load the klass and then query its contents.
(The actual query logic may require one more dependent load, from the klass object.) This object layout is flexible and general, but it requires two machine words of overhead in every object, plus a possible word or so of alignment overhead, plus (of course) the payload of the object, i.e., the actual fields containing application information.

In many language-specific systems, integers of 20-30 bits or less are compressed into pseudo-pointers. Lisp and Ruby call them fixnums, while Smalltalk uses the name SmallInteger. Some languages, like Emacs Lisp, simply call them “integer” and do not allow overflow. Most dynamic languages check for arithmetic overflow beyond the 20-30 bits, and either signal an error or provide a more or less seamless transition from fixnum to bignum, i.e., from a machine word subrange to a multiple precision array-based representation.

Integer overheads

In the case of a Java Integer on a 32-bit Hotspot JVM, the 32-bit payload (an Integer.value field) is accompanied by 96 additional bits (a mark, a klass, and a word of alignment padding), for a total of 128 bits. Moreover, if there are (say) six references to this integer in the world (threads plus heap), those references also occupy 192 bits, for a total of 320 bits. On a 64-bit machine, everything is twice as big, at least at present: 256 bits in the object (which now includes 96 bits of padding), and 384 bits elsewhere. By contrast, six copies of an unboxed primitive integer occupy 192 bits. The point of introducing fixnums is that the overhead of dynamically typed integers can be much closer to that minimum of 192 bits than it is now. (The number six is chosen arbitrarily to model a common use case, a number that has been boxed and then copied a few times by procedure calling or other computation.)
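The arithmetic in the preceding paragraph can be checked mechanically. The following is a small sketch, not a measurement: the heap sizes (128 and 256 bits) and the count of six references are taken from the text, while the class and method names (BoxedIntegerFootprint, totalBits, row) are hypothetical.

```java
import java.util.Locale;

// Illustration only: recomputes the bit counts quoted in the text.
class BoxedIntegerFootprint {
    static final int COPIES = 6;              // the arbitrary N=6 used in the text
    static final int BASELINE = 32 * COPIES;  // six unboxed 32-bit ints = 192 bits

    /** Bits for one shared heap object plus COPIES references of the given width. */
    static int totalBits(int heapBits, int referenceBits) {
        return heapBits + referenceBits * COPIES;
    }

    /** One report line: representation name, total bits, and ratio to the baseline. */
    static String row(String name, int heapBits, int referenceBits) {
        int total = totalBits(heapBits, referenceBits);
        return String.format(Locale.ROOT, "%-20s %4d bits  %.2fx",
                name, total, (double) total / BASELINE);
    }

    public static void main(String[] args) {
        // A fixnum has no heap object; its payload lives in the reference itself.
        System.out.println(row("fixnum, 32-bit JVM", 0, 32));
        System.out.println(row("object, 32-bit JVM", 128, 32));
        System.out.println(row("fixnum, 64-bit JVM", 0, 64));
        System.out.println(row("object, 64-bit JVM", 256, 64));
    }
}
```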
Here is a summary of the sizes:

Representation        Heap bits   Reference bits (N=6)   Total bits   Comparison
primitive int                 0                     32          192         1.00
fixnum, 32-bit JVM            0                    192          192         1.00
object, 32-bit JVM          128                    192          320         1.67
fixnum, 64-bit JVM            0                    384          384         2.00
object, 64-bit JVM          256                    384          640         3.33

When Java introduced autoboxing of primitive ints into Integer objects, application codes began to perform implicit creation of integer objects, and the allocation rate went up. Of course it depends on the application, but codes which mix generic collections (like List<>) with primitive values routinely box and unbox frequently. The implicit boxing is done by the factory method Integer.valueOf(int). This method has been carefully specified to allow memoization or uniquification techniques, so that more references can share fewer copies of the same value.

In the case of a widely shared heap object, such as Integer.valueOf(13), the heap overhead is negligible, but there is still an execution overhead from loads of klass, klass internals, and Integer.value fields, with associated costs of cache traffic and occupancy. With fixnums, dynamically typed arithmetic can proceed with no memory references at all.

Note that in a 32-bit system compromises must be made in representing the 32-bit value of an integer. Historically, fixnums represent integers in widths like 30, 28, or 24 bits, leaving a few additional bits available to distinguish the fixnum from ordinary object pointers and other kinds of pointer-like values. In a 32-bit JVM, some calls to Integer.valueOf will not be able to return a fixnum; they will have to produce ordinary objects. But in practice, no applications use all 32 bits of every integer; most integers are small values like array indices, and can be compressed into less than 32 bits. This is true (to a lesser extent) of longs, floats, and doubles.
Smaller values like characters and booleans can be stored in full precision.

On a 64-bit JVM, there is so much “slack” in the pointer layout that the full 32 bit range of values can be represented in fixnum format. Even larger values like longs and doubles can fit into the pointer layout, with a little compression. But regardless of pointer size, some Integer objects will always have to be ordinary objects, such as in strange edge cases where reference equality is significant, or when (wrongheadedly) a code synchronizes on an Integer object.

This reasoning can be applied, and the fixnum optimization constructed, for any JVM type with the following properties:

- The object contains a payload of a machine word or less.
- The payload is immutable (a final and/or private field, unmodified outside the constructor).
- If the payload is a machine word, it is usually compressible (e.g., via sign truncation).
- The type is heavily used, enough to justify the complexity of a headerless representation.
- The object is never used for synchronization, or can be created in a locked state.

The last requirement (no synchronization) is necessary, since synchronization is a kind of object state that can potentially live in an object’s header, regardless of the immutability of its payload. Since the type Object itself supports synchronization, it is hard to avoid in the JVM. However, there are two observations here which can let us make progress:

- Objects which the user merely shares (and does not allocate) can behave as if someone else had locked them and will never unlock them.
- Types not already in the Java language do not need to support synchronization.

The first case covers Integer.valueOf and the other factory methods used by autoboxing. The second case could cover closures, method handles, tuples, complex numbers, etc. It will probably be a good idea to standardize on a marker of some sort which disables the locking feature for a given class (and its subclasses).
A marker interface Unsynchronized is probably best, since synchronization is, for better or worse, a public operation.

Let’s call any pseudo-pointer format which carries its payload directly an immediate object pseudo-pointer, or iop. Below we will discuss more specific implementation tactics for iops.

Headerless pairs, tuples

A similar analysis can be made for small objects, objects for which the extra space occupied by an ordinary header is a significant overhead. For example, in some Lisp applications, the majority of the heap is occupied by small two-element tuples called cons cells, which Lisp uses to build lists and trees. Removing the JVM header from a cons cell can double the effectiveness of memory and, more importantly, of cache.

(A further optimization called cdr coding could double memory density again, almost, by eliding next pointers if they immutably point to a following cons cell created at the same time. But this would require interior pointers on the JVM, a topic which must wait for another day.)

Although a headerless pair would be referred to via a pointer, it would not qualify as an ordinary object pointer, but rather as a specialized headerless pointer. Let’s call this an extraordinary object pointer, or xop. As with an iop, an xop must somehow be tagged to distinguish it from an ordinary object pointer. (Following Smalltalk tradition, Hotspot calls the latter an oop.) Along with the tagging, some means of extracting a klass value is needed. The klass could be bit-encoded into the reference as with an iop, especially on 64-bit systems. Since an xop refers to memory, the klass could also be stored in memory in such a way (e.g., a card header) that it is shared by several nearby objects of the same type.
Or it could just be stored in an abbreviated object header (one-word, markless).

An extraordinary object pointer could be useful for any JVM type with the following properties:

- The object contains a payload of a few machine words.
- The payload need not be directly synchronized. (E.g., it is immutable, or does not need multiword transactions, or is single-threaded, or is locked by an external mutex.)
- The type is heavily used, enough to justify the complexity of a headerless representation.
- The object is never used for synchronization, or can be created in a locked state.

Types which might possibly benefit from this treatment include any immutable tuple type, method references or closures, Lisp cons cells, multipart numeric types like complex, or strings.

(Pronunciation Warning: In homage to Dr. Seuss, I intend to pronounce these neologisms monosyllabically as Yopps and Zopps.)

Dirty details

Pseudo-pointer format

In JVMs where this makes sense, such as Hotspot, one can divide the machine word of an oop into an implementation-defined set of three fields, tag, klass, and value. The sizes of these are tunable, but on 32-bit systems the tag is generally 2-3 bits, the klass 0-5 bits, and the value 24-31 bits. On 64-bit machines, the klass may in fact be a restricted pointer value (31 bits or so), with klass and tag sharing one 32-bit subword and the value in the other. For jumbo payloads, the value could be expanded to almost 64 bits, at the expense of klass and tag.

Tag: On most machines, the tag is in the high bits. On machines that require address alignment, the tag could be the low bits, some of which are non-zero (the traditional “low-tag” representation), though this could impose an extra check on unaligned operations. In all cases, tag bits are a fixed combination that is distinct from all valid heap pointer values and in addition should (if at all possible) cause a trap of some sort if it is used.
This will enable certain favorable compilation tricks[1], exactly like the ones that JVMs use today for null pointer decoding.

Klass: On 32-bit systems, the klass must be an index into a table of well-known classes. On 64-bit systems it can be a more general pointer. But the generality would still be limited to classes with single immutable value fields, and factory methods like Integer.valueOf! So the generality of a full klass pointer is only useful if it helps with performance.

The tag and klass fields can often be unified. For example, if the three low bits are used for an alignment-based tag, then a unified field (say, the low eight bits) can be used, where any klass value can be used as long as at least one of its three low bits is non-zero. Or, the tag field can be used to control the width of the klass field, allowing a tunable range of value widths.

Value: For integers and other values of 32 bits or less, the value field is obviously a clipped copy of what would ordinarily be stored in the value field of the class, if the reference were a regular object. With the low-tag representation, the value can be extracted in a single instruction (signed right shift). For floats, the value can be zero-filled to the right.
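
To make the tag, klass, and value fields concrete, here is a minimal Java sketch of one possible 32-bit low-tag layout. Everything specific in it is an assumption chosen for illustration, not Hotspot's actual format: the class name IopSketch, a 2-bit tag in the low bits with the value 1 marking an iop, a 2-bit klass index, and a 28-bit signed value in the remaining high bits. It only models the bit arithmetic on a plain int; a real JVM would do this on machine words inside the interpreter and compiled code.

```java
// Hypothetical sketch of a 32-bit immediate object pseudo-pointer (iop).
// Assumed layout, from the low bits up: 2-bit tag, 2-bit klass index, 28-bit value.
final class IopSketch {
    static final int TAG_BITS = 2;
    static final int KLASS_BITS = 2;
    static final int VALUE_SHIFT = TAG_BITS + KLASS_BITS;
    static final int TAG_MASK = (1 << TAG_BITS) - 1;
    static final int KLASS_MASK = (1 << KLASS_BITS) - 1;

    static final int TAG_IOP = 1;    // assumed tag value marking this iop format
    static final int KLASS_INT = 0;  // assumed index of Integer in a small klass table

    static final int MIN_VALUE = -(1 << 27);     // smallest 28-bit signed value
    static final int MAX_VALUE = (1 << 27) - 1;  // largest 28-bit signed value

    /** True if x survives being clipped to the 28-bit value field. */
    static boolean fits(int x) {
        return x >= MIN_VALUE && x <= MAX_VALUE;
    }

    /** Packs a klass index and a small value into one tagged word. */
    static int encode(int klass, int x) {
        if (!fits(x)) {
            throw new IllegalArgumentException("value needs more than 28 bits: " + x);
        }
        if ((klass & ~KLASS_MASK) != 0) {
            throw new IllegalArgumentException("klass index out of range: " + klass);
        }
        return (x << VALUE_SHIFT) | (klass << TAG_BITS) | TAG_IOP;
    }

    static int tag(int word) {
        return word & TAG_MASK;
    }

    static int klass(int word) {
        return (word >>> TAG_BITS) & KLASS_MASK;
    }

    /** Extracts the value with a single signed right shift. */
    static int value(int word) {
        return word >> VALUE_SHIFT;
    }

    public static void main(String[] args) {
        int word = encode(KLASS_INT, -42);
        System.out.printf("word=0x%08x tag=%d klass=%d value=%d%n",
                word, tag(word), klass(word), value(word));
    }
}
```

A value that does not pass fits would have to fall back to an ordinary heap-allocated box, which is why the sketch rejects it instead of silently truncating.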
A straw-man design

Though I do not have time for it at present, if I were to try out this design space today in Hotspot, I might start hacking with the following specific parameters:

- 32-bit JVM (more challenging than 64-bit, but a more immediately usable result)
- 28-bit long immediate or 20-bit short immediate value field, left justified
- 2-bit right-justified tag (supported directly by SPARC memory bus, requires explicit testing on Intel)
- 2-bit tag values of 1 or 3 are reserved for iops
- only iops have klass fields
- if 2-bit tag is 1, the klass field is 2 bits and value is 28 bits
- 28-bit long immediate value represents an int (later, perhaps long, float, and/or double)
- if 2-bit tag is 3, the klass field is 10 bits and value is 20 bits
- initial short immediate classes are void, byte, boolean, short, and char
- xops to be attempted only after iops are working
- 2-bit tag of 2 reserved for xops
- a klass pointer at the front of each xop, at value+4, and the first field at value+8
- in an xop, the word at value+0 is part of the previous object; it overlays the mark field of any oop

Semantic features of fixnums and other iops

- The call java.lang.Integer.valueOf((int)x) returns a fixnum if x is in the supported subrange.
- All Integer methods just work as normal on this alternate representation. Internally, the JVM codes them as a tiny if/then/else, controlled by a tag check.
- Ditto for all Object and Number methods (toString, etc.).
- Regarding monitorenter, wait, etc.: All iops must behave (w.r.t. subsequent synchronization attempts) as if they were locked immediately upon creation, and are never unlocked.
- Explicit construction expressions (new Integer(x)) must continue to create ordinary objects.
- Compiler escape analysis may sometimes be able to weaken explicitly allocated integers to iops.
- The identity hash code of any iop is obtained by extracting as many of its value and klass bits as will fit in the 31-bit result.

A number of other classes might merit the iop treatment.
Here is a list ordered by roughly decreasing plausibility. Note that there are two alternate representations for strings; they could co-exist with some care.

- Byte, Short, Character, Boolean, Void
- most enum instances
- Long, Float, Double
- very short strings (value packs 0-3 7-bit bytes or one 20-bit extended char)
- well-known interned strings (value is index into global table of packed UTF8 sequences)
- some zero-length arrays or strings (note: this should combine the length check with a tag check)

Semantic features of cons cells, tuples, and other xops

Because of the interactions with object identity, extraordinary object pointers fall into two categories: Immutable and mutable.

In the immutable case, user-level semantics are simple, while the JVM maintains the fiction that object identity and structure equality are the same. We will use the complex number (a 2-tuple of doubles) as an example:

- The complex type is defined from the start as unsynchronized, so that the JVM is free to omit headers uniformly.
- The expression new Complex(x,y) always produces a headerless object, as does the underlying newinstance instruction.
- All Object methods just work as normal on this headerless representation. Internally, the JVM codes them as a tiny if/then/else, controlled by a tag check.
- Re. monitorenter, etc.: Complex numbers, like all other xops and iops, must behave as if they are “locked at birth”.
- The acmp instructions (Java’s reference equality comparisons) are illegal when applied to two complex numbers (or any two xops). Enforcing this requires extra bit testing, mitigated by some type analysis in the compiler. As an exception, the code of the complex class can use reference equality comparison without restriction on the self type.
(That is, the operator acts like a protected method.)
- The identity hash code of a complex number is computed by calling its hashCode method.
- The equals and hashCode methods from Object must be overridden by Complex.
- The JVM is free to clone and unclone equivalent complex number instances, in the GC and/or on the fly, without notifying the application.

Note that, under these rules and those of Java, objects produced by Integer.valueOf can be represented as immutable xops when they do not fit in iops.

The semantics of mutable xops are somewhat less regular, and so the concept may not be as valuable. We will use the (mutable) cons cell as an example. Here are the points where they differ from their immutable cousins:

- The identity hash code of a mutable cons cell is -1. This lets identity-sensitive hash tables detect unsuitable keys quickly. (Or should System.identityHashCode throw an error?)
- As with regular oops, the system is not free to clone or unclone mutable cons cells.
- Reference equality works the same as with oops.

Implementation consequences

Some types implemented with non-oops may be invisibly split by the JVM into ordinary and immediate or extraordinary formats. Additional type inference and tag testing is required wherever such splits can occur, or where super types like Object may disguise the non-oops.

- A getfield instruction for Integer.value (etc.) must first test the tag, and then issue either a load or a shift to obtain the value field.
- Bytecodes which access the object header must also test the tag. (These include monitorenter, checkcast, instanceof instructions.)
- Invocations of Object (and Number) methods must also test the tag.
Access to unrelated classes (e.g., String) will be unaffected.
- Compilers can use the usual combination of static analysis and profile-driven optimism to make the overheads go away.
- If bus alignment is being used to check low-tags, and a super class has a field of size N (e.g., an Object.klass header field of size 4), be sure that any possible non-oops have low bits that are misaligned modulo N, or else check the bits explicitly. This could creep up, say, with an attempted memory access to the Byte.value field via an iop that happened to look like an address.

Conclusion

We have examined low-level techniques for headerless objects, and investigated how to integrate them into the JVM and even (in some cases) the Java language. The techniques are on a sliding scale from the straightforward (low tag bit for immediate Integer only) to the subtle (compile-time and profile-driven type estimation). Implementing even some of this in an existing JVM is likely to be an adventure on the order of, say, compressing oops. A language implementor might prefer, at first, to try these techniques in the closed environment of a from-scratch, specialized VM. But it is probably more profitable in the end to fit these techniques into an optimized, mature JIT and GC, such as Hotspot... Especially now that Hotspot is part of the OpenJDK, and even has a Multi-Language VM subproject specifically for these explorations!

Footnote [1] ...certain favorable compilation tricks...

What compilation tricks are those, you ask? Well, most pointer indirection instructions in a JVM will never provoke null exceptions. Yet the JVM is obligated to always defend against possible nulls. Some instructions can statically be proven never to see null, while many others cannot be so proven. The latter are compiled optimistically, so that if a null ever does come through, the instruction will cause some sort of hardware-level trap. In that case the null check comes for free with the CPU address decode logic.
The recovery from a failure of such a check is very complex and painful, but complete; the result is that the newly failing instruction is emulated somehow in the interpreter or by other cleanup code. It is OK for that recovery to be very expensive, especially if the JVM recompiles the offending instruction to be guarded by an explicit null check. The JVM keeps a historical profile about exceptional events, which the compiler consults; it optimistically uses an implicit null check when the profile looks clear, and the system backs off to a slightly more expensive explicit check when that fails. The same thing can be done with tag bits, if they look ugly enough to make the CPU throw a trap.

Bug Reference: 6674617, dynamic languages need more efficient autoboxing



destroy the home schoolers in order to save them

Some of my colleagues have noticed the news flurry about home schooling and the sudden declaration of its illegality by a panel of federal judges in Los Angeles. The formal decision features a spicy stew of judicial threats to parents, in a section entitled “Consequences of Parental Denial of a Legal Education”.

That certainly got my attention and that of many friends, since (dare I now admit?) I've been home schooling my children since 1987. Two have finished with honors at good universities and are now productive taxpayers, two more are now making their way through college, and the rest are ahead of grade level and nicely socialized, thank you. Who knew my wife and I were guilty of Parental Denial of a Legal Education? (Gotta get some of that Legal Education. It must make you as wise as a Judge.) To those of us in the home schooling community, the general consensus is more adequately phrased in a San Francisco Chronicle Op-Ed: “What planet are those judges coming from?” I realize the education of one’s children is a culturally subversive thing to do, but since when is California suddenly shy of cultural deviancy?

One can only wince in wonder at the ideal California those judges are contemplating. The state has an interest in many children’s rights beyond mere education, such as nutrition. Perhaps we should require parents to be certified dieticians before they cook their children’s lunch. Or, let’s just go all the way and eliminate the inconvenient families, by requiring a parental license before the first child is brought to term. That would bring everything nicely under control, and our Wise Judges could rule a utopian, aristocratic Plato’s Republic, which is really a nice place to study, but a terrible home.

In my own home town of San Jose, I just noticed a reasonable Mercury News editorial on the subject.
Common sense still rules in San Jose!

I make one key exception to the Merc.’s editorial position: All else being equal, I as a private citizen greatly prefer benign neglect to any form of regulation. But unlike us private citizens, editorial writers and politicians seem to have a professional rule: Never make ringing calls to do nothing. (And the corollary: Never be without a ringing call.) I am thankful that, somehow despite all the political fidgeting, life goes on anyway.

Also, I’m proud to say that the two debaters the Mercury mentions are from our group’s debate club. I think it is not too much to hope that, in their day as judges or other community leaders, they will write better opinions.

In the end, my advice to judges, and even to friendly editorialists and politicians, is: Leave parenting to us parents. It worked when all of us were growing up, and it works now.

August 2008 Update: The court has reversed its decision. Here is Governor Schwarzenegger's take on it:

"This is a victory for California's students, parents and education community. This decision confirms the right every California child has to a quality education and the right parents have to decide what is best for their children," he said. "I hope the ruling settles this matter for parents and home-schooled children once and for all in California, but assure them that we, as elected officials, will continue to defend parents' rights."

And Superintendent Jack O'Connell says,

"As head of California's public school system, it would be my wish that all children attend public school, but I understand that a traditional public school environment may not be the right setting for each and every child... I recognize and understand the consternation that the earlier court ruling caused for many parents and associations involved in home schooling. It is my hope that today's ruling will allay many of those fears and resolve much of the confusion."

(Source: LA Times.)



Bravo for the dynamic runtime!

This week several of us from Sun attended the Lang.NET Symposium. The symposium was intensely technical and not too large to fit into a single room, so the presentations and conversations were a fine exercise in what one might call N-3: Nerd-to-Nerd Networking. Sometimes they were even downright inspiring: bravo Anders, Jim, Erik, Gilad, Peli, Jeffrey. Our hosts kindly welcomed presentations from three of us at Sun: Dan Ingalls showed everyone the Lively Kernel, while Charles Nutter and yours truly introduced the Da Vinci Machine project.

The centerpiece of the conference was of course Microsoft’s Common Language Runtime, and especially the new Dynamic Language Runtime, Jim Hugunin’s encore to IronPython, which factors out the reusable logic that glues dynamic languages on top of the CLR.

Why am I suddenly excited about Microsoft technology? Two or three reasons. First, the DLR (with IronPython and IronRuby) is further evidence that we are in some sort of renaissance or resurgence of programming language design. For some reason, people are inventing programming languages again in a big way, expecting to get audiences, and sometimes getting them. I think the “some reason” is a combination of better tools and higher-level runtimes and cheaper CPU cycles and the open source movement.

These new conditions are combining in a deep and exciting way with some very old ideas. (I mean “very old” in the field of computing, dating back no more than fifty years.) Somehow the basic ideas were sketched early.
I am thinking of distinctions of syntax (concrete, abstract, and “sugar”), data structures (including procedural, object-oriented, functional, symbolic, and relational), the idea of exploratory programming, full inter-conversion between program, data, and text, declarative versus imperative notations, lexical versus dynamic versus message-based scoping, maximal versus minimal versus extensible languages, closures, continuations, reflection, garbage collection, caching, lazy evaluation, and more. (I apologize for the length and baldness of the list, and would welcome a reference to a good retrospective survey to improve on the list.) People like Peter Landin were already mapping the landscape of programming languages in the 1960s. The categories have shifted little, because they remain amazingly useful.

(Side note: The symposium ended with a lecture on, among other things, the merits of data-driven programming and XML. This led to an exchange, apparently ongoing, about the merits of XML versus JSON, which was a new thing to me. As the exchange petered out in comments on robustness and compactness, I just had to loudly propose “S-expressions!”, the 50-year-old Lisp syntax that may be more robust, regular, and compact than either.)

Although I do appreciate history for history’s sake, what is really interesting to me is observing the new changes on the old themes. Thus it is informative to characterize new languages like Ruby in quite old terms (Lisp in C syntax). It is exciting when a dormant idea comes into widespread practical use, such as when Java popularized garbage collection, or the Hotspot JVM applied message optimization techniques originally developed for Smalltalk and Self.
In the case of the DLR, it is exciting to see those techniques extended into a programmable metaobject protocol.

As readers of this blog know, Sun has also been designing and developing technology to apply the power of the JVM to the difficult problems of dynamic language implementation. The second thing that excited me at Redmond (along with the Microsoft people) was a striking case of parallel evolution between the DLR over the CLR on one hand and the Da Vinci Machine over the JVM on the other. My talk was shortly after Jim’s, and (as I remarked at the time) I had less explaining to do than I expected, since Jim had already explained a very similar architecture in the DLR. In my work on JVM dynamic invocation, I knew I was applying tried and true ideas from Smalltalk, Self, and CLOS, but I was encouraged to find that a colleague (and competitor) had been busy proving the practicality of those ideas.

The differences between the CLR and JVM extensions are interesting to note. They work completely above the level of the CLR without significantly enhancing it, while we are developing the JVM and libraries at the same time. I have been busy with the JVM, while Charles Nutter has been doing great work rebuilding JRuby downward toward the JVM. The latter work has converted JRuby from a tree-walking interpreter to a compiler which emits just the sort of bytecodes the JVM most likes to optimize. I think of what we are doing as a sort of transcontinental railroad, with the JVM building out from the West as JRuby extends from the East; we are putting in the Golden Spike this year.

One reason for this difference in approach is that the Microsoft CLR JIT does not appear to be under active development; its optimization level is as rudimentary as the earliest Java JITs. In the CLR that kind of performance is just the accepted cost of running managed code.
The CLR has no notion of inlining interface calls or compiling under dynamic profiles, so all the dynamism of the DLR has to be inside the little call-site objects that the DLR itself manages. We Sun people realized afresh (and so did our colleagues) what an astonishing thing it is to have a great compiler in your VM, which can use online type information to inline and optimize huge swathes of bytecode.

Another contrast between the DLR and our work is our design of method guards and targets, versus theirs of tests and targets. (The CLOS invocation protocol speaks of discrimination functions and applicable methods.) In all these systems, an up-call populates the call site with one or more target methods, and each target method has an associated argument-testing rule.

So what’s the contrast here? The DLR up-call expresses the test and target as a pair of abstract syntax trees, which the lower layers of the DLR combine with previous ASTs to compile a new version of the call site. The Da Vinci Machine design combines the guard and target into a single chunk of behavior, called a method handle; a dynamic invocation site can have a small set of these handles. (Currently JRuby simulates this pattern above the level of the JVM.) The JVM interpreter will sift through them, invoking each eagerly, expecting success, and fielding the small exceptions it can throw if its guard logic fails.

Eventually the JIT will kick in, observe the state of the call site, and generate a suitably optimized decision tree based on the method handles it sees on the call site. (Since method handles are immutable, it will be able to inline their structure completely, if that is desirable.) The JVM can afford to use exceptions for call site composition, because they are cheap.
One surprise for me at Redmond was learning that the CLR architecture makes it nearly impossible to compile exceptions to simple “goto” jumps, not only because their JIT does not optimize much, but also because CLR exceptions have a much more complex semantics than those of Java. Hotspot has been optimizing exceptions since the JVM98 performance sweepstakes. This reminds me: What a great thing competition is, and specifically in the Java ecosystem. Hotspot is as fast as it is thanks to a reasonable set of standard benchmarks, and our race with our other JVM competitors.

This leads me to another metaphor about gold: In Redmond I realized (as I said above) that systems built on Hotspot and other JVMs are sitting on a gold mine of performance. While IronPython on the DLR has to do hard, brilliant work to “iron” out the wrinkles in the CLR JIT’s weak performance profile, the “irony” is that Hotspot has already been optimizing highly dynamic programs for almost a decade. JRuby is already tapping our gold mine of JIT optimizations, and showing benchmark performances even greater than the original C-coded version of Ruby. It can only get better from here. (Jim, all, I hope you’ll forgive the metallurgical puns... The Scot in me loves a cheap laugh.)

The final reason I am excited about Microsoft’s DLR is that I am pleased for their customers, since they will enjoy using the emerging variety of new languages on the CLR. It is time for those languages to arise, because the platforms are strong and the CPU cycles cheap. But, as you might guess, I am even more excited for the customers of the JVM, because they will also enjoy the new languages in an expansive open-source community, and on their choice of blazingly fast Java virtual machines. Starting (I hope) with Sun’s Hotspot.

Bravo for a new golden age of language design, and a renaissance of high level languages!



symbolic freedom in the VM

Or, watching your language with dangerous characters.

Introduction

The JVM uses symbolic names to link together the many hundreds of classes that make up an application. Like source code in the Java programming language, these symbols fall into a small number of categories: Class, package, method, field, type signature. Unlike source code, symbols in the JVM are represented in a uniform syntax, a counted sequence of Unicode characters. Since this same format is used in class files to represent String literals, and since Strings are arbitrary character sequences, perhaps the JVM can readily accept class, package, method, field, and type names which could be any string, not just the strings accepted by the Java compiler. Let's call such names “exotic names”.

The JVM originally inherited symbol spelling restrictions from the Java language, but in recent years it has removed most restrictions. This note describes how to remove the remaining restrictions, by presenting a universal mangling convention to encode arbitrary spelling strings into a form which is permitted by the JVM. This mangling is easy for humans to read and for machines to decode.

The motivation for this is from non-Java languages, which have their own rules for composing symbols of types, variables, and so on. Some languages, like Common Lisp, allow any string whatever (even the empty string!) to spell a symbol name. Languages with operator redefinition absolutely require a way to process symbols like “+” (a single plus sign). When such a language meets the JVM, language-specific names must either be kept completely separate from JVM names, or the language’s bytecode compiler must use some sort of name mangling to keep the JVM from panicking.

Quick Start

For those who prefer to guess rationale from bald facts, here are the encoding rules in tabular form:

Dangerous Character  Why Dangerous                                            Where Illegal  Escape Sequence
/ 002F               delimits a package prefix in a class name                any name       \| 005C 007C
. 002E               looks like a package prefix                              any name       \, 005C 002C
; 003B               delimits a type within a field or method signature       any name       \? 005C 003F
$ 0024               looks like a nested class name or synthetic member       nowhere        \% 005C 0025
< 003C               looks like <init>, delimiter in generic type signature   method name    \^ 005C 005E
> 003E               looks like <init>                                        method name    \_ 005C 005F
[ 005B               begins the name of an array class                        class name     \{ 005C 007B
] 005D               not dangerous, but goes with [; reserved                 nowhere        \} 005C 007D
: 003A               not dangerous, but reserved for language use             nowhere        \! 005C 0021
\ 005C               not dangerous, except when forming an accidental escape  nowhere        \- 005C 002D
(null string)        bytecode names must be non-empty                         any name       \= 005C 003D

Avoiding Dangerous Characters

The JVM defines a very small set of characters which are illegal in name spellings. We will slightly extend and regularize this set into a group of dangerous characters. These characters will then be replaced, in mangled names, by escape sequences. In addition, accidental escape sequences must be further escaped. Finally, a special prefix will be applied if and only if the mangling would otherwise fail to begin with the escape character. This happens to cover the corner case of the null string, and also clearly marks symbols which need demangling.

Dangerous characters are the union of all characters forbidden or otherwise restricted by the JVM specification, plus their mates, if they are brackets ([ and ], < and >), plus, arbitrarily, the colon character :. There is no distinction between type, method, and field names. This makes it easier to convert between mangled names of different types, since they do not need to be decoded (demangled).

The escape character is backslash \ (also known as reverse solidus). This character is, until now, unheard of in bytecode names, but traditional in the proposed role.
Replacement Characters

Every escape sequence is two characters (in fact, two UTF8 bytes) beginning with the escape character and followed by a replacement character. (Since the replacement character is never a backslash, iterated manglings do not double in size.) Each dangerous character has some rough visual similarity to its corresponding replacement character. This makes mangled symbols easier to recognize by sight.

The dangerous characters are / (forward slash, used to delimit package components), . (dot, also a package delimiter), ; (semicolon, used in signatures), $ (dollar, used in inner classes and synthetic members), < (left angle), > (right angle), [ (left square bracket, used in array types), ] (right square bracket, reserved in this scheme for language use), and : (colon, reserved in this scheme for language use). Their replacements are, respectively, | (vertical bar), , (comma), ? (question mark), % (percent), ^ (caret), _ (underscore), { (left curly bracket), } (right curly bracket), and ! (exclamation mark). In addition, the replacement character for the escape character itself is - (hyphen), and the replacement character for the null prefix is = (equal sign).

An escape character \ followed by any of these replacement characters is an escape sequence, and there are no other escape sequences. An equal sign is only part of an escape sequence if it is the second character in the whole string, following a backslash. Two consecutive backslashes do not form an escape sequence.

Each escape sequence replaces a so-called original character which is either one of the dangerous characters or the escape character. A null prefix replaces an initial null string, not a character.

All this implies that escape sequences cannot overlap and may be determined all at once for a whole string. Note that a spelling string can contain accidental escapes, apparent escape sequences which must not be interpreted as manglings. These are disabled by replacing their leading backslash with an escape sequence (\-).
To mangle a non-empty string, three logical steps are required, though they may be carried out in one pass:

1. In each accidental escape, replace the backslash with an escape sequence (\-).
2. Replace each dangerous character with an escape sequence (\| for /, etc.).
3. If the first two steps introduced any change, and if the string does not already begin with a backslash, prepend a null prefix (\=).

To mangle the empty string, prepend a null prefix.

To demangle a mangled string that begins with an escape, remove any null prefix, and then replace (in parallel) each escape sequence by its original character.

Spelling strings which contain accidental escapes must have them replaced, even if those strings do not contain dangerous characters. This restriction means that mangling a string always requires a scan of the string for escapes. But then, a scan would be required anyway, to check for dangerous characters.

Nice Properties

If a bytecode name does not contain any escape sequence, demangling is a no-op: The string demangles to itself. Such a string is called self-mangling. Almost all strings are self-mangling. In practice, to demangle almost any name “found in nature”, simply verify that it does not begin with a backslash.

Mangling is an invertible function, while demangling is not. A mangled string is defined as validly mangled if it is in fact the unique mangling of its spelling string. Three examples of invalidly mangled strings are \=foo, \-bar, and baz\!, which demangle to foo, \bar, and baz\! (the last is unchanged, since it does not begin with a backslash), but then remangle to foo, \bar, and \=baz\-!. If a language back-end or runtime is using mangled names, it should never present an invalidly mangled bytecode name to the JVM.
If the runtime encounters one, it should also report an error, since such an occurrence probably indicates a bug in name encoding which will lead to errors in linkage. However, this note does not propose that the JVM verifier detect invalidly mangled names.

As a result of these rules, it is a simple matter to compute validly mangled substrings and concatenations of validly mangled strings, and (with a little care) these match the corresponding operations on their spelling strings.

- Any prefix of a validly mangled string is also validly mangled, although a null prefix may need to be removed.
- Any suffix of a validly mangled string is also validly mangled, although a null prefix may need to be added.
- Two validly mangled strings, when concatenated, are also validly mangled, although any null prefix must be removed from the second string, and a trailing backslash on the first string may need escaping, if it would participate in an accidental escape when followed by the first character of the second string. A null prefix may need to be added to the result.

If languages that include non-Java symbol spellings use this mangling convention, they will enjoy the following advantages:

- They can interoperate via symbols they share in common.
- Low-level tools, such as backtrace printers, will have readable displays.
- Future JVM and language extensions can safely use the dangerous characters for structuring symbols, but will never interfere with valid spellings.
- Runtimes and compilers can use standard libraries for mangling and demangling.
- Occasional transliterations and name composition will be simple and regular, for classes, methods, and fields.
- Bytecode names will continue to be compact. When mangled, spellings will at most double in length, either in UTF8 or UTF16 format, and most will not change at all.
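
The mangling and demangling rules described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not a standard library: the class name NameMangler and its method names are hypothetical, and the code follows only the rules given in this note (replacement table, accidental escapes, and the null prefix).

```java
// Hypothetical sketch of the mangling convention described in this note.
final class NameMangler {
    private static final char ESCAPE = '\\';
    // Position k in DANGEROUS is replaced by position k in REPLACEMENT.
    // The final pair maps the escape character itself to '-'.
    private static final String DANGEROUS = "/.;$<>[]:\\";
    private static final String REPLACEMENT = "|,?%^_{}!-";
    private static final String NULL_PREFIX = "\\=";

    /** Mangles an arbitrary spelling string into a bytecode name. */
    static String mangle(String spelling) {
        if (spelling.isEmpty()) {
            return NULL_PREFIX;
        }
        StringBuilder out = new StringBuilder(spelling.length() + 2);
        boolean changed = false;
        for (int i = 0; i < spelling.length(); i++) {
            char c = spelling.charAt(i);
            if (c == ESCAPE) {
                // An accidental escape is a backslash followed by a replacement
                // character, or by '=' when the backslash starts the string.
                boolean accidental = false;
                if (i + 1 < spelling.length()) {
                    char next = spelling.charAt(i + 1);
                    accidental = REPLACEMENT.indexOf(next) >= 0 || (i == 0 && next == '=');
                }
                if (accidental) {
                    out.append(ESCAPE).append('-');
                    changed = true;
                } else {
                    out.append(c);
                }
            } else {
                int k = DANGEROUS.indexOf(c);
                if (k >= 0) {
                    out.append(ESCAPE).append(REPLACEMENT.charAt(k));
                    changed = true;
                } else {
                    out.append(c);
                }
            }
        }
        if (changed && out.charAt(0) != ESCAPE) {
            out.insert(0, NULL_PREFIX);
        }
        return out.toString();
    }

    /** Demangles a bytecode name; names not starting with a backslash are returned as-is. */
    static String demangle(String name) {
        if (name.isEmpty() || name.charAt(0) != ESCAPE) {
            return name;
        }
        int start = name.startsWith(NULL_PREFIX) ? NULL_PREFIX.length() : 0;
        StringBuilder out = new StringBuilder(name.length());
        for (int i = start; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == ESCAPE && i + 1 < name.length()) {
                int k = REPLACEMENT.indexOf(name.charAt(i + 1));
                if (k >= 0) {
                    out.append(DANGEROUS.charAt(k));
                    i++; // consume the replacement character
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    /** A name is validly mangled if it is the unique mangling of its own spelling. */
    static boolean isValidlyMangled(String name) {
        return mangle(demangle(name)).equals(name);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"<pre>", "phase.1", "+", "", "\\=foo"}) {
            String m = mangle(s);
            System.out.println("'" + s + "' -> '" + m + "' -> '" + demangle(m) + "'");
        }
    }
}
```

The isValidlyMangled helper simply restates the definition of a validly mangled name as a round trip; a runtime following the advice in this note could use such a check to report an error instead of passing a suspicious name to the JVM.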
Suggestions for Human Readable Presentations

For human readable displays of symbols, it will be better to present a string-like quoted representation of the spelling, because JVM users are generally familiar with such tokens. We suggest using single or double quotes before and after symbols which are not valid Java identifiers, with quotes, backslashes, and non-printing characters escaped as if for literals in the Java language.

For example, an HTML-like spelling <pre> mangles to \^pre\_ and could display more cleanly as '<pre>', with the quotes included. Such string-like conventions are not suitable for mangled bytecode names, in part because dangerous characters must be eliminated, rather than just quoted. Otherwise internally structured strings like package prefixes and method signatures could not be reliably parsed.

In such human-readable displays, invalidly mangled names should not be demangled and quoted, for this would be misleading. Likewise, JVM symbols which contain dangerous characters (like dots in field names or brackets in method names) should not be simply quoted. The bytecode names \=phase\,1 and phase.1 are distinct, and in demangled displays they should be presented as 'phase.1' and something like 'phase'.1, respectively.

Fine Print

These rules build upon the JVM specification, as modified by JSR 202. The relevant language goes something like this:

4.3.2 Unqualified Names

Names of methods, fields and local variables are stored as unqualified names. Unqualified names must not contain the characters '.', ';', '[' or '/'. Method names are further constrained so that, with the exception of the special method names <init> and <clinit> (§3.9), they must not contain the characters '<' or '>'.
JVMs use these new relaxed identifier rules as of Java 5 and later. The JVM requires that the bytecode name of a method not contain angle brackets, but it allows them in fields and type names. Actually, there is a problem with putting left angle brackets in type names, since left angle bracket is also a delimiter character in generic signature encodings, such as LFoo<T>;. (Does the preceding mean an unparameterized type spelled Foo<T> or is it an instance of the generic spelled Foo?) Thus, left angle bracket is a character that the JVM specification does not realize is dangerous! A future version of the JVM spec. could amend this, and simplify the rules, by declaring angle brackets illegal everywhere (except for certain method names). It should also make left square bracket legal in type names, except in the first character position of a qualified type name (where it may denote an array type).

There are plenty of other characters which look dangerous, but are innocuous to the JVM. For example, in settings where method names are concatenated with method signatures, it might seem that parentheses pose a danger to correctly parsing them apart again. (Try this concatenation: (I)(D)(L)(J)(;)V. Can you tell which is method name and which is signature?) Such concerns about parsing apply (in various settings) to spaces, newlines, the null character, etc. For this specification, we do not propose to predict all such parsing risks, and instead focus on exactly those characters which the JVM itself disallows. The concept of display names introduced above can help with applications which must produce parseable text that includes mangled names. (The example above could be displayed in a more parser-friendly form as '(I)(D)'(L')(J)(';)V.) In the case of method signatures, note that method (and field) names and signatures are always presented in the class file as separate CONSTANT_Utf8 references, so there is no need (inherent in the JVM) to concatenate or parse them.

Bytecode compilers are encouraged to use this 
mangling whenever they represent a name directly to the JVM (as a so-called “bytecode name”). In the very few places (if any) where the JVM or reflective APIs undertake to transform names from the bytecode level to language-level strings, they should remove the mangling also.

The escape character is not a universal “superquote”. There are only a few escape sequences, and any other occurrences of the backslash (reverse solidus) character are not treated specially. (They do not need further quoting to avoid treating them as accidental escapes.) In this, the escape character works like it does within Unix shell quoted strings, where (for example) "\x" is a string with two characters.

It is actually useful to define a few extra dangerous characters. In many languages, symbols have attributes beyond their basic spelling. (For example, Fortress symbols have a font attribute, and Common Lisp symbols have a package prefix.) Characters like square brackets and backquote which are allowed by the JVM but dangerous in the present sense are opportunities for adding more structure to the bytecode names of symbols. Languages are therefore free to use the colon character, and (in non-class names) square brackets and colon to add structure to their symbols. It may be best if the basic spelling of the symbol comes at a fixed position, so that low-level tools have a reasonable chance of demangling at least that part of the symbol. However, this is a matter for another layer of software to decide, specifically a metaobject protocol which is concerned with uniform naming and access of program elements.

A name which consists of a sequence of colons and mangled names may be called a compound name. It is natural to use compound names with the invokedynamic instruction, to convey necessary information about a call site to the metaobject protocol. Names like x:get and x:set are appealing for building clean property APIs.
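To make the idea of a compound name concrete, here is a minimal sketch of composing and splitting one. It assumes that the colon is treated as a dangerous character, so that a validly mangled component never contains a bare colon and splitting at colons is unambiguous. The class and method names are hypothetical, and the sketch does not perform any mangling itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: treats a compound name as mangled components separated by colons.
final class CompoundNames {
    private CompoundNames() {}

    /** Joins already-mangled components with colons, rejecting any component that contains one. */
    static String compose(String... mangledParts) {
        for (String part : mangledParts) {
            if (part.indexOf(':') >= 0) {
                throw new IllegalArgumentException("component is not mangled: " + part);
            }
        }
        return String.join(":", mangledParts);
    }

    /** Splits a compound name at its colons; empty components are preserved. */
    static List<String> components(String compoundName) {
        return Arrays.asList(compoundName.split(":", -1));
    }

    public static void main(String[] args) {
        String getter = compose("x", "get");
        System.out.println(getter);
        System.out.println(components(getter));
    }
}
```

A metaobject protocol could use the first component as the basic spelling and the rest as attributes, but that layering is a design decision outside this sketch.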
The slash, dot, and semicolon characters have a more central role in the syntax of bytecode names and signatures, so languages should not use them. Future extensions of the JVM could appropriate these characters if the JVM itself needed to add structure to bytecode names. For example, the JVM specification allows dot . inside field names, but since manglings avoid that dangerous character in field names, then dot can be used, without conflict, as a delimiter for encoding tuple element references.

Remaining Issues

In order to test these ideas, I have coded them in Java and written a small test harness called StringNames.java. You are welcome to try it out. There is also mangling code in the OpenJDK, as part of JSR 292. In the Da Vinci Machine patch repository there are unit tests.

One bit of work has not been attempted here. We do not consider manglings over the much more restricted set of characters allowed by earlier versions of the JVM. Because these allowed only valid Java identifiers, some sort of much more complex and disruptive scheme would be required to encode free spellings as Java identifiers. It would probably use the dollar sign ($) as an escape character, and encode non-alphabetics as sequences of alphabetics in a high-base numbering system. Collisions with pre-existing uses of the escape character would complicate matters.

Another bit of work saved for later is removal of length restrictions. JVM bytecode names (and signatures that contain them) must be representable in less than 65536 bytes of (modified null-free) UTF8.

I guess freedom is more of a journey than a destination...
Change Log and Acknowledgements

March 2008: Per Bothner pointed out that the dollar sign is in fact dangerous, since various bits of code (including Class.java) look for it as a special delimiter in bytecode names. So now we replace it by an escape sequence with a percent sign. (I guess I was too close to that problem to see it!) Note that all this stuff works in Java 5 and later.

September 2008: The Da Vinci Machine project has a patch to javac which lets you pass exotic names through the javac frontend. It does not mandate this or any other mangling scheme.

February 2009: Improved the language a little, and put in cross-references to invokedynamic and compound names.

August 2009: Tweaked the mangling and concatenation rules. (Hat tip to David Chase.)

September 2012: Fixed web page damage around backslashes, updated a couple pathnames. (Hat tip to John Cowan.)



anonymous classes in the VM

Or, showing up in class without registering.

Introduction

This post describes a VM feature called anonymous classes. This feature is being prototyped in the multi-language project called the Da Vinci Machine, and it is tracked by Sun bug 6653858.

One pain point in dynamic language implementation is managing code dynamically. While the implementor’s focus is on the body of a method, and the linkage of that body to some desired calling sequence, there is a host of surrounding details required by the JVM to properly place that code. These details include:

method name
enclosing class name
various access restrictions relative to other named entities
class loader and protection domain
linkage and initialization state
placement in class hierarchy (even if the class is never instantiated)

These details add noise to the implementor’s task, and often enough they cause various execution overheads. Because a class of a given name (and class loader) must be defined exactly once, and must afterwards be recoverable only from its name (via Class.forName), the JVM must connect each newly-defined class to its defining class loader and to a data structure called the system dictionary, which will handle later linkage requests. These connections take time to make, especially since they must grab various system locks. 
They also make it much harder for the GC to collect unused code.

Anonymous classes can partially address these problems, and we are prototyping this feature in the Da Vinci Machine (which is the fancy name for the OpenJDK Multi-Language VM project).

Desired features of anonymous classes:

load an arbitrary class from a block of bytecodes
associate the new class with a pre-existing host class, inheriting its access, linkage, and permission characteristics (as if in an inner/outer relation to the host class)
do not associate the new class with any globally-defined name
do not make the new class reachable from the class loader of its host class
put the class in class hierarchy logically, but allow it to be garbage collected when unused
allow the definer to patch class elements in the constant pool, to provide local access to previously defined anonymous classes
allow the definer to patch constants in the constant pool, to provide local access to dynamically specified data relevant to the language implementation
allow UTF8 elements in the constant pool to be patched, to make it easier to build glue classes from canned templates

The key motivation is that we want to cut ClassLoaders and the system dictionary out of the loop. This means there will be fewer locks and no GC entanglements. Drop the last object, and the class goes away too, just as it should.

Why the patching stuff? There are a few corner cases where, because we are dealing with anonymous entities, an essentially symbolic constant pool is not up to the task. Since the standard class file format is specified as a byte stream, there is no way to introduce live objects or classes to the newly-loaded class, unless they first are given names. Therefore, there must be some sort of substitution mechanism for replacing constants (classes, at least) into the loading classfile. 
Given that requirement, it is an easy step to generalize this substitution mechanism to other types of constant pool entries.

The resulting facility is powerful in interesting ways. You can often build a template classfile by passing Java code through the javac compiler, and then use constant substitution to customize the code. Generated code often needs to get to complex constants (e.g., lists or tables) and this provides a hook to introduce them directly via the CP. The string-type constant pool entry is extended to support arbitrary objects, if they are substituted into the loaded anonymous class. This need not scare the verifier; it just treats non-strings as general objects. General objects, of course, are not a problem for dynamic languages.

Here is a toy example, which actually works in a current prototype. Note that the anonymous classes are defined in a chain, with each new one a subclass of a previously loaded one.

The API is a single static method defineAnonymousClass, which is privileged. In the current prototype it is in sun.misc.Unsafe, which is a non-standard class. If it is standardized, it will be given a suitable name. (And suitable security checks!) This method takes an array of bytecodes, a host class, and an optional array of constant pool patches. It returns the newly created anonymous class. Unlike all other class queries, it can never return the same class twice.

[New text:] Here is a more polished API.

As you can see from the sample output in the test code, an anonymous class has a name (via getName) which consists of the original name of the template class, followed by a slash (which never otherwise appears in a class name) and the identity hash code of the anonymous class.

This prototype will help provide the basis of further experimentation with other constructs, notably anonymous methods and method handles.



Notes on an Architecture for Dynamic Invocation

Introduction

In a previous post I enumerated the various parts that go into a call site in the JVM. In order to support the program structure of other languages, especially dynamic languages, the JVM must extend the present schemes of method definition, invocation, and linkage. We don’t want to create completely new mechanisms, since we want to reuse the highly developed infrastructure of mature JVM implementations.

The JSR 292 Expert Group, which I chair, is working on a specification to address this problem in detail, via carefully chosen changes to the bytecode architecture. The first EDR (early draft review) will be available in a few weeks. Meanwhile, I would like to point out some design requirements for method calling in a multi-language world. The key new requirements are dynamic relinking, programmable linkage, programmable guards, descriptor polymorphism, and method handles.

In brief:

dynamic relinking allows dynamic linkage decisions to be revoked
programmable linkage allows dynamic linkage decisions to be made reflectively by “bootstrap methods”
programmable guards allow call sites to select among overloaded methods according to language-specific rules
descriptor polymorphism allows call sites and adapters to be managed and connected generically
method handles let bootstrap code and call sites refer directly to bytecoded methods

Dynamic Relinking

In Java, as with many languages and systems, it is convenient for each call site to be dynamically linked. That is, a call’s meaning is not fully determined until it is first executed. At that point, various symbols might be looked up and matched together, and as a result the JVM will decide where the call gets directed, and how the type of the receiver might affect the selection of that method.

In many languages, a call site’s meaning is always partially provisional, in that future events can change the effect that a call will have. 
Although Java does not allow the redefinition of classes and methods, some languages do, and this means a call site might occasionally be “relinked”, so that a method previously called by it is no longer a target. Many JVMs already do something like this in response to class loading, if a previously monomorphic method becomes polymorphic. (Hotspot has an operation called “deoptimization” which can have this effect.) The new requirement is that call sites can somehow be edited under the control of language-specific runtimes.

A simple way to satisfy this requirement is to allow call sites to be cleared, thus forcing them to be relinked, but with a new chance to take into account changes in program structure (such as method loading or unloading). The very simplest would be to allow all call sites everywhere to be cleared by some sort of privileged operation. A very complex way would be to allow call sites to be reflected and edited in detail, thereby allowing the dynamic runtime to be looking over the JVM’s shoulder at all times.

Programmable Linkage

The JVM performs dynamic linkage according to fixed rules. These rules are designed with Java in mind. For example, the symbolic method requested by the call site must exactly match the resolved method identified after linkage, and the actual method eventually invoked. The method names and descriptors must exactly match among all the methods. Only the classes (or interfaces) of these methods are allowed to vary, along type hierarchy relations. The set of methods that can be overloaded on a call site (at run-time) is limited to actual methods defined in subtypes of the resolved class (or interface).

However, it is common, especially in dynamic languages, for different callers to call the same method in different ways. This means that not all call sites (in a multi-language JVM) will match the actual methods called. There will be mismatches in descriptors and in method names. 
Also, the actual method eventually called may or may not have a type hierarchy relationship to the classes (or interfaces) mentioned in the call’s symbolic type or symbolic descriptor.

It is quite possible that a call which is apparently directed at a receiver argument’s class is first routed to a generic adapter method. Adapter methods perform some sort of language-specific bookkeeping or argument transformation, before (or after) calling the ultimate target method. Or perhaps the call jumps into a language-specific interpreter which emulates the intended method, which has yet to be committed to bytecodes.

The upshot of all this is that dynamic linkage must become programmable. Parallel to cases where the JVM runs hard-coded algorithms to link Java call sites, it must also be possible for the JVM to defer to a language-specific bootstrap method to decide how to link a language-specific call site. This bootstrap method will be given information about the call site and actual arguments, and decide how to route the call. It must then return to the JVM an indication of how to handle similar calls in the future. At this point, the call site will be linked. Or, I should say, it will be more linked than before, because it is often possible that fundamentally new types will occur later at the same call site, as the program executes further. Such new types will trigger new calls to the bootstrap method.

Programmable Guards

This leads us to the next problem, of determining applicability. After a bootstrap method has linked an actual method to a call site, it may still be necessary to check that this method is still applicable to each new set of actual arguments, or whether the linkage process is required to begin again.

(In Java, after a call site has been linked, there is no second chance for relinking if the method is not applicable to an unexpected argument. 
An error must be raised.)

In addition, if a call site is overloaded (as some must be, in most dynamic languages), there may be several actual methods at a call site, each applicable to a different set of arguments.

The crucial idea here is applicability, which is a condition that is checked on every call, and is expected to be usually true for at least one previously linked method.

It follows that we need an idea of a guard, which is a bit of logic which classifies the arguments of a call site and determines whether an actual method should be called. Or, which of several methods should be called. In Java call sites, guarding is performed by examining the receiver object’s class, and picking the most specific matching method defined within that class.

Here are some classification schemes used to dynamically classify arguments:

symbolic type specifiers may help choose among alternate methods (as in Lisp)
structure-template matching is used on tagged tree-like structures (e.g., Haskell)
unification matching with backtracking determines applicability in Prolog
equality checks are suitable for a dynamic switch (as shown with PyPy)

A language may also look at arguments beyond the receiver; this is called multiple dispatch. It is useful for operations which have algebraic symmetries, such as arithmetic, or for operations with many ad hoc optimizations on argument restrictions.

A guard is inseparably linked to a method it guards. In fact, it is better to think of a guard as a prologue to a method, rather than some sort of “mini-method” attached to it. The guard prologue of a method sniffs at the incoming arguments, and if something is wrong, backs up the invocation, forcing the call site to select another method, relink, or throw an error.

There must be a specific code sequence for “backing out” of a method call when a guard fails. Rather than invent a new bytecode for this, the JVM can use a system-defined exception type to express guard failure. 
This need not be expensive.

The HotSpot JVM uses a technique like this to speed some calls. If you look at the source code, you’ll find a distinction between “verified” and “unverified” entry points in methods. The latter is a short prefix of code which guards the method to determine applicability. If the guard fails, the system either throws an error or edits the calling instruction to perform a slower but more powerful (vtable-based) calling sequence. Dynamic languages deserve a similar mechanism.

The possibility of stacking adapters onto target methods adds interest to the design of guards. A guarding adapter would perform checks on incoming arguments, throwing control back to the caller on guard failure, and passing normal control to the ultimate target on success. The target may in turn apply further guards, or may be a further adapter with some other function.

Descriptor Polymorphism

The previously described mechanisms could be easily simulated in Java code. Call sites could be represented as invocations on associated stateful objects. Language-specific logic could be described as a concrete subclass of an abstract class with operations like “link this call site”. What would methods look like? 
The interface java.util.concurrent.Callable provides a hint: The actual method called by a call site must be something like an interface object with a single call (or invoke) method.

The problem with interfaces

The problem with such interfaces is that they can only provide access to a narrowly restricted set of methods, ones which satisfy these restrictions:

the receiver type explicitly implements the interface and the method
the method name is determined by the interface
the descriptor (type signature) is also determined by the interface

By contrast, dynamic languages often need to break out of these restrictions:

processing values over which they have little definitional control (platform types like boxed integers or collections)
calling methods of a variety of names, and even of computed names
invoking a method according to a descriptor chosen by the caller based on incomplete knowledge of the callee

Put another way, it is impractical to define a customized interface for every distinct shape of call. And, in different ways, it is impractical to force a single interface to handle all calls. Proposals to put function types into Java try to steer a middle ground, but for the language implementor, the problem is that interfaces contain too much type information, without the right kind of genericity.

What about reflection?

Taking a different tack, java.lang.reflect.Method provides a way to invoke any imaginable method, but with two serious problems. First, there is no way to express general adapters as reflective methods, since an adapter is implemented (usually) as a multi-use method partially controlled by data, which indirectly calls its ultimate target method. But reflective methods do not have a data component; they are good only for a single use. The second problem with reflective methods is that they make one descriptor (type signature) serve for all, with a large simulation overhead. 
Primitive types must be boxed, as must return values, thrown exceptions, and the argument list as a whole. Reflective methods trade away the JVM’s native invocation sequence in exchange for genericity over calling sequences.

Polymorphism without simulation

These unpleasant alternatives point to an as-yet unexploited middle ground in the JVM, of true polymorphism over calling sequences, which execute natively and without simulation overhead. Consider a hypothetical invokesignature instruction, like an invokeinterface but without an associated interface. The adjusted restrictions on the receiver would be:

the receiver implements the desired method
the method has the name and descriptor specified by the invokesignature instruction

This hypothetical instruction (by itself) does not meet the other requirements in this note, especially those pertaining to linkage. In order to link such a call programmatically, it must be possible to create target methods for the call to branch to, and these will have to be objects in their own right. (Else adapters become impossible.) In a true dynamic invocation scenario, there are two receivers, the nominal receiver of the call (which is probably just a distinguished first argument), and the target method whose guards have been satisfied (if any). As noted above, the target method may be an adapter, a direct reference to a real bytecoded method, or something complex like a closure over a tree-walking interpreter.

The method name in the invoke instruction is significant to the linkage process, but the actual invocation of the target method should use a system-wide name (say, invoke) so that we don’t have to rebuild an adapter architecture for every distinct method name. 
The adjusted restrictions on the call are therefore:

the nominal receiver has no restrictions
the target method implements a method named invoke
the target method accepts the nominal receiver and other arguments according to the instruction’s descriptor, perhaps with small variations

Since descriptors do not play a central role in linking such calls, it is reasonable to relax the requirement that caller and callee descriptors match exactly. We must preserve both the (likely) intent of the caller and the type-integrity of the system, as checked by the verifier. But, just as the verifier allows some shift between the stacked types in a frame type-state and the types in a call descriptor, the JVM can allow certain safe shifts, such as:

boxing and unboxing
conversion of a reference to one of its supertypes
casting of a reference to a subtype

That last is slightly doubtful, since it could fail. But it is a convenient feature in many languages, and would ease the bootstrap method’s task of creating adapters, since they could then be simplified by type erasure. It is a requirement if target methods are to “play nicely” with Java’s type erasure rules.

There are only a few more bits to look at before we can begin to imagine how a bootstrap method might link target methods into a dynamic call site, and those pertain to method handles.

Method Handles

So, a bootstrap method is asked to provide a target method for a dynamic invocation site. What happens next? In the case where the dynamic language is making a Java-like call (perhaps to a system-level object like a String or OutputStream), all we need is to emulate both the compile-time and run-time lookups to find the desired method. The bootstrap method should then link that method into the call site.

This linkage could be done with reflective methods, but this is a very limited solution. It is better to define a new kind of reference, a method handle, which refers as directly and nimbly as possible to the desired bytecoded method. 
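For readers who want something concrete, the java.lang.invoke API that later shipped with Java 7 can illustrate both ideas: a direct handle whose type is the descriptor of the underlying method, and a descriptor shift of the kind listed above (boxing plus a reference cast). This is a sketch written against that later API, not the design under discussion in this note; the class and helper names are invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; only the java.lang.invoke calls are real library API.
final class HandleSketch {
    private HandleSketch() {}

    /** A direct handle to String.length(); its type is (String)int. */
    static MethodHandle stringLength() throws ReflectiveOperationException {
        return MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class));
    }

    /** The same target seen through an erased descriptor, (Object)Object. */
    static MethodHandle erased(MethodHandle handle) {
        return handle.asType(MethodType.methodType(Object.class, Object.class));
    }

    /** Calls the handle with its exact descriptor: no boxing is involved. */
    static int lengthOf(String s) {
        try {
            return (int) stringLength().invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Calls through the erased view: the argument is cast to String, the result is boxed. */
    static Object erasedLengthOf(Object o) {
        try {
            return erased(stringLength()).invokeExact(o);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(stringLength().type());
        System.out.println(lengthOf("cathedral"));
        System.out.println(erasedLengthOf("bazaar"));
    }
}
```

Passing a non-String to erasedLengthOf makes the cast fail at call time, which is the "slightly doubtful" case mentioned above.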
What interface should it implement? Answer: Not much of one, since interfaces are not generic enough across signatures. The method handle object must behave as if it contains a method named invoke (or whatever the standard name is), with a descriptor which exactly matches the bytecoded method. The bootstrap method can return this method handle, which (as sketched above) will then allow the call to complete in the desired manner.

This is very odd, since the JVM does not anywhere else have instance-specific methods. But descriptor polymorphism, if applied to arbitrary methods, requires such odd objects. In effect, the descriptor itself serves in place of the object’s interface. Indeed, the JVM may implement method handles with such interfaces under the covers, though that may be inefficient. It is more likely that the JVM will implement calls to such things with a quick object-specific descriptor check, followed by a direct branch to the bytecoded method.

The interesting operations on method handles are the same as with any system of functional types:

creating one from an isolated method
creating one from a bit of code, with associated data values
invoking one on some arguments
creating an adapter around one
asking one for its argument and return types

It is a problem of detailed design, for another day, to explain how each of these operations can be expressed in JVM bytecode without further innovations in the JVM’s type structure. Preferably, they should be expressible in standard Java APIs. Certainly, inner classes or closures can provide a plentiful source of invoke method receivers, to use as adapters or even interpreter entry points. Also, the JVM can provide higher-order functions to perform currying, varargs management, and other similar transformations. 
These functions can be controlled via a reflective-style API that simulates argument lists with boxing, but the JVM can implement the opaque method handles themselves without the boxing.

Just as normal Java calls are checked for legal access at linkage time, dynamic calls must also be checked. When the details are worked out, we find that access checks must be done (by the bootstrap method, on behalf of the caller) when a bytecoded method handle is created. Once the handle is created, no more access checks are needed. (Access checking on every call is an additional major overhead with reflective methods.) The privilege to access a private method is given to anyone who receives a handle to that method. Is this less secure than the present system? No, because the module which created the handle must have had access to that method in the first place, and could therefore have wrapped another kind of handle around the same method, and passed it around just as freely. Method handles are just as safe as reflective methods, and inherently faster.

Conclusion

I hope the logic of this design persuades you that dynamic languages need these features. They are not a cheap weekend hack, but they will pay for themselves as the JVM expands its usefulness with the new languages that are arising today.

In short:

dynamic relinking lets languages change their application structure dynamically
programmable linkage gives full authority over method linkage to the language
programmable guards are needed to express post-linkage, per-call type checks
descriptor polymorphism lets calls run at native JVM speed, without simulation overheads
method handles give the language the full run of all bytecoded methods, at native JVM speeds
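As a rough illustration of how guards and relinking fit together, here is a sketch using the java.lang.invoke API that eventually shipped in Java 7. It is an assumption that this API is a fair stand-in for the requirements above: in particular, guardWithTest takes an explicit fallback handle rather than signalling guard failure with an exception, and the call site is driven by hand here instead of by an invokedynamic instruction. All class and method names other than the library calls are invented for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo: a guarded target installed in a call site, then relinked.
final class DynamicSiteSketch {
    private DynamicSiteSketch() {}

    static final MethodType DESCRIBE = MethodType.methodType(String.class, Object.class);

    static boolean isInteger(Object x)       { return x instanceof Integer; }
    static String describeInteger(Object x)  { return "integer " + x; }
    static String describeOther(Object x)    { return "other " + x; }
    static String describeAnything(Object x) { return "object " + x; }

    /** Looks up one of the static methods above as a method handle. */
    static MethodHandle find(String name, MethodType type) {
        try {
            return MethodHandles.lookup().findStatic(DynamicSiteSketch.class, name, type);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A guarded target: the test runs on every call and selects one of two methods. */
    static MethodHandle guarded() {
        MethodHandle test = find("isInteger", MethodType.methodType(boolean.class, Object.class));
        return MethodHandles.guardWithTest(
                test, find("describeInteger", DESCRIBE), find("describeOther", DESCRIBE));
    }

    /** Invokes whatever target the call site currently holds. */
    static String call(MutableCallSite site, Object arg) {
        try {
            return (String) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MutableCallSite site = new MutableCallSite(guarded());
        System.out.println(call(site, 42));    // guard succeeds
        System.out.println(call(site, "hi"));  // guard fails, fallback runs
        site.setTarget(find("describeAnything", DESCRIBE)); // relink the site
        System.out.println(call(site, 42));    // same site, new target
    }
}
```

In a real language runtime the fallback would typically re-enter the linkage logic and install a new target, rather than simply calling a second method.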



A Day with PyPy

Yesterday, Charles Nutter and I played host to a roving band of PyPy researchers. Their aim is to get people interested in their new techniques in dynamic language compilation, and they succeeded with us. I had a special goal to figure out how the JVM might be useful in their world.

Here are some bits from the conversation that stick in my memory...

When dynamic language implementors say “C speed” they are talking about the speed of the original C-coded interpreters of their language. The original technology is almost always an AST-walking interpreter, which is about the slowest way to go about it. By contrast, when JVM implementors say “C speed”, they are comparing a Java program with a transliterated C or C++ program, compiled, optimized, and statically linked. Those two standards are separated by two or three orders of magnitude.

The PyPy idea of promotion includes the idea of capturing runtime values for use by the compiler, by lazily growing a PIC (polymorphic inline cache). A PIC is a dynamic call site (or other branch point) which directs control flow based on selectors (often class IDs). It also keeps a record of those selectors as they appear. PyPy characterizes this as a “growable switch”. As each new selector appears, the PIC grows another branch edge, to lazily linked code which is appropriate to that selector. This is more or less classic. (HotSpot relies on interpreter profiles more than PICs, but most implementations of the forthcoming invokedynamic instruction will probably create PICs.)

Also, PyPy promotion involves lazy compilation, even at the basic block level. The code “appropriate” to a selector might be immediately linkable, or it might not yet exist. In the latter case the JIT creates the code, potentially in the full type and IR context of the caller, with the particular selector. 
That is how it is promoted: The JIT IR downstream of the switch can “see” the selector value as a compile-time constant.

Moreover, the generation does not stop with the end of the callee method (assuming the PIC was for a method call), but it continues with the next statement of the caller. In this way, the promoted selector value is available downstream of the call. The compilation is not just of the method call, but of the caller’s whole continuation from the point of the call. I think I want to call this kind of JIT a continuation JIT, because around here JITs do not use continuation passing style. (Other VMs do this; kudos.)

The cost of all this is frequent JIT runs, potential exponential code blow-up, and some trouble getting loops right. Another problem is the perennial trade-off between thoroughness and agility: The PyPy JIT runs most naturally on small pieces of code, with a correspondingly small window of context for the optimizer. To get global optimizations (such as loop transformation that HotSpot provides) PyPy code will need a second tier of recompilation, once the dynamic flow graph has settled down a bit. All in all, it is a very promising approach.

Since PyPy PICs do not return anywhere, the underlying platform really needs to supply a reliable tail-call optimization (TCO). We found some workarounds for the current set of JVMs, but the lack of TCO gets in the way of directly optimizing code produced by a continuation JIT... or a trace-based JIT.

PyPy promotion does not apply only to the class ID of a message receiver (as with classic PICs). A promoted switch selector can be a byte loaded from a bytecode stream, which is the way they partial-evaluate their interpreter loop into a templating JIT. 
A selector could also be composed from two or more receiver classes, which is how one could do CLOS-style multiple dispatch (for generic binary arithmetic, etc.).

We talked about method replacement, especially the kind where a method is replaced by a better-optimized version. The invokedynamic instruction, combined with method handles, will provide some hooks for this. The HotSpot JVM can rewrite groups of stack frames (in both directions: optimizing and deoptimizing). We realized that first-class continuations in the JVM would allow Java code (in a dynamic language runtime) to express such transformations effectively. (Not surprising in hindsight, since my prototype version of call/cc on the JVM uses vframes, which is the mechanism HotSpot uses for deoptimization.) First-class continuations could also help with stack inspection operations and coroutines.

We talked about class splitting, in two cases: boxed int and Dictionary. In the latter case, you want a special system-designed subclass for (say) string keys only. In the former case, you want a special system-supported representation for small integer values which could be packed into an immediate field in a tagged pseudo-pointer.

The intermediate representation is a graph parsed from an RPython (restricted Python) program. RPython is a true subset of Python, but one which avoids the really dynamic features. To a first approximation, it is the part of Python with Java semantics: It can be statically typed. Another restriction is that its world of types is closed, so the compiler can enumerate the type hierarchy statically. (So-called application classes, loaded dynamically, need more metadata, an extra level of indirection away.)

As with any fully dynamic language, the RPython optimizer performs scalarization pretty eagerly. (Or if you like, it performs boxing lazily and unwillingly.) This is needed for unboxing the normal Python integer type, and also for decomposing things like argument lists.
It doesn’t work so well (yet) outside the closed world of RPython.

Why does RPython make a good intermediate language? Partly because it is block-structured, typeable, easy to program in, has nice types like lists, and is a true subset of a rich parent language (Python). So far, Java is like that also. Probably Python would support little languages for specialized problem spaces like pattern matching or template generation, but perhaps the PyPy people feel that would be a programming luxury they cannot afford right now.

The main edge RPython has over Java as an intermediate language is its complete dynamic typing, which allows the same code to work on compile-time and run-time representations of objects. (It sounded like abstract interpretation to me, but they say it is more a species of partial evaluation.) There is a compile-time object space which mirrors the regular (R)Python object space.

I would think that doing abstract interpretation over compile-time booleans would work best when the if/then/else statements, etc., are represented with functions and closures. Then, as Yogi Berra said, “When you come to a fork in the road, take it.” A side effect of each branch would be generating the IR for that branch, with the boolean promoted to a constant. One of the trade-offs in RPython, though, is to eliminate closures. This simplifies variable analysis.

They were excited to hear about our nascent multi-language VM project. I expect we can collaborate there on some experiments with invokedynamic.

All in all, it was an interesting and enjoyable day.



Anatomy of a Call Site

Introduction

In the Java Virtual Machine, method calls are the way work gets done. This note is a slightly simplified description of the parts of a call site and what they do. I will also sketch some of the implications of this design on the JVM’s support for languages other than Java.

For the absolutely complete and correct details, you’ll have to read the Java VM Specification, version 3 as amended by JSR 202. If the alert reader of the account below finds an inconsistency with the JVM specification, the latter is of course correct (and I myself would appreciate an alert).

Call, in Fourteen Parts

Here are the parts that make up any method call, as found in the JVM bytecodes:

- bytecode instruction — the actual calling instruction in the bytecode stream
- symbolic name — a fixed identifier (a string)
- symbolic descriptor — a fixed tuple of argument and return types (a formatted string)
- symbolic type — a fixed symbolic reference to a type (class, array, or interface)
- symbolic method — symbolic reference to the method (if any) of the given symbolic type, name, and descriptor
- resolved type — a fixed, loaded type which matches the symbolic type
- resolved method — a fixed, loaded method which matches the symbolic method
- receiver — optional, a variable object reference
- receiver type — optional, a variable class or array type, derived from the receiver
- arguments — a variable tuple of primitive or reference values (types derived from the descriptor)
- actual method — a variable method, the actual entry point of the call (not symbolic)
- return value — an optional variable primitive or reference value (type derived from the descriptor)
- thrown exception — an optional variable exception (or other throwable) produced instead of a return value
- exception handlers — zero or more continuation points in the same method, marked by exception types

(The term “symbolic” is adopted from the JVM specification.
The JVM specification uses the term “target” instead of “receiver”, but the HotSpot JVM uses the term “receiver”, which dates back to its Smalltalk roots.)Here, a “fixed” part of the call is one which can be determined at some point before the first call. A “variable” part is one which may change over time, each time the call is executed.A bytecode instruction is one of four kinds: invokestatic, invokespecial, invokevirtual, or invokeinterface. The format of all these is substantially the same. The instruction format includes an operand field which refers to a constant pool entry that encodes the symbolic type, name, and descriptor.If the receiver is missing, the bytecode must be invokestatic. If the receiver is present and the resolved type is an interface, the bytecode must be invokeinterface. Otherwise, the symbolic type is a class or array, and the bytecode must be invokevirtual or invokespecial.The resolved type is derived from the symbolic type at some point before the first execution of the call. Likewise, the resolved method is derived from the resolved type by searching for the given symbolic name and descriptor. If these derivations fail, errors are raised, and the call is never executed. We shall generally pass over such errors with a respectful silence.Both the resolved type and resolved method must be accessible to the class containing the call instruction.If the method returns normally, it produces a return value (if not void). If the method returns abnormally, it produces a thrown exception. As a third possibility, it might never return at all.Treatment of receiverIf there is a receiver, the JVM ensures that it is not null. It does not convert the receiver in any way, although it may dynamically test the receiver’s type. 
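To make the four instruction kinds and the receiver null check concrete, here is a small Java sketch. The classes are invented for illustration, and the comments name the instruction that javac normally emits for each call; running javap -c on the compiled classes is the way to confirm this for a particular compiler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes; each comment names the instruction javac normally emits.
class CallKinds {
    static List<String> run() {
        List<String> log = new ArrayList<>();        // invokespecial ArrayList.<init>
        log.add("static " + Derived.twice(21));      // invokestatic: no receiver at all
        Base b = new Derived();                      // invokespecial Derived.<init>
        log.add("virtual " + b.name());              // invokevirtual: symbolic type Base, actual method Derived.name
        CharSequence cs = "hello";
        log.add("interface " + cs.length());         // invokeinterface: symbolic type is an interface
        Base none = null;
        try {
            none.name();                             // null receiver: rejected before any method is entered
        } catch (NullPointerException e) {
            log.add("null receiver rejected");
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}

class Base {
    String name() { return "Base"; }
}

class Derived extends Base {
    static int twice(int x) { return 2 * x; }

    @Override
    String name() {
        return "Derived over " + super.name();       // invokespecial: runs Base.name, skipping the override
    }
}
```

The call b.name() is the interesting one: the symbolic type, resolved type, and resolved method all belong to Base, while the receiver type and the actual method belong to Derived.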
In the case of invokespecial and invokevirtual, the JVM’s verifier proves statically that the receiver type is always a subtype of the resolved type.In the case of invokeinterface, the verifier allows any reference type, but the JVM performs a dynamic check on the receiver type, on every call. This check ensures that the receiver type is actually a subtype of the resolved type.From one point of view, the receiver is not an argument, because its type does not appear in the call’s symbolic descriptor. However, in the layout of the JVM stack (of the caller) and locals (of the callee), the receiver looks exactly like an initial argument before the arguments mentioned in the descriptor.We can therefore talk about the effective call descriptor of a call, which is the type of the tuple on the stack which the call consumes, plus the type of the return value which the call produces on the stack. If the call is not invokestatic, this effective descriptor can be spelled by modifying the symbolic descriptor, prepending the call’s symbolic type.A callee method also has an effective method descriptor, which is the type of the tuple initially in the JVM local variables on entry to the callee (and also describes the value eventually returned back to the caller). Again, this effective descriptor may be spelled (if the method is non-static) by prepending the method’s defining class to the method’s symbolic descriptor. The caller and callee must agree exactly on the symbolic descriptor used to make the call, but they may disagree in the first argument of their effective descriptors. This is safe because the JVM tests the receiver type so as to ensure that no improper method is called.Treatment of argumentsThe JVM does not perform conversion on arguments. The JVM’s verifier proves statically that each argument type will match the corresponding descriptor type. 
The matching is exact for primitive types in a descriptor, except that any type smaller than int can convert (with sign extension) to an int. The matching for class or array types follows the class hierarchy in the expected way. An interface type in a descriptor will match any reference argument, whether it implements that interface or not. (The JVM defers interface type checking until the reference is used as the receiver of a call.)Treatment of arguments — FuturesThere is no essential reason the JVM cannot convert the arguments, as long as the conversions “preserve information”. More specifically, they should not violate intentions shared by the caller and the callee. The JVM cannot know such intentions, but it can provide conventions which align well with the implicit conversions found in most languages.For example, if the caller passes an int and the callee receives an Integer wrapping the passed value, no type safety is violated, no information is destroyed, and the intentions of caller and callee should continue to match accurately. The Java compiler performs this conversion (called “autoboxing”) as part of method calls. The inverse conversion (“unboxing”) is also reasonable. Conversion between reference types is also reasonable: If a caller passes a String and the callee expects any Object (which includes strings), there would be no harm if the JVM allowed the different descriptors to match. The inverse conversion (casting, with the possibility of a ClassCastException) is also reasonable. Again, the Java compiler performs this conversion when it converts between generically typed methods and their erased types.A more spectacular (but still valid) conversion would be to package up some or all of the argument tuple into an object array, and pass it as a single argument to the callee. As long as the callee is expecting this format of arguments, again the structure of the program as a whole is preserved. 
The Java compiler performs this transformation for varargs methods. The inverse would also make sense: A caller could pass an object array, with the callee receiving a tuple of arguments. The Java Core Reflection API performs this conversion on every call to Method.invoke.

Here are some basic argument transformations which could be considered to be intention preserving:

- widening conversions between primitives (byte to short to int to long, float to double, etc.)
- conversion from any reference type to any of its supertypes (e.g., Object)
- casting conversion from any reference type to any subtype
- reference casting conversion to any interface type
- argument list boxing (zero or more arguments to an object array received by a varargs method)
- argument list unboxing (an object array to zero or more arguments, passed to a non-varargs method)
- converting a receiver to or from an initial argument (preserving the effective descriptor)
- adding a new argument (e.g., an optional argument or closure data; requires more information about callee intentions)
- removing an argument (e.g., an explicit method selector or method handle; requires more information about caller and callee intentions)

Such conversions would violate the rules of Java method resolution, but other languages would find them useful. (Maybe even a future version of Java…) In particular, dynamic languages routinely convert back and forth between more and less specific call descriptors. For example, Lisp’s APPLY function performs argument list unboxing. Any kind of bridge from a dynamically typed language to Java APIs is likely to perform many unboxing and cast conversions on arguments passed to regular Java methods.
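A few of these conversions can be tried out with the java.lang.invoke API that later shipped in the JDK. The sketch below is an after-the-fact illustration written against that API, not something this note assumes to exist; the class and the two callee methods are invented. It shows boxing plus reference widening (asType), argument list unboxing (asSpreader), and argument list boxing (asCollector).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ConversionSketch {
    // Hypothetical callees, invented for this illustration.
    static String describe(Integer n, Object o) { return n + ":" + o; }
    static int count(Object... items) { return items.length; }

    static String[] run() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle describe = lookup.findStatic(ConversionSketch.class, "describe",
                    MethodType.methodType(String.class, Integer.class, Object.class));
            MethodHandle count = lookup.findStatic(ConversionSketch.class, "count",
                    MethodType.methodType(int.class, Object[].class));

            // Boxing (int to Integer) plus widening of a reference (String to Object).
            MethodHandle converted = describe.asType(
                    MethodType.methodType(String.class, int.class, String.class));
            String a = (String) converted.invokeExact(7, "seven");

            // Argument list unboxing: one Object[] becomes two positional
            // arguments, each cast to the callee's parameter type.
            MethodHandle spread = describe.asSpreader(Object[].class, 2);
            String b = (String) spread.invokeExact(new Object[] { 8, "eight" });

            // Argument list boxing: three positional arguments are collected
            // into the single Object[] that the callee expects.
            MethodHandle collected = count.asCollector(Object[].class, 3);
            int c = (int) collected.invoke("x", "y", "z");

            return new String[] { a, b, String.valueOf(c) };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", run()));
    }
}
```

In each case the caller's descriptor and the callee's descriptor differ, and the adapter in between does only the kind of low-level, information-preserving work listed above.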
Although specific languages are likely to require even more types of argument conversions (such as between strings and numbers, or lists and arrays), it is likely that the JVM can reduce their implementation complexity by providing a basic repertoire of conversions between calling sequences, including some of the above conversions.

All this about implicit conversions of arguments assumes that there is some way of linking a symbolic method with one descriptor to an actual method of a different descriptor… read on for that part!

Treatment of return value

There is a symmetry between the treatment of outgoing argument values and incoming return values. Both are subject to the same type restrictions, as mediated by the descriptor.

The JVM’s verifier proves statically that the returned value matches the descriptor type of the method doing the return, except for interface values.

Treatment of return value — Futures

Return values could be subjected to intention-preserving transformations much as arguments could be.

More interestingly, there is no fundamental reason the JVM cannot return several values from a single call. It would be possible to slightly extend the syntax of method descriptors to allow several return values to be specified just as several arguments can be. This would be useful for languages that feature tuples; it would allow compilers to avoid boxing a tuple value on return from a method.

For languages which support dynamically selected multiple value returns (e.g., Common Lisp), a varargs return convention would be simple to create, corresponding to the varargs argument passing convention already in the JVM. Conversion between varargs and positional value passing would be intention preserving for return values just as for argument values.

Treatment of exceptions

The verifier does not check exception types. The JVM is always ready to receive any throwable from any call, without distinction.
Exception handlers are defined by a per-method table indexed by bytecode index ranges, and a handler is often shared by several call instructions. A thrown exception which does not match any handler terminates the calling method abruptly, directing control to its caller. This process of popping the stack continues until a handler is found or the thread stack is completely emptied.

Determination of actual method

In the case of invokespecial and invokestatic, the actual method is just the resolved method. Bad things happen if the symbolic method cannot be resolved, or if the actual method is abstract or native and not loadable. After resolution, the call site can jump directly into the actual method.

In the case of invokevirtual and invokeinterface, the actual method is derived from the resolved method by searching for an override in the receiver type.

These rules ensure various type safety properties of the JVM. Here are some crucial ones:

- No actual method will be passed a receiver of an unexpected class.
- If one method overrides another, the overriding method gets control first (except with invokespecial, which has other limitations).
- The formal parameter types of every method may be trusted (except for interface types). In particular, if a formal parameter is a class, every actual argument will be that class or a subclass.

Oddly enough, interface types per se are never statically verified. The JVM verifier (though not the Java language) will allow any reference to be passed as an actual argument under an interface in a descriptor; it does not attempt to prove interface types.
This is why invokeinterface must always perform a dynamic type check.

Here are more details: A method M defined in a concrete class C overrides a method N in a class or interface B if the following are all true:

- C is a subtype of B
- C can access N
- M and N have identical names and descriptors
- C and B mean the same thing by every name in M’s descriptor
- M and N are neither private nor static
- N is not final

The accessibility restrictions complicate things a bit, since package-private methods with the same name and descriptor and a common superclass can be mutually inaccessible, and therefore neither overrides the other.

Since two independent class loaders can, under the same class name C, load two distinct (and incompatible) types, purely symbolic descriptors are not strong enough to ensure the type safety of arguments passed under the name C. That is why there are extra checks for the meaning of names found in method descriptors. These checks are called signature constraints, and that’s all I want to say about them in this note.

Determination of actual method — Futures

A JVM call instruction can fail in various ways when its symbolic method does not correspond to a resolved method. Currently, this raises an error of some sort (about which we are being vague), but such a mismatch also provides room for interesting extensions. Most dynamic languages have a “message not understood” hook, which can be used to assign meaning to method calls that have no built-in meaning. In a specific language, such a hook will usually contain reflective code provided by a library writer (sometimes an end user) which looks around for a way to satisfy the caller’s intentions in a less literal way.
For example, many languages provide a way for a dictionary object (Java calls it Map) to implement any protocol as long as the dictionary contains a suitably keyed closure for each of the protocol’s methods.

In a multi-language VM (and with the JVM, specifically), such a hook needs to be placed on call sites, not on specific objects, because it is the call site that is most directly tied to the intentions of its language. An object might be shared between several languages, and in fact it would be a sign of weak VM design if each language had to implement its own types like Integer, String, Object, etc. Therefore, instead of several languages competing to define the API of shared types like Integer, each language needs its own hook for extending the shared APIs.

There is another difference between the JVM and single-language systems which bears on the design of this hook, and that is the JVM’s strong distinction between reflective and normal method invocation. As discussed below, reflective calling sequences are slower and more complex because they perform many steps of boxing, unboxing, dispatching, and access control on every call. With normal JVM calls, these steps, if done at all, are finished in a linkage step before the first call executes. A “message not understood” hook appropriate to the JVM needs to work this way also: It needs to perform its linkage work once before a number (potentially unlimited) of actual method calls. When the call to the hook delivers an actual method to be used, this method should be associated with that call site and reused for similar calls in the future.

A final difference between the JVM and a specific language’s runtime is that the details of the “message not understood” hook should not be tailored to one language, but should rather be a general and flexible means of satisfying method calls. Single inheritance or even single-argument dispatch are too limited a range of functionality, especially for dynamic languages.
(Consider the case of an extensible “add” operation in a symbolic algebra library.) This means that there needs to be a low-level convention for associating the receiver and argument types with an actual method that has previously been associated with those types, and can be reused in the future without further up-calls to the hook.

We will call this matching actual method the applicable method. It is characterized not only by a particular method to invoke, but also by a set of guards (argument and receiver types, or other constraints) which must be satisfied if the actual method is to be run. Because the guards can fail, there is the logical possibility that a given call site might have several applicable methods, with distinct guards.

Thus, the best future shape of a “message not understood” hook for the JVM is probably a bootstrap method with the following characteristics:

- attached by each language-specific compiler to (some of) that language’s call sites
- called when the call site’s symbolic method cannot be resolved by the JVM using normal rules (which are common to all languages)
- called with information about receiver and argument types
- returns an applicable method, including a guard to test applicability

A bootstrap method would be invoked, at a compiler’s request, when the JVM cannot link the call site normally, nor can it find a previously used actual method that is applicable. The returned applicable method would be cacheable (or not!) and reusable when the applicable method’s guards permit.

Note that the applicable method could differ in its symbolic name and symbolic descriptor from those of the call site. This is a great difference from the current JVM behavior, which uses exact mapping of name and descriptor to drive method linkage. The name (in this setting) is irrelevant, but the descriptors must be reconciled somehow, or the JVM will no longer be type-safe.
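The hook-plus-guard shape sketched above can be approximated today in plain Java with the java.lang.invoke API that later shipped in the JDK. The following is only a sketch under loud assumptions: every name in it (GuardedCallSite, hook, hasClass, greetString, greetInteger) is invented, an ordinary static method stands in for the bootstrap hook, and a MutableCallSite stands in for a real call instruction. It is not the JVM-level mechanism this note proposes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// A sketch only: every name here is invented, and an ordinary Java method
// stands in for the hook that the text imagines inside the JVM.
class GuardedCallSite {
    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    static final MethodType SITE_TYPE = MethodType.methodType(String.class, Object.class);
    static int hookCalls = 0;   // counts up-calls to the hook

    // Two actual methods, chosen by receiver class.
    static String greetString(String s) { return "string " + s; }
    static String greetInteger(Integer i) { return "integer " + i; }

    // The guard: does the receiver have exactly the expected class?
    static boolean hasClass(Class<?> expected, Object receiver) {
        return receiver != null && receiver.getClass() == expected;
    }

    // The hook: find an actual method for this receiver, pair it with a guard,
    // install the pair at the call site, then complete the current call.
    static String hook(MutableCallSite site, Object receiver) throws Throwable {
        hookCalls++;
        Class<?> type = receiver.getClass();
        String name;
        if (type == String.class) name = "greetString";
        else if (type == Integer.class) name = "greetInteger";
        else throw new IllegalArgumentException("message not understood: " + type);

        MethodHandle actual = LOOKUP
                .findStatic(GuardedCallSite.class, name, MethodType.methodType(String.class, type))
                .asType(SITE_TYPE);   // reconcile the descriptors with a cast
        MethodHandle guard = MethodHandles.insertArguments(
                LOOKUP.findStatic(GuardedCallSite.class, "hasClass",
                        MethodType.methodType(boolean.class, Class.class, Object.class)),
                0, type);
        // If the guard fails, fall back to whatever was linked before
        // (older guarded methods, and ultimately this hook).
        site.setTarget(MethodHandles.guardWithTest(guard, actual, site.getTarget()));
        return (String) actual.invokeExact(receiver);
    }

    static MutableCallSite newSite() {
        try {
            MutableCallSite site = new MutableCallSite(SITE_TYPE);
            MethodHandle hook = LOOKUP.findStatic(GuardedCallSite.class, "hook",
                    MethodType.methodType(String.class, MutableCallSite.class, Object.class));
            site.setTarget(hook.bindTo(site));
            return site;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String call(MutableCallSite site, Object receiver) {
        try {
            return (String) site.getTarget().invokeExact(receiver);
        } catch (RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MutableCallSite site = newSite();
        System.out.println(call(site, "a"));   // hook runs and links greetString
        System.out.println(call(site, "b"));   // guard passes, no up-call
        System.out.println(call(site, 5));     // guard fails, hook links greetInteger
        System.out.println("hook calls: " + hookCalls);
    }
}
```

The point of the design is visible in the counter: the hook does its lookup work once per receiver class, and later calls with the same class go straight through the guard to the linked method. The chain of guards that accumulates at the site is the "several applicable methods, with distinct guards" case.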
The bootstrap method could always make a mistake, and return a grossly mismatched method; this is logically equivalent to a linkage error.More interestingly, the bootstrap method could return an applicable method whose descriptor differs from that of the call site, but only by low-level argument conversions, such as were discussed above. In this case, the call sequence itself should include the necessary argument conversions, without further ceremony.Some of these features can be intrinsic to the JVM. Others could be implemented by low-level Java code, either specific to one language or (hopefully) shared by a group of languages. Some descriptor reconciliation is so obvious and low-level that it can be done by the JVM, while language-specific conversion must be handled by the bootstrap method. For example, a language which can convert numbers to strings should call Object.toString in a bridge method that then invokes the intended actual method, but it is the bridge method that must be returned from the bootstrap method to the call site.Actual method structureAn actual method has the following parts relevant to its invocation by a call instruction:formal receiver type — an optional fixed reference type (absent in static methods)formal parameter types — a fixed tuple of primitive or reference typesformal return type — an optional fixed primitive or reference typestack frame — a new block of storage for the incoming receiver and arguments, plus various temporariesbytecode — a series of bytecode instructionsThe actual method’s bytecode is executed in its own stack frame, whose first few locals are initialized to the incoming receiver and arguments. Eventually, a return instruction may pass a value back to the caller, or an exception may be thrown back to the caller.Actual method structure — FuturesIf call site linkage is extended to include method handles, then the actual method would be a method handle, an object in its own right which would be invoked. 
It is natural to think of such an actual method as an arbitrary closure, an object which the JVM would call in order to fulfill the intention of the call site. If the actual method is in fact literally a method handle, then the JVM would immediately call that method (either virtually or statically, as the case may be).

Sometimes complex or language-specific argument transformations are needed before the intended actual method is entered. (By “intended” I mean the one which the programmer thinks is getting called directly.) These should be handled by a bridge method (aka adapter method). The bridge takes control, adjusts the arguments, calls the intended method, and then adjusts any return values or exceptions on the way back. When it is linked into a call site (by the bootstrap method) it must take a form compatible with a plain method handle. This implies that method handles and closures should be very similar in form, and to some degree interoperable.

Such bridging of calling sequences is complex to describe, but straightforward to implement, and to execute. There is one relevant optimization which is not obvious, and that is tail recursion elimination. When a call must convert arguments but not return values, it is most efficient for the eventual method and the bridge method to share a stack frame. This can be described (from the bridge method’s point of view) as a tail call at the end of the bridge method to the eventual method. When this tail call completes, the control stack looks as though the caller had directly called the eventual method, and the eventual method will return directly to that caller.

As I have pointed out in another note, tail calls are useful in their own right, if the language makes firm promises about them. (If they are an optional optimization, they are less useful, because programmers cannot rely on them to code up state machines such as loops.)
A call is a tail call when the caller offers up its own stack frame to be reused by the callee, instead of requiring the callee to create its own stack frame.Another future form of method target could be a continuation or coroutine. Stack frames could be allocated on the heap, and/or serialized and deserialized in groups to secondary data structures. But this note is already too long to talk about those ideas.Type safety and access controlThe JVM is type-safe. This prevents bad behaviors such as the following:inspecting the machine-level representation of a referenceforging a reference from a primitive valueinvoking a method on a receiver that does not implement that methodaccessing a field from an object that does not possess that fieldThese false steps, if allowed, could crash the JVM, expose private data, or allow an attacker to perform privileged actions.Type safety depends on many factors, such as the fact that the JVM heap is automatically managed and each heap object has a specific type which is easy to check and cannot be changed. The type safety of calls derives from the correct matching of arguments and receiver with the actual method’s formal parameters and formal receiver. 
In particular, the values consumed from the caller’s stack must agree in number and type with the values stored in locals on entry to the callee.

The JVM also provides certain assurances about access protection, for example:

- a private method can only be called from its own class
- a private field can only be read or written from its own class
- a final field can only be written from its own class
- a package-scoped method, field, or type can only be referenced from within its own package
- protected methods or fields can only be referenced in certain ways that restrict use to subclasses
- method overrides are reliable, except for privileged super invocations

(More details: A protected member of class D can be accessed from a class C that is a subclass of D or in the same package as D, and there are additional constraints on the symbolic type and receiver type. An overridden method in class A can be invoked, via invokespecial, on a receiver whose class C overrides that method, but only from within some class B between A and C inclusive.)

The effect of access control is that programmers can mark parts of their code so as to prevent untrusted code from touching those parts. The high-level rule is that nobody can request a non-public service or make a non-public state change except parties that have a right to do so. Of course, such privileged parties can, as mediators, provide those actions to the public. This is why, for example, many private fields have public getter methods.

(For many of us, in practice, the benefit of access control is not so much self-protection as protection-against-self. I don’t trust the code I write next month to interact properly with some finely-balanced algorithm I wrote this week, so I often put fences around chunks even if there are no security problems that could arise. And the same goes between me and my esteemed colleagues, both ways.
Good fences make good coders.)

Type safety and access control — Futures

There is not much need for an overhaul of the JVM’s type and access control model. But there are a few points to mention, in connection with the theme of enhancing method calls.

Certain aspects of Java inner classes, and closures, would be easier to compile if the JVM let groups of classes share private methods or fields, and this is true of non-Java languages also.

If the JVM were to allow separate method loading (“autonomous methods”), then such a method could benefit from sharing of privates, if the system gave permission to load it into a host class’s interior. For example, a debugging method could be adjoined to String, which would give it access to String’s private fields.

It is quite helpful (when making proofs of correctness or security) that a JVM object cannot change its class. Languages which provide objects with typestate or mutable classes are generally implemented with an extra level of indirection, and the JVM will accommodate such patterns reasonably. There may be a call in the future for “class splitting” or “class changing” features, where a JVM object can (in some limited and structured way) modify its class pointer. An example of class splitting could be refining the raw type List into a number of types List<Integer>, etc., so that list objects can “know” what their creator intended their element type to be. The array types native to the JVM are something like a set of splits from the raw array type, an array of Object.

The most aggressive enhancement to JVM object mutability is probably Smalltalk’s become method, which replaces one object (everywhere) by another. This feature certainly interferes with many optimizations, and requires expensive and pervasive checks (akin to null checks). But it could be the right answer if the JVM were to support lazy functional programming, or languages with “futures”.
The JVM’s GC could give special help in updating all references to an object which had changed its identity.Reflective modelAny call instruction (except invokespecial) can be emulated via the method java.lang.reflect.Method.invoke. The key emulated parts are as follows:resolved method — the receiver of the invoke callreceiver — the first argument to the invoke call (which is ignored if the call has no receiver)arguments — an array of reference values (primitive values are emulated by wrappers like Integer)actual method — computed from the resolved method and receiver type as in the normal casereturn value — a reference value (primitive values and void are emulated by wrappers and null)thrown exception — if the actual method throws an exception, it is wrapped in an InvocationTargetException, which is then thrownVarious error conditions are reported via thrown exceptions. A few errors are logically unique to reflection:null method object, null or wrong length argument arraytype mismatch between formal parameters and actual arguments (reflection does not use descriptor matching)type mismatch between receiver and actual method class (reflection does not use the verifier)Since reflection does not use symbolic method references, reflective calls cannot produce symbol resolution errors.Reflection can produce linkage errors if the caller of the invoke method cannot access the resolved method. These access checks are performed dynamically, every time Method.invoke is called. They can be quite expensive, since the invoke method must walk the stack to identify its caller. Method objects can be configured to omit these checks.Method objects are not only used for invoking methods, but also for responding to a wide variety of symbolic queries. For example, even if you have no privilege to invoke a particular method, you can still ask for its name, parameter and return types, and other information. 
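Here is a small sketch of the reflective calling sequence just described, using only java.lang.reflect. The Counter class is invented for illustration. It shows the boxed argument and return value, the wrapping of a thrown exception, and a few of the symbolic queries a Method object answers.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;

class ReflectiveCall {
    // Hypothetical target class, invented for this illustration.
    public static class Counter {
        private int total;

        public int add(int n) {
            if (n < 0) throw new IllegalArgumentException("negative");
            total += n;
            return total;
        }
    }

    static String run() {
        try {
            // The Method object plays the part of the resolved method.
            Method add = Counter.class.getMethod("add", int.class);
            Counter receiver = new Counter();

            // The receiver is the first argument to invoke. The int argument
            // is boxed into an Object[], and the int result comes back boxed.
            Object result = add.invoke(receiver, 5);

            // An exception thrown by the actual method arrives wrapped.
            String wrapped;
            try {
                add.invoke(receiver, -1);
                wrapped = "none";
            } catch (InvocationTargetException e) {
                wrapped = e.getCause().getClass().getSimpleName();
            }

            // The same object also answers symbolic queries.
            return add.getName() + Arrays.toString(add.getParameterTypes())
                    + " -> " + add.getReturnType()
                    + ", result=" + result + ", wrapped=" + wrapped;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Every step a normal call site settles once at link time (boxing, unwrapping, access checks) is repeated here on each invoke.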
These purposes require data structure overheads and API complexity far beyond the simple task of method invocation.

Reflective model: Futures

There are several weaknesses in the current Core Reflection API:

- the same API is used both for symbolic description and for method invocation
- reflective methods are expensive to construct (requiring elaborate caching schemes)
- invocation is always fully boxed
- exceptions are wrapped and must be translated
- access control is performed on every call
- the invokespecial instruction cannot be emulated

All these weaknesses can be solved by introducing a lower-level JVM construct called method handles. This construct deserves a note of its own, but in short it is:

- lightweight (only enough structure to support invokes)
- narrow API (no other operations besides invoke)
- directly invocable (the invoke descriptor is that which is native to the method; no extra boxing)
- stateless (there is no setAccessible method)
- optimizable (the invoke method can be compiled down to an invoke of the target)
- unchecked (no access checking per call)

The last point (unchecked invocation) might sound like a security hole, but it is not. The checking is simply done when the handle is created, rather than when it is used. For example, a handle to a private method could only be created by code which already had the right to call that method. The alert reader can see that this aspect of method handles is exactly as secure as inner classes, which can also provide public wrappers for private methods.

The main difference between method handles and an emulation with inner classes is a simplified API. Specifically, there is neither an interface which holds the invocation descriptor, nor a class which holds the bridge method. Instead, the JVM simply provides a direct connection between any callee method and its eventual caller.
This leads to fewer overheads when defining and using method handles.

Just as the ldc instruction was extended to apply to Class constants (in JSR 202), it might be natural to extend it to apply to method handle constants in a JVM which supports method handles. Thus, there would be two interesting things to do with a symbolic method: call it, or get a handle to it, preparatory to calling it later. Both operations would be subject to the same linkage and access rules.

Conclusion

I hope this has been an interesting summary for those who are curious about the inner workings of the JVM. I have also tried to explain and motivate some plausible future expansions for the JVM, to support a broad array of languages. There is active work and discussion in this area, with JSR 292, and with the jvm-languages group at googlegroups.com. Stay tuned!
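
As a postscript to the reflective model sections, here is a minimal sketch contrasting a call through Core Reflection with a call through a method handle. It assumes the java.lang.invoke API that later shipped in Java 7 as the outcome of JSR 292; that API differs in detail from the construct outlined in this note. The class CallDemo and its method twice are hypothetical names used only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class CallDemo {
    // Hypothetical target method, used only for this illustration.
    static int twice(int x) {
        if (x < 0) throw new IllegalArgumentException("negative");
        return 2 * x;
    }

    // Core Reflection: boxed arguments, boxed result, wrapped exceptions.
    static int viaReflection(int x) throws Exception {
        Method m = CallDemo.class.getDeclaredMethod("twice", int.class);
        Object boxed = m.invoke(null, x);   // receiver is ignored for a static method
        return (Integer) boxed;
    }

    // Method handle: access is checked once, at lookup time; the call itself is unboxed.
    static int viaHandle(int x) throws Throwable {
        MethodHandle h = MethodHandles.lookup().findStatic(
                CallDemo.class, "twice", MethodType.methodType(int.class, int.class));
        return (int) h.invokeExact(x);      // call-site descriptor is (int)int, same as the target
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(viaReflection(21));
        System.out.println(viaHandle(21));
        try {
            viaReflection(-1);
        } catch (InvocationTargetException e) {
            System.out.println("reflection wraps: " + e.getCause());
        }
        try {
            viaHandle(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("handle throws directly: " + e);
        }
    }
}
```

The two failure paths show the exception-wrapping weakness listed above: the reflective call surfaces an InvocationTargetException, while the handle propagates the original exception unchanged.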



JSR 292 meeting summary

From: John Rose
Date: October 18, 2007 12:22:49 AM PDT
To: jsr-292-observers@jcp.org
Subject: JSR 292 EG meeting summary

Hello, JSR 292 observers.

The Expert Group has been quietly considering designs for an "invokedynamic" instruction for about 3 months.

Yesterday (10/17/2007) we had a "kickoff" meeting to discuss our next steps.

Here are some highlights from that meeting:

- We are making significant changes to the JVM instruction architecture. The occasion for this is a crop of new dynamic languages. There are business reasons for the JVM to support these languages.
- Liz Kiener of the JCP.org PMO introduced us to the Java Community Process (JCP).
- EG consensus: We want public review early. This means producing an EDR (early draft review required by JCP).
- To make the current Proof of Concept design public, we need to pass the "red face test". That is, the design shows a direction that we as an EG think is worth explaining and improving. Since this is not a voting milestone, the EDR spec. can be incomplete and have unresolved issues.
- The original JSR 292 language includes not only invokedynamic but also some sort of class modification or extension. Based on recent experience, what's needed beyond invokedynamic is probably some sort of lightweight behavioral extension (method handles, autonomous methods, etc.). Further discussion is required.
- An OpenJDK open source sub-project is going to help create the RI (reference implementation required by JCP) for JSR 292. This project (the MLVM) is likely to include other changes, and it is up to our EG which changes we think are ready to standardize and which to push off to MR (maint. rel.) or another JSR.
- We should use jvm-languages@googlegroups.com as a sort of early comment source, to help us shape our work.
- We will continue to work on EDR readiness and meet again soon.

Best regards,
-- John Rose
Chair, JSR 292
http://jcp.org/en/jsr/detail?id=292

P.S. To join the observer list, send an Email message to listserv@java.sun.com. The subject doesn't matter, and the body must look like this (replace the example name with your own name):

    SUBSCRIBE JSR-292-OBSERVERS Jim Andrews aka Dynamo



Scope ambiguities between outer and super

Or, combing out a stubborn tangle.

Gilad Bracha has published a nice paper reviewing an interaction between inheritance and block scoping, specifically the problem of deciding whether a simple (unqualified) name comes from a superclass or from an outer scope. This issue only arises in languages which can nest classes within block scopes, and intermix regular definitions with subclass definitions. This issue arose for Java when we added nested classes.

Similar problems arise any time two definitions of the same name come into scope in one place. Examples include shadowing local variable definitions, import statements, multiple inheritance, or method overloading. The usual way to resolve ambiguity in a name is to arrange the candidate definitions in some sort of order of specificity. (It need not be a total order, but total orders are usually the simplest choice. If there is a total order, it can be viewed as a search order, in which potential definition sites are searched.) Then the language uses the "most specific" definition, if there is a unique one, else the program is in error. When such rules apply to block structure, they almost always prefer more recent, more "inner" definitions over earlier, "outer" definitions. (This is more or less what block scoping is.)

When this generic practice is applied to a combination of block scoping and inheritance, the current class at a given scope level imports its inherited names, as if they had been defined along with the current class. (A subclass definition can be viewed as including an import or even a reassertion of the superclass's definitions. Try following this logic to give a meaning to block-local import statements in Java, and you'll see some interesting language extensions.) Thus, we look for definitions in an inner class's superclass chain before consulting the next-most-inner class's superclass chain, and so on out to any top-level superclass chain.
As Gilad points out, this has been called "comb semantics" for identifier lookup, because a comb-shaped (single-spine) tree is being searched.

A decade ago, I briefly contributed to this topic in an aside in the original whitepaper from Sun:

Sometimes the combination of inheritance and lexical scoping can be confusing. For example, if the class E inherited a field named array from Enumeration, the field would hide the parameter of the same name in the enclosing scope. To prevent ambiguity in such cases, Java 1.1 allows inherited names to hide ones defined in enclosing block or class scopes, but prohibits them from being used without explicit qualification. [Sun, Inner Classes White Paper, 1997]

This rule has been dubbed the "Mother May I" rule, because when a programmer uses a name with this ambiguity (between outer and super), the compiler reports an error, and requires the programmer to supply a more specific intention, by providing a qualifier which refers either to the outer scope or the inherited scope. (If there is no such qualifier possible, the intended variable is a local, and it may therefore be renamed to remove the ambiguity.) Instead of using the classic total order (the comb rule) to resolve the ambiguity, the compiler refuses to pick between the alternatives, and (with a mildly annoying maternal nudge) requests the programmer to make an explicit choice.

This rule rejects ambiguity by refusing to totally order the choices. It is something like Java's original rule for local variable shadowing. Java does not allow two definitions of the same local variable, when one is in the scope of the other. The error is reported when the inner definition is parsed, but the rule can also be thought of as a refusal to linearly order the two definitions, so that any use of the name in the shared scope fails to match to a uniquely most specific definition.

The "Mother May I" rule was enforced in the 1.1 through 1.3 versions of the Java language.
It was not always perfectly enforced in 1.1; see for example the 1997 Sun bug 4030368 against javac, which includes a concise code example. If you try this example in version 1.3 or earlier, you'll get a message like this:

    InheritHideBug.java:10: m() is inherited from InheritHideBug.S
    and hides method in outer class InheritHideBug. An explicit
    'this' qualifier must be used to select the desired instance.
        m(); // BAD
        ^

In 1.4 or later versions of javac, the compiler will quietly use the total order from the comb rule, and (like the kind of mother who puts your things away for you) selects the inherited method as the more desirable choice. Is this a recurrence of bug 4030368? Not exactly; the "Mother May I" rule defined in 1.1 was quietly removed in 1.4 when generics were introduced. I disagreed with this choice when it was made, but I was working on different things (the Hotspot JVM) at the time, and thus was born a small subspecies of Java puzzler.

I ran into this subspecies again a few months ago, during a visit to Sun by Neal Gafter, who cited it as an interesting source of bugs arising from inner classes, a problem that can be repaired by closures. (As fate would have it, this was in Sun's "Mother May I" conference room. I am not making this up.) I rejoined that this used to be a solved problem.

I'm glad that Gilad has taken up this problem, and I agree (since 1997) with his contention that, in a conflict between an unseen inherited name and a visible (or global) definition in an outer scope, the programmer usually wants the uplevel name, not the inherited name. As his current paper's conclusion says,

We argue that the classic "comb" semantics whereby inherited members may obscure lexically visible ones are counterintuitive. Raising this issue is itself a contribution of this paper. Beyond that, we have demonstrated a solution in the context of Newspeak, a language derived from Smalltalk.
We believe that lexically visible definitions should be given priority over inherited ones, either implicitly via scope rules or by requiring references to inherited features to explicitly specify their intended receiver. [Bracha, On ... Inheritance and Nesting, 2007]

I contend that the "Mother May I" rule (an instance of the last option he mentions) provides the best way to simply inform the programmer of lurking ambiguities, and to protect references to uplevel names from being silently captured by superclasses.

You may be wondering why Sun took out this rule in 1.4, about five years ago. The answer is also stranger than I could make up. Gilad was responsible for writing the updated specification for Java which included generics. (As both a language wonk and a programmer I enjoy the brilliant work he did adding generics to Java.) He removed the rule from 1.1, apparently preferring the simplicity of the comb rule. When I asked him to reconsider, he refused. At this point the term "Mother May I rule" was first applied to this problem, though each of us thinks the other coined this usage. (Gilad recalls that I used that phrase generically for a variety of rules that Java imposes on programmers for their own protection, whether they want it or not. For my part, I prefer to think in terms of ambiguity hazards and having the compiler press the programmer for clearer formulations.)

It is good that Gilad has taken a deeper look at this problem, and that his new language will not suffer from Java's problems in this matter.

Better yet, thank you, Gilad, for re-raising the issue. I see that it's past time to file a bug and ask for the return of the "Mother May I" rule in Java.
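
For readers who want to see the comb rule in action, here is a small sketch. It is not the code from bug 4030368; the names CombDemo, S, and Inner are hypothetical, chosen only to reproduce the outer-versus-super collision discussed above. On a compiler that follows the comb rule (javac 1.4 and later), the unqualified call compiles without complaint and binds to the inherited method.

```java
class CombDemo {
    // Hypothetical superclass whose method name collides with an outer method.
    static class S {
        String m() { return "inherited S.m"; }
    }

    // The lexically visible "uplevel" definition.
    String m() { return "outer CombDemo.m"; }

    class Inner extends S {
        // Unqualified use: the comb rule searches Inner's superclass chain first,
        // so this silently selects S.m rather than the enclosing CombDemo.m.
        String unqualified() { return m(); }

        // The explicit qualifiers that the "Mother May I" rule would have demanded.
        String explicitOuter() { return CombDemo.this.m(); }
        String explicitSuper() { return this.m(); }
    }

    public static void main(String[] args) {
        Inner in = new CombDemo().new Inner();
        System.out.println(in.unqualified());    // inherited S.m
        System.out.println(in.explicitOuter());  // outer CombDemo.m
        System.out.println(in.explicitSuper());  // inherited S.m
    }
}
```

Under the 1.1 rule, the body of unqualified() would have been rejected, and the programmer would have had to write one of the two qualified forms.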



tuples in the VM

Or, juggling with more than one ball.

Introduction

For several years there have been ideas afloat for marking Java classes as "value-like", and treating them like (named) structs. This seems to be a necessary part of an initiative in Java to support types like Complex necessary to numerical programming. There have also been proposals for pure "tuple" types (e.g., ordered pairs (T,U) for any types T, U). This note proposes some incremental changes to the present JVM architecture which would support value-like classes and tuples.

The basic semantics of value classes in Java can be summed up as (a) a marker of some sort that distinguishes value-like classes from ordinary (identity-like) classes, and (b) reinterpretation of all operations in the Java language to ignore the identity of instances of the class, and instead operate on the field values as a group. Such a group of field values can be modeled as (compiled to) a tuple. In order to support name equivalence, the tuple needs to be joined (at compile time) by a "tag" which distinguishes it from similar tuples representing differently-named classes.

JVM Support for Tagged Tuples

We propose a tagged tuple signature as a new kind of signature string, consisting of a class signature followed by a series of zero or more arbitrary type signatures (including nested tagged tuples). Every tagged tuple signature is surrounded by curly braces '{}' which unambiguously separate it from any nearby signature parts.
    signature    = ONEOF "ZBCHIFJD" | 'L' classname ';' | '[' signature | tagged_tuple | plain_tuple
    tagged_tuple = '{L' classname ';' signature* '}'
    plain_tuple  = '{V' signature* '}'

A tagged tuple represents a single value of the given class (which might be a value class), represented as a series of typed values grouped in a "tuple".

A plain tuple is like a tagged tuple, except that there is no value class mentioned at the head of the signature; the character 'V' holds its place.

The character 'V' cannot occur within any type signature except at the head.

Like Java 1.5 generic type signatures, tagged tuple signatures have an "erasure" which removes type structure and normalizes down to a JVM 1.1 standard signature. The erasure of a tuple (of either type) omits the curly braces and the type signature component immediately after the open brace.

    ERASE "{Ljava/math/Complex;DD}" == "DD"

Unlike Java 1.5 generic type signatures, tuple markings are significant to the linker and verifier. Two methods with the same class, name, and erased signature are different if their unerased signatures differ. Methods of distinct unerased signatures cannot link to each other, simply because their signatures differ, as Utf8 strings.

This significance means that tools which manipulate method and field descriptors will need to be updated. The cost of such tool updates is probably the major objection to an enhancement of the basic signature language (as opposed to the creation of a second language, as in generics). It seems that any scheme that supports multiple-value returns will always require tool updates, so it may not be possible to move the enhanced signatures to another attribute, as was done with generics.

Method Calls

The method invocation bytecodes ('invokestatic', etc.) treat tuple types specially. When applied to a method whose signature contains tuples, an invoke performs the same motions of stack data as it would if calling a method with the erased signature.
For example, the following two methods have different signatures but the same calling sequence:

    invokestatic pow ({Ljava/math/Complex;DD}I)Ljava/math/Number;
    invokestatic pow (DDI)Ljava/math/Number;

In both cases, two doubles and an int are popped from the caller stack, and the returned reference (of type Number, heap-allocated) is pushed on the stack. (Note that Complex could also be returned on the heap. The VM is neutral as to stack or heap allocation policies of value classes.)

If the erasure leaves an empty string for the method return signature, the string "V" is used instead. If the erasure leaves more than one type signature for the return, the caller receives multiple values on the stack. (A new 'vreturn' bytecode is required to generate these values.)

Method Entry and Exit

Likewise, when a method with a tuple-bearing signature is entered, the values stored in the locals are of the same placement and type as if the method's signature were erased of tuples.

If the method's return signature erases to the null string, the 'return' bytecode must be used to exit the method. If the method's return signature erases to a single signature, that signature (which cannot be 'V') determines the type of return bytecode required. If the method's return signature erases to two or more type signatures, the method must use the 'vreturn' bytecode to exit.

The 'vreturn' bytecode is always a valid exit instruction, for any method whatsoever. It operates as follows:

- the returning method's return signature is inspected
- the return signature is erased of tuples
- if the return signature is 'V', it is taken to be empty

Finally, for every individual value signature in the resulting signature, in right-to-left order, a corresponding typed value is popped from the method's stack.
These values are transmitted back to the caller, and pushed onto the top of the caller's stack in the same order that they were stored on the method's stack.

Note that 'vreturn' potentially renders all other return instructions obsolete.

Object Field Layout

The following is an optional feature. It can be simulated by emitting multiple scalar fields for each tuple field in a class definition. Supporting this feature will complicate some reflective operations.

The binary representation of a class can contain a field with a tuple signature. The effect of such a signature is to allocate multiple variables for the field, one for each type signature in the erasure of the signature. These variables are stored at JVM-dependent locations; there is no guarantee that they occupy contiguous storage, though they might in some VM implementations.

The bytecodes allow these variables to be read either as a group or as individual scalar components.

(The simulation of this with standard JVM fields would replace a field 'phase' of signature '{Ljava/math/Complex;DD}' by fields named 'phase.0' and 'phase.1' with signature 'D'. In order to preserve type checking of tuple tags, the signature must be decorated by appending the tuple type itself, yielding 'D{Ljava/math/Complex;DD}' or 'D{Ljava/math/Complex;DD}'.)

Scalar Field Read and Write

The getter and setter bytecodes ('getfield', 'putstatic', etc.) continue to operate on single values, either of primitive or reference type. When applied to a field with a tuple signature, the field signature must contain an additional decimal numeral. This numeral N selects the field variable corresponding to the Nth value (zero-origin) in the erasure of the field's tuple type. (Note that nested tuple signatures flatten into a single list of primitive and reference types.)
The signature of the bytecoded reference (minus the numeric suffix) must agree with the signature of the field itself; the verifier or linker checks this.

Thus, the following field references access the two double variables of a complex field of signature '{Ljava/math/Complex;DD}':

    getfield phase {Ljava/math/Complex;DD}0
    getfield phase {Ljava/math/Complex;DD}1

Tuple Field Read and Write

The following is an optional, complex feature.

There is a new instruction prefix 'vaccess' which may precede any of the four getter or setter bytecodes. The effect is to force the access to operate on all the variables of a field, pushing them onto or popping them off of the stack. This prefix is provided as an abbreviation, and a bytecode prefixed by 'vaccess' is precisely equivalent to a similar series of scalar accesses, except that the field's unerased type must match the bytecoded reference signature. The order of the equivalent series of accesses is (as one might expect) in order from zero.

The bytecoded reference signature may be either erased or an unerased tuple. In the former case, the signature is compared against the erased type of the field. In the latter, the signature must match exactly.

Thus, the two field references of the previous example would be equivalent to the following single reference:

    vaccess getfield phase DD

All previous accesses would also be valid for a field of type '{LPoint;DD}'. However, an access with the unerased signature would signal an error for a tuple field not specifically marked as Complex:

    vaccess getfield phase {Ljava/math/Complex;DD}

Tuple Data Transfer

There are also tuple-related variants 'vload' and 'vstore' of the local variable access bytecodes. They take a signature constant pool reference in the bytecode stream, before the local number (which is always in 2-byte width).
They transfer a number of typed values between stack and locals; the precise sequence of values is determined by the signature named by the constant pool reference, which must be a CONSTANT_Utf8 string.

These bytecodes are provided as an abbreviation, and can be exactly simulated by a series of scalar loads or stores.

JVM Neutrality

It is important to note that the JVM makes no distinction between value and non-value classes. Most processing of tuple type signatures depends only on the erasure of those signatures. The chief exception to this rule is that method and field linkage requires an exact match of unerased signatures. However, all calling sequences and object layouts are driven by erased signatures.

The JVM makes no restriction on simultaneous manipulation of both heap-allocated instances of classes and their tuple representations. It is likely that the language will contemplate conversion between these representations but the JVM has no specific support for them.

The JVM allows bytecoded operations which may be inappropriate to value objects. There is no need to reinterpret these operations for value classes; it is enough for the compiler to refrain from emitting them. For example, reference comparison bytecodes are probably useless with value objects, if their semantics are intended to be free of object identity. But there is no need for 'acmp' bytecodes to call Object.equals.

Java Reflection

The Java Reflection API works mainly at the language level, and so would follow whatever conventions (if any) were settled for processing of tuples in standard Java.
Historically, much of reflection has been implemented inside the JVM, but reflective access to tuples would probably be implemented wholly in Java, on top of low-level unsafe operations and/or dynamically generated adapter classes.

It is likely that the tuple-enhanced language will provide a canonical translation between groups of typed values and references to heap objects carrying those values. For example, each value class is likely to support a constructor whose argument signature is the tagged tuple for that class. A tuple is likely to be representable on the heap as a fixed-length Object array or as a generic immutable utility class, with primitive component values in wrapper classes (as in varargs). The JVM per se does not need to decide such matters, but the reflection API must follow the lead of the language, and properly convert raw tuples into the appropriate boxed representation, when passing values into and out of methods, or when loading and storing field values. (Recall that the Java Reflection API uses 'Object' as a universal wrapper for values of all types; this should extend to both tagged and plain tuple types.)

Any low-level unsafe API for making field and method references is likely to ignore tuples, and operate only on erased types. The 'field offset' query functions of sun.misc.Unsafe must be extended to report a distinct offset for each scalar component of a tuple-typed field; this requires a variant of each query function which takes an integer parameter selecting the Nth tuple component of a field.
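
Since nearly every section above leans on the erasure rule, here is a minimal sketch of it as a string transformation. This is only one reading of the rule as stated in this note (drop each open brace together with the tag that follows it, drop each close brace, keep everything else), and it assumes well-formed input. The class and method names are hypothetical.

```java
class TupleErasure {
    // Index just past the ';' that ends a class name starting at 'from'.
    private static int endOfClass(String sig, int from) {
        int semi = sig.indexOf(';', from);
        if (semi < 0) throw new IllegalArgumentException("unterminated class name: " + sig);
        return semi + 1;
    }

    // Erase tagged and plain tuples; nested tuples flatten into one list of types.
    static String erase(String sig) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sig.length()) {
            char c = sig.charAt(i);
            if (c == '{') {
                // Drop the brace and the component right after it: 'V' or 'Lclassname;'.
                i++;
                i = (sig.charAt(i) == 'V') ? i + 1 : endOfClass(sig, i);
            } else if (c == '}') {
                i++;                               // drop the closing brace
            } else if (c == 'L') {
                int end = endOfClass(sig, i);      // ordinary class type: copy it whole
                out.append(sig, i, end);
                i = end;
            } else {
                out.append(c);                     // primitives, '[', '(' and ')'
                i++;
            }
        }
        return out.toString();
    }

    // Erase a method descriptor; an empty erased return type becomes "V".
    static String eraseMethod(String desc) {
        int close = desc.lastIndexOf(')');
        String args = erase(desc.substring(0, close + 1));
        String ret = erase(desc.substring(close + 1));
        return args + (ret.isEmpty() ? "V" : ret);
    }

    public static void main(String[] args) {
        System.out.println(erase("{Ljava/math/Complex;DD}"));                              // DD
        System.out.println(eraseMethod("({Ljava/math/Complex;DD}I)Ljava/math/Number;"));   // (DDI)Ljava/math/Number;
    }
}
```

A method whose erased return holds two or more types (for example "DD") is the case that would need the proposed 'vreturn' bytecode.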
Notes on Compiling Java with Values and Tuples

The javac compiler should strive to keep value objects and tuples in "exploded" form and only box them in the heap when a conversion requires it.

There will be occasions where the compiler will want to keep track of both representations at once, to avoid repeated conversions.

An assignment to a pure tuple component is likely to require the compiler to forget a cached box.

Sometimes a boxed value or tuple will "escape"; for example, it might be the argument to List.add(Object). If the relevant value is a purely final value class, the compiler may continue (as a matter of arbitrary choice) to cache the box and reuse it later. Otherwise, it must "give up" the escaped box, and reconstruct a new one if a further coercion is required. The reason for this is that an escaped value object with non-final fields is subject to mutation.

Value objects with final fields and those with non-final fields are both reasonable. To preserve value semantics, their usage patterns will differ, since the identity of mutable objects is observable in more ways than the identity of immutable objects, and (one presumes) any language design will strongly insulate the programmer from perceiving the identities of boxed value objects.

Mutable value objects will have the possibility of aliasing, leading to the observation of unintentionally shared changes. The language should work hard to reduce or even eliminate aliasing, by defensively copying tuple values carried by boxes. For example, this code might end up reallocating:

    List<Complex> l1, l2;
    Complex c = l1.get(0), c2 = c;
    c2.imag = 0;
    assert(c != c2); // no sharing here
    l2.add(c); // defensive copy necessary

On the other hand, the following code might mutate a box in place:

    l1.get(0).imag = 0;

This is only reasonable if there is some theory under which List elements do not alias. This complicates generic algorithms like Collections.nCopies, which freely replicates references to list components.
(If mutable value objects are tagged by an interface which includes a "defensive copy" operator, the generic algorithms can check for this.)

On the other hand, componentwise assignment to immutable types makes sense, as a shorthand for reassigning the whole tuple with one differing component. Suppose Complex is immutable; then the last two lines are equivalent:

    Complex k = Complex(3,4);
    k.imag = 3;
    k = Complex(3,3);

A Bonus Signature Hack: Setter Methods

The desire to do componentwise update on immutable objects immediately leads to a requirement for more formal correspondence between getter and setter functions, so that the following equivalence can work:

    l1.get(0).imag = 0;
    // short for and same effect as:
    Complex tem = l1.get(0);
    tem.imag = 0;
    l1.set(0, tem);

The challenge here is that the correspondence between methods like List.get and List.set is documented only in human-readable form, or perhaps in some framework built on top of the language (such as Java Beans). In order to define "in place update" of values obtained via an access method, there must be a more formal correspondence established.

This correspondence can be done in the JVM with signatures also:

    get (I)Ljava/lang/Object;
    get (I=Ljava/lang/Object;)V
    set (ILjava/lang/Object;)V

The latter two method descriptors would be (for List) made equivalent by some as yet unspecified means, and classes would be allowed to declare setter methods that are formally coupled to getter methods.

Once again, the JVM could largely ignore the extra decoration by using an erasure convention (which would eliminate the '=' in this case).
The presence of the extra character would simply disambiguate the setter from some other getter.

The language needs to supply a way to declare these setter methods, something like this:

    void get(int i, = Object x) { this.set(i, x); }
    void foo() { this.get(23) = "value #23"; }

Setter methods for immutable objects would need to return the updated version of self:

    BigInteger testBit(int n, = boolean z) { return z ? setBit(n) : clearBit(n); }
    void foo(BigInteger b) { b.testBit(23) = true; }

An alternative (less clean) technique would be to decorate the method name itself, like this:

    void 'get='(int i, Object x) { this.set(i, x); }

There is a significant retrofitting problem here (but not an unsolvable one) with all old Java APIs that feature getters and setters. A method aliasing facility, based on optional class file attributes, might be helpful.

(Note: This proposal dates from 8/2004.)
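
To show what the proposed setter pairing would have to expand into, here is a hand-written desugaring in today's Java. It is only a sketch: the "= boolean z" syntax above does not exist, so the hypothetical helper testBitSet stands in for the paired setter of BigInteger.testBit, and setBitInPlace spells out the get/update/set sequence that the List pairing would generate.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class SetterDesugar {
    // Stand-in for the proposed "testBit(int n, = boolean z)":
    // BigInteger is immutable, so the setter returns the updated value.
    static BigInteger testBitSet(BigInteger self, int n, boolean z) {
        return z ? self.setBit(n) : self.clearBit(n);
    }

    // Hand expansion of the hypothetical statement "list.get(i).testBit(n) = z".
    static void setBitInPlace(List<BigInteger> list, int i, int n, boolean z) {
        BigInteger tem = list.get(i);     // getter
        tem = testBitSet(tem, n, z);      // componentwise update yields a new value
        list.set(i, tem);                 // the setter formally paired with get
    }

    public static void main(String[] args) {
        List<BigInteger> list = new ArrayList<>();
        list.add(BigInteger.ZERO);
        setBitInPlace(list, 0, 3, true);
        System.out.println(list.get(0));  // 8
    }
}
```

The middle step is where immutable and mutable value types part ways: a mutable Complex could be updated in the box, while an immutable one must be replaced and stored back.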



tail calls in the VM

Or, how to chase one's tail without throwing up.

Introduction

Tail-calling is an old idea; see the MIT “Lambda the Ultimate” papers. It is usually treated as an optimization. If A calls B calls C, and the last action of B is to call C, then the machine resources (such as stack) are adjusted so that it is as if A had directly called C. In effect, B pops its stack frame back to A, pushes arguments to C, and jumps to C, so that when C pops its arguments and returns, it branches directly back to A.

This optimization can save instructions, stack space, and instruction cache activity, so it is often found in compilers. But as long as it’s only an elective optimization of what’s really just a recursive call, it matters only as a performance hack. Call this hack “soft tail call”.

On the other hand, a “hard tail call” is a guaranteed tail call, one which not only allows the compiler to improve the call site (from B to C then back to A), but requires the improvement. Some languages, notably Scheme, Haskell, and Fortress, feature hard tail calls, and use them to form iterative constructs, threaded interpreters, and other forms of control flow delegation.

JVM Support for Hard Tail Calls

We propose a new bytecode prefix called tailcall, which may precede any invoke instruction. This prefix has no other action than to require that the following instruction constitute a tail call, and be implemented as a hard tail call.

The byte code for tailcall is numerically the same as the wide bytecode.

The verifier statically checks the following properties:

- The callee method’s return signature is identical with the caller method’s return signature.
- The invoke instruction is immediately followed by a return instruction.
- No exception handlers apply to the invoke instruction.
- The caller method is not synchronized.
- The caller method is holding no object locks. (It has executed sufficient monitorexit instructions to undo the effect of any monitorenter instructions.)
- The callee method is accessible and linkable from the caller method.

The JVM executes the invoke instruction as if the tailcall prefix were not present, except that the JVM discards the stack frame for the caller before executing the callee. The following return instruction is never directly executed; it is present only to simplify code analysis.

The effects of removing the caller’s stack frame are visible to some APIs, notably access control checks and stack tracing. It is as if the caller’s caller had directly called the callee. Any privileges possessed by the caller are discarded after control is transferred to the callee. However, the linkage and accessibility of the callee method are computed before the transfer of control, and take into account the tail-calling caller.

Removing the caller’s stack frame prevents StackOverflowError conditions which would otherwise be eventually caused by iteratively tail-calling one or more methods but never returning.

Because of the difficulty of ensuring correct caller access, the following methods may not be the subject of a hard tail call. In particular, Method.invoke simulates access checks which properly occur when the caller links to the callee, and therefore requires the caller to be visible on the stack.

- java.lang.Class.*
- java.lang.reflect.Method.*
- java.lang.reflect.Constructor.*
- java.security.AccessController.*
- (what else?)

Soft Tail Calls

The JVM may also support a soft tail call optimization, but this must be invisible to programmers. In particular, stack traces and access control checks must take the caller into account, even if the callee has been soft-tail-called.
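
To illustrate what a hard tail call would buy, here is a sketch of the workaround that languages on the JVM can use in its absence: a trampoline, in which each would-be tail call returns a small object describing the next call, and a loop runs those objects one at a time. This is not the proposed tailcall prefix, only an emulation of its constant-stack effect; Bounce, Tramp, isEven, and isOdd are hypothetical names, and the code assumes Java 8 or later for lambdas.

```java
// One step of a trampolined computation: either finished, or one pending tail call.
interface Bounce<T> {
    Bounce<T> step();                                   // perform one deferred tail call
    default boolean done() { return false; }
    default T value() { throw new IllegalStateException("not done"); }
}

final class Tramp {
    // A finished computation carrying its result.
    static <T> Bounce<T> done(T v) {
        return new Bounce<T>() {
            public Bounce<T> step() { return this; }
            public boolean done() { return true; }
            public T value() { return v; }
        };
    }

    // The driver loop: the stack never grows, however long the chain of tail calls.
    static <T> T run(Bounce<T> b) {
        while (!b.done()) b = b.step();
        return b.value();
    }

    // Mutually recursive methods whose tail calls are returned instead of executed.
    static Bounce<Boolean> isEven(int n) {
        if (n == 0) return done(true);
        return () -> isOdd(n - 1);
    }

    static Bounce<Boolean> isOdd(int n) {
        if (n == 0) return done(false);
        return () -> isEven(n - 1);
    }

    public static void main(String[] args) {
        // A million mutual calls; plain recursion this deep would risk StackOverflowError.
        System.out.println(run(isEven(1_000_000)));     // true
    }
}
```

The cost of this emulation is an allocation per call and a loss of direct-call structure, which is part of the case for having the JVM guarantee the tail call itself.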



Truth, Beauty, Machine Code

I was struck by Roger Penrose's words in his recent book The Road to Reality, as he describes the difficulties physicists face in evaluating each other's mathematical accounts. It sounds like he knows the difficulties we software designers face:

...Mathematical coherence [let alone mathematical beauty] need not itself be readily appreciated. Those who have worked long and hard on some collection of mathematical ideas can be in a better position to appreciate the subtle and often unexpected unity that may lie within some particular scheme. Those who come to such a scheme from the outside, on the other hand, may view it more with bewilderment, and may find it hard to appreciate why such-and-such a property should have any particular merit, or why some thing in the theory should be regarded as more surprising--and, perhaps, therefore more beautiful--than others. Yet again, there may well be circumstances in which it is those from the outside who can better form objective judgements; for perhaps spending many years on a narrowly focused collection of mathematical problems arising within some particular approach results in distorted judgements!

(Road to Reality, page 1014f)

It may be enough, for some computer scientists reading that quote, just to substitute "Scheme" for "scheme".

Unlike physical models which are useless apart from experimental validation, software systems can manage and describe anything that people care about, whether it's material or not. But, given any particular goal, software designers face problems similar to those of physicists. They must effectively and robustly formulate the behavior of their systems. This requires formal systems of notation called computer languages ("code"), which have a strongly mathematical feel to them. And it is often not enough to hack together some code which gets the job done: The code must be correct, robust, and maintainable. (Correct obviously means free of defects, operating as advertised.
Robust means trustworthy, solid, even when under stress. Maintainable means suitable for further modifications, scrutable to maintainers.)

To be correct, robust, and maintainable, software must communicate clearly and effectively, and not just with the machine it runs on. It must communicate with the programmers who are responsible for its continued use, and (through its user interface) with its customers. If there is a flaw in the code, the flaw should not be hidden by the code's murky depths, but should be apparent to the programmer; it should have nothing to hide. If the code's behavior is to be simple and trustworthy, the code itself should appear so. Complex code usually has complex quirks. Code should also be orderly, not leaving the reader guessing what comes next. It should be simple, saying one thing at a time, and saying it well.

What kind of code is communicative like this? Clear, simple, coherent, expressive code. In a word, beautiful code. So why don't we just require programmers always to write beautiful code? Or physicists always to choose beautiful theories? Sometimes because beauty is at odds with truth or utility, but more often the three travel together. That is where the mysteries begin, of the ideal versus the subjective, the universal versus the parochial or the arbitrary. Who is in charge, the language or the poet? Which should we follow, the angel or the siren, and which is which... or should we be suspicious of any guide more handsome than ourselves? I salute Roger Penrose for attempting to pull such stubborn mysteries into the light.

A few more notes on the book: Driven by a weakness for audacious science popularization, I bought The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose. Eating dessert first, I found that the last chapter is a stimulating group of essays on physics and its future, and the second-to-last chapter could be a whole book in itself, on Penrose's favorite subject of twistors.
It's a pleasing book to take off the shelf for a quick ramble on the heights, sort of a cross between an Encyclopedia of Mathematics and the Feynman Lectures (or Feynman's elementary talks in QED).

Dr. Penrose frames the whole account in a series of meditations on what he calls the "three deep mysteries". These meditations occur at the beginning and end of the book, and concern the pairwise correspondences between the worlds of matter, mentality, and mathematics. By avoiding a premature reduction to any one of those worlds, he has a better than average standpoint from which to view other mysteries, notably the connections between beauty and truth. He acknowledges that an esthetic sense, an appreciation for beauty, is often helpful to the physicist in search of mathematical models. This is true because mathematical beauty seems to be more than merely a matter of taste, or a subjective emotional response. What more? Nobody knows; it's hard enough to get a handle on what mathematical truth is. But we experience both truth and beauty every day.

In his discussions about the importance of notation, he tells some interesting stories about better notations which were beaten by worse. (The choice of good notation is a crucial problem for both mathematicians and programmers. The story of computer science is in part the story of one language replacing another over time, sometimes for better, sometimes for worse.) On page 1019, we find that Bruno Zumino used the "2 spinor" notation to publish an idea, only to find that a later publication of the same idea using the more complex but established "4 spinor" notation became the standard reference. Zumino's resolution for next time was, obviously, to publish using XML instead of Lisp. (Sorry, "4 spinors" instead of "2 spinors".) The saddest part is that "4 spinors" were introduced in 1928 by Dirac, the improved "2 spinors" just a year later, in 1929, and Dirac himself was using the improved notation by 1936.
But the poorer notation was entrenched. Zumino's experience was decades later, in the 1970's!



Longjumps Considered Inexpensive

Whenever the conversation about full closures (on any platform) gets to a certain level of detail, the question of closing over branch targets comes up. For example, we know what break means in a regular for loop. But suppose that loop is expressed as an internal iterator; that is, suppose the loop body is packaged into a closure and passed to a loop-driver method. Then the same break statement has to do something scary to return control past the loop-driver method.

Every Java programmer knows that the natural way to express this sort of early exit in the JVM is to throw an exception, and programmers are also taught to use this feature sparingly, because it is expensive. Back in the day, C programmers called this a “long jump”, or longjmp, and they too were warned that it could enfeeble their optimizer.

But the conventional wisdom needs adjustment here. The most costly part of exception processing on the JVM is creating the exception, not throwing it. Exception creation involves a native method named Throwable.fillInStackTrace, which looks down the stack (before the actual throw) and puts a whole backtrace into the exception object. It’s great for debugging, but a terrible waste if all you want to do is pop a frame or two.

In fact, you can pre-allocate an exception and use it as many times as you want. One JVM benchmark a decade old (named “jack”) uses this technique in a parser to implement backtracking. The Hotspot JIT can optimize a throw of a preallocated exception into a simple goto, if the thrower and catcher are together in the same compilation unit (which happens because of inlining). Since “jack” is widely used for JVM bragging rights, it is likely that every commercial-quality JVM has a similar optimization in its JIT.

A similar technique, not so widely used yet, is to clone a pre-allocated exception and throw the clone.
This can be handy if there is information (such as a return value) which differs from use to use; the variable information can be attached to the exception by subclassing and adding a field. The generated code can still collapse to a simple goto, and the extra information will stay completely in registers, assuming complete escape analysis of the exception. (This level of EA is on the horizon.)

Here is a concrete example of non-local return, implemented efficiently via a cloned exception. I have observed the Hotspot server compiler emitting a machine-level goto for the throws. With current server VM optimizations of this code, the cost of a non-local return is less than three times the cost of a plain (local) return; with the client VM the ratio is ten. See the code example and benchmark for details: NonLocalExit.java (javadoc).

The exception cloning technique is not widely used in part because Object.clone is not optimized. But an upcoming build of Hotspot fixes this by making clone an intrinsic which simply does a variable-length object allocation and a bulk copy. See the evaluation (mine!) of 2007-04-13 on bug 6428387. Even in existing deployed VMs, the cost of cloning is often less than the cost of walking the stack.

The bottom line is that languages on the JVM (including Java, if that’s the direction it goes) can implement non-local returns reasonably directly, and the JITs will turn them into local jumps in important common cases. In particular, when an internal iterator is properly inlined, a break in its body closure will not surprise the programmer with a performance pothole. In fact, it will compile to about the same code as the equivalent old-style for loop.
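The simpler of the two techniques, the pre-allocated exception, can be sketched in a few lines. This is not the NonLocalExit.java code referenced above; it is a minimal illustration with hypothetical names (PreallocatedExitSketch, Exit, Body, forEach, firstNegative). It assumes that overriding fillInStackTrace to do nothing is acceptable for an exception used purely for control flow, which skips the stack walk when the exception is created.

```java
import java.util.List;

public class PreallocatedExitSketch {
    // Hypothetical control-flow exception. Overriding fillInStackTrace
    // skips the stack walk, so creating it is cheap; it carries no
    // per-use state, so one shared instance can be thrown repeatedly.
    static final class Exit extends RuntimeException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this;
        }
    }

    static final Exit EXIT = new Exit();

    // Hypothetical loop-body interface for the internal iterator.
    interface Body<T> {
        void run(T item);
    }

    // Loop driver: it knows nothing about early exit.
    static <T> void forEach(List<T> items, Body<T> body) {
        for (T item : items) {
            body.run(item);
        }
    }

    // Returns the first negative element, or 0 if there is none.
    // The "break" inside the body is expressed as a throw of EXIT,
    // caught just outside the loop driver.
    static int firstNegative(List<Integer> xs) {
        final int[] found = { 0 };
        try {
            forEach(xs, new Body<Integer>() {
                public void run(Integer x) {
                    if (x < 0) {
                        found[0] = x;
                        throw EXIT;
                    }
                }
            });
        } catch (Exit e) {
            return found[0];
        }
        return 0;
    }
}
```

A single shared instance cannot tell nested loops apart, which is one reason to prefer a per-use exception (cloned or subclassed with a field) when the exit needs to carry a value or identify its target.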



JSR 292 Deep Dive Notes

5/09/2007, 2:00-3:00, Westin Civic CR

Attending:

- John Rose (host, Hotspot VM)
- Oti Humbel, Charlie Groves, Tobias Ivarsson (Jython)
- Thomas Enebo, Charles Nutter (JRuby)

John: JSR 292 needs insight into pain points implementing dynamic langs. Our basic thought: Method invocation in the JVM is too Java-specific. We probably need one more kind of invocation which is not type-bound, and allows systems to adapt to mismatches between caller and callee signatures. Other requirements may appear flowing from that basic fact.

Question: Are we missing any Big Idea for dynamic lang implementation?

Jython notes

- generic call ops implemented on all PyObject (concrete)
- callee looks at a sequence of desired types
- Java objects are wrapped in Jython proxies

Problem: Emulating Python call frames. There is a “current frame” API, which is infrequently used but costs on every call. (Cost includes a ThreadLocal.get.)

Possible solution: Use JVM’s debugging interface to get a bytecode-level view, and have Jython’s byte compiler save away its own local variable mappings. Questions: Would this work in the current JVM? Where are the pain points? Do we need adjustments to the JVM functionality? Or is the current API OK?

(Note that the JVM already has “vframes” and deoptimization to manage the correspondences between optimized frames and the bytecode level VM model. Need some sort of user-level vframes or deoptimization requests? Hopefully not.)

JRuby notes

Implementation tricks:

- numbers: can unbox inside a method, rebox on out-calls
- selector-table-index (callee + selector index) allows switch-based method selection
- java objects are wrapped in Ruby proxies

Cross-language interoperability

- Charles N.: ideally Python sees Ruby strings as Python strings
- List (either Ruby-list or generic) as receiver of Ruby append request:

Pain points in method generation

Feels expensive and complex to wrap one method per class. (Sometimes we can batch several methods at once, but not always.)
Charles N: want to be able to have O(10\*\*5) small dynamically generated methods. JRuby has a JIT which makes small classes => want small-scale method loading.

Current VM overheads for one-method classes:

- class naming (garbage names clog symbol table)
- class loader (to get GC-ability, need lots of these too)

Possible VM trick: Reuse generated methods by adding extra parameters and currying those. E.g., Object.get(int, extractList:{Object=>List}). The VM can simulate this with inner classes, or it might be worth building into the methods themselves, as they are called by invokedynamic.

Wrappers vs. no wrappers

Current implementations (Jython, JRuby, not Groovy) use wrapping in various places to manage variations in object format and calling sequence. Argument lists are wrapped in object arrays. Numbers are wrapped. System types (like List, String, Integer) are sometimes wrapped as “foreign pointers”. Stack frames are even wrapped! (To provide debuggability; very expensive.) Language implementors would prefer to avoid wrappers.

One other major need for wrappers: Languages usually have a “top type” so that all language operands can be verified as handling the common methods of the top type. (Ex: PyObject, which is a concrete class.)
An invokedynamic opcode can relax this requirement, allowing a more Groovy-like direct treatment of foreign objects.

Some unwrapping is already done in language implementations:

- Jython unwraps argument lists by using a schema of call1, call2, etc.
- JRuby and Rhino unwrap fixnums by doubling arguments.
- Can unwrap values inside the compilation of a single method.
- Groovy unwraps routinely, uses a marker interface for MOP extensions.

Disadvantages of wrappers:

- inefficient: extra indirections & type dispatches
- hard for the JIT to unwrap (therefore harder to do classic C-style optimizations)
- complex; each language has to re-invent a full suite
- interoperability requires common wrappers or re-wrapping (real use case: Jython and Rhino)

Dynamic languages cannot abandon all wrappers, because there usually must be a fallback to a fully general calling sequence.

Long-term goal is to be wrapperless. (Compare early Smalltalk and Java implementations with “handle tables”.)

Hard parts to emulate with wrapperless objects

- JRuby, Jython can add a method to an object (method added to new metaclass, wrapper metaclass changed)
- some languages need to freeze objects (without changing object identity?!)
- some need to taint values (even scalars, but an identity change is OK for those)

Handling numbers

Tricky bits:

- generic arithmetic seems to require multiple dispatch (or, cascaded single dispatch with anonymous intermediate selectors)
- want fast paths for int-friendly operations like List.get, (x+1)
- overflow semantics differ: go to long, double, BigInteger, etc.
- need slow paths to prepare for rare events like overflow and unexpected operand types (e.g., taints)

Optimization goal: Make small integers (indexes) look like ints to the JIT. JVM ints get optimized better: range check elimination, subtype analysis, loop unrolling, etc.
All primitive types get preferred treatment with native CPU registers and instructions.

View from the VM and JIT

(This is according to John, who is looking for comments and corrections.)

We want invokedynamic (like other invokes) to be as direct as possible in more than 99% of the calls. This allows your typical good JIT to do its tricks of optimistic receiver prediction, procedure integration, and (then) cross-call optimization.

Typical scenario: First use of invokedynamic instruction has an optimistically assigned caller signature. In the 80% case, this matches a signature directly on the receiver object and the call passes type checks. (Why? Because compile-time signatures are carefully chosen, based on actual usage patterns. Hopefully not just because great programmers are lucky.) In that 80% case, no further efforts are needed; the VM checks a class or two and branches directly.

In the 20% case, the invokedynamic instruction has to do something like dynamic linking. Each language has already planted a handler on its own call sites, probably a class-valued per-class attribute. The handler is invoked, and the dynamic lang runtime does something slow and complicated. It matches up the caller’s intended signature and the dynamic types of the arguments with the callee’s capabilities. The runtime handler then comes up with an adapter that accepts the caller’s signature, but then checks, shuffles, reformats, and coerces argument types to the callee’s preferred entry point signature. The adapter is passed down (as a sort of method pointer TBD) to invokedynamic, which remembers it. The call goes through, and so do the next 1000 calls, using the same adapter every time. When the VM’s JIT kicks in, it integrates the adapter along with all the other bytecodes.

Note that the caller can really win if it guesses rightly which arguments are strongly typed, e.g., as unwrapped ints.
In the case of an out-call to a Java object, the dynamic language compiler can often guess the right type. (Maybe List.get(int).) In such cases, the caller’s signature and callee’s signature agree, and the only missing bit is a type check on the receiver. The invokedynamic instruction can determine that the matching method is present on the receiver type, and make the call. It can also cache (as is routine in JVMs) the winning type, and avoid the method lookup on subsequent calls.

If an unexpected receiver or argument ever shows up, the invokedynamic instruction notes the surprising types (or perhaps the adapter fails one of its checks) and the language runtime handler is invoked again, and produces a new adapter. Perhaps the JVM’s implementation of invokedynamic remembers both the new adapter and the previous one, or perhaps it only remembers the newest one. It’s a quality issue, not a correctness issue. In principle, the JVM could call the language’s runtime handler on every call, but that would be bad form.

In the worst case, every single call is unique and has to be handled with a slow lookup through the language’s metaobjects. In such a case, the adapter passed back down to invokedynamic is probably something excruciatingly general, like the current generation of implementations. It is so general it never again calls up to the runtime handler, but is prepared to handle all calls on its own account. No optimizer will touch it. (The VM equivalent to this is called a “megamorphic call site”.)
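The fast-path and slow-path division described in these notes can be imitated in ordinary Java, purely as an illustration. The sketch below is not a proposed API and not how a JVM would implement invokedynamic; every name in it (CallSiteSketch, RuntimeHandler, link, lengthSite, relinks) is hypothetical. It models a call site that remembers only the newest adapter, and calls back into a language runtime handler when the receiver class changes.

```java
import java.util.List;
import java.util.function.Function;

public class CallSiteSketch {
    // Stand-in for a language runtime's slow, general linking step:
    // given the receiver's class, produce an adapter for this call.
    interface RuntimeHandler {
        Function<Object, Object> link(Class<?> receiverClass);
    }

    private final RuntimeHandler handler;
    private Class<?> cachedClass;              // receiver class last linked
    private Function<Object, Object> adapter;  // adapter remembered for it
    int relinks = 0;                           // how often the slow path ran

    CallSiteSketch(RuntimeHandler handler) {
        this.handler = handler;
    }

    Object invoke(Object receiver) {
        Class<?> c = receiver.getClass();
        if (c != cachedClass) {
            // Slow path: a surprising receiver type, so ask the runtime
            // handler for a new adapter and remember only the newest one.
            adapter = handler.link(c);
            cachedClass = c;
            relinks++;
        }
        // Fast path: one class check, then a direct call to the adapter.
        return adapter.apply(receiver);
    }

    // Example handler for a dynamic "length" operation.
    static CallSiteSketch lengthSite() {
        return new CallSiteSketch(c -> {
            if (c == String.class) {
                return r -> ((String) r).length();
            }
            if (List.class.isAssignableFrom(c)) {
                return r -> ((List<?>) r).size();
            }
            throw new IllegalArgumentException("no length for " + c);
        });
    }
}
```

Calling the site repeatedly with strings links once; the first list receiver triggers a second link. Whether to keep one adapter or several is the quality issue mentioned above, not a correctness issue.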



Whither JSR 292?

As of March I'm the chair of JSR 292. Judging by individual contributions and comments I've received, the committee is eager to start digging into details. But we have not yet done so in a coordinated way. Part of the problem is trouble with the JCP website, and part is me having to shovel other stuff off of my plate at Sun.

The committee is rather small, and composed largely of dynamic language implementors familiar with the JVM. I really like this setup. I expect we'll be able to validate most of our decisions experimentally, on a "prototype as you go" basis.

The most likely work this calendar year includes a specification and prototype for a first version of invokedynamic, and a demonstration of the benefits for a few major dynamic languages.

If all goes well, the invokedynamic instruction will provide a fast path for common method invocation patterns in several languages. It will be low-level enough to be optimizable by the JVM JIT, yet have enough hooks to support required dynamic features like dynamic typing and method replacement. These hooks are not really new to the JVM, since it's already the case that Java dynamically types calls based on receivers, and can deoptimize calls in response to class hierarchy changes. But the dynamism will no longer be restricted to the Java language and class system.

The basic theme (as I see it) is to unleash the JVM's existing compilation and optimization capabilities, on behalf of dynamic languages other than Java. It seems to me (as a longtime Hotspot JVM implementor) that you can go a long way with the existing bag of tricks, which (at least for Hotspot) includes type profiling, semi-static analysis, lazy compilation, inline caches, splitting between fast paths and slow paths, and (when the real world disappoints the compiler) deoptimization.

Less likely work includes new JVM support for autonomous methods and/or method handles, which are currently simulated by inner classes and/or reflection.
If the current simulations scale well and are sufficiently optimizable, we won't need to tackle this.

Even less likely, but still on the table in case prototyping forces us to consider them, would be additional features for supporting languages. These might include runtime support for language-specific deoptimization, a JVM type other than java/lang/Object which means "dynamic", type profiling hooks, JVM class or object surgery, or relaxation of JVM limits on identifier spellings.

Work that is unlikely includes anything that we can reasonably simulate above the level of the JVM without losing optimizability. For example, we're not likely to embed JavaScript method lookup rules into the JVM; it's enough to provide a fast path for most JavaScript calls, and an associated slow path to take when a call site needs extra attention from the JavaScript runtime support. Likewise with Ruby and the other languages.

In many dynamic languages, you can add or change methods on classes, or change the class of an object, or (as in Self) add a method to a single object. This is very cool and powerful. It is probably unrealistic to add this flexibility to the JVM's idea of objects and classes. (Some debugger hooks allow such things, but they usually require optimizations to be disabled.) I expect that languages which require these features will be able to simulate this flexibility (as they do on non-virtual CPUs) by building their own metaclass structures to keep track of things, and adding lightweight guards to call sites as needed.

I hope there will be open-source work on common runtime facilities on the JVM for dynamic languages. We can put our best tricks into it, and make sure it works well with all features of the JVM, both old and new. But runtime support APIs will not be part of JSR 292, except for the bits which absolutely cannot be disentangled from the JVM itself. The only example of this I can think of now is a query which says whether the JVM version supports JSR 292.



Autonomous Methods for the JVM

Byte-coded methods in the Java Virtual Machine are perfectly suited for their main role today, which is to implement (more or less directly) methods defined in the Java Language, version 1.0.

Java classes and methods do not directly implement free-standing chunks of code found in other languages, such as function pointers, closures, delegates, or extension methods. As in the case of inner classes, they can do a reasonable job of implementing such design patterns. But there are numerous overheads, in CPU cycles, memory, load time, application size, and (most importantly for language implementors) architectural complexity.

Because the JVM is a highly successful investment in efficient and robust support for byte-coded methods, it is worthwhile to look at teasing apart JVM methods from JVM classes. If this can be done well, the strengths of the JVM can be applied to programming tasks (scripting, functional programming) beyond the current scope of Java.

So, here is a design sketch for autonomous methods (AM) in the JVM. It is not complete, but should be suggestive of new ways to support languages beyond Java. The names Autonomous Methods and AM are pretty ugly, but all the good names for such things seem to be taken; if a proposal such as this catches on, it will naturally have to hijack a good name.

What's in an AM

An AM has an optional name, a type signature, and an optional receiver. These work the same as for a regular "class method" (CM). Unlike a CM, the receiver type of a non-static AM is part of the AM's signature, and can be any type. Allowed method modifiers are 'static', 'strict', and 'synchronized'. An AM is always 'public'. The newer modifiers 'synthetic', 'bridge', and 'varargs' are also allowed. Other modifiers are reserved for future use.

An AM is associated with a class, even though it is not defined when that class is loaded.
The AM has the same access rights as any CM in the associated class, and this is the only effect of the AM's association with that class. An AM's name need not be unique. If you succeed in creating an AM on a sealed class (like 'java.lang.String') you can access its private fields (like 'String.value'). This is not a security hole; see below.

The decoupling of receiver type from access rights removes a restriction against language methods like Ruby's String.squeeze [1] which really ought to look as much as possible like a Java method. An AM on an interface receiver type amounts to a generic function over that interface.

A useful term is in order: Define the "effective signature" of a method as that method's signature, with the receiver type prepended to the argument list, if the method is non-static.

An AM may also contain values of some of its arguments, which are called "pre-applied arguments". That is, AMs support currying. This is a simple and flexible basis for all kinds of closures. Unlike inner classes, it does not require an associated class to hold the data.

Other than the information exposed by java.lang.reflect.Method, AMs are totally opaque. Specifically, there is no way to find out whether an AM has pre-applied arguments, or what its bytecodes are, etc. The JVM hides such details to remain free to use dirty tricks for high performance.

Naming an AM

Methods are reified using the empty (marker) interface Function (in package java.lang). (So are CMs, for that matter.) There are interconversion methods between Function and java.lang.reflect.Method, which allow (via the latter) reflective invocation.

There is no way (except perhaps in a debugger) to retrieve the set of AMs associated with a given class. If an AM reference is dropped, the AM can be garbage-collected individually.

When querying via reflection, the containing class of an AM is its associated class.
AMs can be distinguished from other methods by the fact that they do not appear on their associated class's list of methods.

Defining a new AM

An AM can be defined in one of four ways:

1. by loading its bytecodes,
2. by executing a 'newmethod' bytecode,
3. by renaming or retyping a pre-existing method, or
4. by pre-applying arguments to a pre-existing method.

(Case 1 could be subsumed by case 2, or perhaps case 2 is not necessary.)

To take these in turn:

1. Loading. The 'loadMethod' method on the TBD classloader API works like 'loadClass', except that it accepts a byte array containing a variant of the classfile format. This variant defines a single method in the context of a single class, the associated class. The signature specified in the class file is the method's "effective signature" (it includes any receiver type). The 'loadMethod' call also accepts an optional array of objects, which are pre-applied to the resulting method. (See case 4.) The pre-applied arguments are associated with the resulting AM, and their types are removed from its signature.

The JVM preserves security by refusing to let untrusted code load into sensitive packages, like 'java.lang'. It also requires that a class must grant permission to load new AMs associated with it; in untrusted code, the 'loadMethod' call must originate from the associated class itself. This is why AMs do not provide an attack on private fields like 'String.value'.

2. Allocating. The 'newmethod' bytecode has the following operand fields: A set of modifier bits, an optional Signature (Utf8) reference S0, a NameAndType CP reference (N, S1), and a bytecode index BCI (referenced by offset, as with a branch). It pushes on the stack a new AM object with properties as follows.

The name of the new method is N. The associated class is that of the current method. The modifier bits are as given. The byte-codes are located within the body of the current method, at the given BCI.
Bytecodes reached from that BCI must be unreachable except via 'newmethod' instructions referring to that BCI.

If S0 has one or more arguments A, the new method will pre-apply arguments of those types, popping them from the current stack.

The signature of the new method is S1, with the receiver type removed (if the 'static' modifier is absent).

3. Renaming. Library methods allow the name, modifiers, receiver type, or signature of a method to be changed. (The existing method is untouched; a new AM is returned.) If the signature is changed, casting or auto-boxing may be included to ensure low-level type safety. This is a non-primitive part of the proposal, in that the library methods probably can be implemented without help from the JVM. It is an elementary facility for creating adapter (or 'bridge') methods.

Converting a non-static method to a static AM moves the original's receiver type into the new AM's signature (in the primary position). Converting a static method to a non-static AM moves the original's first argument from its signature to its receiver type.

4. Pre-applying. The JVM supplies a library routine 'preApply' for associating a method with a group of one or more arguments. The routine returns a new AM, which when invoked on the remaining arguments, passes all the arguments to the original method and executes it. This concept is called "currying a function" in the functional programming community [2]. The new AM has the same name, modifiers, receiver type, and associated class as the original. Its signature is smaller, missing one argument type for each pre-applied argument.

Thus, the invocation of the original method happens in two steps, each with its own set of arguments. The second set of arguments can be empty, or can consist solely of the original method's receiver. The first set of arguments is carried around in the structure of the intermediate AM; this set of values is not visible to any API (except perhaps debuggers).
The intermediate method can be invoked any number of times.

Pre-applied arguments are added to the end of the argument list. Looked at another way, the types of pre-applied arguments are removed from the end of the type signature of the original method.

A receiver argument (in a non-static method) cannot be pre-applied, unless the method is first converted to a static method, which turns the receiver argument into a plain argument.

(Note: There seem to be use cases for currying at either end of the argument list. Traditional currying pre-applies to the front, but pre-applying to the end seems slightly more suitable to existing JVM practices. Supporting either end in the JVM allows libraries to support the other end. Libraries must handle more complex argument shuffling anyway.)

A special case is "receiver pre-application", which consists of taking an object with a specified method (such as the method of a one-method interface) and turning it into an AM which is a proxy for that method on that object. This is likely to be common in practice, and may merit a special library function of its own.

Invoking an AM

An AM can be invoked reflectively by converting it to a Method and using Method.invoke.

There is a library function for wrapping any one-method interface around a suitably matching method, creating a 'proxy' object. (There is already a Proxy facility like this in Java, except that it cannot handle unboxed arguments.) In this way, bare methods can be adapted to arbitrary one-method interfaces. Combined with the inverse conversion (receiver pre-application), AMs can mediate API adjustments efficiently, without the argument boxing required by present mechanisms in 'java.lang.reflect'.

An AM can be invoked by an 'invokeinterface'. The AM itself is the receiver object, typed as a Function, and the name and calling signature must exactly match the AM's name and effective signature.
The verifier allows a loophole here for the Function interface, and the JVM will dynamically check the AM's signature and receiver type, throwing a linkage error in the case of a mismatch.

Nameless AMs are not callable in this way. (Or perhaps they should be omni-callable.)

So far, this is enough to amply support functional programming styles. Note that the caller must know to push both the method and the receiver on the stack. The caller must be aware that there is a method explicitly involved in the call, not just a receiver, message name, and arguments.

So the design so far does not support dynamic object-oriented languages. They require a more seamless integration of object methods with AMs. There must be a way for the caller to push the receiver and the arguments on the stack and hope for the best, with some sort of safety net if the message is not properly received. In fact, the safety net must allow some way for the system to adapt the call site (dynamically) to new types as they appear. Finally, performance requires the call site to include some sort of "fast path" for frequently occurring receiver types.

Dynamic Invocation

This leads us to another new bytecode. The generic name would be 'invokedynamic', but let's call it 'invokemethod'. The 'invokemethod' bytecode works like 'invokeinterface' above, but its behavior is more interesting if the AM or receiver fails to have a legal type and signature. (In fact, a new bytecode is not needed if we extend the behavior of 'invokeinterface' in the sole case of the Function interface.)

The 'invokemethod' bytecode has two operands: a NameAndType CP reference, and a second, arbitrary CP reference. It pops from the stack a method, a receiver, and zero or more arguments, as dictated by the CP signature. The method must be a Function or null; the receiver can be any reference.

Here are the specific cases. There are three ways for the call to succeed:

1.
If the method argument is null, and the receiver argument is non-null, and it has a (normal) method of the given name and calling signature, and that method is accessible, the call goes through, to the receiver. The method argument is ignored. This allows transparent access to ordinary Java APIs as a simple default.

2. If the method argument is non-static, and the receiver argument is reference-convertible to the method's receiver type, and the calling signature matches the method signature, the call goes through, to the method argument. (Note that an AM's receiver type can be java.lang.Object, so this is the universal case.) This is how a non-standard method (like Ruby's 'String.squeeze' [1]) could be invoked on a JVM-native object (like a 'java.lang.String').

3. If the method argument is static, and the calling signature matches the method signature, the call goes through, to the method argument. The receiver argument is ignored, and (as with reflection) is conventionally specified as null, if the caller knows it is a static call.

Typically, the method argument will be loaded out of some sort of call-site cache; it may be simply a static variable in the enclosing class. The method argument sort of stands there, waiting to be of use if a call needs help.

Case 1 is the optimistic case, where the receiver can take the call without help. Note that it is reasonable for a call-site cache to be initialized to a null reference, and this will work as long as the receiver actually handles the intended method.

Cases 2 and 3 could support a call site pre-initialized to use an AM which implements a complicated call handler, to perform language-specific dynamic loading and linking, or other bookkeeping. Or, these cases can support a "fast path mechanism" where the AM optimistically checks the receiver (and/or arguments) for expected types and then directly calls another method suited to those types.
Note that the behavior of a single call site can evolve over time, simply by using different method arguments.

Failed Dynamic Invocation

If none of the above cases succeeds, the call fails. All is not lost, because at this point the language implementor (who compiled this thing in the first place) gains control, and can determine present and future outcomes.

We assume that the containing class has been defined with an attribute called 'FailedCallHandler', which contains a CP reference to a class. When a call fails, the JVM resolves this reference to a class K and ensures it implements the following interface in java.lang:

    interface FailedCallHandler {
        Function failedCall(Method caller, String dope, int bci,
                            Function failure, String name, String signature,
                            Object receiver, Object... args);
    }

The dope comes from the second operand to the bytecode instruction, the Utf8 reference. It is otherwise unused and uninterpreted by the system. It is useful to language implementors for encoding intentions about the call. The dope, name, and signature strings are interned.

The failure and receiver arguments are the first and second operands popped from the stack.

The result returned (which can be null) is used as a new method argument, and the process is retried, as many times as necessary (or until the cows come home).

The JVM is free (but not required) to memoize values returned by failedCall for the same call site, the identical failing method, and subtypes of the receiver type. (This needs more thought.)

It is left as an exercise for the reader how to build a high-performance, natively executing dynamic language on top of a JVM extended this way. I believe it is quite possible.

References:

[1] http://www.rubycentral.com/book/ref_c_string.html#String.squeeze
[2] http://en.wikipedia.org/wiki/Currying
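
The pre-applying step (item 4 above) and "receiver pre-application" can be tried out, loosely, with the java.lang.invoke API that later JDKs provide. This is only an analogy and not the 'preApply' routine or the AM type proposed in this post; the class name PreApplyDemo and its helper methods are made up for illustration. The sketch binds the last argument of String.indexOf(String, int), and separately binds the receiver.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo class; MethodHandle stands in for the post's AM.
class PreApplyDemo {
    // Handle for String.indexOf(String, int). Its type is (String, String, int) -> int:
    // the receiver shows up as the leading argument.
    static MethodHandle indexOfHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup().findVirtual(
                String.class, "indexOf",
                MethodType.methodType(int.class, String.class, int.class));
    }

    // Pre-apply the LAST argument (fromIndex). The bound handle has the
    // smaller type (String, String) -> int and can be invoked many times.
    static int searchFrom(int fromIndex, String text, String needle) {
        try {
            MethodHandle bound =
                    MethodHandles.insertArguments(indexOfHandle(), 2, fromIndex);
            return (int) bound.invokeExact(text, needle);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Receiver pre-application: bind the receiver, leaving (String, int) -> int.
    static int searchIn(String text, String needle, int fromIndex) {
        try {
            MethodHandle bound = indexOfHandle().bindTo(text);
            return (int) bound.invokeExact(needle, fromIndex);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchFrom(3, "abcabc", "b")); // search starts at index 3
        System.out.println(searchIn("abcabc", "c", 0));   // receiver already bound
    }
}
```

In both cases the pre-applied values travel inside the bound handle and the remaining arguments are supplied at the second step, which is the two-step invocation described above.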



Duck Typing in the JVM

There's a thought-provoking post about invokedynamic and duck typing here: http://headius.blogspot.com/2007/01/invokedynamic-actually-useful.html

Charles asks, "Would all methods accept Object and return Object? Is that useful?"

I would answer that this amounts to a limited vision of duck typing. Most dynamic languages (and the JVM) allow, as an option, for variables and fields to be strongly typed. In many cases it is profitable to customize generated code to take advantage (when possible) of declared types. In the JVM it is profitable to customize the signature of a method when that method's argument or return types (but not exceptions) are strongly typed at the source level. The best vision for duck typing is to adapt between separately compiled callers via permissive adapters, while allowing implementations to mention strong types (and exploit them locally) wherever the programmer wants. Duck typing should not limit the "ducks" to quack only in a limited range of ways.

The key challenges for the implementor of a byte-compiled dynamic language (IMO) are three:

1. Customizing generated (bytecoded) methods to the full range of JVM-capable signatures for methods.
2. Generating the various adapters needed to make the semantically null conversions (Integer/int, String/Object, etc.) and any implicit conversions the language may define.
3. Arranging calling sequences so that, when the caller and callee agree on a concrete JVM signature, the call has a "fast path" which is as direct as possible.

I think some sort of "dynamic invoke" and class extension help most with steps 3 and 2, respectively.

Getting the performance right, in this model, also ensures that calls to and from Java will run fast. This is important because Java is the "system implementation language" of the JVM. Every seasoned dynamic language programmer knows how important an efficient "foreign function interface" is.
The JVM promises to supply the very best "FFI", if language implementors choose to take advantage of it.

Note my bias against using the VM's built-in "dynamic invoke" capability to handle argument conversion. Many languages have complex implicit argument conversions which are beyond any reasonable purview of the VM; given the need to handle these, the implementor may as well handle autoboxing there also. I don't see an argument, yet, for signature conversion by the VM. Excessive adapter proliferation could support such an argument, but it has not yet been shown to happen.

By the way, the JVM's way of saying "not strongly typed" is java/lang/Object, with primitives handled by autoboxing. Also, as Bill Joy pointed out ten years ago, you could also build a weak-typing scheme on top of a fictitious interface, allowing a weak type system with several interchangeable kinds (e.g., multiple co-existing languages).

To elaborate on step 3 above, the challenge is to support fast paths without sacrificing generality. Generality requires that if an unexpected argument appears, the runtime system gets a chance to fix up the discrepancy between caller and callee. Generality also requires that the behavior at the call site can change over time. New classes can appear, or new behaviors can be added to existing classes. These requirements can be gracefully met without compromising performance in the common case, where the caller and callee know something about each other, and can successfully predict an efficient calling sequence at compile time.

For a concrete example of this, suppose I'm compiling a call to s.indexOf(i), and I have reason to believe (or assume) that s is a java/lang/String and i is an int. I should compile a call site which rechecks the types of s and i (if I don't have a previous assurance) and calls ((String)s).indexOf((int)i).
The call site may also need a "slow path" which calls something like Runtime.invoke((Object)s, (Object)i), or perhaps it can use an optimistic deoptimization scheme like the hotspot compilers. It hardly matters, if the fast path is all that gets executed... and that is the common case, in a nicely balanced system.
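
A minimal sketch of such a call site, written out as ordinary Java rather than generated bytecode, might look like the following. The class IndexOfCallSite, its counters, and the reflective slowInvoke helper are assumptions made for illustration; slowInvoke merely stands in for something like Runtime.invoke and is far simpler than a real language runtime would need.

```java
import java.lang.reflect.Method;

// Hypothetical model of one compiled call site for the dynamic call s.indexOf(i).
class IndexOfCallSite {
    int fastPathCalls = 0;
    int slowPathCalls = 0;

    Object call(Object s, Object i) {
        // Fast path: recheck the predicted types, then make a direct, typed call.
        if (s instanceof String && i instanceof Integer) {
            fastPathCalls++;
            return ((String) s).indexOf((int) (Integer) i);
        }
        // Slow path: fully general dispatch on (Object, Object).
        slowPathCalls++;
        return slowInvoke(s, "indexOf", i);
    }

    // Naive stand-in for a runtime's generic invoke: pick the first public method
    // with the right name and arity whose parameter types accept the arguments.
    // (Primitive-typed parameters are never matched; a real runtime would unbox.)
    static Object slowInvoke(Object receiver, String name, Object... args) {
        for (Method m : receiver.getClass().getMethods()) {
            if (!m.getName().equals(name) || m.getParameterCount() != args.length) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            boolean applicable = true;
            for (int k = 0; k < args.length; k++) {
                applicable &= params[k].isInstance(args[k]);
            }
            if (!applicable) {
                continue;
            }
            try {
                return m.invoke(receiver, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no applicable method: " + name);
    }

    public static void main(String[] args) {
        IndexOfCallSite site = new IndexOfCallSite();
        System.out.println(site.call("hello", (int) 'l'));               // fast path
        System.out.println(site.call(new StringBuilder("hello"), "lo")); // slow path
        System.out.println(site.fastPathCalls + " fast, " + site.slowPathCalls + " slow");
    }
}
```

The type recheck is the only cost added to the common case; everything expensive sits behind the branch that a well-predicted call site rarely takes.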



Better Closures

As I mentioned earlier, a closure per se has a special nature which it does not share with other kinds of functions. A closure is produced from a snippet of code, written in the midst of a larger block of code, in such a way that the meaning of names within the snippet is consistent with the enclosing block.

Let's continue to examine the case for (a) better closures and (b) slightly better type parameters but not (c) function types, with the most enjoyable part, (a). We will find that giving up function types produces the most compact, natural-looking syntax for separable code snippets.

(a) Better Closures

Closures arose from the happy combination of functional language (initially Lisp's version of lambda calculus) with a realization of the benefits of simple, uniform treatment of names (lexical scoping, referential transparency, etc.). As noted before, a language can incorporate those benefits without an explicit function type system. This note describes one way to do that for Java.

Let's start by noting that inner classes are imperfect closures. The imperfections are (1) inaccessible nonfinal local variables, (2) inaccessible statement labels, (3) bulky syntax, and (4) shadowing by class and method scopes. As a group, these imperfections stem from caution in the early days of Java; the designers avoided constructs that could invisibly slow down the system. (Nobody had an optimizing JIT yet.) Imperfections (3) and (4) are due, moreover, to the very explicit “class-ness” of the syntax which wraps the programmer's snippet of code, even if the class is anonymous. For better or worse, everything needed to look like a class.

For the curious and/or pedantic, I will explain more about these imperfections later. But first I'd rather assume we're on the same page about inner classes, and propose some incremental improvements, in the direction of closures with full-blown lexical scoping of all names and labels.
The four imperfections motivate four groups of improvements: (1) fully visible local variables, (2) fully visible statement labels, (3) concise syntax, and (4) suppression of extraneous scopes.

(1) Fully Visible Local Variables

(1a) Allow the public keyword in all places where final is allowed. This keyword does not affect the meaning of the variable at all, but makes it fully accessible within its scope.

(1b) Allow uplevel access to a local variable if it is declared public or final, or if it could be declared final. (That is, it is statically single assigned.)

    public int counter = 0;
    int increment = 10;  // acts like a final
    Thread t = new MyFancyRepeatingThread(
        new Runnable() {
          public void run() { counter += increment; }
        });
    // I know how to avoid racing the counter:
    t.start(); t.join();
    println("thread ran "+(counter / increment)+" times");

Why not just abolish the final rule? Because there are at least two classes of serious bug which come from the combination of true closures with mutable variables. Declaring a variable public has an appropriate connotation of danger, and will convey the need for caution to authors, code reviewers, and bug-hunters.

This is a key step toward what Guy Steele calls “full-blown closures”. But there's more, because closures also turn out to be great (in combination with visitor-like patterns) for creating new kinds of loops, searches, and other control flow.

(2) Fully Visible Statement Labels

(2a) Allow all return, break, and continue statements to transfer control to (respectively) the innermost matching method definition, matching statement, or matching loop body statement.

It is helpful, when thinking about the meaning of branches, to remember that a continue is always equivalent to a break from the loop's body, and a return can be rendered as a variable assignment followed by a break from the method's body.
Thus all branches can be converted to breaks with suitable shuffling of labels.

Label matching rules are unchanged, and continue to take labels into account. (We can view an unlabeled break or continue as referring to some fixed label, which is implicitly attached to all loops and switches. A return can be taken to refer to the label of its enclosing method definition, as in BGGA.)

(2b) Define the semantics of non-local branches compatibly with Common Lisp or Smalltalk. A branch statement can break out of any statement whose label is lexically in scope.

This lets the programmer continue to insert early-return logic into loops, even if those loops are implemented by library code:

    loop:
    mycoll.forEach(new Visitor() {
      public void visit(T x) {
        if (ranOutOfTime())  break loop;
        if (!isTrivial(x))  res.add(x);
      }
    });

Use unspecified subtypes of Throwable to manage the popping of stack frames, preserving the integrity of try/finally and catch(Throwable), but otherwise keeping the operation opaque to users. As before, blocks and method activations can only exit once, either normally or abruptly. An attempt to exit a block a second time will throw an unchecked exception, akin to a stack overflow, probably with a helpful backtrace.

(2c) Allow return statements to carry a label. The label must be the name M of an enclosing method definition. The syntax is return M: or return M:X . The expression X (if any) is the value returned from the method M.

(2d) For the sake of error checking a certain class of concurrent library, define a standard compile-time statement and method parameter annotation BreakProtect. It is a compile-time error for an anonymous inner class or closure expression to be passed directly to a method parameter marked with this annotation. It is a compile-time error for a branch within a statement marked with a BreakProtect annotation to break out of that statement. And it is an error for a closure expression converted to a marked method parameter to break out of the closure body.
(Returns and properly declared exceptions continue to be legal, of course.)

If an inner class instance x (or other closure) contains a branch, and x is returned from its creating method or passed to another thread, then the branch will fail with a RuntimeException. The backtrace of this exception will include at least the backtrace of the point of closure creation.

Here is an example which branches from one method activation to another, written with the proposed extensions:

    int outer() {
      middle( new Runnable() {
        public void run() { return outer: 123; }
      });
      return 0;
    }

Assuming a suitable (unchecked) library type $Break, here is an equivalent rewrite in the present language:

    int outer() {
      final $Break ret[] = {null};
      final int val[] = {0};
      try {
        middle( new Runnable() {
          public void run() {
            val[0] = 123;
            throw ret[0] = new $Break();
          }
        });
        val[0] = 0;  // normal completion, matching 'return 0' above
      } catch ($Break ex) {
        if (ex != ret[0])  throw ex;  // rethrow
      }
      return val[0];
    }

There are many ways to improve this code, notably by means of cloned or preallocated exceptions. It may be profitable (at the compiler's discretion) to merge the exception object for a given block with the block's autoboxed locals.

Because the standard 228_jack benchmark uses preallocated exceptions for frequent non-local control transfers, it is likely that JITs are already able to optimize similar code shapes into a straight goto. (Hotspot can.)

All of the foregoing changes apply to pre-existing inner class notations as well as a new closure notation.

(3) Shrink the Notation

This is the pretty part.
We can minimize the wrapper around the payload down to one or two tokens, in a way that looks and feels like Java, but is intelligible to users of Smalltalk, Ruby, and Groovy.

(3a) Allow the following syntaxes, to be used for creating anonymous inner instances:

    ClosureExpression:
        SimpleClosureExpression
        BlockClosureExpression
        DelegateClosureExpression
    SimpleClosureExpression:
        { Expression }
        { ClosureParameters : Expression }
    BlockClosureExpression:
        { Statements }
        { ClosureParameters : Statements }
    ClosureParameters:
        Type Name
        ( MethodParameters )
        ( MethodParameters ) MethodThrows
    DelegateClosureExpression:
        & MethodName
        & Expression . MethodName
    Statement:
        YieldStatement
    YieldStatement:
        \^ Expression ;

A yield statement exits immediately from the innermost enclosing block closure expression. When a closure is compiled down to a bytecoded class and method, its yield statements will similarly be compiled down to return instructions. A closure block can either produce a void result by running off the end, or it can produce one or more values via yield statements, but it cannot do both.

Because a closure expression is not a method or class definition, a return statement cannot make it produce a value or terminate normally. (A return statement can make an enclosing method terminate normally, abruptly exiting the closure and any intervening callers. It is a normal reaction, but a confused one, to suggest that since a closure is implemented in the VM using methods, then the closure syntax must use the return statement syntax to specify the closure's final value. But if closures are done right, there will be no trace of that method's existence, unless you disassemble the compiler's output, and even then it may be inlined into some other method.)

A simple closure is a single expression X surrounded by braces, and possibly preceded by formal parameters.
For any X the expression {X} is identical in meaning to {X;} if X is a void-returning method invocation, else it means the yield statement {\^X;}.

The syntax for closure parameter declarations is identical to that of method parameter declarations, including the possible throws clause. If the parentheses are missing, there must be a single parameter. If there are no parameters the parameter declarations can be completely elided, along with the delimiting colon.

This syntax is inspired by Smalltalk (and more recently by Ruby and Groovy). The syntax is intended to resemble a Smalltalk block, a free-floating snippet of code, more than a lambda expression, with its parameter-bearing front bumper.

Smalltalk-style blocks are somewhat more concise than lambda expressions. As a bonus, they lend themselves to the following pleasant syntax, courtesy of Ruby and Groovy:

(3b) If the last argument to a method call is a closure expression, allow that expression to be appended after the closing parenthesis of the call. As a matter of style, programmers are discouraged from doing this unless the method call is itself a statement.

    f(x, {y});
    f(x) {y};

(3c) Now for the fiddly details which make it all possible. Closure expressions are not ordinary primary expressions. They are allowed only in syntactic contexts which allow them to be typed as one-method anonymous Java objects. (Yes, this is target typing. If closures are to be truly concise, they must either have canonical function types or else assume a type imposed by context.) Closure expressions can occur only in the following constructs:

    AssignmentExpression:
        Name = ClosureExpression
    VariableInitialization:
        Name = ClosureExpression
    MethodArgument:
        Method ( ... ClosureExpression ... )
    NewInstanceExpression:
        new Type ClosureExpression
    ReturnStatement:
        return ... ClosureExpression ;
    YieldStatement:
        \^ ClosureExpression ;
    CastExpression:
        ( Type ) ClosureExpression
    MethodInvocationExpression:
        ClosureExpression ( Expression ... )

Every syntax for a closure expression supplies a context type which becomes the type of the closure object created. The context type of a cast closure expression is the cast type, for example. (Other context types are left as an exercise to the reader; direct invocation and overloaded functions will be dealt with shortly.) The context type must be a reference type K (interface or class) with exactly one abstract method K.m. (If K is a class, it must have a zero-argument constructor accessible.) In addition, the method signature and throws must be compatible with the closure.

The closure expression is converted by closure conversion to the context type, using an anonymous class which closes over the lexical context of the expression. Note that closure expressions in isolation do not have types; they are expressions whose interaction with the type system is determined by their formal arguments, thrown exceptions, and yield statements. (One could make a functional type schema for closure expressions, but it's not strictly necessary, except perhaps as a bridge from the category of checked expressions to the category of types, for the sole purpose of closure conversion.)

An attempted closure conversion of a closure expression to its context type can succeed under these conditions:

The actual arguments of K.m must be applicable, as a group, to the formal arguments of the closure. All method call conversions, including boxing and varargs conversion, are allowed.

If the closure has a yield statement, the value of every yield statement must convert, by either method call conversion or closure conversion, to the return type of K.m, which must not be void.
If there is no yield statement, a null, zero, or false value suitable to the return type is produced by K.m.

Each exception thrown by the closure body must be compatible with the checked exceptions declared by K.m.

The rules for a delegating closure expression x=&y.z are similar, except that the signature and throws of the delegate method z are matched to K.m. For a delegate expression, the return value of K.m is allowed to be void regardless of the return type of z. (This parallels the syntax of calling a method in a statement expression, throwing away its value.) If z is overloaded, then each overloading zn is treated as a distinct method, the set of zn which are closure-convertible to K.m is formed, and (if the set is non-empty) the unique least specific method z is chosen from that set. (This is the dual of the usual rule for selecting the most specific method of an overloaded set.) If there are convertible zn but there is no unique answer, the program is in error.

In the special case of a method call where there are several candidate context types K (because overloadings accept differing arguments at the closure's position), closure conversion is applied for each overloading, to the corresponding Kn.mn. If the conversion can succeed, the overloading is applicable, otherwise not. (Note: Non-unique results converting delegate expressions during overload matching still lead to errors, even if there would be other applicable methods. Two-way overloading does not require an M-by-N free-for-all.)

The special case of direct invocation of a closure expression is left as an exercise. (Or it could be omitted, but that is perhaps too surprising.)
The result is as if the compiler came up with an anonymous interface K0 which is used nowhere else in the program, whose method K0.m0 takes exactly the closure's declared arguments, throws what the closure throws, and returns what the yield statements return (void if none, or the type of true?x:y for yields x and y if there is more than one).

The purpose of delegate expressions is to convert cleanly from one calling sequence (method descriptor) to another. We could also have defined a system of interconversions between all one-abstract-method types (or one-method interfaces), but this would likely lead to expensive chains of delegating wrappers as types convert back and forth at module boundaries. We make this conversion process explicit by delegate creation expressions, so programmers can control it. We make it a simple special case of closure expression, so programmers can convert delegate types in a simple cast-like notation (NewType)&oldClosure.oldMethod.

The ugly ampersand is needed to respect Java's absolute distinction between method names and field names. Otherwise we would have to introduce new rules (as in BGGA) for scoping method and field names simultaneously.

(4) Suppressing Method and Class Scopes

We must point out a few more details about the cleanliness of closure expressions. Because an inner class is explicitly and syntactically a class definition with method definitions, expressions nested inside “see” the class and method wrappers and everything those wrappers bring into scope. Nested expressions can refer to class and superclass members by name, they can request the this and super pointers, and they can issue return statements to the method.

A closure expression does not have the syntactic trappings of class and method, and so it cannot “see” the associated scoped and predefined names and labels.
The expression this within a closure body refers not to the eventual closure object but to the same object (if any) that this would refer to at the point where the closure expression is introduced. (Indeed, it would be exceedingly hard to say what the type of this would be in a closure expression, until closure conversion forces a type onto the expression.) These considerations imply that a closure does not have any intrinsic “secret name” it can use to invoke itself recursively; any such name must be supplied explicitly by the enclosing block.

    public Function fact = null;
    fact = { int n : n
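
The scoping rule for this, and the need for an explicitly supplied name for recursion, can be observed in later versions of Java (8 and up), whose lambda expressions happen to treat this lexically while anonymous inner classes do not. The sketch below is only an analogy under that assumption: it uses lambda syntax rather than the notation proposed in this note, and the class LexicalThisDemo and its members are hypothetical names.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

// Hypothetical demo of lexically scoped 'this' versus the inner-class 'this'.
class LexicalThisDemo {
    // The closure has no secret name for itself; the enclosing object supplies one.
    private IntUnaryOperator sumUpTo;

    LexicalThisDemo() {
        // Recursion goes through the explicitly supplied field, not through 'this'.
        sumUpTo = n -> n <= 0 ? 0 : n + this.sumUpTo.applyAsInt(n - 1);
    }

    int sum(int n) {
        return sumUpTo.applyAsInt(n);
    }

    // Inside a lambda, 'this' is the enclosing LexicalThisDemo instance.
    Supplier<Object> thisSeenByLambda() {
        return () -> this;
    }

    // Inside an anonymous inner class, 'this' is the inner object itself.
    Supplier<Object> thisSeenByInnerClass() {
        return new Supplier<Object>() {
            @Override
            public Object get() {
                return this;
            }
        };
    }

    public static void main(String[] args) {
        LexicalThisDemo d = new LexicalThisDemo();
        System.out.println(d.sum(4));                            // 4 + 3 + 2 + 1
        System.out.println(d.thisSeenByLambda().get() == d);     // same object
        System.out.println(d.thisSeenByInnerClass().get() == d); // different object
    }
}
```

The inner class "sees" its own class wrapper, so its this is the wrapper object; the lambda sees only the enclosing scope, which is the behavior argued for in (4).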



Closures without Function Types

There is a proposal afloat in several places, including Javalobby, by some of Java's authors to add closures and function types to Java. The authors are Gilad Bracha, Neal Gafter, James Gosling, and Peter von der Ahé, and so I will tag their proposal as BGGA.

For my part, I think Java needs (a) better closures and (b) slightly better type parameters but not (c) function types. Let's examine the last first...

(c) Not Functions

There are various interesting lesser details to worry over, but the major impact of the BGGA proposal is to introduce a new subsystem (type scheme) of function types. It is a thoughtfully designed system, but it overlaps heavily with a pre-existing and parallel subsystem of interface types. This overlap with pre-existing functionality robs function types of the power they would have in another language. (I say this with great fondness for the functional programming style, as a long time Lisp programmer and author of an optimizing Scheme compiler.)

In a nutshell, any use case for a function signature U(V,W...) throws X can be addressed by a one-method interface I with a method U I.m(V,W...) throws X. (There's a little more than a nutshell's worth; keep reading.) Thus, whatever value the new type system has must somehow exceed the utility of one-method interfaces.

The introduction of generic types also had a major impact on the language, but type parameters compensated for their complexity by addressing a serious flaw in interface types, a widely felt flaw: List was less workable than List<T>. There is not a corresponding widely felt need which function types address, because using one-method interfaces for function types has been a reasonable workaround.

Here are some benefits (real and perceived) of function types, examined critically in the setting of Java:

Names are Distractions

It's easier to manage nameless types, because names can be misleading and sometimes cause useless type mismatches. But classes offer various workarounds for name management problems.
For example, an API which features a method that takes a functional parameter can also feature a nested interface to name the parameter's type. The parameter type need not be anonymous. The most common handful of types (such as Runnable) can be in java.lang, while other function-like types can be API-specific. The occasional remaining mismatch can be patched by closure creation or an adapter generation convention without recourse to function types.

Pure Structure is Concise

Similarly, function types can be concise, since they omit extraneous names. But the names are not always extraneous. They often convey useful intentions to the programmer, especially if accompanied by a Java documentation comment. Even if the function type is pure structure, a symbolic name can abbreviate it; the name is rarely much longer than the spelling of the type itself. The name is much longer only with the simplest types, like Runnable for void().

Easy Mix-and-Match

Identical calling sequences are fully compatible, instead of being artificially distinguished merely by name. This leads to power when combining separately designed modules, when they serendipitously apply to the same function types. But this requires a system designed for serendipity. Function types that are logically similar must tend to be identical, or else the system will have to adapt to the near misses of calling sequence. Java has too many types. Consider the difference in Java between int and Integer, int and long, throws IOException versus no throws, and varargs versus “vanilla”. Functional type systems tend to avoid such distinctions. Java API designers have too many degrees of freedom to expect to get lucky matching independently authored calling sequences.

Mismatches Easily Patched

Logically similar calling sequences can inter-convert (in BGGA). This can make up near-misses between types of values exchanged by unrelated APIs.
But this is a two-edged sword, because (except at the point of closure creation) the VM probably has to build an adapter wrapper of some sort, which becomes an invisible overhead. (Or else function types are partially erased in the VM, which forces dynamic checking overheads, compared to strongly typed interfaces.) Better to have calling sequence conversion happen (a) at closure creation and (b) as an explicit construct (e.g., cast, new closure, or delegate creation syntax)...

Like Peas and Carrots

Languages with closures (Haskell, etc.) generally integrate them with function types; it is a well-understood paradigm. Definitions of the term closure tend to specify that a closure is a function. But two of the oldest languages with closures, Scheme and Smalltalk, do not have a system of function types per se. (Scheme has just one type, PROCEDURE, which covers both closures and intrinsics. Smalltalk calls its closures “blocks”, and manipulates them as (what else?) one-method objects.) Truly closure-like referential transparency without function types can be observed in more recent languages also, notably Beta and Java itself, whose inner classes act as limited closures. In its essence, a closure is a source-language construct which provides a runtime handle to a snippet of code. The “closing” of a closure refers to the fact that the code snippet has full and seamless access to the source code around the closure. The code inside the snippet is “in synch” with the code outside; the same names mean the same things. This concept is about factoring source code, not about type systems. There is surely a type system to help classify the snippet handles, but interfaces can do that job as well as pure function types.

(Your Point Here)

(Function types surely provide other advantages which I've overlooked.
I'm also sure those advantages are diluted in the setting of Java.)

These various advantages, properly weighed, do not add up to a mandate for a new type subsystem, as long as the crucial advantage (better closures) can be obtained without the new types. There is much to like about BGGA, including concise closure creation, better closure semantics, and local functions. But the extra types don't really carry their weight.
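
To make the one-method-interface argument concrete, here is a small sketch in present-day Java. All of the names in it (OneMethodInterfaceDemo, Labeler, Describer, render, asLabeler) are hypothetical and chosen only for illustration. Labeler plays the role of the function type String(int, String) throws IOException; Describer is a separately designed near miss (Integer instead of int, no throws), and the mismatch is patched by one explicit adapter rather than by an implicit conversion.

```java
import java.io.IOException;

// Hypothetical API showing a named one-method interface in place of a function type.
class OneMethodInterfaceDemo {
    // Stands in for the function type  String(int, String) throws IOException.
    interface Labeler {
        String label(int count, String item) throws IOException;
    }

    // An independently authored near miss: boxed count, no checked exceptions.
    interface Describer {
        String describe(Integer count, String item);
    }

    // The API method names its functional parameter's type with the nested interface.
    static String render(int count, String item, Labeler labeler) throws IOException {
        return labeler.label(count, item);
    }

    // Explicit adapter, built once where the two APIs meet.
    static Labeler asLabeler(final Describer describer) {
        return new Labeler() {
            public String label(int count, String item) {
                return describer.describe(count, item); // int boxes to Integer here
            }
        };
    }

    static String demo() {
        Describer plain = new Describer() {
            public String describe(Integer count, String item) {
                return count + " x " + item;
            }
        };
        try {
            return render(3, "apple", asLabeler(plain));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The adapter is visible in the source, so its cost is visible too, which is the trade the post prefers over conversions performed silently by the VM.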


