July 21, 2010

an experiment with generic arithmetic

John Rose
Most computer languages let you add two numbers together. This is harder than it looks, since numbers come in a variety of formats, starting with fixed-point vs. floating-point. The logic for numeric addition has to classify its operands, perhaps coerce them to a common format, and then switch to the correct algorithm for that format. Dynamic languages have to do this all at runtime, without keeping the user waiting too long.

(By the way, structuring this problem is much more than an exercise in classic object-oriented programming, since arithmetic operations have symmetries which cannot be modeled by sending a request to an encapsulated method in the left-hand operand. Some sort of double dispatch is required, which is an interesting problem in its own right that I won't go into here. Acid test: Design your generic arithmetic framework so that it simulates Java arithmetic, but allows users to add new number classes, starting with BigInteger and BigDecimal. Now test it by adding rational numbers and complex numbers, as two separate and independent modules. If that was easy, add units or formal polynomials. Don't forget that your users are expecting to perform arithmetic on the integer iteration variables of their inner loops, and they refuse to declare a static type for them.)
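
To make the double-dispatch idea concrete, here is a minimal sketch in Java (all class and method names here are hypothetical, not from any real framework): each number class forwards `add` to a type-specific "reverse" method on the other operand, so the chosen algorithm sees both operand types. Notice that adding a third number class would mean widening the shared interface, which is exactly the modularity problem the acid test probes.

```java
// Minimal double-dispatch sketch (all names hypothetical). a.add(b) forwards
// to a reverse method on b, so the algorithm is chosen by both operand types.
interface Num {
    Num add(Num that);
    Num addToInt(IntNum a);   // compute a + this, knowing a is an IntNum
    Num addToRat(RatNum a);   // compute a + this, knowing a is a RatNum
}

final class IntNum implements Num {
    final int value;
    IntNum(int value) { this.value = value; }
    public Num add(Num that) { return that.addToInt(this); }
    public Num addToInt(IntNum a) { return new IntNum(a.value + value); }
    public Num addToRat(RatNum a) { return new RatNum(a.num + value * a.den, a.den); }
}

final class RatNum implements Num {
    final int num, den;
    RatNum(int num, int den) { this.num = num; this.den = den; }
    public Num add(Num that) { return that.addToRat(this); }
    public Num addToInt(IntNum a) { return new RatNum(a.value * den + num, den); }
    public Num addToRat(RatNum a) { return new RatNum(a.num * den + num * a.den, a.den * den); }
}
```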

The first trick of optimizing generic arithmetic is to assume that each particular operation (let's say it is an addition) is going to apply to a limited range of operand formats, usually a single format, usually a machine word (boxed and tagged neatly, in an object). Regardless of how many exotic number types are running around globally, each local neighborhood is pretty boring. The theory of invokedynamic is that each operation (a call site in the JVM bytecodes) is separately linked by a bootstrap method. Even if two sites look just the same, each one gets its own bootstrap process. This allows (but does not require) the invokedynamic linkage logic (which is part of the language runtime, not the JVM) to assign separate profile points to each individual operation. Invokedynamic allows the JVM to separately customize and optimize each arithmetic operation, according to its operand types and context.
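
In code, per-site linkage looks roughly like this (a sketch; the class and method names are mine, not any particular runtime's). The JVM calls the bootstrap method once per invokedynamic instruction, so even two textually identical sites get two distinct CallSite objects to hang per-site state on:

```java
import java.lang.invoke.*;

public class AddBootstrap {
    // Called by the JVM on a site's first execution. Each site gets its own
    // CallSite, which is the hook for per-site profiling and specialization.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle generic = lookup.findStatic(AddBootstrap.class, "genericAdd",
                MethodType.methodType(Object.class, Object.class, Object.class));
        // A MutableCallSite leaves room to swap in a specialized target later.
        return new MutableCallSite(generic.asType(type));
    }

    // Stand-in for a language runtime's generic addition routine.
    public static Object genericAdd(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer)
            return (Integer) a + (Integer) b;
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }
}
```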

To put this theory into practice, I tried a first experiment combining invokedynamic with Kawa, a mature implementation of Scheme on the JVM. Kawa adds dynamically typed numbers together with a routine called AddOp.$Pl. As suggested above, it takes two object references, classifies them as numbers, converts them to a common type (Integer, Double, whatever), and adds them together according to that common type.
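
The real AddOp.$Pl handles Scheme's full numeric tower; the classify/coerce/add shape can be sketched like this (a simplified illustration, not the actual Kawa source):

```java
// Simplified sketch of the classify/coerce/add pattern (not Kawa's code):
// classify both operands, pick a common format, add payloads, box the result.
public class ClassifyAdd {
    public static Object add(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            long sum = (long) (Integer) a + (Integer) b;  // add the payloads
            if (sum == (int) sum) return (int) sum;       // box as Integer
            return sum;                                   // widen on overflow
        }
        if (a instanceof Number && b instanceof Number) {
            // No exact common format: coerce both to a wider one.
            return ((Number) a).doubleValue() + ((Number) b).doubleValue();
        }
        throw new IllegalArgumentException("not numbers");
    }
}
```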

You might expect that the classification logic is expensive, and it is, compared to the actual cost of adding two machine words. The main cost, however, appears to be the allocation of intermediate values on the heap. In arbitrary approximate units, the cost of classifying operand objects is 1 to 3, the cost of adding their "payloads" is less than 0.1, and the cost of boxing the result is 5.

The best way to speed up such computations, therefore, is to elide the allocations, either by some interprocedural analysis or handshake that keeps the payloads in registers or on the stack, or by encoding small payloads in pseudo-pointers. Failing that, we are always working to make GC faster. I suppose JVM engineers need to do all three...

Meanwhile, there is the interesting problem of classification. The range of costs noted above is affected by the success or failure of type profiling. JVMs and other dynamically typed systems usually keep track of operand types (in many clever ways) so that they can simplify and specialize their execution. Specifically, although the Kawa routine AddOp.$Pl is ready to handle numbers of all types, if it is presented with only (say) Integer operands, the HotSpot JVM will notice that the internal classification tests never observe anything but that one type, and it will adjust the machine code for that routine to work well for that type, and "bail out" to an expensive recompilation if the assumption changes. If this tactic works, the total cost of adding two Integers together is 6 units. Otherwise, it is 8 units.

Using invokedynamic, we can add to the JVM's profiling information. The trick is to have the bootstrap method link the invokedynamic site with a method handle that observes operand types, briefly records them, and then specializes itself to a new method handle that works better for the previously observed operand types. My experiment (above) is very simple: The call site observes the type of one operand of one call, and spins an "if/then/else" which says, "as long as future operands are also of that type, cast to that type, and call AddOp.$Pl".
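
A sketch of that trick with the java.lang.invoke API (the class and method names are mine, not the experiment's actual code): the initial target records the class of the first left operand, then relinks the site to a guardWithTest whose fast path casts to the observed class before calling the generic routine:

```java
import java.lang.invoke.*;
import static java.lang.invoke.MethodHandles.*;
import static java.lang.invoke.MethodType.methodType;

public class SpecializingAdd {
    static final MethodType GENERIC =
            methodType(Object.class, Object.class, Object.class);

    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name, MethodType type)
            throws ReflectiveOperationException {
        MutableCallSite site = new MutableCallSite(GENERIC);
        MethodHandle observe = lookup.findStatic(SpecializingAdd.class,
                "observeAndAdd",
                methodType(Object.class, MutableCallSite.class, Object.class, Object.class));
        site.setTarget(insertArguments(observe, 0, site));  // first-call target
        return site;
    }

    // First execution: record the left operand's class, then relink the site
    // to "if (a is Seen) add((Seen) a, b) else genericAdd(a, b)".
    public static Object observeAndAdd(MutableCallSite site, Object a, Object b)
            throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = lookup();
        Class<?> seen = a.getClass();
        MethodHandle generic = lookup.findStatic(SpecializingAdd.class,
                "genericAdd", GENERIC);
        // The asType round trip inserts a checkcast to the observed class,
        // feeding sharp type information into the inlined callee.
        MethodHandle fast = generic
                .asType(methodType(Object.class, seen, Object.class))
                .asType(GENERIC);
        MethodHandle test = insertArguments(
                lookup.findStatic(SpecializingAdd.class, "classMatches",
                        methodType(boolean.class, Class.class, Object.class, Object.class)),
                0, seen);
        site.setTarget(guardWithTest(test, fast, generic));
        return genericAdd(a, b);
    }

    public static boolean classMatches(Class<?> seen, Object a, Object b) {
        return a != null && a.getClass() == seen;
    }

    // Stand-in for the generic routine (AddOp.$Pl in the real experiment).
    public static Object genericAdd(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer)
            return (Integer) a + (Integer) b;
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }
}
```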

The effect of the extra cast is to send specialized type information into the AddOp.$Pl routine, which enables it to fold up, about the same as if the JVM's global type profile had succeeded. In essence, invokedynamic works as a front end to the AddOp.$Pl routine, helping it to classify its operands. The overhead of the invokedynamic instruction is about 1 unit (which is higher than it should be, but that's young technology for you). But the extra profiling prevents the worst-case classification cost of 3 units, keeping it down to 1.

(The comment at the top of the benchmark has more information about how these numbers were obtained. Full disclosure: This experiment was run on the latest mlvm experimental build. Also, the JVM's inlining heuristics, designed circa 1999, had to be twisted around manually in order to keep the JIT interested in processing the program in large enough units. To get valid system-level results, we'll have to engineer better inlining tendencies into the JVM's logic.)

This result is encouraging for a few reasons. First, under the realistic assumption of a polluted type profile, invokedynamic can reduce the cost of generic arithmetic by double-digit percentages. That margin will improve as JVMs improve the quality of invokedynamic code generation. See the upcoming paper by Thalinger & Rose in PPPJ 2010 for more observations on this point.

Second, this particular experiment required no change at all to the Kawa runtime routine. The new JVM features can be used in an evolutionary way. But deeper changes can get better results. For example, dynamic languages sometimes have "metaobject protocols" which allow the dynamic type system to communicate (in a modular way) with the compiler. A good MOP can encode optimization decisions about generic arithmetic sites, and invokedynamic provides a clean way to plug those decisions into the JVM's generated code.

Third (and most important) is the fact that the temporary allocation, although it is by far the highest cost, is also vulnerable to local analysis. After a type-customized version of AddOp.$Pl is inlined at an invokedynamic site, it becomes an exercise in escape analysis (or fixnums; take your pick) to remove the garbage collector from the picture. This is a work in progress for us JVM folks...

Comments (2)
  • Howard Lovatt Wednesday, July 21, 2010

    Warning, plug for pet project.

    My pet project is PEC, the Pattern Enforcing Compiler for Java, an extended Java compiler that adds patterns; relevant to this case is Multiple Dispatch:


    If you scroll down, the first example given in the description section adds arbitrary number classes to an RPN calculator.

  • Osvaldo Pinali Doederlein Thursday, July 22, 2010

    Programming Languages are really in their infancy, as we still debate such mind-boggling-fundamental issues as support for numbers and mathematical operations. It's depressing. Even more depressing to admit that I prefer languages with primitive types, which just avoid the problem by throwing it in the programmer's lap. (OTOH I'm not ashamed of static typing - that is justified by concerns other than easy performance.)

    Hopefully if my son (now 4yo) follows dad's career, PLs will be in better shape by the 2030's. Hey, I plan to have a long life and never retire (programming is my cocaine), so hopefully I could benefit too. The only bad part would be the kids looking at our old code and making fun of us. But at the pace these really-core PL problems are [not] being solved, my fear of the ridicule is not very big at this time.
