JVM Language Summit: Numbers big and fixed

Last week, I enjoyed attending the 2010 JVM Language Summit. Some of the requests from the "what the JVM needs" session could be addressed by a few additional numerical libraries in the platform.

The default integer arithmetic in many languages conceptually doesn't overflow, analogous to operating on Java's java.math.BigInteger objects rather than 32-bit int or 64-bit long values. However, small integers that enjoy hardware support on commodity CPUs are operated on much more commonly than big numbers, which require multi-word support. Ideally, conceptually unbounded integers would still run very fast when small enough, without consuming either excess CPU cycles or excess memory. In the limit, fixnums would result, where the transitions between small ↔ big integers were managed and optimized by the JVM. Until that happy day, BigDecimal internally manages long ↔ BigInteger transitions, and potential future library additions could ease independently coding up similar logic.

Semantically a BigDecimal is a BigInteger significand/mantissa along with an int scale (negated exponent). If the scale/exponent is zero, then exact integer results can be computed. The BigDecimal implementation in the JDK internally often uses a long where possible to hold the significand value, failing over to allocating and operating on a BigInteger value when needed. The compact long representation is seamlessly supported in arithmetic and other operations. BigDecimal is not an ideal type for providing pure integer arithmetic since BigDecimal has many operations unneeded and inappropriate for that use and because each BigDecimal object has various fields other than the ones holding the integer value; however, the class comes with the JDK and may perform adequately out of the box.
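As a rough illustration of using the class this way (a sketch, not code from the JDK), exact integer arithmetic can be layered on BigDecimal by keeping the scale at zero; BigDecimal.valueOf(long) produces scale-0 values, and addition of scale-0 values stays at scale 0:

```java
import java.math.BigDecimal;

public class BigDecimalInts {
    // Exact integer addition via BigDecimal; with scale-0 operands
    // the result also has scale 0, so no rounding is involved and
    // the value widens past long range instead of overflowing.
    static BigDecimal addInts(long a, long b) {
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(addInts(Long.MAX_VALUE, 1)); // 9223372036854775808
    }
}
```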

The algorithms used for the long overflow checking in BigDecimal are adapted from the indispensable bit-twiddling tome Hacker's Delight. A few conference attendees expressed interest in an API around integer overflow detection. Represented as a Java API, the functionality might look like:

/**
 * Returns the sum of the arguments, throwing an ArithmeticException
 * if the exact sum is out of {@code int} range.
 *
 * @return the sum of the arguments
 * @throws ArithmeticException if the exact sum is out of {@code int} range
 */
public static int addExact(int a, int b) throws ArithmeticException

The source code of such a method could use the Hacker's Delight techniques while a JVM could intrinsify the functionality to a processor-optimized idiom. There are some design choices even in this simple method. For example, instead of ArithmeticException, a new subclass of that exception called, say, IntegerOverflow, could be thrown. Such a subclass could offer the ability to store the original operands to facilitate continuing the computation in a wider type. (An unfriendly API design for the purposes at hand would be for addExact to have int parameters but return a java.lang.Integer, with null being used to indicate overflow occurred!)
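A sketch of how such a method might be written (the class name is hypothetical; this was not a JDK API at the time) uses the Hacker's Delight sign-based overflow test: the sum overflows exactly when both operands differ in sign from the raw result, i.e. when ((a ^ r) & (b ^ r)) is negative:

```java
public class ExactMath {  // hypothetical holder class for illustration
    public static int addExact(int a, int b) {
        int r = a + b;
        // Overflow iff a and b have the same sign and r has the other sign.
        if (((a ^ r) & (b ^ r)) < 0) {
            throw new ArithmeticException("integer overflow");
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(addExact(1, 2)); // 3
        try {
            addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```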

While I don't recall it being explicitly mentioned at the conference, a piece of similar functionality is accessing the high-order 64 bits of a full 64 × 64 → 128 bit multiply (5100935).
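In pure Java this high word can be computed from 32-bit halves, another Hacker's Delight technique; the sketch below (method name chosen for illustration) shows the signed variant that a JVM could intrinsify to a single wide-multiply instruction:

```java
public class MulHigh {
    // High 64 bits of the signed 64 x 64 -> 128 bit product,
    // assembled from four 32 x 32 -> 64 bit partial products.
    static long multiplyHigh(long x, long y) {
        long x1 = x >> 32, x0 = x & 0xFFFFFFFFL;   // signed high, unsigned low
        long y1 = y >> 32, y0 = y & 0xFFFFFFFFL;
        long t = x0 * y0;
        long k = t >>> 32;                         // carry out of the low word
        t = x1 * y0 + k;
        long w1 = t & 0xFFFFFFFFL;
        long w2 = t >> 32;                         // arithmetic: sign matters
        t = x0 * y1 + w1;
        return x1 * y1 + w2 + (t >> 32);
    }

    public static void main(String[] args) {
        // (2^62) * 4 = 2^64, whose high 64 bits are 1
        System.out.println(multiplyHigh(1L << 62, 4L)); // 1
        System.out.println(multiplyHigh(-1L, -1L));     // 0, since (-1)*(-1) = 1
    }
}
```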

Adding these low-level library operations to the platform would be a fun project!

Comments:

These suggestions would go a long way toward making the JVM more language-neutral, efficiently supporting both lower-level languages (with unsigned integral types, overflow control, etc.) and higher-level ones (numeric towers). I'm just afraid of too much informal reliance on helper methods with intrinsic optimization, because their performance is typically not even documented, let alone mandated. I think the javadoc of such methods should make very clear that all JVM implementations are _required_ to implement the method as efficiently as possible on each platform, instead of just relying on complex bit-twiddling hacks that may or may not be compiled into optimal native code... and won't be accelerated at all in interpreted mode. In theory it's possible to "intrinsify" core APIs even in the interpreter: invokestatics to these special methods could easily be quickened into internal bytecodes that the interpreter handles with straight, optimal native code. But in practice I think this technique is not used at all, and this is very bad because it's a gratuitous order[s]-of-magnitude penalty for many important core APIs until the JIT kicks in (a potential loading/warmup-time penalty).

For BigInteger/BigDecimal, we also need a complementary API allowing mutable operations. Benchmarking of these APIs consistently shows massive cost of allocation / GC / data churn (cache misses etc.). [I have implemented a Mandelbrot fractal generator with BigDecimal. It's sad to watch.]

Posted by Osvaldo Pinali Doederlein on August 04, 2010 at 04:45 AM PDT #

Yes, I echo the points above for embedded and similar 'smaller' JVM environments where there may be no JIT at all, or only the equivalent of (say) C1. Indeed, I run my main Java Web server in such an environment on an ARM SheevaPlug.

Now, exceptionally clear behaviour and performance semantics might allow some static expansion/peephole/etc up front by tools such as ProGuard where the target is known to have a weak JIT for example.

I wouldn't oblige compilers or JVMs to provide fabulous performance, but like the strong hints in JSR-41 about excising all baggage for asserts wrapped in if(false) { } for the benefit of embedded platforms, something similar could be done here.

Rgds

Damon

Posted by Damon Hart-Davis on August 04, 2010 at 05:19 AM PDT #

These methods are required for JSR-310. We already have implementations, which are tentatively proposed for j.u.Math. Please take a look!

https://jsr-310.dev.java.net/nonav/doc-2010-06-22/javax/time/MathUtils.html

Posted by Stephen Colebourne on August 04, 2010 at 05:20 AM PDT #

+1 on all things Hacker's Delight.

I started a comment which inflated to a blog post of its own. There are a couple of degrees of freedom in this problem: what we will use the extra bit(s) for, and how to deliver it (or them). After examining a couple of half measures, I enumerate about ten ways to return normally with a pair of machine words.

http://blogs.sun.com/jrose/entry/after_the_deluge

Please direct comments back to Joe's blog!

Posted by John Rose on August 04, 2010 at 07:12 AM PDT #

@Stephen: I suggest coding your throw path in such a way that the decision to make a new exception is factored somewhere central:

int safeAdd(int a, int b) {
    long zz = (long) a + b;       // exact sum in a wider type
    int z = (int) zz;
    if (z != zz) throw notSafe(); // the narrowing lost information
    return z;
}

// factored out so implementations may preallocate the exception
ArithmeticException notSafe() {
    return new ArithmeticException("integer overflow");
}

...and that your specification allow (as it appears to do already) implementations to preallocate exceptions.

Posted by John Rose on August 04, 2010 at 07:21 AM PDT #

@Osvaldo,

On the topic of intrinsification, for at least some of the java.lang.Math methods the HotSpot interpreter does call the fast intrinsified version, the same version called by the client and server compilers. (This is necessary to get consistent results for the methods whose exact input/output behavior is by design not fully specified.)

Posted by Joe Darcy on August 04, 2010 at 07:28 AM PDT #

@John, JSR-310 only needs these at the edges. All normal cases will be nowhere near reaching these exceptions. Preallocating an exception would be perfectly acceptable.

Key question though. If I could keep the range of most of the numbers to 31 bits rather than 32 bits, would that help any future optimisations?

BTW, these methods would be a fine addition to JDK 7, even if performance enhancements came later...

Posted by Stephen Colebourne on August 04, 2010 at 05:54 PM PDT #

This is a very good idea. Even better that it is already implemented in jsr-310. They should definitely be added to the standard library.

I think it would probably be difficult to restart just the latest operation based on the two input values stored in an IntegerOverflow exception.

If you want to switch from fast int code to slow bignum code, then it can be taken care
of at a higher level. Thus the optimizing compiler in the JVM would prefer to see a calcUsingInt and a calcUsingBignum that are used like this:

try {calcUsingInt();} catch (ArithmeticException) { calcUsingBignum(); }

This would allow for some heavy optimizations of calcUsingInt. If we have an exit opportunity to bignum after each integer operation it might make the optimized code significantly larger.

But again, this depends on the actual usage pattern as John pointed out.
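A concrete version of this fast-path/slow-path split might look like the following sketch (the names and the summing example are illustrative, not from the comment): the int path uses an overflow check with a single exceptional exit, and the whole computation is redone in BigInteger only when that exit is taken.

```java
import java.math.BigInteger;

public class FastSlowSum {
    // Fast path: pure int arithmetic; throws on the first overflow so
    // the JIT sees one exceptional exit rather than one per operation.
    static int sumUsingInt(int[] xs) {
        int s = 0;
        for (int x : xs) {
            int r = s + x;
            if (((s ^ r) & (x ^ r)) < 0) throw new ArithmeticException();
            s = r;
        }
        return s;
    }

    // Slow path: redo the whole computation with BigInteger.
    static BigInteger sumUsingBignum(int[] xs) {
        BigInteger s = BigInteger.ZERO;
        for (int x : xs) s = s.add(BigInteger.valueOf(x));
        return s;
    }

    static BigInteger sum(int[] xs) {
        try {
            return BigInteger.valueOf(sumUsingInt(xs));
        } catch (ArithmeticException e) {
            return sumUsingBignum(xs);
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));               // 6
        System.out.println(sum(new int[] {Integer.MAX_VALUE, 1}));  // 2147483648
    }
}
```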

Posted by Fredrik Öhrström on August 04, 2010 at 09:10 PM PDT #

@Joe, thanks. I guess we only need to implement this idea more completely. For example, it boggles the mind to think that something like Double.doubleToRawLongBits() - which the JIT turns into a NOP - will actually require a JNI call when you are in the interpreter. This kind of thing seems to be really low-hanging fruit...

Posted by Osvaldo Pinali Doederlein on August 05, 2010 at 03:53 AM PDT #

Just to say, the JSR-310 code is BSD 3 clause licensed, although you might want to start from scratch.

Posted by Stephen Colebourne on August 05, 2010 at 09:21 AM PDT #

@Osvaldo,

When I last surveyed different processor instructions sets, most had disjoint integer and floating-point register sets and few architectures had instructions for direct moves between the two register sets.

Therefore, the bitwise conversion between integer and floating-point could end up requiring moves through memory, so can't necessarily be implemented as just a nop. (There can also be esoteric beasts like signaling NaNs to worry about.)

Many years ago while a student, I wrote up a tech report which included algorithms to have "100% Pure Java" implementations of the bitwise conversions:
http://www.sonic.net/~jddarcy/Research/ieeerecd.pdf
While these were fun to think about, the algorithms are of mainly academic interest.

In any case, IIRC the bitwise conversion operations are intrinsified in at least the client and server compilers, but I'm not sure about the interpreter.

Posted by Joe Darcy on August 15, 2010 at 03:30 PM PDT #

@Joe, you are of course correct wrt register files; I was thinking more of scenarios where the value doesn't really need to transition from FP to "raw" form or vice-versa (mostly cases where the source or destination is memory). For example, "double[] arr; long bits = Double.doubleToRawLongBits(arr[5])". In the C language, this would be performed by a simple pointer typecast (which itself generates no code). Of course we need at least a memory move in these cases, but by "NOP" I meant no _extra_ operations. So for example, if I want to write a loop that just copies the entire content of a double[] array to a long[] array, HotSpot should be able to compile this loop to code that's as good as a raw memcpy() (or even just call memcpy(), like some JVM-specific intrinsics such as System.arraycopy() do).
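The loop in question might look like this minimal sketch (names chosen for illustration); ideally a JIT would recognize it as a bit-for-bit block copy:

```java
public class RawBitsCopy {
    // Copy a double[] to a long[] bit-for-bit; ideally compiled
    // down to a plain memory copy, with no per-element FP work.
    static long[] toRawBits(double[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Double.doubleToRawLongBits(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] bits = toRawBits(new double[] {1.0, -0.0});
        System.out.println(Long.toHexString(bits[0])); // 3ff0000000000000
        System.out.println(Long.toHexString(bits[1])); // 8000000000000000
    }
}
```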

I didn't know that doubleToRawLongBits() (as well as the opposite raw->double, and also float->raw and raw->float) cannot be implemented by a raw load of the same bits; this is certainly not obvious from the javadocs, and is counter-intuitive given the method names ("raw" should mean: same bits, no translation at all!). I thought only the non-raw APIs, like doubleToLongBits(), would perform some handling of special IEEE values. Isn't that the case?

Posted by Osvaldo Pinali Doederlein on August 15, 2010 at 11:08 PM PDT #

@Osvaldo,

Fortunately I can no longer recall offhand all the nasty complications of dealing with signaling NaNs ;-) It is true that a raw load of the same bits can be used to implement the bitwise conversions; the C code in the Java libraries used to provide this functionality (when it is not intrinsified) is an untyped union. However, a given architecture may have particularities, such as the old x87 floating-point stack. On the x87, since the in-register format was 80 bits rather than 64, IIRC, doing the 80 -> 64 bit conversion could quash the signaling NaN bit. These sorts of issues are of much less practical importance with the preponderance of x86 boxes with the SSE extensions.

Posted by Joe Darcy on August 17, 2010 at 01:28 AM PDT #
