Wednesday Aug 04, 2010

JVM Language Summit: Numbers big and fixed

Last week, I enjoyed attending the 2010 JVM Language Summit. Some of the requests from the "what the JVM needs" session could be addressed by a few additional numerical libraries in the platform.

The default integer arithmetic in many languages conceptually doesn't overflow, analogous to operating on Java's java.math.BigInteger objects rather than 32-bit int or 64-bit long values. However, small integers that enjoy hardware support on commodity CPUs are much more commonly operated on than big numbers which require multi-word support. Ideally, conceptually unbounded integers would still run very fast when they were small enough without consuming either excess CPU cycles or excess memory. In the limit, fixnums would result, where the transitions between small ↔ big integers were managed and optimized by the JVM. Until that happy day, BigDecimal internally manages long ↔ BigInteger transitions, and potential future library additions could ease coding up similar logic independently.

Semantically a BigDecimal is a BigInteger significand/mantissa along with an int scale (negated exponent). If the scale/exponent is zero, then exact integer results can be computed. The BigDecimal implementation in the JDK internally often uses a long where possible to hold the significand value, failing over to allocating and operating on a BigInteger value when needed. The compact long representation is seamlessly supported in arithmetic and other operations. BigDecimal is not an ideal type for providing pure integer arithmetic since BigDecimal has many operations unneeded and inappropriate for that use and because each BigDecimal object has various fields other than the ones holding the integer value; however, the class comes with the JDK and may perform adequately out of the box.
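
For example, scale-0 BigDecimal values give exact integer arithmetic past the long range out of the box:

```java
import java.math.BigDecimal;

public class BigDecimalAsInteger {
    public static void main(String... args) {
        // Long.MAX_VALUE fits in the compact long representation...
        BigDecimal a = BigDecimal.valueOf(Long.MAX_VALUE);
        // ...but the sum no longer does; BigDecimal transparently fails
        // over to a BigInteger-backed significand.
        BigDecimal sum = a.add(BigDecimal.ONE);
        System.out.println(sum);         // 9223372036854775808
        System.out.println(sum.scale()); // 0 => still an exact integer
    }
}
```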

The algorithms used for the long overflow checking in BigDecimal are adapted from the indispensable bit-twiddling tome Hacker's Delight. A few conference attendees expressed interest in an API around integer overflow detection. Represented as a Java API, the functionality might look like:

/**
 * Returns the sum of the arguments, throwing an ArithmeticException
 * if the exact sum is out of {@code int} range.
 *
 * @return the sum of the arguments
 * @throws ArithmeticException if the exact sum is out of {@code int} range
 */
public static int addExact(int a, int b) throws ArithmeticException

The source code of such a method could use the Hacker's Delight techniques while a JVM could intrinsify the functionality to a processor-optimized idiom. There are some design choices even in this simple method. For example, instead of ArithmeticException, a new subclass of that exception called, say, IntegerOverflow, could be thrown. Such a subclass could offer the ability to store the original operands to facilitate continuing the computation in a wider type. (An unfriendly API design for the purposes at hand would be for addExact to have int parameters but return a java.lang.Integer, with null being used to indicate overflow occurred!)
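
A sketch of how such a method could be implemented with a Hacker's Delight-style check (illustrative only; a JVM intrinsic could replace the body with a processor-specific idiom):

```java
public class ExactArithmetic {
    /**
     * Returns the sum of the arguments, throwing an ArithmeticException
     * if the exact sum is out of {@code int} range.
     */
    public static int addExact(int a, int b) {
        int sum = a + b;
        // Overflow occurred iff both operands have the same sign
        // and the sum's sign differs from it.
        if (((a ^ sum) & (b ^ sum)) < 0)
            throw new ArithmeticException("integer overflow");
        return sum;
    }
}
```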

While I don't recall it being explicitly mentioned at the conference, a piece of similar functionality is accessing the high-order 64 bits of a full 64 × 64 → 128 bit multiply (5100935).
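
Absent direct support, the high half can be computed from 32-bit pieces; a sketch following the well-known Hacker's Delight decomposition (the name multiplyHigh here is illustrative):

```java
public class MultiplyHigh {
    // High 64 bits of the signed 64 x 64 -> 128 bit product,
    // built from 32-bit halves.
    public static long multiplyHigh(long x, long y) {
        long x1 = x >> 32, x0 = x & 0xFFFFFFFFL;
        long y1 = y >> 32, y0 = y & 0xFFFFFFFFL;
        long w0 = x0 * y0;                    // low x low
        long t  = x1 * y0 + (w0 >>> 32);      // high x low plus carry
        long w1 = (t & 0xFFFFFFFFL) + x0 * y1; // low x high plus middle word
        return x1 * y1 + (t >> 32) + (w1 >> 32);
    }
}
```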

Adding these low-level library operations to the platform would be a fun project!

Thursday Feb 25, 2010

Notions of Floating-Point Equality

Moving on from identity and equality of objects, different notions of equality are also surprisingly subtle in some numerical realms.

As comes up from time to time and is often surprising, the "==" operator defined by IEEE 754 and used by Java for comparing floating-point values (JLSv3 §15.21.1) is not an equivalence relation. Equivalence relations satisfy three properties: reflexivity (something is equivalent to itself), symmetry (if a is equivalent to b, then b is equivalent to a), and transitivity (if a is equivalent to b and b is equivalent to c, then a is equivalent to c).

The IEEE 754 standard defines four possible mutually exclusive ordering relations between floating-point values:

  • equal

  • greater than

  • less than

  • unordered

A NaN (Not a Number) is unordered with respect to every floating-point value, including itself. This was done so that NaNs would not quietly slip by without due notice. Since (NaN == NaN) is false, the IEEE 754 "==" relation is not an equivalence relation because it is not reflexive.
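
The non-reflexivity is easy to demonstrate:

```java
public class NaNReflexivity {
    public static void main(String... args) {
        double nan = 0.0 / 0.0;          // one way to generate a NaN
        System.out.println(nan == nan);  // false: == is not reflexive on NaN
        System.out.println(nan != nan);  // true: the standard "is it a NaN?" test
    }
}
```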

An equivalence relation partitions a set into equivalence classes; each member of an equivalence class is "the same" as the other members of the class for the purposes of that equivalence relation. In terms of numerics, one would expect equivalent values to result in equivalent numerical results in all cases. Therefore, the size of the equivalence classes over floating-point values would be expected to be one; a number would only be equivalent to itself. However, in IEEE 754 there are two zeros, -0.0 and +0.0, and they compare as equal under ==. For IEEE 754 addition and subtraction, the sign of a zero argument can at most affect the sign of a zero result. That is, if the sum or difference is not zero, a zero of either sign doesn't change the result. If the sum or difference is zero and one of the arguments is zero, the other argument must be zero too:

  • -0.0 + -0.0 ⇒ -0.0

  • -0.0 + +0.0 ⇒ +0.0

  • +0.0 + -0.0 ⇒ +0.0

  • +0.0 + +0.0 ⇒ +0.0

Therefore, under addition and subtraction, both signed zeros are equivalent. However, they are not equivalent under division since 1.0/-0.0 ⇒ -∞ but 1.0/+0.0 ⇒ +∞ and -∞ and +∞ are not equivalent.1
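
A few expressions make the zeros' behavior concrete:

```java
public class SignedZeros {
    public static void main(String... args) {
        System.out.println(-0.0 == 0.0);  // true: == equates the two zeros
        System.out.println(-0.0 + 0.0);   // 0.0: the sign vanishes under addition
        System.out.println(1.0 / -0.0);   // -Infinity
        System.out.println(1.0 / 0.0);    // Infinity: division tells the zeros apart
    }
}
```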

Despite the rationale for the IEEE 754 specification not defining == as an equivalence relation, there are legitimate cases where one needs a true equivalence relation over floating-point values, such as when writing test programs, and cases where one needs a total ordering, such as when sorting. In my numerical tests I use a method that returns true for two floating-point values x and y if:
((x == y) &&
(if x and y are both zero they have the same sign)) ||
(x and y are both NaN)
Conveniently, this is just computed by (Double.compare(x, y) == 0). For sorting or a total order, the semantics of Double.compare are fine; NaN is treated as the largest floating-point value, greater than positive infinity, and -0.0 < +0.0. That ordering is the total order used by java.util.Arrays.sort(double[]). In terms of semantics, it doesn't really matter where the NaNs are ordered with respect to other values as long as they are consistently ordered that way.2

These subtleties of floating-point comparison were also germane on the Project Coin mailing list last year; the definition of floating-point equality was discussed in relation to adding support for relational operations based on a type implementing the Comparable interface. That thread also broached the complexities involved in comparing BigDecimal values.

The BigDecimal class has a natural ordering that is inconsistent with equals;3 that is, for at least some inputs bd1 and bd2, bd1.compareTo(bd2) == 0 has a different boolean value than bd1.equals(bd2). In BigDecimal, the same numerical value can have multiple representations, such as (100 × 10⁰) versus (10 × 10¹) versus (1 × 10²). These are all "the same" numerically (compareTo == 0) but are not equals with each other. Such values are not equivalent under the operations supported by BigDecimal; for example, (100 × 10⁰) has a scale of 0 while (1 × 10²) has a scale of -2.4
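
In code, the compareTo/equals split looks like this (scales as in the example above):

```java
import java.math.BigDecimal;

public class BigDecimalCohort {
    public static void main(String... args) {
        BigDecimal a = new BigDecimal("100");   // 100 × 10^0, scale 0
        BigDecimal b = new BigDecimal("1E+2");  // 1 × 10^2, scale -2
        System.out.println(a.compareTo(b));     // 0: numerically the same
        System.out.println(a.equals(b));        // false: different representations
        System.out.println(a.scale() + " " + b.scale()); // 0 -2
    }
}
```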

While subtle, the different notions of numerical equality each serve a useful purpose and knowing which notion is appropriate for a given task is an important factor in writing correct programs.

1 There are two zeros in IEEE 754 because there are two infinities. Another way to extend the real numbers to include infinity is to have a single (unsigned) projective infinity. In such a system, there is only one conceptual zero. Early x87 chips before IEEE 754 was standardized had support for both signed (affine) and projective infinities. Each style of infinity is more convenient for some kinds of computations.

2 Besides the equivalence relation offered by Double.compare(x, y), another equivalence relation can be induced by either of the bitwise conversion routines, Double.doubleToLongBits or Double.doubleToRawLongBits. The former collapses all bit patterns that encode a NaN value into a single canonical NaN bit pattern, while the latter can let through a platform-specific NaN value. Implementation freedoms in the original IEEE 754 standard have allowed different processor families to define different conventions for NaN bit patterns.
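
The difference between the two conversion routines can be seen by round-tripping a non-canonical NaN bit pattern (which raw pattern survives the round trip is platform-dependent, but the canonicalized result is specified):

```java
public class NaNBits {
    public static void main(String... args) {
        long nonCanonical = 0x7ff8000000000001L; // one of many NaN encodings
        double nan = Double.longBitsToDouble(nonCanonical);
        // doubleToLongBits collapses every NaN to the canonical pattern...
        System.out.println(Long.toHexString(Double.doubleToLongBits(nan)));
        // prints 7ff8000000000000
        // ...while doubleToRawLongBits reports the bits as actually stored,
        // which may vary by platform for NaN values.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(nan)));
    }
}
```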

3 I've at times considered whether it would be worthwhile to include an "@NaturalOrderingInconsistentWithEquals" annotation in the platform to flag the classes that have this quirk. Such an annotation could be used by various checkers to find potentially problematic uses of such classes in sets and maps.

4 Building on wording developed for the BigDecimal specification under JSR 13, when I was editor of the IEEE 754 revision, I introduced several pieces of decimal-related terminology into the draft. Those terms include preferred exponent, analogous to the preferred scale from BigDecimal, and cohort, "The set of all floating-point representations that represent a given floating-point number in a given floating-point format." Put in terms of BigDecimal, the members of a cohort would be all the BigDecimal numbers with the same numerical value, but distinct pairs of scale (negation of the exponent) and unscaled value.

Saturday Feb 20, 2010

Everything Older is Newer Once Again

Catching up on writing about more numerical work from years past, the second article in a two-part series finished last year discusses some low-level floating-point manipulations methods I added to the platform over the course of JDKs 5 and 6. Previously, I published a blog entry reacting to the first part of the series.

JDK 6 enjoyed several numerics-related library changes. Constants for MIN_NORMAL, MIN_EXPONENT, and MAX_EXPONENT were added to the Float and Double classes. I also added to the Math and StrictMath classes the following methods for low-level manipulation of floating-point values:

  • getExponent

  • nextAfter

  • nextUp

  • copySign

  • scalb

There are also overloaded methods for float arguments. In terms of the IEEE 754 standard from 1985, the methods above provide the core functionality of the recommended functions. In terms of the 2008 revision to IEEE 754, analogous functions are integrated throughout different sections of the document.

While a student at Berkeley, I wrote a tech report on algorithms I developed for an earlier implementation of these methods, an implementation written many years ago when I was a summer intern at Sun. The implementation of the recommended functions in the JDK is a refinement of the earlier work, a refinement that simplified code, added extensive and effective unit tests, and sported better performance in some cases. In part the simplifications came from not attempting to accommodate IEEE 754 features not natively supported in the Java platform, in particular rounding modes and sticky flags.

The primary purpose of these methods is to assist in the development of math libraries in Java, such as the recent pure Java implementation of floor and ceil (6908131). This expected use-case drove certain API differences with the functions sketched by IEEE 754. For example, the getExponent method simply returns the unbiased value stored in the exponent field of a floating-point value rather than doing additional processing, such as computing the exponent needed to normalize a subnormal number, additional processing called for in some flavors of the 754 logb operation. Such additional functionality can actually slow down math libraries since libraries may not benefit from the additional filtering and may actually have to undo it.
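
For instance, getExponent reports the stored exponent field directly, with sentinel values rather than logb-style normalization:

```java
public class GetExponentDemo {
    public static void main(String... args) {
        System.out.println(Math.getExponent(1.0));              // 0
        System.out.println(Math.getExponent(0.5));              // -1
        // Zero and subnormal arguments report MIN_EXPONENT - 1 = -1023
        // rather than a normalized logb-style exponent:
        System.out.println(Math.getExponent(0.0));              // -1023
        System.out.println(Math.getExponent(Double.MIN_VALUE)); // -1023
        // Infinite and NaN arguments report MAX_EXPONENT + 1 = 1024:
        System.out.println(Math.getExponent(Double.NaN));       // 1024
    }
}
```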

The Math and StrictMath specifications of copySign have a small difference: the StrictMath version always treats NaNs as having a positive sign (a sign bit of zero) while the Math version does not impose this requirement. The IEEE standard does not ascribe a meaning to the sign bit of a NaN, and different processors have different conventions for NaN representations and how they propagate. However, if the source argument is not a NaN, the two copySign methods will produce equivalent results. Therefore, even if being used in a library where the results need to be completely predictable, the faster Math version of copySign can be used as long as the source argument is known to be numerical.
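
A quick illustration of the sign-transfer behavior, using only non-NaN sign arguments, where Math and StrictMath agree:

```java
public class CopySignDemo {
    public static void main(String... args) {
        // The sign comes from the second argument, including the sign of zero:
        System.out.println(Math.copySign(3.0, -0.0)); // -3.0
        System.out.println(Math.copySign(-3.0, 1.0)); // 3.0
        // For a numerical (non-NaN) sign argument the two versions agree:
        System.out.println(Math.copySign(2.0, -1.0)
                           == StrictMath.copySign(2.0, -1.0)); // true
    }
}
```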

The recommended functions can also be used to solve a little floating-point puzzle: generating the interesting limit values of a floating-point format just starting with constants for 0.0 and 1.0 in that format:

  • NaN is 0.0/0.0.

  • POSITIVE_INFINITY is 1.0/0.0.

  • MAX_VALUE is nextAfter(POSITIVE_INFINITY, 0.0).

  • MIN_VALUE is nextUp(0.0).

  • MIN_NORMAL is MIN_VALUE/(nextUp(1.0)-1.0).
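
The recipe above can be checked directly against the Double constants:

```java
public class LimitValues {
    public static void main(String... args) {
        double zero = 0.0, one = 1.0;
        double nan       = zero / zero;
        double posInf    = one / zero;
        double maxValue  = Math.nextAfter(posInf, zero);
        double minValue  = Math.nextUp(zero);
        // nextUp(1.0) - 1.0 is an ulp of 1.0, i.e. 2^-52, and the division
        // below is exact since both operands are powers of two.
        double minNormal = minValue / (Math.nextUp(one) - one);
        System.out.println(Double.isNaN(nan));               // true
        System.out.println(posInf == Double.POSITIVE_INFINITY); // true
        System.out.println(maxValue  == Double.MAX_VALUE);   // true
        System.out.println(minValue  == Double.MIN_VALUE);   // true
        System.out.println(minNormal == Double.MIN_NORMAL);  // true
    }
}
```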

Friday Feb 12, 2010

Finding a bug in FDLIBM pow

Writing up a piece of old work for some more Friday fun, an example of testing where the failures are likely to be led to my independent discovery of a bug in the FDLIBM pow function, one of only two bugs fixed in FDLIBM 5.3. Even back when this bug was fixed for Java some time ago (5033578), the FDLIBM library was well-established, widely used in the Java platform and elsewhere, and already thoroughly tested so I was quite proud my tests found a new problem. The next most recent change to the pow implementation was eleven years prior to the fix in 5.3.

The specification for Math.pow is involved, with over two dozen special cases listed. When setting out to write tests for this method, I re-expressed the specification in a tabular form to understand what was going on. After a few iterations reminiscent of tweaking a Karnaugh map, the table below was the result.

Special Cases for FDLIBM pow and {Math, StrictMath}.pow

The columns partition y into eleven categories: –∞; –∞ < y < –1; –1; –1 < y < 0; –0.0; +0.0; 0 < y < 1; 1; 1 < y < +∞; +∞; NaN. The rows partition x the same way. Each row lists its entries in order of increasing y; an entry spans the y categories up to the next entry, and cells not listed are covered by a spanning entry from an earlier row:

x = –∞:        +0.0 | f2(y) | 1.0 | f1(y) | +∞ | NaN
–∞ < x < –1:   +0.0 | f3(x, y) | f3(x, y) | +∞
x = –1:        NaN | NaN
–1 < x < 0:    +∞ | +0.0
x = –0.0:      +∞ | f1(y) | f2(y) | +0.0
x = +0.0:      +∞ | +0.0
0 < x < 1:     +∞ | x | +0.0
x = 1:         NaN | 1.0 | NaN
1 < x < +∞:    +0.0 | x | +∞
x = +∞:        +0.0 | +∞

f1(y) = isOddInt(y) ? –∞ : +∞;
f2(y) = isOddInt(y) ? –0.0 : +0.0;
f3(x, y) = isEvenInt(y) ? |x|^y : (isOddInt(y) ? –|x|^y : NaN);
The NaN entries for y = ±∞ in the x = –1 row are defined to be +1.0 in C99; see §F.9.4.4 of the C99 specification. Large-magnitude finite floating-point numbers are all even integers (since the precision of a typical floating-point format is much less than its exponent range, a large number will be an integer times the base raised to a power). Therefore, by the reasoning of the C99 committee, pow(-1.0, ∞) was like pow(-1.0, unknown large even integer), so the result was defined to be 1.0 instead of NaN.

The arguments in each row and column are partitioned into eleven categories: ten categories of numerical values together with NaN (Not a Number). Some combinations of x and y arguments are covered by multiple clauses of the specification. A few helper functions are defined to simplify the presentation. As noted in the table, a cross-platform wrinkle is that the C99 specification, which came out after Java was first released, defined certain special cases differently than FDLIBM and Java's Math.pow.
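
The helper predicates used in the table can be sketched as follows (illustrative implementations; the names match the table):

```java
public class PowHelpers {
    static boolean isIntegral(double d) {
        // rint is an identity on integer values; exclude infinities
        // (NaN already fails the equality test).
        return d == Math.rint(d) && !Double.isInfinite(d);
    }
    static boolean isOddInt(double d) {
        return isIntegral(d) && Math.abs(d % 2.0) == 1.0;
    }
    static boolean isEvenInt(double d) {
        return isIntegral(d) && d % 2.0 == 0.0;
    }
}
```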

A regression test based on this tabular representation of pow special cases is jdk/test/java/lang/Math/ The test makes sure each interesting combination in the table is probed at least once. Some combinations receive multiple probes. When an entry represents a range, the exact endpoints of the range are tested; in addition, other interesting interior points are tested too. For example, for the range 1 < x < +∞ the individual points tested are:

+1.0000000000000002, // nextAfter(+1.0, +oo)
-(double)Integer.MIN_VALUE - 1.0,
-(double)Integer.MIN_VALUE + 1.0,
 (double)Integer.MAX_VALUE + 4.0,
 (double) ((1L<<53)-1L),
 (double) ((1L<<53)),
 (double) ((1L<<53)+2L),

Besides the endpoints, the interesting interior points include points worth checking because of transitions either in the IEEE 754 double format or a 2's complement integer format.

Inputs that used to fail under this testing spanned a range of severities: from the almost always numerically benign error of returning a wrongly signed zero, to returning a zero when the result should be a finite nonzero value, to returning infinity for a finite result, to even returning a wrongly signed infinity!

Selected Failing Inputs

Failure for StrictMath.pow(double, double):
       For inputs -0.5                   (-0x1.0p-1) and 
                   9.007199254740991E15  (0x1.fffffffffffffp52)
       expected   -0.0                   (-0x0.0p0)
       got         0.0                   (0x0.0p0).

Failure for StrictMath.pow(double, double):
       For inputs -0.9999999999999999    (-0x1.fffffffffffffp-1) and 
                   9.007199254740991E15  (0x1.fffffffffffffp52)
       expected   -0.36787944117144233   (-0x1.78b56362cef38p-2)
       got        -0.0                   (-0x0.0p0).

Failure for StrictMath.pow(double, double):
       For inputs -1.0000000000000004    (-0x1.0000000000002p0) and 
                   9.007199254740994E15  (0x1.0000000000001p53)
       expected  54.598150033144236      (0x1.b4c902e273a58p5)
       got       0.0                     (0x0.0p0).

Failure for StrictMath.pow(double, double):
       For inputs -0.9999999999999998    (-0x1.ffffffffffffep-1) and 
                   9.007199254740992E15  (0x1.0p53)
       expected    0.13533528323661267   (0x1.152aaa3bf81cbp-3)
       got         0.0                   (0x0.0p0).

Failure for StrictMath.pow(double, double):
       For inputs -0.9999999999999998    (-0x1.ffffffffffffep-1) and 
                  -9.007199254740991E15  (-0x1.fffffffffffffp52)
       expected   -7.38905609893065      (-0x1.d8e64b8d4ddaep2)
       got        -Infinity              (-Infinity).

Failure for StrictMath.pow(double, double):
       For inputs -3.0                   (-0x1.8p1) and 
                   9.007199254740991E15  (0x1.fffffffffffffp52)
       expected   -Infinity              (-Infinity)
       got        Infinity               (Infinity).

The code changes to address the bug were fairly simple: corrections were made to the extraction of components of the floating-point inputs, and sign information was propagated properly.

Even expertly written software can have errors, and even long-used software can have unexpected problems. Estimating how often this bug in FDLIBM caused an issue is difficult; while the errors could be egregious, the inputs needed to elicit the problem were arguably unusual (even though perfectly valid mathematically). Thorough testing is a key aspect of assuring the quality of numerical software; it is also helpful for end-users to be able to examine the output of their programs to help notice problems.



