Everything Older is Newer Once Again
By Darcy-Oracle on Feb 20, 2010
Catching up on writing about more numerical work from years past, the second article in a two-part series finished last year discusses some low-level floating-point manipulation methods I added to the platform over the course of JDKs 5 and 6. Previously, I published a blog entry reacting to the first part of the series.
JDK 6 enjoyed several numerics-related library changes. Constants for MIN_NORMAL, MAX_EXPONENT, and MIN_EXPONENT were added to the Float and Double classes. I also added to the Math and StrictMath classes the following methods for low-level manipulation of floating-point values:
public static double copySign(double magnitude, double sign)
public static int getExponent(double d)
public static double nextAfter(double start, double direction)
public static double nextUp(double d)
public static double scalb(double d, int scaleFactor)
There are also overloaded methods for float arguments.
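As a quick illustration of what the methods listed above do, here is a minimal sketch that calls each one through the Math class. The class name RecommendedFunctionsDemo and the helper gapAbove are made up for this example and are not part of the JDK.

```java
final class RecommendedFunctionsDemo {
    /** Hypothetical helper: distance from a finite d to the next larger double. */
    static double gapAbove(double d) {
        return Math.nextUp(d) - d;
    }

    public static void main(String[] args) {
        // copySign: magnitude of the first argument, sign of the second.
        System.out.println(Math.copySign(2.5, -0.0));   // -2.5

        // getExponent: unbiased exponent; 8.0 is 1.0 * 2^3.
        System.out.println(Math.getExponent(8.0));      // 3

        // nextAfter: the neighboring value in the direction of the second argument.
        System.out.println(Math.nextAfter(1.0, 0.0) < 1.0);   // true

        // nextUp: shorthand for stepping toward positive infinity.
        System.out.println(
            Math.nextUp(1.0) == Math.nextAfter(1.0, Double.POSITIVE_INFINITY)); // true

        // scalb: multiply by a power of two, here 1.5 * 2^4.
        System.out.println(Math.scalb(1.5, 4));         // 24.0

        // The gap above 1.0 is one ulp of 1.0.
        System.out.println(gapAbove(1.0) == Math.ulp(1.0));   // true

        // The float overloads behave analogously.
        System.out.println(Math.getExponent(8.0f));     // 3
    }
}
```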
In terms of the IEEE 754 standard from 1985, the methods above provide the core functionality of the recommended functions. In terms of the 2008 revision to IEEE 754, analogous functions are integrated throughout different sections of the document.
While a student at Berkeley, I wrote a tech report on algorithms I developed for an earlier implementation of these methods, an implementation written many years ago when I was a summer intern at Sun. The implementation of the recommended functions in the JDK is a refinement of the earlier work, a refinement that simplified code, added extensive and effective unit tests, and sported better performance in some cases. In part the simplifications came from not attempting to accommodate IEEE 754 features not natively supported in the Java platform, in particular rounding modes and sticky flags.
The primary purpose of these methods is to assist in the development of math libraries in Java, such as the recent pure Java implementation of floor and ceil.
This expected use-case drove certain API differences with the functions sketched by IEEE 754. For example, the getExponent method simply returns the unbiased value stored in the exponent field of a floating-point value rather than doing additional processing, such as computing the exponent needed to normalize a subnormal number, additional processing called for in some flavors of the 754 logb operation. Such additional functionality can actually slow down math libraries since libraries may not benefit from the additional filtering and may actually have to undo it.
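To make the difference concrete, the sketch below contrasts getExponent with a logb-flavored alternative on a subnormal value. The class ExponentDemo and the method normalizedExponent are hypothetical, written only for this illustration; they are not JDK methods, and the helper is only meaningful for finite, nonzero arguments.

```java
final class ExponentDemo {
    /**
     * Hypothetical logb-style helper: the exponent the value would have
     * if subnormals were normalized. Intended for finite, nonzero inputs.
     */
    static int normalizedExponent(double d) {
        int e = Math.getExponent(d);
        // Subnormals (and zeros) report MIN_EXPONENT - 1 from getExponent.
        if (e == Double.MIN_EXPONENT - 1 && d != 0.0) {
            // Scaling by 2^52 is exact and makes any subnormal normal;
            // read that exponent and undo the scaling.
            return Math.getExponent(Math.scalb(d, 52)) - 52;
        }
        return e;
    }

    public static void main(String[] args) {
        // Raw exponent field: every subnormal looks the same.
        System.out.println(Math.getExponent(Double.MIN_VALUE));   // -1023
        // After the extra normalization work: 2^-1074.
        System.out.println(normalizedExponent(Double.MIN_VALUE)); // -1074
        // Normal values are unaffected.
        System.out.println(normalizedExponent(1.0));              // 0
    }
}
```

A library that only needs to tell normal values from subnormal ones can stop at the getExponent call and skip the extra branch and scaling entirely.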
The Math and StrictMath specifications of copySign have a small difference: the StrictMath version always treats NaNs as having a positive sign (a sign bit of zero) while the Math version does not impose this requirement. The IEEE standard does not ascribe a meaning to the sign bit of a NaN, and different processors have different conventions for NaN representations and how they propagate. However, if the sign argument is not a NaN, the two copySign methods will produce equivalent results. Therefore, even if being used in a library where the results need to be completely predictable, the faster Math version of copySign can be used as long as the sign argument is known to be numerical.
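The copySign distinction can be sketched as follows. The class CopySignNaNDemo, the constant NEGATIVE_NAN, and the helper sameBits are names invented for this example; the NaN bit pattern is just one quiet NaN with its sign bit set.

```java
final class CopySignNaNDemo {
    /** A quiet NaN whose sign bit is set (an illustrative bit pattern). */
    static final double NEGATIVE_NAN =
        Double.longBitsToDouble(0xfff8000000000000L);

    /** Hypothetical helper: true if both copySign variants return identical bits. */
    static boolean sameBits(double magnitude, double sign) {
        return Double.doubleToRawLongBits(Math.copySign(magnitude, sign))
            == Double.doubleToRawLongBits(StrictMath.copySign(magnitude, sign));
    }

    public static void main(String[] args) {
        // StrictMath treats a NaN sign argument as positive.
        System.out.println(StrictMath.copySign(3.0, NEGATIVE_NAN)); // 3.0

        // Math only promises the magnitude here; the sign is unspecified.
        System.out.println(Math.abs(Math.copySign(3.0, NEGATIVE_NAN))); // 3.0

        // With a numerical sign argument the two versions agree.
        System.out.println(sameBits(3.0, -1.0));  // true
        System.out.println(sameBits(3.0, -0.0));  // true
    }
}
```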
The recommended functions can also be used to solve a little floating-point puzzle: generating the interesting limit values of a floating-point format just starting with constants for 1.0 in that format: