Joseph D. Darcy's Oracle Weblog
https://blogs.oracle.com/darcy/
en-us
Copyright 2015
Wed, 4 Feb 2015 01:41:32 +0000
Apache Roller BLOGS401ORA6 (20130904125427)
https://blogs.oracle.com/darcy/entry/notions_of_floating_point_equality
Notions of Floating-Point Equality
darcy
Fri, 26 Feb 2010 01:00:00 +0000
Numerics fridayfun jdk jdk7 numerics projectcoin
<p>
Moving on from
<a href="http://blogs.sun.com/darcy/entry/api_design_identity_and_equality" title="API Design: Identity and Equality">identity and equality of objects</a>, different notions of equality are also surprisingly subtle in some numerical realms.
</p>
<p>
As <a href="http://mail.openjdk.java.net/pipermail/nio-dev/2009-November/000792.html" title="Nov. 2009 nio-dev thread on DoubleBuffer.compareTo is not anti-symmetric">comes up from time to time</a> and is often surprising, the "<code>==</code>" operator defined by IEEE 754 and used by Java for comparing floating-point values
(<a href="http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#15.21.1"
title="Numerical Equality Operators == and !=">JLSv3 §15.21.1</a>)
is <em>not</em> an <i><a href="http://en.wikipedia.org/wiki/Equivalence_relation">equivalence relation</a></i>. Equivalence relations satisfy three properties, reflexivity (something is equivalent to itself), symmetry (if <i>a</i> is equivalent to <i>b</i>, <i>b</i> is equivalent to <i>a</i>), and transitivity (if <i>a</i> is equivalent to <i>b</i> and <i>b</i> is equivalent to <i>c</i>, then <i>a</i> is equivalent to <i>c</i>).
</p>
<p>
The IEEE 754 standard defines four possible mutually exclusive
ordering relations between floating-point values:</p>
<ul>
<li><p>equal
</p>
<li><p>greater than
</p>
<li><p>less than
</p>
<li><p>unordered
</p>
</ul>
<p>
A NaN (Not a Number) is <em>unordered</em> with respect to every floating-point value,
including itself. This was done so that NaNs would not quietly slip by without due notice. Since (NaN == NaN) is false, the IEEE 754 "<code>==</code>" relation is <em>not</em> an equivalence relation since it is not reflexive.
</p>
<p>
An equivalence relation partitions a set into equivalence classes; each member of an equivalence class is "the same" as the other members of the class for the purposes of that equivalence relation. In terms of numerics, one would expect equivalent values to result in equivalent numerical results in all cases. Therefore, the size of the equivalence classes over floating-point values would be expected to be one; a number would only be equivalent to itself. However, in IEEE 754 there are two zeros, -0.0 and +0.0, and they compare as equal under <code>==</code>. For IEEE 754 addition and subtraction, the sign of a zero argument can at most affect the sign of a zero result. That is, if the sum or difference is not zero, a zero of either sign doesn't change the result. If the sum or difference is zero and one of the arguments is zero, the other argument must be zero too:
</p>
<ul>
<li><p>-0.0 + -0.0 ⇒ -0.0
</p>
<li><p>-0.0 + +0.0 ⇒ +0.0
</p>
<li><p>+0.0 + -0.0 ⇒ +0.0
</p>
<li><p>+0.0 + +0.0 ⇒ +0.0
</p>
</ul>
<p>
Therefore, under addition and subtraction, both signed zeros are equivalent. However, they are <em>not</em> equivalent under division since 1.0/<b>-</b>0.0 ⇒
-∞ but 1.0/<b>+</b>0.0 ⇒ +∞ and -∞ and +∞ are <em>not</em> equivalent.<a href="#affine"><sup>1</sup></a>
</p>
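<p>
As a minimal sketch of the behavior described above, the following Java fragment applies <code>==</code> to NaN and to the two zeros and then divides by each zero. The class and method names (<code>SignedZeroDemo</code>, <code>ieeeEquals</code>, <code>reciprocal</code>) are illustrative only and not part of any library.
</p>

```java
final class SignedZeroDemo {
    /** IEEE 754 equality, exactly what the Java == operator computes on doubles. */
    static boolean ieeeEquals(double x, double y) {
        return x == y;
    }

    /** 1.0 / x; the sign of a zero argument selects the sign of the infinity. */
    static double reciprocal(double x) {
        return 1.0 / x;
    }

    public static void main(String[] args) {
        System.out.println(ieeeEquals(Double.NaN, Double.NaN)); // false: == is not reflexive
        System.out.println(ieeeEquals(-0.0, +0.0));             // true: the zeros compare equal
        System.out.println(reciprocal(-0.0));                   // -Infinity
        System.out.println(reciprocal(+0.0));                   // Infinity
    }
}
```

<p>
The two zeros are indistinguishable to <code>==</code> yet lead to different results under division, which is why they cannot be members of one equivalence class for all operations.
</p>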
<p>
Despite the rationales for the IEEE 754 specification to not define <code>==</code> as an equivalence relation, there are legitimate cases where one needs a true equivalence relation over floating-point values, such as when writing test programs, and cases where one needs a total ordering, such as when sorting. In my numerical tests I use a method
that returns <code>true</code> for two floating-point values <i>x</i> and <i>y</i> if:<br>
((<i>x</i> == <i>y</i>) &&<br>
(if <i>x</i> and <i>y</i> are both zero they have the same sign)) || <br>
(x and y are both NaN)<br>
Conveniently, this can be computed simply by using <code>(Double.compare(x, y) == 0)</code>. For sorting or a total order, the semantics of <a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#compare(double,%20double)"><code>Double.compare</code></a> are fine; NaN is treated as being the largest floating-point value, greater than positive infinity, and -0.0 < +0.0. That ordering is the total order used by <a href="http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#sort(double[])"><code>java.util.Arrays.sort(double[])</code></a>. In terms of semantics, it doesn't really matter where the NaNs are ordered with respect to other values as long as they are consistently ordered that way.<sup><a href="#bitwise">2</a></sup>
</p>
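<p>
A short sketch of the test-oriented equivalence and the sorting order described above; <code>FloatingEquivalence</code>, <code>equivalent</code>, and <code>sortedCopy</code> are hypothetical names chosen for illustration, while <code>Double.compare</code> and <code>Arrays.sort</code> are the platform methods discussed in the text.
</p>

```java
import java.util.Arrays;

final class FloatingEquivalence {
    /** True equivalence relation: NaN matches NaN, and -0.0 differs from +0.0. */
    static boolean equivalent(double x, double y) {
        return Double.compare(x, y) == 0;
    }

    /** Returns a sorted copy using the same total order as Double.compare. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(equivalent(Double.NaN, Double.NaN)); // true
        System.out.println(equivalent(-0.0, +0.0));             // false
        double[] values = {Double.NaN, 0.0, Double.POSITIVE_INFINITY, -0.0, -1.0};
        // Prints [-1.0, -0.0, 0.0, Infinity, NaN]
        System.out.println(Arrays.toString(sortedCopy(values)));
    }
}
```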
<p>
These subtleties of floating-point comparison were also germane on the Project Coin mailing list last year; the <a href="http://mail.openjdk.java.net/pipermail/coin-dev/2009-March/000566.html">definition of floating-point equality was discussed in relation to adding support for relational operations based on a type implementing the <code>Comparable</code> interface</a>. That thread also broached the complexities involved in comparing <a href="http://java.sun.com/javase/6/docs/api/java/math/BigDecimal.html"><code>BigDecimal</code></a> values.
</p>
<p>
The <code>BigDecimal</code> class has a natural ordering that is <em>inconsistent with equals</em>; that is for at least some inputs <code>bd1</code> and <code>bd2</code>, <br>
<code>bd1.compareTo(bd2)==0</code><br>
has a different boolean value than<br>
<code>bd1.equals(bd2)</code>.<sup><a href="#consistent">3</a></sup><br>
In <code>BigDecimal</code>, the same numerical value can have multiple representations, such as (100 × 10<sup>0</sup>) versus (10 × 10<sup>1</sup>) versus (1 × 10<sup>2</sup>). These are all "the same" numerically (<code>compareTo == 0</code>) but are <em>not</em> <code>equals</code> with each other. Such values are not equivalent under the operations supported by <code>BigDecimal</code>; for example (100 × 10<sup>0</sup>) has a <i><a href="http://java.sun.com/javase/6/docs/api/java/math/BigDecimal.html#scale()">scale</a></i> of 0 while (1 × 10<sup>2</sup>) has a scale of -2.<sup><a href="#cohort">4</a></sup>
</p>
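<p>
To make the inconsistency concrete, here is a small sketch comparing two members of the same cohort and showing how hash-based and sorted collections disagree about them. The class <code>BigDecimalCohort</code> and its helper methods are illustrative names, not platform API.
</p>

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.TreeSet;

final class BigDecimalCohort {
    /** Number of distinct elements according to equals/hashCode. */
    static int hashSetSize(BigDecimal... values) {
        return new HashSet<>(Arrays.asList(values)).size();
    }

    /** Number of distinct elements according to the natural ordering (compareTo). */
    static int treeSetSize(BigDecimal... values) {
        return new TreeSet<>(Arrays.asList(values)).size();
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("100");   // unscaled value 100, scale 0
        BigDecimal b = new BigDecimal("1E+2");  // unscaled value 1, scale -2
        System.out.println(a.compareTo(b) == 0); // true: same numerical value
        System.out.println(a.equals(b));         // false: different scale
        System.out.println(hashSetSize(a, b));   // 2
        System.out.println(treeSetSize(a, b));   // 1
    }
}
```

<p>
This is the kind of surprise that can occur when a class whose natural ordering is inconsistent with equals is used in sets and maps.
</p>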
<p>
While subtle, the different notions of numerical equality each serve a useful purpose and knowing which notion is appropriate for a given task is an important factor in writing correct programs.
</p>
<hr width="50%">
<blockquote>
<p>
<a name="affine"><sup>1</sup></a> There are two zeros in IEEE 754
because there are two infinities. Another way to extend the real numbers to include infinity is to have a single (unsigned) projective infinity. In
such a system, there is only one conceptual zero. Early x87 chips before IEEE 754
was standardized had support for both signed (affine) and projective
infinities. Each style of infinity is more convenient for some kinds of computations.
</p>
<p>
<sup><a name="bitwise">2</a></sup>
Besides the equivalence relation offered by <code>Double.compare(x, y)</code>, another equivalence relation can be induced by either of the bitwise conversion routines, <a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#doubleToLongBits(double)">Double.doubleToLongBits</a> or <a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#doubleToRawLongBits(double)">Double.doubleToRawLongBits</a>. The former collapses all bit patterns that encode a NaN value into a single canonical NaN bit pattern, while the latter can let through a platform-specific NaN value. Implementation freedoms allowed by the original IEEE 754 standard have allowed different processor families to define different conventions for NaN bit patterns.
</p>
<p>
<sup><a name="consistent">3</a></sup> I've at times considered whether it would be worthwhile to include an "<code>@NaturalOrderingInconsistentWithEquals</code>" annotation in the platform to flag the classes that have this quirk. Such an annotation could be used by various checkers to find potentially problematic uses of such classes in sets and maps.
</p>
<p>
<sup><a name="cohort">4</a></sup> Building on wording developed for the <code>BigDecimal</code> specification under <a href="http://jcp.org/en/jsr/detail?id=13">JSR 13</a>, when I was editor of the <a href="http://en.wikipedia.org/wiki/IEEE_754-2008">IEEE 754 revision</a>, I introduced several pieces of decimal-related terminology into the draft. Those terms include <i>preferred exponent</i>, analogous to the preferred scale from <code>BigDecimal</code>, and <i>cohort</i>, "The set of all floating-point representations that represent a given floating-point number in a
given floating-point format." Put in terms of <code>BigDecimal</code>, the members of a cohort would be all the <code>BigDecimal</code> numbers with the same numerical value, but distinct pairs of scale (negation of the exponent) and unscaled value.
</p>
</blockquote>
https://blogs.oracle.com/darcy/entry/everything_older_is_newer_once
Everything Older is Newer Once Again
darcy
Sat, 20 Feb 2010 21:41:48 +0000
Numerics jdk numerics
<p>
Catching up on writing about more numerical work from years past, the <a href="http://www.ibm.com/developerworks/java/library/j-math2.html"
  title="Java's new math, Part 2: Floating-point numbers">second article</a> in a two-part series finished last year discusses some low-level floating-point manipulation methods I added to the platform over the course of JDKs 5 and 6.
Previously, I published a
<a href="http://blogs.sun.com/darcy/entry/everything_old_is_new_again"
title="Everything Old is New Again">blog entry reacting to</a> the
<a href="http://www.ibm.com/developerworks/java/library/j-math1/index.html"
title="Java's new math, Part 1: Real numbers">first part</a> of the series.
</p>
<p>
JDK 6 enjoyed several numerics-related library changes. Constants for <code>MIN_NORMAL</code>, <code>MIN_EXPONENT</code>, and <code>MAX_EXPONENT</code> were added to the <code>Float</code> and <code>Double</code> classes. I also added to the <code>Math</code> and <code>StrictMath</code> classes the following methods for low-level manipulation of floating-point values:
</p>
<ul>
<li><code><a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#copySign(double,%20double)">
public static double copySign(double magnitude, double sign)</a></code>
<li><code><a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#getExponent(double)">public static int getExponent(double d)</a></code>
<li><code><a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#nextAfter(double,%20double)">public static double nextAfter(double start, double direction)</a></code>
<li><code><a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#nextUp(double)">public static double nextUp(double d)</a></code>
<li><code><a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#scalb(double,%20int)">public static double scalb(double d, int scaleFactor)</a></code>
</ul>
<p>
There are also overloaded methods for <code>float</code> arguments.
In terms of the <a href="http://en.wikipedia.org/wiki/IEEE_754-1985">IEEE 754 standard from 1985</a>, the methods above provide the core functionality of the <i>recommended functions</i>. In terms of the <a href="http://en.wikipedia.org/wiki/IEEE_754-2008">2008 revision to IEEE 754</a>, analogous functions are integrated throughout different sections of the document.
</p>
<p>
While a student at Berkeley, I wrote a
<a href="http://www.jddarcy.org/Research/ieeerecd.pdf">tech report</a> on algorithms I developed for an earlier implementation of these methods, an implementation written many years ago when I was a summer intern at Sun.
The <a href="http://hg.openjdk.java.net/jdk7/tl/jdk/file/84792500750c/src/share/classes/sun/misc/FpUtils.java" title="sun.misc.FpUtils as of Feb. 20, 2010">implementation of the recommended functions in the JDK</a> is a refinement of the earlier work, a refinement that simplified code, added <a href="http://hg.openjdk.java.net/jdk7/tl/jdk/file/84792500750c/test/java/lang/Math/IeeeRecommendedTests.java" title="IeeeRecommendedTests.java regression test as of Feb. 20, 2010">extensive</a> and
<a href="http://blogs.sun.com/darcy/entry/test_where_the_failures_are"
title="Test where the failures are likely to be">effective</a> unit tests, and sported better performance in some cases.
In part the simplifications came from <em>not</em> attempting to accommodate IEEE 754 features not natively supported in the Java platform, in particular rounding modes and sticky flags.
</p>
<p>
The primary purpose of these methods is to assist in the development of math libraries in Java, such as the recent
<a href="http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/ad1e30930c6c">pure Java implementation of floor and ceil</a>
(<a href="http://bugs.sun.com/view_bug.do?bug_id=6908131" title="Pure Java implementations of StrictMath.floor(double) & StrictMath.ceil(double)">6908131</a>).
This expected use-case drove certain API differences with the functions sketched by IEEE 754. For example, the <code>getExponent</code> method simply returns the unbiased value stored in the exponent field of a floating-point value rather than doing additional processing, such as computing the exponent needed to normalize a subnormal number, additional processing called for in some flavors of the 754 <code>logb</code> operation. Such additional functionality can actually slow down math libraries since libraries may not benefit from the additional filtering and may actually have to undo it.
</p>
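<p>
The difference can be sketched as follows. <code>Math.getExponent</code> and <code>Math.scalb</code> are the platform methods; <code>normalizedExponent</code> is a hypothetical helper, written here only to illustrate the sort of extra <code>logb</code>-style processing that <code>getExponent</code> deliberately omits.
</p>

```java
final class ExponentDemo {
    /**
     * Hypothetical logb-style helper: for a subnormal argument, returns the
     * exponent the value would have if it were normalized. Zero, NaN, and
     * infinities are passed through as getExponent reports them.
     */
    static int normalizedExponent(double d) {
        int e = Math.getExponent(d);
        if (e == Double.MIN_EXPONENT - 1 && d != 0.0) {
            // Subnormal: scaling up by 2^54 is exact and yields a normal value,
            // so read that exponent and subtract the scaling back out.
            return Math.getExponent(Math.scalb(d, 54)) - 54;
        }
        return e;
    }

    public static void main(String[] args) {
        System.out.println(Math.getExponent(1.0));               // 0
        System.out.println(Math.getExponent(Double.MIN_VALUE));  // -1023, i.e. MIN_EXPONENT - 1
        System.out.println(normalizedExponent(Double.MIN_VALUE)); // -1074
    }
}
```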
<p>
The <code>Math</code> and <code>StrictMath</code> specifications of <code>copySign</code> have a small difference: the
<a href="http://java.sun.com/javase/6/docs/api/java/lang/StrictMath.html#copySign(double,%20double)"
title="java.lang.StrictMath.copySign"><code>StrictMath</code> version</a> always treats NaNs as having a positive sign (a sign bit of zero) while the
<a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#copySign(double,%20double)"
title="java.lang.Math.copySign"><code>Math</code> version</a> does not impose this requirement.
The IEEE standard does not ascribe a meaning to the sign bit of a NaN, and different processors have different conventions for NaN representations and how they propagate. However, if the source argument is not a NaN, the two <code>copySign</code> methods will produce equivalent results.
Therefore, even if being used in a library where the results need to be completely predictable, the faster <code>Math</code> version of <code>copySign</code> can be used as long as the source argument is known to be numerical.
</p>
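<p>
A brief sketch of <code>copySign</code> in use, assuming only the documented behavior of the two methods. The class <code>CopySignDemo</code> and the helper <code>isNegativeZero</code> are illustrative names; the NaN bit pattern with the sign bit set is constructed by hand purely for demonstration.
</p>

```java
final class CopySignDemo {
    /** Distinguishes -0.0 from +0.0, which the == operator cannot do. */
    static boolean isNegativeZero(double d) {
        return d == 0.0 && Math.copySign(1.0, d) < 0.0;
    }

    public static void main(String[] args) {
        System.out.println(isNegativeZero(-0.0)); // true
        System.out.println(isNegativeZero(+0.0)); // false

        // A NaN whose sign bit happens to be set.
        double signedNaN = Double.longBitsToDouble(0xfff8000000000000L);
        // StrictMath treats every NaN sign argument as positive.
        System.out.println(StrictMath.copySign(3.0, signedNaN)); // 3.0
        // Math.copySign is permitted to honor the NaN's sign bit, so its
        // result for this argument is not pinned down by the specification.
        System.out.println(Math.copySign(3.0, signedNaN));
    }
}
```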
<p>
The recommended functions can also be used to solve a little floating-point puzzle: generating the interesting limit values of a floating-point format just starting with constants for <code>0.0</code> and <code>1.0</code> in that format:
</p>
<ul>
<li><p><code>NaN</code> is <code>0.0/0.0</code>.
</p>
<li><p><code>POSITIVE_INFINITY</code> is <code>1.0/0.0</code>.
</p>
<li><p><code>MAX_VALUE</code> is <code>nextAfter(POSITIVE_INFINITY, 0.0)</code>.
</p>
<li><p><code>MIN_VALUE</code> is <code>nextUp(0.0)</code>.
</p>
<li><p><code>MIN_NORMAL</code> is <code>MIN_VALUE/(nextUp(1.0)-1.0)</code>.
</p>
</ul>
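<p>
The puzzle solution listed above can be checked directly in Java for the <code>double</code> format. This is only a sketch; the class name <code>LimitValues</code> and its method names are made up for the example, and the results are compared against the constants in <code>java.lang.Double</code>.
</p>

```java
final class LimitValues {
    static final double ZERO = 0.0;
    static final double ONE = 1.0;

    static double nan()              { return ZERO / ZERO; }
    static double positiveInfinity() { return ONE / ZERO; }
    static double maxValue()         { return Math.nextAfter(positiveInfinity(), ZERO); }
    static double minValue()         { return Math.nextUp(ZERO); }
    // nextUp(1.0) - 1.0 is the spacing of doubles just above 1.0 (2^-52).
    static double minNormal()        { return minValue() / (Math.nextUp(ONE) - ONE); }

    public static void main(String[] args) {
        System.out.println(Double.isNaN(nan()));                           // true
        System.out.println(positiveInfinity() == Double.POSITIVE_INFINITY); // true
        System.out.println(maxValue() == Double.MAX_VALUE);                // true
        System.out.println(minValue() == Double.MIN_VALUE);                // true
        System.out.println(minNormal() == Double.MIN_NORMAL);              // true
    }
}
```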
https://blogs.oracle.com/darcy/entry/finding_a_bug_in_fdlibm
Finding a bug in FDLIBM pow
darcy
Fri, 12 Feb 2010 09:25:00 +0000
Numerics fridayfun jdk numerics
<p>
Writing up a piece of old work for some more
<a href="http://blogs.sun.com/darcy/entry/regex_for_integral_strings" title="Recognizing all valid integral strings with regular expressions">Friday fun</a>, an example of
<a href="http://blogs.sun.com/darcy/entry/test_where_the_failures_are"
title="Test where the failures are likely to be">testing where the failures are likely to be</a> led to my independent discovery of a bug in the FDLIBM <code>pow</code> function, one of only two bugs fixed in
<a href="http://www.netlib.org/fdlibm/readme">FDLIBM 5.3</a>.
Even back when this bug was fixed for Java some time ago
(<a href="http://bugs.sun.com/view_bug.do?bug_id=5033578" title="Java should require use of latest fdlibm 5.3">5033578</a>),
the FDLIBM library was well-established, widely used in the Java platform and elsewhere, and already thoroughly tested so I was quite proud my tests found a new problem. The next most recent change to the <code>pow</code> implementation was eleven years prior to the fix in 5.3.
</p>
<p>
The <a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#pow(double,%20double)">specification for <code>Math.pow</code></a> is involved, with over two dozen special cases listed. When setting out to write tests for this method, I re-expressed the specification in a tabular form to understand what was going on. After a few iterations reminiscent of tweaking a <a href="http://en.wikipedia.org/wiki/Karnaugh_map">Karnaugh map</a>, the table below was the result.
</p>
<table border>
<caption>Special Cases for FDLIBM <code>pow</code> and {<code>Math</code>, <code>StrictMath</code>}<code>.pow</code>
</caption>
<tr>
<th>
<i>x<sup>y</sup></i>
</th>
<th colspan=11>
<i>y</i>
</th>
</tr>
<tr>
<th><i>x</i></th>
<th>–∞</th>
<th>–∞ < <i>y</i> < –1</th>
<th>–1</th>
<th>–1 < <i>y</i> < 0</th>
<th>–0.0</th>
<th>+0.0</th>
<th>0 < <i>y</i> < 1</th>
<th>1</th>
<th>1 < <i>y</i> < +∞</th>
<th>+∞</th>
<th>NaN</th>
</tr>
<tr>
<th>–∞</th>
<td align=right>+0.0</td>
<td align=center colspan=3>f2(<i>y</i>)</td>
<td align=center rowspan=11 colspan=2>1.0</td>
<td align=center colspan=3>f1(<i>y</i>)</td>
<td align=right>+∞</td>
<td rowspan=10>NaN</td>
</tr>
<tr>
<th>–∞ < <i>x</i> < –1</th>
<td align=right>+0.0</td>
<td align=center colspan=3 rowspan=3>f3(x, y)</td>
<td align=center colspan=3 rowspan=3>f3(x, y)</td>
<td align=right>+∞</td>
</tr>
<tr>
<th>–1</th>
<td align=right>NaN<sup><a href="#c99_diff">†</a></sup></td>
<td align=right>NaN<sup><a href="#c99_diff">†</a></sup></td>
</tr>
<tr>
<th>–1 < <i>x</i> < 0</th>
<td align=right>+∞</td>
<td align=right>+0.0</td>
</tr>
<tr>
<th>–0.0</th>
<td align=right>+∞</td>
<td align=center colspan=3>f1(y)</td>
<td align=center colspan=3>f2(y)</td>
<td align=right>+0.0</td>
</tr>
<tr>
<th>+0.0</th>
<td align=center colspan=4>+∞</td>
<td align=center colspan=4>+0.0</td>
</tr>
<tr>
<th>0 < <i>x</i> < 1</th>
<td align=right>+∞</td>
<td align=right colspan=3 rowspan=3 bgcolor=LightGrey> </td>
<td align=right rowspan=3 bgcolor=LightGrey> </td>
<td align=right><i>x</i></td>
<td align=right rowspan=3 bgcolor=LightGrey> </td>
<td align=right>+0.0</td>
</tr>
<tr>
<th>1</th>
<td align=right>NaN<sup><a href="#c99_diff">†</a></sup></td>
<td align=right>1.0</td>
<td align=right>NaN<sup><a href="#c99_diff">†</a></sup></td>
</tr>
<tr>
<th>1 < <i>x</i> < +∞</th>
<td align=right>+0.0</td>
<td align=right><i>x</i></td>
<td align=right>+∞</td>
</tr>
<tr>
<th>+∞</th>
<td align=center colspan=4>+0.0</td>
<td align=center colspan=4>+∞</td>
</tr>
<tr>
<th>NaN</th>
<td align=center colspan=4>NaN</td>
<td align=center colspan=5>NaN</td>
</tr>
</table>
<blockquote>
<p>
f1(y) = isOddInt(y) ? –∞ : +∞;<br>
f2(y) = isOddInt(y) ? –0.0 : +0.0;<br>
f3(x, y) = isEvenInt(y) ? |<i>x</i>|<sup><i>y</i></sup> : (isOddInt(y) ? –|<i>x</i>|<sup><i>y</i></sup> : NaN);<br>
<a name="c99_diff"><sup>†</sup> Defined to be +1.0 in C99, see §F.9.4.4 of the C99 specification</a>.
Large magnitude finite floating-point numbers are all even integers (since the precision of a typical floating-point format is much less than its exponent range, a large number will be an integer times the base raised to a power). Therefore, by the reasoning of the C99 committee, <code>pow(-1.0, ∞)</code> was like <code>pow(-1.0, <i>Unknown large even integer</i>)</code> so the result was defined to be <code>1.0</code> instead of <code>NaN</code>.
</p>
</blockquote>
<p>
The range of arguments in each row and column is partitioned into eleven categories: ten categories of numerical values (finite or infinite) together with NaN (Not a Number). Some combinations of <i>x</i> and <i>y</i> arguments are covered by multiple clauses of the specification.
A few helper functions are defined to simplify the presentation. As noted in the table, a cross-platform wrinkle is that the C99 specification, which came out after Java was first released, defined certain special cases differently than in FDLIBM and Java's <code>Math.pow</code>.
</p>
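<p>
The helper functions used in the table can be written out in Java roughly as follows. This is a sketch for illustration, not the code from the regression test: the class name <code>PowSpecialCases</code> and the <code>isInteger</code> helper are assumptions, and <code>isOddInt</code> and <code>isEvenInt</code> are implemented here with the <code>%</code> operator as one possible approach.
</p>

```java
final class PowSpecialCases {
    /** True for finite values with no fractional part. */
    static boolean isInteger(double y) {
        return !Double.isNaN(y) && !Double.isInfinite(y) && y == Math.rint(y);
    }

    static boolean isOddInt(double y) {
        return isInteger(y) && Math.abs(y % 2.0) == 1.0;
    }

    static boolean isEvenInt(double y) {
        return isInteger(y) && y % 2.0 == 0.0;
    }

    static double f1(double y) {
        return isOddInt(y) ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
    }

    static double f2(double y) {
        return isOddInt(y) ? -0.0 : +0.0;
    }

    static double f3(double x, double y) {
        double magnitude = Math.pow(Math.abs(x), y);
        if (isEvenInt(y)) {
            return magnitude;
        }
        return isOddInt(y) ? -magnitude : Double.NaN;
    }

    public static void main(String[] args) {
        // Each line compares a table entry with Math.pow using Double.compare,
        // so that -0.0 and +0.0 are distinguished.
        System.out.println(Double.compare(Math.pow(Double.NEGATIVE_INFINITY, 3.0), f1(3.0)) == 0);
        System.out.println(Double.compare(Math.pow(Double.NEGATIVE_INFINITY, -3.0), f2(-3.0)) == 0);
        System.out.println(Double.compare(Math.pow(-0.0, 3.0), f2(3.0)) == 0);
        System.out.println(Double.isNaN(f3(-2.0, 0.5)) && Double.isNaN(Math.pow(-2.0, 0.5)));
    }
}
```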
<p>
A regression test based on this tabular representation of <code>pow</code> special cases is
<code><a href="http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9027c6b9d7e2/test/java/lang/Math/PowTests.java"
title="Current version as of February 12, 2010">jdk/test/java/lang/Math/PowTests.java</a></code>. The test makes sure each interesting combination in the table is probed at least once. Some combinations receive multiple probes.
When an entry represents a range, the exact endpoints of the range are tested; in addition, other interesting interior points are tested too. For example, for the range 1 < <i>x</i> < +∞ the individual points tested are:
</p>
<blockquote><pre>
+1.0000000000000002, // nextAfter(+1.0, +oo)
+1.0000000000000004,
+2.0,
+Math.E,
+3.0,
+Math.PI,
-(double)Integer.MIN_VALUE - 1.0,
-(double)Integer.MIN_VALUE,
-(double)Integer.MIN_VALUE + 1.0,
            (double)Integer.MAX_VALUE + 4.0,
(double) ((1L<<53)-1L),
(double) ((1L<<53)),
(double) ((1L<<53)+2L),
-(double)Long.MIN_VALUE,
Double.MAX_VALUE,
</pre></blockquote>
<p>
Besides the endpoints, the interesting interior points include points worth checking because of transitions either in the IEEE 754 <code>double</code> format or a 2's complement integer format.
</p>
<p>
Inputs that used to fail under this testing include a range of severities, from the almost always numerically benign error of returning a wrongly signed zero, to returning a zero when the result should be a finite nonzero result, to returning infinity for a finite result, to even returning a wrongly signed infinity!
</p>
<blockquote>
<h3>Selected Failing Inputs</h3>
<pre>
Failure for StrictMath.pow(double, double):
For inputs -0.5 (-0x1.0p-1) and
9.007199254740991E15 (0x1.fffffffffffffp52)
expected -0.0 (-0x0.0p0)
got 0.0 (0x0.0p0).
Failure for StrictMath.pow(double, double):
For inputs -0.9999999999999999 (-0x1.fffffffffffffp-1) and
9.007199254740991E15 (0x1.fffffffffffffp52)
expected -0.36787944117144233 (-0x1.78b56362cef38p-2)
got -0.0 (-0x0.0p0).
Failure for StrictMath.pow(double, double):
For inputs -1.0000000000000004 (-0x1.0000000000002p0) and
9.007199254740994E15 (0x1.0000000000001p53)
expected 54.598150033144236 (0x1.b4c902e273a58p5)
got 0.0 (0x0.0p0).
Failure for StrictMath.pow(double, double):
For inputs -0.9999999999999998 (-0x1.ffffffffffffep-1) and
9.007199254740992E15 (0x1.0p53)
expected 0.13533528323661267 (0x1.152aaa3bf81cbp-3)
got 0.0 (0x0.0p0).
Failure for StrictMath.pow(double, double):
For inputs -0.9999999999999998 (-0x1.ffffffffffffep-1) and
-9.007199254740991E15 (-0x1.fffffffffffffp52)
expected -7.38905609893065 (-0x1.d8e64b8d4ddaep2)
got -Infinity (-Infinity).
Failure for StrictMath.pow(double, double):
For inputs -3.0 (-0x1.8p1) and
9.007199254740991E15 (0x1.fffffffffffffp52)
expected -Infinity (-Infinity)
got Infinity (Infinity).
</pre></blockquote>
<p>
The <a href="http://blogs.sun.com/darcy/resource/FdlibmPowPatch.txt">code changes</a> to address the bug were fairly simple; corrections were made to extracting components of the floating-point inputs and sign information was propagated properly.
</p>
<p>
Even expertly written software can have errors and even long-used software can have unexpected problems. Estimating how often this bug in FDLIBM caused an issue is difficult; while the errors could be egregious, the needed inputs to elicit the problem were arguably unusual (even though perfectly valid mathematically). Thorough testing is a key aspect of assuring the quality of numerical software; it is also helpful for end-users to be able to <a href="http://www.cs.berkeley.edu/~wkahan/7Oct09.pdf">examine the output of their programs</a> to help notice problems.
</p>
https://blogs.oracle.com/darcy/entry/hexadecimal_floating_point_literals
Hexadecimal Floating-Point Literals
darcy
Thu, 4 Dec 2008 00:00:02 +0000
Numerics
<p>
One of the more obscure language changes included back in JDK 5 was the addition of <i>hexadecimal floating-point literals</i> to the platform. As the name implies, hexadecimal floating-point literals allow literals of the <tt>float</tt> and <tt>double</tt> types to be written primarily in base 16 rather than base 10. The underlying primitive types use binary floating-point so a base 16 literal avoids various decimal ↔ binary rounding issues when there is a need to specify a floating-point value with a particular representation.
</p>
<p>
The conversion rule for decimal strings into binary floating-point values is that the binary floating-point value nearest the exact decimal value must be returned. When converting from binary to decimal, the rule is more subtle: the shortest string that allows recovery of the same binary value in the same format is to be used. While these rules are sensible, surprises are possible from the differing bases used for storage and display. For example, the numerical value 1/10 is <em>not</em> exactly representable in binary; it is a binary repeating fraction just as 1/3 is a repeating fraction in decimal. Consequently, the numerical values of <tt>0.1f</tt> and <tt>0.1d</tt> are <em>not</em> the same; the exact numerical value of the comparatively low precision <tt>float</tt> literal <tt>0.1f</tt> is <br>
0.100000001490116119384765625<br>
and the shortest string that will convert to this value as a <tt>double</tt> is <br>
0.10000000149011612.<br>
This in turn differs from the exact numerical value of the higher precision <tt>double</tt> literal <tt>0.1d</tt>,<br>
0.1000000000000000055511151231257827021181583404541015625. Therefore, based on decimal input, it is not always clear what particular binary numerical value will result.
</p>
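<p>
The values quoted above can be reproduced with a few lines of Java. In this sketch, <tt>new BigDecimal(double)</tt> is used because it converts the binary value exactly; the class name <tt>DecimalSurprise</tt> and the helper <tt>exact</tt> are illustrative only.
</p>

```java
import java.math.BigDecimal;

final class DecimalSurprise {
    /** Exact decimal expansion of a binary floating-point value. */
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        // Widening 0.1f to double is exact, so this is the float's exact value.
        System.out.println(exact(0.1f));
        // Shortest decimal string that recovers that same value as a double.
        System.out.println(Double.toString(0.1f));
        // Exact value of the double literal.
        System.out.println(exact(0.1d));
        System.out.println(0.1f == 0.1d); // false
    }
}
```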
<p>
Since floating-point arithmetic is almost always approximate, dealing with some rounding error on input and output is usually benign. However, in some cases it is important to exactly specify a particular floating-point value. For example, the Java libraries include constants for the
<a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#MAX_VALUE">
largest finite</a>
<tt>double</tt> value, numerically equal to (2-2<sup>-52</sup>)·2<sup>1023</sup>, and the
<a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#MIN_VALUE">
smallest nonzero value</a>, numerically equal to 2<sup>-1074</sup>. In such cases there is only one right answer and these particular limits are derived from the binary representation details of the corresponding IEEE 754 <tt>double</tt> format. Just based on those binary limits, it is not immediately obvious how to construct a minimal length decimal string literal that will convert to the desired values.
</p>
<p>
Another way to create floating-point values is to use a bitwise conversion method, such as
<tt><a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#doubleToLongBits(double)">doubleToLongBits</a></tt>
and
<tt><a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#longBitsToDouble(double)">longBitsToDouble</a></tt>.
However, even for numerical experts this interface is inhumane since all the gory bit-level encoding details of IEEE 754 are exposed and values created in this fashion are not regarded as
<a href="http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#15.28">constants</a>.
Therefore, for some use cases it is helpful to have a textual representation of floating-point values that is simultaneously human readable, clearly unambiguous, and tied to the binary representation in the floating-point format. Hexadecimal floating-point literals are intended to have these three properties, even if the readability is only in comparison to the alternatives!
</p>
<p>
Hexadecimal floating-point literals originated in C99 and were later included in the recent <a href="http://en.wikipedia.org/wiki/IEEE_754-2008">revision of the IEEE 754 floating-point standard</a>.
The grammar for these literals in Java is given in
<a href="http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.2">JLSv3 §3.10.2</a>:</p>
<blockquote>
<dl>
<dt><p><i>HexFloatingPointLiteral</i>:</p>
<dd> <p><i>HexSignificand BinaryExponent FloatTypeSuffix<sub>opt</sub></i></p>
</dl>
</blockquote>
<p>
This readily maps to the sign, significand, and
exponent fields defining a finite floating-point value; <i>sign</i><b><tt>0x</tt></b><i>significand</i><b><tt>p</tt></b><i>exponent</i>.
This syntax allows the literal </p>
<blockquote>
<p><tt>0x1.8p1</tt></p>
</blockquote>
<p>
to be used to represent the value 3; 1.8<sub>hex</sub> × 2<sup>1</sup> = 1.5<sub>decimal</sub> × 2 = 3.
More usefully, the maximum value of <br>
(2-2<sup>-52</sup>)·2<sup>1023</sup> can be written as<br>
<tt>0x1.fffffffffffffp1023</tt><br>
and the minimum value of <br>
2<sup>-1074</sup> can be written as<br>
<tt>0x1.0P-1074</tt> or <tt>0x0.0000000000001P-1022</tt>, which are clearly mappable to the various fields of the floating-point representation while being much more scrutable than a raw bit encoding.
</p>
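<p>
As a quick check of the literals above, the following sketch declares them as constants and compares them with the corresponding library values. The class and field names (<tt>HexLiteralDemo</tt>, <tt>THREE</tt>, and so on) are chosen for the example.
</p>

```java
final class HexLiteralDemo {
    static final double THREE = 0x1.8p1;
    static final double MAX = 0x1.fffffffffffffp1023;
    static final double MIN = 0x1.0P-1074;
    static final double MIN_AS_SUBNORMAL = 0x0.0000000000001P-1022;

    public static void main(String[] args) {
        System.out.println(THREE == 3.0);                       // true
        System.out.println(MAX == Double.MAX_VALUE);            // true
        System.out.println(MIN == Double.MIN_VALUE);            // true
        System.out.println(MIN_AS_SUBNORMAL == Double.MIN_VALUE); // true
    }
}
```

<p>
Because these are literals, the fields are compile-time constants, unlike values produced by a bitwise conversion method.
</p>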
<p>
Retroactively reviewing the possible <a href="http://blogs.sun.com/darcy/entry/so_you_want_to_change">steps</a> needed to add hexadecimal floating-point literals to the language:</p>
<ol>
<li><p><b>Update the Java Language Specification</b>: As a purely syntactic change, only a single section of the JLS had to be updated to accommodate hexadecimal floating-point literals.
</p>
<li><p><b>Implement the language change in a compiler</b>: Just the lexer in <tt>javac</tt> had to be modified to recognize the new syntax; <tt>javac</tt> used new platform library methods to do the actual numeric conversion.
</p>
<li><p><b>Add any essential library support</b>: While not strictly necessary, the usefulness of the literal syntax is increased by also recognizing the syntax in
<tt><a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#parseDouble(java.lang.String)">Double.parseDouble</a></tt> and similar methods and outputting the syntax with <tt><a href="http://java.sun.com/javase/6/docs/api/java/lang/Double.html#toHexString(double)">Double.toHexString</a></tt>; analogous support was added in corresponding <tt>Float</tt> methods. In addition the new-in-JDK 5 Formatter "<tt>printf</tt>" facility included the <a href="http://java.sun.com/javase/6/docs/api/java/util/Formatter.html#dndec"><tt>%a</tt> format</a> for hexadecimal floating-point.
</p>
<li><p><b>Write tests</b>: Regression tests (under <tt>test/java/lang/Double</tt> in the JDK workspace/repository) were included as part of the library support
(<a href="http://bugs.sun.com/view_bug.do?bug_id=4826774"
title="Add library support for hexadecimal floating-point strings">4826774</a>).
</p>
<li><p><b>Update the Java Virtual Machine Specification</b>: No JVMS changes were needed for this feature.
</p>
<li><p><b>Update the JVM and other tools that consume classfiles</b>: As a Java source language change, classfile-consuming tools were not affected.
</p>
<li><p><b>Update the Java Native Interface (JNI)</b>: Likewise, new literal syntax was orthogonal to calling native methods.
</p>
<li><p><b>Update the reflective APIs</b>: Some of the reflective APIs in the platform came after hexadecimal floating-point literals were added; however, only an API modeling the syntax of the language, such as the <a href="http://java.sun.com/javase/6/docs/technotes/guides/javac/index.html">tree API</a> might need to be updated for this kind of change.
</p>
<li><p><b>Update serialization support</b>: New literal syntax has no impact on serialization.
</p>
<li><p><b>Update the javadoc output</b>: One possible change to javadoc output would have been supplementing the existing entries for floating-point fields in the <a href="http://java.sun.com/javase/6/docs/api/constant-values.html">constant fields values page</a> with hexadecimal output; however, that change was not done.
</p>
</ol>
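<p>
As a rough illustration of the literal syntax and the library support listed above, here is a minimal sketch. The class name <tt>HexFloatDemo</tt> and its helper methods are hypothetical names chosen for this example; only <tt>Double.parseDouble</tt>, <tt>Double.toHexString</tt>, <tt>Double.doubleToLongBits</tt>, and the <tt>%a</tt> format are platform features.
</p>

```java
class HexFloatDemo {
    // Parses a hexadecimal floating-point string such as "0x1.8p1",
    // which denotes (1 + 8/16) * 2^1.
    static double parse(String hex) {
        return Double.parseDouble(hex);
    }

    // True if printing a value in hexadecimal and parsing the string back
    // recovers exactly the same bits.
    static boolean roundTrips(double value) {
        String hex = Double.toHexString(value);
        return Double.doubleToLongBits(Double.parseDouble(hex))
            == Double.doubleToLongBits(value);
    }

    public static void main(String[] args) {
        double literal = 0x1.8p1;  // hexadecimal floating-point literal
        System.out.println(literal == parse("0x1.8p1"));
        System.out.println(Double.toHexString(0.1));   // the double nearest to 0.1, shown exactly
        System.out.println(String.format("%a", 0.1));  // Formatter's hexadecimal conversion
        System.out.println(roundTrips(0.1));
    }
}
```

<p>
Because every hexadecimal digit maps directly onto bits of the significand, the string form identifies one specific floating-point value, with no decimal-to-binary rounding for the reader to reason about.
</p>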
<p>
In terms of language changes, adding hexadecimal floating-point literals is about as simple as a language change can be: only straightforward and localized changes were needed to the JLS and compiler, and the library support was clearly separated. Hexadecimal floating-point literals aren't applicable to that many programs, but when they can be used, they have extremely high utility in allowing the source code to clearly reflect the precise numerical intentions of the author.
</p>https://blogs.oracle.com/darcy/entry/everything_old_is_new_againEverything Old is New Againdarcy
https://blogs.oracle.com/darcy/entry/everything_old_is_new_again
Wed, 29 Oct 2008 13:54:42 +0000Numerics<p>
I was heartened to recently come across the article
<i><a href="http://www.ibm.com/developerworks/java/library/j-math1/index.html?ca=drs-">
Java's new math, Part 1: Real numbers
</a></i>
which detailed some of the additions I made to Java's math libraries over the years in JDK 5 and 6, including
<a href="http://bugs.sun.com/view_bug.do?bug_id=4851625"
title="Add hyperbolic transcendental functions (sinh, cosh, tanh) to Java math library">hyperbolic trigonometric functions</a> (sinh, cosh, tanh),
<a href="http://bugs.sun.com/view_bug.do?bug_id=4347132"
title="Want Math.cbrt() function for cube root">cube root</a>,
and
<a href="http://bugs.sun.com/view_bug.do?bug_id=4074599"
title="Math package: implement log10 (base 10 logarithm)">base-10 log</a>.
</p>
<p>
A few comments on the article itself: I would describe <tt>java.lang.StrictMath</tt> as <tt>java.lang.Math</tt>'s fussy twin rather than evil twin. The availability of the <tt>StrictMath</tt> class allows developers who need cross-platform reproducible results from the math library to get them. Just because floating-point arithmetic is an approximation to real arithmetic doesn't mean it shouldn't be predictable! There are non-contrived circumstances where numerical programs are helped by having such strong reproducibility available. For example, to avoid unwanted communication overhead, certain parallel decomposition algorithms rely on different nodes being able to independently compute consistent numerical answers.
</p>
<p>
While the <tt>java.lang.Math</tt> class is not constrained to use the particular FDLIBM algorithms required by <tt>StrictMath</tt>, any valid <tt>Math</tt> class implementation still must meet the stated quality of implementation criteria for the methods. The criteria usually include a low worst-case relative error, as measured in
<a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#ulp(double)">ulps</a>
(units in the last place), and <i>semi-monotonicity</i>: whenever the mathematical function is non-decreasing, so is the floating-point approximation; likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation.
</p>
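<p>
To make the two criteria concrete, here is a small sketch that spot-checks them for <tt>Math.log10</tt>, whose specification requires a result within 1 ulp and semi-monotonic results. The class <tt>MathQualityDemo</tt> and its methods are hypothetical helpers written for this illustration, and a spot check over a few thousand adjacent values is of course not a proof. Using <tt>StrictMath.log10</tt> as the reference is an assumption of convenience: since both results should be within 1 ulp of the exact answer, they should differ from each other by at most about 2 ulps.
</p>

```java
class MathQualityDemo {
    // Distance between two finite values, measured in ulps of the reference value.
    static double ulpDistance(double actual, double reference) {
        return Math.abs(actual - reference) / Math.ulp(reference);
    }

    // Walks 'steps' consecutive doubles upward from 'start' and reports whether
    // Math.log10 never decreases along the way (log10 is an increasing function).
    static boolean log10NonDecreasing(double start, int steps) {
        double x = start;
        double previous = Math.log10(x);
        for (int i = 0; i < steps; i++) {
            x = Math.nextUp(x);  // the next representable double above x
            double current = Math.log10(x);
            if (current < previous) {
                return false;
            }
            previous = current;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(ulpDistance(Math.log10(2.0), StrictMath.log10(2.0)));
        System.out.println(log10NonDecreasing(1000.0, 10_000));
    }
}
```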
<p>
Simply adding more FDLIBM methods to the platform was quite easy to do; much of the effort for the math library additions went toward developing new tests, both to verify that the general quality of implementation criteria were being met and to verify that the particular algorithms were being used to implement the <tt>StrictMath</tt> methods. I'll discuss the techniques I used to develop those tests in a future blog entry.
</p>https://blogs.oracle.com/darcy/entry/norms_how_to_measure_sizeNorms: How to Measure Sizedarcy
https://blogs.oracle.com/darcy/entry/norms_how_to_measure_size
Thu, 1 Mar 2007 15:20:32 +0000Numerics<p>
At times it is useful to summarize a set of values, say a vector of real numbers, as a single number representing the set's size, for example, when distilling benchmark subcomponent scores into an overall score. One way to do this is to use a <i><a href="http://en.wikipedia.org/wiki/Vector_norm" title="Wikipedia on vector norms">norm</a></i>.
Mathematically, a norm maps from a vector <i>V</i> of a given number of elements to a real number length such that the following properties hold:
</p>
<ul>
<li> norm(<i>V</i>) ≥ 0 for all <i>V</i> and norm(<i>V</i>) = 0 if and only if <i>V</i> = 0 (positive definiteness)
<li> norm(<i>c</i> · <i>V</i>) = abs(<i>c</i>) · norm(<i>V</i>) for real constant <i>c</i> (homogeneity)
<li> norm(<i>U</i> + <i>V</i>) ≤ norm(<i>U</i>) + norm(<i>V</i>) (the triangle inequality)
</ul>
<p>
There are a few commonly used norms:
</p>
<ul>
<li> 1-norm: sum of the absolute values (Manhattan length)
<li> 2-norm: square root of the sum of the squares (Euclidean length)
<li> ∞-norm: largest absolute value
</ul>
<p>
The first two norms are instances of <i>p-norms</i>. A <i>p</i>-norm adds up the result of raising the absolute value of each vector component to the <i>p</i><sup>th</sup> power (squaring, or cubing, etc.) and then takes the <i>p</i><sup>th</sup> root of the sum. The ∞-norm is the limit as <i>p</i> goes to infinity.
</p>
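<p>
The definitions above translate almost directly into code. The following is a minimal sketch; the class <tt>Norms</tt> and its method names are made up for this example. It uses the textbook formulas as written and makes no attempt to guard against overflow or underflow in the intermediate powers, which a production-quality routine would need to consider.
</p>

```java
class Norms {
    // 1-norm: sum of the absolute values.
    static double norm1(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.abs(x);
        }
        return sum;
    }

    // 2-norm: square root of the sum of the squares.
    static double norm2(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Infinity-norm: largest absolute value.
    static double normInf(double[] v) {
        double max = 0.0;
        for (double x : v) {
            max = Math.max(max, Math.abs(x));
        }
        return max;
    }

    // General p-norm for p >= 1: p-th root of the sum of the p-th powers
    // of the absolute values.
    static double pNorm(double[] v, double p) {
        double sum = 0.0;
        for (double x : v) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        double[] v = {1, 3, 4};
        System.out.println(norm1(v));
        System.out.println(norm2(v));
        System.out.println(normInf(v));
        // As p grows, the p-norm approaches the infinity-norm.
        System.out.println(pNorm(v, 100.0));
    }
}
```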
<p>
Given multiple possible norms, which one should be used? The 2-norm is often easier to work with since it is a differentiable function of the vector components, unlike the 1-norm and ∞-norm. On the other hand, the ∞-norm captures the worst-case behavior. Sometimes one norm is easier to compute than the others.
Another norm might <a href="http://www.cs.berkeley.edu/~wkahan/MxMulEps.pdf" title="Kahan on why Matlab's Loss is Nobody's Gain">make an error analysis more tractable</a>.
For vectors, in some sense it doesn't matter which norm is used because any two norms, norm<sub>a</sub> and norm<sub>b</sub>, are equivalent in the following sense: there are positive constants <i>c</i><sub>1</sub> and <i>c</i><sub>2</sub> such that<br>
</p>
<blockquote>
<i>c</i><sub>1</sub> · norm<sub>a</sub>(<i>V</i>) ≤ norm<sub>b</sub>(<i>V</i>) ≤ <i>c</i><sub>2</sub> · norm<sub>a</sub>(<i>V</i>) <br>
</blockquote>
<p>
This means that if one norm is tending toward zero, all other norms are tending toward zero too. For example, commonly in numerical linear algebra there is an iterative process that terminates once the norm of the error is small enough. Concretely, for vectors of size <i>n</i>, the common norms are related as follows:
</p>
<blockquote>
norm<sub>2</sub>(<i>V</i>) ≤ norm<sub>1</sub>(<i>V</i>) ≤ sqrt(<i>n</i>) · norm<sub>2</sub>(<i>V</i>)<br>
norm<sub>∞</sub>(<i>V</i>) ≤ norm<sub>2</sub>(<i>V</i>) ≤ sqrt(<i>n</i>) · norm<sub>∞</sub>(<i>V</i>)<br>
norm<sub>∞</sub>(<i>V</i>) ≤ norm<sub>1</sub>(<i>V</i>) ≤ <i>n</i> · norm<sub>∞</sub>(<i>V</i>)<br>
</blockquote>
<p>
So to guarantee that the 1-norm is less than epsilon, it is enough to show that the 2-norm is less than epsilon/sqrt(<i>n</i>).
</p>
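<p>
The three pairs of inequalities can be spot-checked numerically. In the sketch below, the class <tt>NormBounds</tt> and the method <tt>boundsHold</tt> are hypothetical names for this example, and the small relative slack is an assumption added so that rounding in the computed norms does not cause a spurious failure when a bound holds with equality.
</p>

```java
class NormBounds {
    // Checks the six inequalities relating the 1-, 2-, and infinity-norms
    // for a single vector of length n.
    static boolean boundsHold(double[] v) {
        int n = v.length;
        double one = 0.0;         // 1-norm
        double sumSquares = 0.0;  // accumulates the squares for the 2-norm
        double inf = 0.0;         // infinity-norm
        for (double x : v) {
            double a = Math.abs(x);
            one += a;
            sumSquares += a * a;
            inf = Math.max(inf, a);
        }
        double two = Math.sqrt(sumSquares);
        double rootN = Math.sqrt(n);
        double slack = 1.0 + 1e-12;  // tolerance for floating-point rounding
        return two <= one * slack && one <= rootN * two * slack
            && inf <= two * slack && two <= rootN * inf * slack
            && inf <= one * slack && one <= n * inf * slack;
    }

    public static void main(String[] args) {
        System.out.println(boundsHold(new double[] {1, 3, 4}));
        // A vector of equal components attains the upper bounds with equality.
        System.out.println(boundsHold(new double[] {1, 1, 1, 1}));
    }
}
```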
<p>
However, in other ways the different norms are <em>not</em> equivalent; the norms can give different answers on the relative size of different vectors. Consider the three vectors <i>A</i>, <i>B</i>, and <i>C</i>:
</p>
<blockquote>
<i>A</i> = [5, 0, 0]<br>
<i>B</i> = [1, 3, 4]<br>
<i>C</i> = [8/3, 8/3, 3]<br><br>
<table border>
<tr>
<th>Vector</th>
<th>1-norm</th>
<th>2-norm</th>
<th>∞-norm</th>
</tr>
<tr>
<th><i>A</i></th> <td>5</td> <td>5</td> <td><b>5</b></td>
</tr>
<tr>
<th><i>B</i></th> <td>8</td> <td><b>≈5.1</b></td> <td>4</td>
</tr>
<tr>
<th><i>C</i></th> <td><b>≈8.3</b></td> <td>≈4.8</td> <td>3</td>
</tr>
<tr>
<th>Biggest Vector</th> <td><i>C</i></td> <td><i>B</i></td> <td><i>A</i></td>
</tr>
</table>
</blockquote>
<p>
Each vector is considered the largest under one of the norms.
</p>
<p>
I've found the notion of norms to be useful in many different contexts. The performance differences between quicksort and mergesort can be described as quicksort having a better 1-norm but mergesort having a better ∞-norm. Buying more insurance coverage raises the 1-norm of your costs, but lowers your ∞-norm. A more conservative evaluation tends to focus on the worst-case outcome and thus favors something like the ∞-norm. For example, in the
<a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html" title="Javadoc for java.lang.Math">math library</a>
the relative size of the error at any location must be less than the stated number of <a href="http://java.sun.com/javase/6/docs/api/java/lang/Math.html#ulp(double)">ulp</a>s
(units in the last place). It is not good enough to have a low average error if a few locations, or even one location, have a very inaccurate result. During software development, risk assessments evolve with the release life cycle. A change that is welcome early in the release may be rejected as too risky a few weeks before shipping; one way to view this phenomenon is that a larger value of <i>p</i> is being used to compute risk assessments later in the release.
</p>
<b>References</b><br>
<i><a href="http://www.ec-securehost.com/SIAM/ot56.html">Applied Numerical Linear Algebra</a></i>,
James W. Demmel<br>
<i><a href="http://portal.acm.org/citation.cfm?id=248979">Matrix Computations</a></i>,
Gene H. Golub and Charles F. Van Loan<br>
<i><a href="http://www.ec-securehost.com/SIAM/ot50.html">Numerical Linear Algebra</a></i>,
Lloyd N. Trefethen and David Bau, III<br>
https://blogs.oracle.com/darcy/entry/what_every_computer_programmer_reduxWhat Every Computer Programmer Should Know About Floating-Point Arithmetic, Reduxdarcy
https://blogs.oracle.com/darcy/entry/what_every_computer_programmer_redux
Wed, 4 Oct 2006 00:00:01 +0000Numerics<p>
Next week on Wednesday, October 11, at the Silicon Valley <a href="http://www.accu-usa.org/">ACCU</a> meeting in San Jose, I'll be giving a version of my talk on <i>What Every Computer Programmer Should Know About Floating-Point Arithmetic</i>, previously seen at <a href="http://blogs.sun.com/darcy/entry/what_every_computer_programmer_should">Stanford</a> and <a href="http://blogs.sun.com/darcy/resource/J1_2003-TS-2281.pdf">JavaOne</a>. The meeting is open to the public and free of charge, so if you've ever wondered why adding up ten copies of <code>0.1d</code> doesn't equal <code>1.0</code> or doubted the need for a floating-point value that is <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Double.html#NaN">not a number</a>, come on by.
</p>
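<p>
For readers who want to see the two teasers for themselves, here is a minimal sketch; the class name <tt>FloatingPointSurprises</tt> and its method are invented for this example. The decimal value 0.1 is not exactly representable in binary floating-point, so each addition works with a slightly rounded value, and a NaN compares unequal to everything, including itself.
</p>

```java
class FloatingPointSurprises {
    // Adds up ten copies of 0.1d, rounding after each addition.
    static double sumOfTenths() {
        double sum = 0.0;
        for (int i = 0; i < 10; i++) {
            sum += 0.1d;
        }
        return sum;
    }

    public static void main(String[] args) {
        double sum = sumOfTenths();
        System.out.println(sum == 1.0);   // false
        System.out.println(sum);          // close to, but not exactly, 1.0

        double nan = Double.NaN;
        System.out.println(nan == nan);         // false: NaN is unordered with itself
        System.out.println(Double.isNaN(nan));  // true: the reliable way to test for NaN
    }
}
```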
<p>
After the talk, I'll post a copy of the slides.
</p>
<p>
<b>Update:</b> <a href="http://blogs.sun.com/darcy/resource/Wecpskafpa-ACCU.pdf">The slides.</a>
</p>https://blogs.oracle.com/darcy/entry/ieee_754r_ballotIEEE 754R Ballotdarcy
https://blogs.oracle.com/darcy/entry/ieee_754r_ballot
Tue, 3 Oct 2006 15:11:45 +0000Numerics<p>
For a number of years, the venerable <a href="http://shop.ieee.org/ieeestore/Product.aspx?product_no=SS10116">IEEE 754</a> standard for binary floating-point arithmetic has been undergoing <a href="http://grouper.ieee.org/groups/754/">revision</a> and the committee's <a href="http://math.berkeley.edu/~scanon/754/">results</a> will soon be up for ballot. Back in 2003, I was editor of the draft for a few months and helped incorporate the decimal material.
</p>
<p>
The balloting process provides the opportunity for interested parties, such as consumers of the standard, to weigh in with comments; instructions for joining the ballot <a href="http://754r.ucbtest.org/balloting.txt">are available</a>. The deadline for signing up has been extended to October 21, 2006.
</p>
<p>
Major changes from 754 include:
<ul>
<li> Support for decimal formats and arithmetic
<li> Fused multiply add operation
<li> More explicit conceptual model of levels of specification
<li> Hexadecimal strings for binary floating-point values
<li> Annexes giving recommendations on expression evaluation, alternate exception handling, and transcendental functions
</ul>
</p>https://blogs.oracle.com/darcy/entry/what_every_computer_programmer_shouldWhat Every Computer Programmer Should Know About Floating-Point Arithmeticdarcy
https://blogs.oracle.com/darcy/entry/what_every_computer_programmer_should
Fri, 23 Jun 2006 15:07:23 +0000NumericsI'm a part-time master's student in Stanford's <a href="http://icme.stanford.edu/">ICME</a> program and at the
<a href="http://icme.stanford.edu/Events/seminar.html">departmental seminar</a>
I recently gave a talk,
<a href="http://blogs.sun.com/roller/resources/darcy/Wecpskafpa-StanfordIcme500.pdf"><i>What Every Computer Programmer Should Know About Floating-Point Arithmetic</i></a>.
This is a refinement and update of
<a href="http://blogs.sun.com/roller/resources/darcy/JavaOneArchive.html">JavaOne talks</a> I've given with a similar title.