### Transcendental Meditation

#### By jag on Jul 27, 2005

With one exception: code that invokes sin() and cos() heavily is somewhat slower. They asked me why this was happening. I had a pretty good idea, but I checked with Joe Darcy, our local Floating Point God, and he had this to say:

*For many years, the JDK on x86 platforms has used the hardware
fsin/fcos x87 instructions in the range [-*pi*/4, *pi*/4], a
range which encompasses about half of all representable floating-point
values. Therefore, in that range the performance of the JDK's
transcendental functions should be nearly the same as the performance of
the transcendental functions in C, C++, etc. that are using those same
fsin/fcos instructions. Benchmarks which focus on testing the trig
performance of large values, such as almabench, present a skewed
portrait of Java's trigonometric performance. The next question is why
don't we just use fsin/fcos over the entire floating-point range? The
simple answer is that fsin/fcos can deliver answers that are arbitrarily
wrong for the most straightforward way of measuring error in the result. *

*Every finite real number, no matter how large, has a well-defined
value for sin/cos. Ideally, the floating-point result returned for
sin/cos would be the representable floating-point number closest to the
mathematically defined result for the floating-point input. A
floating-point library having this property is called correctly rounded,
which is equivalent to saying the library has an error bound less than
or equal to 1/2 an ulp (unit in the last place). For sin/cos, writing a
correctly rounding implementation that runs at a reasonable speed is
still something of a research problem so in practice platforms often use
a library with a 1 ulp error bound instead, which means either of the
floating-point numbers adjacent to the true result can be returned. This
is the implementation criteria the Java Math library has to meet. The
implementation challenge is that sin/cos are implemented using argument
reduction whereby any input is mapped into a corresponding input in the
[-*pi*/4, *pi*/4] range. Since the period of sin/cos is *pi*
and *pi* is transcendental, this amounts to having to compute a
remainder from the division by a transcendental number, which is
non-obvious. A few years after the x87 was designed, people figured out
how to do this division as if by an exact value of *pi*. Instead
the x87 fsin/fcos use a particular approximation to *pi*, which
effectively means the period of the function is changed, which can lead
to large errors outside [-*pi*/4, *pi*/4]. For example the
value of sine for the floating-point number Math.PI is around *

*1.2246467991473532E-16 *

*while the computed value from fsin is *

*1.2246063538223773E-16 *

*In other words, instead of getting the full 15-17 digit accuracy of
double, the returned result is only correct to about 5 decimal digits.
In terms of ulps, the error is about 1.64e11 ulps, over \*ten billion\*
ulps. With some effort, I'm confident I could find results with the
wrong sign, etc. There is a rationale which can justify this behavior;
however, it was much more compelling before the argument reduction
problem was solved. *

This error has tragically become un-fixable because of the compatibility
requirements from one generation to the next. The fix for this problem
was figured out quite a long time ago. In the excellent paper *The
K5 transcendental functions* by T. Lynch, A. Ahmed, M. Schulte,
T. Callaway, and R. Tisdale a technique is described for doing argument
reduction as if you had an infinitely precise value for pi. As far as I
know, the K5 is the only x86 family CPU that did sin/cos accurately. AMD
went back to being bit-for-bit compatibile with the old x87 behavior,
assumably because too many applications broke. Oddly enough, this is
fixed in Itanium.

What we do in the JVM on x86 is moderately obvious: we range check the
argument, and if it's outside the range [-*pi*/4, *pi*/4]we do
the precise range reduction by hand, and then call fsin.

So Java is accurate, but slower. I've never been a fan of "fast, but wrong" when "wrong" is roughly random(). Benchmarks rarely test accuracy. "double sin(double theta) { return 0; }" would be a great benchmark-compatible implementation of sin(). For large values of theta, 0 would be arguably more accurate since the absolute error is never greater than 1. fsin/fcos can have absolute errors as large as 2 (correct answer=1; returned result=-1).

This is one of those area where no matter what we do, we're screwed.

0.479425538604203</td><td>511

5.095750210681871E-18</td><td>3145

5.095750210681871E-18</td></tr> <tr><td> 1.2</td><td>1423

0.9320390859672263</td><td>2003

0.0</td><td>3726

0.0</td></tr> <tr><td> 3.5</td><td>1573

-0.35078322768961984</td><td>1493

0.0</td><td>3024

0.0</td></tr> <tr><td> 10000.5</td><td>1761

-0.7246894586135745</td><td>1882

0.0</td><td>3685

0.0</td></tr> <tr><td> 1.000000005E8</td><td>14370

0.6433740719829194</td><td>13630

0.0</td><td>3025

9.85878045867139E-14</td></tr> </table> Time here is the number of milliseconds per 10 million calls; the double values below times are the strictfp value of sin(), and in other columns it is the diff between this value and strictfp value.

Posted by

Vlad Patryshevon July 27, 2005 at 03:40 PM PDT #'gonzo' is a Powerbook (10.4.2) . 'gentoo' is a P3 box (linux):

Posted by

Jim Thompsonon July 27, 2005 at 04:03 PM PDT #Posted by

Ryanon July 28, 2005 at 12:08 AM PDT #Posted by

Jess Sightleron July 28, 2005 at 03:31 AM PDT #Posted by

Dick Wallon July 28, 2005 at 03:37 AM PDT #Posted by

Tobyon July 28, 2005 at 04:51 AM PDT #Posted by

KW Behon July 28, 2005 at 10:33 AM PDT #Posted by

Dick Wallon July 28, 2005 at 11:39 AM PDT #Posted by

amon August 02, 2005 at 07:46 AM PDT #Posted by

Belai Beshahon August 03, 2005 at 02:03 AM PDT #