An amazing floating point misoptimization

My thanks to David Hough for bringing this gem from Microsoft to my attention. For those not interested in slogging through the entire page, here are my favorite bits, verbatim, though elided and colorized by me: red for their most amazing decisions, and blue for my commentary.

I pray that no application whose results matter to anyone uses this flag!

This is from the Visual C++ 2005 compiler. The flag in question is "fp:fast", and the discussion is specifically about sqrt (though I think it reflects a more generic problem in their thinking):

Under fp:fast, the compiler will typically attempt to maintain at
least the precision specified by the source code. However, in some
instances the compiler may choose to perform intermediate expressions
at a lower precision than specified in the source code. For example,
the first code block below calls a double precision version of the
square-root function. Under fp:fast, the compiler may choose to
replace the call to the double precision sqrt with a call to a single
precision sqrt function. This has the effect of introducing additional
lower-precision rounding at the point of the function call.

Original function
double sqrt(double)...
. . .
double a, b, c;
. . .
double length = sqrt(a*a + b*b + c*c);

Optimized function
float sqrtf(float)...
. . .
double a, b, c;
. . .
double tmp[0] = a*a + b*b + c*c;
float tmp[1] = tmp[0]; // round of parameter value
float tmp[2] = sqrtf(tmp[1]); // rounded sqrt result
double length = (double) tmp[2];

Although less accurate, this optimization may be especially beneficial
when targeting processors that provide single precision, intrinsic
versions of functions such as sqrt. Just precisely when the compiler
will use such optimizations is both platform and context dependant.

Furthermore, there is no guaranteed consistency for the precision of
intermediate computations, which may be performed at any precision
level available to the compiler. Although the compiler will attempt to
maintain at least the level of precision as specified by the code,
fp:fast allows the optimizer to downcast intermediate computations in
order to produce faster or smaller machine code. For instance, the
compiler may further optimize the code from above to round some of the
intermediate multiplications to single precision.
float sqrtf(float)...
. . .
double a, b, c;
. . .
float tmp[0] = a*a; // round intermediate a*a to single-precision
float tmp[1] = b*b; // round intermediate b*b to single-precision
double tmp[2] = c*c; // do NOT round intermediate c*c to single-precision
float tmp[3] = tmp[0] + tmp[1] + tmp[2];
float tmp[4] = sqrtf(tmp[3]);
double length = (double) tmp[4];

This kind of additional rounding may result from using a lower
precision floating-point unit, such as SSE2, to perform some of the
intermediate computations. The accuracy of fp:fast rounding is
therefore platform dependant; code that compiles well for one
processor may not necessarily work well for another processor. It's
left to the user to determine if the speed benefits outweigh any
accuracy problems. khb: unfortunately this would require the user to read the disassembled code, do a rigourous numerical analysis, and to redo it everytime the code is modified or the compiler updated (and then recompiled). This is, of course, totally impractical.

If fp:fast optimization is particularly problematic for a specific
function, the floating-point mode can be locally switched to
fp:precise using the float_control compiler pragma. khb: this is, of course, backwards. If you are going to define a basically insanely liberal fp optimization, it ought to be enabled for the smallest bit of code practical (preferably with scoping, so it can't accidentally impact the whole compilation unit).


Why bother with floats at all? Just use an integer if you are going to use fp:fast - it will be just as accurate.

Posted by PatrickG on November 17, 2004 at 09:54 AM PST #

You have a point; but then why not just simplify the entire program down to printing "42" as the result...

Posted by Keith Bierman on November 17, 2004 at 10:07 AM PST #

That's one of the reasons why Java is relatively slow for FP/Math operations. Java does everything in IEEE764 format.

Posted by Azeem Jiva on November 17, 2004 at 01:09 PM PST #

I'm pretty sure you mean IEEE 754. But beyond that, I think you are also a bit confused: the Microsoft misoptimization described above uses nothing but IEEE-formatted values. The issue is semantic, not format per se.

Posted by Keith Bierman on November 17, 2004 at 09:46 PM PST #

