### An amazing floating point misoptimization

My thanks to David Hough for bringing this gem from Microsoft to my attention. For those not interested in slogging through the entire page, here are my favorite bits, verbatim although elided; my interjections are marked with "khb:".

I pray that no application whose results matter to anyone uses this flag!

This is from the Visual C++ 2005 compiler; the flag in question is `fp:fast`, and the passage is specifically about sqrt (but I think it reflects a more generic problem in their thinking).

> Under fp:fast, the compiler will typically attempt to maintain at least the precision specified by the source code. However, in some instances the compiler may choose to perform intermediate expressions at a lower precision than specified in the source code. For example, the first code block below calls a double precision version of the square-root function. Under fp:fast, the compiler may choose to replace the call to the double precision sqrt with a call to a single precision sqrt function. This has the effect of introducing additional lower-precision rounding at the point of the function call.
>
> Original function, `double sqrt(double)`:
>
> ```c
> double a, b, c;
> /* ... */
> double length = sqrt(a*a + b*b + c*c);
> ```
>
> Optimized function, `float sqrtf(float)`:
>
> ```c
> double a, b, c;
> /* ... */
> double tmp0 = a*a + b*b + c*c;
> float tmp1 = tmp0;             // rounding of parameter value
> float tmp2 = sqrtf(tmp1);      // rounded sqrt result
> double length = (double) tmp2;
> ```
>
> Although less accurate, this optimization may be especially beneficial when targeting processors that provide single precision, intrinsic versions of functions such as sqrt. Just precisely when the compiler will use such optimizations is both platform and context dependent. Furthermore, there is no guaranteed consistency for the precision of intermediate computations, which may be performed at any precision level available to the compiler. Although the compiler will attempt to maintain at least the level of precision as specified by the code, fp:fast allows the optimizer to downcast intermediate computations in order to produce faster or smaller machine code. For instance, the compiler may further optimize the code from above to round some of the intermediate multiplications to single precision.
>
> ```c
> double a, b, c;
> /* ... */
> float tmp0 = a*a;              // round intermediate a*a to single precision
> float tmp1 = b*b;              // round intermediate b*b to single precision
> double tmp2 = c*c;             // do NOT round intermediate c*c to single precision
> float tmp3 = tmp0 + tmp1 + tmp2;
> float tmp4 = sqrtf(tmp3);
> double length = (double) tmp4;
> ```
>
> This kind of additional rounding may result from using a lower precision floating-point unit, such as SSE2, to perform some of the intermediate computations. The accuracy of fp:fast rounding is therefore platform dependent; code that compiles well for one processor may not necessarily work well for another processor. It's left to the user to determine if the speed benefits outweigh any accuracy problems.

khb: Unfortunately this would require the user to read the disassembled code, do a rigorous numerical analysis, and redo it every time the code is modified or the compiler is updated (and the code recompiled). This is, of course, totally impractical.

> If fp:fast optimization is particularly problematic for a specific function, the floating-point mode can be locally switched to fp:precise using the float_control compiler pragma.

khb: This is, of course, backwards. If you are going to define a basically insanely liberal fp optimization, it ought to be enabled for the smallest bit of code practical (preferably with scoping, so it can't accidentally impact the whole compilation unit).

Why bother with floats at all? Just use an integer if you are going to use fp:fast - it will be just as accurate.

Posted by PatrickG on November 17, 2004 at 09:54 AM PST #

You have a point, but then why not just simplify the entire program down to printing "42" as the result...

Posted by Keith Bierman on November 17, 2004 at 10:07 AM PST #

That's one of the reasons why Java is relatively slow for FP/Math operations. Java does everything in IEEE764 format.

Posted by Azeem Jiva on November 17, 2004 at 01:09 PM PST #

I'm pretty sure you mean IEEE 754. But beyond that, I think you are also a bit confused: the Microsoft misoptimization described above uses nothing but IEEE-formatted values. The issue is semantics, not format per se.

Posted by Keith Bierman on November 17, 2004 at 09:46 PM PST #

