## Thursday Aug 28, 2008

### Enabling floating point non-standard mode using LD_PRELOAD

Subnormal numbers are explained in a previous post, together with the use of the flag `-fns` to flush these to zero if they cause a performance impact.

Of course, it's possible that parts of the code need to be computed with subnormals and parts with them flushed to zero. There are programmatic controls to do this in libsunmath, the routines are `nonstandard_arithmetic` and `standard_arithmetic`.

Sometimes it may be that they are occurring in code that cannot be recompiled, it is still possible to disable them. One approach is to write a `LD_PRELOAD` library, containing the following code:

```#include <sunmath.h>
#include <stdio.h>

#pragma init(go)

void go()
{
nonstandard_arithmetic();
printf("NONSTANDARD MODE\\n");
}
```

The `printf` is just to demonstrate that the code is actually called. Taking the same code from the previous post we can confirm that this approach works:

```\$ cc -O ft.c
\$ a.out
...
4.940656e-324
\$ cc -O -G -Kpic libns.c -o libns.so -lsunmath
\$ a.out
NONSTANDARD MODE
...
2.225074e-308
```

## Wednesday Aug 27, 2008

### Subnormal numbers

Under IEEE-754, floating point numbers are represented in binary as:

```Number = signbit \* mantissa \* 2exponent
```

There are potentially multiple ways of representing the same number, using decimal as an example, the number 0.1 could be represented as 1\*10-1 or 0.1\*100 or even 0.01 \* 10. The standard dictates that the numbers are always stored with the first bit as a one. In decimal that corresponds to the 1\*10-1 example.

Now suppose that the lowest exponent that can be represented is -100. So the smallest number that can be represented in normal form is 1\*10-100. However, if we relax the constraint that the leading bit be a one, then we can actually represent smaller numbers in the same space. Taking a decimal example we could represent 0.1\*10-100. This is called a subnormal number. The purpose of having subnormal numbers is to smooth the gap between the smallest normal number and zero.

It is very important to realise that subnormal numbers are represented with less precision than normal numbers. In fact, they are trading reduced precision for their smaller size. Hence calculations that use subnormal numbers are not going to have the same precision as calculations on normal numbers. So an application which does significant computation on subnormal numbers is probably worth investigating to see if rescaling (i.e. multiplying the numbers by some scaling factor) would yield fewer subnormals, and more accurate results.

The following program will eventually generated subnormal numbers:

```#include <stdio.h>

void main()
{
double d=1.0;
while (d>0) {printf("%e\\n",d); d=d/2.0;}
}
```

Compiling and running this program will produce output that looks like:

```\$ cc -O ft.c
\$ a.out
...
3.952525e-323
1.976263e-323
9.881313e-324
4.940656e-324
```

The downside with subnormal numbers is that computation on them is often deferred to software - which is significantly slower. As outlined above, this should not be a problem since computations on subnormal numbers should be both rare and treated with suspicion.

However, sometimes subnormals come out as artifacts of calculations, for example subtracting two numbers that should be equal, but due to rounding errors are just slightly different. In these cases the program might want to flush the subnormal numbers to zero, and eliminate the computation on them. There is a compiler flag that needs to be used when building the main routine called `-fns` which enables the hardware to flush subnormals to zero. Recompiling the above code with this flag yields the following output:

```\$ cc -O -fns ft.c
\$ a.out
...
1.780059e-307
8.900295e-308
4.450148e-308
2.225074e-308
```

Notice that the smallest number when subnormals are flushed to zero is 2e-308 rather than 5e-324 that is attained when subnormals are enabled.

## Thursday Jun 05, 2008

### Floating point multiple accumulate

The SPARC64 VI processor supports floating point multiply accumulate instructions, also known as FMA or FMAC. These instructions take 3 input registers and one output register and perform the calculation:

```dest = src1\*src2+src3
```

The advantage of these instructions is that they perform the multiply and add operations in the same time as it normally takes to do either a multiply or an add - so the issue rate of floating point instructions is potentially doubled. To see how they can do this, look at the following binary multiply example:

```     1010
\* 111
----
1010
10100
+ 101000
-------
1000110
```

You can see that the multiply is really a sequence of adds. Adding one more addition into the process does not really make much difference.

However, we're really dealing with floating point numbers. So consider the usual sequence of operations when performing a multiplication followed by an addition:

```  temp = round(src1 \* src2)
dst  = round(src3 + temp)
```

The floating point numbers are typically computed in hardware with additional precision, then rounded to fit the register. A fused FMA is equivalent to the following operations:

``` dst = round(src1 \* src2 + src3)
```

Which performs rounding only at the end of the FMA operation. The single rounding operation during the computation may cause a difference in the least significant bits of the result when compared with the result from using two rounding operations. In theory, because the hardware does the computation in higher precision, and then rounds down, the result should be more accurate. It is possible to support an unfused FMA operation, where the middle rounding step is preserved, and the result is identical to the two step code.

The SPARC64 VI processor supports fused FMA. As we've discussed, using this instruction may cause minor differences to the output, so you need to explicitly give permission to the compiler to generate it (by default the compiler avoids doing anything that would alter the results of FP computation). So the flags necessary to generate fused FMA instructions are:

```-xarch=sparcfmaf -xfma=fused
```

A binary compiled to use the FMA instructions will not run on hardware that does not support these instructions. With that in mind it is also acceptable to specify the target processor in the flags, which avoids needing to specify the architecture:

```-xtarget=sparc64vi -xfma=fused
```

## Thursday Jul 19, 2007

### Single precision floating point constants

The C standard specifies that by default floating point constants should be held as double precision values. Calculations where one variable is single precision and one variable is double precision need to be computed as double precision. So it's very easy to have a code which works entirely on single precision variables, but ends up with lots of conversion between single and double precision because the constants are not specified as single precision. An example of this is shown in the next code snippet.

```float f(float a)
{
return a\*1.5;
}
% cc -O -S ff.c
/\* 0x0008            \*/         ldd     [%o5+%lo(offset)],%f4
/\* 0x000c            \*/         ld      [%sp+68],%f2
/\* 0x0010            \*/         fstod   %f2,%f6
/\* 0x0014            \*/         fmuld   %f6,%f4,%f8
/\* 0x0018            \*/         retl    ! Result =  %f0
/\* 0x001c            \*/         fdtos   %f8,%f0
```

The fstod instruction converts the constant variable a from single precision to double precision, the fdtos converts the result of the multiplication back to single precision.

The option to specify that unsuffixed floating point constants are held as single precision rather than the default of double precision is -xsfpconst. This option works at file level, so is not ideal for situations where files are a mix of single and double precisions constants. This flag can cause the opposite problem where the single precision constant needs to be converted to double precision in order to be used in a calculation with a double precision variable.

The issue is easy to work around at a source code level. Single precision constants can be specified with a trailing f. For example:

```float f(float a)
{
return a\*1.5f;
}
```

Compiling this code produces the following much neater and more efficient code:

```/\* 0x0008            \*/         ld      [%o5+%lo(offset)],%f2
/\* 0x000c            \*/         ld      [%sp+68],%f4
/\* 0x0010            \*/         retl    ! Result =  %f0
/\* 0x0014            \*/         fmuls   %f4,%f2,%f0
```

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge

##### Archives
Sun Mon Tue Wed Thu Fri Sat « November 2015 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Today