tracking why an application got a SIGFPE divide by zero.
By timatworkhomeandinbetween on Aug 30, 2006
Back from a long holiday, a colleague asked me to look at why a small C++ application was dying with SIGFPE on x86 boxes running Solaris 10. They had run dbx and truss and had worked out that it was taking a SIGFPE divide by zero trap on an idivl instruction deep in the flush of an I/O stream. The truss output showed the fault as
    Incurred fault #8, FLTIZDIV %pc = 0x0805065E
      siginfo: SIGFPE FPE_INTDIV addr=0x0805065E
    Received signal #8, SIGFPE [default]
      siginfo: SIGFPE FPE_INTDIV addr=0x0805065E
So that would look like a divide by zero. dbx showed that the instruction was an idivl, but the divisor register was not zero!
After a bit of looking at the AMD instruction documents we found that the idiv instruction can generate a "divide error" exception for two reasons: a divide by zero, or an integer overflow where the quotient does not fit in the destination register. The Solaris kernel maps the "divide error" exception onto the FPE_INTDIV trap, which is what truss reports, so FPE_INTDIV on its own does not tell you which of the two happened. In this case we had an integer overflow, as the result exceeded the capacity of a signed int. Now the folks who maintain the library that made the stream know to go look at their code.
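
To make the two causes concrete, here is a small C++ sketch of a guard that tells them apart before the divide is ever executed. This is not the library's code, and I don't know what operands it was actually dividing; the names classify_int_div, checked_div and DivOutcome are made up for illustration. It assumes plain int divided by int, where the only overflowing case is INT_MIN / -1 (the true quotient is one larger than INT_MAX). Hand-written assembly, or a wider dividend, can overflow in other ways that this check does not cover.

```cpp
#include <climits>

// Hypothetical names, for illustration only; not from the library in question.
enum class DivOutcome { Ok, DivideByZero, QuotientOverflow };

// Predicts what a signed int divide would do, without executing it.
// Both non-Ok outcomes raise the same "divide error" exception on x86,
// which the post above saw reported as SIGFPE / FPE_INTDIV.
constexpr DivOutcome classify_int_div(int dividend, int divisor) {
    return divisor == 0
               ? DivOutcome::DivideByZero
               : (dividend == INT_MIN && divisor == -1)
                     ? DivOutcome::QuotientOverflow  // quotient would be INT_MAX + 1
                     : DivOutcome::Ok;
}

// Divides only when the operation cannot trap; otherwise returns 'fallback'.
constexpr int checked_div(int dividend, int divisor, int fallback) {
    return classify_int_div(dividend, divisor) == DivOutcome::Ok
               ? dividend / divisor
               : fallback;
}
```

A caller that suspects a divide like this could log the result of classify_int_div(a, b) just before the division, which would show straight away whether a non-zero divisor is still heading for a trap.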