Tracking why an application got a SIGFPE divide by zero

Back from a long holiday, a colleague asked me to look at why a small C++ application was dying with SIGFPE on x86 boxes running Solaris 10. They had run dbx and truss and had worked out that it was taking a SIGFPE divide by zero trap on an idivl instruction deep in the flush of an I/O stream. The truss output showed the fault as

    Incurred fault #8, FLTIZDIV  %pc = 0x0805065E
      siginfo: SIGFPE FPE_INTDIV addr=0x0805065E
    Received signal #8, SIGFPE [default]
      siginfo: SIGFPE FPE_INTDIV addr=0x0805065E

So that looks like a divide by zero. dbx showed that the instruction was an idivl, but the divisor register was not zero!

After a bit of looking at the AMD instruction documents we see that the idiv instruction can generate a "divide error" exception for two reasons: a divide by zero, or an integer overflow where the quotient does not fit in the destination register. The Solaris kernel maps the "divide error" exception onto the FPE_INTDIV trap, which is what truss reports, so the fault code alone cannot tell you which of the two causes it was. In this case we had an integer overflow, as the result exceeded the capacity of a signed int. Now the folks who maintain the library that made the stream know to go look at their code.
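To make the two causes concrete, here is a minimal C++ sketch of a guard that rejects both before the divide is ever issued. It is not the library's code: the names `DivStatus` and `checked_div` are made up for illustration, and it assumes the plain `int / int` case, where the only quotient that cannot fit is `INT_MIN / -1`. The actual failing division in the stream library may have got its operands some other way.

```cpp
#include <climits>

// Hypothetical result codes for the two ways idivl can raise "divide error".
enum class DivStatus { Ok, DivideByZero, Overflow };

// Hypothetical helper: performs dividend / divisor only when it is safe.
// On x86 both rejected cases would otherwise surface as SIGFPE / FPE_INTDIV.
DivStatus checked_div(int dividend, int divisor, int &quotient) {
    if (divisor == 0) {
        return DivStatus::DivideByZero;   // the classic cause
    }
    if (dividend == INT_MIN && divisor == -1) {
        return DivStatus::Overflow;       // quotient would be INT_MAX + 1
    }
    quotient = dividend / divisor;        // safe: cannot trap here
    return DivStatus::Ok;
}
```

Calling `checked_div(INT_MIN, -1, q)` returns `DivStatus::Overflow` rather than trapping, which is the non-zero-divisor situation dbx was showing.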
