Why I hate macros that make pointer dereferences look like structure elements

I have a colleague who generated an IDR patch for tcp in Solaris 10 for me to give relief to a customer for a bug while the formal fix was in progress.

As a part of the fix we had this code fragment

 18984          /\*
 18985           \* If the SACK option is set, delete the entire list of
 18986           \* notsack'ed blocks.
 18987           \*/
 18988          if (tcp->tcp_sack_info != NULL) {
 18989                  if (tcp->tcp_notsack_list != NULL)
 18990                          TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);
 18991          }

replaced with this code fragment (the fix actually has a lot more to it than this, but this is what was relevent.)

 18936          /\*
 18937           \* If the SACK option is set, delete the entire list of
 18938           \* notsack'ed blocks.
 18939           \*/
 18940
 18941          if (tcp->tcp_notsack_list != NULL)
 18942                  TCP_NOTSACK_REMOVE_ALL(tcp->tcp_notsack_list, tcp);

Now, the assembly around here reads

ip:tcp_process_shrunk_swnd+0x38:       ldx      [%i0 + 0xf8], %g1
ip:tcp_process_shrunk_swnd+0x3c:       add      %g3, %i1, %g2
ip:tcp_process_shrunk_swnd+0x40:       stw      %g2, [%i0 + 0x80]
ip:tcp_process_shrunk_swnd+0x44:       ldx      [%g1 + 0x48], %i5

and the register dump

pc:  0x7bed2918 ip:tcp_process_shrunk_swnd+0x44:   ldx  [%g1 + 0x48], %i5
npc: 0x7bed291c ip:tcp_process_shrunk_swnd+0x48:   subcc  %i5, 0x0, %g0   ( cmp   %i5, 0x0 )
  global:                       %g1                  0
        %g2             0x761c  %g3             0x68ec
        %g4      0x600216f6e6c  %g5                  0
        %g6               0x1c  %g7      0x2a101f89ca0
  out:  %o0      0x600210d8640  %o1              0x1e0
        %o2              0x5a8  %o3              0x3c8
        %o4      0x600216f6e6c  %o5      0x600216f68c4
        %sp      0x2a101f88c61  %o7         0x7bed2900
  loc:  %l0         0xc7341c85  %l1             0x2000  
        %l2      0x60010972000  %l3      0x600210d8640  
        %l4             0x1000  %l5             0x1000  
        %l6             0x1000  %l7                0x5  
  in:   %i0      0x600210d8640  %i1              0xd30  
        %i2                  0  %i3         0xc73439b5  
        %i4                  0  %i5                0x4  
        %fp      0x2a101f88d11  %i7         0x7becbed4  

The last instruction is where we paniced (yes the customer paniced [twice] as a result of this). As we can see from the register dump, %g1 is NULL, so we definitely have a NULL pointer dereference going on.

So where did this come from? It looks like a dereference of 0xf8 from %i0. %i0 is a (tcp_t \*) making %g1 a (tcp_sack_info \*), namely arg0->tcp_sack_info if we look at the structure; but hang on, the code says tcp->tcp_notsack_list, not tcp->tcp_sack_info. Indeed that element name does not exist within a tcp_t.

A light dawns when we see that:

   299  #define tcp_notsack_list	        tcp_sack_info->tcp_notsack_list

So in reality line 18941 is doing:

 18941          if (tcp->tcp_sack_info->tcp_notsack_list != NULL)

Without checking whether or not tcp->tcp_sack_info is non-NULL. The correct line should perhaps read

 18941          if (tcp->tcp_sack_info != NULL && tcp->tcp_notsack_list != NULL)

Now this would probably not have made it as far as in IDR patch delivered to a customer, if we didn't have that macro definition because alarm bells would have rung that we were doing another dereference!

Comments:

Post a Comment:
Comments are closed for this entry.
About

* - Solaris and Network Domain, Technical Support Centre


Alan is a kernel and performance engineer based in Australia who tends to have the nasty calls gravitate towards him

Search

Archives
« July 2015
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 
       
Today
Links
Blogroll

No bookmarks in folder

Sun Folk

No bookmarks in folder

Non-Sun Folk
Non-Sun Folks

No bookmarks in folder