Whose bug is it anyway?

While trying to get Solaris to compile with the Sun Studio 10 (aka Vulcan) compilers, I debugged numerous problems, and for some of them it was not obvious at the time whose bug it was. Here is one of them:

When libc.so.1 was compiled with GCC, everything worked fine; when it was compiled with Vulcan, all multithreaded programs hung. After some more debugging, the problem seemed to be in the Vulcan compiled usr/src/lib/libc/port/threads/synch.c: if I linked all the object files with a GCC compiled synch.o, everything worked. "It must be a compiler bug!"

I was in the middle of debugging 5 other panics and hangs at the time, so I made an offer to my compiler buddies: "Beer and lunch is on me for whoever figures it out." They tried, but at the end of the day, there was still no root cause. So I took a closer look. It appeared that the hung thread was waiting for a mutex, but nobody owned the mutex, yet the thread was never woken up. I looked at synch.c, and something caught my eye: the various lock routines call swap32. swap32 is an inline function, and GCC and Vulcan have different inline implementations. If there were a bug there, that could explain why the GCC compiled version worked but the Vulcan compiled version did not. Here is the Vulcan inline template:


        .inline swap32, 0
        xchgl   (%rdi), %esi
        .end

Let's see how it can be used:


void
spin_lock_clear(mutex_t *mp)
{
        ulwp_t *self = curthread;

        mp->mutex_owner = 0;
        if (swap32(&mp->mutex_lockword, 0) & WAITERMASK) {
                (void) ___lwp_mutex_wakeup(mp);
                if (self->ul_spin_lock_wakeup != UINT_MAX)
                        self->ul_spin_lock_wakeup++;
        }
        preempt(self);
}

Aha! So we did the swap, but we never returned anything to the caller: the old value of the lock word ended up in %esi, while the caller reads the return value from %eax. In spin_lock_clear, we were checking whatever happened to be in %rax to see if there were waiters. If %rax happened to be 0, the calling thread would think that there was no waiter to wake up, leaving the poor thread that was waiting for the mutex to wait forever!

To fix the problem, I changed swap32 to the following:


        .inline swap32, 0
        movl    %esi, %eax
        xchgl   (%rdi), %eax
        .end
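
For readers who do not speak inline templates, the contract that the fixed swap32 has to honor can be sketched in portable C11. This is only an illustration, not the libc code: swap32_sketch, clear_needs_wakeup, run_sketch, and the WAITERMASK_SKETCH value are all hypothetical names and numbers made up for this example, and atomic_exchange stands in for the xchgl instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical waiter bits; the real WAITERMASK is defined in the libc sources. */
#define WAITERMASK_SKETCH 0xff000000u

/* Like the fixed swap32: store new_value and return the previous contents. */
static uint32_t
swap32_sketch(_Atomic uint32_t *target, uint32_t new_value)
{
        return (atomic_exchange(target, new_value));
}

/* Clear the lock word; return 1 when the old value says a waiter needs a wakeup. */
static int
clear_needs_wakeup(_Atomic uint32_t *lockword)
{
        return ((swap32_sketch(lockword, 0) & WAITERMASK_SKETCH) != 0);
}

/* Self-check covering the contended and uncontended cases; returns 0 on success. */
static int
run_sketch(void)
{
        _Atomic uint32_t contended = WAITERMASK_SKETCH | 1u;
        _Atomic uint32_t uncontended = 1u;

        assert(clear_needs_wakeup(&contended) == 1);
        assert(atomic_load(&contended) == 0);
        assert(clear_needs_wakeup(&uncontended) == 0);
        return (0);
}
```

Calling run_sketch() from a main function exercises both cases. The point is that the wakeup decision depends entirely on the value the swap returns; if that value is whatever happens to be in a register, the decision is effectively random.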

So the moral of the story is that things are not always what they seem on the surface.


About

sherrym
