Understanding Data Races 3: Several currently available tools (2/3)

Runtime Checking (simulation based) Tool

Tool 3: Helgrind from Valgrind

In this blog entry, I describe my experiments running the test cases under Helgrind. Helgrind is the data race detection module of Valgrind, a quite successful framework and tool suite for debugging and profiling Linux programs.

Unlike the other runtime checking tools I will describe later (e.g. Intel's Thread Checker and Sun's DRDT), Valgrind is simulation based. One advantage of the simulation-based approach is that the two active entities, the target application and the detection module, are in different processes. They have different address spaces and name spaces, so this approach avoids many conflicts between the two. For example, the detection module can call any library routine that it monitors without worrying about re-entry problems. [Update: Valgrind actually runs in the same namespace as the target application, and the target application and the detection module are part of the same process. The detection module (core and tools) is designed carefully to avoid depending on glibc.so.]

One challenge of the simulation-based approach is dealing with system calls: it simulates only the execution of the user process, NOT the OS. An even bigger challenge is dealing with threading calls. Valgrind is not multi-threaded itself, and all thread execution is serialized. I have not had a chance to study how this works, but it must be very interesting. Valgrind's manual claims it works with NPTL or LinuxThreads "well enough for significant threaded applications".

Helgrind is based on the famous Eraser method enhanced with detection of thread creation and thread join. The method is very similar to that used in Compaq/HP's Visual Threads (as described in Harrow's paper). Lockset based methods (such as Eraser) tend to have a lot of false positives.
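To make the lockset idea concrete, here is a minimal sketch in C of the Eraser-style state machine as I understand it from the Eraser paper. It is not Helgrind's code: the names (shadow_var, shadow_access, lock_bit) and the bitmask representation of locks are my own assumptions, and it ignores the thread creation and join enhancements. Each variable keeps a set of candidate locks; every access by a second or later thread intersects that set with the locks the accessing thread holds, and an empty set on a written shared variable is reported.

```c
#include <assert.h>
#include <stdint.h>

/* Per-variable states, following the Eraser state machine. */
typedef enum { VIRGIN, EXCLUSIVE, SHARED_RO, SHARED_RW } var_state;

/* Hypothetical shadow record kept for every monitored variable. */
typedef struct {
    var_state state;
    int       owner;    /* owning thread while EXCLUSIVE */
    uint32_t  lockset;  /* candidate locks, one bit per lock id */
} shadow_var;

#define ALL_LOCKS UINT32_MAX

/* Map a small lock id (0..31) to its bit in a lockset. */
static uint32_t lock_bit(int id) {
    assert(id >= 0 && id < 32);
    return UINT32_C(1) << id;
}

static void shadow_init(shadow_var *v) {
    v->state = VIRGIN;
    v->owner = -1;
    v->lockset = ALL_LOCKS;
}

/* Record one access by `thread`, which currently holds the locks in `held`.
   Returns 1 if the access should be reported as a possible data race. */
static int shadow_access(shadow_var *v, int thread, uint32_t held, int is_write) {
    switch (v->state) {
    case VIRGIN:
        /* First access ever: the variable belongs to this thread. */
        v->state = EXCLUSIVE;
        v->owner = thread;
        return 0;
    case EXCLUSIVE:
        if (thread == v->owner)
            return 0;  /* still private, no lockset refinement */
        /* A second thread appears: start refining the candidate set. */
        v->state = is_write ? SHARED_RW : SHARED_RO;
        v->lockset = ALL_LOCKS & held;
        break;
    case SHARED_RO:
        v->lockset &= held;
        if (is_write)
            v->state = SHARED_RW;
        break;
    case SHARED_RW:
        v->lockset &= held;
        break;
    }
    /* Read-only sharing is never reported; written sharing needs a lock. */
    return v->state == SHARED_RW && v->lockset == 0;
}
```

A driver would call shadow_init once per variable and shadow_access on every load and store. Because the check looks only at which locks were held, not at the order in which threads actually ran, an access that is safely ordered by other means (for example, after a join) is still reported, which is one source of the false positives mentioned above.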

Currently Valgrind is at release 3.2.0, but the latest release in which Helgrind works is 2.2.0. When I ran Helgrind in 3.2.0, I got:

Helgrind is currently not working, because:
 (a) it is not yet ready to handle the Vex IR and the use with 64-bit
     platforms introduced in Valgrind 3.0.0
 (b) we need to get thread operation tracking working again after
     the changes added in Valgrind 2.4.0
 If you want to use Helgrind, you'll have to use Valgrind 2.2.0, which is
 the most recent Valgrind release that contains a working Helgrind.

Sorry for the inconvenience.  Let us know if this is a problem for you.

Then I switched to 2.2.0. First I tried pthr_prime.c.

$ cc -g pthr_prime.c -lm -lpthread -o pthr_prime

$ valgrind --tool=helgrind ./pthr_prime

==32368== Helgrind, a data race detector for x86-linux.
==32368== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote et al.
==32368== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==32368== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==32368== For more details, rerun with: -v
==32368== 
==32368== Thread 2:
==32368== Possible data race writing variable at 0x80498B0 (total)
==32368==    at 0x80486BD: work (pthr_prime.c:51)
==32368==    by 0x1D4AFE79: thread_wrapper (vg_libpthread.c:867)
==32368==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==32368==  Address 0x80498B0 is in data section of /home/yl140942/tmp/vg/a.out
==32368==  Previous state: shared RO, no locks
==32368== 
==32368== Possible data race writing variable at 0x57EFE95C 
==32368==    at 0x804877D: main (pthr_prime.c:75)
==32368==  Address 0x57EFE95C == &(i) at pthr_prime.c:75
==32368==  Previous state: shared RO, no locks
==32368== 
==32368== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)
==32368== 4 possible data races found; 0 lock order problems

Helgrind finds the racy access to total at line 51 and the racy access to i at line 75. Note that a data race is caused by a pair of accesses, but Helgrind reports only one access of each pair. For the first race, the report is good enough: line 51 reads and updates total, so it is fairly easy to guess the racing pair. For the second one (i at line 75), I imagine it would take a fair amount of time to figure out that the other access of the pair is at line 46. Helgrind also misses several data races (e.g. the write-write race at line 50 and the write-read race between lines 50 and 76) due to the heuristic it uses.

Next, I tried with pthr_prime_fixed.c.

$ cc -g pthr_prime_fixed.c -lm -lpthread -o pthr_prime_fixed
$ valgrind --tool=helgrind ./pthr_prime_fixed

==21596== Helgrind, a data race detector for x86-linux.
==21596== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote et al.
==21596== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==21596== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==21596== For more details, rerun with: -v
==21596==
==21596== Thread 2:
==21596== Possible data race writing variable at 0x804CA10 (pflag+16)
==21596==    at 0x80486DC: is_prime (pthr_prime_fixed.c:34)
==21596==    by 0x8048756: work (pthr_prime_fixed.c:50)
==21596==    by 0x1D4EAE79: thread_wrapper (vg_libpthread.c:867)
==21596==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==21596==  Address 0x804CA10 is in BSS section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: exclusively owned by thread 1
==21596==
==21596== Thread 2:
==21596== Possible data race writing variable at 0x804CA18 (pflag+24)
==21596==    at 0x80486DC: is_prime (pthr_prime_fixed.c:34)
==21596==    by 0x8048756: work (pthr_prime_fixed.c:50)
==21596==    by 0x1D4EAE79: thread_wrapper (vg_libpthread.c:867)
==21596==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==21596==  Address 0x804CA18 is in BSS section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: exclusively owned by thread 1

<similar messages repeated for various pflag+offset>

==21596== Thread 2:
==21596== Possible data race writing variable at 0x804CAC0 (pflag+192)
==21596==    at 0x80486DC: is_prime (pthr_prime_fixed.c:34)
==21596==    by 0x8048756: work (pthr_prime_fixed.c:50)
==21596==    by 0x1D4EAE79: thread_wrapper (vg_libpthread.c:867)
==21596==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==21596==  Address 0x804CAC0 is in BSS section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: exclusively owned by thread 1
==21596==
==21596==
==21596== Possible data race reading variable at 0x80499B0 (total)
==21596==    at 0x8048873: main (pthr_prime_fixed.c:80)
==21596==  Address 0x80499B0 is in data section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: shared RW, locked by:0x80499B4(mutex)
==21596==
==21596== ERROR SUMMARY: 33 errors from 33 contexts (suppressed: 2 from 2)
==21596== 35 possible data races found; 0 lock order problems

This time Helgrind reports 32 racy accesses to pflag[] at line 34. As explained in the DRDT tutorial, these are benign data races. Helgrind also reports a false positive: a race involving the access to total at line 80.

Helgrind does a good job of reporting the names of the variables involved in the data races (e.g. total, pflag[] and i) and of the lock variables (e.g. mutex). The "Previous state" line gives a hint as to why Helgrind thinks an access might cause a data race. For example, in the above experiment with pthr_prime_fixed.c, for the access to total at line 80, it says "Previous state: shared RW, locked by:0x80499B4(mutex)". The accesses to total at lines 52-53 are protected by the mutex. When Helgrind finds that the read of total is not protected by the same lock (or by any lock, in this case), it reports a possible data race. The read is in fact safe, presumably because it happens after the worker threads have been joined, but Helgrind's detection of thread join did not always manage to get rid of such false positives.
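As an illustration of what join detection is supposed to buy, here is a deliberately simplified sketch in C. It is my own model, not how Helgrind implements it, and all names (access_set, note_access, note_join, unlocked_access_is_safe) are hypothetical. The idea is that once a parent has joined a child, everything the child did is ordered before the parent's later accesses, so the child's accesses can be credited to the parent. If no other thread's accesses remain, an unlocked access by the parent is safe.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_THREADS 32

/* Bit t is set if thread t has accessed the variable and that access is
   not yet known to be ordered before some other thread via a join. */
typedef struct { uint32_t accessors; } access_set;

static uint32_t thread_bit(int t) {
    assert(t >= 0 && t < MAX_THREADS);
    return UINT32_C(1) << t;
}

/* Called on every access to the variable by `thread`. */
static void note_access(access_set *v, int thread) {
    v->accessors |= thread_bit(thread);
}

/* Called when a join on `child` returns in `parent`: the child's accesses
   now happen before anything the parent does next, so move them over. */
static void note_join(access_set *v, int parent, int child) {
    if (v->accessors & thread_bit(child)) {
        v->accessors &= ~thread_bit(child);
        v->accessors |= thread_bit(parent);
    }
}

/* An unlocked access by `thread` is safe in this model only if no other
   thread's accesses are still unordered with respect to it. */
static int unlocked_access_is_safe(const access_set *v, int thread) {
    return (v->accessors & ~thread_bit(thread)) == 0;
}
```

With main as thread 1 and two workers as threads 2 and 3, the final unlocked read of a variable such as total is unsafe in this model while either worker is still unjoined and becomes safe once both joins have been recorded. A real detector has to do this for every variable and cope with much more complicated thread lifetimes, which may be why the join handling sometimes fails.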
