Friday Nov 21, 2008

Extending the OpenMP profiling API for OpenMP 3.0

Last Tuesday at the OpenMP BOF of SC08, Oleg Mazurov presented our work on extending the OpenMP profiling API for OpenMP 3.0 (pdf slides).

The existing API was first published in 2006 and last updated in 2007. Since then, two developments have come to beg for another update: one is support for the new OpenMP tasking feature, and the other is support for vendor-specific extensions.

The extension for tasking support is straightforward. A few events corresponding to the creation, execution, and termination of tasks are added, along with a few requests to get the task ID and other task properties.
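
As a rough sketch of how a collector might consume the new task events and requests (all names below are illustrative placeholders I made up, not the identifiers in the actual proposal):

    /* Illustrative sketch only: the event and request names below are
     * placeholders, not the identifiers in the actual proposal. */
    typedef enum {
        TASK_EVENT_CREATE,   /* encountering thread creates the task */
        TASK_EVENT_BEGIN,    /* an executing thread picks it up      */
        TASK_EVENT_END       /* the task terminates                  */
    } task_event_t;

    /* Hypothetical request helpers; the real API encodes such requests
     * as messages sent to a single entry point in the OpenMP runtime. */
    extern unsigned long collector_task_id(void);
    extern unsigned long collector_parent_task_id(void);

    static void task_event_callback(task_event_t ev)
    {
        unsigned long id     = collector_task_id();
        unsigned long parent = collector_parent_task_id();
        /* ... record (ev, id, parent) with a timestamp into the experiment ... */
        (void)ev; (void)id; (void)parent;
    }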

Vendor-specific extensions are implemented essentially by sending an establish-extension request, carrying a vendor-unique ID, from the collector tool to the OpenMP runtime library. The runtime library accepts the request if it supports that vendor and rejects it otherwise. After a successful rendezvous, the request establishes a new name space for subsequent requests and events.
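
Conceptually, the rendezvous could look like the following sketch. The record layout and request code here are invented for illustration; only the entry point name __omp_collector_api comes from the published API, and even that should be checked against the spec:

    #include <string.h>

    /* Sketch of an establish-extension request. On success, the runtime
     * hands back a name-space base under which subsequent vendor requests
     * and events are interpreted. All fields are illustrative. */
    typedef struct {
        int         size;        /* size of this request record              */
        int         request;     /* hypothetical "start vendor ext" code     */
        const char *vendor_id;   /* vendor-unique identifier                 */
        int         name_space;  /* OUT: base for vendor request/event codes */
    } vendor_ext_req_t;

    extern int __omp_collector_api(void *msg);   /* runtime entry point */

    static int establish_vendor_extension(void)
    {
        vendor_ext_req_t req;
        memset(&req, 0, sizeof(req));
        req.size      = sizeof(req);
        req.request   = 100;                  /* placeholder request code */
        req.vendor_id = "com.example.tools";  /* hypothetical vendor ID   */
        if (__omp_collector_api(&req) != 0)
            return -1;         /* runtime rejected: vendor not supported */
        return req.name_space;
    }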

One pending issue is how to support multiple vendor agents in one session. It is not that a solution cannot be engineered; we are waiting for a use case to emerge.

During the execution of an OpenMP program, any arbitrary program event can be associated with

  • an OpenMP state,
  • a user callstack,
  • a node in the thread tree with parallel region ID's and OpenMP thread ID's along the path, and
  • a node in the task tree with task ID's along the path.

Because the execution of an OpenMP task may be asynchronous, and the executing thread may differ from the encountering thread, getting the user callstack of an event that happens within a task becomes tricky.
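
One plausible way for a tool to construct such a user callstack (a sketch of the idea, not necessarily what the Analyzer does internally) is to capture the callstack at task creation and splice it, recursively up the task tree, in front of the frames of the task body:

    /* Sketch: the logical (user) callstack of an event inside a task is
     * the creation callstacks of the task's ancestors, oldest first,
     * followed by the native frames of the task body. All types and
     * fields here are hypothetical. */
    typedef struct task_info {
        struct task_info *parent;         /* task that created this one   */
        void            **create_stack;   /* callstack captured at create */
        int               create_depth;
    } task_info_t;

    /* Fill 'out' with frames for task t and its ancestors; the caller
     * then appends the executing thread's native frames. */
    static int user_callstack(const task_info_t *t, void **out,
                              int max, int pos)
    {
        if (t == NULL || pos >= max)
            return pos;
        pos = user_callstack(t->parent, out, max, pos);
        for (int i = 0; i < t->create_depth && pos < max; i++)
            out[pos++] = t->create_stack[i];
        return pos;
    }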

At our Sun booth in SC08, we demoed a prototype Performance Analyzer that can present user callstacks in a cool way when OpenMP tasks are involved.

Take a simple quick sort code for an example.

        void quick_sort ( int lt,  int rt,  float *data )  {
            if ( lt >= rt )    /* base case: sub-array of size 0 or 1 */
                return;
            int md = partition( lt,  rt,  data );   /* partition() defined elsewhere */
            #pragma omp task
            quick_sort( lt,  md - 1,  data );
            #pragma omp task
            quick_sort( md + 1,  rt,  data );
        }
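
For the tasks to actually run in parallel, the initial call would typically be made by one thread inside a parallel region, something like the following (a sketch; the data setup and the partition() routine are assumed):

    #pragma omp parallel
    {
        #pragma omp single
        quick_sort( 0, n - 1, data );   /* one thread seeds the task pool */
    }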

The following figure shows the time line display of one execution of the program. The same original data are sorted three times: once sequentially, once using two threads, and once using four threads.

The spikes in the callstacks in the sequential sort show the recursive nature of the quick sort. But when you look at the parallel sort, the callstacks are flat. That is because each call to quick_sort() is now a task, and tasking execution essentially turns the recursive execution into a work-list execution. The low-level callstack in the above figure is close to what actually happens in one thread.
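
The flat callstacks follow from how a tasking runtime typically executes deferred tasks: worker threads repeatedly pull tasks from a pool rather than calling quick_sort() recursively, so every task body starts at roughly the same stack depth. A purely illustrative sketch of such a scheduling loop (not the actual runtime):

    typedef struct task task_t;              /* opaque deferred task */

    extern task_t *task_pool_get(void);      /* pop or steal a task; hypothetical */
    extern void    task_execute(task_t *t);  /* run body; may enqueue new tasks   */
    extern int     all_tasks_done(void);

    static void worker_loop(void)
    {
        while (!all_tasks_done()) {
            task_t *t = task_pool_get();
            if (t != NULL)
                task_execute(t);   /* each body runs at the same shallow depth */
        }
    }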

While these pieces of information are useful in showing the execution details, they do not help answer the question of which tasks are actually executing. Where was the currently executing task created? In the end, the user needs to debug the performance problem in his or her code (not in the OpenMP runtime). Representing information close to the user's program logic is crucial.

The following figure shows the time line and user callstacks in the user view constructed by our prototype tool. Notice that the callstacks in the parallel run are almost the same as in the sequential run. In the time line, it looks as though the work in the sequential run is simply distributed among the threads in the parallel run. Isn't this what intuitively happens when you parallelize a code using OpenMP? :)

Saturday Nov 17, 2007

Non-concurrency Analysis Used in PTP

When visiting the IBM booth at SC07, I was a little surprised to learn that the non-concurrency analysis technology for OpenMP programs had also been adopted and implemented in the Parallel Tools Platform.

Beth Tibbitts from IBM has kindly sent me the reference details: STMCS'07 program, paper, and presentation.

The technology is used by the Sun Studio compilers to do static error checking of OpenMP programs.

Friday Nov 09, 2007

Maximum Automation for Mundane Tasks

Adam Kolawa (Parasoft) said in his recent article on DDJ,

"Many people ... want tools to find these bugs automatically. After 20 years of examining how and why errors occur, I believe this is the wrong response. Only a small class of errors can be found automatically; most bugs are related to functionality and requirements, and cannot be identified with just the click of a button."

and

"Our current mission is to address this problem by inventing technologies and strategies to support the brain as it performs this evaluation. We are building automated infrastructures that provide maximum automation for mundane tasks (compiling code, building/running regression test suites, checking adherence to policies, supporting code reviews, and so on) in such a way that each day the brain is presented with the minimal information needed to determine if yesterday's code modifications negatively impacted the application."

There is probably no magic button one can push to turn a piece of legacy code that is not thread-safe into thread-safe code. A tool should offload the mundane tasks from the human brain, which can then be set free to apply the magic touch.

Friday Oct 13, 2006

A Presentation of DRDT at SVOSUG 9/28/2006

Here is a short presentation of DRDT given at the Silicon Valley OpenSolaris Users Group (SVOSUG) meeting on Sept 28, 2006.

Friday Aug 04, 2006

Understanding Data Races 3: Several currently available tools (2/3)

Runtime Checking (simulation based) Tool

Tool 3: Helgrind from Valgrind

In this blog entry, I will describe my experiments with the test cases under Helgrind. Helgrind is the data race detection module of Valgrind, a fairly successful framework and tool suite for debugging and profiling Linux programs.

Unlike the other runtime checking tools I will describe later (e.g., Intel's Thread Checker and Sun's DRDT), Valgrind is simulation based. One advantage of the simulation-based approach is that the two active entities, the target application and the detection module, are in different processes. They have different address spaces and name spaces, so this approach can avoid many conflicts between the two entities. For example, the detection module can call any library routine that it monitors without worrying about re-entry problems. [Update: Valgrind actually runs in the same name space as the target application, and the target application and the detection module are part of the same process. The detection module (core and tools) is designed carefully to avoid depending on glibc.so.]

One challenge of the simulation-based approach is dealing with system calls: it simulates only the execution of the user process, NOT the OS. An even bigger challenge is dealing with threading calls. Valgrind is not multi-threaded itself, and all threaded executions are serialized. I have not had a chance to study how this works; it must be very interesting. Valgrind's manual claims it works with NPTL or LinuxThreads "well enough for significant threaded applications".

Helgrind is based on the well-known Eraser method, enhanced with detection of thread creation and thread join. The method is very similar to that used in Compaq/HP's Visual Threads (as described in Harrow's paper). Lockset-based methods such as Eraser tend to produce a lot of false positives.
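
For reference, the heart of the Eraser method is lockset refinement: each shared variable keeps a candidate set of locks, intersected with the locks held at each access; once the set becomes empty, no single lock protects the variable consistently. A minimal sketch, using a bitmask for the lockset (this is not Helgrind's actual implementation):

    #include <stdio.h>

    #define ALL_LOCKS 0xFFFFFFFFu   /* initial candidate set C(v) */

    typedef struct {
        unsigned candidate_locks;   /* locks that have protected v so far */
    } shadow_t;

    /* Called on each access to a shared variable v by a thread that
     * currently holds the locks in held_locks. */
    static void on_access(shadow_t *v, unsigned held_locks, const char *where)
    {
        v->candidate_locks &= held_locks;   /* refine by intersection */
        if (v->candidate_locks == 0)
            printf("Possible data race at %s\n", where);
    }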

Currently Valgrind is at release 3.2.0, but the latest version in which Helgrind works is 2.2.0. When I ran Helgrind in 3.2.0, I got

Helgrind is currently not working, because:
 (a) it is not yet ready to handle the Vex IR and the use with 64-bit
     platforms introduced in Valgrind 3.0.0
 (b) we need to get thread operation tracking working again after
     the changes added in Valgrind 2.4.0
 If you want to use Helgrind, you'll have to use Valgrind 2.2.0, which is
 the most recent Valgrind release that contains a working Helgrind.

Sorry for the inconvenience.  Let us know if this is a problem for you.

Then I switched to 2.2.0. First, I tried pthr_prime.c.

$ cc -g pthr_prime.c -lm -lpthread -o pthr_prime

$ valgrind --tool=helgrind ./pthr_prime

==32368== Helgrind, a data race detector for x86-linux.
==32368== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote et al.
==32368== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==32368== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==32368== For more details, rerun with: -v
==32368== 
==32368== Thread 2:
==32368== Possible data race writing variable at 0x80498B0 (total)
==32368==    at 0x80486BD: work (pthr_prime.c:51)
==32368==    by 0x1D4AFE79: thread_wrapper (vg_libpthread.c:867)
==32368==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==32368==  Address 0x80498B0 is in data section of /home/yl140942/tmp/vg/a.out
==32368==  Previous state: shared RO, no locks
==32368== 
==32368== Possible data race writing variable at 0x57EFE95C 
==32368==    at 0x804877D: main (pthr_prime.c:75)
==32368==  Address 0x57EFE95C == &(i) at pthr_prime.c:75
==32368==  Previous state: shared RO, no locks
==32368== 
==32368== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 2 from 2)
==32368== 4 possible data races found; 0 lock order problems

Helgrind finds the racing access of total at line 51 and the racing access of i at line 75. Note that a data race is caused by a pair of accesses, and Helgrind reports only one access of the pair. For the first report, this is okay: line 51 reads and updates total, so it is fairly easy to guess what the racing access pairs are. For the second one (i at line 75), I imagine it would take a fairly large amount of time to figure out that the other access of the pair is at line 46. Helgrind also misses several data races (e.g., the write-write race at line 50, and the write-read race between lines 50 and 76) due to the heuristic it uses.

Next, I tried pthr_prime_fixed.c.

$ cc -g pthr_prime_fixed.c -lm -lpthread -o pthr_prime_fixed
$ valgrind --tool=helgrind ./pthr_prime_fixed

==21596== Helgrind, a data race detector for x86-linux.
==21596== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote et al.
==21596== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==21596== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==21596== For more details, rerun with: -v
==21596==
==21596== Thread 2:
==21596== Possible data race writing variable at 0x804CA10 (pflag+16)
==21596==    at 0x80486DC: is_prime (pthr_prime_fixed.c:34)
==21596==    by 0x8048756: work (pthr_prime_fixed.c:50)
==21596==    by 0x1D4EAE79: thread_wrapper (vg_libpthread.c:867)
==21596==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==21596==  Address 0x804CA10 is in BSS section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: exclusively owned by thread 1
==21596==
==21596== Thread 2:
==21596== Possible data race writing variable at 0x804CA18 (pflag+24)
==21596==    at 0x80486DC: is_prime (pthr_prime_fixed.c:34)
==21596==    by 0x8048756: work (pthr_prime_fixed.c:50)
==21596==    by 0x1D4EAE79: thread_wrapper (vg_libpthread.c:867)
==21596==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==21596==  Address 0x804CA18 is in BSS section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: exclusively owned by thread 1

<similar messages repeated for various pflag+offset>

==21596== Thread 2:
==21596== Possible data race writing variable at 0x804CAC0 (pflag+192)
==21596==    at 0x80486DC: is_prime (pthr_prime_fixed.c:34)
==21596==    by 0x8048756: work (pthr_prime_fixed.c:50)
==21596==    by 0x1D4EAE79: thread_wrapper (vg_libpthread.c:867)
==21596==    by 0xB0010EF3: (within /home/yl140942/vg2/lib/valgrind/stage2)
==21596==  Address 0x804CAC0 is in BSS section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: exclusively owned by thread 1
==21596==
==21596==
==21596== Possible data race reading variable at 0x80499B0 (total)
==21596==    at 0x8048873: main (pthr_prime_fixed.c:80)
==21596==  Address 0x80499B0 is in data section of /home/yl140942/tmp/vg/pthr_prime_fixed
==21596==  Previous state: shared RW, locked by:0x80499B4(mutex)
==21596==
==21596== ERROR SUMMARY: 33 errors from 33 contexts (suppressed: 2 from 2)
==21596== 35 possible data races found; 0 lock order problems

This time Helgrind reports 32 racing accesses of pflag[] at line 34. As explained in the DRDT tutorial, these are benign data races: the racing writes all store the same value, 0, and a stale read at worst causes a redundant check, so the final result is unaffected. Helgrind also reports a false positive involving an access of total at line 80.

Helgrind does a good job of reporting the names of the variables involved in the data races (e.g., total, pflag[], and i) and the lock variables (e.g., mutex). The "Previous state" line gives a hint as to why Helgrind thinks an access might cause a data race. For example, in the above experiment with pthr_prime_fixed.c, for the access of total at line 80, it says "Previous state: shared RW, locked by:0x80499B4(mutex)". The accesses of total at lines 52-53 are protected by the mutex lock; when Helgrind finds a read access of total that is not protected by the same lock (or any lock, in this case), it reports a possible data race. The detection of thread join sometimes did not work well enough to get rid of this false positive, though.
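
The states quoted in the reports come from a per-variable state machine layered on top of the lockset refinement; roughly (my reading of the Eraser paper, not Helgrind's exact code):

    /* Lockset refinement and reporting only start once a variable is
     * truly shared, which is also why some races that occur around the
     * early sharing transitions can be missed. */
    typedef enum {
        VIRGIN,       /* never accessed                                   */
        EXCLUSIVE,    /* "exclusively owned by thread N": one thread only */
        SHARED_RO,    /* "shared RO": read by multiple threads, no writes */
        SHARED_RW     /* "shared RW": written after becoming shared;      */
                      /* lockset checking is fully active here            */
    } var_state_t;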

Thursday Jul 06, 2006

Understanding Data Races 2: Several currently available tools (1/3)

Introduction

This blog entry begins a description of several currently available tools that detect data races in multi-threaded C/C++/Fortran programs. These tools, and the categories they can roughly be put into, are

  1. Static Checking
    • LockLint from Sun
    • vpara compile time check for OpenMP programs from Sun
  2. Runtime Checking - simulation based
    • Helgrind from Valgrind
  3. Runtime Checking - execution based
    • Visual Threads from HP
    • Thread Checker from Intel
    • Data Race Detection Tool from Sun

Not covered here are tools from research work. Some of them use combined static and runtime methods, and some use post-mortem approaches.

Code Examples

I will reuse the following four code examples from the Tutorial of Using Sun Data Race Detection Tool. If you have downloaded and installed the Sun Studio Express June 2006, you should be able to find the example codes under

{installed-directory}/opt/SUNWspro/examples/rdt/prime.

All four codes find the prime numbers between 2 and 3000 using 4 threads. An OpenMP version and a Pthreads version are provided:

  1. omp_prime.c: OpenMP version, contains data races
  2. omp_prime_fixed.c: OpenMP version, bugs fixed
  3. pthr_prime.c: Pthread version, contains data races and bugs
  4. pthr_prime_fixed.c: Pthread version, bugs fixed

Read the Tutorial to find out what the data races are and how the bugs are fixed.

omp_prime.c
    ...
    12	#include <stdio.h>
    13	#include <math.h>
    14	#include <omp.h>
    15	
    16	#define THREADS 4
    17	#define N 3000
    18	
    19	int primes[N];
    20	int pflag[N];
    21	
    22	int is_prime(int v)
    23	{
    24	    int i;
    25	    int bound = floor(sqrt(v)) + 1;
    26	    
    27	    for (i = 2; i < bound; i++) {
    28	        /* no need to check against known composites */
    29	        if (!pflag[i]) 
    30	            continue;
    31	        if (v % i == 0) { 
    32	            pflag[v] = 0;
    33	            return 0;
    34	        }
    35	    }
    36	    return (v > 1); 
    37	}
    38	
    39	int main(int argn, char **argv)
    40	{
    41	    int i;
    42	    int total = 0;
    43	
    44	#ifdef _OPENMP
    45	    omp_set_num_threads(THREADS);
    46	    omp_set_dynamic(0);
    47	#endif
    48	
    49	    for (i = 0; i < N; i++) {
    50	        pflag[i] = 1; 
    51	    }
    52	    
    53	    #pragma omp parallel for
    54	    for (i = 2; i < N; i++) {
    55	        if ( is_prime(i) ) {
    56	            primes[total] = i;
    57	            total++;
    58	        }
    59	    }
    60	    printf("Number of prime numbers between 2 and %d: %d\n",
    61	           N, total);
    62	    for (i = 0; i < total; i++) {
    63	        printf("%d\n", primes[i]);
    64	    }
    65	}
pthr_prime.c
    ...
    12	#include <stdio.h>
    13	#include <math.h>
    14	#include <pthread.h>
    15	
    16	#define THREADS 4
    17	#define N 3000
    18	
    19	int primes[N];
    20	int pflag[N];
    21	int total = 0;
    22	
    23	int is_prime(int v)
    24	{
    25	    int i;
    26	    int bound = floor(sqrt(v)) + 1;
    27	
    28	    for (i = 2; i < bound; i++) {
    29	        /* no need to check against known composites */
    30	        if (!pflag[i])
    31	            continue;
    32	        if (v % i == 0) {
    33	            pflag[v] = 0;
    34	            return 0;
    35	        }
    36	    }
    37	    return (v > 1); 
    38	}
    39	
    40	void *work(void *arg)
    41	{
    42	    int start;
    43	    int end;
    44	    int i;
    45	
    46	    start = (N/THREADS) * (*(int *)arg) ;
    47	    end = start + N/THREADS;
    48	    for (i = start; i < end; i++) {
    49	        if ( is_prime(i) ) {
    50	            primes[total] = i;
    51	            total++;        
    52	        }
    53	    }
    54	    return NULL;
    55	}
    56	
    57	int main(int argn, char **argv)
    58	{
    59	    int i;
    60	    pthread_t tids[THREADS-1];
    61	
    62	    for (i = 0; i < N; i++) {
    63	        pflag[i] = 1; 
    64	    }
    65	
    66	    for (i = 0; i < THREADS-1; i++) {
    67	        pthread_create(&tids[i], NULL, work, (void *)&i);
    68	    }
    69	
    70	    i = THREADS-1;
    71	    work((void *)&i);
    72	    
    73	    printf("Number of prime numbers between 2 and %d: %d\n",
    74	           N, total);
    75	    for (i = 0; i < total; i++) {
    76	        printf("%d\n", primes[i]);
    77	    }
    78	}
omp_prime_fixed.c
    ...
    12	#include <stdio.h>
    13	#include <math.h>
    14	#include <omp.h>
    15	
    16	#define THREADS 4
    17	#define N 3000
    18	
    19	int primes[N];
    20	int pflag[N];
    21	
    22	int is_prime(int v)
    23	{
    24	    int i;
    25	    int bound = floor(sqrt(v)) + 1;
    26	
    27	    for (i = 2; i < bound; i++) {
    28	        /* no need to check against known composites */
    29	        if (!pflag[i])
    30	            continue;
    31	        if (v % i == 0) {
    32	            pflag[v] = 0;
    33	            return 0;
    34	        }
    35	    }
    36	    return (v > 1);
    37	}
    38	
    39	int main(int argn, char **argv)
    40	{
    41	    int i;
    42	    int total = 0;
    43	
    44	#ifdef _OPENMP
    45	    omp_set_num_threads(THREADS);
    46	    omp_set_dynamic(0);
    47	#endif
    48	
    49	    for (i = 0; i < N; i++) {
    50	        pflag[i] = 1;
    51	    }
    52	
    53	    #pragma omp parallel for
    54	    for (i = 2; i < N; i++) {
    55	        if ( is_prime(i) ) {
    56	            #pragma omp critical
    57	            {
    58	                primes[total] = i;
    59	                total++;
    60	            }
    61	        }
    62	    }
    63	    printf("Number of prime numbers between 2 and %d: %d\n",
    64	           N, total);
    65	    for (i = 0; i < total; i++) {
    66	        printf("%d\n", primes[i]);
    67	    }
    68	}
pthr_prime_fixed.c
    ...
    12	#include <stdio.h>
    13	#include <math.h>
    14	#include <pthread.h>
    15	
    16	#define THREADS 4
    17	#define N 3000
    18	
    19	int primes[N];
    20	int pflag[N];
    21	int total = 0;
    22	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    23	
    24	int is_prime(int v)
    25	{
    26	    int i;
    27	    int bound = floor(sqrt(v)) + 1;
    28	
    29	    for (i = 2; i < bound; i++) {
    30	        /* no need to check against known composites */
    31	        if (!pflag[i])
    32	            continue;
    33	        if (v % i == 0) {
    34	            pflag[v] = 0;
    35	            return 0;
    36	        }
    37	    }
    38	    return (v > 1); 
    39	}
    40	
    41	void *work(void *arg)
    42	{
    43	    int start;
    44	    int end;
    45	    int i;
    46	    
    47	    start = (N/THREADS) * ((int)arg) ;
    48	    end = start + N/THREADS;
    49	    for (i = start; i < end; i++) {
    50	        if ( is_prime(i) ) {
    51	            pthread_mutex_lock(&mutex);
    52	            primes[total] = i;
    53	            total++;        
    54	            pthread_mutex_unlock(&mutex);
    55	        }
    56	    }
    57	    return NULL;
    58	}
    59	
    60	int main(int argn, char **argv)
    61	{
    62	    int i;
    63	    pthread_t tids[THREADS-1];
    64	
    65	    for (i = 0; i < N; i++) {
    66	        pflag[i] = 1; 
    67	    }
    68	
    69	    for (i = 0; i < THREADS-1; i++) {
    70	        pthread_create(&tids[i], NULL, work, (void *)i);
    71	    }
    72	
    73	    i = THREADS-1;
    74	    work((void *)i);
    75	    
    76	    for (i = 0; i < THREADS-1; i++) {
    77	        pthread_join(tids[i], NULL);
    78	    }
    79	
    80	    printf("Number of prime numbers between 2 and %d: %d\n",
    81	           N, total);
    82	    for (i = 0; i < total; i++) {
    83	        printf("%d\n", primes[i]);
    84	    }
    85	}

Static Checking Tools

Static checking tools find data races in a program without actually executing the program.

The static checking approach has three advantages compared with runtime-based approaches.

  1. It can be very fast and consume little memory.
  2. The analysis does not affect the behavior of the program, because it is performed offline.
  3. It can detect potential data races that do not happen in a particular run with a particular input data set.

Because of these advantages, static checking can be used in situations where it is very difficult or impossible to get a runtime experiment, or to get a precise runtime experiment without altering the runtime result, such as with OS kernels and device drivers.

The biggest disadvantage of static checking is the large number of false positives it may generate. Static checking is always hampered by imprecise information due to pointer aliasing and uncertain execution paths.
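
A tiny illustration of the aliasing problem: whether the increment below races with unsynchronized accesses to counter depends on what p points to at run time, which static analysis often cannot determine.

    int counter = 0;

    /* A static checker cannot in general tell whether p aliases counter.
     * Assuming it does produces false positives; assuming it does not
     * produces false negatives. */
    void bump(int *p)
    {
        (*p)++;
    }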

Tool 1: LockLint from Sun

Sun Studio provides a utility called LockLint, which analyzes the use of mutex and reader/writer locks, and reports data races and deadlocks due to inconsistent use of locking techniques.

LockLint reports a data race when accesses to a variable are not consistently protected by at least one lock, or accesses violate assertions about which locks protect the variable.

LockLint originates from WARLOCK, which was designed to detect data races and deadlocks in Solaris kernels and device drivers. Search for warlock on opensolaris.org, and you can still find it in use there.

The following shows the result of using LockLint on pthr_prime.c. Notice the false positive at line 63 (the initialization of pflag[] there happens before any threads are created) and the false negative with respect to variable i.

$ cc -mt -Zll pthr_prime.c
$ lock_lint start
$ lock_lint load pthr_prime.ll
$ lock_lint analyze -v

* Warning: A main function was loaded with no annotations to indicate the
      presence or absence of concurrency. Lock_lint will assume concurrency.
  Please annotate source with:
      NOTE(COMPETING_THREADS_NOW) or NOTE(NO_COMPETING_THREADS_NOW)

* Writable variable read while no locks held!
  variable = :pflag
     where = :is_prime [pthr_prime.c,30]

* Variable written while no locks held!
  variable = :pflag
     where = :is_prime [pthr_prime.c,33]

* Variable written while no locks held!
  variable = :pflag
     where = :main [pthr_prime.c,63]

* Writable variable read while no locks held!
  variable = :total
     where = :main [pthr_prime.c,74]

* Writable variable read while no locks held!
  variable = :total
     where = :main [pthr_prime.c,75]

* Writable variable read while no locks held!
  variable = :primes
     where = :main [pthr_prime.c,76]

* Writable variable read while no locks held!
  variable = :total
     where = :main [pthr_prime.c,77]

* Writable variable read while no locks held!
  variable = :total
     where = :work [pthr_prime.c,50]

* Variable written while no locks held!
  variable = :primes
     where = :work [pthr_prime.c,50]

* Variable written while no locks held!
  variable = :total
     where = :work [pthr_prime.c,51]

The following shows the result of using LockLint on pthr_prime_fixed.c. Notice that the data races in routine work() are now gone, but the false positives and the false negatives in the previous experiment with pthr_prime.c are still there.

$ cc -mt -Zll pthr_prime_fixed.c
$ lock_lint start
$ lock_lint load pthr_prime_fixed.ll
$ lock_lint analyze -v

* Warning: A main function was loaded with no annotations to indicate the
      presence or absence of concurrency. Lock_lint will assume concurrency.
  Please annotate source with:
      NOTE(COMPETING_THREADS_NOW) or NOTE(NO_COMPETING_THREADS_NOW)

* Writable variable read while no locks held!
  variable = :pflag
     where = :is_prime [pthr_prime_fixed.c,31]

* Variable written while no locks held!
  variable = :pflag
     where = :is_prime [pthr_prime_fixed.c,34]

* Variable written while no locks held!
  variable = :pflag
     where = :main [pthr_prime_fixed.c,66]

* Writable variable read while no locks held!
  variable = :total
     where = :main [pthr_prime_fixed.c,81]

* Writable variable read while no locks held!
  variable = :total
     where = :main [pthr_prime_fixed.c,82]

* Writable variable read while no locks held!
  variable = :primes
     where = :main [pthr_prime_fixed.c,83]

* Writable variable read while no locks held!
  variable = :total
     where = :main [pthr_prime_fixed.c,84]

LockLint provides a rich set of source code annotations and interactive subcommands that can be used to give LockLint more precise information and thereby improve the analysis.
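
For example, asserting which lock protects which data lets LockLint check the assertion rather than guess. A sketch using the NOTE annotations (treat the exact annotation names as indicative; see the LockLint documentation):

    #include <note.h>
    #include <pthread.h>

    extern int total;
    extern int primes[];
    extern pthread_mutex_t mutex;

    /* Assert that mutex protects total and primes; accesses made without
     * holding mutex are then flagged against this assertion. */
    NOTE(MUTEX_PROTECTS_DATA(mutex, total primes))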

Tool 2: vpara option in Sun Studio Fortran/C compilers

Strictly speaking, this is not a tool. It is a compile-time check option provided in the Sun Studio Fortran and C compilers. The following is from the man page of the cc command.

     -xvpara
          Show parallelization warning messages

          Issues warnings about potential parallel programming
          related problems that may cause incorrect results when
          using OpenMP or Sun/Cray parallel directives and prag-
          mas.

          Use with -xopenmp and OpenMP API directives, or with
          -explicitpar and MP parallelization directives.

          Warnings are issued when the compiler detects the fol-
          lowing situations:

          o Loops that are parallelized using MP directives when
          there are data dependencies between different loop
          iterations

          o Problematic use of OpenMP data sharing attributes
          clauses, such as declaring a variable "shared" whose
          accesses in an OpenMP parallel region may cause data
          race, or declaring a variable "private" whose value in
          a parallel region is used after the parallel region.

In short, when -xvpara is used as an option to compile an OpenMP program, the compiler is able to report problems in the source code caused by incorrect use of data sharing attribute clauses. One typical problem is a data race introduced by incorrectly declaring a variable "shared".

When using vpara checking on omp_prime.c, the compiler finds the data race between the write accesses to variable total at line 57 by different threads, as illustrated below. The checking analyzes only the code lexically enclosed inside an OpenMP parallel region, so it does not find the data races in routine is_prime(). The checking also misses the data race on array primes[] because of a technique used to reduce false positives; unfortunately, that technique introduces a false negative here.

$ cc -xopenmp -xO3 -xvpara omp_prime.c -lm
"omp_prime.c", line 53: Warning: inappropriate scoping
        variable 'total' may be scoped inappropriately as 'shared'
        . write at line 57 and write at line 57 may cause data race

$ cc -xopenmp -xO3 -xvpara omp_prime_fixed.c -lm
$

The vpara compile-time checking is based on the static non-concurrency analysis technique for OpenMP programs, which is also used by the OpenMP autoscoping feature provided in the Sun Studio compilers.
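
Autoscoping, for reference, asks the compiler to determine the data-sharing attributes itself; a small example using the Sun-specific default(__auto) clause (check the exact spelling against the Sun Studio documentation):

    double sum(int n, double *a)
    {
        int i;
        double s = 0.0;
        /* With autoscoping the compiler can scope i as private, a and n
         * as shared, and recognize s as a reduction variable. */
        #pragma omp parallel for default(__auto)
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }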

Friday Jun 30, 2006

Understanding Data Races 1: the Role of Data Race Detection Tools

This is the first in a series of blogs on understanding data races that I am going to post.


With the release of Sun Studio Express (June 2006 Build), we are offering a run-time data race detection tool (DRDT) for developers on Sun's platforms for FREE. It complements the other data race detection tools Sun already offers.

If you have been bugged by data race problems in the past, you should give it a try. Go here (scroll to 'How to get started') to download it. And here is the page dedicated to the DRDT project.


I would like to start the series with understanding the role data race detection tools first.

Many multi-threaded (MT) programs have race conditions, the existence of which makes debugging MT programs very hard. One class of race conditions is the data race. (The difference between a general race condition and a data race will be explained in another blog.)

A data race is a condition that happens during the execution of a program. People often think a data race is always a bug. This is not true. A data race could be the root cause of a bug; it could be caused by a bug; or it could be there because the programmer wants it there.

If a data race is the root cause of a bug, we want to find it. If a data race is caused by a bug, showing where the data race is can help the programmer locate the real bug. If a data race is there by design, we want to make sure it is there and we also want to make sure there is no unexpected data race.

The role of a data race detection tool is to check whether a program contains data races and, if there are any, pinpoint their locations.

There are many ways of using a data race detection tool. Some use it as a debugging tool: run it when there is a bug in the program. Some use it as a sanity checking tool: run it as part of regression tests. And some use it as a programming assistance tool when parallelizing sequential programs: find thread-unsafe routines and global variables that should be private to threads.

Sunday May 28, 2006

Static Code Analysis Tools

News.com recently ran an article on companies making commercial static code analysis tools for checking security flaws.

Companies and products to watch: 

Most of them use context-sensitive, interprocedural, cross-module, and mixed-language analysis. A major difference between the analysis used in static error detection and the one used in compiler optimization is that the former can be incomplete and unsound.


Here is a link to a site that lists a collection of static analysis tools for C code.

