Thursday Sep 20, 2007

Common Mistakes in Using OpenMP 5: Assuming Non-existing Synchronization Before Entering Worksharing Construct

There is no synchronization between the threads in a team when they enter a worksharing construct. Many people assume there is a barrier before the threads enter a worksharing construct, especially when there is a FIRSTPRIVATE used in the worksharing construct. This is a common mistake.

For example, in the following code, assume two threads - thread 1 and thread 2 are in the team, and Read1 is executed by thread 1 and Read2 is executed by thread 2.

  #pragma omp parallel
     if (omp_get_thread_num()==0)
        z = 1;
        z = 2;
     #pragma omp sections firstprivate(z)
       #pragma omp section
          ... = z;      // Read1
       #pragma omp section
          ... = z;      // Read2

What are the values of z at Read1 and Read2? All the following three combinations are possible,

  1. Read1:1 Read2:1
  2. Read1:1 Read2:2
  3. Read1:2 Read2:2

If there were a synchronization before the worksharing construct, then the above (Read1:1, Read2:2) is not possible.

Now, look at the following example which has both FIRSTPRIVATE and LASTPRIVATE,

  #pragma omp parallel
     z = 1;
     #pragma omp for firstprivate(z) lastprivate(z) nowait
     for (i=0; i<n; i++) {
          ... = z;      // Read1
          z = 2;        // Write1

What could be the value of z at Read1? Would it be 2? OpenMP 3.0 Draft has clarified this situation. It says

If a list item appears in both firstprivate and lastprivate clauses, the update required for lastprivate occurs after all initializations for firstprivate.

So, the value of z at Read1 cannot be 2.

Thursday Jun 29, 2006

The idea behind environment variable "SUNW_MP_MAX_POOL_THREADS"

Sun's OpenMP implementation supports true nested parallel regions - when nested parallelism is enabled, the inner parallel region can be executed by multiple threads concurrently.

We provide an environment variable called SUNW_MP_MAX_POOL_THREADS for users to control the total number of OpenMP slave threads in a process.

For example, if you have want a maximum of 16 threads to be used for a nest of parallel regions in your program, you can set SUNW_MP_MAX_POOL_THREADS to 15. That's 15 slave threads (some of them may become masters in inner parallel regions) plus one user thread which is the master thread for the out-most parallel region.

Why did we design an environment variable like SUNW_MP_MAX_NUM_THREADS so that a user can set it to 16 in the above example? Intel's implementation has KMP_ALL_THREADS and KMP_MAX_THREADS which do that.

Well, we were trying to have a scheme that works on more general cases, not just pure OpenMP codes. In particular, we think our scheme works better than others for mixed pthread and OpenMP thread code. The pool defines a set of threads that can be used as OpenMP slave threads. If the program has two pthreads and both will create a team, then both will try to grab slave threads from the same pool. The env var SUNW_MP_MAX_POOL_THREADS was NOT designed for users to control the total number of threads in a process. We cannot control that because of the use of pthreads. The env var is designed for users to control the total number of OpenMP slave threads.

The env var SUNW_MP_MAX_NUM_THREADS is documented here. We also have a short article "How Many Threads Does It Take?" if you want to understand it better.

Sunday Jun 11, 2006

Common Mistakes in Using OpenMP 4: Orphaned Worksharing Constructs

More precisely, this mistake should be classified as a common mis-understanding of OpenMP.

When a worksharing construct, such omp for or omp sections, is encountered outside any explicit parallel region, the arising worksharing region is called orphaned worksharing region. A common mis-understanding is that in this case the worksharing construct is simply being ignored and the region is executed sequentially.

Orphaned worksharing constructs are not ignored. All the data sharing attribute clauses are honored. The worksharing regin is executed as if a team of only one thread is executing the region.

For example, in the following C++ code,

         class_type_1  a;
         #pragma omp for private(a) schedule(dynamic)
         for (i=1; i<100; i++) {
             printf("%d\\n", i);

the default constructor for class_type_1 will be called, and a comforming implementation is not forced to execute the loop in the order of 1, 2, 3, ..., 99.

Wednesday Jun 07, 2006

Common Mistakes in Using OpenMP 3: Fifteen Cases from a IWOMP 2006 paper by Michael Süß and Claudia Leopold

The coming International Workshop on OpenMP (IWOMP 2006) has a paper titled "Common Mistakes in OpenMP and How to Avoid Them" written by Michael Süß and Claudia Leopold (University of Kassel, Germany).

The result is based on a survey of two undergraduate courses. The authors of the paper kindly allow me to list the 15 common mistakes presented in their paper here,

  1. (Correctness) Access to shared variables not protected
  2. (Correctness) Use of locks without flush
  3. (Correctness) Read of shared variable without flush
  4. (Correctness) Forget to mark private variables as such
  5. (Correctness) Use of ordered clause without ordered construct
  6. (Correctness) Declare loop variable in #pragma omp parallel for as shared
  7. (Correctness) Forget to put down for in #pragma omp parallel for
  8. (Correctness) Try to change num. of thr. in parallel reg. after start of reg.
  9. (Correctness) omp_unset_lock() called from non-owner thread
  10. (Correctness) Attempt to change loop variable while in #pragma omp for
  11. (Performance) Use of critical when atomic would be sufficient
  12. (Performance) Put too much work inside critical region
  13. (Performance) Use of orphaned construct outside parallel region
  14. (Performance) Use of unnecessary flush
  15. (Performance) Use of unnecessary critical

For detail, please read the full paper.

Monday Feb 20, 2006

Common Mistakes in Using OpenMP 2: Atomic

The following code finds good members in array member[] and stores the indices of the good members in array good_members[].

#define N 1000

struct data member[N];

int good_members[N];

int pos = 0;

void find_good_members()
for (i=0; i < N; i++) {
if (is_good(member[i])) {
good_members[pos] = i;
pos ++;

The following is a navie way of parallelizing the above code,

#define N 1000

struct data member[N];

int good_members[N];

int pos = 0;

void find_good_members()
#pragma omp parallel for
for (i=0; i < N; i++) {
if (is_good(member[i])) {
good_members[pos] = i; // line a
#pragma omp atomic
pos ++; // line b

In order to avoid data races between different updates of global variable pos, the code puts the increment (at line b) in a atomic construct. However, the code does not work, because there is a data race between the read of pos at line a and write of pos at line b.

Changing the body of the if statement to the following gives the correct result.

      int mypos;
#pragma omp critical
mypos = pos;
pos ++;
good_members[mypos] = i;

In OpenMP 2.5 (the latest Specification), inside a parallel region, the only place where you can safely get the value of a variable that is updated in an atomic region is another atomic region.

Friday Dec 30, 2005

Common Mistakes in Using OpenMP 1: Incorrect Directive Format

In C/C++, OpenMP directives are specified by using the #pragma mechanism; and in Fortran, they are specified by using special comments that are identified by unique sentinels.

This design allows users to write OpenMP programs that can be compiled with compilers that do not support OpenMP or compiled with OpenMP compiles with OpenMP support disabled.

However, if you do not follow the directive format, you might get a program that compiles and runs but gives unexpected results, because the compiler does not recognize your OpenMP directives and thinks they are non-OpenMP related pragmas (C/C++) or regular comments (Fortran).


How many "me"s does the following code print? Assume a team of 4 threads are executing the parallel region.

    #pragma omp parallel
        #pragma single

Common Mistakes in Using OpenMP

I will post a list of common mistakes found in parallel programs written using OpenMP.

Although it is always true that users of a language need to spend effort to understand the language so to avoid mistakes, I wonder what it means to the language designers if many many users keep making the same set of mistakes again and again.




« March 2015