Thursday Sep 20, 2007

Common Mistakes in Using OpenMP 5: Assuming Non-existing Synchronization Before Entering Worksharing Construct

There is no synchronization among the threads in a team when they enter a worksharing construct. Many people assume there is a barrier before the threads enter a worksharing construct, especially when a FIRSTPRIVATE clause is used on the construct. This is a common mistake.

For example, in the following code, assume two threads, thread 1 and thread 2, are in the team, that Read1 is executed by thread 1, and that Read2 is executed by thread 2.

  #pragma omp parallel
  {
     if (omp_get_thread_num()==0)
        z = 1;
     z = 2;
     #pragma omp sections firstprivate(z)
     {
       #pragma omp section
          ... = z;      // Read1
       #pragma omp section
          ... = z;      // Read2
     }
  }

What are the values of z at Read1 and Read2? All the following three combinations are possible,

  1. Read1:1 Read2:1
  2. Read1:1 Read2:2
  3. Read1:2 Read2:2

If there were a synchronization point before the worksharing construct, then a combination such as (Read1:1, Read2:2) above would not be possible.

Now, look at the following example which has both FIRSTPRIVATE and LASTPRIVATE,

  #pragma omp parallel
  {
     z = 1;
     #pragma omp for firstprivate(z) lastprivate(z) nowait
     for (i=0; i<n; i++) {
          ... = z;      // Read1
          z = 2;        // Write1
     }
  }

What could be the value of z at Read1? Could it be 2? The OpenMP 3.0 draft has clarified this situation. It says,

If a list item appears in both firstprivate and lastprivate clauses, the update required for lastprivate occurs after all initializations for firstprivate.

So the firstprivate copy of z can never be initialized to 2, and the first execution of Read1 on each thread cannot read 2. (Later iterations on the same thread will, of course, read the 2 stored by that thread's own Write1.)

Sunday Jun 11, 2006

Common Mistakes in Using OpenMP 4: Orphaned Worksharing Constructs

More precisely, this mistake should be classified as a common misunderstanding of OpenMP.

When a worksharing construct, such as omp for or omp sections, is encountered outside any explicit parallel region, the resulting worksharing region is called an orphaned worksharing region. A common misunderstanding is that in this case the worksharing construct is simply ignored and the region is executed sequentially.

Orphaned worksharing constructs are not ignored. All of the data-sharing attribute clauses are honored. The worksharing region is executed as if a team of only one thread were executing the region.

For example, in the following C++ code,

         class_type_1  a;
         #pragma omp for private(a) schedule(dynamic)
         for (i=1; i<100; i++) {
             printf("%d\n", i);
         }

the default constructor for class_type_1 will be called, and a conforming implementation is not forced to execute the loop iterations in the order 1, 2, 3, ..., 99.

Concurrency vs Parallelism, Concurrent Programming vs Parallel Programming


At the risk of hairsplitting, ...

Concurrency and parallelism are NOT the same thing. Two tasks T1 and T2 are concurrent if the order in which the two tasks are executed in time is not predetermined,

  • T1 may be executed and finished before T2,
  • T2 may be executed and finished before T1,
  • T1 and T2 may be executed simultaneously at the same instant in time (parallelism),
  • T1 and T2 may be executed alternately (interleaved in time),
  • ...

If two concurrent threads are scheduled by the OS to run on one single-core non-SMT non-CMP processor, you may get concurrency but not parallelism. Parallelism is possible on multi-core, multi-processor or distributed systems.

Concurrency is often referred to as a property of a program, and is a concept more general than parallelism.

Interestingly, we cannot say the same thing for concurrent programming and parallel programming. They overlap, but neither is a superset of the other. The difference comes from the sets of topics the two areas cover. For example, concurrent programming includes topics like signal handling, while parallel programming includes topics like memory consistency models. The difference reflects the different original hardware and software backgrounds of the two programming practices.

Update: More on Concurrency vs Parallelism

Wednesday Jun 07, 2006

Common Mistakes in Using OpenMP 3: Fifteen Cases from an IWOMP 2006 Paper by Michael Süß and Claudia Leopold

The upcoming International Workshop on OpenMP (IWOMP 2006) has a paper titled "Common Mistakes in OpenMP and How to Avoid Them" written by Michael Süß and Claudia Leopold (University of Kassel, Germany).

The results are based on a survey of two undergraduate courses. The authors of the paper kindly allowed me to list the 15 common mistakes presented in their paper here,

  1. (Correctness) Access to shared variables not protected
  2. (Correctness) Use of locks without flush
  3. (Correctness) Read of shared variable without flush
  4. (Correctness) Forget to mark private variables as such
  5. (Correctness) Use of ordered clause without ordered construct
  6. (Correctness) Declare loop variable in #pragma omp parallel for as shared
  7. (Correctness) Forget to put down for in #pragma omp parallel for
  8. (Correctness) Try to change num. of thr. in parallel reg. after start of reg.
  9. (Correctness) omp_unset_lock() called from non-owner thread
  10. (Correctness) Attempt to change loop variable while in #pragma omp for
  11. (Performance) Use of critical when atomic would be sufficient
  12. (Performance) Put too much work inside critical region
  13. (Performance) Use of orphaned construct outside parallel region
  14. (Performance) Use of unnecessary flush
  15. (Performance) Use of unnecessary critical

For details, please read the full paper.

Monday Feb 20, 2006

Common Mistakes in Using OpenMP 2: Atomic

The following code finds good members in array member[] and stores the indices of the good members in array good_members[].

#define N 1000

struct data member[N];
int good_members[N];
int pos = 0;

void find_good_members()
{
    int i;
    for (i = 0; i < N; i++) {
        if (is_good(member[i])) {
            good_members[pos] = i;
            pos++;
        }
    }
}
The following is a naive way of parallelizing the above code,

#define N 1000

struct data member[N];
int good_members[N];
int pos = 0;

void find_good_members()
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        if (is_good(member[i])) {
            good_members[pos] = i;  // line a
            #pragma omp atomic
            pos++;                  // line b
        }
    }
}
In order to avoid data races between different updates of the global variable pos, the code puts the increment (at line b) in an atomic construct. However, the code still does not work, because there is a data race between the read of pos at line a and the write of pos at line b.

Changing the body of the if statement to the following gives the correct result.

      int mypos;
      #pragma omp critical
      {
          mypos = pos;
          pos++;
      }
      good_members[mypos] = i;

In OpenMP 2.5 (the latest Specification), inside a parallel region, the only place where you can safely get the value of a variable that is updated in an atomic region is another atomic region.

Friday Dec 30, 2005

Common Mistakes in Using OpenMP 1: Incorrect Directive Format

In C/C++, OpenMP directives are specified by using the #pragma mechanism; and in Fortran, they are specified by using special comments that are identified by unique sentinels.

This design allows users to write OpenMP programs that can be compiled with compilers that do not support OpenMP, or with OpenMP compilers that have OpenMP support disabled.

However, if you do not follow the directive format, you might get a program that compiles and runs but gives unexpected results, because the compiler does not recognize your OpenMP directives and thinks they are non-OpenMP related pragmas (C/C++) or regular comments (Fortran).


How many "me"s does the following code print? Assume a team of 4 threads is executing the parallel region.

    #pragma omp parallel
        #pragma single
            printf("me\n");

The answer is 4, not 1. The directive is missing the omp keyword, so #pragma single is not recognized as an OpenMP directive; the compiler quietly ignores it, no single region is formed, and every thread in the team executes the printf.

Common Mistakes in Using OpenMP

I will post a list of common mistakes found in parallel programs written using OpenMP.

Although it is always true that users of a language need to spend effort to understand the language so as to avoid mistakes, I wonder what it means for the language designers if many users keep making the same set of mistakes again and again.



