Tuesday Jan 08, 2008

Gulf of Execution

Gulf of Execution is a term used to describe the the difference between the steps one actually needs to take to achieve a goal and the steps that one perceives.

After learning this term, the example that quickly jumps into my mind is setting up those wifi-enabled devices, like Wii, PSP, NDS, Wireless gateway, etc. In my experience, the one with the narrowest gap is iPhone. The worst one is, well, some operating system.

Michael G Schwern had a blog entry about this on Perl.

Who wide is the Gulf in your favorite parallel programming language/model/scheme/library?

Thursday Sep 20, 2007

Common Mistakes in Using OpenMP 5: Assuming Non-existing Synchronization Before Entering Worksharing Construct

There is no synchronization between the threads in a team when they enter a worksharing construct. Many people assume there is a barrier before the threads enter a worksharing construct, especially when there is a FIRSTPRIVATE used in the worksharing construct. This is a common mistake.

For example, in the following code, assume two threads - thread 1 and thread 2 are in the team, and Read1 is executed by thread 1 and Read2 is executed by thread 2.

  #pragma omp parallel
     if (omp_get_thread_num()==0)
        z = 1;
        z = 2;
     #pragma omp sections firstprivate(z)
       #pragma omp section
          ... = z;      // Read1
       #pragma omp section
          ... = z;      // Read2

What are the values of z at Read1 and Read2? All the following three combinations are possible,

  1. Read1:1 Read2:1
  2. Read1:1 Read2:2
  3. Read1:2 Read2:2

If there were a synchronization before the worksharing construct, then the above (Read1:1, Read2:2) is not possible.

Now, look at the following example which has both FIRSTPRIVATE and LASTPRIVATE,

  #pragma omp parallel
     z = 1;
     #pragma omp for firstprivate(z) lastprivate(z) nowait
     for (i=0; i<n; i++) {
          ... = z;      // Read1
          z = 2;        // Write1

What could be the value of z at Read1? Would it be 2? OpenMP 3.0 Draft has clarified this situation. It says

If a list item appears in both firstprivate and lastprivate clauses, the update required for lastprivate occurs after all initializations for firstprivate.

So, the value of z at Read1 cannot be 2.

Sunday Jun 11, 2006

Common Mistakes in Using OpenMP 4: Orphaned Worksharing Constructs

More precisely, this mistake should be classified as a common mis-understanding of OpenMP.

When a worksharing construct, such omp for or omp sections, is encountered outside any explicit parallel region, the arising worksharing region is called orphaned worksharing region. A common mis-understanding is that in this case the worksharing construct is simply being ignored and the region is executed sequentially.

Orphaned worksharing constructs are not ignored. All the data sharing attribute clauses are honored. The worksharing regin is executed as if a team of only one thread is executing the region.

For example, in the following C++ code,

         class_type_1  a;
         #pragma omp for private(a) schedule(dynamic)
         for (i=1; i<100; i++) {
             printf("%d\\n", i);

the default constructor for class_type_1 will be called, and a comforming implementation is not forced to execute the loop in the order of 1, 2, 3, ..., 99.

Wednesday Jun 07, 2006

Common Mistakes in Using OpenMP 3: Fifteen Cases from a IWOMP 2006 paper by Michael Süß and Claudia Leopold

The coming International Workshop on OpenMP (IWOMP 2006) has a paper titled "Common Mistakes in OpenMP and How to Avoid Them" written by Michael Süß and Claudia Leopold (University of Kassel, Germany).

The result is based on a survey of two undergraduate courses. The authors of the paper kindly allow me to list the 15 common mistakes presented in their paper here,

  1. (Correctness) Access to shared variables not protected
  2. (Correctness) Use of locks without flush
  3. (Correctness) Read of shared variable without flush
  4. (Correctness) Forget to mark private variables as such
  5. (Correctness) Use of ordered clause without ordered construct
  6. (Correctness) Declare loop variable in #pragma omp parallel for as shared
  7. (Correctness) Forget to put down for in #pragma omp parallel for
  8. (Correctness) Try to change num. of thr. in parallel reg. after start of reg.
  9. (Correctness) omp_unset_lock() called from non-owner thread
  10. (Correctness) Attempt to change loop variable while in #pragma omp for
  11. (Performance) Use of critical when atomic would be sufficient
  12. (Performance) Put too much work inside critical region
  13. (Performance) Use of orphaned construct outside parallel region
  14. (Performance) Use of unnecessary flush
  15. (Performance) Use of unnecessary critical

For detail, please read the full paper.

Monday Feb 20, 2006

Common Mistakes in Using OpenMP 2: Atomic

The following code finds good members in array member[] and stores the indices of the good members in array good_members[].

#define N 1000

struct data member[N];

int good_members[N];

int pos = 0;

void find_good_members()
for (i=0; i < N; i++) {
if (is_good(member[i])) {
good_members[pos] = i;
pos ++;

The following is a navie way of parallelizing the above code,

#define N 1000

struct data member[N];

int good_members[N];

int pos = 0;

void find_good_members()
#pragma omp parallel for
for (i=0; i < N; i++) {
if (is_good(member[i])) {
good_members[pos] = i; // line a
#pragma omp atomic
pos ++; // line b

In order to avoid data races between different updates of global variable pos, the code puts the increment (at line b) in a atomic construct. However, the code does not work, because there is a data race between the read of pos at line a and write of pos at line b.

Changing the body of the if statement to the following gives the correct result.

      int mypos;
#pragma omp critical
mypos = pos;
pos ++;
good_members[mypos] = i;

In OpenMP 2.5 (the latest Specification), inside a parallel region, the only place where you can safely get the value of a variable that is updated in an atomic region is another atomic region.

Friday Dec 30, 2005

Common Mistakes in Using OpenMP 1: Incorrect Directive Format

In C/C++, OpenMP directives are specified by using the #pragma mechanism; and in Fortran, they are specified by using special comments that are identified by unique sentinels.

This design allows users to write OpenMP programs that can be compiled with compilers that do not support OpenMP or compiled with OpenMP compiles with OpenMP support disabled.

However, if you do not follow the directive format, you might get a program that compiles and runs but gives unexpected results, because the compiler does not recognize your OpenMP directives and thinks they are non-OpenMP related pragmas (C/C++) or regular comments (Fortran).


How many "me"s does the following code print? Assume a team of 4 threads are executing the parallel region.

    #pragma omp parallel
        #pragma single

Common Mistakes in Using OpenMP

I will post a list of common mistakes found in parallel programs written using OpenMP.

Although it is always true that users of a language need to spend effort to understand the language so to avoid mistakes, I wonder what it means to the language designers if many many users keep making the same set of mistakes again and again.

Sunday Dec 25, 2005

Must Read: ACM Queue Microprocessors issue (9-2005)

The following articles from the ACM Queue Microprocessors issue (vol. 3, no. 7 - September 2005) are must reads.

Multicore CPUs for the Masses
Mache Creeger, Emergent Technology Associates

Software and the Concurrency Revolution
Herb Sutter and James Larus, Microsoft

The Price of Performance
Luiz André Barroso, Google

Extreme Software Scaling
Richard McDougall, Sun Microsystems

The Future of Microprocessors
Kunle Olukotun and Lance Hammond, Stanford University




« March 2015