OpenMP 3.0 tasking model: as smooth as Cilk?

There are three reasons behind this blog entry: first of all, the OpenMP 3.0 specification is now available for public comment. That, in turn, made OpenMP ARB CEO Larry Meadows proclaim The Birth of OpenMP 3.0 and introduce the most anticipated new feature: the tasking model. Finally, I've just gotten back from Supercomputing '07 in Reno, where the OpenMP BOF clearly showed that the OpenMP 3.0 tasking model is not well understood and, worse yet, seems to be missing a number of important capabilities. Before I jump into my critique, let me point out one thing: there is no doubt in my mind that OpenMP 3.0 represents a step in the right direction, yet I can't help but notice its slow pace of development. If it took us ten years to get to the point where explicit tasks are possible, how long will it take to make them actually useful? And in the meantime, wouldn't programmers be better off sticking to something like Cilk or the RapidMind framework?

Here's how tasks got introduced by Larry:
A task is similar to a lambda function in C#. The task directive is used to describe an explicit task as follows (C/C++ is used for the examples for convenience, there are analogs for Fortran):
#pragma omp task [clauses]
{
    // code
}
When a thread encounters the task directive, the data environment is captured. That environment, together with the code represented by the structured block, constitutes the generated task. The task may be executed immediately or may be queued for execution.
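To make that "data environment is captured" phrase concrete, here is a minimal sketch (the squares helper is my own, not from the spec): a variable that is private in the enclosing context, like the loop index below, is captured firstprivate by the task, so each task sees its own snapshot of the value at generation time.

```c
/* Illustrative only: each generated task captures the loop index
   at the moment the task is created.  In OpenMP 3.0, i is
   firstprivate in the task by default (it is private in the
   enclosing context); the clauses below just make that explicit. */
void squares(int *results, int n)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; i++) {
            #pragma omp task shared(results) firstprivate(i)
            results[i] = i * i;
        }
        #pragma omp taskwait   /* wait for all generated tasks */
    }
}
```

Without the firstprivate capture, every task would race on a single shared index; the captured data environment is what makes deferred execution safe here.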
Aside from the fact that referring to lambda functions as something from C# is a bit like introducing Dorian Gray as a fictional character from a major Hollywood movie, the rest of the paragraph looks remarkably similar to the concepts from the Cilk language. And that is a "good thing" (tm). Cilk grew out of the practical need to code a very diverse set of applications on shared-memory systems. Pretty much every single new keyword that Cilk introduced has a war story of how it came about. It is not perfect (no parallel framework is), but it can be considered an important yardstick for evaluating aspiring parallel frameworks for C-like languages. My very firm personal belief is that if Cilk semantics cannot be easily expressed, then a framework has serious issues.

With that in mind, let's see how the OpenMP 3.0 tasking model stacks up. It seems fair to say that #pragma omp task is like the spawn and inlet keywords rolled into one, and #pragma omp taskwait is like sync, right? Not quite. The biggest difference is that in OpenMP 3.0 it is now possible for a procedure to spawn a bunch of tasks and exit while its child tasks are still happily running (Cilk's implicit sync just before the return statement has no analog in OpenMP 3.0). This is bad news for the runtime ("cactus stack" optimizations become very hard to do) and bad news for debugging and analysis (what should the stack of such an "orphan" task look like?). It also widens the gap between the syntactic structure of the program and its semantics. Is it good for anybody? Is it good for you? I'm really curious to see your comments. For now, however, here's the first bullet on my "what's wrong with the OpenMP 3.0 tasking model" list:
  1. Orphan tasks
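Here is a small sketch of what such an orphan looks like in code (the fire_and_forget routine and its counter are mine, purely for illustration):

```c
/* A shared counter that the tasks bump; in a real program this
   would be real work. */
int done = 0;

/* Note: there is NO taskwait before the return.  Under the draft
   spec this routine may return while its child tasks are still
   executing -- the very "orphan tasks" in question.  Cilk's
   implicit sync before "return" rules exactly this out. */
void fire_and_forget(int n)
{
    for (int i = 0; i < n; i++) {
        #pragma omp task
        {
            #pragma omp atomic
            done += 1;
        }
    }
    /* returns here; the tasks may outlive this stack frame */
}
```

A caller that actually needs the results has to synchronize on the caller's side, e.g. with its own #pragma omp taskwait, which is precisely the stack-discipline breakage the bullet above complains about.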
But that's semantics. On the syntax side, the difference between the "almost a closure" feeling I get from #pragma omp task and the more orderly division of labor between inlet and spawn is up to each individual to like or dislike. You get more freedom to express yourself in-line without factoring the "reduction" code out into an inlet, but you pay the price of doing manual synchronization (as illustrated by the #pragma omp atomic and #pragma omp taskwait in the following comparison of parallel Fibonacci routines):
Cilk:

cilk int fib (int n)
{
    int x = 0;
    inlet void summer (int result)
    {
        x += result;
    }
    if (n < 2) return n;
    else {
        summer(spawn fib (n-1));
        summer(spawn fib (n-2));
        /* implicit sync before return */
        return (x);
    }
}

OpenMP 3.0:

int fib (int n)
{
    int res = 0;
    if (n < 2) return n;
    /* res must be shared explicitly: a task captures locals
       firstprivate by default, which would make the updates
       invisible to the parent */
    #pragma omp task shared(res)
    {
        #pragma omp atomic
        res += fib(n-1);
    }
    #pragma omp task shared(res)
    {
        #pragma omp atomic
        res += fib(n-2);
    }
    #pragma omp taskwait
    return res;
}
It might surprise some of you, but at this point we are almost done with our comparison. Cilk is a very nice extension of C precisely because it is a small one. Of the six keywords that Cilk adds to C (cilk, spawn, sync, inlet, abort, SYNCHED), we have already covered four, and so far OpenMP fares at least on par with Cilk. Unfortunately, the remaining two spell a bit of a disaster: their semantics do not seem to be supported by OpenMP 3.0 at all. Frankly, such a big oversight on the OpenMP committee's part still baffles me: I don't know of any better way to manage speculative execution than calling abort from an inlet. What seems particularly ironic is that Cilk itself didn't get abort until version 4. The evolution it went through is well documented in the introduction to the highly enjoyable paper Using Cilk to Write Multiprocessor Chess Programs. Can OpenMP learn from that experience? Will it have to go through its own painful and slow evolution just to arrive at the same conclusion? I don't know, but an issue like this surely belongs on my "what's wrong with the OpenMP 3.0 tasking model" list:
  1. Lack of support for speculative tasks
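Without abort, the closest OpenMP 3.0 workaround I can see is a hand-rolled cancellation flag that every task must poll. A sketch (the search/find_any routines and the flag are my own inventions) of speculatively searching two halves of an array:

```c
/* Hand-rolled substitute for Cilk's abort: a flag that tasks
   poll voluntarily.  We cannot kill a running task in OpenMP
   3.0, only ask it nicely to stop. */
static volatile int found;

/* Scan a[lo..hi) for target; every iteration polls the flag so
   that a sibling task's success lets us bail out early. */
static void search(const int *a, int lo, int hi, int target, int *pos)
{
    for (int i = lo; i < hi; i++) {
        if (found) return;          /* polled, not a true abort */
        if (a[i] == target) {
            found = 1;
            *pos = i;
            return;
        }
    }
}

/* Speculatively search both halves; returns an index of target,
   or -1 if it is absent. */
int find_any(const int *a, int n, int target)
{
    int pos = -1;
    found = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(pos)
        search(a, 0, n / 2, target, &pos);
        #pragma omp task shared(pos)
        search(a, n / 2, n, target, &pos);
        #pragma omp taskwait
    }
    return pos;
}
```

Note how much weaker this is than abort: the losing task keeps burning a core until it happens to check the flag, and the polling logic pollutes every speculative routine.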
Finally, the lack of support for SYNCHED semantics in OpenMP 3.0 hints at a whole area that is currently underserved by the draft specification: task queue management. Before I add it to my list as the third and final item, let me quickly explain why it is important to be able to manage the task queue explicitly. You see, before OpenMP 3.0 introduced tasks, the runtime had complete knowledge of the size of any chunk of work that had to be done in parallel. We paid the price of following a very rigid set of rules for structuring our for loops, but on the flip side we didn't have to worry about issues like thread starvation or oversubscription. Everything was static and easy to chunk up. Not anymore. Even the tiny list-traversal example (the very same one Larry uses to explain why tasks are needed) already has a scheduling problem:
#pragma omp parallel
{
    #pragma omp single private(p)
    {
        p = listhead;
        while (p) {
            #pragma omp task
            process(p);
            p = next(p);
        }
    }
}
First of all, we no longer know the size of the entire job, as we did in the case of #pragma omp parallel for. Which means that for a list of a gazillion items we either have to create a gazillion tasks at once (not likely) or suspend/resume tasks that are trying to create more tasks (how is the runtime to know who the real producer is?). Also, we can no longer give the scheduler any hints as to how many iterations of the loop body would constitute a reasonable chunk of work for one thread (scheduling clauses do not apply to tasks). The API has gotten much simpler: we produce as many tasks as we see fit, and the runtime is responsible for scheduling worker threads to complete all of them. In a sense, we find ourselves dealing with a classical producer-consumer problem, with a crucial complication: we have no explicit control over the size of the queue, nor any way of establishing explicit communication between the producer and the consumer. We are left at the mercy of the runtime (and hacks like untied tasks, as explained in the BOF slides). We surely deserve better, and with that I give you my list of what's wrong with the OpenMP 3.0 tasking model:
  1. Lack of support for task queue management
  2. Lack of support for speculative tasks
  3. Orphan tasks
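Absent real queue-management controls, about the best a programmer can do today is throttle the producer by hand, generating tasks in fixed-size batches and draining the queue between batches. A sketch (the node type, chunk size, and helper names are arbitrary choices of mine):

```c
#include <stddef.h>

typedef struct node { struct node *next; int payload; } node_t;

/* Stand-in for real per-node work. */
static void process(node_t *p) { p->payload *= 2; }

/* Manual throttling: generate at most CHUNK tasks, then block on
   taskwait before producing more.  This caps the queue length,
   at the cost of a full stall between batches -- exactly the
   kind of crude producer/consumer "communication" the draft
   spec leaves us with. */
#define CHUNK 64

void traverse(node_t *head)
{
    #pragma omp parallel
    #pragma omp single
    {
        node_t *p = head;
        while (p) {
            int batch = 0;
            while (p && batch < CHUNK) {
                #pragma omp task firstprivate(p)
                process(p);
                p = p->next;
                batch++;
            }
            #pragma omp taskwait   /* drain before producing more */
        }
    }
}
```

Note that the chunk size is a blind guess: the runtime knows how hungry the worker threads are, but has no way to tell us, and we have no way to ask.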
Do you see anything missing from it? Anything you would like to clarify? All of it is just one "Post comment" button click away!
