Good DRMAA News

I have now tested and checked in an extension to the Grid Engine DRMAA implementations to allow a session to be restarted in 6.0u10. The extension is available in both the C and Java™ language bindings.

Starting with 6.0u10, you will be able to query the session for its session id, and then use that session id to reconnect to a previously closed session. There is a caveat, however. Because the qmaster (usually) disavows all knowledge of jobs that have already completed, the exit information for any job that ends between the call to exit() and the second call to init() is lost. As long as the exit info is not important, you can simply persist a list of the running jobs before you exit and, after reinitializing, check whether those jobs are still running. Any that are not must have ended while the session was closed.

Here's an example in C:

#include "drmaa.h"

int main(int argc, char **argv) {
   char contact[DRMAA_CONTACT_BUFFER + 1];
   char job_id[DRMAA_JOBNAME_BUFFER + 1];
   ...

   drmaa_init("");
   drmaa_get_contact(contact, ...);
   ...
   drmaa_run_job(job_id, ...);
   drmaa_exit();

   /* The session is now closed. */

   drmaa_init(contact, ...);
   drmaa_wait(job_id, ...);
   drmaa_exit();
}

I left out some of the details for clarity's sake. You can see that I was able to open a session, submit a job, and close the session, and then re-open the session and wait for the job. Of course, if the job had ended before the second call to drmaa_init(), the call to drmaa_wait() would fail.

Let's try that with the Java language binding, too.

import org.ggf.drmaa.*;

public class DrmaaTest {
   public static void main (String[] args) throws Exception {
      Session s = SessionFactory.getFactory ().getSession ();

      s.init ("");

      String contact = s.getContact ();

      ...

      String job = s.runJob (jt);

      s.exit ();

      s.init (contact);
      s.wait (job, s.TIMEOUT_WAIT_FOREVER);
      s.exit ();
   }
}
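
Coming back to the caveat about jobs that finish while the session is closed: the "persist a list of the running jobs" workaround could be sketched roughly as below. This is only an illustration, not part of the DRMAA library. The names save_job_ids(), load_job_ids() and JOB_ID_MAX are made up for the sketch, and the one-id-per-line file format is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit for this sketch; real code would more likely size its
 * buffers with DRMAA_JOBNAME_BUFFER from drmaa.h. */
#define JOB_ID_MAX 1024

/* Write one job id per line. Returns 0 on success, -1 on failure. */
int save_job_ids(const char *path, const char *const *ids, size_t count) {
    assert(path != NULL && ids != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    for (size_t i = 0; i < count; i++) {
        if (fprintf(f, "%s\n", ids[i]) < 0) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Read back at most max job ids. Returns the number of ids read,
 * or -1 if the file could not be opened. */
long load_job_ids(const char *path, char ids[][JOB_ID_MAX], size_t max) {
    assert(path != NULL && ids != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    size_t n = 0;
    while (n < max && fgets(ids[n], JOB_ID_MAX, f) != NULL) {
        ids[n][strcspn(ids[n], "\n")] = '\0'; /* strip the trailing newline */
        if (ids[n][0] != '\0') n++;           /* skip blank lines */
    }
    fclose(f);
    return (long)n;
}
```

The idea would be to call save_job_ids() just before drmaa_exit(), and load_job_ids() right after the second drmaa_init(). Each reloaded id could then be checked with something like drmaa_job_ps(); any id the qmaster no longer recognizes presumably belongs to a job that ended while the session was closed.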

This was a much-requested feature, and I hope it satisfies some developer needs. Unfortunately, we still don't have an answer for jobs which exit while the session is closed, but we've taken a good and hopefully useful first step. If you can't wait for u10 to be released (which should be somewhere near mid-March), you can just grab the current maintrunk source from CVS.

As if that weren't enough, the u10 release will also update the C language bindings from 0.95 to 1.0. There are two important differences between 0.95 and 1.0. The first is that the drmaa_get_next_*() calls will return DRMAA_ERRNO_NO_MORE_ELEMENTS instead of DRMAA_ERRNO_INVALID_ARGUMENT when there are no more elements to return. The second is the addition of the drmaa_get_num_*() calls. These functions allow you to get the count of elements in an opaque string vector so that you can correctly preallocate a buffer array before walking through the entire vector with drmaa_get_next_*(). Using these new functions, I updated the JNI code for the Java language binding and was able to simplify the code greatly. I was even able to remove one of the error codes! That's what I call improvement.
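
To make the count-then-walk pattern concrete, here is a small sketch that uses a mock vector in place of the real opaque DRMAA types. Everything prefixed with mock_ or MOCK_, as well as copy_all(), is a stand-in invented for this illustration. The real calls live in drmaa.h and have their own signatures and error code values, which this sketch does not try to reproduce.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for this sketch only; not the real DRMAA types or error codes. */
enum { MOCK_SUCCESS = 0, MOCK_NO_MORE_ELEMENTS = 1 };
#define MOCK_VALUE_MAX 64

typedef char mock_value_t[MOCK_VALUE_MAX];

typedef struct {
    const char *const *items; /* the strings held by the vector */
    size_t count;             /* how many strings there are */
    size_t next;              /* index of the next string to hand out */
} mock_vector_t;

/* Plays the role of a drmaa_get_num_*() call: report the element count. */
int mock_get_num(const mock_vector_t *v, size_t *size) {
    assert(v != NULL && size != NULL);
    *size = v->count;
    return MOCK_SUCCESS;
}

/* Plays the role of a drmaa_get_next_*() call: copy out the next element,
 * or report that the vector is exhausted. */
int mock_get_next(mock_vector_t *v, char *value, size_t value_len) {
    assert(v != NULL && value != NULL && value_len > 0);
    if (v->next >= v->count) return MOCK_NO_MORE_ELEMENTS;
    const char *src = v->items[v->next++];
    size_t len = strlen(src);
    if (len >= value_len) len = value_len - 1; /* truncate if too long */
    memcpy(value, src, len);
    value[len] = '\0';
    return MOCK_SUCCESS;
}

/* Ask for the count first, allocate the array once, then walk the vector.
 * The caller frees the result. Returns NULL if allocation fails. */
mock_value_t *copy_all(mock_vector_t *v, size_t *out_count) {
    assert(v != NULL && out_count != NULL);
    size_t n = 0;
    mock_get_num(v, &n);
    mock_value_t *buf = calloc(n ? n : 1, sizeof *buf);
    if (buf == NULL) return NULL;
    size_t i = 0;
    while (i < n && mock_get_next(v, buf[i], sizeof buf[i]) == MOCK_SUCCESS) {
        i++;
    }
    *out_count = i;
    return buf;
}
```

Without a count call, the caller has to either grow the array as it goes or walk the vector twice, which is presumably the kind of bookkeeping that knowing the size up front removes.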

Unfortunately, it's not all goodness. Because of Sun's product release rules, we can't make 1.0 the default in u10. Instead, we're offering both 0.95 and 1.0 libraries. 0.95 will be the default, but by changing a symbolic link, 1.0 can be used instead. This default applies to both the C and Java language bindings. The Java language binding API is itself unaffected; all that changes is how it uses the C binding. Hopefully with 6.1, the 1.0 library will become the default, and by 7.0 we should be able to remove the 0.95 library completely.

The difficulty in having two different library versions hanging around is that the error codes changed from 0.95 to 1.0. That means that an application compiled against 0.95 will likely fail when linked against 1.0, and vice versa. On the bright side, though, you finally have a use for the drmaa_version() call!
