Specifying a Username With DRMAA

DRMAA is a standard API for submitting, monitoring, and controlling jobs with a DRM (Distributed Resource Manager). Grid Engine includes DRMAA bindings for C and for the Java™ platform. One of the first ideas most people come up with when looking at DRMAA is to build a daemon that will extend the reach of DRMAA, such as a portal or a web services interface. Traditionally, creating such a daemon in DRMAA had two problems. The first was that jobs are bound to the DRMAA session during which they were submitted. If you lose the session, such as by crashing and/or restarting, you lose contact with your jobs. The second problem was that DRMAA submits jobs as the user running the application, which in the case of a daemon was usually root or some neutral 3rd party, like sgeadmin.

The first problem has since been addressed. While working on the advanced Grid Engine admin class I teach, a solution to the second problem occurred to me, and it doesn't require modifications to Grid Engine. Here's how it works...

First, configure a queue where your daemon will submit all jobs. (It could be more than one queue, but I'm going to continue in the singular.) The important thing is that only jobs from the daemon are allowed to run in the queue. The easiest way to do that is to create a new forced boolean complex and assign it to the queue. Then have your daemon add a request for that resource to all job submissions. (There are a variety of ways to do this, including adding the resource request to the daemon user's sge_request file.)

The reason why you want to isolate the daemon's queue is that you're going to change its starter method. Create a script or program that reads a user name from an environment variable, such as SGE_DRMAA_USERNAME, changes users to that user, and then executes the job script (passed as the first argument to the starter script). An example script might look like:


if [ "$SGE_DRMAA_USERNAME" = "" ]; then
   exit 100
fi
exec su "$SGE_DRMAA_USERNAME" -c "$*"   # run the job script and its arguments as that user

An example C program for Solaris might look like:

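#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>
#include <grp.h>

int main(int argc, char **argv) {
   char *username = getenv("SGE_DRMAA_USERNAME");

   /* Refuse to run without a target user or a job script. */
   if (username == NULL || argc < 2) {
      return 100;
   } else {
      struct passwd *pw = getpwnam(username);

      if (pw == NULL) {
         return 100;
      } else if (initgroups(pw->pw_name, pw->pw_gid) != 0 ||
                 setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0) {
         /* Groups must be changed before the user id is dropped. */
         return 100;
      } else {
         /* Replace this process with the job script and its arguments. */
         execv(argv[1], &argv[1]);
      }
   }

   return 100;   /* reached only if execv() failed */
}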

Next, configure the queue's starter method to be your script/program. See the queue_conf(5) man page for details about the queue configuration. Because Grid Engine will run the starter script as the user who submitted the job, if your daemon is running as root, the starter script will be run as root, giving it permission to change the user id. If running the daemon as root is a problem in your environment, you can get around it using Solaris role-based access control (RBAC) or similar mechanisms on other operating systems to assign setuid permission to the user as whom the daemon is running.

Now, whenever your daemon accepts a job submission, it should attach the environment variable, with the name of the user as the value, to the job's environment. For details, see the job environment DRMAA attribute: drmaa_v_env in the drmaa_attributes(3) man page for C, and the jobEnvironment property of the JobTemplate class for the Java binding.
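As a rough sketch of that step in the C binding, the daemon could build the NAME=value entry and hand it to the job template as a NULL-terminated vector. The helper name make_username_entry is hypothetical, and the DRMAA call appears only in a comment because it needs a live session and an allocated job template. The sketch assumes the vector attribute drmaa_v_env (the DRMAA_V_ENV constant in drmaa.h).

```c
#include <stdio.h>
#include <string.h>

/* Name of the variable the starter method reads (taken from the text above). */
#define USERNAME_VAR "SGE_DRMAA_USERNAME"

/* Hypothetical helper: writes "SGE_DRMAA_USERNAME=<user>" into buf.
   Returns 0 on success, or -1 if the user name is empty, contains '='
   or a newline, or the result does not fit in buf. */
int make_username_entry(const char *user, char *buf, size_t len) {
   int n;

   if (user == NULL || user[0] == '\0' || strpbrk(user, "=\n") != NULL) {
      return -1;
   }

   n = snprintf(buf, len, "%s=%s", USERNAME_VAR, user);

   return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* In the daemon, with an allocated job template jt, the entry would be
   attached roughly like this:

      char entry[256];
      const char *env[2];
      char diag[DRMAA_ERROR_STRING_BUFFER];

      if (make_username_entry(user, entry, sizeof(entry)) == 0) {
         env[0] = entry;
         env[1] = NULL;
         drmaa_set_vector_attribute(jt, DRMAA_V_ENV, env, diag, sizeof(diag));
      }
*/
```

Rejecting names that contain '=' or a newline keeps a submitted user name from smuggling a second variable into the job's environment.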

This arrangement does represent a security hole. The starter script blindly believes whatever the environment variable says. A malicious user could set the environment variable to root for his job and then submit it to your queue. Bad news. To prevent this security hole, create a public/private key pair for your daemon. Instead of putting the cleartext username in the environment variable, encrypt it first with the private key. The starter script must then use the daemon's public key to decrypt the username.

But there's still a security hole. A malicious user could snoop the job submission, lift the encrypted username, and reuse it for his own jobs. Eliminating this security hole is a little trickier. One solution might be to also include an encrypted sequence number that gets incremented with every job, forcing the starter method to globally track which sequence numbers have already been used (because jobs may ultimately be scheduled in any order). To close the hole completely, you'd have to verify that the sequence number belongs to the job being run. With that in mind, the best approach might be to have the starter method contact the daemon and report the decrypted sequence number. The daemon would then respond with the associated job number. If the job numbers don't match, the job is a fake. To be completely secure, that communication should happen over SSL.
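The daemon-side bookkeeping for that check might look something like the following. This is only a sketch under stated assumptions: the names seq_table, seq_issue, seq_bind, and seq_verify are hypothetical, the table is a fixed-size in-memory array, and the encryption and the SSL transport are left out entirely. It also varies the exchange slightly: the starter reports both the sequence number and its job number, and the daemon does the comparison, so that a sequence number is retired only on a match.

```c
#include <stddef.h>

#define SEQ_TABLE_SIZE 1024   /* assumed cap on outstanding submissions */

enum seq_state { SEQ_FREE, SEQ_ISSUED, SEQ_BOUND };

struct seq_entry {
   unsigned long seq;      /* sequence number placed in the job's environment */
   unsigned long job_id;   /* job number reported by the DRM after submission */
   enum seq_state state;
};

struct seq_table {
   struct seq_entry entries[SEQ_TABLE_SIZE];
   unsigned long next_seq;
};

void seq_init(struct seq_table *t) {
   size_t i;

   for (i = 0; i < SEQ_TABLE_SIZE; i++) {
      t->entries[i].state = SEQ_FREE;
   }
   t->next_seq = 1;
}

/* Find the live entry for a sequence number, or NULL if there is none. */
static struct seq_entry *seq_find(struct seq_table *t, unsigned long seq) {
   size_t i;

   for (i = 0; i < SEQ_TABLE_SIZE; i++) {
      if (t->entries[i].state != SEQ_FREE && t->entries[i].seq == seq) {
         return &t->entries[i];
      }
   }
   return NULL;
}

/* Reserve a fresh sequence number before submitting; 0 means the table is full. */
unsigned long seq_issue(struct seq_table *t) {
   size_t i;

   for (i = 0; i < SEQ_TABLE_SIZE; i++) {
      if (t->entries[i].state == SEQ_FREE) {
         t->entries[i].seq = t->next_seq++;
         t->entries[i].state = SEQ_ISSUED;
         return t->entries[i].seq;
      }
   }
   return 0;
}

/* Record the job number once the submission returns; 0 on success. */
int seq_bind(struct seq_table *t, unsigned long seq, unsigned long job_id) {
   struct seq_entry *e = seq_find(t, seq);

   if (e == NULL || e->state != SEQ_ISSUED) {
      return -1;
   }
   e->job_id = job_id;
   e->state = SEQ_BOUND;
   return 0;
}

/* Called when a starter reports (seq, job_id). Returns 1 and retires the
   sequence number if the pair matches; returns 0 for a reused, unknown,
   not-yet-bound, or mismatched sequence number. */
int seq_verify(struct seq_table *t, unsigned long seq, unsigned long job_id) {
   struct seq_entry *e = seq_find(t, seq);

   if (e == NULL || e->state != SEQ_BOUND || e->job_id != job_id) {
      return 0;
   }
   e->state = SEQ_FREE;
   return 1;
}
```

A mismatch deliberately leaves the entry in place, so a fake job that replays a stolen sequence number cannot block the legitimate job from verifying later. A real daemon would also have to handle a job that starts before seq_bind has run.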

I haven't actually tried this approach yet, but it's on my list of things to do. If anyone out there gives it a go, I'd be very interested to hear how it went.


Hi Dan,

Great work! One question I have: doesn't this mean that the starter program, be it a bash script or a compiled C program, would need to have the setuid bit set? Otherwise, how would the user that the grid software runs as have permission to switch identities to other users? Another possibility is to use sudo or perhaps fakeroot...


Posted by Victor on September 01, 2007 at 10:29 AM PDT #

Victor, thanks for the input. I amended the text to answer your question. Basically, the user as whom the daemon runs must have permission to change UIDs. The simplest way is by running the daemon as root, but there are other options.

Posted by Daniel Templeton on September 04, 2007 at 08:42 AM PDT #

Interesting. We were thinking about writing a multithreaded daemon which would use DRMAA to take users' jobs from an external DB and submit them to SGE.

The only problem I see is that you lose the ability to schedule jobs by user or use any username-based control, as all jobs will have the same username. But if you are scheduling jobs based solely on project and job then I suppose you're okay.

Posted by Dee on January 17, 2008 at 05:28 AM PST #
