Getting Started With DRMAA

For all the howtos I've put online about how to use DRMAA, I've yet to provide a clear description of how to go from nothing to a DRMAA-enabled app. This blog entry will try to correct that oversight. Instead of explaining how to plan a real production grid, I am just going to walk you through installing a local test cluster.

Downloading & Installing

First of all, to use DRMAA, you have to have Grid Engine 6.0 (open source) or Sun N1 Grid Engine 6.0 (product) installed and functional. If you want to use the Java language binding, you'll need version 6.0u3 at least.

After you've obtained the binaries, unpack them somewhere not NFS mounted. Cd to that directory. You will find an install_qmaster script. Run it. Here's what my answers to the prompts look like:

    Hit <RETURN> to continue >>
    Hit <RETURN> if this is ok or stop the installation with CTRL-C >>
    If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/tmp/dant] >>
    Hit <RETURN> to continue >>
    Please add the service now or press <RETURN> to go to entering a port number >>
    Check again (enter [n] to enter a port number) (y/n) [y] >> n
    Please enter an unused port number >> 60220
    Hit <RETURN> to continue >>
    Please add the service now or press <RETURN> to go to entering a port number >>
    Check again (enter [n] to enter a port number) (y/n) [y] >> n
    Please enter an unused port number >> 60221
    Hit <RETURN> to continue >>
    Enter cell name [default] >>
    Hit <RETURN> to continue >>
    Do you want to select another qmaster spool directory (y/n) [n] >>
    Hit <RETURN> to continue >>
    Are all hosts of your cluster in a single DNS domain (y/n) [y] >>
    Hit <RETURN> to continue >>
    Hit <RETURN> to continue >>
    Please choose a spooling method (berkeleydb|classic) [berkeleydb] >>
    Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >>
    Hit <RETURN> to continue >>
    Default: [/tmp/dant/default/spool/spooldb] >>
    Hit <RETURN> to continue >>
    Please enter a range >> 20000-20100
    Hit <RETURN> to continue >>
    Default: [/tmp/dant/default/spool] >>
    Default: [none] >>
    Do you want to change the configuration parameters (y/n) [n] >>
    Hit <RETURN> to continue >>
    Hit <RETURN> to continue >>
    Hit <RETURN> to continue >>
    Do you want to use a file which contains the list of hosts (y/n) [n] >>
    Host(s): mydesktop
    Finished Adding hosts. Hit <RETURN> to continue >>
    Do you want to add your shadow host(s) now? (y/n) [y] >> n
    Hit <RETURN> to continue >>
    Default configuration is [1] >>
    Do you agree? (y/n) [y] >>
    Hit <RETURN> to see where Grid Engine logs messages >>
    Do you want to see previous screen about using Grid Engine again (y/n) [n] >>
    Hit <RETURN> >>

In the above lines where I didn't enter anything, I just pressed enter to accept the defaults. For the port numbers, pick ones that aren't already in use. The same goes for the GID range. You can run the install as root, but on Solaris/SPARC and Linux/x86, it's not required, and I usually don't. On Solaris/x86 you have to install as root for the execution daemon to work correctly. Installing as root goes the same as above, except for three things.

  1. It will offer to let you install Grid Engine as the user who owns the directory. That is a good idea.
  2. It will ask if you installed the binaries with pkgadd. Be honest. If you enter 'n', it will ask if it should update the permissions. It should.
  3. It will also give you the option of adding an RC script to start the cluster at boot time. I usually say no, but it's your choice.

For clarity's sake, let's look at what the above supplied answers actually mean. First of all, the first directory path is for the $SGE_ROOT. $SGE_ROOT is the root of the Grid Engine installation. I normally install in the same directory where I unpacked the binaries, but that's not a requirement.

Next, the port numbers are used internally by Grid Engine so that the various components can talk among themselves. The qmaster port is used by all the components to connect to the qmaster. These include all the client utilities, like qsub, qstat, etc. The execd port is used by the qmaster to find running execution daemons.
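
If you want a quick sanity check before typing a port number into the installer, something like the following sketch can tell you whether a number is already claimed in a services file. The function name port_is_registered is my own invention, not part of Grid Engine. It only looks at the file (/etc/services by default), so it will not notice a process that is listening on a port without a services entry.

```shell
# Hypothetical helper, not part of Grid Engine.
# Succeeds if port $1 has a tcp or udp entry in the services file $2
# (defaults to /etc/services). Commented-out lines are ignored.
port_is_registered() {
    port=$1
    services_file=${2:-/etc/services}
    grep -Eq "^[^#]*[[:space:]]${port}/(tcp|udp)" "$services_file"
}
```

For example, "port_is_registered 60220 || echo free" prints "free" when nothing in /etc/services claims 60220.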

The cell name is the $SGE_CELL. Grid Engine allows you to have multiple different configurations running simultaneously. In other words, using the same set of binaries, I could have different grids running on different port numbers. The way Grid Engine keeps the different instances separate is by dividing them into cells. A cell represents a complete configuration for a grid instance. All of the configuration information for a cell will be stored in the $SGE_ROOT/$SGE_CELL directory.
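
To make the directory layout concrete, here is a small sketch that prints where a cell's configuration lives. The helper name sge_cell_dir is hypothetical. It falls back to the cell name "default", the same default the installer offers, when no cell is given and $SGE_CELL is unset.

```shell
# Hypothetical helper, not part of Grid Engine.
# Prints the configuration directory for a cell.
#   $1: installation root (defaults to $SGE_ROOT)
#   $2: cell name (defaults to $SGE_CELL, then to "default")
sge_cell_dir() {
    root=${1:-$SGE_ROOT}
    cell=${2:-${SGE_CELL:-default}}
    printf '%s/%s\n' "$root" "$cell"
}
```

With the install above, "sge_cell_dir /tmp/dant" prints /tmp/dant/default. A second cell named, say, test2 would keep its configuration in /tmp/dant/test2.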

The next bit is about admin and submission hosts. An admin host is a host from which admin users are allowed to execute admin commands, like qconf or qmod. A submission host is a host from which users are allowed to submit jobs, via qsub, qlogin, qrsh, qsh, or qtcsh. By default, the host you're installing will be made an admin host. This part gives you the option to also make it (and other hosts) a submission host. If you skip this step, you can always add the host later with qconf -as <hostname>.
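
If you're not sure whether a host already made it onto the submission host list, a sketch like this one checks before adding. It relies on qconf -ss, which lists the current submission hosts one per line. The function names are mine, and the comparison is an exact string match, so it assumes you spell the host name the same way qconf reports it (short versus fully qualified).

```shell
# Hypothetical helpers, not part of Grid Engine.

# Succeeds if the host name in $1 appears as a whole line on stdin.
host_listed() {
    grep -Fxq -- "$1"
}

# Adds $1 as a submission host unless qconf -ss already lists it.
ensure_submit_host() {
    if qconf -ss | host_listed "$1"; then
        echo "$1 is already a submission host"
    else
        qconf -as "$1"
    fi
}
```

Running "ensure_submit_host mydesktop" from an admin host is then safe to repeat.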

Next comes the shadow host. A shadow host is Grid Engine's version of high availability. A shadow host runs a shadow daemon, which watches the qmaster and takes over if the qmaster becomes unresponsive, i.e. goes down. I generally skip this part since it's not really useful in a quick test cluster.

The scheduler configuration tells the scheduler how to behave. For a test cluster, the default value is fine.


When the install script finishes, you need to set up your environment. If you're in csh/tcsh,

    source default/common/settings.csh

If you're in sh/ksh/bash,

    . default/common/settings.sh
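
For sh/ksh/bash users, here is a rough way to confirm that the settings file took effect. The function name sge_env_problems is made up for this post. It assumes the settings file sets SGE_ROOT and puts the Grid Engine binaries on your PATH, and it prints one line per problem it finds (no output means things look fine).

```shell
# Hypothetical helper, not part of Grid Engine.
# Prints a line for each sign that the settings file has not been sourced.
sge_env_problems() {
    [ -n "$SGE_ROOT" ] || echo "SGE_ROOT is not set"
    command -v qstat >/dev/null 2>&1 || echo "qstat is not on PATH"
}
```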

Next, run the install_execd script. You can accept all the defaults. Really. I usually just hold down the return key until I get the shell prompt again.

When the install script is done, Grid Engine should be installed and running. Run

    qstat -f

and you should see an entry for all.q@hostname. If so, everything is set up.
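
If you'd rather script that check, a minimal sketch looks like this. The function name is hypothetical, and it only looks for a line that begins with "all.q@" in whatever text you pipe into it.

```shell
# Hypothetical helper, not part of Grid Engine.
# Succeeds if stdin contains a queue instance line starting with "all.q@".
has_all_q_instance() {
    grep -q '^all\.q@'
}
```

Use it as "qstat -f | has_all_q_instance && echo ok".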

Writing & Compiling With C

For programs written using the C binding, you have to include the header file "drmaa.h".

When compiling your program, you have to include the Grid Engine include directory (<install_dir>/include) and link to the DRMAA shared library (<install_dir>/lib/<arch>/libdrmaa.so). This is normally done with something like

    cc -o app app.c -I<install_dir>/include -L<install_dir>/lib/<arch> -ldrmaa
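
To have something to compile, here is a sketch that writes about the smallest DRMAA C program I can think of (open a session, then close it) and prints a matching compile command. The two shell function names are hypothetical. The C code uses only drmaa_init, drmaa_exit, DRMAA_ERRNO_SUCCESS and DRMAA_ERROR_STRING_BUFFER from drmaa.h. The compile command assumes $SGE_ROOT is set and that you pass in the architecture directory name found under $SGE_ROOT/lib.

```shell
# Hypothetical helpers, not part of Grid Engine.

# Writes a minimal DRMAA C program (init the session, then exit) to the path in $1.
write_drmaa_smoke_test() {
    cat > "$1" <<'EOF'
#include <stdio.h>
#include "drmaa.h"

int main(void) {
    char error[DRMAA_ERROR_STRING_BUFFER];

    /* NULL contact string: use the default Grid Engine instance. */
    if (drmaa_init(NULL, error, sizeof(error)) != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "drmaa_init failed: %s\n", error);
        return 1;
    }
    printf("DRMAA session opened\n");

    if (drmaa_exit(error, sizeof(error)) != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "drmaa_exit failed: %s\n", error);
        return 1;
    }
    return 0;
}
EOF
}

# Prints the compile-and-link command for C source $1 and architecture $2,
# using $SGE_ROOT as the install directory.
drmaa_cc_command() {
    src=$1
    arch=$2
    printf 'cc -o %s %s -I%s/include -L%s/lib/%s -ldrmaa\n' \
        "${src%.c}" "$src" "$SGE_ROOT" "$SGE_ROOT" "$arch"
}
```

Printing the command instead of running it lets you look it over first. The $SGE_ROOT/util/arch script reports the architecture string for the current host.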

Writing & Compiling With the Java Language

For programs written using the Java language binding, you'll need to import the org.ggf.drmaa package.

When compiling, the DRMAA jar file (<install_dir>/lib/drmaa.jar) needs to be in the classpath (CLASSPATH).
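
Here is the same idea for the Java language binding, as a sketch. It writes a tiny class that opens and closes a DRMAA session and prints a javac command with the jar on the classpath. The shell function names and the class name DrmaaSmokeTest are my own. The Java code assumes the SessionFactory, Session and DrmaaException classes from the org.ggf.drmaa package, so check it against the javadocs that ship with your Grid Engine version.

```shell
# Hypothetical helpers, not part of Grid Engine.

# Writes a minimal DRMAA Java program to the path in $1.
# The file must be named DrmaaSmokeTest.java to match the public class.
write_drmaa_java_smoke_test() {
    cat > "$1" <<'EOF'
import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

public class DrmaaSmokeTest {
    public static void main(String[] args) {
        Session session = SessionFactory.getFactory().getSession();
        try {
            session.init(null);   // null contact: default Grid Engine instance
            System.out.println("DRMAA session opened");
            session.exit();
        } catch (DrmaaException e) {
            System.err.println("DRMAA error: " + e.getMessage());
        }
    }
}
EOF
}

# Prints the javac command for source file $1, with drmaa.jar from $SGE_ROOT
# on the classpath.
drmaa_javac_command() {
    printf 'javac -classpath %s/lib/drmaa.jar %s\n' "$SGE_ROOT" "$1"
}
```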

Running Your App

Whether using the C or Java language binding, the DRMAA shared library needs to be included in the shared library path (LD_LIBRARY_PATH on Solaris and Linux). For a Java language binding app, the DRMAA jar file also needs to be included in the classpath. When running a Java language binding app on 64-bit Solaris SPARC, you will need to run the VM with the -d64 option. Otherwise, Solaris will complain about an ELF class mismatch.
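
When adding the library directory to LD_LIBRARY_PATH, it's worth avoiding a stray colon, since an empty entry in the path is treated as the current directory. The helper below is a sketch with a name I made up. It just builds the new value and leaves the export to you.

```shell
# Hypothetical helper, not part of Grid Engine.
# Prints a search path with directory $1 placed in front of the existing
# value $2, without a trailing ':' when the existing value is empty.
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}
```

In sh/ksh/bash you would use it as LD_LIBRARY_PATH=$(prepend_path "<install_dir>/lib/<arch>" "$LD_LIBRARY_PATH") followed by export LD_LIBRARY_PATH.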

For a cheap thrill, source the file <install_dir>/util/dl.csh (or dl.sh) and then run

    dl 2

before running your app. This turns on debugging, and lets you see everything that's happening under the covers. To turn it off again, run

    dl 0

That's really all there is to it. For specifics on writing DRMAA apps, see the above-mentioned Howto section of the Grid Engine website. For more information about installing or managing Grid Engine, see docs.sun.com.
