Wednesday Jun 27, 2007

Announcing Grid Engine 6.0 Update 11

We've just announced the Grid Engine 6.0 Update 11 release. This update includes multiple CLI fixes, a fix for a nasty scheduler memory leak, and a new qsub option, -terse, that makes qsub print only the ID of the submitted job. See the announcement for the complete list.
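Because -terse reduces qsub's output to the job ID, a script that submits jobs can capture the ID directly instead of parsing the longer confirmation message. Here is a minimal Python sketch, assuming qsub is on the PATH of the calling process. The helper names submit_job and parse_terse_output are hypothetical and are not part of Grid Engine.

```python
import subprocess


def parse_terse_output(stdout: str) -> str:
    """Hypothetical helper: return the job ID printed by `qsub -terse`."""
    job_id = stdout.strip()  # -terse prints just the ID plus a newline
    if not job_id:
        raise ValueError("qsub -terse printed no job ID")
    return job_id


def submit_job(script_path: str) -> str:
    """Hypothetical helper: submit a job script and return its job ID."""
    result = subprocess.run(
        ["qsub", "-terse", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if qsub exits non-zero
    )
    return parse_terse_output(result.stdout)
```

For example, job_id = submit_job("myjob.sh") leaves only the ID in job_id, ready to pass to qstat -j or qdel. Keeping the parsing in its own function makes it testable without a running grid.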

Monday Apr 16, 2007

Using the DRMAA Perl Binding

I'm flying out to a customer site this week to help with a trial integration between their application and Tim Harsch's DRMAA Perl binding for Grid Engine. This trip finally gave me an excuse to pull down Tim's binding and play with it a little. What follows is my helpful guide to the DRMAA Perl binding.

First thing you have to do is get a copy of Tim's module. You can find it at CPAN as Schedule-DRMAAc-0.81; the download link is in the upper right corner of the page. Once you've downloaded the module, you'll have to build it. Tim was nice enough to include the pregenerated SWIG file, so you don't need to have SWIG, but you will need a Perl interpreter.

In the module directory, there's a README that details the steps for building the module. Below is my annotated version of those steps:

  1. Source /default/common/settings.csh or /default/common/
  2. If you're using Solaris, determine whether your Perl binary is 32-bit or 64-bit. The easiest way is to run file `which perl`, assuming Perl is in your path.
    1. If it's 64-bit and you're running Grid Engine 6.0, you're fine
    2. If it's 64-bit and you're running Grid Engine 6.1, you need to set your library path to include $SGE_ROOT/lib/`$SGE_ROOT/util/arch`
    3. If it's 32-bit and you're on an AMD machine, you need to set your library path to start with $SGE_ROOT/lib/sol-x86. Note that you will probably have to download the x86 Grid Engine binaries and install them.
    4. If it's 32-bit and you're on a SPARC machine, you need to set your library path to start with $SGE_ROOT/lib/sol-sparc. Note that you will probably have to download the 32-bit SPARC Grid Engine binaries and install them.
  3. Make a local link to the DRMAA header file: ln -s $SGE_ROOT/include/drmaa.h
  4. Create the makefile: perl Makefile.PL
  5. Build the module: make
  6. Test the module: make test. Make sure that your Grid Engine grid is running before doing this step.
  7. Install the module as root: make install
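The Solaris library-path decision in step 2 can be written out as a small function. This is a hypothetical Python sketch, not part of Tim's module: the function and parameter names are made up, native_arch stands in for whatever $SGE_ROOT/util/arch prints on your machine, and the returned directory is meant to be added to the front of your library path (for example, LD_LIBRARY_PATH).

```python
from typing import Optional


def solaris_lib_dir(sge_root: str, perl_bits: int, cpu: str,
                    sge_version: str, native_arch: str) -> Optional[str]:
    """Hypothetical helper: pick the DRMAA library directory for step 2.

    Returns None when no library path change is needed.
    """
    if perl_bits == 64:
        if sge_version.startswith("6.0"):
            return None  # 64-bit Perl with 6.0: nothing to set
        if sge_version.startswith("6.1"):
            return f"{sge_root}/lib/{native_arch}"  # lib/<output of util/arch>
    elif perl_bits == 32:
        if cpu == "amd":
            return f"{sge_root}/lib/sol-x86"  # needs the x86 binaries installed
        if cpu == "sparc":
            return f"{sge_root}/lib/sol-sparc"  # needs the 32-bit SPARC binaries
    raise ValueError(f"case not covered: {perl_bits}-bit Perl, {cpu}, SGE {sge_version}")


def prepend_path(new_dir: str, existing: str) -> str:
    """Put new_dir at the front of a colon-separated path list."""
    return new_dir if not existing else f"{new_dir}:{existing}"
```

Raising an error for combinations the steps don't cover is deliberate: guessing a directory would only move the failure to make test.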

Once you have the module installed, you're ready to give it a try. In the module directory you'll find an examples subdirectory. Try running one or two of these scripts. If you're successful, you're ready to start writing code! Documentation for the module is installed in the Perl man directory when you install the module. To view the module docs, run man -M /usr/perl5/man Schedule::DRMAAc. (On my machine it's actually /usr/perl5/5.8.4/man.) As the docs are a little hard to read, you might want to also check out the C binding howto. It's for C, not Perl, but the Perl binding is a wrapper around the C binding, so all the same rules still apply. In any case, there should be enough examples in the examples directory and the C binding howto to get you started with whatever project you have planned.

Friday Apr 13, 2007

Enabling Debugging Output

Since it came up recently on the Grid Engine mailing list, let's talk about how to get debugging output from Grid Engine. It's often useful to know what's going on behind the scenes. For example, I occasionally run into a problem where the qmaster will crash immediately after starting. Turning on debugging output enables me to see exactly why it's crashing. (Usually it's something dumb that I did.)

The first step to turning on the debugging output is to source the $SGE_ROOT/util/ or $SGE_ROOT/util/dl.csh file. Once you've sourced one of these files, you can set the debug level using the dl command. dl takes one argument, the debug level. There are ten preconfigured debug levels. Each level represents a combination of layers and classes.

There are 8 debug layers in Grid Engine. They are:

  1. Top -- this is where general debugging information lives
  2. CULL -- debugging information specifically related to the Common User Linked List routines
  3. Basis -- I think this is intended for utility operation debugging information; currently only the JGDI makes much use of it
  4. GUI -- debugging information from qmon
  5. Unused -- not surprisingly, it's unused
  6. Commd -- essentially unused; this was for the commd from 5.3, but the commd replacement in 6.0, the comm lib, has its own multi-threaded logging facility
  7. GDI -- debugging information specifically related to the Grid Database Interface, the protocol that the qmaster speaks
  8. Pack -- debugging information about the packing and unpacking of data for network communications

Grid Engine also has 8 debugging classes. They are:

  1. Trace -- shows information about entering and exiting functions
  2. Info -- general debugging information
  3. Job trace -- apparently unused
  4. Special -- apparently unused
  5. Timing -- used to report job start time in the execd
  6. Lock -- used by the locking library to output lock information
  7. Free Y -- apparently unused
  8. Free Z -- apparently unused

As you can see, many of the debugging classes are unused. The Grid Engine debugging mechanism is intended mostly as a tool for the Grid Engine developers. The unused classes provide places for developers to put temporary debugging output during product development.

Setting the debug level for Grid Engine means assigning a class to each enabled debugging layer. The 10 predefined debugging levels are:

  1. Top = Info
  2. Top = Trace + Info
  3. Top + CULL + GDI = Info
  4. Top + CULL + GDI = Trace + Info
  5. Top + GUI + GDI = Info
  6. Top + CULL + Basis + Commd + GDI = Lock
  7. Unused = Trace + Info
  8. Top + Commd = Info
  9. Top + Commd = Trace + Info
  10. Top + CULL + Basis + Pack = Trace + Info

For general configuration debugging, levels 1, 3, and 5 are the most useful. Very experienced admins might be able to make good use of levels 2 and 4. Level 6 is only useful for debugging deadlocks and bottlenecks caused by locking. Level 7 is only for developers and has no effect by default. Because the Commd layer isn't really used anymore, levels 8 and 9 are essentially the same as 1 and 2. Level 10 can occasionally be useful, but it's mainly a developer tool.

The actual effect of the dl command is to set the SGE_DEBUG_LEVEL environment variable. Its value is a list with one entry per layer, in the order listed above, where each entry is the sum of the class values enabled for that layer. Info has the value 2 and Trace has the value 1, so Trace + Info appears as 3. For example, dl 1 sets SGE_DEBUG_LEVEL to "2 0 0 0 0 0 0 0". (A value of 0 means that debugging for that layer is not enabled.) dl 4 results in SGE_DEBUG_LEVEL="3 3 0 0 0 0 3 0". Setting a debug level greater than 0 also sets the SGE_ND environment variable to "true". When SGE_ND is true, the Grid Engine daemons (sge_qmaster, sge_schedd, sge_execd, and sge_shadowd) won't daemonize. That means they won't detach into the background, so you can see the debugging output instead of having it sent to /dev/null.
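The sum-per-layer rule can be checked against the table of predefined levels with a short Python sketch. This is not the dl script itself; it simply encodes the layer order and level table from this post. Trace = 1 and Info = 2 follow from the dl examples, but treating the remaining classes as successive powers of two (which puts Lock at 32) is an assumption.

```python
# Debug layers in the order listed above: one SGE_DEBUG_LEVEL entry per layer.
LAYERS = ["Top", "CULL", "Basis", "GUI", "Unused", "Commd", "GDI", "Pack"]

# Trace = 1 and Info = 2 match the dl examples; the other powers of two
# (including Lock = 32) are an assumption.
CLASSES = {
    "Trace": 1, "Info": 2, "Job trace": 4, "Special": 8,
    "Timing": 16, "Lock": 32, "Free Y": 64, "Free Z": 128,
}

# The ten predefined levels: (enabled layers, classes assigned to each of them).
LEVELS = {
    1: (["Top"], ["Info"]),
    2: (["Top"], ["Trace", "Info"]),
    3: (["Top", "CULL", "GDI"], ["Info"]),
    4: (["Top", "CULL", "GDI"], ["Trace", "Info"]),
    5: (["Top", "GUI", "GDI"], ["Info"]),
    6: (["Top", "CULL", "Basis", "Commd", "GDI"], ["Lock"]),
    7: (["Unused"], ["Trace", "Info"]),
    8: (["Top", "Commd"], ["Info"]),
    9: (["Top", "Commd"], ["Trace", "Info"]),
    10: (["Top", "CULL", "Basis", "Pack"], ["Trace", "Info"]),
}


def sge_debug_level(level: int) -> str:
    """Build the SGE_DEBUG_LEVEL string for one of the predefined levels."""
    layers, classes = LEVELS[level]
    value = sum(CLASSES[c] for c in classes)  # sum of the class settings
    return " ".join(str(value) if layer in layers else "0" for layer in LAYERS)
```

For instance, sge_debug_level(1) returns "2 0 0 0 0 0 0 0", which matches the dl 1 example.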

To turn off debugging, set the debug level to 0. Setting the debug level to 0 clears the SGE_DEBUG_LEVEL and SGE_ND environment variables.
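To confirm what the current environment turns on, or that dl 0 really switched debugging off, the variables can be decoded back into layers and classes. A minimal Python sketch with hypothetical function names; as elsewhere, only Trace = 1 and Info = 2 are confirmed by the examples above, and the other class values are assumed to be successive powers of two.

```python
from typing import Dict, List, Mapping

LAYERS = ["Top", "CULL", "Basis", "GUI", "Unused", "Commd", "GDI", "Pack"]
# Class bit values; everything past Info is an assumption.
CLASS_BITS = [("Trace", 1), ("Info", 2), ("Job trace", 4), ("Special", 8),
              ("Timing", 16), ("Lock", 32), ("Free Y", 64), ("Free Z", 128)]


def describe_debug_env(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each enabled layer to its enabled classes."""
    raw = env.get("SGE_DEBUG_LEVEL")
    if not raw:
        return {}  # dl 0 clears the variable, so nothing is enabled
    result: Dict[str, List[str]] = {}
    for layer, field in zip(LAYERS, raw.split()):
        value = int(field)
        if value:  # 0 means debugging is off for this layer
            result[layer] = [name for name, bit in CLASS_BITS if value & bit]
    return result


def daemons_stay_in_foreground(env: Mapping[str, str]) -> bool:
    """Hypothetical helper: True when SGE_ND is "true" (no daemonizing)."""
    return env.get("SGE_ND") == "true"
```

Passing os.environ to either function inspects the current shell's settings.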

Thursday Apr 12, 2007

Behavior Change in 6.1

With the 6.1 version of Grid Engine, the default behavior of the qstat command will change. (The 6.1 beta version is available, but the official beta period is now over.) Prior to 6.1, when a user issues the qstat command without the -u option, information on all users' jobs is reported. This behavior is undesirable for two reasons. 1) It means that in a busy grid, the user may have to sort through hundreds or thousands of jobs that aren't his to try to find the ones that are. 2) It means that the qmaster has to do all of the work required to report information on all users' jobs. Using the -u option, a user can limit the job information reported to jobs belonging to a specified list of users. The most common case is for a user to request that only her jobs be reported.

We have come to the conclusion that it would be best for everyone involved if we made the common case the default case. It means less hassle for the users and less unnecessary load on the qmaster. So, starting with 6.1, if the qstat command is run without the -u option, it will behave as though there is an implicit -u $USER, meaning that only the user's own jobs will be reported. Users who prefer the 6.0-style behavior can call qstat with -u "*" or put -u "*" in their sge_qstat files. (The quotes around the * are there to keep the shell from interpreting it; -u \* would work just as well.) Administrators who want the 6.0-style behavior to remain the default can add -u * to the global sge_qstat file.
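Python's shlex module splits strings using POSIX shell quoting rules, so it can show that both spellings hand qstat a literal * rather than a list of file names. The effective_user_filter function below is a hypothetical model of the 6.1 default described above, not qstat's actual code.

```python
import shlex
from typing import List


def effective_user_filter(args: List[str], current_user: str) -> List[str]:
    """Hypothetical model: users whose jobs qstat reports under the 6.1 default.

    Without -u, qstat acts as if -u $USER were given; "*" means all users.
    """
    if "-u" in args:
        i = args.index("-u")
        if i + 1 >= len(args):
            raise ValueError("-u needs a user list")
        return args[i + 1].split(",")  # -u takes a comma-separated user list
    return [current_user]  # the implicit -u $USER


# Both quoting styles reach qstat as the same argument.
quoted = shlex.split('qstat -u "*"')
escaped = shlex.split(r"qstat -u \*")
```

Here quoted and escaped are both ["qstat", "-u", "*"], and effective_user_filter(quoted[1:], "alice") returns ["*"], while a bare qstat yields ["alice"].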

Wednesday Apr 11, 2007

Getting the Band Back Together

Believe it or not, we're finally going to hold another Grid Engine Workshop! The last one we had was in 2003, so as John points out, there's a lot to talk about. The workshop will be held September 10th-12th in Regensburg, Germany. There isn't yet an official information site or registration form, but you can expect those to show up soon. Until then, save the date, start buttering up your manager, and brush up on your Bavarian!

In case you're wondering, the workshop will fall 3 weeks after Gäubodenfest in Straubing, 3 weeks before Oktoberfest in Munich, and 1 day after the Regensburger Herbstdult. The timing is unfortunate, but since Regensburg was named a UNESCO World Heritage Site, it's been a little difficult to find a time when it's possible to secure enough hotel rooms to hold a conference!

Wednesday Apr 04, 2007

PE Tight Integration

While the topic of integration of parallel environments with Grid Engine is still fresh, there's one other topic I'd like to cover. What is a tight integration, and how is it different from a loose integration?

Let's start with how a parallel job is started.
Step 1, the scheduler sends the qmaster a set of orders, saying where to put the master task and where to put the slave tasks. The master task is the one that runs the job script. (I say script because in the vast majority of cases, a parallel job will be a script. It is, however, theoretically possible for it to be a binary.)
Step 2, the qmaster sends the master task to its destination execution daemon, just like with a non-parallel job, but it also reserves the job slots on the destination execution daemons for the slave tasks. Notice that I said "reserves slots," not "starts." The qmaster does not actually start any of the slave tasks. See step 3.3.
Step 3, the execution daemon starts the parallel job on the master node.
Step 3.1, the execution daemon on the master node runs the parallel environment startup script. This script prepares the parallel environment for running the master task. Among other things, this script creates a file that lists the job slots to be used for the slave tasks.
Step 3.2, the execution daemon runs the job script as the master task.
Step 3.3, the master task starts the parallel environment for the job. This step is different from step 3.1. Step 3.1 prepares the parallel environment, but it doesn't necessarily start any processes. Step 3.3 is where the parallel environment is actually run, such as running mpirun for an MPI integration.
Step 3.4, the parallel environment connects to the slave nodes and starts the slave tasks.
Step 4, after the job finishes, the execution daemon on the master node runs the parallel environment shutdown script.
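Step 3.1 is where many integrations turn Grid Engine's list of granted slots into the host list that the parallel environment expects. The following is a minimal Python sketch of that conversion. It assumes an input with one line per host whose first two whitespace-separated fields are the host name and its slot count (the layout of Grid Engine's $PE_HOSTFILE) and an output with one host name per slot, as in an MPICH-style machines file. The function names are hypothetical.

```python
from typing import List


def machines_from_hostfile(hostfile_text: str) -> List[str]:
    """Hypothetical helper: expand 'host slots ...' lines into one entry per slot."""
    machines: List[str] = []
    for line in hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        machines.extend([host] * slots)
    return machines


def write_machines_file(hostfile_text: str, path: str) -> None:
    """Hypothetical helper: write the expanded list, one host per line."""
    with open(path, "w") as out:
        for host in machines_from_hostfile(hostfile_text):
            out.write(host + "\n")
```

The master task in step 3.3 would then pass the written file to the parallel environment's launcher, for example to mpirun.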

The above process applies to both loosely and tightly integrated parallel environments. The difference between loose and tight integration is how the slave tasks get started. In a loose integration, the parallel environment uses some out-of-band method to connect to the slave nodes and start the slave tasks. This method gives the parallel environment a great deal of freedom in how it starts the slave tasks, but it means that the slave tasks are running outside of the scope of Grid Engine. Because the slave tasks are run outside of Grid Engine, the qmaster has no way to track the resource usage of slave tasks in loosely integrated parallel environments. Only the resource usage of the master task can be tracked.

In a tightly integrated parallel environment, the slave tasks are started through qrsh -inherit. The -inherit switch is a special qrsh switch that is used only with slave tasks in tightly integrated parallel environments. A job submitted this way actually bypasses the scheduler completely and is sent directly to the target execution daemon. As a security precaution, execution daemons deny such job submissions by default. In step 2, when the qmaster reserves the slave nodes for a parallel job in a tightly integrated parallel environment, it tells the execution daemons to expect the qrsh -inherit jobs and not to deny them. Because the slave tasks are run through Grid Engine, the qmaster is able to track the tasks' resource usage, the same as with any other kind of job. A common trick to make the implementation of the integration easier is to provide an rsh wrapper that translates rsh calls into qrsh calls. That way, as long as the parallel environment naturally uses rsh to contact the slave nodes, the tight integration will work automatically.
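The core of such an rsh wrapper is an argument translation: take what the parallel environment passed to rsh and build the equivalent qrsh -inherit command line. A minimal Python sketch, assuming the parallel environment only calls rsh in the form rsh [-n] host command [args...]. Mapping rsh's -n to qrsh's -nostdin is a design choice of this sketch, and rsh_to_qrsh is a hypothetical name.

```python
from typing import List


def rsh_to_qrsh(rsh_args: List[str]) -> List[str]:
    """Hypothetical helper: turn rsh arguments into a qrsh -inherit command line.

    Only rsh's -n option is handled (mapped to qrsh -nostdin); any other
    option is rejected rather than silently dropped.
    """
    qrsh = ["qrsh", "-inherit"]
    args = list(rsh_args)
    while args and args[0].startswith("-"):
        opt = args.pop(0)
        if opt == "-n":
            qrsh.append("-nostdin")
        else:
            raise ValueError(f"unsupported rsh option: {opt}")
    if not args:
        raise ValueError("missing host name")
    host, command = args[0], args[1:]
    return qrsh + [host] + command
```

A real wrapper would then replace itself with the translated command, for example with os.execvp(cmd[0], cmd), so that the parallel environment sees qrsh's exit status as if it came from rsh.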

Monday Apr 02, 2007

Comment Comments

We're trying hard to finally get the DRMAA 1.0 specification accepted as an Open Grid Forum recommendation. The acceptance process requires that we produce experience documents detailing implementations of the DRMAA 1.0 specification. We have written three such documents, and the steering group has finally posted them for public comment. If you're interested in DRMAA and/or the Grid Engine, Condor, and/or GridWay implementations, please read through the documents and provide comments. The public comment period will end on April 20th.

Tuesday Mar 27, 2007

Introduction to Grid Engine

I'm starting to experiment with video as a medium for delivering Grid Engine content to the masses. This video is my first. I would really appreciate it if you could leave a comment, either on this blog post or on the video's YouTube page, telling me what you think. Did you learn something? Were you bored? Do you want to see more? Thanks!



