DRMAA and the shared.library.path

I noticed that multiple folks have found my blog in the last month because they were trying to figure out why the Grid Engine implementation of the DRMAA Java language binding is complaining about libdrmaa not being in the shared library path. I've probably answered this question indirectly in a previous post, but for the benefit of all those searchers, here it is in all its glory.

The Grid Engine DRMAA Java language binding implementation is written as a wrapper around the DRMAA C binding implementation. When the classloader loads the com.sun.grid.drmaa.SessionImpl class (which will happen when you call org.ggf.drmaa.SessionFactory.getSession() for the first time), the SessionImpl class will attempt to load the DRMAA C binding's shared library, otherwise known as libdrmaa. In order to find libdrmaa, the Java virtual machine must know that it should look in the $SGE_ROOT/lib/$ARC directory. ($SGE_ROOT is where Grid Engine is installed. $ARC is the name of your host's architecture, which can be determined by running $SGE_ROOT/util/arch.)
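To make the failure mode concrete, here is a minimal sketch of the general pattern a JNI wrapper follows when it pulls in its native library. This is not the actual SessionImpl source; the class name NativeLoadSketch and the tryLoad helper are made up for illustration. The only real APIs used are System.loadLibrary and System.mapLibraryName.

```java
final class NativeLoadSketch {
    /** Returns true if the JVM could find and load the library, false otherwise. */
    static boolean tryLoad(String shortName) {
        try {
            // The short name "drmaa" is mapped to a platform file name,
            // such as libdrmaa.so on Solaris and Linux.
            System.loadLibrary(shortName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // This is the error people see when libdrmaa cannot be found.
            System.err.println("Could not load "
                    + System.mapLibraryName(shortName) + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "drmaa";
        System.out.println(tryLoad(name) ? "loaded" : "not loaded");
    }
}
```

Running this with no arguments on a machine where the Grid Engine lib directory is not on the search path should report that the library was not loaded, along with the linker's message.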

There are two ways for the Java virtual machine to know to look in the Grid Engine lib directory. The first is to include the lib directory in the parent shell's shared library path environment variable. On Solaris, that's $LD_LIBRARY_PATH (or $LD_LIBRARY_PATH_64). On some other platforms, it's $LIBPATH or $SHLIB_PATH.

With Grid Engine 6.0, when you source the settings file ($SGE_ROOT/$SGE_CELL/common/settings.[c]sh), your shared library path is automatically modified to include the Grid Engine lib directory. With 6.1 on platforms other than Solaris and Linux, that's also true. With 6.1 on Solaris and Linux, the settings file no longer sets the shared library path. Instead, the Grid Engine binaries are compiled in such a manner that they can determine from their own paths what the path to the lib directory is. If you're using 6.1 on Solaris or Linux, in addition to sourcing the settings file, you will also have to set the shared library path yourself to include the Grid Engine lib directory, or use the second method I talk about below.

Note that this 6.1 shared library path change also affects DRMAA applications written in C. Unless a DRMAA application written in C expects to be installed in the Grid Engine root directory (and hence was compiled to know how to find the lib directory), it will require that the user explicitly set the shared library path, just like DRMAA applications written for the Java platform. (The same thing applies to the Perl, Python, and Ruby bindings.)
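If you'd rather not touch the shell's environment at all, one option is a small launcher that sets the shared library path only for the child process. What follows is a rough sketch under some assumptions: the class name LaunchWithLibPath and its prepend helper are hypothetical, the variable is assumed to be LD_LIBRARY_PATH (adjust for your platform), and the architecture string is passed in as the first argument rather than computed.

```java
import java.io.File;
import java.util.Map;

final class LaunchWithLibPath {
    /** Puts dir in front of an existing path list, which may be null or empty. */
    static String prepend(String dir, String existing) {
        return (existing == null || existing.isEmpty())
                ? dir
                : dir + File.pathSeparator + existing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: LaunchWithLibPath <arch> <classpath> <main-class>");
            System.exit(2);
        }
        // args[0] is assumed to be the output of $SGE_ROOT/util/arch.
        String libDir = System.getenv("SGE_ROOT") + "/lib/" + args[0];

        ProcessBuilder pb = new ProcessBuilder("java", "-cp", args[1], args[2]);
        Map<String, String> env = pb.environment();
        // Assumption: LD_LIBRARY_PATH is the right variable on this platform.
        env.put("LD_LIBRARY_PATH", prepend(libDir, env.get("LD_LIBRARY_PATH")));

        // Share stdin/stdout/stderr with the child and pass on its exit code.
        System.exit(pb.inheritIO().start().waitFor());
    }
}
```

Because the change is made on the child's copy of the environment, nothing else launched from the same shell is affected.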

The other way to tell the Java virtual machine how to find the Grid Engine lib directory is to pass in the information via the java.library.path system property. To use this method, add the following to the options you pass to the Java virtual machine: -Djava.library.path=$SGE_ROOT/lib/$ARC, where $SGE_ROOT and $ARC are as defined above. This method is probably the simpler and less invasive of the two, but it must be applied every time the Java virtual machine is launched. If you use the shared library path method, you set it once, and it applies to all Java virtual machines launched from that shell. The downside, of course, is that setting the shared library path for DRMAA may adversely affect other applications with their own expectations for what should be in the shared library path.
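Whichever method you pick, it helps to see where the Java virtual machine is actually looking. The JVM resolves System.loadLibrary against the directories listed in its java.library.path system property, which on Unix-like platforms is seeded from the shared library path environment variable. Here is a small diagnostic sketch that lists each directory and says whether libdrmaa is there. The class name LibraryPathCheck and the candidates helper are hypothetical names of my own.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class LibraryPathCheck {
    /** Lists the files the JVM would look for, one per directory in the path. */
    static List<File> candidates(String libraryPath, String shortName) {
        String fileName = System.mapLibraryName(shortName);
        List<File> result = new ArrayList<>();
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        for (File f : candidates(path, "drmaa")) {
            System.out.println((f.isFile() ? "found    " : "missing  ") + f);
        }
    }
}
```

If every line says "missing", the Grid Engine lib directory never made it into the search path, and one of the two methods above still needs to be applied.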

While we're talking about issues caused by libdrmaa in DRMAA applications written for the Java platform, we should also talk about 32-bit versus 64-bit. The Java virtual machine can only load libraries that were compiled for the same data model as the virtual machine itself. A 32-bit Java virtual machine can only load 32-bit libraries, and a 64-bit virtual machine can only load 64-bit libraries. The problem is that by default, the Grid Engine binaries that folks download for Solaris are 64-bit, while the Java virtual machine that runs by default is 32-bit. Again, there are two solutions to this problem. The better solution is to download and install the 32-bit Solaris binaries for Grid Engine. That works with Grid Engine 6.0. It also works for 6.1 on AMD64; just download the x86 binaries. If you're using 6.1 on SPARC, though, you're saved the trouble because the 32-bit libdrmaa is included with the 64-bit binaries.

The other option is to download and install the 64-bit Java virtual machine and run your app with the -d64 switch. That works in all cases, but it means that your application will be running in 64-bit mode, which is generally slower and has a bigger memory footprint than running in 32-bit mode.
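If you're not sure which flavor of libdrmaa you have, you can compare it against the virtual machine directly. The sketch below assumes the library is an ELF file (as on Solaris and Linux), where the fifth byte of the header is 1 for 32-bit and 2 for 64-bit. It also reads sun.arch.data.model, which is a vendor-specific property of Sun-derived virtual machines rather than a standard one. The class name BitnessCheck and the elfBits helper are hypothetical.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

final class BitnessCheck {
    /** Returns 32 or 64 for an ELF header, or -1 if it is not recognizable. */
    static int elfBits(byte[] header) {
        if (header.length < 5 || header[0] != 0x7F
                || header[1] != 'E' || header[2] != 'L' || header[3] != 'F') {
            return -1; // not an ELF file
        }
        switch (header[4]) { // EI_CLASS byte
            case 1:  return 32;
            case 2:  return 64;
            default: return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: BitnessCheck <path-to-libdrmaa>");
            System.exit(2);
        }
        byte[] header = new byte[5];
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(header); // fails with EOFException on a too-short file
        }
        System.out.println("library data model: " + elfBits(header));
        // Vendor-specific property; may be absent on other virtual machines.
        System.out.println("JVM data model:     "
                + System.getProperty("sun.arch.data.model", "unknown"));
    }
}
```

If the two numbers differ, the virtual machine will refuse to load the library no matter how the library path is set.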

These native problems are rather annoying. The better option would be to have a DRMAA Java language binding written in pure Java. We're talking about it, but don't expect one any time soon.

