By templedf on Jul 16, 2008
One of the questions that comes up often in Grid Engine land is, "Why should I upgrade?" Now that 6.2 is almost ready, this seems like a good time to provide a clear and concise answer.
Why upgrade to Grid Engine 6.2?
The watchword for 6.2 is scalability. If you're running a large (multi-thousand host) cluster, you really want to be running 6.2. A lot has been done to address scalability in large clusters. Advance reservation is another headliner: 6.2 offers you the ability to reserve a set of resources at a specific time. The other big-ticket item for 6.2 is multi-clustering. Using a feature-limited release of Project Hedeby (AKA Haithabu, Service Domain Manager (SDM)), Grid Engine 6.2 lets you set up several independent Grid Engine 6.2 clusters that are able to share resources. As one cluster gets overloaded while other clusters are idle, resources will automatically be migrated from the underused clusters to the overloaded cluster.
Here's the complete feature list:
- Scalability to 63,000 cores
- Streamlined communications between qmaster and execution daemons
- The scheduler is no longer a separate process and is now a thread in the qmaster
- More efficient resource matching process in the scheduler
- Reduced qmaster startup time
- Reduced qmaster memory requirements for large clusters
- ARCo scalability improvements — faster DBWriter and faster queries
- Advance reservation — reserve resources for a given period of time. qsub now lets you submit jobs into a pre-existing reservation
- New interactive job support — with 6.2, you can now configure interactive jobs (and hence parallel slave tasks) to communicate with the client through the existing Grid Engine communications channels, instead of having to fork off an rsh/rshd (or ssh/sshd, telnet/telnetd, etc.) pair
- Administration improvements
- ARCo installation documentation is much better
- Support for Solaris SMF (in addition to traditional rc scripts)
- Support for Sun Service Tags on Solaris and Linux
- JMX interface for the qmaster — the qmaster now offers a JMX management interface that enables the complete set of Grid Engine management operations. The API is, however, unstable and will change, probably significantly
- Project Hedeby will enable the automatic migration of resources from underloaded clusters to overloaded clusters. Service Level Objects configured for each cluster determine the boundaries of overloaded and underloaded, and policies govern the relative importance of the clusters.
- ARCo now supports multiple clusters in the same database using the same web interface
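To give a taste of the new advance reservation support, the workflow looks roughly like this. The flags shown are illustrative, so check the 6.2 man pages for the exact syntax:

```shell
# Reserve 4 slots in the "make" PE for one hour, starting Dec 1 at noon
# (start times use Grid Engine's [[CC]YY]MMDDhhmm format)
qrsub -a 12011200 -d 3600 -pe make 4

# qrsub reports the id of the granted reservation, e.g. 23.
# Jobs can then be submitted into that reservation:
qsub -ar 23 -pe make 4 my_job.sh

qrstat      # list current advance reservations
qrdel 23    # delete the reservation when it's no longer needed
```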
What was introduced with Grid Engine 6.1?
The two big wins for 6.1 are resource quota sets and boolean expressions. Both go a long way towards simplifying the administrator's life and present a compelling reason to upgrade from earlier releases all by themselves. The rest of the lesser 6.1 features are also largely targeted at improving the administration experience.
Here's the complete feature list:
- Resource quota sets (RQS) — allows the administrator to define fine-grained limits over which users, projects, and/or groups can use what resources on what hosts, queues, and/or PEs. Much of what RQS provides you was previously only possible with large numbers of special-purpose queues
- Boolean expressions — prior to 6.1, a resource request could use logical OR, and multiple requests were treated as a logical AND. 6.1 understands full boolean expressions, including logical OR, AND, NOT, and grouping. For example, "-l arch=sol-*&!(*-sparc*|*64)". What's even better is that boolean expressions are understood by any command that handles resource requests, such as qhost and qstat: "qstat -f -q '(prod-*|test-*)&!*-ny'"
- Shared library path is "fixed" — with 6.1, the shared library path is no longer set by the settings file for Solaris and Linux hosts. Previously, sourcing the settings file would prepend the Grid Engine library directory to the shared library path, which could cause conflicts with applications that use local BDB or OpenSSL libraries. Unfortunately, that fix means that users of DRMAA applications must now explicitly add the Grid Engine library path to their shared library paths in order for DRMAA to work. (The Grid Engine binaries now use the compiled-in run path to find the Grid Engine libraries, so they don't need the shared library path. External DRMAA applications, on the other hand, are rarely able to use the same trick.)
- -wd for qsub, qrsh, qsh, qalter, and qmon — allows you to specify the working directory. -cwd is effectively aliased to "-wd $CWD". (That means that if you include both in the same command, the one given later overrides the earlier, as if they were both the same kind of switch.)
- -xml for qhost — prints output in XML instead of formatted text
- Source-level* SSH tight integration
- MySQL support for ARCo
- OS Support
- Support for MacOS X on Intel, Linux on IA64, FreeBSD (source-level* only), and native 64-bit HP-UX 11
- Solaris DTrace script — allows you to see potential bottlenecks in the master and scheduler using Solaris DTrace
- Online job usage information for MacOS X, AIX, and HP-UX
- Built-in resource data collection on AIX — previously required an extra load sensor script to be configured
- DRMAA 1.0 for C and Java languages
- JGDI early access — Java language API for Grid Engine management operations. Very unstable. This API becomes the JMX interface in 6.2
- ARCo correctly accounts daily usage of long-running jobs — before 6.1u3, a long-running job did not update the accounting database until it was done, meaning that a job that takes 3 months to complete would have zero resource usage in the accounting database until it completed, which could cause accounting errors in daily, weekly, or even monthly reports. With 6.1u3, the accounting database is updated with resource usage information for long-running jobs on a daily basis.
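To give a flavor of resource quota sets, here's a minimal example rule that limits every user to 20 slots cluster-wide. Treat the names and values as a sketch rather than a verified configuration; the rule syntax is documented in the sge_resource_quota man page:

```
{
   name         max_slots_per_user
   description  Limit every user to 20 slots across the whole cluster
   enabled      TRUE
   limit        users {*} to slots=20
}
```

Before 6.1, enforcing a limit like this typically meant carving the cluster into special-purpose queues; with RQS it's a single declarative rule.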
*Source-level support — some features are included only if you build the binaries yourself. Those features are considered "source-level".
What changed between Grid Engine 5.x and Grid Engine 6.0?
Grid Engine 6.0 was a huge step forward technologically from 5.3. 6.0 introduced cluster queues, ARCo, the Windows port, the multi-threaded qmaster, BDB, XML output, DRMAA, and much more. The gap between 5.3 and 6.0 is so large that there really isn't a question of whether to upgrade. There is almost no use case that wouldn't benefit significantly from upgrading from 5.x to 6.x.
Below is the feature list, but it may be incomplete. I'm reconstructing this one from memory. As I find errors and omissions, I will correct them. (Let me know if you find any!)
- Cluster queues — prior to 6.0, a queue could only be on a single host. 6.0 made it possible for a single queue to span multiple hosts, greatly reducing administrator burden
- Accounting and Reporting Console — web-based front-end for an accounting database derived from the Grid Engine accounting file (also new with 6.0). ARCo makes it possible for an administrator to create canned queries for generating usage reports. ARCo was originally only available in the N1 Grid Engine product, but was released into open source with 6.0u8
- Windows port — a port of the execution daemon and shepherd to Microsoft SFU (now known as SUA). Originally released only in the N1 Grid Engine 6.0u4 product, the Windows port still hasn't made it into the open source, but it will soon
- Multi-threaded qmaster daemon — prior to 6.0 the qmaster was a single-threaded loop, meaning that a large influx of jobs could cause the qmaster to think its execution daemons had died. With 6.0, the qmaster is multi-threaded, freeing it from the constraints of a single giant control loop, and laying the foundation for significant scalability improvements
- -xml for qstat — qstat prints output in XML instead of formatted text. Introduced in 6.0u2
- DRMAA 0.97 C language binding — updated to 1.0 in 6.0u8
- DRMAA 0.5 Java language binding — introduced in 6.0u4. Updated to 1.0 in 6.0u8
- qsub -sync — with -sync y, qsub waits for the job to finish before returning, which makes scripting against job completion much easier
- Berkeley Database — 6.0 added both local and remote Berkeley database servers as spooling options instead of just flat files
- New communications library — before 6.0, communications were handled by a separate single-threaded daemon called the commd. With 6.0, every daemon has its own built-in multi-threaded communications channel. The commd is retired
- Automated installer — 6.0 adds a -auto switch to inst_sge that reads a config file and installs a cluster in a non-interactive mode. If remote access is properly configured, the auto installer can also install execution daemons on remote machines
- Backslash line continuation — with 6.0, configuration files can use a backslash to continue an entry on the next line. The SGE_SINGLE_LINE environment variable disables this behavior to ease scripting
- Resource reservation — 6.0u4 added resource reservation to prevent large jobs from being starved by smaller jobs. With resource reservation, a large job is able to collect resources until it has enough to run. While waiting for all needed resources to become available, idle resources may be backfilled with short jobs
- qping — on the surface, it's a utility to tell if your Grid Engine daemons are still alive, but if you dig a little deeper, you'll discover that it can also be used to profile threads in the qmaster and debug communications traffic
- qsub -shell — allows you to control whether Grid Engine will start a shell to start your job. The default is "yes". The alternative is to have Grid Engine execute your job directly, which has implications on environment variable interpretation and error conditions
- backup/restore — with 6.0, the inst_sge script can be used to backup your cluster's configuration and state data and restore it later
- target-specific qmake resource requests — with 6.0 it's possible to specify the resources to be requested by qmake jobs on a per-target basis
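A couple of the qsub options above are easiest to understand from examples. These are illustrative invocations, not tested commands:

```shell
# -shell n skips the wrapper shell; -b y tells Grid Engine the job is a
# binary to execute directly rather than a script to transfer
qsub -b y -shell n /bin/hostname

# -sync y makes qsub block until the job finishes, with qsub's exit
# status reflecting the job's outcome, which is handy in scripts
qsub -sync y my_job.sh && echo "job succeeded"
```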