Wednesday Jun 24, 2009

Sun Grid Engine 6.2u3 Released

Sun Grid Engine 6.2u3 is another product update, which means that besides bug fixes it also introduces several new features. But there is another very important difference from previous releases: the license has changed!

Without a valid Sun Grid Engine license, evaluation use is permitted for only 90 days. To keep using Grid Engine after that, you need to replace the Sun binaries with the unsupported courtesy binaries. However, the courtesy binaries do not include the new Amazon EC2 adapter or the SGE Inspect module.

New features in Sun Grid Engine 6.2u3:

Amazon EC2 Adapter

The Service Domain Manager (SDM) can now connect to Amazon Elastic Compute Cloud (EC2) and add execution hosts on demand.

Initial Power Saving Support

A new power saving scheme in SDM lets you create a special spare resource pool: systems are powered off when they are added to this pool and powered on again when they are removed from it.
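
The spare-pool idea can be pictured with a toy Python model, assuming the simplest possible policy: a host is switched off when it enters the pool and switched on when it leaves. The `Host` and `SparePool` classes are hypothetical and only illustrate the concept; this is not how SDM itself is configured or implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Host:
    """A hypothetical managed host with a simple power flag."""
    name: str
    powered_on: bool = True


@dataclass
class SparePool:
    """Toy power-saving spare pool (hypothetical, not the SDM API)."""
    hosts: Dict[str, Host] = field(default_factory=dict)

    def add(self, host: Host) -> None:
        # An idle host enters the spare pool: switch it off to save power.
        host.powered_on = False
        self.hosts[host.name] = host

    def remove(self, name: str) -> Host:
        # The host is needed again: take it out of the pool and power it on.
        host = self.hosts.pop(name)
        host.powered_on = True
        return host

    def powered_on_count(self) -> int:
        # Number of hosts in the pool that are still drawing power.
        return sum(h.powered_on for h in self.hosts.values())


pool = SparePool()
node = Host("node01")
pool.add(node)
print(node.powered_on)                   # False: parked hosts are switched off
print(pool.remove("node01").powered_on)  # True: the host is back in service
```

Keeping the power state on the host object, rather than in the pool, means a host taken out of the pool carries its "on" state with it into whatever resource it joins next.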

Service Domain Manager (SDM) Simple Install

It is now possible to install and run an SDM system with only one JVM per (managed or master) host. Previously, the system used up to three separate JVMs per host. This simplifies installation, configuration and maintenance.

SGE Inspect

The new Java-based Sun Grid Engine Inspect module lets you monitor SGE clusters and the Service Domain Manager (SDM). Its similarity to VisualVM or NetBeans is not coincidental.

Exclusive Host Scheduling

Exclusive host scheduling lets users request that a job or parallel task run exclusively on a host, provided an administrator has allowed it.
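
As a rough sketch of the user side, assume the administrator has defined a Boolean complex named `exclusive` with the `EXCL` relational operator (via `qconf -mc`) and attached it to the execution hosts; the complex name is an assumption chosen by the administrator, not something fixed by Grid Engine. The hypothetical helper `build_exclusive_qsub` below only builds a `qsub` command line using the standard `-l` and `-pe` options and does not submit anything; the parallel environment name `mpi` is also made up.

```python
import shlex
from typing import List, Optional


def build_exclusive_qsub(script: str,
                         pe_name: Optional[str] = None,
                         slots: int = 1,
                         complex_name: str = "exclusive") -> List[str]:
    """Build (but do not run) a qsub command that requests exclusive hosts.

    Assumes an admin-defined BOOL complex `complex_name` with relop EXCL
    exists and is attached to the execution hosts (hypothetical setup).
    """
    cmd = ["qsub", "-l", f"{complex_name}=true"]
    if pe_name is not None:
        # Parallel job: request exclusive use of the hosts it runs on.
        cmd += ["-pe", pe_name, str(slots)]
    cmd.append(script)
    return cmd


# "mpi" is a hypothetical parallel environment name.
print(shlex.join(build_exclusive_qsub("job.sh", pe_name="mpi", slots=8)))
# qsub -l exclusive=true -pe mpi 8 job.sh
```

Returning an argument list instead of a shell string keeps it safe to pass straight to `subprocess.run` on a submit host.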

Microsoft Windows Vista Display Support

The display_win_gui feature is now fully supported. It lets a job launch a GUI that displays job information on the currently visible desktop of the Windows host. This works only if the job is a native Windows application.

As always, you may test this release in parallel with your old cluster to try out the new features. Alternatively, you may use the upgrade procedure (clone configuration method) to install SGE 6.2u3 while keeping all your cluster configuration settings. See the upgrade video. Although the video shows an upgrade from 6.1 to 6.2, cloning any 6.0u2 or later configuration to 6.2u3 works the same way.

Additional links:

Download page
Wiki documentation
Release notes
Fixed bugs
Patch matrix

Monday Aug 04, 2008

Sun Grid Engine 6.2 Is Here

The new version, Sun Grid Engine 6.2, will be released tomorrow. Check out DanT's blog entry describing the new features.

Here's a quick list of new features in SGE 6.2:

  • Advance Reservation
  • Multi-Clustering with Service Domain Manager (SDM)
  • Scalability Improvements (Scheduler as a Thread, New Interactive Job Support, etc.)
  • Array Task Dependencies
  • Accounting and Reporting Console (ARCo) Improvements (Multi-Cluster support, DBwriter is now up to 10x faster)
  • Solaris Enhancements (Service Tags and SMF support)
  • New Upgrade Procedure

Since Dan has already described most of them, I'll just blog about the New Upgrade Procedure and SMF support today.

New Upgrade Procedure

The original upgrade procedure had many restrictions. The most troublesome, in my opinion, was that you couldn't simply leave the old cluster running and start a new SGE version with the same configuration in parallel. The new upgrade procedure solves this and many other issues.

You can now create a backup of the whole cluster configuration and later, at any time, restore it while the qmaster is running! The old upgrade procedure required shutting down the qmaster before the configuration could be loaded into the upgraded cluster.

Upgrading or updating to a newer version should now be easier than ever (hello, 6.0 users!). The complete description of the upgrade procedure can be found here.

SMF Support

The Service Management Facility (SMF) was introduced with Solaris 10 and provides an alternative model for service management to the traditional Run Control (RC) scripts.

It solves many problems; here are my three favourites:

  • service dependencies (services can depend on each other)
  • service fail-overs (services can be automatically restarted on failure)
  • single place for all log files
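
To illustrate the first two points, here is a toy Python model of what a service manager such as SMF does conceptually: start services in dependency order and bring a failed service back online. It is not SMF's implementation or manifest format, and the service names and dependency edges are made up for illustration.

```python
from typing import Dict, List, Set


def start_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return services in an order where every dependency starts first."""
    order: List[str] = []
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(svc: str) -> None:
        if svc in done:
            return
        if svc in visiting:
            raise ValueError(f"dependency cycle at {svc}")
        visiting.add(svc)
        for dep in deps.get(svc, []):
            visit(dep)  # start dependencies before the service itself
        visiting.remove(svc)
        done.add(svc)
        order.append(svc)

    for svc in sorted(deps):
        visit(svc)
    return order


class Supervisor:
    """Toy restarter that brings a failed service back online (not SMF itself)."""

    def __init__(self, deps: Dict[str, List[str]]) -> None:
        self.deps = deps
        self.online: Set[str] = set()
        self.restarts: Dict[str, int] = {}

    def boot(self) -> List[str]:
        order = start_order(self.deps)
        self.online.update(order)
        return order

    def on_failure(self, svc: str) -> None:
        # A crash (e.g. a core dump) takes the service offline...
        self.online.discard(svc)
        # ...and the supervisor automatically starts it again.
        self.online.add(svc)
        self.restarts[svc] = self.restarts.get(svc, 0) + 1


# Hypothetical services: "app" needs "db", and both need "network".
deps = {"app": ["db", "network"], "db": ["network"], "network": []}
sup = Supervisor(deps)
print(sup.boot())  # ['network', 'db', 'app']
sup.on_failure("db")
print("db" in sup.online, sup.restarts)  # True {'db': 1}
```

The depth-first ordering also detects dependency cycles, which a real service manager has to reject as well, since no valid start order exists for them.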

With SMF, services on your system start up faster and are generally more reliable.

For Sun Grid Engine, we've introduced the following services:

  • qmaster service
  • shadowd service
  • execd service

If any of them gets killed or fails (e.g., dumps core), SMF detects this and automatically restarts the failed service. This reduces your cluster downtime at no extra cost.

SMF is now the installation default on all Solaris 10 and later machines.

For more information, refer to:

Installing SMF Services

Managing SMF Services

Sunday Jun 29, 2008

Managing Grid Engine SMF services

Hi guys, I just added a section about managing Grid Engine SMF services. There isn't much there yet, but it's a start. Let me know what you would like to add.


Lubomir Petrik

