By Lubomir Petrik on Aug 04, 2008
Sun Grid Engine 6.2 will be released tomorrow. Check out DanT's blog entry describing the new features.
Here's a quick list of new features in SGE 6.2:
- Advance Reservation
- Multi-Clustering with Service Domain Manager (SDM)
- Scalability Improvements (Scheduler as a Thread, New Interactive Job Support, etc.)
- Array Task Dependencies
- Accounting and Reporting Console (ARCo) Improvements (Multi-Cluster support, DBwriter is now up to 10x faster)
- Solaris Enhancements (Service Tags and SMF support)
- New Upgrade Procedure
Since Dan has already described most of them, I'll just blog about the New Upgrade Procedure and SMF support today.
New Upgrade Procedure
The original upgrade procedure had many restrictions. The most troublesome, in my opinion, was that you couldn't simply leave the old cluster running and start a new SGE version in parallel with the same configuration. The new upgrade procedure solves this and many other issues.
You can now create a backup of the whole cluster configuration and later, at any time, restore it while the qmaster is running! The old upgrade required you to shut down the qmaster before the configuration could be loaded into the upgraded cluster.
Upgrading to a newer version should now be easier than ever before (hello 6.0 users!). The complete description of the upgrade procedure can be found here.
Service Management Facility (SMF) was introduced with Solaris 10 and provides an alternative model of service management to the traditional Run Control (RC) scripts.
It solves many problems; I'll just list my three favourites:
- service dependencies (services can depend on each other)
- service fail-overs (services can be automatically restarted on failure)
- single place for all log files
With SMF, services on your system start up faster and are generally more reliable.
For Sun Grid Engine, we've introduced the following services:
- qmaster service
- shadowd service
- execd service
If any of these gets killed or fails (e.g. dumps core), SMF will detect it and automatically restart the failing service. It basically reduces your cluster downtime for free.
SMF is now the installation default on all Solaris 10+ machines.
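If you want to keep an eye on the three services from a script, one option is to parse the output of the standard Solaris `svcs` command. Here's a minimal sketch in Python. It is not part of SGE: the helper names are made up for illustration, and the FMRI prefix `svc:/application/sge/` is an assumption about how the services are named, so check it on your own host with `svcs -a | grep sge` first.

```python
import subprocess

# Assumed FMRI prefix for the SGE services; confirm with `svcs -a | grep sge`.
SGE_FMRI_PREFIX = "svc:/application/sge/"


def parse_svcs(output):
    """Map FMRI -> state from `svcs -H -o state,fmri` output."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only well-formed "STATE FMRI" lines that belong to SGE.
        if len(parts) == 2 and parts[1].startswith(SGE_FMRI_PREFIX):
            states[parts[1]] = parts[0]
    return states


def unhealthy(states):
    """Return the FMRIs whose state is anything other than 'online'."""
    return sorted(fmri for fmri, state in states.items() if state != "online")


def query_sge_services():
    """Run svcs on a Solaris 10+ host and return the SGE service states."""
    result = subprocess.run(
        ["svcs", "-H", "-a", "-o", "state,fmri"],
        capture_output=True, text=True, check=True,
    )
    return parse_svcs(result.stdout)
```

On a Solaris host you would call `unhealthy(query_sge_services())`. A service reported in the `maintenance` state is one that SMF has stopped trying to restart; after fixing the underlying problem, `svcadm clear <fmri>` puts it back under SMF's control.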