Thursday Jan 14, 2010

Sun Grid Engine: Still Firing on All Cylinders

The Sun Grid Engine team has just released the latest version of SGE, humbly called Sun Grid Engine 6.2 update 5. It's a yawner of a name for a release that actually contains some substantial new features and improvements to Sun's distributed resource management software, among them Hadoop integration, topology-aware scheduling at the node level (think NUMA), and improved cloud integration and power management capabilities.

You can get the bits directly here. Or you can visit Dan's blog for more details first. And then get the bits.

Thursday Dec 18, 2008

Beta Testers Wanted: Sun Grid Engine 6.2 Update 2

A busy day for fresh HPC bits, apparently...

The Sun Grid Engine team is looking for experienced SGE users interested in taking their latest Update release for a test drive. The Update includes bug fixes as well as some new features. Two features in particular caught my eye: a new GUI-based installer and optimizations to support very large Linux clusters (think TACC Ranger).

Full details are below in the official call for beta testers. The beta program will run until February 2nd, 2009. Look no further for something to do during the upcoming holiday season. :-)

Sun Grid Engine 6.2 Update 2 Beta (SGE 6.2u2beta) Program

This README contains important information about the target audience of this beta release, its new functionality, the duration of the SGE beta program, and how to get support and provide feedback.

  1. Audience of this beta program
  2. Duration of the beta program and release date
  3. New functionality delivered with this release
  4. Installing SGE 6.2u2beta in parallel to a production cluster
  5. Beta program feedback and evaluation support

  1. Audience of this beta program

    This Beta is intended for users who already have experience with the Sun Grid Engine software or with DRM (Distributed Resource Management) systems from other vendors. This beta adds new features to the SGE 6.2 software. Users new to DRM systems, or users who are seeking a production-ready release, should use the Sun Grid Engine 6.2 Update 1 (SGE 6.2u1) release, which is available from here.

    For the shipping SGE 6.2u1 release we are offering free 30-day evaluation email support.

  2. Duration of the Beta program and release date

    This beta program lasts until Monday, February 2, 2009. The final release of Sun Grid Engine 6.2 Update 2 is planned for March 2009.

  3. New functionality delivered with this release

    Sun Grid Engine 6.2 Update 2 (SGE 6.2u2) is a feature update release for SGE 6.2 which adds the following new functionality to the product:

    • a GUI-based installer that helps new users install the software more easily. It complements the existing CLI-based installation routine.
    • new support for 32-bit and 64-bit editions of Microsoft Windows Vista (Enterprise and Ultimate Edition), Windows Server 2003R2 and Windows Server 2008.
    • a client- and server-side Job Submission Verifier (JSV) that allows an administrator to control, enforce, and adjust job requests, including job rejection. JSV scripts can be written in any scripting language, e.g. Unix shells, Perl, or Tcl.
    • consumable resource attributes can now be requested per job. This makes resource requests for parallel jobs much easier to define, especially when using slot ranges.
    • on Linux, the use of the 'jemalloc' malloc library improves performance and reduces memory requirements.
    • the use of the poll(2) system call instead of select(2) on Linux systems improves the scalability of qmaster in extremely large clusters.
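
To make the JSV item above a little more concrete, here is a minimal Python sketch of the kind of policy such a verifier might apply: accept a request, adjust it, or reject it. It does not use the real JSV interface or protocol. The `JobRequest` fields, the `verify` function, and the runtime limits are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical site policy: these values are assumptions for illustration only.
MAX_RUNTIME_SECONDS = 24 * 3600
DEFAULT_RUNTIME_SECONDS = 3600


@dataclass(frozen=True)
class JobRequest:
    """Simplified stand-in for a submitted job (not the real SGE job structure)."""
    owner: str
    runtime_seconds: Optional[int]  # None means the user gave no runtime limit
    binary: bool                    # True if the job is a binary, not a script


def verify(job: JobRequest) -> Tuple[str, JobRequest, str]:
    """Return (decision, possibly adjusted job, message).

    decision is one of "accept", "correct", or "reject".
    """
    # Enforce: reject requests that break a hard rule.
    if job.binary:
        return "reject", job, "binary jobs are not allowed"
    if job.runtime_seconds is not None and job.runtime_seconds > MAX_RUNTIME_SECONDS:
        return "reject", job, "runtime limit exceeds 24 hours"

    # Adjust: fill in a default the user left out.
    if job.runtime_seconds is None:
        adjusted = replace(job, runtime_seconds=DEFAULT_RUNTIME_SECONDS)
        return "correct", adjusted, "default runtime limit applied"

    # Everything else passes through unchanged.
    return "accept", job, ""
```

A real JSV script would receive the job parameters from Grid Engine and report its decision back; consult the product documentation for the actual mechanism.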

  4. Installing SGE 6.2u2 in parallel to a production cluster

    As with every SGE release, it is safe to install multiple Grid Engine clusters running multiple versions in parallel, provided all of the following settings are different:

    • directory
    • ports (environment variables) for qmaster and execution daemons
    • unique "cluster name": starting with SGE 6.2, the cluster name is appended to the name of the system-wide startup scripts
    • group id range ("gid_range")
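
The checklist above can be sketched as a small pre-install sanity check. This is only an illustration in Python: the `ClusterSettings` class, the `conflicts` function, and all paths, ports, names, and ranges are hypothetical example values, not anything read from a real installation. The sketch also assumes that "different" group id ranges means non-overlapping ranges.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterSettings:
    """Hypothetical summary of the settings the README says must differ."""
    directory: str
    qmaster_port: int
    execd_port: int
    cluster_name: str
    gid_range: Tuple[int, int]  # inclusive (low, high)


def conflicts(a: ClusterSettings, b: ClusterSettings) -> List[str]:
    """List the settings that collide between two side-by-side installations."""
    problems = []
    if a.directory == b.directory:
        problems.append("directory")
    # All four ports in play must be distinct from one another.
    ports = [a.qmaster_port, a.execd_port, b.qmaster_port, b.execd_port]
    if len(set(ports)) != len(ports):
        problems.append("ports")
    if a.cluster_name == b.cluster_name:
        problems.append("cluster name")
    # Assumption: two inclusive ranges conflict when they overlap at all.
    if a.gid_range[0] <= b.gid_range[1] and b.gid_range[0] <= a.gid_range[1]:
        problems.append("gid_range")
    return problems


# Example values only: a production cluster and a beta cluster whose
# group id ranges accidentally overlap.
prod = ClusterSettings("/opt/sge-prod", 6444, 6445, "prod", (20000, 20100))
beta = ClusterSettings("/opt/sge-beta", 6446, 6447, "beta62u2", (20050, 20150))
print(conflicts(prod, beta))
```

An empty list means none of the four settings collide, which is the condition the README asks for.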

    Starting with SGE 6.2, the Accounting and Reporting Console (ARCo) accepts reporting data from multiple Sun Grid Engine clusters. If you follow the installation directions for ARCo and use a unique cluster name for this beta release, there is no risk of losing or mixing reporting data from multiple SGE clusters.

  5. Beta Program Feedback and Evaluation Support

    We welcome your feedback and questions on this Beta. We ask you to restrict your questions to this Beta release only. If you need general evaluation support for the Sun Grid Engine software, please subscribe to the free evaluation support by downloading and using the shipping version of SGE 6.2 Update 1.

    The following email aliases are available:

Wednesday May 14, 2008

Growing Flowers with Datacenter Heat

The Open Source Grid and Cluster Conference is being held this week in Oakland, California. I attended the first day of the conference before flying home to meet a personal commitment. My favorite talk of the day was Paul Brenner's presentation titled Grid Heating: Dynamic Thermal Allocation via Grid Engine Tools.

Brenner, who works as a scientist in the University of Notre Dame's Center for Research Computing, is exploring innovative ways to exploit the waste heat generated by HPC and other datacenters via partnerships with various municipal entities in the South Bend area. His first prototype, currently in progress, involves placing a rack of HPC compute nodes at a local municipal greenhouse, the South Bend Greenhouse and Botanical Garden.

The greenhouse had recently been forced to close a portion of its facility due to high natural gas heating costs. Brenner wondered if he could help. Since current datacenters can be viewed as massive electricity-to-heat converters (with a computational byproduct), it seemed there might be an opportunity to exploit the waste heat in some useful way. But transferring heat, especially low-grade waste heat, over distances is very inefficient. Was there a way to overcome this barrier?

Enter grid computing with its ability to harness remotely located compute resources. If Brenner couldn't transport the heat to the greenhouse, why not place the datacenter at the greenhouse? The garden gets the heat and Notre Dame gets the compute resources via established grid computing capabilities like Sun's Grid Engine distributed resource manager, which is already in use at Notre Dame. Cool idea? Hot idea!

Based on early prototype work, which involves placing a single rack in the greenhouse, the idea looks like a promising way to reduce natural gas heating requirements for the facility. Brenner has shown he can use grid scheduling software to deliver a desired temperature (within a range, of course) by simply adding or throttling compute jobs on the greenhouse cluster, which communicates with Notre Dame via a wide-area wireless broadband connection.
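
The add-or-throttle idea is essentially a thermostat with compute jobs as the heating element. Here is a rough Python sketch of that logic. It is not Brenner's implementation: the `next_job_count` function, the temperature band, the job limit, and the readings are all made up for illustration, and a real setup would act through the scheduler rather than a simple counter.

```python
def next_job_count(temp_c: float, running: int, low_c: float, high_c: float,
                   max_jobs: int) -> int:
    """Decide how many jobs the greenhouse rack should run next.

    Below the band: add a job (more compute, more heat).
    Above the band: throttle one job. Inside the band: leave it alone.
    """
    if temp_c < low_c:
        return min(running + 1, max_jobs)
    if temp_c > high_c:
        return max(running - 1, 0)
    return running


# Hypothetical temperature readings in degrees Celsius, one per control step.
readings = [16.0, 17.5, 19.0, 24.5, 21.0]
jobs = 2
history = []
for temp in readings:
    jobs = next_job_count(temp, jobs, low_c=18.0, high_c=24.0, max_jobs=8)
    history.append(jobs)

print(history)
```

Changing one job per step keeps the sketch simple; a real controller would need to account for how slowly a room's temperature responds to a change in load.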

He has looked at humidity issues and so far they don't seem to be a problem given the ranges supported by typical compute gear. And he points out that while the greenhouse environment does not offer the highly filtered environment of a controlled datacenter, the particulate tolerance for typical compute gear is far in excess of EPA guidelines for people.

Phase II will involve placing three full racks of gear at the greenhouse to significantly reduce heating costs. Notre Dame will pay the electrical costs and use the compute resources. The city saves money on heating.

While the greenhouse is an interesting experiment, it is not ideal since its heating requirements will fluctuate seasonally. There are, however, other installations that have constant heating requirements--for example, hospitals have a 24x7 need for hot water. Sites like this could be interesting for future deployments.

Brenner's full presentation is available [PDF].

Monday May 05, 2008

Now THIS is Peachy!

I'm a bit late posting this, but I did want to mention that the Peach open movie project recently released Big Buck Bunny, a 3D animated movie that was rendered on the Sun Grid Compute Utility. Details on the operation of the Peach render farm are here. You can also click on the diagram below for a closer look at the overall IT setup for the project.


Josh Simons

