Sunday Nov 16, 2008

What's Been Happening with Sun's Biggest Supercomputer?

Karl Schulz, Associate Director for HPC at the Texas Advanced Computing Center, gave an update on Ranger, including current usage statistics as well as some of the interesting technical issues they've confronted since bringing the system online last year.

Karl started with a system overview, which I will skip in favor of pointing to an earlier Ranger blog entry that describes the configuration in detail. Note, however, that Ranger is now running with 2.3 GHz Barcelona processors.

As of November 2008, Ranger has more than 1500 allocated users who represent more than 400 individual research projects. Over 300K jobs have been run so far on the system, consuming a total of 220 million CPU hours.

When TACC brought their 900 TeraByte Lustre filesystem online, they wondered how long it would take to fill it. It took six months. Just six months to generate 900 TeraBytes of data. Not surprising, I guess, when you hear that users generate between 5 and 20 TeraBytes of data per day on Ranger. Now that they've turned on their file purging policy, files currently reside on the filesystem for about 30 days before they are purged, which is quite good as supercomputing centers go.
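Those two numbers hang together, as a quick back-of-the-envelope check shows (my arithmetic, not a figure from the talk):

```python
# Sanity check: filling 900 TB in "six months" implies a net fill rate
# right at the low end of the quoted 5-20 TB/day range.
capacity_tb = 900
days_to_fill = 6 * 30          # roughly six months
rate = capacity_tb / days_to_fill
print(rate)                     # 5.0 TB/day
```

Gross writes would of course be higher than the net fill rate, since users also delete data along the way.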

Here are some of the problems Karl described.

OS jitter. For those not familiar, the phrase refers to a sometimes-significant performance degradation seen by very large MPI jobs, caused by a lack of natural synchronization between participating nodes due to unrelated performance perturbations on individual nodes. Essentially, some nodes fall slightly behind, which slows down MPI synchronization operations and can in turn have a large effect on overall application performance: the worse the loss of synchronization, the longer certain MPI operations take to complete, and the larger the overall impact.

A user reported severe performance problems with a somewhat unusual application that performed about 100K MPI_Allreduce operations with a small amount of intervening computation between each one. When running on 8K cores, a very large performance difference was seen between 15 processes per node and 16 processes per node, with the 16-process-per-node runs showing drastically lower performance.

As it turned out, the MPI implementation was not at fault. Instead, the issue was traced primarily to two causes: an IPMI daemon that was running on each node, and a second daemon used to gather fine-grained health-monitoring information for Sun Grid Engine. Once the IPMI daemon was disabled and some performance optimization work was done on the health daemon, the 15- and 16-process runs showed almost identical run times.
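A toy model (my own sketch, not TACC's analysis, with made-up constants) shows why a tiny per-node perturbation hurts so much at 8K cores: each collective completes only when its slowest rank arrives, and the probability that *some* rank was just interrupted by a daemon approaches certainty as the rank count grows.

```python
import random

def total_time(num_ranks, num_collectives, compute=1.0,
               jitter_prob=0.001, jitter_cost=50.0, seed=0):
    """Each collective finishes only when its slowest rank arrives,
    so a single perturbed rank stalls the whole job for that step.
    All constants here are illustrative, not measurements."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_collectives):
        slowest = 0.0
        for _ in range(num_ranks):
            step = compute + (jitter_cost if rng.random() < jitter_prob else 0.0)
            slowest = max(slowest, step)
        total += slowest
    return total

# With 16 ranks, a given collective is rarely hit; with 8192 ranks the
# odds that at least one rank was just interrupted approach certainty.
small = total_time(16, 200)
large = total_time(8192, 200)
print(large / small)  # many times slower per collective at scale
```

The same model also suggests why leaving one core free per node (the 15-process case) helps: the daemons can run on the idle core without delaying any MPI rank.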

Karl also showed an example of how NUMA effects at scale can cause significant performance issues. In particular, it isn't sufficient to deal with processor affinity without also paying attention to memory affinity. Off-socket memory access can kill application performance in some cases, as in the CFD case shown during the talk.
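To see why memory affinity matters as much as processor affinity, consider a toy latency model (illustrative numbers of my own, not measurements from the CFD case): a memory-bound kernel pays the remote-access penalty on every access whose page landed on another socket.

```python
def kernel_time(accesses, local_fraction, local_cost=1.0, remote_factor=1.5):
    """Toy model of a memory-bound loop on a NUMA node.
    local_fraction: share of accesses served by socket-local memory.
    remote_factor: assumed remote/local latency ratio (made up here)."""
    remote_fraction = 1.0 - local_fraction
    return accesses * (local_fraction * local_cost
                       + remote_fraction * local_cost * remote_factor)

# Threads pinned AND pages placed locally (e.g. first-touch done right):
aligned = kernel_time(10**6, local_fraction=1.0)
# Threads pinned, but every page was faulted in on the wrong socket:
misplaced = kernel_time(10**6, local_fraction=0.0)
print(misplaced / aligned)  # 1.5 in this model: pure off-socket slowdown
```

On Linux the usual remedy is to pair CPU pinning with a first-touch initialization pass, or to run under numactl with both --cpunodebind and --membind so that thread and page placement agree.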

Saturday Nov 15, 2008

Spur: Terascale Visualization at TACC

Kelly Gaither, Associate Director at TACC, the University of Texas at Austin, talked today about Spur, a new scalable visualization system that is directly connected to Ranger, the 500 TFLOP Sun Constellation System at the Texas Advanced Computing Center (TACC) in Austin.

Spur, a joint collaboration between TACC and Sun, bolts the visualization system directly into the InfiniBand fat tree used to create the 65K-core Ranger cluster. This allows compute and visualization to be used effectively together, which is essential for truly interactive data exploration.

Spur is actually a cluster of eight Sun compute nodes, each with four nVidia QuadroPlex GPUs installed. The overall system includes 32 GPUs, 128 cores, and close to 1 TB of RAM, and can support up to 128 simultaneous Shared Visualization clients. Sun Grid Engine is used to schedule jobs onto Spur nodes.

Spur went into production in October and within a week was being used about 120 hours per week for interactive HPC visualization. Users who access Spur via the TeraGrid find the resource invaluable because the alternative, computing on Ranger and then transferring results back to their home site for visualization, just isn't feasible given the sheer volume of data involved. One user estimated it would take him a week to transfer the data to his local facility, whereas with Spur he is able to compute at TACC, render the visualization at TACC using Spur, and then use Sun Shared Visualization software to display the visualization at his local site.

Thursday Jun 12, 2008

TACC Ranger Upgrade: Another 80 TeraFLOPs, Please

As was announced yesterday, the 508 TFLOP Ranger supercomputer at TACC will be upgraded this weekend. All 15,744 2.0 GHz quad-core AMD processors will be replaced with 2.3 GHz processors, effectively adding another 80 TFLOPs to the system's peak performance rating. Ranger is currently the largest Sun Constellation System in the world and the largest open scientific computing system in the world.
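The extra 80 TFLOPs follows from the peak-rate arithmetic. As a back-of-the-envelope check (my own, assuming the usual 4 floating-point operations per core per cycle for this generation of quad-core AMD parts):

```python
sockets = 15744            # quad-core AMD processors in Ranger
cores_per_socket = 4
flops_per_core_cycle = 4   # assumed peak FP rate for this processor generation

def peak_tflops(ghz):
    # sockets * cores * flops/cycle * GHz gives GFLOPs; divide for TFLOPs
    return sockets * cores_per_socket * flops_per_core_cycle * ghz / 1000.0

print(peak_tflops(2.0))                      # ~504 TFLOPs before the upgrade
print(peak_tflops(2.3))                      # ~579 TFLOPs after
print(peak_tflops(2.3) - peak_tflops(2.0))   # ~76 TFLOPs gained
```

These come in slightly under the headline figures (508 before, "another 80" gained), which presumably reflect rounding in the announcements.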


Josh Simons

