Ranger Update: TACC's Path to Petascale
By Josh Simons on Nov 11, 2007
Ranger is the first in a series of annual Track 2 NSF procurements that have been motivated by the findings of the NSF Cyberinfrastructure Strategic Plan, which is available in PDF format here.
There are several institutions involved in this procurement. TACC / UT Austin provides project leadership, hosts and runs Ranger, and provides user support. ICES / UT Austin provides algorithmic expertise and applications collaborations. The Cornell Center for Advanced Computing (formerly the Cornell Theory Center) provides large-scale data management and analysis, as well as training. Arizona State HPCI contributes user support and technology evaluation and insertion.
So, just how big is this Big Iron? Just over one-half PetaFLOPs (504 TFLOPs), built with 3936 Sun four-socket blades, each socket populated by a four-core 2.0 GHz AMD Barcelona processor for a total of almost 63,000 cores. Memory is big as well, with 2 GB per core (32 GB/node) for a total of 125 Terabytes in the Ranger system.
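The headline numbers above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not an official TACC calculation. The blade, socket, core, clock, and memory figures come from this post; the value of 4 double-precision floating-point operations per core per cycle is my assumption about Barcelona and is not stated in the talk.

```python
# Back-of-the-envelope check of Ranger's core count, peak FLOPs, and memory.
BLADES = 3936              # four-socket Sun blades (from the post)
SOCKETS_PER_BLADE = 4      # from the post
CORES_PER_SOCKET = 4       # quad-core Barcelona (from the post)
CLOCK_GHZ = 2.0            # from the post
FLOPS_PER_CYCLE = 4        # ASSUMPTION: DP FLOPs per core per cycle, not from the post
GB_PER_CORE = 2            # from the post

cores_per_node = SOCKETS_PER_BLADE * CORES_PER_SOCKET
cores = BLADES * cores_per_node

# cores * GHz * FLOPs/cycle gives GFLOPs; divide by 1000 for TFLOPs.
peak_tflops = cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

mem_per_node_gb = cores_per_node * GB_PER_CORE
total_mem_gb = cores * GB_PER_CORE

print(f"cores:           {cores}")
print(f"peak:            {peak_tflops:.1f} TFLOPs")
print(f"memory per node: {mem_per_node_gb} GB")
print(f"total memory:    {total_mem_gb} GB")
```

Under that assumption the result lands at about 504 TFLOPs across 62,976 cores, with 32 GB per node and roughly 125 Terabytes of memory overall, which matches the figures quoted in the talk.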
This being Texas, the disk subsystem does not disappoint: 1.7 Petabytes of storage built from 72 Sun X4500 (Thumper) I/O servers, each with 24 Terabytes of disk, together delivering an aggregate bandwidth of 72 Gbytes/sec. The largest filesystem built on this storage offers one Petabyte of storage.
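The storage figures can be sanity-checked the same way. This is a rough sketch using only the numbers quoted in this post; the even split of bandwidth across servers is my simplifying assumption, not something stated in the talk.

```python
# Rough check of Ranger's disk capacity and per-server bandwidth.
SERVERS = 72                 # Sun X4500 (Thumper) I/O servers (from the post)
TB_PER_SERVER = 24           # from the post
AGGREGATE_BW_GB_S = 72       # aggregate bandwidth in Gbytes/sec (from the post)

raw_tb = SERVERS * TB_PER_SERVER
raw_pb = raw_tb / 1000       # decimal Petabytes

# ASSUMPTION: bandwidth is spread evenly across all servers.
bw_per_server_gb_s = AGGREGATE_BW_GB_S / SERVERS

print(f"raw capacity: {raw_tb} TB (about {raw_pb:.1f} PB)")
print(f"bandwidth per server: {bw_per_server_gb_s:.1f} Gbytes/sec")
```

That works out to 1,728 Terabytes, or about 1.7 Petabytes, and an implied average of about 1 Gbyte/sec from each Thumper.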
The system interconnect is InfiniBand, using Mellanox's latest ConnectX InfiniBand cards and two of Sun's 3456-port Magnum switches. Interconnect link bandwidth is approximately 10 Gb/sec and latency is approximately 2.3 µs.
Physically, the system fits in 96 racks (82 compute, 12 support, 2 switches) that sit in about 4500 square feet along with 116 APC InRow cooling units. Due to the density of the Sun solution, floor space has not really been an issue. Power requirements, on the other hand, are quite daunting for a system of this size. Of the 3.4 MW required to run Ranger, 1 MW is needed for cooling.
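To put the floor space and power numbers in perspective, here is a small sketch using only the figures above. The variable names are mine, and the split into "cooling" and "everything else" simply restates the two power numbers from the talk.

```python
# Quick look at rack density and the share of power that goes to cooling.
RACKS = {"compute": 82, "support": 12, "switch": 2}   # from the post
FLOOR_SQ_FT = 4500                                    # from the post
TOTAL_POWER_MW = 3.4                                  # from the post
COOLING_POWER_MW = 1.0                                # from the post

total_racks = sum(RACKS.values())
sq_ft_per_rack = FLOOR_SQ_FT / total_racks

cooling_fraction = COOLING_POWER_MW / TOTAL_POWER_MW
non_cooling_mw = TOTAL_POWER_MW - COOLING_POWER_MW

print(f"total racks: {total_racks}")
print(f"floor space per rack: {sq_ft_per_rack:.1f} sq ft")
print(f"cooling share of power: {cooling_fraction:.0%}")
print(f"power for everything else: {non_cooling_mw:.1f} MW")
```

In other words, a bit under a third of the power budget goes to cooling, leaving about 2.4 MW for the rest of the system.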
I was impressed to hear that Jay expects a significant number of applications to sustain 50-100 TFLOPs on Ranger--that is some serious application scaling! He predicted there will be a double-digit number of codes using over 10,000 cores by the end of 2008 and expects a few of these to run later this year.
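For a sense of scale, the sketch below expresses those predictions as fractions of the whole machine. It uses only numbers quoted in this post; the variable names are mine, and treating 504 TFLOPs as the theoretical peak is my reading of the figure rather than something Jay spelled out.

```python
# Express the sustained-performance and job-size predictions as shares of Ranger.
PEAK_TFLOPS = 504                      # from the post (read here as peak)
SUSTAINED_RANGE_TFLOPS = (50, 100)     # predicted sustained range (from the post)
TOTAL_CORES = 3936 * 4 * 4             # blades * sockets * cores (from the post)
LARGE_JOB_CORES = 10_000               # "over 10,000 cores" (from the post)

low_fraction = SUSTAINED_RANGE_TFLOPS[0] / PEAK_TFLOPS
high_fraction = SUSTAINED_RANGE_TFLOPS[1] / PEAK_TFLOPS
job_share = LARGE_JOB_CORES / TOTAL_CORES

print(f"sustained share of peak: {low_fraction:.0%} to {high_fraction:.0%}")
print(f"a 10,000-core job uses about {job_share:.0%} of the machine")
```

So the prediction amounts to applications sustaining roughly 10 to 20 percent of the machine's 504 TFLOPs, and a 10,000-core job occupies about a sixth of the system.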
In terms of software environment, Ranger is a Linux cluster that uses the ROCKS provisioning software to handle OS and application deployments, Lustre as its scalable parallel filesystem, and the OpenFabrics stack to control the InfiniBand interconnect. In addition, at least two MPI implementations will be used on Ranger: MVAPICH and Open MPI. There will be several compiler suites available, including Sun Studio, PathScale, and the Portland Group compilers. Sun Grid Engine will be used for job scheduling.
The impact Ranger will have on the capabilities of the TeraGrid is considerable as it will make more CPU hours available to TeraGrid users than all other current TeraGrid systems combined. At 504 TFLOPs, Ranger is 5X larger than the current top TeraGrid system.
Jay ended with a brief summary of the status of the Ranger installation process, which is ongoing. He characterized most things as good: TACC is happy with Barcelona performance, with the Sun Constellation blades, the performance of the InfiniBand fabric (Sun switch and Mellanox card), Thumper performance, the Sun racks, the APC cooling solution, and Sun Grid Engine.
There have been some BIOS issues that Sun and AMD have been working through, and there have been some component failures, which are to be expected given the very large number of components involved in this system.
The most vexing problem, which we hope has now been solved, involved manufacturing issues related to the special InfiniBand cables used in the Constellation system. Apparently, some step in the manufacturing process introduced a crimp which caused connectivity problems. Correctly manufactured cables are now being put in place.
As Jay said, it has been through the extremely hard work of Sun and TACC personnel that the delays introduced by these problems have been largely overcome. I know the Sun folks I've talked with are working incredibly hard to make TACC successful.
The system is expected to be online in early December.