Sun HPC Consortium: Day I Customer Talks

The Sun HPC Consortium meeting in Tampa started at 8am Saturday morning with registration and breakfast. We had a full day of talks with an excellent selection of speakers, including three customers who took us from NASCAR in South Carolina, to the pleasures and pains of building a huge new datacenter at USC, to the world of Canadian secure grid portals.

Clemson University's Computational Center for Mobility Systems

Dr. James Leylek, of the Clemson University International Center for Automotive Research, gave a talk titled Clemson University's Computational Center for Mobility Systems, which highlighted Clemson's ongoing and future involvement in the Southeast's regional automotive ecosystem--the largest in the U.S.

The CU-CCMS is intended to be a technology anchor for the Clemson University International Center for Automotive Research campus in Greenville, South Carolina. Dr. Leylek presented an overview of the approximately $16M worth of computational infrastructure, which includes a 200-node InfiniBand cluster built on Sun Fire V40z servers (four dual-core Opteron processors and 32GB of memory per node), and a 72-processor Sun Fire 25K with 680GB of memory.

The second half of the talk focused on advances in CFD at Clemson, with a focus on aerodynamics. His motivating example involved analysis to optimize a NASCAR body shape for both short-track races, which require lots of aerodynamic down-force, and long-track races, which require high top-end speeds. Boundary layer control and the ability to predict laminar-to-turbulent transitions are key requirements and a difficult problem. Dr. Leylek presented selected results illustrating significant advances in simulation fidelity for this difficult problem, including the simulation of multiple effects simultaneously, something not doable with current commercial packages.

Power and Cooling at USC

James Pepin from USC was our second customer presenter. He gave a fairly frightening talk about the construction of USC's new datacenter, which required a $30M investment and included the installation of some truly impressive pieces of gear.

This new facility is designed to support a variety of computational requirements including HPC (physics, chemistry, natural language processing, etc.), library technology, and some non-HPC but critical IT infrastructure. Their current HPC computing capability includes a 5384-processor cluster with a peak performance of about 13.8 TFLOPS. I was interested to hear that they routinely run 512-processor jobs at their site.
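As a back-of-the-envelope sanity check on those quoted figures (the processor count and peak TFLOPS are from the talk; the per-processor rate is my own derivation, not a number USC presented):

```python
# Figures as quoted in the talk; per-processor peak is derived here.
processors = 5384
peak_flops = 13.8e12  # ~13.8 TFLOPS aggregate peak

per_processor = peak_flops / processors
print(f"~{per_processor / 1e9:.2f} GFLOPS peak per processor")  # ~2.56 GFLOPS
```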

USC has been wrestling with significant physical infrastructure problems in their current datacenter: power, air conditioning, airflow hot spots, how to handle a/c failures, wiring density, power cabling, and blocked cooling due to massive cabling infrastructure.

They've configured their new space with both HPC (high-density) and non-HPC areas. The datacenter space is about 8000 square feet and the HPC space is about 5000 square feet. The HPC space is configured to handle 13-15 kilowatts per rack, while the "normal" or non-HPC space is configured for 3-4 kilowatts per rack. There is, however, some built-in expansion capability in the design as well in recognition of increasing power and cooling requirements over time.

The building is designed to handle eight megawatts. Currently installed equipment can cool 2.5 megawatts. They have four 350-ton chillers, three two-megawatt generators, and 5.5 megawatts of UPS capacity. This is some very heavy duty infrastructure!
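For readers more used to electrical units than refrigeration tons, here is a rough conversion of the quoted chiller capacity (the standard conversion of one refrigeration ton to about 3.517 kW of heat removal is the only figure I've added; the gross number coming out above the stated 2.5 MW presumably reflects redundancy and non-IT overhead):

```python
# Convert the quoted chiller capacity to kW of heat removal.
TON_TO_KW = 3.517  # standard conversion: 1 refrigeration ton ~= 3.517 kW

chillers = 4
tons_per_chiller = 350

total_cooling_kw = chillers * tons_per_chiller * TON_TO_KW
print(f"Gross chiller capacity: {total_cooling_kw / 1000:.1f} MW")  # ~4.9 MW
```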

Secure Grid Computing

Ken Edgecombe, Executive Director of the High Performance Computing Virtual Laboratory (HPCVL), delivered our last customer talk of the day. Before discussing their secure grid portal, Dr. Edgecombe gave a quick sketch of the center's computing resources. These include three Sun Fire 15K servers, each with 72 CPUs and 288GB of memory, and seven Sun Fire 25Ks, each with 72 dual-core UltraSPARC IV+ processors and 576GB of memory. In addition, HPCVL has about 160TB of disk storage capacity and 480TB of tape storage. Some serious hardware.

The Secure Grid Portal is currently hosted on two Sun Fire T2000 systems with UltraSPARC T1 processors--a nice example of the fact that large HPC centers have significant non-floating point workloads like any other large IT customer.

HPCVL is a Certificate Authority and uses this capability to issue digital certificates to manage identities for the HPCVL Secure Grid Portal. The portal itself was designed with several requirements in mind:

  • Transportability -- the ability to access resources from anywhere
  • Access -- to available user support, to applications (including graphics), and to data
  • Ease of use

The Portal is used by researchers across Canada and by some from outside as well. It is based on Sun technology, including the new Tarantella-based technology recently made available by Sun. It's a web-based portal with real-time access to computational capabilities back at HPCVL. It includes a capability that lets a user grant expert support personnel access to share the user's session and work with them interactively to debug problems. Dr. Edgecombe finished his presentation with a live demo of the Portal. He showed several simulations running in real-time in Canada, displayed with reasonably smooth graphics in our conference room in Tampa. He also edited an OpenOffice document in a Portal window. It was pretty slick.

We had several interesting talks by Sun personnel as well. I may post some brief highlights of those talks at some point.

Summaries of Day II talks are here.

[Thanks to Tony Warner for supplying several photos for this entry.]



Josh Simons

