HPC Consortium: Sun HPC at Clemson University (Why Big SMPs Matter)
By Josh Simons on Jul 02, 2007
James Leylek, Director of the Computational Center for Mobility Systems at Clemson University (CU-CCMS), spoke at Sun's HPC Consortium meeting in Dresden this past week. He presented a brief overview of the Center and its mission and gave a status update on the Center's computational infrastructure, including an explanation of why CU-CCMS believes strongly in both large SMPs and small-node clusters for HPC.
Since his last update in November, the Center's computational infrastructure has been put in place. It includes a Sun Fire E25K with 72 UltraSPARC IV+ processors and 680 Gbytes of memory; two Sun Fire E6900 systems, each with 24 UltraSPARC IV+ processors and 384 Gbytes of memory; 1600 cores' worth of Sun Fire V20z systems connected with Voltaire InfiniBand; and a variety of workstations. All of the Big Iron runs Solaris 10, while the V20z cluster runs SUSE Linux. The infrastructure has a peak performance rating of about 11 TFLOPs.
As Dr. Leylek explained, the mission of the Center is to provide a balanced computational approach that satisfies a diverse set of requirements from the ten major technical groups it serves (fluid dynamics, acoustics, mechanics, vehicle design, and human modelling, among others). Also, because CU-CCMS is not a research organization and must deliver results on time and within budget, it focuses on supplying stable, reliable infrastructure for its customers.
The wide range of systems at CU-CCMS reflects an understanding that one size does not fit all for HPC applications: not everything parallelizes onto clusters. As an example, adaptive multi-grid computations are considered to be memory monsters that benefit from the immense capabilities found within the single Solaris image of an E25K or E6900. At the Center, they view billion-element finite element simulations as a starting point for full vehicle simulations. They are dealing with big problems.
As the Center prepared to bring its computational capabilities online, what scared the CU-CCMS staff the most was the actual act of setting up and deploying this infrastructure. With Clemson, Sun, Cisco, and Voltaire all responsible for key aspects of the infrastructure, they were worried that coordinating all of these efforts successfully was going to be an absolute nightmare.
In response, Sun assigned a program manager to run the entire integration process. As Dr. Leylek said, the Sun program manager put together the most detailed integration plan he had ever seen in his life. In addition, as work progressed, all status updates were posted to a site accessible to every participating party, which helped maintain coordination and promote problem solving throughout the process.
In the end, Dr. Leylek said that what they had feared would be a nightmare turned out to be as seamless and painless as it could have been.