Daimler gets scalable STAR-CD I/O with Lustre
By Frederic Pariente-Oracle on Feb 24, 2009
Daimler AG, one of the world’s largest auto manufacturers and the maker of Mercedes luxury cars, had for some time been running CD-adapco's STAR-CD application for Computational Fluid Dynamics (CFD) to simulate large multi-million-cell models on a cluster of Sun Fire X4100 servers. Interestingly, Sun was chosen for its ability to innovate in the commodity computing (Lintel) space: while all vendors now feature the same chips from AMD and/or Intel, Sun differentiates itself through well-thought-out, high-quality packaging that delivers higher RAS and lower power consumption, with up to a 56% reduction compared to competitive servers at the time of launch.
"The server products were particularly convincing, especially because of their well-designed heat management. Ultimately, our demands in this regard are very high because of the density of the computers in the server rooms. Favorable heat management provides a long hardware service life."
Dr. Volker Schwarz, Team Leader, Aerodynamics and Aeroacoustics Department, Daimler AG
The computation part of STAR-CD scaled very well across the hundreds of processors in the Daimler cluster. Parallel computing makes it possible to routinely work on a 100M-cell mesh, thus allowing engineers to simulate entire structures. As the compute time shrinks with each added processor, however, the start-up and closing phases of a simulation run, where models are decomposed into input files for each cluster node to read, can eventually dominate the simulation time. Daimler experienced exactly this bottleneck beyond 100 processors with NFS, which is not well suited to highly parallel I/O.
For a more technical look at this issue, read this blog.
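To make the bottleneck concrete, here is a minimal back-of-the-envelope sketch. It assumes an idealized model in which compute time divides evenly across processors while the start-up and closing I/O phases take a fixed time; the function names and the numbers in the demo are hypothetical illustrations, not Daimler measurements.

```python
def run_time(procs, compute_serial_s, io_s):
    """Wall time of one run under the toy model.

    Assumption: compute scales ideally (serial time / procs),
    while the I/O phases take a fixed io_s seconds regardless of procs.
    """
    return compute_serial_s / procs + io_s


def io_fraction(procs, compute_serial_s, io_s):
    """Share of the wall time spent in the I/O phases."""
    return io_s / run_time(procs, compute_serial_s, io_s)


if __name__ == "__main__":
    # Hypothetical figures: 10,000 s of serial compute, 100 s of I/O.
    for procs in (10, 100, 1000):
        share = io_fraction(procs, 10_000.0, 100.0)
        print(f"{procs:5d} processors: I/O is {share:.0%} of the run")
```

With these made-up figures the I/O share reaches half of the run at 100 processors and keeps growing from there, which is the shape of the problem described above: the better the solver scales, the more a non-scaling file system hurts.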
Enter Lustre. The Lustre file system employs object-based storage to scale to tens of thousands of nodes and petabytes of data, and delivers high throughput and I/O performance by separating metadata operations from data manipulation. That is, a file's metadata does not point to a physical storage location but instead references objects that indicate the storage server and device holding the data. Lustre can thus scale out by adding more storage servers with attached storage. Today, there is an industry consensus that object-based storage is the way to do scalable parallel I/O, and Lustre has become the de facto standard for it. For example, Lustre is deployed at 7 of the Top 10 supercomputers and 25+ of the Top 50 supercomputers on the Top 500 Supercomputer list dated Nov 18th, 2008.
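The split between metadata and data can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Lustre's actual code or API: the `Layout` class, the `locate` function, the server names and the object ids are all hypothetical, and the round-robin placement is a simplified picture of how a file can be spread over objects on several storage servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical per-file layout, as a metadata lookup might return it."""
    stripe_size: int   # bytes written to one object before moving to the next
    objects: tuple     # (storage server name, object id) pairs


def locate(layout, offset):
    """Map a byte offset in the file to (server, object id, offset in object).

    Assumption: stripes are placed round-robin across the objects.
    """
    stripe_index = offset // layout.stripe_size
    count = len(layout.objects)
    server, object_id = layout.objects[stripe_index % count]
    # Full stripes already stored in this object, plus the position
    # inside the current stripe.
    object_offset = (stripe_index // count) * layout.stripe_size \
        + offset % layout.stripe_size
    return server, object_id, object_offset


if __name__ == "__main__":
    MIB = 1 << 20
    layout = Layout(stripe_size=MIB,
                    objects=(("oss0", 11), ("oss1", 22), ("oss2", 33)))
    for offset in (0, MIB + 5, 3 * MIB + 7):
        print(offset, "->", locate(layout, offset))
```

The point of the sketch is that the layout is fetched once, and every subsequent read or write can go straight to the storage server that holds the relevant object. Clients reading different parts of a file, or different files, therefore hit different servers in parallel instead of queuing behind a single one.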
In a benchmark at Daimler's Aerodynamics and Aeroacoustics Department, we deployed Lustre on 48TB Sun Fire X4500 storage servers, delivering 1.5GB/s over InfiniBand and removing the I/O bottlenecks in the start-up and final phases. The metadata servers run on Sun Fire X4100 servers in failover mode for high availability. By design, Lustre is transparent to the STAR-CD application and, in fact, to any application that can run off NFS. If you want to see how Lustre can help you build a scalable data grid for your High-Performance Computing (HPC) application, start here.