Managing Petabytes with SAM-QFS
By Josh Simons on Nov 11, 2007
Bryan Banister, Manager of Storage Systems and Production Servers at the San Diego Supercomputer Center (SDSC), spoke at the HPC Consortium meeting in Reno about their SAM-QFS deployment, which forms an important part of their HPC infrastructure.
As background, Bryan briefly described each of the large clusters installed at SDSC: a 15.6 TFLOPs IBM cluster, a 3.1 TFLOPs IBM Itanium cluster, and a 17.1 TFLOPs IBM BlueGene/L system. Lots of IBM gear on the compute side. Sun's presence here is much more visible on the storage side of the SDSC HPC facility.
As a TeraGrid site, SDSC is heavily involved in serving large amounts of data for their user base. Grid computing requires migration of input data to remote sites for processing and then either migrating the results back to the home system or on to another site for additional processing.
As an example, Bryan described a recent computation done by the Southern California Earthquake Center (SCEC) that generated 47 TBytes of output data and took five days to run on 240 processors. In the near future, the center would like to do a 1 PByte run that will require transferring that data and then processing it for 20 days on 1000 processors. To do this, they will require parallel file system transfer rates of 10 GByte/sec, with higher rates needed soon after.
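To put the 10 GByte/sec figure in perspective, here is a rough back-of-the-envelope sketch of how long it takes to move 1 PByte at a sustained rate. It assumes decimal units (1 PByte = 10^15 bytes) and a perfectly sustained transfer, neither of which comes from the talk. The 1 GByte/sec comparison point and the function name are likewise just illustrative choices of mine.

```python
# Rough estimate only. Decimal units and a perfectly sustained rate
# are assumptions, not figures from the talk.
PBYTE = 10**15
GBYTE = 10**9

def transfer_hours(data_bytes, rate_bytes_per_sec):
    """Hours needed to move data_bytes at a sustained rate."""
    return data_bytes / rate_bytes_per_sec / 3600

# The rate quoted in the talk.
hours_at_10 = transfer_hours(1 * PBYTE, 10 * GBYTE)
# A hypothetical slower rate, for comparison only.
hours_at_1 = transfer_hours(1 * PBYTE, 1 * GBYTE)

print(f"1 PByte at 10 GByte/sec: {hours_at_10:.1f} hours")
print(f"1 PByte at  1 GByte/sec: {hours_at_1 / 24:.1f} days")
```

At 10 GByte/sec the transfer takes a bit over a day. At a tenth of that rate it would take more than eleven days, which is over half the length of the planned 20-day computation.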
SDSC currently has about 2 PBytes of online (disk) storage and an additional 5 PBytes of near-line storage in HPSS and Sun's SAM-QFS. Of that 5 PBytes, 1.2 PBytes are on SAM-QFS. Interestingly, though, 85M of the approximately 130M files are stored on SAM-QFS, because SAM-QFS handles small files better than HPSS. Bryan showed several architectural diagrams, which I cannot reproduce here, detailing how SAM-QFS fits into the overall IT infrastructure at SDSC and how it interacts with the TeraGrid infrastructure.
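The small-file point shows up if you work out the implied average file size on each system. The sketch below is a rough estimate with several assumptions of my own: that the 130M file count covers only the near-line systems, that the quoted PByte figures reflect data actually stored rather than raw capacity, and that units are decimal. Under those assumptions, HPSS holds the remaining 3.8 PBytes and 45M files.

```python
# Rough estimate. Assumes the 130M files are all near-line, that the
# PByte figures are data stored (not capacity), and decimal units.
PBYTE = 10**15
MBYTE = 10**6

def avg_file_mbytes(total_bytes, file_count):
    """Average file size in MBytes."""
    return total_bytes / file_count / MBYTE

# Figures quoted in the talk.
samqfs_avg = avg_file_mbytes(1.2 * PBYTE, 85e6)
# Remainder attributed to HPSS (an inference, not a quoted figure).
hpss_avg = avg_file_mbytes((5 - 1.2) * PBYTE, 130e6 - 85e6)

print(f"SAM-QFS average file size: {samqfs_avg:.1f} MBytes")
print(f"HPSS average file size:    {hpss_avg:.1f} MBytes")
```

If those assumptions hold, the average file on SAM-QFS is roughly a sixth the size of the average file on HPSS.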
The talk closed with a brief description of the emerging storage technologies that SDSC considers important. These included solid-state disk (SSD/flash) for high performance, 8 Gb Fibre Channel, SATA drives, RAID 6, expansion and adoption of 10 GbE, MAID for transient data, and, as QDR (quad data rate) becomes available, InfiniBand as a combined storage and cluster interconnect technology.