Lustre Parallel File System for CFD analysis
By Gunter Roeth on Feb 24, 2009
Whether you are looking at crash simulations, implicit or explicit computations, or CFD analysis, these workloads compute numerical solutions for very different physical models, yet they all have one thing in common: the size of the data sets keeps growing. This is true for the input data, the temporary scratch files, and the final output data. Traditionally, I/O times have been considered small compared to the run times of the solver. That may no longer be true today. Not all ISV codes offer a parallel I/O option, and when they do, it is not always easy to use.
Take, for example, the StarCD input files for StarCD V3, StarCD V4, and finally StarCCM+: the same geometry, a 34M cell case, uses 3.5GB as a StarCD V3 .geom file and climbs to 5.1GB when converted into the V4 .ccmg file (or the CCM+ .sim file).
If the cluster is used to run the solver part only, where the solver runs hundreds of iterations to reach convergence, then I/O times are small compared to the total run time. In this case an ordinary NFS file system may be sufficient. If you wish to tune your NFS, please look at this outstanding cookbook from Sun's Roch Bourbonnais.
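To give a flavor of the kind of client-side tuning such a cookbook covers, here is a minimal sketch of mounting an NFS export with larger transfer sizes. The server name, export path, and option values below are illustrative assumptions, not recommendations from the cookbook itself:

```shell
# Hypothetical example: mount an NFS v3 export with 32KB read/write
# transfer sizes and hard retries (names and values are illustrative).
mount -t nfs -o vers=3,rsize=32768,wsize=32768,hard,intr \
    nfsserver:/export/scratch /mnt/scratch
```

Whether larger rsize/wsize values actually help depends on the server, the network, and the workload, which is exactly why measuring first, as the cookbook suggests, matters.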
However, if you use your cluster as a development platform, where the models do not converge yet and you constantly read and write them while modifying model parameters and boundaries, then the I/O part becomes very visible, even dominant. Reading several GBs of data can easily take tens of minutes on an ordinary NFS. This is even more true if StarCCM+ is used: from the graphical interface, users can modify all parameters, import CAD models, recompute geometries and boundaries, and continue the solver, which runs as a slave on the server.
This was the situation at Daimler AG using StarCD. They solved this bottleneck with the Lustre parallel file system; please read more in this blog here.
Sun's Lustre parallel file system is ideal software for overcoming these problems. Lustre is free software and can be downloaded here. It can be installed on most common Linux systems and delivers the necessary parallel bandwidth (typically users wish to get around 1.5GB/sec) to the compute cluster.
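To make the setup more concrete, here is a minimal sketch of bringing up a Lustre file system; the host names, device paths, and the file system name `cfdfs` are hypothetical, and the forthcoming hands-on whitepaper walks through the real steps in detail:

```shell
# On the metadata server: format the management/metadata target
# (hypothetical device and file system name).
mkfs.lustre --fsname=cfdfs --mgs --mdt /dev/sdb

# On each OSS: format one OST, pointing it at the management server.
mkfs.lustre --fsname=cfdfs --ost --mgsnode=mds01@tcp /dev/sdc

# On the compute nodes: mount the file system as a Lustre client.
mount -t lustre mds01@tcp:/cfdfs /mnt/cfdfs
```

After the targets are mounted on their servers and the clients are mounted, every compute node sees one POSIX file system whose capacity and bandwidth come from all the OSTs together.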
All storage devices (called Object Storage Targets, OSTs) attached to the storage servers (called Object Storage Servers, OSSs) are made visible and accessible to your compute cluster. The size of the Lustre file system is simply the sum of all OSTs. The aggregate bandwidth available in the file system equals the aggregate bandwidth delivered by the OSS servers to their targets. Both capacity and bandwidth scale with the number of OSS servers. This means that you can add more OSTs to increase the bandwidth, without interrupting Lustre.
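How much of that aggregate bandwidth a single large file can use is controlled by striping it across several OSTs, which is done with the standard `lfs` client tool. The directory path and stripe values below are illustrative examples, not tuned recommendations:

```shell
# Stripe new files created in this directory across 8 OSTs,
# 4MB per stripe (-s = stripe size, -c = stripe count in Lustre 1.x;
# the path and values are examples only).
lfs setstripe -s 4M -c 8 /mnt/cfdfs/starcd_case

# Inspect the layout that files in this directory will get.
lfs getstripe /mnt/cfdfs/starcd_case
```

For a multi-GB .geom or .sim file, striping over several OSTs lets reads and writes proceed in parallel on several OSS servers instead of being limited by a single server.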
Please check this whitepaper for a basic description of Lustre. Another 'hands-on' whitepaper, explaining every step from download to the final mkfs, is in preparation and should be released within the next week. Please check this blog again.