Memorial Day in Germany

Memorial Day started about 9 hours too early for me, as the first rays of sunlight broke through the bottom of the window shade of a United Airlines 747 descending towards Frankfurt airport. I'm visiting Germany this week for the grand opening of the new Jülich Supercomputer Center and its 2000-node Sun Constellation System. The Jülich system is one of the first large QDR-based InfiniBand supercomputers, and we expect 40 Gb/sec QDR technology to rapidly replace the previous-generation 20 Gb/sec DDR technology in large clusters, not only because of its higher bandwidth but also because of its improved latency.

The Jülich system also features a Sun Lustre Storage System directly connected to its InfiniBand network, using multiple Lustre Object Storage Servers (OSS) to provide high-speed, parallel access to a large single-namespace filesystem, easily expandable to petabytes of storage and tens or even hundreds of GB/sec of storage bandwidth (Oak Ridge National Labs has achieved over 200 GB/sec on their Sun Lustre system).
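
Here is a rough sketch of what that parallel access looks like from an application's point of view: a generic MPI-IO program where every rank writes its own slice of one shared, striped file, so the writes fan out across many OSS nodes at once. This is not Jülich's actual I/O stack; the file path, stripe settings, and hint names are assumptions (the striping_* hints follow ROMIO conventions and may be ignored by other MPI-IO implementations).

```c
/* Illustrative sketch only: each MPI rank writes its own contiguous slice
 * of one shared file.  On a Lustre filesystem the file is striped across
 * OSTs, so the writes are serviced by many OSS nodes in parallel.  The
 * "striping_*" hints below are ROMIO hint names; the path and stripe
 * settings are assumptions for illustration. */
#include <mpi.h>
#include <stdlib.h>

#define SLICE_DOUBLES (1 << 20)          /* 8 MB of doubles per rank (assumed) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *slice = malloc(SLICE_DOUBLES * sizeof(double));
    for (int i = 0; i < SLICE_DOUBLES; i++)
        slice[i] = (double)rank;         /* dummy payload */

    /* Ask the MPI-IO layer to stripe the new file across 32 OSTs in 1 MB chunks. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "32");
    MPI_Info_set(info, "striping_unit", "1048576");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/checkpoint.dat",   /* hypothetical path */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Collective write: rank r owns bytes [r*slice_bytes, (r+1)*slice_bytes). */
    MPI_Offset offset = (MPI_Offset)rank * SLICE_DOUBLES * sizeof(double);
    MPI_File_write_at_all(fh, offset, slice, SLICE_DOUBLES, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(slice);
    MPI_Finalize();
    return 0;
}
```

Striping the file across many OSTs is what lets aggregate bandwidth scale with the number of OSS nodes instead of bottlenecking on a single server.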

One unique feature of the Jülich system is its InfiniBand fabric, which uses both Sun and Mellanox QDR switches. Besides the 2000-node Sun Constellation System using Sun Magnum QDR switches, the Jülich QDR fabric also supports a 1000-node Bull cluster using Mellanox QDR switches. While both the Sun and Bull supercomputers are built out of 2-socket Intel Nehalem compute nodes, the physical size and complexity of the systems stand in stark contrast. Using regular 4x IB cables to connect to the Mellanox switches, the Bull cluster, with only half the number of compute nodes, requires more cables than the Sun Constellation System with its 3-in-1 12x cables. In addition, the Sun Constellation System racks require no internal cables to connect the compute nodes to their built-in "QNEM", the world's first in-chassis QDR leaf switch. While most Sun Constellation Systems use the QNEM to build a fully connected "fat tree" IB fabric, the QNEM also supports mesh and 3D torus IB fabrics, the latter being used in a Sun Constellation System being deployed at Sandia National Labs in the US.
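
For a rough feel of what a 3D torus fabric means to software (a generic MPI sketch, not the actual Sandia or Jülich configuration), MPI's Cartesian communicators lay ranks out on a periodic 3D grid where every node gets six wraparound neighbors:

```c
/* Minimal sketch of a 3D-torus layout, not any specific machine's
 * configuration: MPI places the ranks on a periodic 3D grid, so every
 * node has six neighbors with wraparound links, matching the "3D torus"
 * IB fabric option mentioned above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Let MPI factor nprocs into a 3D grid, all dimensions periodic (torus). */
    int dims[3] = {0, 0, 0};
    int periods[3] = {1, 1, 1};
    MPI_Dims_create(nprocs, 3, dims);

    MPI_Comm torus;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* allow reorder */, &torus);

    int trank, coords[3];
    MPI_Comm_rank(torus, &trank);
    MPI_Cart_coords(torus, trank, 3, coords);

    /* Six nearest neighbors: +/-1 along x, y and z, wrapping at the edges. */
    for (int dim = 0; dim < 3; dim++) {
        int minus, plus;
        MPI_Cart_shift(torus, dim, 1, &minus, &plus);
        printf("rank %d at (%d,%d,%d): dim %d neighbors are ranks %d and %d\n",
               trank, coords[0], coords[1], coords[2], dim, minus, plus);
    }

    MPI_Comm_free(&torus);
    MPI_Finalize();
    return 0;
}
```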

Bull does a good job of packing 72 of their Nehalem compute nodes into a single rack, but once their IB racks are counted, the Bull cluster still requires almost 2x the floorspace of the Sun Constellation System, which packs 96 compute nodes into each rack.
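
A quick back-of-the-envelope using only the node counts and per-rack densities quoted above (it ignores the IB switch racks, aisle space, and actual rack footprints) shows how the density difference adds up:

```c
/* Back-of-the-envelope rack counts from the densities quoted above:
 * 2000 nodes at 96 per rack vs. 1000 nodes at 72 per rack.  Ignores the
 * separate IB switch racks and actual rack footprints, so it only
 * illustrates the effect of per-rack compute density. */
#include <stdio.h>

static int racks_needed(int nodes, int nodes_per_rack)
{
    return (nodes + nodes_per_rack - 1) / nodes_per_rack;   /* round up */
}

int main(void)
{
    int sun_racks  = racks_needed(2000, 96);   /* Sun Constellation System */
    int bull_racks = racks_needed(1000, 72);   /* Bull cluster */

    printf("Sun:  %d compute racks for 2000 nodes\n", sun_racks);   /* 21 */
    printf("Bull: %d compute racks for 1000 nodes\n", bull_racks);  /* 14 */
    printf("Compute racks per 1000 nodes: Sun %.1f, Bull %.1f\n",
           1000.0 / 96.0, 1000.0 / 72.0);                           /* 10.4 vs 13.9 */
    return 0;
}
```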

Jülich chose Sun's new water-cooled rear door option for the Sun Constellation System, greatly simplifying the cooling design of their data center. Depending on the exact CPU and memory configuration, Sun Constellation System racks can require 30-40 kW of cooling per rack, which requires some sort of supplemental cooling. Sun provides both water-cooled and refrigerant-gas-cooled rear door options for Sun Constellation System racks. This approach has an advantage over in-row or top-of-rack supplemental cooling systems in that no supplemental fans are required; air is moved through the cooling doors using only the blade chassis' built-in fans. The supplemental fans in in-row and top-of-rack systems are often left out of customers' power-usage calculations. Sun's Data Center Efficiency practice can help customers design more efficient data centers, whether building an entirely new data center from the ground up or retrofitting an existing one.
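
As a rough estimate of what absorbing 30-40 kW at the rack door implies for the chilled-water loop, the standard sensible-heat relation Q = mdot * c_p * dT with an assumed 10°C water temperature rise (actual rear-door specifications will differ) gives the required flow per rack:

```c
/* Rough estimate only, using an assumed 10 C water temperature rise:
 * the sensible-heat relation Q = mdot * c_p * dT gives the chilled-water
 * flow a rear-door heat exchanger needs to absorb 30-40 kW per rack. */
#include <stdio.h>

int main(void)
{
    const double cp_water = 4186.0;   /* J/(kg*K) */
    const double delta_t  = 10.0;     /* assumed water temperature rise, K */

    for (double load_kw = 30.0; load_kw <= 40.0; load_kw += 10.0) {
        double mdot = load_kw * 1000.0 / (cp_water * delta_t);  /* kg/s */
        double lpm  = mdot * 60.0;     /* roughly L/min, since 1 kg of water is ~1 L */
        printf("%.0f kW rack: %.2f kg/s of water, about %.0f L/min\n",
               load_kw, mdot, lpm);
    }
    return 0;
}
```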

Well, it is time to head off to the grand opening ceremonies; I'll be back afterwards with more of the story.
