Thursday Apr 10, 2008

Maramba (T5240) Unplugged: A Look Under the Hood

Aside from being interesting machines for HPC, our latest systems are beautiful to behold. At least to an engineer, I suppose. We had a fun internal launch event (free ice cream!) on Sun's Burlington campus yesterday where we had a chance to look inside the boxes and ask questions of the hardware guys who designed these systems. Here is my annotated guide to the layout of a T5240 system.

[annotated T5240 system]

A. Software engineers examining the Sun SPARC Enterprise T5420. Note the uniforms. Sometimes I think the hardware guys are better dressers than the software folks.

B. Front section of the machine. Disks, DVD, etc, go here.

C. Ten hot-swappable fans with room for two more. The system will compensate for a bad fan and continue to run. The system also automatically adjusts the fan speeds to control the amount of cooling needed, depending on demand. For example, when the mezzanine (K) is in place, more cooling is needed. If it isn't, why spend the energy spinning the fans faster than you need?

D. This is a plexiglass air plenum which is hard to see in the photo because it is clear though the top part near the fans does have some white-background diagrams on it. The plenum sits at an angle near the top of the photo and then runs horizontally over the memory DIMMs (F). It captures the air coming out of the fans and forces it past the DIMMS for efficient cooling. The plenum is removed when installing the mezzanine.

E. These are the two heat sinks covering the two UltraSPARC T2plus processors. The two procs are linked with four point-to-point coherency links which are also buried under that area of the board.

F. The memory DIMMs. On the motherboard you can see there are eight DIMM slots per socket for a maximum of 64 GB of memory with current FBDIMMs. But see (K) if 64 GB isn't enough memory for you.

G. There are four connectors shown here with the rightmost connector filled with an air controller as the others would be in an operating system. The mezzanine assembly (K) mates to the motherboard here.

H. PCIe bridge chips are here in this area, supporting external IO connectivity.

I. This is the heat sink for Neptune, Sun's multithreaded 10 GbE technology. It supports two 10 GbE links or some number of slower links and has a large number of DMA engines to offer a good impedance match with our multithreaded processors. On the last generation T2 processors, the Neptune technology was integrated onto the T2 silicon. In T2plus, we moved the 10 GbE off-chip to make room for the coherency logic that allows us to create a coherent 128-thread system with two sockets. As a systems company, we can make tradeoffs like this at will.

J. This is where the service processor for the system is located, included associated SDRAM memory. The software guys got all tingly when they saw where all their ILOM code actually runs. ILOM is Sun's unified service processor that provides lights out management (LOM.)

K. And, finally, the mezzanine assembly. This plugs into the four sockets described above (G) and allows the T5240 to be expanded to 128 GB...all in a two rack-unit enclosure. The density is incredible.

Kicking Butt with OpenMP: The Power of CMT

[t5240 internal shot]

Yesterday we launched our 3rd generation of multicore / multithreaded SPARC machines and again the systems should turn some heads in the HPC world. Last October, we introduced our first single-socket UltraSPARC T2 based systems with 64 hardware threads and eight floating point units. I blogged about the HPC aspects here. These systems showed great promise for HPC as illustrated in this analysis done by RWTH Aachen.

We have now followed that with two new systems that offer up to 128 hardware threads and 16 floating point units per node. In one or two rack units. With up to 128 GB of memory in the 2U system. These systems, the Sun SPARC Enterprise T5140 and T5240 Servers, both contain two UltraSPARC T2plus processors, which maintain coherency across four point-to-point links with a combined bandwidth of 50 GB per second. Further details are available in the Server Architecture whitepaper.

An HPC person might wonder how a system with a measly clock rate of 1.4 GHz and 128 hardware threads spread across two sockets might perform against a more conventional two-socket system running at over 4 GHz. Well, wonder no more. The OpenMP threaded parallelization model is a great way to parallelize codes on shared memory machine and good illustration of the value proposition of our throughput computing approach. In short, we kick butt with these new systems on SPEComp and have established a new world record for two-socket systems. Details here.

Oh, and we also just released world record two-socket numbers for both SPECint_rate2006 and SPECfp_rate2006 using these new boxes. Check out the details here.


Josh Simons


« April 2014