Tuesday Dec 06, 2005

Niagara IO - Architecture & Performance

Today Sun is launching a revolutionary new set of server products. The Sun Fire CoolThreads servers, internally named Ontario and Erie, are both based on the Niagara multicore SPARC processor. The Niagara, or UltraSPARC T1 processor, represents a quantum leap in implementing multiple execution pipelines (cores) on a single chip, with support for multiple hardware threads per pipeline. We refer to this throughput-oriented design as Chip Multithreading (CMT) technology. The UltraSPARC T1 processor incorporates eight execution cores, with four hardware threads per core, providing on a single chip the capability that previously required 32 processors (where each processor was a traditional design with a single instruction pipeline). The Sun Fire T2000 (Ontario) and the Sun Fire T1000 (Erie) represent groundbreaking technology, first and foremost in the amount of processing power (CPU, memory, I/O) available in a relatively small system. Both the Sun Fire T2000 and T1000 are rack-mount chassis systems; the T2000 is two rack units (RU) high, and the T1000 is one rack unit high. Within a relatively small package, we find an amazing amount of computing power - not only for parallel processor-oriented tasks, but also in memory and I/O bandwidth. The icing on the cake is the low-power design of the systems. The UltraSPARC T1 processor generates remarkably little heat, and the system as a whole has an amazing performance/power metric.

But my blog here today is not about the power and heat metrics of the T2000 and T1000. I'm sure that the launch blog-burst will include specific data on that particular feature. Nor will I be detailing the UltraSPARC T1 microprocessor architecture - the beauty of 8 execution cores, 4 hardware threads per core (32 threads total), and the whiz-bang performance and throughput these systems deliver with parallel workloads. My fellow bloggers will expound on these virtues, as well as other features. This discussion is intended to provide an overview of the I/O architecture of these systems, and a small sample of some performance numbers we have measured in our benchmarking work. These are not industry-standard benchmark results; those can be found on the product pages.

The I/O architecture of the T2000 includes five PCI slots: three PCI-E and two PCI-X, as well as four on-board Gigabit Ethernet ports. PCI-X is a 64-bit wide, 133MHz bus, capable of 1.06GB/sec bandwidth. PCI-E (PCI-Express) is a point-to-point bus that provides a non-shared link to a PCI-E device. A link can be implemented with one or more lanes to carry data, where each lane carries a full-duplex serial data bit stream at a rate of 2.5Gbits/second. PCI-E implementations can scale up bandwidth based on the number of lanes implemented in the link, referred to as X1, X2, X4, X8, X12, X16 and X32, where the value after the X corresponds to the number of data lanes. PCI-E on the T2000 and T1000 is X8, supporting devices with up to 8 lanes of data bandwidth capability. The transport bus between the Fire I/O bridge chip and the UltraSPARC T1 processor is the Jbus, which has a theoretical maximum bandwidth of 2.5GB/sec. Please note that this is not memory bandwidth - processor-to-memory data transfers take place on a different physical bus in the system (and of course through a cache memory hierarchy). The Jbus is dedicated to I/O, providing true high-end I/O bandwidth capability.
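The bus figures above come down to simple arithmetic, and a few lines of Python make it easy to check them. This is only a rough sketch: the function names are mine, and the 8b/10b encoding overhead (8 payload bits carried in every 10 bits on the wire) is a property of first-generation PCI-E that I am adding here, not something measured on these systems.

```python
def parallel_bus_mb_per_sec(width_bits, clock_mhz):
    """Peak MB/sec for a parallel bus that moves one word per clock."""
    return width_bits * clock_mhz / 8


def pcie_link_mb_per_sec(lanes, lane_mbits=2500, payload_bits=8, symbol_bits=10):
    """Peak payload MB/sec in ONE direction for a PCI-E link.

    lane_mbits is the raw signalling rate per lane (2.5Gbits/sec).
    payload_bits/symbol_bits models 8b/10b encoding (an assumption
    added for this sketch, see the text above).
    """
    return lanes * lane_mbits * payload_bits / symbol_bits / 8


pci_x = parallel_bus_mb_per_sec(64, 133)   # 64-bit, 133MHz PCI-X
pcie_x8 = pcie_link_mb_per_sec(8)          # X8 link, per direction

print(f"PCI-X 64-bit/133MHz : {pci_x:.0f} MB/sec")
print(f"PCI-E X8 (one way)  : {pcie_x8:.0f} MB/sec")
```

The PCI-X result works out to 1064 MB/sec, which is the 1.06GB/sec quoted above. Under the 8b/10b assumption an X8 link carries about 2000 MB/sec of payload in each direction.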

The T1000 uses the same Fire I/O bridge chip and Jbus to interface the I/O subsystem to the UltraSPARC T1 processor. The T1000, at one RU in size, has fewer I/O slots, with one PCI-E slot.

Some quick tests on a Sun Fire T2000 system with well over 200 connected disks (multiple Sun 3510 storage arrays connected via multiple PCI-X and PCI-E dual-port Gbit Fibre Channel adapters) indicate these systems are extremely I/O capable. The T2000 is able to sustain 1.6GB/sec of bandwidth doing sequential reads from raw disk devices. Running a database transactional workload, which has a random I/O profile (and small 4k I/Os), the T2000 sustains 58,000 IOPS (I/O operations per second). Using a smaller I/O size for the sequential tests (8k instead of 1MB), we can sustain 120,000 IOPS on reads (just under 1GB/sec bandwidth with 8k I/Os). On a combined read/write test, 60,000 reads/sec and 60,000 writes/sec are sustained.
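To relate the IOPS numbers to bandwidth, multiply the operation rate by the I/O size. The sketch below does that for the figures quoted above. The function names are mine, and I am assuming "8k" means 8192 bytes, "1MB" means 1,048,576 bytes, and 1 MB/sec means 1,000,000 bytes/sec; a different convention shifts the results by a few percent.

```python
KIB = 1024
MIB = 1024 * 1024


def mb_per_sec(iops, io_size_bytes):
    """Bandwidth in MB/sec (10**6 bytes) for a given IOPS rate and I/O size."""
    return iops * io_size_bytes / 1_000_000


def iops_for(mb_per_second, io_size_bytes):
    """I/O operations per second needed to move mb_per_second at this I/O size."""
    return mb_per_second * 1_000_000 / io_size_bytes


seq_8k = mb_per_sec(120_000, 8 * KIB)   # sequential reads, 8k I/Os
oltp_4k = mb_per_sec(58_000, 4 * KIB)   # random transactional I/O, 4k I/Os
seq_1m_iops = iops_for(1600, MIB)       # 1.6GB/sec with 1MB I/Os

print(f"120,000 IOPS at 8k : {seq_8k:.1f} MB/sec")
print(f" 58,000 IOPS at 4k : {oltp_4k:.1f} MB/sec")
print(f"1.6GB/sec at 1MB   : {seq_1m_iops:.0f} IOPS")
```

With these assumptions the 8k sequential case comes to roughly 983 MB/sec, consistent with "just under 1GB/sec", while the 4k transactional workload moves far less data (about 238 MB/sec) despite its high operation rate. The 1.6GB/sec large-block result corresponds to only around 1,500 I/Os per second.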

The numbers quoted above provide a solid indication that the Sun Fire T2000 system is not just a new system with another pretty processor (the UltraSPARC T1). These systems are designed to handle workloads that generate high rates of sustained I/O, making the T2000 system suitable for a broad range of applications and workloads.

[ T: http://technorati.com/tag/NiagaraCMT ]



