Monday Oct 13, 2008

T5440 PCI-E I/O Performance

T5440 PCI-Express Performance
Sun's latest CMT-based server is the four-way Sun SPARC Enterprise T5440. As with the previous two-way Sun SPARC Enterprise T5140 and T5240 servers, the T5440 is built around the UltraSPARC T2 Plus processor (an SMP version of the UltraSPARC T2). Whereas the T5140 and T5240 use glueless coherency links to connect two T2 Plus processors in 1U and 2U form factors, the T5440 uses four coherency hubs to connect up to four processors in a 4U form factor. (The coherency hub was presented at the 2008 IEEE Symposium on High-Performance Interconnects; you can find more detail here: slides, paper.) With four T2 Plus processors, the T5440 provides 256 hardware threads. As with previous CMT servers, the T5440 uses PCI-Express (PCI-E) for Input/Output (IO). With the UltraSPARC T2 and T2 Plus processors, the PCI-E root complex is brought directly on-chip, reducing latency between IO devices and memory.

T5440 PCI-Express Topology
The T5440 uses four PCI-E switches to connect on-board devices and eight PCI-E slots for external devices. All the slots are electrically x8 PCI-E, though two are physically x16. Another two of the eight slots are unavailable if the co-located XAUI slots are used.

The T5440 can be configured with fewer than four CPU modules. In that case, x8 PCI-E crosslinks between the PCI-E switches are enabled, providing full access to all IO components. (Note: crosslinks are not shown in the diagram above.) This is discussed further here.

Bandwidth

Let's take a look at the DMA bandwidth performance of the T5440. The T5440 has four root complexes (one on-chip per T2 Plus processor). These measurements were made using multiple internally developed PCI-E exerciser cards. (How we made these measurements.) We used multiple load-generating modules and two Sun External IO Expansion Units. As shown in the section on latency below, any expansion unit adds latency to IO; however, IO devices that support enough outstanding IO requests can still achieve full PCI-E bandwidth. (For a logical view of the IO Expansion Unit PCI-E configuration, look here.)
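Why outstanding requests matter can be sketched with Little's law: to sustain a bandwidth B over a path with latency L, a device has to keep roughly B x L bytes of read requests in flight. The sketch below applies this to the single-link 100% DMA read figure (1520 MB/s) and to two latencies from the latency table later in this post (954 ns to local memory on a 1.4GHz T5440, 2200 ns from the IO Expansion Unit). The 64-byte request size is an assumption for illustration, and because the table measures time to the first data of a completion, the results are rough estimates rather than exact requirements.

```python
import math


def outstanding_requests(bandwidth_mb_s: float, latency_ns: float,
                         request_bytes: int = 64) -> int:
    """Little's law: bytes in flight = bandwidth x latency.

    Returns how many read requests of `request_bytes` each must be
    outstanding to keep the link busy. The 64-byte default is an
    assumption, not a measured T5440 parameter.
    """
    bytes_in_flight = (bandwidth_mb_s * 1e6) * (latency_ns * 1e-9)
    return math.ceil(bytes_in_flight / request_bytes)


READ_BW_MB_S = 1520  # single-link 100% DMA read, from the bandwidth table

for label, latency_ns in [("T5440 local memory (1.4GHz)", 954),
                          ("T5440 via IO Expansion Unit", 2200)]:
    n = outstanding_requests(READ_BW_MB_S, latency_ns)
    print(f"{label}: ~{n} outstanding 64-byte reads")
```

With these inputs the sketch asks for about 23 outstanding 64-byte reads for a directly attached device versus about 53 through the expansion unit. A device that issues larger read requests (for example 512 bytes) needs proportionally fewer.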


Peak DMA bandwidth (MB/s)
Workload                                   1 PCI-E   2 PCI-E   3 PCI-E   4 PCI-E
100% DMA Read                                 1520      3050      4580      6100
100% DMA Write                                1720      3440      5170      6890
Bi-Directional (DMA read + DMA write)         2940      5900      8850     11800

For the given PCI-E configuration, measured T5440 bandwidth reaches 97% of the theoretical maximum for uni-directional traffic and 93% for bi-directional traffic. Peak DMA bandwidth on the T5440 scales nearly linearly from 1 PCI-E to 4 PCI-E.
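The scaling claim can be checked directly against the bandwidth table: dividing each result by n times the 1 PCI-E result gives a scaling efficiency, where 100% means perfectly linear. The numbers below are copied from the table; the dictionary and function names are only illustrative.

```python
# Peak DMA bandwidth in MB/s for 1..4 PCI-E, copied from the table above.
BANDWIDTH_MB_S = {
    "100% DMA Read": [1520, 3050, 4580, 6100],
    "100% DMA Write": [1720, 3440, 5170, 6890],
    "Bi-Directional": [2940, 5900, 8850, 11800],
}


def scaling_efficiency(samples):
    """Return measured / ideal for each point, where ideal = n x (1 PCI-E value)."""
    base = samples[0]
    return [bw / (n * base) for n, bw in enumerate(samples, start=1)]


for workload, samples in BANDWIDTH_MB_S.items():
    eff = ", ".join(f"{e:.1%}" for e in scaling_efficiency(samples))
    print(f"{workload}: {eff}")
```

Every point lands within about half a percent of linear. Values slightly above 100% most likely reflect measurement and rounding noise in the single-link baseline rather than genuinely superlinear scaling.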


Latency
The table below shows latency for DMA Read operations. (The time is measured from the device's upstream request to the first data bytes of the downstream 64B completion.)

DMA Read latency (ns)
System (CPU)                                L2 cache        Local memory    Remote memory
T5140 (T2 Plus @ 1.2GHz)                    653             820             916
T5240 (T2 Plus @ 1.4GHz, est.)              641             808             904
T5440 (T2 Plus @ 1.2GHz)                    698             1047            1242
T5440 (T2 Plus @ 1.4GHz)                    662             954             1143
T5440 from IO Expander (T2 Plus @ 1.4GHz)   1900            2200            2400

L2 cache: one DW (4 bytes) satisfied from L2 cache.
Local memory / Remote memory: first DW of a 64-byte MemRd satisfied from local or remote memory.

For a given IO slot, memory is either local or remote. (Local versus Remote Latency.) Both the T5240 and the T5440 are SMP architectures and incur additional delays due to coherency protocol overhead. The added latency due to coherence is lower on the T5240 than on the T5440, since the T5240 is glueless while the T5440 uses a coherency hub. The T5440 from IO Expander entries show latency for a device in the IO Expansion Unit. The extra levels of PCI-E switches add approximately 1.25 us of latency to the IO path, but with enough outstanding IO requests to memory, devices can still achieve full PCI-E x8 bandwidth.
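Two of the effects discussed above can be read straight off the latency table: how much longer remote memory takes than local memory on each system, and how much the IO Expansion Unit adds compared with a directly attached slot on the 1.4GHz T5440. The sketch below just subtracts table entries; the dictionary layout and function names are illustrative, and the remote-minus-local difference is only one visible piece of the coherence cost, not a full decomposition of it.

```python
# DMA read latencies in ns, copied from the latency table.
LATENCY_NS = {
    "T5140 @1.2GHz":        {"l2": 653,  "local": 820,  "remote": 916},
    "T5240 @1.4GHz (est.)": {"l2": 641,  "local": 808,  "remote": 904},
    "T5440 @1.2GHz":        {"l2": 698,  "local": 1047, "remote": 1242},
    "T5440 @1.4GHz":        {"l2": 662,  "local": 954,  "remote": 1143},
    "T5440 IO Expander":    {"l2": 1900, "local": 2200, "remote": 2400},
}


def remote_penalty(system: str) -> int:
    """Extra latency of remote over local memory for one system (ns)."""
    row = LATENCY_NS[system]
    return row["remote"] - row["local"]


def expander_penalty(kind: str) -> int:
    """Extra latency of the IO Expansion Unit over a direct slot, 1.4GHz T5440 (ns)."""
    return LATENCY_NS["T5440 IO Expander"][kind] - LATENCY_NS["T5440 @1.4GHz"][kind]


for system in ("T5140 @1.2GHz", "T5240 @1.4GHz (est.)", "T5440 @1.4GHz"):
    print(f"{system}: remote - local = {remote_penalty(system)} ns")
for kind in ("l2", "local", "remote"):
    print(f"IO Expander penalty ({kind}): {expander_penalty(kind)} ns")
```

The remote-memory penalty is 96 ns on both glueless two-way systems and 189 ns on the hub-based T5440, while the expansion-unit penalty falls between 1238 and 1257 ns, consistent with the approximately 1.25 us figure in the text.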

T5440 PCI-E Reconfiguration

Sun's latest CMT-based server is the four-way Sun SPARC Enterprise T5440. As with the previous two-way Sun SPARC Enterprise T5140 and T5240 servers, the T5440 is built around the UltraSPARC T2 Plus processor (an SMP version of the UltraSPARC T2). The T5440 supports one, two, or four processors in a 4U form factor. With four T2 Plus processors, the T5440 provides 256 hardware threads. As with previous CMT servers, the T5440 uses PCI-Express (PCI-E) for Input/Output (IO). With the UltraSPARC T2 and T2 Plus processors, the PCI-E root complex is brought directly on-chip, reducing latency between IO devices and memory.
Each on-chip PCI-E x8 root complex connects directly to a PLX PEX8548 PCI-E switch. In turn, each switch connects to on-board IO as well as two PCI-E slots. With four switches, there are a total of eight PCI-E slots. The following diagram shows the PCI-E topology of the fully configured four-way T5440.

Note that two slots (1 and 6) are physically x16, but electrically x8. Another two slots (4 and 5) are unavailable if the co-located XAUI slots are used.

The dashed gray line in the figure above shows an x8 PCI-E link between PCI-E switches that is not necessarily enabled. When all four processors are configured, a PCI-E IO device accesses memory, and a CPU accesses a device, through the device's local CPU and the coherence interconnect. (See this diagram.) The x8 crosslinks are not used. However, if the T5440 is configured as a two-way system (perhaps to allow a future upgrade to four-way), two of the switches will not have a local root complex. In that case, two of the crosslinks are enabled to preserve full connectivity to IO. The following diagram shows the supported two-way T5440 configuration.

In the supported two-way configuration, the on-board LSI1068E SAS controller for the internal disks and the Neptune ASIC used for network connectivity retain direct connections to local root complexes (CPU0 and CPU1, respectively). However, devices in slots 2, 3, 6, and 7 reach a root complex through an extra PCI-E switch hop, which adds latency to the I/O path for those devices. (Full path of remote memory access.)
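A small graph model makes the crosslink behavior concrete. In the sketch below, the switch names SW0 to SW3, the slot-to-switch assignment, and which switches the crosslinks join are hypothetical placeholders, not the actual T5440 wiring. The only constraints taken from the text are that each switch hosts two slots, that in the two-way configuration slots 2, 3, 6, and 7 sit behind the switches without a local root complex, and that two crosslinks are enabled. A multi-source breadth-first search from the switches that have a local CPU counts how many extra switch hops each slot needs to reach a root complex.

```python
from collections import deque

# Hypothetical placeholder wiring: switch names and slot assignments are
# illustrative only, chosen so slots 2, 3, 6, 7 sit on the switches that
# lack a local CPU in the two-way configuration described in the text.
SLOTS = {
    "SW0": [0, 1],  # assumed: CPU0's switch (also hosts the on-board SAS)
    "SW1": [4, 5],  # assumed: CPU1's switch (also hosts the Neptune ASIC)
    "SW2": [2, 3],
    "SW3": [6, 7],
}


def extra_hops(root_complexes, crosslinks):
    """Map each slot to the number of crosslink hops needed to reach a switch
    with a local root complex, or None if no root complex is reachable."""
    adj = {sw: set() for sw in SLOTS}
    for a, b in crosslinks:
        adj[a].add(b)
        adj[b].add(a)
    # Multi-source BFS starting from every switch that has a local CPU.
    dist = {sw: 0 for sw in root_complexes}
    queue = deque(root_complexes)
    while queue:
        sw = queue.popleft()
        for nxt in adj[sw]:
            if nxt not in dist:
                dist[nxt] = dist[sw] + 1
                queue.append(nxt)
    return {slot: dist.get(sw) for sw, slots in SLOTS.items() for slot in slots}


four_way = extra_hops(["SW0", "SW1", "SW2", "SW3"], crosslinks=[])
two_way = extra_hops(["SW0", "SW1"], crosslinks=[("SW0", "SW2"), ("SW1", "SW3")])
print("four-way:", four_way)
print("two-way:", two_way)
```

In the four-way case every slot has zero extra hops; in the two-way case slots 2, 3, 6, and 7 show one extra hop. Calling it with a switch's CPU missing and no crosslink to that switch marks its slots as unreachable (None), which is the situation reconfiguration addresses after a processor failure.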

Note that changes in the PCI-E configuration may change the Solaris path names for devices. For example, if a customer starts with the two-way configuration shown above and later upgrades to a four-way configuration, devices in slots 2, 3, 6, and 7 will have different Solaris path names. The Sun SPARC Enterprise T5440 Server Product Notes provide more detail on PCI-E reconfiguration, including supported configurations, reconfiguration procedures, and use of an available reconfiguration script.

Reconfiguration can also be useful in the event of a processor failure. If a processor fails, the PCI-E IO devices attached through that processor become unavailable. The best solution is probably to replace the CPU, but if a replacement is not immediately available, reconfiguration can restore access to those I/O devices. Reconfiguration also allows a customer to start with a single CPU and memory, and later add processors, going to two-way or four-way as needed. T5440 PCI-E reconfiguration provides flexibility and full access to I/O.


About

pyakutis
