10 Gigabit Ethernet on UltraSPARC T2

Tuning on-chip 10GbE (NIU) on T5120/T5220

The UltraSPARC T2 has an integrated dual-port 10GbE Network Interface Unit (NIU) on-chip. It requires minimal tuning for throughput. In /etc/system,
set ip:ip_soft_rings_cnt=16
uses 16 soft rings per 10GbE port. If you don't want to reboot the system, you can use ndd to change the number of soft rings
ndd -set /dev/ip ip_soft_rings_cnt 16
and then plumb the nxge interface for the tuning to take effect. Solaris 10 Update 8 or later uses 16 soft rings for 10GbE by default, so this tuning is no longer needed there.
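
To check the value currently in effect (for example, before and after the ndd change), you can read the parameter back; ndd without -set simply prints the current setting:
ndd /dev/ip ip_soft_rings_cnt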

Soft rings are kernel threads that offload the processing of received packets from the interrupt CPU, preventing the interrupt CPU from becoming the bottleneck. The trade-off is added latency from switching from the interrupt thread to a soft ring thread. If your workload is latency sensitive, you may want to see whether turning off soft rings helps meet your latency needs while still delivering the required packet rate or throughput.
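
For example, to try a port with fewer (or no) soft rings, lower the count and replumb that interface. I am assuming here that a count of 0 disables soft ring fanout entirely; verify that behavior on your Solaris release before relying on it:
ifconfig nxge0 unplumb
ndd -set /dev/ip ip_soft_rings_cnt 0
ifconfig nxge0 plumb
Then re-apply the IP address configuration for nxge0 and rerun your latency test.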

Soft rings can be used with any GLDv3 network driver, such as nxge, e1000g, or bge. A little-known trick is that soft rings can be configured on a per-port basis, so you can, for example, configure the NIU with 16 soft rings and the on-board 1GbE with 2 soft rings. To continue the example:

ndd -set /dev/ip ip_soft_rings_cnt 16
ifconfig nxge0 plumb
ndd -set /dev/ip ip_soft_rings_cnt 2
ifconfig e1000g1 plumb

Tuning Sun Multi-threaded 10GbE PCI-E NIC on T5120/T5220

If your T5120/T5220 has a Sun Multi-threaded 10GbE PCI Express NIC plugged in, then two tunables are recommended for throughput:
set ip:ip_soft_rings_cnt=16
set ddi_msix_alloc_limit=8
ddi_msix_alloc_limit is a system-wide limit on how many MSI (Message Signaled Interrupt) or MSI-X interrupts a PCI device can allocate. The default allows at most 2 MSIs per device. Since each 10GbE port of the Sun Multi-threaded 10GbE PCI-E NIC has 8 receive DMA channels and each channel can generate one interrupt, a port can generate up to 8 interrupts. To avoid the interrupt CPU becoming the performance bottleneck, it is recommended to set ddi_msix_alloc_limit to 8 so that network receive interrupts can target 8 different CPUs. This tunable will become unnecessary with a patch to Solaris 10 Update 4.
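
After the reboot you can sanity-check that both settings took effect. One quick way is to read ip_soft_rings_cnt back with ndd and print the live kernel value of ddi_msix_alloc_limit with mdb (the exact output formatting may differ between Solaris updates):
ndd /dev/ip ip_soft_rings_cnt
echo "ddi_msix_alloc_limit/D" | mdb -k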

Which CPUs are taking interrupts?

If your application threads are pinned too often by interrupts and it becomes a problem, you can create a processor set to dedicate those CPUs to interrupt processing (see the psrset sketch after the intrstat output below). To find out which CPUs are taking interrupts, use intrstat. In the intrstat output below, you can see that CPUs 27-34 are being interrupted by the NIU (niumx is the name of the nexus driver for the NIU):
...
      device |     cpu24 %tim     cpu25 %tim     cpu26 %tim     cpu27 %tim
-------------+------------------------------------------------------------
     niumx#0 |         0  0.0         0  0.0         0  0.0      2086 80.9

      device |     cpu28 %tim     cpu29 %tim     cpu30 %tim     cpu31 %tim
-------------+------------------------------------------------------------
     niumx#0 |      1885 82.4      2057 80.1      2189 77.2      2019 79.6

      device |     cpu32 %tim     cpu33 %tim     cpu34 %tim     cpu35 %tim
-------------+------------------------------------------------------------
     niumx#0 |      1993 81.8      2073 79.7      1948 81.7         0  0.0
...
You can also see the interrupts from the Sun Multi-threaded 10GbE PCI-E NIC using the mdb command
echo ::interrupts | mdb -k
but interrupts from the NIU are not included at this time.
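
If you go the processor set route mentioned above, a minimal sketch for the interrupt CPUs seen in this intrstat output would be (adjust the CPU list to match your own system):
psrset -c 27 28 29 30 31 32 33 34
psrset
Creating the set keeps unbound application threads off those CPUs while interrupts continue to be serviced there; running psrset with no arguments confirms the set membership.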

NIU or PCI-E 10GbE NIC?

The UltraSPARC T2 comes with an on-chip dual-port 10GbE Network Interface Unit (NIU), but you can also plug a 10GbE NIC into a PCI-E slot. Here is a summary of the performance features:
Features                         NIU         PCI-E NIC
# 10GbE ports                    2           2
# transmit DMA channels/port     8           12
# receive DMA channels/port      8           8
integrated on-chip?              US-T2       no
bus interface?                   no          8-lane PCI Express
bus bandwidth limit?             no          16 Gbit/s each direction*
transmit packet classification   software    software
receive packet classification    hardware    hardware
* Note: the PCI Express 1.1 specification provides 2 Gbit/s of payload bandwidth per lane, full duplex, so 8 lanes can reach 8 x 2 = 16 Gbit/s in each direction. Actual bus bandwidth on a T5120 running S10U4 was measured at 12 Gbit/s from host to device.

The same driver, nxge, is used for both the NIU and the Sun Multi-threaded 10GbE PCI-E NIC. If you have both installed, you can tell which instances are the NIU by looking in /etc/path_to_inst: the instances with the /niu prefix are the NIU.

$ grep nxge /etc/path_to_inst
"/pci@0/pci@0/pci@8/pci@0/pci@8/network@0" 2 "nxge"
"/pci@0/pci@0/pci@8/pci@0/pci@8/network@0,1" 3 "nxge"
"/niu@80/network@0" 0 "nxge"
"/niu@80/network@1" 1 "nxge"
In the example above, nxge0 and nxge1 are the NIU ports, and nxge2 and nxge3 are the PCI-E NIC ports.

You may wonder: is there a performance difference between the NIU and the PCI-E NIC? The short answer: the NIU wins in every micro-benchmark except one, a single 10GbE port transmitting small UDP packets.

The Sun Multi-threaded 10GbE PCI-E NIC can transmit an impressive 2.1 million 64-byte UDP packets per second out of one port, 50% more than the NIU (1.4 million pps), because it has 50% more transmit DMA channels per port than the NIU (12 vs. 8). So if your workload mostly sends small UDP packets, the Sun Multi-threaded 10GbE PCI-E NIC may deliver a higher packet rate than the NIU on UltraSPARC T2 systems.

In all other scenarios, the NIU gives higher throughput, or lower CPU utilization at similar throughput. In the two-port throughput test, the two NIU ports achieve an impressive 14.6 Gbit/s on TCP transmit, or 18.2 Gbit/s on TCP receive, using 8-Kbyte messages and 145 connections on a 1.4GHz T5120. For TCP transmit, the CPU efficiency (measured in Gbps/GHz) of the NIU is 23% higher than that of the PCI-E NIC at maximum two-port throughput, and 46% higher at maximum single-port throughput. This clearly demonstrates the performance advantage of integrating 10GbE on-chip.

Links

T5120, T5220, and T6320 system and blade launch blogs

Sun Multi-threaded 10GbE Tuning Guide

Neptune and NIU Hardware Specification

Comments:

I don't know if I'm commenting at a totally wrong spot, but we were using a T1000 for a customer's SIP proxy server. With only a few thousand concurrent users (SIP devices like phones etc. registered) we had a 50% loss of UDP packets. The SIP server is running in a Solaris zone. I love Solaris zones. Unfortunately the packet loss was not acceptable. We tried to tune parameters (that's why I found your blog entry - tuning network) and nothing helped. We increased the UDP buffers etc. Today we moved the customer from Solaris and T1000 to Dell and Suse and the packet loss is a consistent 0%. Any idea what went wrong? Something undocumented or so? We would love to move the service back to Solaris.

Posted by Nikolai Manek on November 21, 2007 at 03:17 PM PST #

So what hardware do I actually need to get in order to use the NIU? This isn't clear to me at all.

Posted by Ceri Davies on April 08, 2008 at 07:47 PM PDT #

We are only able to get 750 Mbit single thread from the 10GbE PCI-E NIC on our Sun 5200, which is slower than the onboard 1Gbit NIC... Very disappointing!

Looks like the 10Gbit NIU will give us 1.4 Gbit/sec single thread... Better, but Sun needs to be pushing 7-9 Gbit single thread, or rename the product to a 1.4Gbit NIC....

Posted by dan on April 16, 2008 at 01:07 PM PDT #

Hi, do you know about CPU consumption on the M5000?

Posted by Italo on July 22, 2008 at 09:44 PM PDT #
