Infiniband Performance Limits: Streaming Disk Read and new Summary
By cindi on Oct 10, 2009
Updated Performance Limit Summary
I was able to squeak out a few more bytes/second in the streaming DRAM test for IPoIB and have achieved a respectable upper bound for RDMA streaming disk reads for this Sun Storage 7410 configuration. The updated summary is below with links to the relevant Analytics screenshots. I'll update this summary as I gather more data.
| Test | NFSv3/RDMA | NFSv3/IPoIB |
|---|---|---|
| NFSv3 Streaming DRAM Read | 2.93 GBytes/second \*\* | ~2.40 GBytes/second \* |
| NFSv3 Streaming Disk Read | 2.11 GBytes/second \*\* | 1.47 GBytes/second \* |
| NFSv3 Streaming Write | 984 MBytes/second \*\* | 752 MBytes/second \* |
| NFSv3 Max IOPS - 1 byte reads | | |
| NFSv3 Max IOPS - 4k reads | | |
| NFSv3 Max IOPS - 8K reads | | |
The IPoIB numbers do not represent the maximum limits I expect to ultimately achieve. On the 7410, CPU and disk utilization are still well below their limits. In the I/O path, we are nowhere close to saturating the IB transport, and the HyperTransport and PCIe root complexes have plenty of headroom. The problem is the number of clients. As I develop a better client fabric, expect these values to change.
With NFSv3/RDMA, I am able to hit the maximum limits with the current client configuration (10 clients), with the exception of max IOPS. In the streaming read from DRAM test, I was able to hit the limit imposed by the PCIe generation 1 root complexes and downstream bus. For the streaming read/write from/to disk, I am able to reach the maximum we can expect from this storage configuration. The throughput numbers are given in GBytes/second for the transport. While the throughput numbers observed on the subnet manager were higher, I took a conservative approach to reporting the streaming write and DRAM read limits: I took the observed IOPS and multiplied by the data transfer size (128K). For example, we see 24041 (IOPS) x 128K (read size) = 2.93 GBytes/second for the streaming read from DRAM test. Once we have 64-bit port performance counters, I can be more confident in the throughput I observed through them. For streaming read from disk, I used the reported disk throughput.
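The IOPS-times-transfer-size estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes that "128K" means 128 x 1024 bytes and that a GByte here is 2^30 bytes, which is the reading that reproduces the 2.93 GBytes/second figure in the summary table.

```python
KIB = 1024          # assumed: "128K" means 128 * 1024 bytes
GIB = 1024 ** 3     # assumed: GBytes reported in binary units


def throughput_gib_per_s(iops, transfer_bytes):
    """Estimate throughput from an operation rate and a fixed transfer size."""
    return iops * transfer_bytes / GIB


# Streaming read from DRAM over NFSv3/RDMA: 24041 IOPS at a 128K read size.
dram_read = throughput_gib_per_s(24041, 128 * KIB)
print(f"{dram_read:.2f} GBytes/second")  # prints "2.93 GBytes/second"
```

In decimal units (10^9 bytes) the same product comes to about 3.15 GBytes/second, so it matters which convention a reported number uses.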
Filer: Sun Storage 7410, with the following config:
- 256 Gbytes DRAM
- 8 JBODs, each with 24 x 1 Tbyte disks, configured with mirroring
- 4 sockets of six-core AMD Opteron 2600 MHz CPUs (Istanbul)
- 2 Sun DDR Dual Port Infiniband HCA
- 3 HBA cards
- noatime on shares, and database size left at 128 Kbytes
Clients: 10, each with the following config:
- 2 sockets of Intel Xeon quad-core 1600 MHz CPUs
- 3 Gbytes of DRAM
- 1 Sun DDR Dual Port Infiniband HCA Express Module
- mount options:
  - read tests: mounted forcedirectio (to skip client caching), and rsize to match the workload
  - write tests: default mount options
Switches: 2 internal Sun DataCenter 3x24 Infiniband switches (A and C)
- Centos 5.2
- Sun HPC Software, Linux Edition
- 2 Sun DDR Dual Port Infiniband HCA
NFSv3 Streaming Disk Reads
I was able to achieve a maximum read limit for NFSv3 streaming read from disk for RDMA. As with my previous tests, I have a 10-client fabric connected to the same Sun Storage 7410. The clients are split equally between two subnets and connected to two separate HCA ports on the 7410. Each client has a separate share mounted. For the read from disk tests, I'm using all 10 clients, each running 10 threads to read 1 MB of data (see Brendan's seqread.pl script) from its own 2GB file. The shares are mounted with rsize=128K.
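For readers who want to see the shape of this workload, here is a minimal Python stand-in for one client. It is not Brendan's seqread.pl and makes no claim to match its behavior. It assumes that each thread reads its own file front to back in 1 MB chunks; the names `seq_read` and `run_readers` are invented for this sketch. It also does nothing to bypass client caching, which in the tests above was handled by the forcedirectio mount option.

```python
import sys
import threading
import time

CHUNK = 1024 * 1024  # assumed: each read() call asks for 1 MB


def seq_read(path, chunk=CHUNK):
    """Read a file sequentially in fixed-size chunks; return bytes read."""
    total = 0
    # buffering=0 disables Python-level buffering only, not the OS cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:  # empty result means end of file
                break
            total += len(data)
    return total


def run_readers(paths):
    """Run one reader thread per path; return (total_bytes, elapsed_seconds)."""
    counts = [0] * len(paths)

    def worker(i, path):
        counts[i] = seq_read(path)

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(paths)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts), time.monotonic() - start


if __name__ == "__main__":
    paths = sys.argv[1:]  # e.g. ten files on the NFS-mounted share
    if paths:
        total, elapsed = run_readers(paths)
        rate = total / max(elapsed, 1e-9) / 2**20
        print(f"{rate:.1f} MBytes/second across {len(paths)} threads")
```

Passing ten file paths on the mounted share gives ten concurrent sequential readers, which is the per-client thread count used in this test.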
Update on Maximum IOPS
I'm still waiting to run this set of tests with a larger number of clients. But in the interim, I wanted to make sure that adding those clients would indeed push me to the limits of the 7410. To validate my thinking, I ran a step test for the 4k maximum IOPS test. Here, we can see the stepwise function of adding two clients at a time plus one at the end for a maximum of 9 clients.
We're scaling nicely: each step of two clients adds roughly 42000 IOPS, and the last client adds another 20000. We're starting to reach a CPU limit, but if I add just 5 more clients, I can match Brendan's IOPS max of 400K. I think I can do it! Stay tuned...