Maximizing NFS client performance on 10Gb Ethernet
By user12610824 on Nov 30, 2009
The default values for the tunables outlined below are all either being reviewed, or have already changed since the release of S10u7. Some of these tunings are unnecessary if you are running S10u8, and they should all be unnecessary in the future. Consider these settings a workaround to achieve maximum performance, and plan to revisit them in the future. A good place to monitor for future developments is the Networks page on the Solaris Internals site. You can also review the NFS section of the Solaris Tunable Parameters Reference Manual.
If you want to fine-tune these settings beyond what is outlined here, a reasonable technique is to start from your current default settings and double each value until you see no further improvement.
For the time being, consider the following settings if you plan to run NFS between a single client and a single server over 10GbE:
Step 1 - TCP window sizes
The TCP window size defines how much data a host is willing to send or receive without an acknowledgment from its communication partner. Window size is a central component of the TCP throughput formula, which simplifies to the following if we assume no packet loss:
- max throughput (per second) = window size / round trip time (in seconds)
For example, with 1ms RTT and the current default window size of 48k, we have:
- 49152 / 0.001 = ~50 MB/sec per communication partner
This is obviously too low for NFS over 10GbE, so the send and receive window sizes should be increased. A setting of 1MB provides a max bandwidth of ~1 GB/sec with a RTT of 1ms.
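The window size needed to keep a link full is the bandwidth-delay product: bandwidth times round-trip time. The arithmetic behind the 1MB recommendation can be checked with a short shell calculation (a sketch; the 10Gb/s bandwidth and 1ms RTT are example inputs):

```shell
#!/bin/sh
# Bandwidth-delay product: the window size needed to keep the pipe full.
#   window (bytes) = bandwidth (bytes/sec) * round-trip time (sec)
RTT=0.001                 # 1 ms round-trip time
BANDWIDTH=10000000000     # 10 Gb/s, in bits per second
awk -v bw="$BANDWIDTH" -v rtt="$RTT" 'BEGIN {
    bytes = (bw / 8) * rtt    # bytes in flight needed to fill the link
    printf "window needed: %.0f bytes (~%.2f MB)\n", bytes, bytes / 1048576
}'
```

This prints a required window of 1,250,000 bytes (~1.19 MB), which is why a 1MB window gets close to line rate at 1ms RTT.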
Solaris 10 Update 8 and earlier
ndd -set /dev/tcp tcp_xmit_hiwat 1048576
ndd -set /dev/tcp tcp_recv_hiwat 1048576

TCP window size has been the subject of a number of CRs, has changed several times over the years, and the default is likely to change again in the near future. Use a command like

ndd -get /dev/tcp tcp_xmit_hiwat

on your system to check the current default value before tuning, to make sure that you do not inadvertently lower the values.
Note: if you want to increase TCP window sizes beyond 1MB, you should also increase tcp_max_buf and tcp_cwnd_max, which currently default to 1MB.
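Putting the window-related tunables together, a minimal check-then-set sequence might look like the following (a sketch; run as root, and note that ndd settings do not persist across reboots):

```shell
# Inspect the current defaults before changing anything, so you never
# inadvertently lower a value that has already been raised.
ndd -get /dev/tcp tcp_xmit_hiwat
ndd -get /dev/tcp tcp_recv_hiwat
ndd -get /dev/tcp tcp_max_buf
ndd -get /dev/tcp tcp_cwnd_max

# Raise the send and receive windows to 1MB. tcp_max_buf and
# tcp_cwnd_max default to 1MB, so they only need to grow if you
# intend to use windows larger than that.
ndd -set /dev/tcp tcp_xmit_hiwat 1048576
ndd -set /dev/tcp tcp_recv_hiwat 1048576
```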
Step 2 - IP software rings
A general heuristic for network bandwidth is that we need approximately 1GHz of CPU bandwidth to handle 1Gb (gigabit) per second of network bandwidth. That means that we need to use multiple CPUs to match the bandwidth of a 10GbE interface. Software rings are used in Solaris as a mechanism to spread the incoming load from a network interface across multiple CPU strands, so that we have enough aggregate CPU bandwidth to match the network interface bandwidth. The default value for the number of soft rings in Solaris 10 Update 7 and earlier is too low for 10GbE, and must be increased:
Solaris 10 Update 7 and earlier on Sun4v
In /etc/system
Solaris 10 Update 7 and earlier on Sun4u, x86-64, etc
In /etc/system
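As a sketch of what those /etc/system entries look like (the tunable name comes from the discussion of CR 6621217 below; the values 16 and 8 are commonly cited recommendations from this era, not values confirmed here, so check your current default before applying them):

```shell
# /etc/system fragment -- requires a reboot to take effect.

# Sun4v systems (CMT processors with many hardware strands):
set ip:ip_soft_rings_cnt=16

# Sun4u, x86-64, and other architectures:
set ip:ip_soft_rings_cnt=8
```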
Solaris 10 Update 8 and later
Thanks to the implementation of CR 6621217 in S10u8, the default value for the number of soft rings should be fine for network interface speeds up to and including 10GbE, so no tuning should be necessary.
The changes introduced by CR 6621217 highlight why tuning is often evil. It was found that it is difficult to find an optimal, system wide setting for the number of soft rings if the system contains multiple network interfaces of different types. This resulted in the addition of a new tunable, ip_soft_rings_10gig_cnt, which applies to 10GbE interfaces. The old tunable, ip_soft_rings_cnt, applies to 1GbE interfaces. Both tunables have good defaults at this point, so it is best not to tune either on S10u8 and later.
Step 3 - RPC client connections
Now that we have enough IP software rings to handle the network interface bandwidth, we need to have enough IP consumer threads to handle the IP bandwidth. In our case the IP consumer is NFS, and at the time of this writing, its default behavior is to open a single network connection from an NFS client to a given NFS server. This results in a single thread on the client that handles all of the data coming from that server. To maximize throughput between a single NFS client and server over 10GbE, we need to increase the number of network connections on the client:
Solaris 10 Update 8 and earlier
In /etc/system
set rpcmod:clnt_max_conns = 8

Note: for this to be effective, you must have the fix for CR 2179399, which is available in snv_117, S10u8, or S10 patch 141914-02.
A new default value for rpcmod:clnt_max_conns is being investigated as part of CR 6887770, so it should be unnecessary to tune this value in the future.
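One way to confirm that the extra connections are actually being opened is to count established TCP connections from the client to the server's NFS port (2049) while a workload is running (a sketch; the grep patterns assume Solaris netstat output, where the remote endpoint appears as address.port):

```shell
# Count established TCP connections to NFS servers (well-known port 2049).
# With clnt_max_conns = 8 and an active workload against a single server,
# this count should approach 8.
netstat -an -f inet | grep '\.2049 ' | grep -c ESTABLISHED
```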
Step 4 - Allow for multiple pending I/O requests
The IOPS rate of a single thread issuing synchronous reads or writes over NFS will be bound by the round-trip network latency between the client and server. To get the most out of the available bandwidth, you should have a workload that generates multiple pending I/O requests. These can come from multiple processes each generating an individual I/O stream, from a multi-threaded process generating multiple I/O streams, or from a single- or multi-threaded process using asynchronous I/O calls.
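As a quick illustration of keeping multiple requests in flight from a shell, several dd streams can be run in parallel against the NFS mount (a sketch; /mnt/nfs and the testfile names are placeholders for your own mount point and files):

```shell
# Launch eight concurrent read streams against an NFS mount so that
# multiple I/O requests are outstanding at once, then wait for all of
# them to finish. A single stream would instead serialize on round-trip
# latency.
for i in 1 2 3 4 5 6 7 8; do
    dd if=/mnt/nfs/testfile.$i of=/dev/null bs=1024k &
done
wait
```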
Conclusion
Once you have verified or tuned TCP window sizes, IP soft rings, and RPC client connections, and you have a workload that can capitalize on the available bandwidth, you should see excellent NFS throughput on your 10GbE network interface. A few additional tunings might add a few percentage points of performance, but the tunings shown above should suffice for the majority of systems.
As I mentioned at the start, these tunables are all either under investigation or already adjusted in Solaris 10 Update 8. Our goal is always to provide excellent performance out of the box, and these tunings should be unnecessary in the near future.