CMT, NFS and 10 GbE

Now that we have gigabytes-per-second class Network Attached OpenStorage and highly threaded CMT servers to attach to it, you might figure that simply connecting the two would open the pipes and deliver immediate performance. Well ... almost.

Our OpenStorage systems can deliver great performance, but we often find limitations on the client side. Now that NAS servers can deliver so much power, their NAS clients can themselves be powerful servers trying to deliver GB/sec-class services to the internet.

CMT servers are great throughput engines for that, but they only deliver the goods when the whole stack is threaded. So in a recent engagement, my colleague David Lutz found that we needed one tuning at each of 4 levels in Solaris: IP, TCP, RPC and NFS.

Service   Tunable
-------   -------
IP        ip_soft_rings_cnt
TCP       tcp_recv_hiwat
RPC       clnt_max_conns
NFS       nfs3_max_threads
NFS       nfs4_max_threads


ip_soft_rings_cnt requires tuning up to Solaris 10 update 7. The default value of 2 is not enough to sustain high throughput in a CMT environment; a value of 16 proved beneficial.

In /etc/system :
   * To drive 10 GbE on CMT in Solaris 10 update 7 : see blogs.sun.com/roch
   set ip:ip_soft_rings_cnt=16
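
Note that settings in /etc/system only take effect at the next boot. If you want to double-check the value on the running system afterwards, an mdb one-liner along these lines should do (assuming the variable name above):

   # read the live soft ring count from the running kernel
   echo "ip_soft_rings_cnt/D" | mdb -k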


The receive socket buffer size is critical to TCP connection performance. The buffer is not preallocated; memory is only used if and when the application is not reading the data it has requested. The default of 48K dates from the age of 10 MB/s network cards and 1 GB/sec systems. A larger value allows the peer to keep sending without throttling its flow while waiting for the returning TCP ACK. This is especially critical in high-latency environments, urban area networks or other long fat networks, but it is also critical in the datacenter to reach a reasonable portion of the 10 GbE available in today's NICs. It turns out that NFS connections inherit the system's TCP default, so it is interesting to run with a value between 400K and 1MB :

	ndd -set /dev/tcp tcp_recv_hiwat 400000
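
Unlike /etc/system settings, ndd changes take effect immediately but do not survive a reboot, so they are typically reapplied from a boot-time script. A minimal sketch, reading the current default before changing it:

	# current default receive buffer (48K out of the box)
	ndd /dev/tcp tcp_recv_hiwat
	# raise it for this boot; reapply at each boot to make it permanent
	ndd -set /dev/tcp tcp_recv_hiwat 400000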


But even with this, a single TCP connection is not enough to extract the most out of 10 GbE on CMT, and the Solaris RPC client establishes a single connection to each server it talks to. The code underneath is highly threaded, but it suffered from a few bugs when trying to tune that number of connections, notably 6696163 and 6817942, both of which are fixed in S10 update 8.

With that release, it becomes interesting to tune the number of RPC connections, for instance to 8.

In /etc/system :
   * To drive 10 GbE on CMT in Solaris 10 update 8 : see blogs.sun.com/roch
   set rpcmod:clnt_max_conns=8
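
To confirm that the client really is spreading its NFS traffic over several connections, you can count the established TCP sessions to port 2049 on the server; the address below is just a placeholder for your NFS server:

   # count established NFS connections to the server (placeholder address)
   netstat -an -f inet | grep 192.168.10.1.2049 | grep -c ESTABLISHED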


And finally, above the RPC layer, NFS implements a pool of threads per mount point to service asynchronous requests. These are mostly used in streaming workloads (readahead and writebehind), while synchronous requests are issued within the context of the application thread. The default number of asynchronous threads is likely to limit performance in some streaming scenarios, so I would experiment with:

In /etc/system :
   * To drive 10 GbE on CMT in Solaris 10 update 7 : see blogs.sun.com/roch
   set nfs:nfs3_max_threads=32
   set nfs:nfs4_max_threads=32
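
Like the other /etc/system tunings, these require a reboot. Once at least one NFS mount is active, the live values can be read back with mdb to make sure they took effect:

   # read back the per-mount async thread limits from the running kernel
   echo "nfs3_max_threads/D" | mdb -k
   echo "nfs4_max_threads/D" | mdb -k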


As usual, YMMV; use these tunings with the usual circumspection. Remember that tuning is evil, but it is better to know about these factors than to be in the dark and stuck with lower than expected performance.
