Soft Rings (pre-Crossbow or Crossbow Phase0)
By rajgopi on May 05, 2006
Soft rings is a feature that I worked on recently; the changes were put back into S10 update 2. This feature improves the performance of incoming network traffic. It uses a worker-thread model of packet processing: incoming traffic is made to land on a soft ring, and a worker thread picks up each packet and delivers it to IP.
Let's step back for a minute and see what problem we are trying to solve:
The FireEngine architecture introduced a per-CPU synchronization mechanism, called a vertical perimeter, inside the TCP/IP module. These vertical perimeters are implemented using a serialization queue abstraction called an squeue. A connection is bound to an squeue instance when the connection is initialized, and from then on all packets for that connection are processed on the same squeue. A new incoming connection gets bound to the squeue of the CPU that took the interrupt. This helps achieve better cache locality and increased network performance.
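To make the binding rule concrete, here is a minimal sketch in Python. It is a toy model, not the Solaris implementation: the names `Squeue`, `ConnectionTable`, and `deliver` are made up for illustration, and a connection is identified by an arbitrary hashable key.

```python
from dataclasses import dataclass, field


@dataclass
class Squeue:
    """Toy stand-in for a per-CPU serialization queue (hypothetical)."""
    cpu_id: int
    processed: list = field(default_factory=list)


class ConnectionTable:
    """Binds each connection to one squeue for its whole lifetime."""

    def __init__(self, num_cpus):
        self.squeues = [Squeue(cpu) for cpu in range(num_cpus)]
        self.bindings = {}  # connection key -> Squeue

    def deliver(self, conn_key, packet, interrupt_cpu):
        sq = self.bindings.get(conn_key)
        if sq is None:
            # New connection: bind it to the squeue of the CPU that
            # took the interrupt.
            sq = self.squeues[interrupt_cpu]
            self.bindings[conn_key] = sq
        # Later packets always go to the bound squeue, regardless of
        # which CPU took the interrupt this time.
        sq.processed.append((conn_key, packet))
        return sq.cpu_id


table = ConnectionTable(num_cpus=4)
first_cpu = table.deliver("conn-A", "SYN", interrupt_cpu=2)
later_cpu = table.deliver("conn-A", "DATA", interrupt_cpu=0)
```

In this sketch both calls return the same CPU id, because the second packet follows the binding made by the first.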
Now, on systems with slow CPUs (clock speed below 1 GHz), a single CPU cannot handle an incoming load of 1 Gbps. Likewise, even faster CPUs cannot handle the loads generated by 10 Gbps NICs. The solution is to fan out the load so that it is handled by multiple CPUs.
The current way of enabling this, setting ip_squeue_fanout to 1, is suboptimal (or rather, one can say it is broken). With ip_squeue_fanout set to 1, a random squeue that could belong to any CPU in the system is selected for each new incoming TCP connection, and the packet may then be processed in the same context. This is bad because what you want is for the other CPU to process the packets belonging to its squeue.
This problem is addressed by soft rings. A soft ring is an abstraction that simulates hardware Rx ring functionality in software. Multiple soft rings can be configured on a system (tunable: ip_soft_rings_cnt); by default, 2 soft rings are configured. Incoming traffic is made to land on one of the soft rings. Each soft ring has a pointer to the squeue to which its packets have to be delivered. A worker thread is created for each soft ring; it picks up packets from the soft ring and delivers them to IP. The worker thread has affinity to the CPU to which the squeue belongs. All this helps in processing packets efficiently.
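The worker-thread model described above can be sketched in user-space Python. This is only an illustration under stated assumptions: `SoftRing`, `pick_ring`, and `run_demo` are hypothetical names, choosing a ring by hashing a connection key is my assumption (the text only says traffic lands on one of the soft rings), and CPU affinity is represented as a plain label because the sketch does not actually bind threads to CPUs.

```python
import queue
import threading
import zlib


class SoftRing:
    """A software Rx ring drained by its own worker thread."""

    def __init__(self, ring_id, squeue_cpu, deliver):
        self.ring_id = ring_id
        self.squeue_cpu = squeue_cpu  # CPU owning the target squeue (label only)
        self._deliver = deliver
        self._packets = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def enqueue(self, packet):
        # Called from the "interrupt" side: just queue the packet and return.
        self._packets.put(packet)

    def _drain(self):
        # Worker side: pick up packets and hand them to the squeue's CPU.
        while True:
            packet = self._packets.get()
            if packet is None:  # shutdown sentinel
                return
            self._deliver(self.squeue_cpu, packet)

    def close(self):
        self._packets.put(None)
        self._worker.join()


def pick_ring(rings, conn_key):
    # Assumed policy: a stable hash keeps one connection on one ring.
    return rings[zlib.crc32(conn_key.encode()) % len(rings)]


def run_demo(conn_keys, soft_rings_cnt=2, packets_per_conn=3):
    delivered = {}  # cpu label -> packets, in delivery order
    lock = threading.Lock()

    def deliver(cpu, packet):
        with lock:
            delivered.setdefault(cpu, []).append(packet)

    rings = [SoftRing(i, i + 1, deliver) for i in range(soft_rings_cnt)]
    for key in conn_keys:
        for seq in range(packets_per_conn):
            pick_ring(rings, key).enqueue((key, seq))
    for ring in rings:
        ring.close()
    return delivered
```

Calling `run_demo(["10.0.0.1:1000", "10.0.0.2:2000"])` returns a dictionary keyed by CPU label. The point of the design is that the enqueue side does almost no work, while the per-ring worker does the delivery on behalf of the CPU that owns the squeue.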
Fanout based on the hardware/platform:
Consider Niagara processors. A Niagara processor contains multiple cores in a single chip, and each core in turn can run 4 threads. When doing software fanout, due consideration is given to having the incoming data handled by threads (these threads are counted as CPUs) in the same core that took the interrupt. This helps preserve interrupt-to-CPU/core affinity.
The same applies to AMD dual-core processors. It would be optimal if the load could be fanned out to CPUs on the same core to capitalize on the shared L2 cache.
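As a rough illustration of core-aware fanout on a Niagara-style layout, the sketch below picks fanout targets from the hardware threads that share a core with the interrupted CPU. Two things here are assumptions of mine, not statements about Solaris: CPU ids are numbered contiguously within a core (so the core is `cpu_id // 4`), and the interrupted CPU itself is excluded from the targets. The function names are hypothetical.

```python
THREADS_PER_CORE = 4  # Niagara: 4 hardware threads per core


def sibling_cpus(interrupt_cpu, threads_per_core=THREADS_PER_CORE):
    """CPUs that share a core with interrupt_cpu, excluding itself."""
    core = interrupt_cpu // threads_per_core
    first = core * threads_per_core
    return [cpu for cpu in range(first, first + threads_per_core)
            if cpu != interrupt_cpu]


def fanout_targets(interrupt_cpu, soft_rings_cnt=2,
                   threads_per_core=THREADS_PER_CORE):
    """One target CPU per soft ring, all on the interrupted core."""
    return sibling_cpus(interrupt_cpu, threads_per_core)[:soft_rings_cnt]


targets = fanout_targets(interrupt_cpu=5)  # CPU 5 is on core 1 (CPUs 4-7)
```

With the default of 2 soft rings, an interrupt on CPU 5 fans out to CPUs 4 and 6 in this model, so the work stays on the core that took the interrupt.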
How to enable soft rings?
You need to have Solaris 10 update 2.
On Niagara platforms (T1000 and T2000s), it is enabled by default.
On other platforms, it can be enabled by setting ip_squeue_fanout to 1.
ip_soft_rings_cnt has a default value of 2. A value of 2 or 3 has been found to give good performance with 1 Gbps NICs on the Niagara platforms.
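For example, one way to set these tunables persistently is through /etc/system. The fragment below is a sketch that assumes both tunables live in the ip module and are settable this way on your release; check this against your system before relying on it. Changes made in /etc/system take effect after a reboot.

```
* Enable soft rings on a non-Niagara platform
set ip:ip_squeue_fanout=1
* Number of soft rings (2 is the default)
set ip:ip_soft_rings_cnt=2
```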