Flow control with multiple Tx rings

Phase 1 of Project Crossbow has been put back. As a core member of the project team and the project gatekeeper, I was involved in practically every aspect of the project.

One of the areas I worked on was the design and implementation of the Tx data path (at layer 2). With 10 Gbps NICs (and some 1 Gbps NICs too) featuring multiple Tx rings, the Tx side needed to take advantage of them. It had to fan out outgoing traffic among the Tx rings and, at the same time, handle a flow-control condition on a specific Tx ring without affecting the other Tx rings. To accomplish this, the MAC client API on the send path takes a fanout hint and returns a cookie that identifies a Tx ring blocked under flow control. Details of the MAC client API can be found in the excellent Crossbow Network Virtualization document.

mac_tx() is the API that MAC clients use to send packets out on the wire.

mac_tx_cookie_t mac_tx(mac_client_handle_t mch, mblk_t *mp_chain, uintptr_t hint, uint16_t flag, mblk_t **ret_mp);

mac_tx() takes a hint that is used to fan out outgoing traffic among the Tx rings. On successful transmission, NULL is returned to the caller. If the packets cannot be sent because a Tx ring is flow blocked, a cookie (mac_tx_cookie_t) is returned instead. The cookie uniquely identifies the blocked Tx ring.
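To make this contract concrete, here is a small user-space model of the send path. It is a sketch under stated assumptions, not the illumos implementation: `tx_ring_t`, `model_mac_tx()`, the ring and descriptor counts, and the hint-to-ring mapping (shift off alignment bits, then take the result modulo the ring count) are all hypothetical. Each call sends one packet, returns NULL on success, and returns the blocked ring's address as the cookie when that ring has no descriptors left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of the mac_tx() contract; none of these
 * names or sizes come from the illumos source. */
#define NRINGS     4   /* assumed number of Tx rings */
#define RING_DESCS 8   /* assumed descriptors per ring */

typedef struct tx_ring {
    int index;         /* ring number */
    int free_descs;    /* descriptors still available */
} tx_ring_t;

/* Opaque cookie; in this model it is just the blocked ring's address. */
typedef void *tx_cookie_t;

static tx_ring_t rings[NRINGS];

void model_rings_init(void)
{
    for (int i = 0; i < NRINGS; i++) {
        rings[i].index = i;
        rings[i].free_descs = RING_DESCS;
    }
}

/* Map a fanout hint to a ring. Pointer hints such as a conn_t address
 * have always-zero low bits due to alignment, so shift them off first. */
static tx_ring_t *model_pick_ring(uintptr_t hint)
{
    return &rings[(hint >> 3) % NRINGS];
}

/* Send one packet on the ring chosen by hint. Returns NULL on success,
 * or a cookie naming the ring that has run out of descriptors. */
tx_cookie_t model_mac_tx(uintptr_t hint)
{
    tx_ring_t *ring = model_pick_ring(hint);

    assert(ring->free_descs >= 0);
    if (ring->free_descs == 0)
        return (tx_cookie_t)ring;   /* flow blocked on this ring only */
    ring->free_descs--;
    return NULL;
}
```

In this model, a caller that keeps getting the same non-NULL cookie knows exactly which ring is stalled, while callers whose hints map to other rings keep getting NULL.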

Clients like TCP/UDP/IP using the MAC API can take advantage of this to do efficient flow control.

UDP flow control

UDP uses a mix of STREAMS and direct function calls to send traffic out of the machine. Before calling the direct function in DLD to send the packet down, it calls the STREAMS canput(9F) function. Note that the direct function pointers between IP and DLD are exchanged as part of the IP plumbing operations during the ifconfig command. The direct function on the Tx side takes the fanout hint. For UDP connections originating from the system, UDP passes the address of the conn_t structure as the fanout hint, and this hint is used to select a Tx ring. If a Tx ring becomes flow blocked, a cookie is returned to the caller to indicate the flow-blocked condition. The cookie identifies the specific Tx ring that is blocked (in the phase 1 putback, this cookie is not used to do flow control). When that happens, DLD sets the QFULL condition on its write queue. This causes all subsequent canput(9F) calls on DLD to fail, including those from connections that were sending on Tx rings that are not blocked. The result is poor Tx performance for UDP under stress. The problem is more severe for small packets, because small packets make a Tx ring run out of descriptors very quickly.
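The head-of-line effect described above can be reproduced with a similarly hypothetical model. Here `wq_qfull` stands in for the QFULL condition on DLD's write queue and `phase1_canput()` stands in for canput(9F); the fanout hint is assumed to have already been mapped to a ring index. None of these names are illumos interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the phase 1 behavior: per-ring fanout below DLD,
 * but a single QFULL flag on the shared write queue above it. */
#define NRINGS     2   /* assumed number of Tx rings */
#define RING_DESCS 4   /* assumed descriptors per ring */

static int  free_descs[NRINGS];
static bool wq_qfull;   /* stands in for QFULL on DLD's write queue */

void phase1_init(void)
{
    for (int i = 0; i < NRINGS; i++)
        free_descs[i] = RING_DESCS;
    wq_qfull = false;
}

/* Stand-in for canput(9F): fails for every sender once QFULL is set. */
bool phase1_canput(void)
{
    return !wq_qfull;
}

/* Send one packet for a connection whose hint selected `ring`.
 * Returns true if the packet was queued on the ring. */
bool phase1_send(int ring)
{
    assert(ring >= 0 && ring < NRINGS);
    if (!phase1_canput())
        return false;          /* refused by the queue-wide flag */
    if (free_descs[ring] == 0) {
        wq_qfull = true;       /* one full ring blocks the whole queue */
        return false;
    }
    free_descs[ring]--;
    return true;
}
```

Once ring 0 runs dry, a sender on ring 1 is refused even though ring 1 still has free descriptors; this is the behavior the per-ring scheme removes.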

UDP flow control with Crossbow

The cookie that is returned to IP is used to do flow control on a per-Tx-ring basis. If a UDP conn gets blocked, the returned cookie is hashed to select a blocked conn list, and the blocked conn is placed on that list. Any other UDP conns that are sending on the same Tx ring and get blocked are placed on the same list. Later, when the flow-control condition is relieved, the GLDv3 (MAC) layer makes a direct call into IP, passing the cookie. IP uses the cookie to find the blocked conn list and restart the blocked UDP conns. UDP conns that were sending on other, unblocked Tx rings are not affected and continue sending packets.
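The per-ring scheme can be sketched as a hash table of blocked lists keyed by the cookie. This is a minimal model under assumptions: `conn_model_t`, `blocked_conn_park()`, `blocked_conn_wakeup()` and the bucket count are illustrative names, not the actual IP or MAC code, and where a real stack would resume transmission the model merely clears a flag. Because different cookies can hash to the same bucket, the wakeup compares each conn's saved cookie before restarting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of per-Tx-ring flow control; not illumos code. */
#define NBUCKETS 16   /* assumed size of the blocked-list hash table */

typedef struct conn_model {
    int id;
    int blocked;                 /* 1 while parked on a blocked list */
    void *cookie;                /* cookie of the ring it waits on */
    struct conn_model *next;     /* link within the blocked list */
} conn_model_t;

static conn_model_t *blocked_list[NBUCKETS];

static size_t cookie_hash(void *cookie)
{
    return ((uintptr_t)cookie >> 4) % NBUCKETS;
}

/* Called when a send returned a non-NULL cookie for this conn. */
void blocked_conn_park(conn_model_t *conn, void *cookie)
{
    size_t b = cookie_hash(cookie);

    assert(cookie != NULL && !conn->blocked);
    conn->blocked = 1;
    conn->cookie = cookie;
    conn->next = blocked_list[b];
    blocked_list[b] = conn;
}

/* Called by the (modelled) MAC layer when the ring named by cookie has
 * descriptors again. Restarts only conns waiting on that ring and
 * returns how many were restarted. */
int blocked_conn_wakeup(void *cookie)
{
    conn_model_t **pp = &blocked_list[cookie_hash(cookie)];
    int restarted = 0;

    while (*pp != NULL) {
        conn_model_t *c = *pp;
        if (c->cookie == cookie) {
            *pp = c->next;       /* unlink from the bucket */
            c->next = NULL;
            c->blocked = 0;      /* a real stack would resume sending here */
            c->cookie = NULL;
            restarted++;
        } else {
            pp = &c->next;       /* same bucket, different ring: skip */
        }
    }
    return restarted;
}
```

Parking two conns on one ring's cookie and a third on another, then calling blocked_conn_wakeup() with the first cookie, restarts only the first two; the third stays parked until its own ring is relieved.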

Performance improvement

With the new per-Tx-ring UDP flow control in place, all NICs that have multiple Tx rings benefit. On the Oplin card, we saw an improvement in the range of 20-25% for packet sizes of 16, 64, 128, 256 and 512 bytes.

How do you determine which NICs have multiple Tx rings, and whether enabling them will help, without intensive research and benchmarking?

Posted by James Dickens on February 11, 2009 at 11:41 AM PST

It is part of the NIC's specification. Most 10 Gigabit NICs (such as those supported by ixgbe and nxge) come with multiple Tx and Rx rings. Before we put back Crossbow, drivers like nxge that had multiple Tx rings used them to fan out traffic inside the NIC driver. Crossbow made these rings visible to the MAC layer, which gives us better control over each Tx ring and thus lets us do better flow control.

In fact, exposing rings to the MAC layer also helps in assigning these resources to virtual NICs (VNICs).

Posted by Rajagopal Kunhappan on February 13, 2009 at 08:25 AM PST
