Converged fabric does not mean TCP/IP everywhere
By andy.grover on Mar 24, 2009
While I was not able to attend the OFA workshop in person this year, I have been making use of the provided teleservices to follow it.
The big issue this year is whether the RDMA protocol used over Ethernet in a datacenter should be encapsulated in TCP/IP or carried directly on Ethernet itself. While the trend over the past 20 years has been to use TCP/IP even for LAN traffic, the reasons for this may not apply to the two possible examples of going the other way, namely Fibre Channel over Ethernet (FCoE) and what was called RDMA over Converged Enhanced Ethernet (RoCEE or RoE).
Both FCoE and RoE have TCP/IP-based work-alikes -- iSCSI and iWARP respectively. The latter technologies face the problem that processing received packets cannot be made efficient! People have tried. A lot. Netchannels anyone? TCP's functionality (reordering, demultiplexing, etc.) means that each packet is going to get copied in its entirety at least once, in addition to the NIC's copy. You can try to avoid this by moving the TCP stack into the NIC with a TOE, but that sucks for about 14 reasons. You can try to hide it with an asynchronously-operating DMA engine to do the copy for you, but that has issues of its own. Finally, you can add a custom TCP stack in the kernel for your special protocol. This is a flagrant layering violation and the Linux netdev crew will also put a hit out on you.
Encapsulating in only Ethernet makes things a lot more palatable, with two caveats: you have to do without the niceties that TCP (congestion control) and IP (routing) give you. In a datacenter you don't need routing, and if CEE does congestion control then you're covered. It is now possible to sanely build an adapter that can do 0-copy FCoE and RoE -- just bolt your IB^H^HRDMA macrocell and your FC macrocell next to the regular stuff on your Ethernet silicon and divert to them based on ethertype. send/recv rings all look kinda the same, don't they? Your card now shows up as three separate devices, and all three (fc, rdma, net) perform at their max efficiency.
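To make the "divert based on ethertype" step concrete, here is a minimal sketch in Python of what that demux decision looks like. A real adapter does this in silicon, not software; the function names and the "fc"/"rdma"/"net" labels are hypothetical, chosen to match the three devices above. 0x8906 is the registered FCoE ethertype, and 0x8915 is the value RDMA over Ethernet later came to use, so treat that one as an assumption relative to this post.

```python
import struct

# EtherType values. FCoE is the registered value; the RDMA value is the one
# RoCE later adopted (an assumption relative to the text above).
ETHERTYPE_VLAN = 0x8100   # 802.1Q tag; the real ethertype follows the tag
ETHERTYPE_FCOE = 0x8906
ETHERTYPE_RDMA = 0x8915

# Hypothetical labels for the three on-chip engines ("devices").
ENGINE_BY_ETHERTYPE = {
    ETHERTYPE_FCOE: "fc",
    ETHERTYPE_RDMA: "rdma",
}


def classify_frame(frame: bytes) -> str:
    """Return which engine should receive this raw Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: ethertype.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 2-byte tag control field; the inner ethertype is at 16-17.
        if len(frame) < 18:
            raise ValueError("truncated VLAN-tagged frame")
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    # Anything that is not FC or RDMA goes to the ordinary network stack.
    return ENGINE_BY_ETHERTYPE.get(ethertype, "net")


def build_frame(ethertype: int, vlan_id=None, payload: bytes = b"") -> bytes:
    """Build a toy frame with zeroed MAC addresses, for illustration only."""
    header = bytes(12)  # destination + source MAC
    if vlan_id is not None:
        header += struct.pack("!HH", ETHERTYPE_VLAN, vlan_id)
    return header + struct.pack("!H", ethertype) + payload
```

Note that no per-packet state is consulted: the decision depends only on a fixed-offset header field, which is why it is cheap to do in hardware, unlike TCP demultiplexing.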
Internet protocols have pushed far into the LAN/datacenter environment, displacing almost all equivalent LAN protocols. TCP/IP's ubiquity is a virtue above almost all others, and that same virtue is also behind the drive towards Ethernet as a common fabric. But performance and clean implementation outweigh ubiquity. Convergence on TCP/IP in the datacenter is not the big win; it's a nice-to-have. iWARP had potential downsides that people thought could be worked around... but the workarounds needed workarounds, and now its costs totally outweigh its benefits. The big win is converged Ethernet, not converged TCP, and I think that's what the debate should not lose sight of.