Saturday Mar 29, 2008

Link Aggravation

Have been trying to do something useful with Link Aggregation on a T5120 connected to a Linksys SRW2048 switch. The whole rig includes a couple of those cute little 1U x4200 servers we sell, the T5120 (a 64 hardware thread CMT system), the Faban test harness, a benchmark designed to test a whole load of Web 2.0 stuff, and a MySQL instance.

I couldn't for the life of me make link aggregation work on incoming packets. It was obvious from various forum threads that the Linksys load balanced packets based on the MAC address of the system sending the traffic, but the implications of what that meant took a long time to sink in. I'd configure the aggregation on the System Under Test (hereafter referred to as the SUT) using dladm:

dladm create-aggr -d e1000g1 -d e1000g2 1

Which creates an aggregation called aggr1 from the interfaces e1000g1 and e1000g2. Then I configured the switch, which involved using Windows and Internet Explorer and a web-based interface that logged me out after 5 minutes of inactivity. It's fairly straightforward to configure a Link Aggregation Group (LAG) on the Linksys provided you do everything in exactly the right order (and provided you don't get distracted for 5 minutes). A LAG is the switch-side aggregation of 2 or more ports, hopefully in my case the ones that are connected to the interfaces on the SUT that are part of the aggregation.

My test harness has two agents running on separate x4200s and generating load to the SUT. The SUT has two interfaces aggregated (or teamed), which results in a virtual network interface called aggr1. You can use dladm to look at the traffic on the interfaces that make up the aggregation:

dladm show-aggr -s -i 5 1

The people who wrote dladm didn't get the formatting of the output right, and you basically have to memorize the column positions of the output data. You end up using the %ipkts and %opkts metrics for each interface, as those never go much above 100 and so their position doesn't change. The output looks like this:

key: 1  ipackets  rbytes      opackets   obytes          %ipkts %opkts
           Total        193732    167576255   398676    515340307  
           e1000g1      194852    168869030   214144    274881494       100.6  
           e1000g2      0         0           185943    242021790       0.0    

Which shows all of the incoming traffic (%ipkts) being sent to e1000g1. This traffic is coming from 2 systems, each with a different MAC address (duh) so why no load balancing?
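As a sanity check on reading those columns, the %ipkts figure appears to be each interface's ipackets as a share of the Total row. Here is a minimal Python sketch using the numbers from the output above; the function name is mine, and the formula is inferred from the output rather than taken from the dladm documentation:

```python
def pct_share(link_packets: int, total_packets: int) -> float:
    """Return a link's packet count as a percentage of the total, to one decimal."""
    if total_packets == 0:
        return 0.0
    return round(100.0 * link_packets / total_packets, 1)


# ipackets values copied from the dladm output above
total_ipackets = 193732
ipackets = {"e1000g1": 194852, "e1000g2": 0}

shares = {link: pct_share(count, total_ipackets) for link, count in ipackets.items()}
for link, pct in shares.items():
    print(f"{link}: {pct}% of incoming packets")
```

This reproduces the 100.6 and 0.0 shown for e1000g1 and e1000g2. The figure creeps slightly past 100 presumably because the per-interface and total counters aren't sampled at exactly the same instant.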

Turns out that the load balancing on the switch is more routing than load balancing. For a 2 port LAG (as in this case), packets from src MAC address 1 are sent to the first port of the LAG, packets from src MAC address 2 are sent to the second port of the LAG, packets from src MAC address 3 are sent to the first port of the LAG, and so on. Packets from the same src MAC address are always sent to the same port so as to avoid any re-ordering issues. So depending on the traffic on my network, it's really a matter of luck as to whether the traffic from the two agents goes to the same port on the LAG or to different ports on the LAG. This wouldn't be a problem if I had 2000 clients sending traffic from 2000 different systems, but there are only two (which might well be the case with a server system fronted by a reverse proxy).
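To make the "matter of luck" concrete, here is a toy Python model of source-MAC port selection. I don't know the function the SRW2048 actually uses; taking the MAC address modulo the number of LAG ports is purely an assumption for illustration, and the MAC addresses are made up:

```python
def lag_port_for_mac(src_mac: str, n_ports: int = 2) -> int:
    """Toy model: pick a LAG port from the source MAC alone.

    Assumption: the MAC is treated as an integer and reduced modulo the
    port count. The real switch may use a different function.
    """
    return int(src_mac.replace(":", ""), 16) % n_ports


# Hypothetical MAC addresses for the two load-generating agents
agent_macs = {
    "agent1": "00:14:4f:00:00:01",
    "agent2": "00:14:4f:00:00:03",
}

ports = {name: lag_port_for_mac(mac) for name, mac in agent_macs.items()}
for name, port in ports.items():
    print(f"{name} -> LAG port {port}")
```

Under this model both agents land on port 1, so one link carries everything and the other sits idle. Every packet from a given MAC maps to the same port, which is what preserves ordering and also what defeats balancing when there are only two senders.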

There is a workaround that we tried in our test environment: we changed the MAC address of one of the client systems using ifconfig (not something I'd generally recommend) until the incoming traffic (according to dladm) was balanced across the two interfaces. This seemed to work every time and had me chanting "Wax on, wax off" as I toggled the MAC address of one of the loaders and watched the traffic move to a different interface on the SUT. Here's the final result as given by dladm:
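In practice this was trial and error, since we didn't know how the switch maps MACs to ports. The loop we were effectively running by hand looks something like the Python sketch below. It assumes the same kind of toy modulo model of the switch (not the real algorithm), and the MAC addresses and function names are hypothetical:

```python
from typing import Iterable, Optional


def lag_port_for_mac(src_mac: str, n_ports: int = 2) -> int:
    """Toy model of the switch: MAC as an integer, modulo the port count."""
    return int(src_mac.replace(":", ""), 16) % n_ports


def pick_balancing_mac(fixed_mac: str, candidates: Iterable[str],
                       n_ports: int = 2) -> Optional[str]:
    """Return the first candidate MAC that maps to a different LAG port
    than fixed_mac, or None if every candidate collides with it."""
    fixed_port = lag_port_for_mac(fixed_mac, n_ports)
    for mac in candidates:
        if lag_port_for_mac(mac, n_ports) != fixed_port:
            return mac
    return None


# One agent keeps its MAC; we try alternatives for the other one.
agent_a = "00:14:4f:00:00:01"
candidates = ["02:00:00:00:00:03", "02:00:00:00:00:04"]

chosen = pick_balancing_mac(agent_a, candidates)
print(f"use {chosen} for the second agent")
```

With a real switch the check inside the loop is replaced by setting the MAC and watching %ipkts in dladm.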

 key: 1  ipackets  rbytes      opackets   obytes          %ipkts %opkts
           Total        176194    156273288   342248    441581280  
           e1000g1      74939     67362534    206674    269054155       42.5   
           e1000g2      102336    90181188    136617    173791907       58.1   

Apparently it would be better if we were able to load balance on the switch at Layer 4 (the transport layer), which would allow load balancing of traffic based on network endpoints (i.e. IP address and port number), but it seems likely that our Linksys SRW2048 switch doesn't support this.
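To see why Layer 4 balancing would help with only two clients, here is a small Python sketch that picks a LAG port from the connection's endpoints instead of the source MAC. XOR-ing the addresses and ports together is my own simplification for illustration, not any particular switch's algorithm, and the IP addresses and ports are invented:

```python
import ipaddress


def lag_port_for_flow(src_ip: str, src_port: int,
                      dst_ip: str, dst_port: int,
                      n_ports: int = 2) -> int:
    """Toy Layer 4 policy: combine both endpoints and reduce modulo port count.

    Assumption: a plain XOR of the IP addresses and port numbers stands in
    for whatever hash a real switch would use.
    """
    key = (int(ipaddress.ip_address(src_ip))
           ^ int(ipaddress.ip_address(dst_ip))
           ^ src_port
           ^ dst_port)
    return key % n_ports


# Two connections from the SAME client to the same web server
flow_a = lag_port_for_flow("192.168.1.10", 40000, "192.168.1.1", 80)
flow_b = lag_port_for_flow("192.168.1.10", 40001, "192.168.1.1", 80)
print(f"connection A -> LAG port {flow_a}, connection B -> LAG port {flow_b}")
```

Here the two connections end up on different ports even though they come from a single machine with a single MAC. Packets within one connection still always take the same port, so ordering inside a connection is preserved.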

There are useful entries on Nicolas Droux's blog showing the architecture of the Link Aggregation subsystem in Solaris 10 and on setting up a Link Aggregation.

