Bonding Parameters Based on Network Layout

Quick Introduction to Linux Bonding

As the name suggests, the bonding driver creates a logical network interface on top of multiple physical network interfaces. There are various reasons to do so, including link aggregation for higher bandwidth, redundancy, and high availability. Upper layers communicate through the logical bond interface, which carries the IP address, while the active physical interface(s) actually move the traffic at layer 2. The bond thus provides transparency to the upper layers by hiding the physical interfaces underneath.

Bonding Parameters

Besides specifying which physical interfaces are part of the logical bond interface, we also specify how we want the bond to behave. There are several possible configurations, but I am only going to focus on one mode, called "Active Backup", which has the numerical identifier 1. You can list all the parameters of the kernel bonding driver installed on your system; look for the lines beginning with 'parm' below. These options can be set in /etc/modprobe.conf, and some of them can also be set directly for each bonding interface via /etc/sysconfig/network-scripts/ifcfg-bond1 (a configuration sketch follows the parameter listing).

[root@hostA ~]# modinfo /lib/modules/2.6.32-100.23.80.el5/kernel/drivers/net/bonding/bonding.ko
filename:       /lib/modules/2.6.32-100.23.80.el5/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis@lbl.gov and many others
description:    Ethernet Channel Bonding Driver, v3.5.0
version:        3.5.0
license:        GPL
srcversion:     4D5495287BB364C8C5A5ABE
depends:        ipv6
vermagic:       2.6.32-100.23.80.el5 SMP mod_unload
parm:           max_bonds:Max number of bonded devices (int)
parm:           num_grat_arp:Number of gratuitous ARP packets to send on failover event (int)
parm:           num_unsol_na:Number of unsolicited IPv6 Neighbor Advertisements packets to send on failover event (int)
parm:           miimon:Link check interval in milliseconds (int)
parm:           updelay:Delay before considering link up, in milliseconds (int)
parm:           downdelay:Delay before considering link down, in milliseconds (int)
parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
parm:           mode:Mode of operation : 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           primary:Primary network device to use (charp)
parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner (slow/fast) (charp)
parm:           ad_select:803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2) (charp)
parm:           xmit_hash_policy:XOR hashing method: 0 for layer 2 (default), 1 for layer 3+4 (charp)
parm:           arp_interval:arp interval in milliseconds (int)
parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
parm:           arp_validate:validate src/dst of ARP probes: none (default), active, backup or all (charp)
parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC.  none (default), active or follow (charp)
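
For reference, here is roughly what driver-level options look like in /etc/modprobe.conf. This is a minimal sketch for an active-backup bond; the values are illustrative and not taken from the system above:

alias bond1 bonding
# mode=1 selects active-backup; miimon=100 checks the link every 100 ms
options bonding mode=1 miimon=100 updelay=5000 downdelay=5000

The options take effect the next time the bonding module is loaded, for example after a reboot or after the module is unloaded and reloaded.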

How to verify bonding status and configuration

Under active-backup mode, the most common configuration is link-based failure detection via a pair of parameters: miimon and use_carrier. Here is how it looks on a running system.

[root@hostA ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:28:4a:cd:80

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:28:4a:cd:81
[root@hostA ~]#

What we see here is that bond1 is set to active-backup mode with two physical interfaces, or slaves: eth0 and eth1. Their link status is monitored every 100 ms. If a link goes down, the bonding driver waits 5000 ms before actually declaring it DOWN. When the lost link recovers, the driver again waits 5000 ms before declaring it UP.

The option 'primary' is set to None, which means the bonding driver has no preference between eth0 and eth1 when both are UP at the same time. The link failure counter tracks how many times each link has failed since the host has been running.
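
The same settings can also be expressed per bond interface via the network scripts. Below is a minimal sketch of the two ifcfg files, assuming RHEL/OL-style network scripts; the IP address and netmask are illustrative placeholders:

/etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
IPADDR=192.168.70.10
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=active-backup miimon=100 updelay=5000 downdelay=5000 fail_over_mac=active"

/etc/sysconfig/network-scripts/ifcfg-eth0 (and similarly ifcfg-eth1)
DEVICE=eth0
MASTER=bond1
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes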


Limitations of MII monitoring based bonding

Now let's review the following topology diagram. Host A has two physical interfaces, and their links are marked 1 and 2 respectively. They are connected to independent Ethernet switches for redundancy and high availability. These switches in turn connect into a bigger network that is external to us; it may be a corporate network or even the Internet. The uplinks from our local Ethernet switches are labeled 3 and 4 respectively.

[Topology diagram: host A's eth0 and eth1 connect over links 1 and 2 to two independent Ethernet switches, whose uplinks 3 and 4 lead to the external network.]

Host A, with its bond1 interface, currently has eth0 as the active interface. It is expected to communicate with the external network as shown.

Scenario 1: When link number 1 goes out of service, the bonding driver detects it within the configured monitoring interval and activates the backup interface eth1. Service is restored at this point. :)

Scenario 2: When link number 3 goes out of service, the bonding driver is completely unaware of it, because both of its local physical interfaces are still in service. However, host A is unable to reach the external world, since its active path runs over link number 3. :(
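
If you want to watch Scenario 1 happen on a test host, you can take the active slave down by hand and observe the failover. A rough sketch, assuming the interface names used above and that a few seconds of disruption on this host is acceptable:

[root@hostA ~]# grep "Currently Active Slave" /proc/net/bonding/bond1
[root@hostA ~]# ip link set dev eth0 down    # simulate link 1 going out of service
[root@hostA ~]# grep "Currently Active Slave" /proc/net/bonding/bond1    # should report eth1 once downdelay expires
[root@hostA ~]# ip link set dev eth0 up      # restore the slave when done

Scenario 2 cannot be reproduced this way, because nothing you do on host A takes link 3 out of service while leaving links 1 and 2 up.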


Alternate bonding configuration to detect failures at OSI layer 3

The bonding driver offers an alternate set of parameters to solve the problem illustrated above. Instead of miimon, we will use arp_ip_target and arp_interval.

The modified configuration will look like this.

[root@hostA ~]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 60
ARP IP target/s (n.n.n.n form): 192.168.70.1

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:28:4a:cd:80

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:21:28:4a:cd:81
[root@hostA ~]#

As you can see, the bonding driver is now monitoring reachability of 192.168.70.1 by sending ARP probes every 60 ms (the ARP Polling Interval, like miimon, is expressed in milliseconds). If the probes over the active slave go unanswered, the driver fails over to eth1 irrespective of the local link status.
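
In terms of configuration, the ARP monitoring parameters simply replace miimon in the bond's options. A minimal sketch of the corresponding BONDING_OPTS line for ifcfg-bond1 (the target address is the gateway from the example above; multiple targets can be listed comma-separated, and miimon is left out because the two monitoring methods are not meant to be combined):

BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.70.1"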

Conclusion

MII-monitoring-based bonding is ideal when you are communicating within a LAN and do not cross a router. IPoIB is a good example here, because InfiniBand networks are currently limited to a single broadcast subnet; in other words, they do not use a layer 3 router.

ARP IP target based monitoring should be preferred if your setup is similar to the one discussed above. If the bonded interface is expected to communicate with the outside world across a router, then it is better to monitor reachability of a set of external IP addresses instead of just the local link status. Client access networks created with EoIB are a good example here.


Comments:

Hi Neeraj

What do you think about the link state vnic configuration:

eport_state_enforce=1

in the mlx4_vnic kernel module?

Best regards

Posted by Alejandro Vidal Quiroga on August 01, 2012 at 01:42 PM PDT #

