IPMP Development Update #2

IPMP Development Follow-up

Several folks have again (understandably) asked for updates on the Next-Generation IPMP work. Significant progress has been made since my last update. Notably:

  • Probe-based failure detection is operational (in addition to the earlier support for link-based failure detection).
  • DR support of interfaces using IPMP through RCM works. Thanks to the new architecture, the code is almost 1000 lines more compact than Solaris's current implementation -- and more robust.
  • Boot support is now complete. That is, any number of interfaces (including all of them) can be missing at boot and then transparently repaired during operation.
  • At long last, ipmpstat. As discussed in the high-level design document, this is a new utility that allows the IPMP subsystem to be compactly examined.

Since ipmpstat allows other aspects of the architecture to be succinctly examined, let's take a quick look at a simple two-interface group on my test system:

  # ipmpstat -g
  GROUP       GROUPNAME   STATE     FDT       INTERFACES
  net57       a           ok        10000ms   ce1 ce0

As we can see, the "-g" (group) output mode tells us all the basics about the group: the group interface name and group name (these will usually be the same, but differ above for illustrative purposes), its current state ("ok", indicating that all of the interfaces are operational), the maximum time needed to detect a failure (10 seconds), and the interfaces that comprise the group.
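
For anyone who wants to consume this output from a script, here is a minimal Python sketch that parses the human-readable "-g" layout shown above. It assumes the columns are whitespace-separated and that only the last column (INTERFACES) can contain spaces; the function name parse_ipmpstat_groups and the FDT_MS key are made up for illustration and are not part of ipmpstat.

```python
def parse_ipmpstat_groups(text):
    """Parse the human-readable `ipmpstat -g` layout into a list of dicts.

    Assumption: columns are whitespace-separated and only the final
    column (INTERFACES) may contain embedded spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    groups = []
    for line in lines[1:]:
        # Split into at most len(header) fields so the interface list
        # stays together in the last field.
        fields = line.split(None, len(header) - 1)
        row = dict(zip(header, fields))
        # Turn "ce1 ce0" into ["ce1", "ce0"]; tolerate an empty group.
        row["INTERFACES"] = row.get("INTERFACES", "").split()
        # FDT is shown in milliseconds (e.g. "10000ms"); anything else
        # is recorded as None rather than guessed at.
        fdt = row.get("FDT", "")
        row["FDT_MS"] = int(fdt[:-2]) if fdt.endswith("ms") else None
        groups.append(row)
    return groups


# Sample copied from the output above.
SAMPLE = """\
GROUP       GROUPNAME   STATE     FDT       INTERFACES
net57       a           ok        10000ms   ce1 ce0
"""

groups = parse_ipmpstat_groups(SAMPLE)
for g in groups:
    print(g["GROUP"], g["GROUPNAME"], g["STATE"], g["FDT_MS"], g["INTERFACES"])
```

Since the layout is aimed at humans and may still change during development, a script like this should be treated as throwaway.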

We can get a more detailed look at the IPMP health and configuration of the interfaces under IPMP using the "-i" (interface) output mode:

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         yes     net57       ------  up        disabled  ok

Here, we can see that ce0 has probe-based failure detection disabled. We can also see issues that prevent an interface from being used (aka being "active") -- e.g., suppose we enable standby on ce0:

  # ifconfig ce0 standby

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         no      net57       si----  up        disabled  ok

We can see that ce0 is now no longer active, because it's an inactive standby (indicated by the "i" and "s" flags). This means that all of the addresses in the group must be restricted to ce1 (unless ce1 becomes unusable), which we can see via the "-a" (address) output mode ("-n" turns off address-to-hostname resolution):

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce1         ce1
  10.8.57.34          net57       up      ce1         ce1

For fun, we can offline ce1 and observe the failover to ce0:

  # if_mpadm -d ce1

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         no      net57       ----d-  disabled  disabled  offline
  ce0         yes     net57       s-----  up        disabled  ok

[ In addition to the "offline" state, the "d" flag also indicates that all of the addresses on ce1 are down, preventing it from receiving any traffic. ]

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce0         ce0
  10.8.57.34          net57       up      ce0         ce0

We can also convert ce0 back to a "normal" interface, online ce1 and observe the load spreading configurations:

  # ifconfig ce0 -standby
  # if_mpadm -r ce1

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         yes     net57       ------  up        disabled  ok

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce0         ce1 ce0
  10.8.57.34          net57       up      ce1         ce1 ce0

In particular, this indicates that incoming traffic to 10.8.57.210 will go to ce0 and inbound traffic to 10.8.57.34 will go to ce1 (as per the ARP mappings). However, outbound traffic will potentially flow over either interface (though to sidestep packet ordering issues, a given connection will remain latched unless the interface becomes unusable).

This also highlights another aspect of the new IPMP design: the kernel is responsible for spreading the IP addresses across the interfaces (rather than the administrator). The current algorithm simply attempts to keep the number of IP addresses "evenly" distributed over the set of interfaces, but more sophisticated policies (e.g., based on load measurements) could be added in the future.
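
To make the "evenly distributed" idea concrete, here is a small Python sketch of one way such a policy could work: each address goes to whichever usable interface currently holds the fewest addresses. This is only an illustration of the idea, not the kernel's actual code; the function name spread_addresses and the tie-breaking rule (earliest-listed interface wins) are assumptions.

```python
def spread_addresses(addresses, interfaces):
    """Assign each address to the interface currently holding the fewest.

    Illustrative only: ties go to the earliest-listed interface, which
    is an assumption rather than documented behavior.
    """
    if not interfaces:
        return {}
    assignment = {ifname: [] for ifname in interfaces}
    for addr in addresses:
        # min() returns the first interface with the smallest count.
        target = min(interfaces, key=lambda ifname: len(assignment[ifname]))
        assignment[target].append(addr)
    return assignment


addrs = ["10.8.57.210", "10.8.57.34"]

# Both interfaces usable: one address lands on each.
both = spread_addresses(addrs, ["ce0", "ce1"])

# Only ce0 usable (e.g. ce1 offline): everything lands on ce0.
only_ce0 = spread_addresses(addrs, ["ce0"])

print(both)
print(only_ce0)
```

A load-based policy would only need a different key function (say, measured traffic per interface instead of an address count), which is part of why keeping this decision in one place is attractive.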

To round out the ipmpstat feature set, one can also monitor the targets and probes used during probe-based failure detection:

  # ipmpstat -tn
  INTERFACE   MODE      TESTADDR            TARGETS
  ce1         mcast     10.8.57.12          10.8.57.237 10.8.57.235 10.8.57.254 10.8.57.253 10.8.57.207
  ce0         disabled  --                  --

Above, we can see that ce1 is using "mcast" (multicast) mode to discover its probe targets, and we can see the targets it has decided to probe, in firing order. We can also look at the probes themselves, in real-time:

  # ipmpstat -pn
  TIME      INTERFACE   PROBE     TARGET              RTT       RTTAVG    RTTDEV
  1.15s     ce1         112       10.8.57.237         1.09ms    1.14ms    0.11ms
  2.33s     ce1         113       10.8.57.235         1.11ms    1.18ms    0.13ms
  3.94s     ce1         114       10.8.57.254         1.07ms    2.10ms    2.00ms
  5.38s     ce1         115       10.8.57.253         1.08ms    1.14ms    0.10ms
  6.19s     ce1         116       10.8.57.207         1.43ms    1.20ms    0.19ms
  7.73s     ce1         117       10.8.57.237         1.04ms    1.13ms    0.11ms
  9.47s     ce1         118       10.8.57.235         1.04ms    1.16ms    0.13ms
  10.67s    ce1         119       10.8.57.254         1.06ms    1.97ms    1.76ms
  ^C

Above, the inflated RTT average and standard deviation for 10.8.57.254 indicate that something went wrong with 10.8.57.254 in the not-too-distant past. (As an aside: "-p" also revealed a subtle longstanding bug in in.mpathd that was causing inflated jitter times for probe targets; see 6549950.)
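
To see why one bad probe can leave both numbers inflated for a while, here is a Python sketch of a smoothed estimator. It uses the classic TCP-style exponentially weighted update (gains of 1/8 for the average and 1/4 for the deviation); whether in.mpathd uses exactly this formula or these gains is an assumption on my part, and the sample values below are invented.

```python
def update_rtt(avg, dev, sample, gain_avg=0.125, gain_dev=0.25):
    """Fold one RTT sample into a smoothed average and mean deviation.

    Assumption: TCP-style exponentially weighted averaging; the gains
    are the textbook values, not ones taken from in.mpathd.
    """
    err = sample - avg
    avg = avg + gain_avg * err
    dev = dev + gain_dev * (abs(err) - dev)
    return avg, dev


# Hypothetical samples in milliseconds: steady, one slow probe, steady.
samples = [1.08, 9.00, 1.07, 1.06, 1.07]

avg, dev = 1.10, 0.10  # made-up starting estimates, in ms
history = []
for sample in samples:
    avg, dev = update_rtt(avg, dev, sample)
    history.append((sample, avg, dev))

for sample, a, d in history:
    print(f"rtt={sample:.2f}ms  avg={a:.2f}ms  dev={d:.2f}ms")
```

Even though the last three samples are back near 1ms, the average and deviation decay only gradually, which is the same pattern as the 10.8.57.254 rows above.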

Anyway, hopefully all this gives you not only a feel for ipmpstat, but a feel for how development is progressing. It should be noted that several key features are still missing, such as:

  • Broadcast and multicast support on IPMP interfaces.
  • IPv6 traffic on IPMP interfaces.
  • IP Filter support on IPMP interfaces.
  • MIB and kstat support on IPMP interfaces.
  • DHCP on IPMP interfaces.
  • Sun Cluster support.

All of these are currently being worked on. In the meantime, we will be making early-access BFU archives, based on what we have so far, available to those who are interested in kicking the tires. (And a big thanks to those customers who have already volunteered!)


Comments:

As a long time IPMP user, it interests me to know, how is all this IPMP work going to play with VNICs and CrossBow?

Posted by UX-admin on April 26, 2007 at 04:35 AM EDT #

Nifty! Are you planning to add an option to ipmpstat to display traffic by NIC and IPMP group, or will that be covered under one of the other networking projects? - Ryan

Posted by Matty on April 26, 2007 at 03:00 PM EDT #

UX-admin: a very interesting question, and one we're still working out. You can check out this discussion for some current thoughts. There's more work that needs to be done to ensure that Crossbow and IPMP will interact smoothly. (Note that said work is mostly independent of IPMP NG.)

Matty: since IPMP NG models each IPMP group as an IP interface -- and is slated to export basic kstats from the IP layer in much the same manner as lo0 -- existing tools (or new tools like nicstat) are expected to provide that functionality.

Posted by meem on April 28, 2007 at 07:09 AM EDT #
