Sunday May 31, 2009

Clearview IPMP in Production

When I was first getting obsessed with programming in my early teens, I recall waking up on many a Saturday to the gleeful realization I had the whole day to improve some crazy piece of home-grown software. Back then, the excitement was simply in the journey itself -- I was completely content in being the entire userbase.

Of course I still love writing software (though the idea of being able to devote a whole day to it seems quaint) -- but it pales in comparison to the thrill of knowing that real people are using that software to solve their real problems. Unfortunately, with enterprise-class products such as Solaris, release schedules have historically meant that a completed project may have to wait years until it gets to solve its first real-world problem. By then, several other projects may have run their course and I'm invariably under another one's spell and not in the right frame of mind to even reminisce, let alone rejoice.

Thankfully, times have changed. First, courtesy of OpenSolaris's ipkg /dev repository, only a few weeks after January's integration, Clearview IPMP was available for bleeding-edge customers to experiment with (and based on feedback I've received, quite a few have successfully done so). Second, for the vast majority who need a supported release, Clearview IPMP can now be found in the brand-new OpenSolaris 2009.06 release. Third, thanks to the clustering team, Clearview IPMP also works with the current version of OpenSolaris Open HA Cluster.

Further, there is one little-known but immensely important release vehicle for Clearview IPMP: the Sun Storage 7000 Q2 release. Indeed, in the months since the integration of Clearview IPMP, I've partnered with the Fishworks team on getting all of the latest and greatest networking technologies from OpenSolaris into the Sun Storage 7000 appliances. As such, the Q2 release contains all of the Solaris networking projects delivered up to OpenSolaris build 106 (most notably Volo and Crossbow), plus Clearview IPMP from build 107. Of course, these projects also open up a range of new opportunities for the appliance -- especially around Networking QoS and simplified HA configuration -- which will find their way into subsequent quarterly releases.

Needless to say, all of this is immensely satisfying for me personally -- especially the idea that some of our most demanding enterprise customers are relying on Clearview IPMP to ensure their mission-critical storage remains available when networking hardware or upstream switches fail. As per my blog entry announcing Clearview IPMP in OpenSolaris, it's clear I'm a proud parent, but given the thrashing we've given it internally and its track-record thus far with customers, I'm confident it's ready for prime time.

For those exploring IPMP for the first time, Xiang Zhou (the co-author of its extensive test suite) has put together a great blog entry, including step-by-step instructions. Additionally, Raoul Carag and I extensively revised the IPMP administrative overview and IPMP tasks guide.

Those familiar with Solaris 10 IPMP may wish to check out a short slide deck that highlights the core differences and new utilities (if nothing else, I'd recommend scanning slides 12-21).

Have fun -- and of course, I (and the rest of the Clearview team) am eager to hear how it stacks up against your real-world networking high-availability problems!

Wednesday Apr 25, 2007

IPMP Development Update #2

Several folks have again (understandably) asked for updates on the Next-Generation IPMP work. Significant progress has been made since my last update. Notably:

  • Probe-based failure detection is operational (in addition to the earlier support for link-based failure detection).
  • DR support for interfaces under IPMP (via RCM) works. Thanks to the new architecture, the code is almost 1000 lines more compact than Solaris's current implementation -- and more robust.
  • Boot support is now complete. That is, any number of a group's interfaces (including all of them) can be missing at boot and then be transparently repaired during operation.
  • At long last, ipmpstat. As discussed in the high-level design document, this is a new utility that allows the IPMP subsystem to be compactly examined.

Since ipmpstat allows other aspects of the architecture to be succinctly examined, let's take a quick look at a simple two-interface group on my test system:

  # ipmpstat -g
  GROUP       GROUPNAME   STATE     FDT       INTERFACES
  net57       a           ok        10000ms   ce1 ce0

As we can see, the "-g" (group) output mode tells us all the basics about the group: the group interface name and group name (these will usually be the same, but differ above for illustrative purposes), its current state ("ok", indicating that all of the interfaces are operational), the maximum time needed to detect a failure (10 seconds), and the interfaces that comprise the group.
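For context, a group like this is assembled with ordinary ifconfig commands. Here's a minimal sketch of how a comparable two-interface group might be set up under the new model -- it assumes ce0 and ce1 are already plumbed and, unlike the illustrative example above, uses the common case where the group name is just the IPMP interface name; the exact syntax may still shift before integration:

  # ifconfig net57 ipmp up           # create the IPMP (group) interface
  # ifconfig ce0 group net57         # place ce0 under it
  # ifconfig ce1 group net57         # place ce1 under it
  # ifconfig net57 10.8.57.210 up    # assign a data address to the group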

We can get a more detailed look at the health and configuration of the interfaces under IPMP using the "-i" (interface) output mode:

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         yes     net57       ------  up        disabled  ok

Here, we can see that ce0 has probe-based failure detection disabled. We can also see issues that prevent an interface from being used (aka being "active") -- e.g., suppose we enable standby on ce0:

  # ifconfig ce0 standby

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         no      net57       si----  up        disabled  ok

We can see that ce0 is now no longer active, because it's an inactive standby (indicated by the "i" and "s" flags). This means that all of the addresses in the group must be restricted to ce1 (unless ce1 becomes unusable), which we can see via the "-a" (address) output mode ("-n" turns off address-to-hostname resolution):

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce1         ce1
  10.8.57.34          net57       up      ce1         ce1

For fun, we can offline ce1 and observe the failover to ce0:

  # if_mpadm -d ce1

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         no      net57       ----d-  disabled  disabled  offline
  ce0         yes     net57       s-----  up        disabled  ok

[ In addition to the "offline" state, the "d" flag also indicates that all of the addresses on ce1 are down, preventing it from receiving any traffic. ]

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce0         ce0
  10.8.57.34          net57       up      ce0         ce0

We can also convert ce0 back to a "normal" interface, online ce1, and observe the resulting load-spreading configuration:

  # ifconfig ce0 -standby
  # if_mpadm -r ce1

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         yes     net57       ------  up        disabled  ok

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce0         ce1 ce0
  10.8.57.34          net57       up      ce1         ce1 ce0

In particular, this indicates that inbound traffic to 10.8.57.210 will arrive on ce0 and inbound traffic to 10.8.57.34 will arrive on ce1 (as per the ARP mappings). However, outbound traffic may flow over either interface (though to sidestep packet-ordering issues, a given connection will remain latched to one interface unless that interface becomes unusable).

This also highlights another aspect of the new IPMP design: the kernel is responsible for spreading the IP addresses across the interfaces (rather than the administrator). The current algorithm simply attempts to keep the number of IP addresses "evenly" distributed over the set of interfaces, but more sophisticated policies (e.g., based on load measurements) could be added in the future.
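
To make that concrete, here's a hypothetical example (10.8.57.35 is a made-up address, not from the system above): if we added a third data address to the group interface, the kernel would place it on whichever underlying interface keeps the per-interface address counts even, and "ipmpstat -an" would show the choice it made:

  # ifconfig net57 addif 10.8.57.35 up    # hypothetical third data address
  # ipmpstat -an                          # shows the kernel-chosen INBOUND interface for it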

To round out the ipmpstat feature set, one can also monitor the targets and probes used during probe-based failure detection:

  # ipmpstat -tn
  INTERFACE   MODE      TESTADDR            TARGETS
  ce1         mcast     10.8.57.12          10.8.57.237 10.8.57.235 10.8.57.254 10.8.57.253 10.8.57.207
  ce0         disabled  --                  --

Above, we can see that ce1 is using "mcast" (multicast) mode to discover its probe targets, and we can see the targets it has decided to probe, in firing order.
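
As an aside, if multicast discovery isn't suitable on a given network, probe targets can also be specified explicitly as static host routes, which in.mpathd prefers over multicast discovery. A minimal sketch, reusing one of the targets from the output above (any reliably reachable on-link system would do):

  # route add -host 10.8.57.207 10.8.57.207 -static

With explicit targets configured, "ipmpstat -t" would report a different target mode for ce1 and probe only the systems you listed.

We can also look at the probes themselves, in real-time:
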
  # ipmpstat -pn
  TIME      INTERFACE   PROBE     TARGET              RTT       RTTAVG    RTTDEV
  1.15s     ce1         112       10.8.57.237         1.09ms    1.14ms    0.11ms
  2.33s     ce1         113       10.8.57.235         1.11ms    1.18ms    0.13ms
  3.94s     ce1         114       10.8.57.254         1.07ms    2.10ms    2.00ms
  5.38s     ce1         115       10.8.57.253         1.08ms    1.14ms    0.10ms
  6.19s     ce1         116       10.8.57.207         1.43ms    1.20ms    0.19ms
  7.73s     ce1         117       10.8.57.237         1.04ms    1.13ms    0.11ms
  9.47s     ce1         118       10.8.57.235         1.04ms    1.16ms    0.13ms
  10.67s    ce1         119       10.8.57.254         1.06ms    1.97ms    1.76ms
  ^C

Above, the inflated RTT average and standard deviation for 10.8.57.254 indicate that something went wrong with 10.8.57.254 in the not-too-distant past. (As an aside: "-p" also revealed a subtle longstanding bug in in.mpathd that was causing inflated jitter times for probe targets; see 6549950.)

Anyway, hopefully all this gives you not only a feel for ipmpstat, but a feel for how development is progressing. It should be noted that several key features are still missing, such as:

  • Broadcast and multicast support on IPMP interfaces.
  • IPv6 traffic on IPMP interfaces.
  • IP Filter support on IPMP interfaces.
  • MIB and kstat support on IPMP interfaces.
  • DHCP on IPMP interfaces.
  • Sun Cluster support.

All of these are currently being worked on. In the meantime, we will be making early-access BFU archives based on what we have so far available to those who are interested in kicking the tires. (And a big thanks to those customers who have already volunteered!)
