Clearview IPMP in OpenSolaris
By meem on Jan 21, 2009
Here, I'd like to get a bit more personal as the designer and developer of Clearview IPMP. The project has been a real labor of love, born both of the challenges many of Sun's top enterprise customers have faced trying to deploy IPMP, and of the formidable internal effort needed to keep the pre-Clearview IPMP implementation chugging along for the past decade. That is, it became clear that IPMP was simultaneously a critical high-availability technology for our top customers and an increasing cost to both our engineering and support organizations -- we either needed to kill it or fix it. Ever the optimist and buoyed by a growing customer interest in IPMP, I convinced management that I could tackle this work as part of the broader Clearview initiative that Seb and I were in the process of scoping (and moreover, either killing or fixing IPMP was required to meet Clearview's Umbrella Objectives).
From an engineering standpoint, IPMP is a case study in how much it matters to have the right abstractions. Specifically, the old (pre-Clearview) model was a struggle in large part because it introduced a new "group" abstraction to represent the IPMP group as a whole, rather than modeling an IPMP group as an IP interface (more on core network interface abstractions). This meant that every technology that interacted directly with IP interfaces (e.g., routing, filtering, QoS, monitoring, ...) required heaps of special-case code to deal with IPMP, which introduced significant complexity and a never-ending stream of corner cases, some of which were unresolvable. It also made certain technologies (e.g., DHCP) downright impossible to implement, because their design was based on assumptions that held in all cases other than IPMP (e.g., that a given IP address would not move between IP interfaces). More broadly, with each new networking technology, significant effort was needed to consider how it could be made to work with IPMP, an approach that simply does not scale.
The real tragedy of the old implementation is that its semantics -- while often misunderstood by customers and Sun engineers alike -- already behaved as if each IPMP group had an IP interface. For instance, if one placed two IP interfaces into an IPMP group, then added a route over one of those IP interfaces, it was as if a route had been added over the IPMP group. I say "tragedy" because this was wholly unobvious, and thus understandably led to numerous support calls. Similar surprises came from the fact that a packet with a source IP address from one IP interface could be sent out through another IP interface. In short, the implementation had cobbled together various other abstractions to build something that acted mostly like an IPMP group IP interface, but wasn't actually one.
From this one central mistake came a raft of related problems that impacted both the programmatic and administrative models. For instance, beyond the need to teach each technology about IPMP groups, consider what happens when an IP interface fails. In concept, this should be a simple operation: the IP addresses that were mapped to the failed interface's hardware address need to be remapped to the hardware address of a functioning interface in the group. This remapping can occur entirely within IP itself -- applications using those IP addresses should not need to know or care. However, in the old IPMP implementation, this was actually a very disruptive operation: the IP addresses had to be visibly moved from the failed IP interface to a functioning IP interface, confusing applications that either interacted with the IP interface namespace or listened to routing sockets. Moreover, such applications had to be specially coded to know that while the IP interface had failed, they should not react to the failure because another IP interface had taken over responsibility. Similar problems abounded in areas both far and near; an interesting recent example is the issue Steffen found with the new defrouter feature and Solaris 10 IPMP. That problem doesn't exist with Clearview IPMP, not because we overpowered it with reams of code but simply because the Clearview IPMP design precludes it.
Speaking of "reams of code", one of the aspects I'm most proud of with Clearview IPMP is the size of the codebase. In terms of raw numbers, the kernel implementation has shrunk by more than 35%, from roughly 8500 lines of code to 5500 lines (roughly 1000 lines of that are comments), and the lion's share of that code is isolated behind a simple kernel API of a few dozen functions (in contrast, the old IPMP codebase was sprawling and often written in-line). More importantly, the work needed to integrate the Clearview IPMP code with related technology was minimal: packet monitoring across the group required 15 lines of code; IP filter support required 5 lines of code; dynamic routing required no additional code. The new model also opened up unexpected opportunities, such as allowing the IPSQ framework (the core synchronization framework inside IP) to be massively simplified. Further, as a side effect of the new model, Clearview IPMP was able to fix many longstanding bugs -- some as old as IPMP itself -- such as 5015757, 6184000, 6359536, 6516992, 6591186, 6698480, 6752560, and 6787091 (among others).
Anyway, it's obvious that I'm a proud and biased parent. Whether my pride is justified will only become clear once Clearview IPMP has ten years of production use under its belt and an objective comparison is possible. However, I encourage you all to take it for a spin now and make your own assessment -- and of course feedback is welcome, either to me in private or on clearview-discuss-AT-opensolaris.org.