Sunday Dec 27, 2009

Five cscope Tips

As software becomes increasingly complex and codebases continue to sprawl, source code cross-reference tools have become a critical component of a software engineer's toolbox. Indeed, since most of us are tasked with enhancing an existing codebase (rather than writing from scratch), proficiency in use of a cross-reference tool can mean the difference between understanding the subtleties of a subsystem in an afternoon and spending weeks battling "unforeseen" complications.

At Sun, we primarily use a tweaked version of the venerable cscope utility which has origins going back to AT&T in the 1980s (now freely available from cscope.sourceforge.net). As with many UNIX utilities, despite its age it has remained popular because of its efficiency and flexibility, which are especially important when understanding (and optionally modifying) source trees with several million lines of code.

Despite cscope's importance and popularity, I've been surprised to discover that few are familiar with anything beyond the basics. As such, in the interest of increasing cscope proficiency, here's my list of five features every cscope user should know:

  1. Display more than 9 search results per page with -r.

    Back in the 1980s the default behavior may have made sense, but with modern xterms often configured to have 50-70 rows the default is simply inefficient and tedious. By passing the -r option to cscope at startup (or including -r in the CSCOPEOPTIONS environment variable), cscope will display as many search results as will fit. The only caveat is that selecting an entry from the results must include explicitly pressing return (e.g., "3 [return]" instead of "3") so that entries greater than 9 can be selected. I find this tradeoff more than acceptable. (Apparently, the current open-source version of cscope uses letters to represent search results beyond 9 and thus does not require -r.)

  2. Display more pathname components in search results with -pN.

    By default, cscope only displays the basename of a given matching file. In large codebases, files in different parts of the source tree can often have the same name (consider main.c), which makes for confusing search results. By passing the -pN option to cscope at startup (or including -pN in the CSCOPEOPTIONS environment variable) -- where N is the number of pathname components to display -- this confusion can be eliminated. I've generally found -p4 to be a good middle-ground. Note that -p0 will cause pathnames to be omitted entirely from search results, which can also be useful for certain specialized queries.
  3. Use regular expressions when searching.

    While it is clear that one can enter a regexp when using "Find this egrep pattern", it's less apparent that almost all search fields will accept regexps. For instance, to find all definitions starting with ipmp_ and ending with ill, just specify ipmp_.*ill to "Find this definition". In addition to allowing groups of related functions to be quickly found, I find this feature is quite useful when I cannot remember the exact name of a given symbol but can recall specific parts of its name. Note that this feature is not limited to symbols -- e.g., passing .*ipmp.* to "Find files #including this file" returns all files in the cscope database that #include a file with ipmp somewhere in its name.
  4. Use filtering to refine previous searches.

    cscope provides several mechanisms for refining searches. The most powerful is the ability to filter previous searches through an arbitrary shell command via ^. For instance, suppose you want to find all calls to GLDv3 functions (which all start with mac_) from the nge driver (which has a set of source files starting with nge). You might first specify a search pattern of mac_.* to "Find functions calling this function". With ON's cscope database, this returns a daunting 2400 matches; filtering with "^grep common/io/nge" quickly pares the results down to the 12 calls that exist within the nge driver. Note that this can be repeated any number of times -- e.g., "^sort -k2" alphabetizes the remaining search results by calling function. (A command-line equivalent is sketched just after this list.)
  5. Use the built-in history mechanisms.

    You can quickly restore previous search queries by using ^b (control-b); ^f will move forward through the history. This feature is especially useful when performing depth-first exploration of a given function hierarchy. You can also use ^a to replay the most recent search pattern (e.g., in a different search field), and the > and < commands to save and restore the results of a given search. Thus, you could save search results prior to refining them with ^ (as per the previous tip) and restore them later, or restore results from a past cscope session.
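
To put several of these tips into practice outside of cscope's curses interface: the first two can become defaults via the environment, and tip 4's refinement can be replayed in cscope's command-line mode (touched on below), where -L3 corresponds to "Find functions calling this function" and ordinary UNIX filters stand in for ^. A sketch, assuming a cscope database already built with -q:

  $ CSCOPEOPTIONS="-r -p4"; export CSCOPEOPTIONS
  $ cscope -dq -L3 'mac_.*' | grep common/io/nge | sort -k2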

Of course, this is just my top-five list -- there are many other powerful features, such as the ability to make changes en masse, build custom cscope databases using the xref utility, embed command-line mode in scripts (mentioned in a previous blog entry), and employ numerous extensions that provide seamless interaction with popular editors such as XEmacs and vim. Along these lines, I'm eager to hear from others who have found ways to improve their productivity with this exceptional utility.

Sunday May 31, 2009

Clearview IPMP in Production

When I was first getting obsessed with programming in my early teens, I recall waking up on many a Saturday to the gleeful realization I had the whole day to improve some crazy piece of home-grown software. Back then, the excitement was simply in the journey itself -- I was completely content in being the entire userbase.

Of course I still love writing software (though the idea of being able to devote a whole day to it seems quaint) -- but it pales in comparison to the thrill of knowing that real people are using that software to solve their real problems. Unfortunately, with enterprise-class products such as Solaris, release schedules have historically meant that a completed project may have to wait years until it gets to solve its first real-world problem. By then, several other projects may have run their course and I'm invariably under another one's spell and not in the right frame of mind to even reminisce, let alone rejoice.

Thankfully, times have changed. First, courtesy of OpenSolaris's ipkg /dev repository, only a few weeks after January's integration, Clearview IPMP was available for bleeding-edge customers to experiment with (and based on feedback I've received, quite a few have successfully done so). Second, for the vast majority who need a supported release, Clearview IPMP can now be found in the brand-new OpenSolaris 2009.06 release. Third, thanks to the clustering team, Clearview IPMP also works with the current version of OpenSolaris Open HA Cluster.

Further, there is one little-known but immensely important release vehicle for Clearview IPMP: the Sun Storage 7000 Q2 release. Indeed, in the months since the integration of Clearview IPMP, I've partnered with the Fishworks team on getting all of the latest and greatest networking technologies from OpenSolaris into the Sun Storage 7000 appliances. As such, the Q2 release contains all of the Solaris networking projects delivered up to OpenSolaris build 106 (most notably Volo and Crossbow), plus Clearview IPMP from build 107. Of course, these projects also open up a range of new opportunities for the appliance -- especially around Networking QoS and simplified HA configuration -- which will find their way into subsequent quarterly releases.

Needless to say, all of this is immensely satisfying for me personally -- especially the idea that some of our most demanding enterprise customers are relying on Clearview IPMP to ensure their mission-critical storage remains available when networking hardware or upstream switches fail. As per my blog entry announcing Clearview IPMP in OpenSolaris, it's clear I'm a proud parent, but given the thrashing we've given it internally and its track-record thus far with customers, I'm confident it's ready for prime time.

For those exploring IPMP for the first time, Xiang Zhou (the co-author of its extensive test suite) has put together a great blog entry, including step-by-step instructions. Additionally, Raoul Carag and I extensively revised the IPMP administrative overview and IPMP tasks guide.

Those familiar with Solaris 10 IPMP may wish to check out a short slide deck that highlights the core differences and new utilities (if nothing else, I'd recommend scanning slides 12-21).

Have fun -- and of course, I (and the rest of the Clearview team) am eager to hear how it stacks up against your real-world networking high-availability problems!

Tuesday May 12, 2009

Hunting Cruft

It's no secret that I am borderline-O.C.D. in many aspects of my life -- and especially so when it comes to developing software. However, large-scale software development is inherently a messy process, and even with the most disciplined engineering practices, remnants from aborted or bygone designs often remain, lying in wait to confuse future developers.

Thankfully, many of the more obvious remnants can be identified with automated programs. For instance, the venerable lint utility can identify unused functions within an application. Many moons ago, I applied a similar concept to the OS/Net nightly build process with a utility called findunref that allows us to automatically identify files in the source tree that are not used during a build. (Frighteningly, it also identified 1100 unreferenced files in the sourcebase. That is, roughly 4% of the files we were dutifully maintaining had no bearing whatsoever on our final product. Of course, some of these should have been used, such as disconnected localization files and packaging scripts.)

Cruft-wise, Clearview IPMP posed a particular challenge: the old IPMP implementation was peanut-buttered through 135,000 lines of code in the TCP/IP stack, and I was determined to leave no trace of it behind. As such, over time I amassed a collection of programs, run as cron jobs, that mined the sourcebase for possible vestiges (note that this was an ongoing task because the sourcebase Clearview IPMP replaced was still undergoing change to address critical customer needs). Some of these programs were simple (e.g., text-based searches for old IPMP-related abbreviations such as "ill group" and "ifgrp"), but others were a bit more evolved.

For instance, one key problem is the identification of unused functions. As I mentioned earlier, lint can identify unused functions in a program, but for a kernel module like ip things are more complex because other kernel modules may be the lone consumers of symbols provided by it. While it is possible to identify all the dependent modules, build lint libraries for each of them, and perform a lint crosscheck across them (and in fact, we do just that during the nightly build, though not for unused functions), it is also quite time-consuming and as such a bit heavyweight for my needs.

Thinking about the problem further, another solution emerged: during development, it is customary to maintain a source code cross-reference database, typically built with the classic cscope utility. A little-known aspect of cscope is that it can be scripted. For instance, to find the definition for symbol foo, one can do cscope -dq -L1 foo. Indeed, a common way to check that a symbol is unused is to (interactively) look for all uses of the symbol in cscope. Thus, for a given kernel module, it is straightforward to write a script to find unused functions: use nm(1) to obtain the module's symbol table and then check whether each of those symbols is used via cscope's scripting interface. In fact, that is exactly what my tiny dead-funcs utility does. Clearly, this requires the kernel module to be built from the same source base as the cscope database, and it can only identify already-extant cruft (in addition to flagging interfaces that may have consumers outside of the OS/Net source tree), but it nonetheless proved quite useful during development (and has been valuable to others as well).
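
To make the approach concrete, here is a minimal sketch along the lines of dead-funcs (not the actual utility; it assumes GNU-style nm output and a cscope database built with -q in the current directory):

  #!/bin/sh
  #
  # usage: dead-funcs-sketch <module>
  # Flag global functions in <module> that no function in the cscope
  # database calls.  Note that -L3 finds only direct calls; a symbol
  # used via a function pointer would need -L0 to be found.
  #
  nm -g "$1" | awk '$2 == "T" { print $3 }' |
  while read sym; do
          [ -z "`cscope -dq -L3 $sym`" ] && echo "$sym: no callers found"
  done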

A similar approach can be followed to ensnare dead declarations, though some creativity may be needed to build the list of function/variable names to feed to cscope, as the compiler will have already pruned them out prior to constructing the kernel module, and header files require care to parse properly. I resorted to convincing lint to build a lint library out of the header file in question (via PROTOLIB1), then using lintdump (another utility I contributed to the OS/Net tool chain) to dump out the symbol list -- admittedly a clunky approach, but effective nonetheless.

Unfortunately, scripts such as dead-funcs are too restrictive to become general-purpose tools in our tool chain, though perhaps you will find them (or their approaches) useful for your own O.C.D. development.

Wednesday Jan 21, 2009

Clearview IPMP in OpenSolaris

At long last, and somewhat belatedly, I'm thrilled to announce that the Clearview IPMP Rearchitecture has integrated into Solaris Nevada (build 107)! Build 107 has just closed internally, so internal WOS images will be available in the next few days (unfortunately, it will likely be a few weeks before the bits are available via OpenSolaris packages). For more on the new administrative experience, please check out the revised IPMP documentation, or Steffen Weiberle's in-depth blog entry. For more on the internal design, there's an extensive high-level design document, internals slides and numerous low-level design discussions in the code itself.

Here, I'd like to get a bit more personal as the designer and developer of Clearview IPMP. The project has been a real labor of love, borne both from the challenges many of Sun's top enterprise customers have faced trying to deploy IPMP, and from the formidable internal effort needed to keep the pre-Clearview IPMP implementation chugging along for the past decade. That is, it became clear that IPMP was simultaneously a critical high-availability technology for our top customers and an increasing cost on both our engineering and support organizations -- we either needed to kill it or fix it. Ever the optimist and buoyed by growing customer interest in IPMP, I convinced management that I could tackle this work as part of the broader Clearview initiative that Seb and I were in the process of scoping (and moreover, either killing or fixing IPMP was required to meet Clearview's Umbrella Objectives).

From an engineering standpoint, IPMP is a case study in how much it matters to have the right abstractions. Specifically, the old (pre-Clearview) model was a struggle in large part because it introduced a new "group" abstraction to represent the IPMP group as a whole, rather than modeling an IPMP group as an IP interface (more on core network interface abstractions). This meant that every technology that interacted directly with IP interfaces (e.g., routing, filtering, QoS, monitoring, ...) required heaps of special-case code to deal with IPMP, which introduced significant complexity and a neverending stream of corner cases, some of which were unresolvable. It also made certain technologies (e.g., DHCP) downright impossible to implement, because their design was based on assumptions that held in *all* cases other than IPMP (e.g., that a given IP address would not move between IP interfaces). More broadly, with each new networking technology, significant effort was needed to consider how it could be made to work with IPMP, which simply does not scale.

The real tragedy of the old implementation is that the actual semantics -- while often misunderstood by customers and Sun engineers alike -- acted as if each IPMP group had an IP interface. For instance, if one placed two IP interfaces into an IPMP group, then added a route over one of those IP interfaces, it was as if a route had been added over the IPMP group. I say "tragedy" because this was wholly unobvious, and thus understandably led to numerous support calls. Similar surprises came from the fact that a packet with a source IP address from one IP interface could be sent out through another IP interface. In short, the implementation had cobbled together various other abstractions to build something that acted mostly like an IPMP group IP interface, but wasn't actually one.

From this one central mistake came a raft of related problems that impacted both the programmatic and administrative models. For instance, in addition to having to teach technologies about IPMP groups, consider what happens when an IP interface fails. In concept, this should be a simple operation: the IP addresses that were mapped to the failed interface's hardware address need to be remapped to the hardware address of a functioning interface in the group. This remapping can occur entirely within IP itself -- applications using those IP addresses should not need to know or care. However, in the old IPMP implementation, this was actually a very disruptive operation: the IP addresses had to be visibly moved from the failed IP interface to a functioning IP interface, confusing applications that either interacted with the IP interface namespace or listened to routing sockets. Moreover, the application had to be specially coded to know that while the IP interface had failed, it should not react to the failure because another IP interface had taken over responsibility. Similar problems abounded in areas both far and near; an interesting recent example is the issue Steffen found with the new defrouter feature and Solaris 10 IPMP. That problem doesn't exist with Clearview IPMP not because we overpowered it with reams of code but simply because the Clearview IPMP design precludes it.

Speaking of "reams of code", one of the aspects I'm most proud of with Clearview IPMP is the size of the codebase. In terms of raw numbers, the kernel implementation has shrunk by more than 35%, from roughly 8500 lines of code to 5500 lines (roughly 1000 lines of that are comments), and the lion's share of that code is isolated behind a simple kernel API of a few dozen functions (in contrast, the old IPMP codebase was sprawling and often written in-line). More importantly, the work needed to integrate the Clearview IPMP code with related technology was minimal: packet monitoring across the group required 15 lines of code; IP filter support required 5 lines of code; dynamic routing required no additional code. The new model also opened up unexpected opportunities, such as allowing the IPSQ framework (the core synchronization framework inside IP) to be massively simplified. Further, as a side effect of the new model, Clearview IPMP was able to fix many longstanding bugs -- some as old as IPMP itself -- such as 5015757, 6184000, 6359536, 6516992, 6591186, 6698480, 6752560, and 6787091 (among others).

Anyway, it's obvious that I'm a proud and biased parent. Whether my pride is justified will only become clear once Clearview IPMP has ten years of production use under its belt and an objective comparison is possible. However, I encourage you all to take it for a spin now and make your own assessment -- and of course feedback is welcome, either to me in private or on clearview-discuss-AT-opensolaris.org.

Tuesday Sep 02, 2008

Creating Shell-Friendly Parsable Output

Being able to easily write scripts from the command line has long been regarded as one of UNIX's core strengths. However, over the years, surprisingly little attention has been paid to writing CLIs whose output lends itself to scripting. Indeed, even modern CLIs often fail to consider parsable output as a distinct concept from human output, leading to overwrought and fragile scripts that inevitably break as the CLI is enhanced over time. Some recent CLIs have "solved" the parsable format problem by using popular formats such as XML and JSON. These are fine formats for sophisticated scripting languages, but a poor match for traditional UNIX line-oriented tools (e.g., grep, cut, head) that form the foundation of shell scripting.

Even those CLIs that consider the shell when designing a parsable output format often fall short of the mark. For dladm(1M), it took us (Sun) three tries to create a format that can be easily parsed from the shell. So, while the final format we settled on may seem simple and obvious, as is often the case, making things simple can prove to be surprisingly hard. Further, there are a number of alternative output formats that seem compelling at first blush but ultimately prove to be unworkable.

So that others working on similar problems may benefit, below I've summarized our set of guidelines -- some obvious, some not -- that we arrived at while working on dladm. As each CLI has its own constraints, not all of them may prove applicable, but I'd urge anyone designing a CLI with parsable output to consider each one carefully.

To provide some specifics to hang our guidelines on, first, here's an example of the dladm human output format:

  # dladm show-link -o link,class,over
  LINK        CLASS    OVER
  eth0        phys     --
  eth1        phys     --
  eth2        phys     --
  default0    aggr     eth0 eth1 eth2
  cyan0       vlan     default0
... and here's the equivalent parsable output format:
  # dladm show-link -p -o link,class,over
  eth0:phys:
  eth1:phys:
  eth2:phys:
  default0:aggr:eth0 eth1 eth2
  cyan0:vlan:default0
Now, the guidelines:
  1. Design CLIs that output in a regular format -- even in human output mode.

    Once your human output mode ceases to be regular (ifconfig(1M) output is a prime example of an irregular format), later adding a parsable output mode becomes difficult if not impossible. (As an aside, I've often found that irregular output suggests deeper design flaws, either in the CLI itself or the objects it operates on.)

  2. Prefer tabular formats in parsable output mode.

    Because traditional shell scripting works best with lines of information, tabular formats where each line both identifies and describes a unique object are ideal. For example, above, the link field uniquely identifies the object, and the class and over fields describe that object. In some cases, multiple fields may be required to uniquely identify the object (e.g., with dladm show-linkprop, both the link and the property field are needed). As an aside: in the multiple-field case, the human output mode may choose to use visual nesting (e.g., by grouping all of a link's properties together on successive lines and omitting the link value entirely), but it's important this not be done in parsable output mode so that the shell script can remain simple.

  3. Require output fields to be specified.

    Unlike humans, scripts always invoke a CLI with a specific purpose in mind. Also unlike humans, scripts are not facile at adapting to change (e.g., the addition of new fields). Thus, it's imperative that scripts be forced to explicitly specify the fields they need (with dladm, attempting to use -p without -o yields an error message). With this approach, new fields can be added to a CLI without any risk of breaking existing consumers. Further, if a field used by a script is removed, the failure mode becomes hard (the CLI will produce an error), rather than soft (the consumer misparses the CLI's output and does something unpredictable). Note that for similar reasons, if your CLI provides a way to print field subsets that may change over time (e.g., -o all), those must also fail in parsable output mode.

  4. Leverage field specifiers to infer field names.

    Because field names must be specified in an order, it's natural to use that same order as the output order, and thus avoid having to explicitly identify the field names in the parsable output format. That is, as shown above, dladm can omit indicating which field name corresponds with which value because the order is inferred from the invocation of -o link,class,over. This may seem a minor point, but in practice it saves a lot of grotty work in the shell to otherwise correlate each field name with its value.
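
    For instance, the -o order lets a script bind each value to a shell variable purely by position (a sketch reusing the show-link invocation from above):

          dladm show-link -p -o link,class,over |
          while IFS=: read link class over; do
                  echo "link=$link class=$class over=$over"
          done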

  5. Omit headers.

    Similarly, because the field order is known (and no human will be staring at the output) there is no utility in providing a header in parsable output mode, and indeed its presence would only complicate parsing. As shown above, dladm omits the header in parsable output mode.

  6. Do not adorn your field values.

    In human output mode, it can be useful to give visual indications for specific field values. For instance, as shown above, dladm shows empty values as "--" in human output mode so that the table does not look malformed. In parsable output mode, such embellishments only complicate and confuse consumers of the data (and may in fact make it ambiguous), and thus should be avoided. As above, in parsable output format, empty values are shown as actually being empty.

  7. Do not use whitespace as a field separator.

    Whitespace may seem like a natural field separator, but in practice it's problematic. Specifically, many shells treat whitespace separators specially by merging consecutive instances into a single instance. For example, consider representing three consecutive empty values. With a non-whitespace field separator such as ":", this would be output as "::" (empty value 1, : separator, empty value 2, :, empty value 3). With the shell's IFS variable set to ":", the shell will parse this as three separate empty values, as intended. With space as the field separator, this would be output as "   ", and with IFS set to " " the shell would misparse this as a single empty value.
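
    To see the merging behavior concretely (any POSIX shell; the values are contrived):

          $ echo 'a::c' | { IFS=: read f1 f2 f3; echo "[$f1][$f2][$f3]"; }
          [a][][c]
          $ echo 'a  c' | { IFS=' ' read f1 f2 f3; echo "[$f1][$f2][$f3]"; }
          [a][c][]

    With ":" as the separator, the empty middle field survives; with whitespace, the shell merges the consecutive separators and "c" lands in the second variable.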

  8. Do not restrict your allowed field values.

    While some fields may be controlled directly by the CLI (e.g., the class field above), others are either outside of your direct control (e.g., the link field above), or outside of even your system's control (e.g., the essid field output by dladm show-wifi). As such, aside from ensuring the field value is printable ASCII (where newline is considered unprintable), no values should be filtered out or forbidden[1].

    Thus, any values that have special meaning should generally be escaped. For instance, with ":" as a field delimiter, IPv6 address "fe80::1" would become "fe80\:\:1" when displayed in parsable output mode. Thankfully, escaping does not complicate shell parsing because all popular scripting shells have read builtins that will automatically strip escapes. Thus, the common idiom of piping the output to a read/while loop works as expected without any special-purpose logic. For instance, even though the BSSID field will contain embedded colons, this will loop through each BSSID on each link, trying to connect to one until it succeeds:

          dladm scan-wifi -p -o link,bssid |
          while IFS=: read link bssid; do
                  dladm connect-wifi -i $bssid $link && break
          done
        
    That said, if only a single field has been requested, the field separator is not needed. Since no ambiguity exists in that case, there's no need to escape it -- and not doing so can make things more convenient for other shell idioms -- e.g., to collect all in-range BSSIDs:
          bssids=`dladm scan-wifi -p -o bssid`
        
I'd welcome hearing back from others who have tackled this problem.

[1] If unprintable ASCII values can legitimately occur in a given field's output, you need to use another encoding format.

Wednesday Aug 20, 2008

GNOME Home

Yes, it's been a whole year since I last posted a blog entry. Between moving from Boston to San Francisco (metro, anyway), countless urgent matters (both professional and personal), and wrapping up Clearview IPMP development (more on that real soon), blogging hasn't exactly been top priority. That said, I have amassed a really nice list of topics for future blog entries over the coming weeks (OK, maybe months ;-).

Before we get to all that though, I have an urgent tip for those who are using GNOME's Nautilus on OpenSolaris build 94 or later. It seems that the GNOME development team (not inside Sun) decided to change the Open Terminal menu item (available by right clicking on the desktop) to Open in Terminal, and correspondingly changed things so that the GNOME terminal will open in your ~/Desktop directory, rather than ~. The unmitigated idiocy and arrogance of this change are beyond comprehension, and the pain associated with it only intensifies with each opened terminal. Nonetheless, thankfully, there is a simple way to restore the previous (correct) behavior:

  gconftool-2 -s /apps/nautilus-open-terminal/desktop_opens_home_dir -t bool true
Hope this saves some other poor soul from spending half a day digging through the GNOME sources for a solution.

Thursday Aug 16, 2007

Solaris Networking Abstractions

Solaris draws clear boundaries between IP interfaces, data-links, devices, and physical hardware. However, these boundaries are a frequent source of confusion, especially for migrants from other operating systems that do not have such clear delineations. Further, with data-link abstractions becoming ever-richer (via link aggregations, VLANs, IP tunnels -- and soon VNICs, vswitches, and vbridges), people have become increasingly confused about how the abstractions within and across each layer relate. As such, the Clearview team has been working closely with Sun's documentation writers to provide a background chapter (including illustrations) that illuminates the core abstractions.

Needless to say, I was thrilled to see my original scrawls turned into wonderful images like this one:

Above, one can see the flexible and powerful networking topologies that can be created simply from two common Sun networking cards (in this case, ce and qfe). Above the hardware layer, we see five devices -- one for the ce card, and four for the qfe card (the "q" stands for "quad"; qfe has four network ports on one card, which appear to the operating system as four independent devices).

Above the device layer, we see four physical links (shown in blue) that have been instantiated using those devices (the qfe1 device is unused). These links (as with all links) have been named by the administrator using Clearview's upcoming vanity naming feature. As illustrated, VLANs can be created over the links -- as can aggregations. Further, any of the links can also be instantiated at the IP layer (with their link name) using the ifconfig plumb subcommand. We also see that some links can exist independently of any specific underlying hardware -- such as vpn1, which uses the IP routing table to determine the actual link to direct a given packet to.

Finally, at the IP layer, we see that while most IP interfaces have a one-to-one relationship with an underlying datalink, some (such as lo0) have no underlying datalink, and others (such as eml3) group IP interfaces on the same IP broadcast domain together using IPMP (at least, they will once Clearview IPMP is complete).

Wednesday May 16, 2007

Disruptor

If I may indulge my personal side for a second, congratulations to my father for being named one of Fortune Magazine's top 24 "disruptive innovators"! Since childhood, I've looked up to the passion, precision, integrity and imagination he brought to solving problems of all sizes -- and I continue to be amazed at both the magnitude and relevance of the problems he chooses to tackle today (at age 63, no less!). Needless to say, his approach to engineering (among many other things) has influenced me enormously, and I'm both proud and inspired.

(I could go on at length, but I'll save him the embarrassment ;-)

Wednesday Apr 25, 2007

IPMP Development Update #2

Several folks have again (understandably) asked for updates on the Next-Generation IPMP work. Significant progress has been made since my last update. Notably:

  • Probe-based failure detection is operational (in addition to the earlier support for link-based failure detection).
  • DR support of interfaces using IPMP through RCM works. Thanks to the new architecture, the code is almost 1000 lines more compact than Solaris's current implementation -- and more robust.
  • Boot support is now complete. That is, any number of interfaces (including all of them) can be missing at boot and then be transparently repaired during operation.
  • At long last, ipmpstat. As discussed in the high-level design document, this is a new utility that allows the IPMP subsystem to be compactly examined.

Since ipmpstat allows other aspects of the architecture to be succinctly examined, let's take a quick look at a simple two-interface group on my test system:

  # ipmpstat -g
  GROUP       GROUPNAME   STATE     FDT       INTERFACES
  net57       a           ok        10000ms   ce1 ce0

As we can see, the "-g" (group) output mode tells us all the basics about the group: the group interface name and group name (these will usually be the same, but differ above for illustrative purposes), its current state ("ok", indicating that all of the interfaces are operational), the maximum time needed to detect a failure (10 seconds), and the interfaces that comprise the group.

We can get a more detailed look at the IPMP health and configuration of the interfaces under IPMP using the "-i" (interface) output mode:

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         yes     net57       ------  up        disabled  ok

Here, we can see that ce0 has probe-based failure detection disabled. We can also see issues that prevent an interface from being used (aka being "active") -- e.g., suppose we enable standby on ce0:

  # ifconfig ce0 standby

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         no      net57       si----  up        disabled  ok

We can see that ce0 is now no longer active, because it's an inactive standby (indicated by the "i" and "s" flags). This means that all of the addresses in the group must be restricted to ce1 (unless ce1 becomes unusable), which we can see via the "-a" (address) output mode ("-n" turns off address-to-hostname resolution):

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce1         ce1
  10.8.57.34          net57       up      ce1         ce1

For fun, we can offline ce1 and observe the failover to ce0:

  # if_mpadm -d ce1

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         no      net57       ----d-  disabled  disabled  offline
  ce0         yes     net57       s-----  up        disabled  ok
[ In addition to the "offline" state, the "d" flag also indicates that all of the addresses on ce1 are down, preventing it from receiving any traffic. ]
  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce0         ce0
  10.8.57.34          net57       up      ce0         ce0
We can also convert ce0 back to a "normal" interface, online ce1, and observe the load-spreading configuration:
  # ifconfig ce0 -standby
  # if_mpadm -r ce1

  # ipmpstat -i
  INTERFACE   ACTIVE  GROUP       FLAGS   LINK      PROBE     STATE
  ce1         yes     net57       ------  up        ok        ok
  ce0         yes     net57       ------  up        disabled  ok

  # ipmpstat -an
  ADDRESS             GROUP       STATE   INBOUND     OUTBOUND
  10.8.57.210         net57       up      ce0         ce1 ce0
  10.8.57.34          net57       up      ce1         ce1 ce0
In particular, this indicates that incoming traffic to 10.8.57.210 will go to ce0 and inbound traffic to 10.8.57.34 will go to ce1 (as per the ARP mappings). However, outbound traffic will potentially flow over either interface (though to sidestep packet ordering issues, a given connection will remain latched unless the interface becomes unusable).

This also highlights another aspect of the new IPMP design: the kernel is responsible for spreading the IP addresses across the interfaces (rather than the administrator). The current algorithm simply attempts to keep the number of IP addresses "evenly" distributed over the set of interfaces, but more sophisticated policies (e.g., based on load measurements) could be added in the future.

To round out the ipmpstat feature set, one can also monitor the targets and probes used during probe-based failure detection:

  # ipmpstat -tn
  INTERFACE   MODE      TESTADDR            TARGETS
  ce1         mcast     10.8.57.12          10.8.57.237 10.8.57.235 10.8.57.254 10.8.57.253 10.8.57.207
  ce0         disabled  --                  --
Above, we can see that ce1 is using "mcast" (multicast) mode to discover its probe targets, and we can see the targets it has decided to probe, in firing order. We can also look at the probes themselves, in real-time:
  # ipmpstat -pn
  TIME      INTERFACE   PROBE     TARGET              RTT       RTTAVG    RTTDEV
  1.15s     ce1         112       10.8.57.237         1.09ms    1.14ms    0.11ms
  2.33s     ce1         113       10.8.57.235         1.11ms    1.18ms    0.13ms
  3.94s     ce1         114       10.8.57.254         1.07ms    2.10ms    2.00ms
  5.38s     ce1         115       10.8.57.253         1.08ms    1.14ms    0.10ms
  6.19s     ce1         116       10.8.57.207         1.43ms    1.20ms    0.19ms
  7.73s     ce1         117       10.8.57.237         1.04ms    1.13ms    0.11ms
  9.47s     ce1         118       10.8.57.235         1.04ms    1.16ms    0.13ms
  10.67s    ce1         119       10.8.57.254         1.06ms    1.97ms    1.76ms
  ^C
Above, the inflated RTT average and standard deviation for 10.8.57.254 indicate that something went wrong with 10.8.57.254 in the not-too-distant past. (As an aside: "-p" also revealed a subtle longstanding bug in in.mpathd that was causing inflated jitter times for probe targets; see 6549950.)

Anyway, hopefully all this gives you not only a feel for ipmpstat, but a feel for how development is progressing. It should be noted that several key features are still missing, such as:

  • Broadcast and multicast support on IPMP interfaces.
  • IPv6 traffic on IPMP interfaces.
  • IP Filter support on IPMP interfaces.
  • MIB and kstat support on IPMP interfaces.
  • DHCP on IPMP interfaces.
  • Sun Cluster support.
All of these are currently being worked on. In the meantime, we will be making early-access BFU archives (based on what we have so far) available to those who are interested in kicking the tires. (And a big thanks to those customers who have already volunteered!)

Tuesday Feb 06, 2007

IPMP Development Update

A number of people have sent me emails asking for updates on the Next-Generation IPMP work. In short, there's a lot to do, but development is progressing smoothly and early-access bits are on the horizon[1]. At this point, one can:

  • Create, destroy, and reconfigure IPMP groups with arbitrary numbers of interfaces and IP addresses, using either the legacy or new administrative model.
  • Load-spread inbound and outbound traffic across the interfaces and addresses. As per the new model, all IP addresses are hosted on "IPMP" interfaces and the kernel handles the binding of IP addresses to interfaces in the group internally. There is no longer a visible concept of failover or failback.
  • Use in.mpathd to track the failure and repair of interfaces. It notifies the kernel of these changes so that the kernel can update its interface-to-address bindings.
  • Use if_mpadm to offline and undo-offline interfaces. Again, this causes the kernel to update its interface-to-address bindings.

To illustrate where I'm at, let me use last night's build to show the lay of the land. (What's been implemented is almost identical to what was proposed in the high-level design document -- so please consult that document for additional background.) For starters, one can use the old IPMP administrative commands as before -- e.g., to create a two-interface group with two IP data addresses:

# ifconfig ce0 plumb group ipmp0 10.8.57.34/24 up
# ifconfig ce1 plumb group ipmp0 10.8.57.210/24 up
But what you end up with looks a bit different:
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:3:ba:94:3b:74
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:3:ba:94:3b:75
ipmp0: flags=8001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,IPMP> mtu 1500 index 3
        inet 10.8.57.34 netmask ffffff00 broadcast 10.8.57.255
        groupname ipmp0
ipmp0:1: flags=8001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,IPMP> mtu 1500 index 3
        inet 10.8.57.210 netmask ffffff00 broadcast 10.8.57.255
Above, we can see that ifconfig has created an ipmp0 interface for IPMP group ipmp0 and placed the two data addresses that we configured onto it. The ce0 and ce1 interfaces have no actual addresses configured on them (though they would if we'd configured test addresses), but are marked UP so that they can be used to send and receive traffic. Note that ipmp0 is marked with a special IPMP flag to indicate that it is an IP interface that represents an IPMP group.

Though the legacy configuration works, we will recommend configuring IPMP through the new model, since it better expresses the intent. The same configuration as above would be achieved instead by doing:

# ifconfig ipmp0 ipmp 10.8.57.34/24 up addif 10.8.57.210/24 up
# ifconfig ce0 plumb group ipmp0 up
# ifconfig ce1 plumb group ipmp0 up
Note the presence of the ipmp keyword, which tells ifconfig that the interface represents an IPMP group. Because of this keyword, an IPMP interface can actually be given any valid unused IP interface name -- e.g., ifconfig xyzzy0 ipmp will create an IPMP interface named xyzzy0. This follows the Project Clearview tenet that IP interface names must not be tied to the interface type -- which in turn allows one to roll out new networking technologies without disturbing the system's higher-level network configuration.

In general, an IPMP interface can be used like any other IP interface -- e.g., to create a default route through ipmp0, we can do:

# route add default 10.8.57.248 -ifp ipmp0 
We can also examine the ARP table to see the current distribution of ipmp0's IP addresses to IP interfaces in the group (once development is complete, this will be easier to do with ipmpstat):
# arp -an | grep ipmp0
ipmp0  10.8.57.34           255.255.255.255 SPLA     00:03:ba:94:3b:74
ipmp0  10.8.57.210          255.255.255.255 SPLA     00:03:ba:94:3b:75
Here, we see that 10.8.57.34 is using ce0's hardware address, and 10.8.57.210 is using ce1's hardware address. If we offline ce0, we can see the kernel will change the binding:
# if_mpadm -d ce0
# arp -an | grep ipmp0
ipmp0  10.8.57.34           255.255.255.255 SPLA     00:03:ba:94:3b:75
ipmp0  10.8.57.210          255.255.255.255 SPLA     00:03:ba:94:3b:75
One interesting consequence of the new design is that it's possible to remove all of the interfaces in a group and still preserve the IPMP group configuration. For instance:
# ifconfig ce0 unplumb
# ifconfig ce1 unplumb
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
ipmp0: flags=8001000803<UP,BROADCAST,MULTICAST,IPv4,IPMP> mtu 1500 index 3
        inet 10.8.57.34 netmask ffffff00 broadcast 10.8.57.255
        groupname ipmp0
ipmp0:1: flags=8001000803<UP,BROADCAST,MULTICAST,IPv4,IPMP> mtu 1500 index 3
        inet 10.8.57.210 netmask ffffff00 broadcast 10.8.57.255
Since all of the network configuration (e.g., the routing table) is tied to ipmp0 rather than to the underlying interfaces, it's unaffected. However, of course, no network traffic can flow through ipmp0 until another interface is placed back into the group -- as evidenced by the fact that the RUNNING flag has been cleared on ipmp0.

Those familiar with the existing IPMP implementation may be wondering what's left to do. The answer is "quite a bit". Notable current omissions include:

  • Broadcast and multicast support on IPMP interfaces.
  • IPv6 traffic on IPMP interfaces.
  • Probe-based failure detection.
  • DR support of interfaces using IPMP through RCM.
  • MIB and kstat support on IPMP interfaces.
  • DHCP over IPMP interfaces.
  • ipmpstat

Of the above, the first four are supported by the existing IPMP implementation and are (with minor exceptions) requirements for any early-access candidate. That said, as I mentioned earlier, development is proceeding at a good clip -- especially now that the hairy IP configuration multithreading model has been tamed, and several lethal bugs in IP have been nailed[2]. So stay tuned.

Footnotes

[1] If you're a Sun customer interested in kicking the tires in a pre-production environment, please send an email to meem AT eng DOT sun DOT com.
[2] For instance, see http://mail.opensolaris.org/pipermail/clearview-discuss/2007-February/000685.html or http://mail.opensolaris.org/pipermail/clearview-discuss/2007-February/000661.html.

Monday Dec 04, 2006

WiFi/GLDv3

Now that the new Solaris WiFi architecture has integrated into Build 54 of Nevada, this seemed an appropriate time to interrupt the static with a fresh blog entry. In short, WiFi's integration represents both a new beginning for WiFi on Solaris and a milestone for the new architecture of Solaris networking. I realize such hyperbole may set off more than a few BS-meters, so let me back these statements up with some specifics.

First, with regard to WiFi:

  1. The kernel now has first-class support for the WiFi link-layer protocols. Previously, WiFi drivers in Solaris masqueraded as Ethernet drivers, and internally performed header translations to send and receive actual WiFi link-layer frames.

    Having first-class support for WiFi simplifies our codebase, reduces per-packet overhead, introduces WiFi-specific kstats, and opens the door for network sniffers like snoop and Ethereal to directly interpret WiFi frames on Solaris via the new DLIOCNATIVE ioctl.

  2. Similarly, the kernel's GLDv3 networking driver framework now natively supports WiFi, allowing WiFi to be seamlessly handled by the protocol stack. Accordingly, the bundled ath driver has been ported to GLDv3, and all unbundled drivers have either been ported to GLDv3 or are in the process of being ported. Previously, WiFi drivers were relegated to the GLDv2 framework, which is no longer under active development.

  3. The kernel now has a dedicated net80211 kernel module which facilitates code sharing across WiFi drivers. This kernel module is based closely on the mature and robust FreeBSD 7 WLAN module, allowing us to easily incorporate enhancements as they become available. Previously, different versions of the WLAN framework had been directly linked into each driver, which was a significant maintenance and support hazard.

  4. As a result of (2), WiFi drivers can now be managed using our GLDv3 administration command, dladm. For instance, running dladm show-link on a laptop now shows any available WiFi links alongside the Ethernet links. In addition, new dladm subcommands have been added to allow WiFi administration -- e.g., to connect to the most optimal unencrypted WiFi link in-range, just run dladm connect-wifi. Or you can create keys with dladm create-secobj and then pass those keys to connect-wifi. Check out the EXAMPLES section of the latest dladm manpage for specifics.
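
    For instance, a quick session might look like this (the link name ath0, ESSID, and key name are illustrative; see the dladm manpage for exact syntax):

      # dladm create-secobj -c wep mykey
      # dladm connect-wifi -e myhome -k mykey ath0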

Collectively -- and in combination with other smaller improvements -- I hope you agree that this adds up to a solid foundation for WiFi on Solaris. Moreover, this foundation greatly benefits the development of our two follow-on WiFi projects -- specifically, WPA/WPA2 support, and bundled support for many more WiFi chipsets.

Now, with regard to Solaris Networking, WiFi both builds on recent work by other projects and paves the way for ongoing and future projects:

  1. The new WiFi support in GLDv3 makes use of the MAC Type plugin architecture integrated by Project Clearview into Build 44 of Nevada. In fact, two plugin operations -- mtops_header_cook() and mtops_header_uncook() -- were designed around WiFi's requirements. The entire mac_wifi plugin source is just 415 lines of straightforward code, including comments. Without the MAC Type plugin architecture, native WiFi support would have been significantly more complex and less elegant.

  2. The userland WiFi architecture was designed to be easily used by future administrative tools -- especially Network Auto-Magic (NWAM). Specifically, the heavy lifting for the new WiFi dladm subcommands is actually done by the new libwladm library, rendering dladm mostly a trivial wrapper around the libwladm routines.

    This separation of WiFi mechanism from UI policy allows other tools such as NWAM to make use of the WiFi framework without having to resort to ugly calls to dladm or (gag) code duplication. Note that libwladm is currently Consolidation Private, and thus is only safe to call from code in the ON consolidation.

  3. Enhancing dladm to support WiFi required new facilities for administering WEP keys and WiFi properties (e.g., radio and powermode settings). However, to keep the dladm administrative model simple and extensible, we introduced two new generic dladm facilities: link properties and secure objects.

    While these facilities are only used by WiFi at present, future projects will extend them in new directions. For instance, both Project Clearview and IP Instances already have new link properties planned (autopush and zone respectively), and an upcoming project to take administration out of the ndd(1M) stone-age will likely build extensively on link properties. Similarly, future secure objects will exist for WPA/WPA2 keys and perhaps other secure data such as certificates.
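
    For example, the radio and powermode settings mentioned above are exposed through the generic link-property subcommands (a sketch; the link name and property value are illustrative):

      # dladm show-linkprop -p powermode ath0
      # dladm set-linkprop -p powermode=fast ath0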

Looking ahead to 2007, Project Clearview, NWAM, Crossbow, and IP Instances will all make use of features introduced by WiFi -- and of features introduced by one another -- to collectively realize a "new world order" for Solaris networking. Stay tuned, it's going to rock.


Sunday Jul 30, 2006

On Locking

Recently, I've been hip-deep in the 105,000 lines of heavily-multithreaded code that comprise Solaris's IPv4/IPv6 implementation, finishing the bring-up of our new IP Network Multipathing (IPMP) implementation for Clearview. Along the way, I've been reminded of some collected wisdom regarding locking that could use wider dissemination:

  1. An object cannot synchronize its own visibility.

    Most long-lived objects are put into a structure such as a hash table that allows them to be looked up at some point in the future. In order to track the number of threads that currently have an object "checked out", objects are usually reference counted, with the reference count itself being manipulated under the object's lock.

    Things get tricky when the object must go away. In particular, many make the mistake of trying to synchronize the object's removal from visibility under the object's lock when the reference count reaches one. This is not possible: any thread looking up the object -- by definition -- does not yet have the object and thus cannot hold the object's lock during the lookup operation (unless the object lock is not tied to the object itself -- see (2)). Thus, another thread can race and acquire another reference at the same time the object is being destroyed, leading to incorrect behavior. Thus, whatever higher-level synchronization is used to coordinate the threads looking up the object must also be used as part of removing the object from visibility.

    Once the object has been removed from visibility, an object can indeed synchronize its own destruction [1]. The simplest approach is to have the object's destruction done by whatever thread causes the object's reference count to reach zero [2] -- that is, "if you're the last one out, turn off the lights". Note that this logic can be part of the standard code that decrements the object's reference count, but will be guaranteed to be unreachable until the object is removed from visibility. This is because the object's reference count will be incremented when it is made visible to another structure (e.g., the hash table providing its visibility), and that reference will remain until the object is removed from visibility.

  2. An object should not synchronize itself without due cause.

    This is a generalization of (1). When building objects that are intended to be used in a multithreaded environment, it is tempting to build the locks into the objects themselves. For instance, a stack object might contain an internal lock to ensure that multiple threads issuing a push or pop operation simultaneously will operate without corrupting the underlying stack object or returning erroneous results. Languages like Java have promoted this sort of locking to a first-class concept through the "synchronized" keyword.

    While this "works" in the small, it misses the big picture. Specifically, each of those aforementioned threads working on the stack object was performing its pushes and pops as part of accomplishing some larger task. Those larger tasks are indeed what need to be synchronized with one another, so that they appear atomic to each other as a whole. However, only the callers (the threads using the stack objects) have insight into the granularity and semantics of those tasks and the objects that comprise them -- so only they can implement that locking. And once that locking has been implemented, any internal object locks performing similar functions become superfluous, and only end up complicating the object's implementation (e.g., to avoid recursive mutex locking).

    Thus, many objects are better off leaving their synchronization to their callers, since those users will have to synchronize between each other anyway. Of course, objects will in turn use other objects -- so it's still quite likely that an object will have embedded locks. However, those locks will be used to synchronize access to the objects it's using, rather than attempting to synchronize its own use across multiple threads. In short, locks are best kept external to the data structures they manage, unless the structure itself must support high-performance concurrent access.

  3. A condition can be signaled without holding the condition's lock.

    There is a longstanding and deeply embedded superstition that signaling a condition without holding that condition's lock will compromise correctness. In fact, Solaris's own cond_signal(3C) manpage contains:

          Both functions should be called under the protection of the same
          mutex that is used with the condition variable being signaled.
          Otherwise, the condition variable may be signaled between the test
          of the associated condition and blocking in cond_wait().  This can
          cause an infinite wait.
    
    This is completely false: the thread heading into cond_wait() must have already tested the condition under the lock and concluded it was false in order to decide to cond_wait(). Since any thread changing state that would affect the condition must also be holding the lock, there is no way for the state (and thus the outcome of the test) to change between the test and the cond_wait(), and thus any cond_signal() sent during that window would end up being spurious anyway. Accordingly, 6437070 has been filed.

    All that said, it may be true that signaling the condition while holding the lock may improve overall determinism since it eliminates a possible avenue for priority inversion. It also may waste cycles, since the thread being signaled may not be able to grab the lock (if the signaling thread has not yet dropped it). This recent thread on mdb-discuss contains more on this issue.

Footnotes

[1] As David Powell mentioned to me while discussing this blog entry, "This problem is frequently misrepresented as an inability to synchronize against one's own destruction. To be precise, the problem is that an object can't synchronize its own visibility."
[2] The other common option is to have the thread removing the object wait for the reference count to drop to zero, but that forces the thread to block for a potentially unbounded period of time, and should only be used if required for correctness.


Friday May 12, 2006

Clearview Updates

Clearview development has been proceeding at a rapid pace -- here's a quick update on the milestones reached over the past month:

  • Thanks especially to Sebastien's hard work, the Nemo Binary Compatibility and Nemo Generalization components have been reviewed by the OpenSolaris community, approved by our internal architectural review board, and are nearing integration into OpenSolaris. With these, non-Ethernet Nemo drivers (such as the Clearview IP Tunneling driver) can be written -- not to mention Nemo WiFi drivers (which are almost ready for integration into OpenSolaris as well). This work also brings us a step closer to making the Nemo interfaces available for third-party use, and paves the way for TCP LSO support.
  • Cathy and Dan Groves have published a proposal for improving the observability of VLANs, which will be submitted to our architectural review board shortly. These changes are necessary for Clearview's Nemo Unification component, but are also quite useful in their own right, since they make it significantly easier to track down networking problems that occur on VLANs. Code that implements the proposal is already running internally, and is also destined for a nearby build of OpenSolaris.
  • Phil Kirk has published our proposed architecture for IP Observability. With this work comes the vital ability to debug intra-zone and inter-zone networking problems using traditional utilities such as ethereal and snoop -- along with opening the door to interesting possibilities such as inter-zone IDSs. Again, the code that implements this proposal is already running internally.
  • Sagun Shakya has published our proposal for a public library that can be used to communicate with link-layer devices via DLPI. This work is necessary for Vanity Naming, but also allows us to centralize all application-level DLPI handling -- and to torch thousands of lines of tedious and obscure code. Expect a revised proposal -- based on our experience porting the Solaris DHCP client to use it -- to be posted to OpenSolaris shortly.
I'm also happy to report that we will shortly be making builds of the Clearview gate available to the OpenSolaris community. These early-access bits will include everything mentioned above.

And finally, as promised, here are my photos from Sichuan. Thanks again to Cathy Zhou and her family for taking me on this amazing trip.


Friday Mar 24, 2006

An Update (From China)

Greetings from Beijing! Actually, I feel silly saying that since I've been here for the past 2 months on rotation, and I am only a week away from departure. This is my second trip to Beijing, and I've enjoyed it even more than the first -- though the most exciting part of my trip is yet to come: a week-long trip to Sichuan province -- a place legendary for its beautiful land, beautiful ladies, and divinely spicy cuisine. I plan on assembling a photo album of my trip, which I will post here shortly.

As I must pack for the trip, I have limited time to write, but I would like to make folks aware of a few new items we've been busy on with regard to Solaris networking:

  • Clearview is now an "official" OpenSolaris Project. For those trying to wrap their heads around it, we also now have a set of slides which provide a high-level overview. I'm also happy to report that development is proceeding at a rapid pace, and that some components of Clearview will be available for experimentation shortly (via the OpenSolaris Project page).
  • In WiFi news, our long-term proposal for Solaris WiFi administration is under active discussion on the OpenSolaris Networking Forum. Now's the time to speak up if you have strong opinions on such matters.


Tuesday Jan 03, 2006

Private vs. Secret

Some personal responses I received about why lsof does not build in OpenSolaris build 27 made it clear that the distinction we draw between private and secret has not been well-communicated.

Specifically, several were confused by my comment that <inet/udp_impl.h> "is not shipped". By this, I meant that it is not part of any OpenSolaris package, and thus that it will not be installed as /usr/include/inet/udp_impl.h on a machine running OpenSolaris. Thus, <inet/udp_impl.h> is what we consider to be a private header file: its contents represent private interfaces that we do not want external software to develop dependencies on[1]. This is the essence of good software engineering: making sure that each software layer depends only on well-defined interfaces from other layers. Moreover, well-defined interfaces allow stability levels to be specified (see attributes(5)) which allow the volatility of each interface to be known by its consumers.

Unfortunately, for a variety of (mostly historical) reasons, many header files containing only private interfaces have been shipped over the years. Moreover, there is not widespread consensus that shipping private header files is a bad idea -- many feel that there is a de facto understanding that undocumented header files are private, and thus that all headers, private or not, should be shipped with the system[2].

Regardless, while <inet/udp_impl.h> is private, it is not secret. That is, although <inet/udp_impl.h> is not shipped with OpenSolaris, it is part of the OpenSolaris source distribution, and available for anyone to examine. In contrast, a secret header file is one that we are legally prohibited from making available as part of OpenSolaris. For instance, the LLC2 driver was developed by a third party, and the terms of the agreement grant only Sun employees (and those under NDA) access to its source. Thus, <sys/llc2.h> is a secret header file, as we are legally prohibited from including it with OpenSolaris. In contrast to our open source files, we term all of our secret source files closed; these files reside in a parallel directory hierarchy of the Solaris ON gate rooted at usr/closed. The OpenSolaris source tree contains everything in the ON gate except for those files under usr/closed.

So, to summarize: private header files are not installed under /usr/include to keep the product more robust and flexible. However, they are part of the OpenSolaris source tree. In contrast, secret (or closed) header files are not installed under /usr/include because we are legally prohibited from doing so. Further, they are not part of the OpenSolaris source tree because we are legally prohibited from doing so.

Footnotes

[1] These dependencies usually happen accidentally, but sometimes they are on purpose. For instance, the aforementioned lsof utility uses many data structures that are contained in private header files, including <inet/ipclassifier.h>. The right long-term answer is to provide public, well-defined interfaces that these utilities can depend on.
[2] This is something I vehemently disagree with. As the lsof example makes clear, once an interface is shipped, external software is tempted to make use of it. These dependencies often remain unknown until an innocent developer needs to change the private interfaces and breaks a popular software package, affecting myriad end-users. Moreover, the "relief" provided by these private interfaces lowers the internal priority of developing proper well-defined interfaces.

