Monday Aug 07, 2006

DHCPv6 Client project on OpenSolaris

Just thought I'd note that Jim has opened the DHCPv6 client project on OpenSolaris. From my perspective, this should mark the last significant bit of work needed to make IPv6 deployable for real customers, a process we started in Solaris nearly 10 years ago with the first IPv6 projects.  And this project is being done now because, in fact, there are real live customers who want to deploy IPv6, and soon.

One question that's come up more than once as we've been preparing this project over the past few months is, "Isn't there an open-source client out there we can just grab and use?"  Note that the appearance of this project on OpenSolaris in no way says that we're only interested in writing a client absolutely from scratch.  In fact, I think we'd be quite open to using an existing implementation that met the requirements.  The problem, though, is that we want an implementation that integrates tightly into the Solaris network configuration architecture, which we are actively working to evolve for the long term with the Network Auto-Magic project, among others.  Our experience in this respect is that the protocol is actually the simpler part of the code; the integration with the rest of the system is the harder part, and each OS has a different architecture that the client needs to fit into.  Usually, that doesn't lend itself well to a cross-platform client.

History is instructive here, as we've in fact been down this road once before, with DHCPv4.  The DHCPv4 client that was introduced in Solaris 2.6 and continued in Solaris 7 was a third-party implementation designed for portability, and it worked fine insofar as it went.  However, as we planned to expand the usage of DHCP in the system, to support network installation, diskless booting, and all the other things that go into the network being the computer, we realized that the client we had just wasn't designed to do that, and making it do so was going to be basically a rewrite.  As a result, we (or, more accurately, meem) wrote a new client for Solaris 8.  It's proven quite stable, though it will likely undergo some minor revisions to make Network Auto-Magic work the way users expect.

I see Jim's already started some early design discussion.  Hope to see you over there!

Thursday Apr 20, 2006

DHCP option lengths

I answered a question today about DHCP option lengths on our internal DHCP support list, a question which has come up before and no doubt will again. I might as well post it here for posterity.

> > Working with a customer on their custom jumpstart environment and they 
> > ran into the 255 byte limit.  I did find the notes in the documentation 
> > confirming the length limit in our DHCP server.  The customer came back 
> > and said that they don't seem to be having the problem with their Linux 
> > (RedHat/SE Linux) DHCP servers when using longer macro parameters for 
> > their network installations.
> > 

First, it's important to get the terminology straight, because it's key 
to understanding what is happening.

The DHCP specifications talk about "options".  The format of each option 
in a DHCP packet is specified in such a way that the individual option 
has a one-byte unsigned length, followed by the data.  That means that 
an individual option passed in a DHCP packet can have a length no 
greater than 255 bytes.
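
To make the limit concrete, here is a minimal sketch of the option layout described above: a one-byte option code, a one-byte length, then the data. The function name encode_option is made up for illustration and is not taken from the Solaris sources; option 12 (host name) and option 43 (vendor-specific information) are the standard codes from RFC 2132.

```python
def encode_option(code: int, data: bytes) -> bytes:
    """Encode one DHCP option as a code byte, a length byte, then the data."""
    if not 0 < code < 255:
        raise ValueError("option code must be 1..254 (0 is pad, 255 is end)")
    if len(data) > 255:
        # The length field is a single unsigned byte, so 255 is the ceiling.
        raise ValueError(f"option {code} data is {len(data)} bytes; limit is 255")
    return bytes([code, len(data)]) + data


# A host name option fits easily.
hostname_opt = encode_option(12, b"jumpstart-client")

# A 256-byte value cannot be represented in a single option.
try:
    encode_option(43, b"x" * 256)
    too_long_rejected = False
except ValueError:
    too_long_rejected = True
```
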

Solaris DHCP provides a concept of "macros", which are merely a 
configuration convenience which allows options to be grouped together. 
Macros don't have a length limit, so you can specify any number of 
options in a macro.

Generally, the problem customers encounter is that the vendor options we 
use in Solaris for automating Jumpstart installs all end up collapsed 
into one DHCP option, because that's the way the standard works, and we 
run into the 255 byte limit there.  It's highly unlikely that you'll run 
into it in any other case.  The usual workaround is to use shorter paths 
to the installation image, as those are the options which eat up most of 
the space in the packet; symlinks work well for this purpose.
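
As a rough illustration of why shorter paths help, the sketch below adds up the space a set of vendor options would take once collapsed into one enclosing option, assuming each is carried as a sub-option with its own code byte and length byte. The sub-option codes, paths, and host name are hypothetical, chosen only to show the arithmetic; they are not the actual Solaris vendor option definitions.

```python
def encapsulated_size(suboptions: dict[int, bytes]) -> int:
    """Bytes used inside the enclosing option: 2 header bytes per sub-option plus its data."""
    return sum(2 + len(value) for value in suboptions.values())


# Hypothetical sub-option codes and values, for illustration only.
long_paths = {
    1: b"/export/install/media/solaris_10/sparc/Solaris_10/Tools/Boot",
    2: b"/export/install/media/solaris_10/sparc",
    3: b"/export/install/jumpstart/configs/datacenter-east",
    4: b"/export/install/jumpstart/configs/datacenter-east/sysidcfg",
    5: b"install-server-01.datacenter-east.example.com",
    6: b"/export/install/jumpstart/profiles/datacenter-east",
}

# The same layout reached through short symlinks.
short_paths = {
    1: b"/s10/boot",
    2: b"/s10",
    3: b"/js",
    4: b"/js/sysidcfg",
    5: b"install-server-01.datacenter-east.example.com",
    6: b"/js/prof",
}

fits_long = encapsulated_size(long_paths) <= 255
fits_short = encapsulated_size(short_paths) <= 255
```

With these made-up values the long paths overflow the enclosing option while the symlinked ones fit comfortably.
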

> > Is there a DRAFT change to DHCP to allow longer macros?

There is an RFC, 3396, which specifies a way to pass more than 255 bytes 
for an option.  It works by passing multiple instances of the option in 
the packet, requiring the client to concatenate them together, which 
means the server has to carefully order them in order for the client to 
reconstruct them correctly.  We have not implemented this mechanism in 
either the Solaris client or server, but CR 4867934 records the need to 
do so; adding another call record there wouldn't hurt - it's already one 
of the highest-priority RFEs we have against DHCP, but there are no 
resources currently assigned to work in that area.
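
For the curious, the splitting and concatenation can be sketched roughly as follows. This is a simplified illustration of the idea, not code from any shipping client or server: the function names are invented, and it ignores details that RFC 3396 also covers, such as options that spill into the overloaded file and sname fields.

```python
def split_long_option(code: int, data: bytes) -> bytes:
    """Encode data as consecutive instances of one option, at most 255 bytes each."""
    chunks = [data[i:i + 255] for i in range(0, len(data), 255)] or [b""]
    return b"".join(bytes([code, len(chunk)]) + chunk for chunk in chunks)


def reassemble(options_field: bytes) -> dict[int, bytes]:
    """Walk an options field, concatenating repeated option codes in packet order."""
    values: dict[int, bytes] = {}
    i = 0
    while i < len(options_field):
        code = options_field[i]
        if code == 0:        # pad option: a single byte with no length
            i += 1
            continue
        if code == 255:      # end option: stop parsing
            break
        length = options_field[i + 1]
        values[code] = values.get(code, b"") + options_field[i + 2:i + 2 + length]
        i += 2 + length
    return values


# A 512-byte value travels as three instances of option 43 (255 + 255 + 2 bytes).
payload = bytes(range(256)) * 2
wire = split_long_option(43, payload) + b"\xff"
recovered = reassemble(wire)[43]
```

The ordering matters because the receiver simply appends each instance as it encounters it.
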

> > Is the Linux-based DHCP server ignoring the standards?
> > 

No, I'm sure they're using the RFC 3396 conventions.  The maintainers of 
the most popular DHCP server on Linux are quite cognizant of remaining 
compliant with the protocol standards.

Friday Sep 09, 2005

Futures for IPMP

Following on from my last entry about IPMP, we've now published for review a design document for a new IPMP architecture to the OpenSolaris Networking community. If you're interested in reading about or, more importantly, influencing how IPMP will be evolving, here's your chance.


Wednesday Sep 07, 2005

IPMP on the cheap with Solaris 10

While discussing some examples of unapproachable features in Solaris today, one of our luminaries (who shall remain nameless so as not to dim his luminosity ;-) was noting how customers wouldn't use IP multipathing (IPMP) because of its requirement for additional, dedicated IP addresses to probe the network and detect failures. I pointed out that this is actually no longer true, as we implemented RFE 4840370 in Solaris 10. He hadn't heard about that, and I suspect most everyone else hasn't, either, since it didn't make it into the Solaris 10 What's New book. Here's what we should have said in that book:

"The use of a dedicated test address for IP multipathing groups is no longer required as of Solaris 10. When test addresses are not configured on the interfaces assigned to a group, IP multipathing will detect interface failures solely by monitoring the IFF_RUNNING flag for each network interface. This requires that the network interface driver support link status notification, which is true of most recent network drivers in Solaris. For further information, see in.mpathd(1M) and the System Administration Guide: IP Services."

The end result is that IPMP can be done "on the cheap" if you only care to passively monitor the link status from the driver; if you want to do active monitoring using ICMP probes, it'll still cost you a test address dedicated to each network interface. Note, though, that in.mpathd can use IPv6 link-local addresses for this purpose, one small reason to think about getting to know IPv6.
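
The passive check boils down to testing a single bit in each interface's flags. The sketch below shows only that decision and is not in.mpathd's code: the helper names and interface names are hypothetical, and the 0x40 value for IFF_RUNNING is assumed from <net/if.h>, so check your platform's header before relying on it.

```python
IFF_RUNNING = 0x40  # assumed value from <net/if.h>; verify on your platform


def link_failed(flags: int) -> bool:
    """Treat an interface as failed when the driver has cleared IFF_RUNNING."""
    return not (flags & IFF_RUNNING)


def usable_interfaces(group: dict[str, int]) -> list[str]:
    """Return the names of group members whose link is still reported up."""
    return sorted(name for name, flags in group.items() if not link_failed(flags))


# Hypothetical snapshot of one group's interface flags, as ifconfig would print them.
group = {
    "bge0": 0x1000843,  # RUNNING bit set
    "bge1": 0x1000803,  # RUNNING bit cleared, e.g. cable pulled
}
survivors = usable_interfaces(group)
```
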

My buddy meem the code minimalist was the implementor of this feature. Perhaps he'll see fit to talk about the details at some point.

Monday Jun 13, 2005

DHCP server tour

An overview of the Solaris DHCP server

With the launch of OpenSolaris, one thing that comes along for the ride is our DHCP server.  This is somewhat significant, in that the only other widely-used implementation of a DHCP server that's open-source is the one provided by ISC.  We'll leave comparisons between the two for another day, as the point today is to take a look at how the Solaris server is implemented.

First, some history, because some people probably wonder why we don't just ship ISC's server like all the Linux distributions.  Development of the Solaris DHCP server started out about 12 years ago, as part of a PC-NFS spinoff product which was ultimately marketed as SolarNet PC-Admin.  The idea there was to leverage PC-NFS to build a simple, TCP/IP-based workgroup computing solution which used Solaris servers as the back-end for Windows PC desktops (remember, those were the days before Windows 95 where, if you wanted to access the Internet using Windows, you had to buy a third-party product such as PC-NFS).  One of the key features of the product was a way to centralize administrative control for those desktops; DHCP was a new protocol at the time, and it had the basic elements we needed to construct a centralized control solution, so we built a DOS DHCP client, a Solaris DHCP server, and some Solaris commands and a Windows GUI to administer it.  The product ultimately wasn't especially successful, but DHCP had started to catch on in IP networks enough that we needed to bundle a server as part of Solaris.  So in 1996 the team moved over to the Solaris organization and proceeded to integrate, with only some minor changes, the DHCP server and Solaris commands into Solaris 2.6.  In Solaris 8, I added the DHCP Manager GUI.

One thing we'd realized by that time, though, was that our original server was good in the environment we'd written it for, workgroup/department-level networks, but didn't scale to the loads of large-scale enterprises or ISPs.  That led to a very extensive rewrite of the server for Solaris 9 (also shipped in later Solaris 8 updates) to provide high levels of scalability, on the order of several hundred client operations per second on mid-range SPARC hardware of the day; to this day the current implementation is commonly referred to internally as the "Enterprise DHCP" server, because that was the particularly uncreative project name we used.  Right about that same time, as a separate project, we also added RFC 2136 DNS dynamic update support.

The obvious place to start in scaling the server was to make it fully multi-threaded; the original server had been single-threaded, but some limited multithreading to handle ICMP (used for duplicate detection prior to offering an address to a client) had been added as part of the Solaris 2.6 work.  To fully multi-thread the server, though, we needed to make the access to its lease storage thread-safe, as the original code I'd written for that in '94 (which actually was a minor evolution of code from the original Solaris 2 admintool) hadn't had that as a requirement (nowadays you wouldn't get away with writing a library that's not MT-safe).  We also felt, though, that we needed to provide more flexibility in storing the DHCP data.  We'd provided two options for storing the data from the beginning: traditional Unix ASCII files, and NIS+ tables.  Neither was actually sufficient for the transaction rates we wanted to support.  Ultimately, we settled on providing a plug-in architecture for data storage, with a private API used by the server and the administrative tools, which mediated access to a documented public layer, so that we (or anyone else) could add data stores to the server.  The existing data stores were re-implemented in terms of this API, and we added a new, binary file format which would support the transaction rate we were after.  The Netra HA folks subsequently wrote a high-availability data store solely from the documentation, so we know it can be done.

Well, that's the history behind what you see in the source today.  The main DHCP server source is found in usr/src/cmd/cmd-inet/usr.lib/in.dhcpd (one helpful thing to know is that most IP networking utilities are in the usr/src/cmd/cmd-inet portion of the tree).  A couple of interesting reads there are the comments in main.c describing the thread model, and the README on the server's in-memory caching, another part of the performance story.  The data storage API is over under usr/src/lib in libdhcpsvc; the private subdirectory contains the API used by the server and tools, while the modules subdirectory contains the various data storage modules that put the bits on the disk.  Other interesting libraries are libdhcpdu, which implements the DNS dynamic update feature, and libdhcputil, which contains some shared code related to DHCP option definitions that is used by the server, the admin tools, and the DHCP client.  The server administration tools, including the CLIs, can all be found under usr/src/cmd/cmd-inet/usr.sadm/dhcpmgr, one of the early, and few, outposts of Java in the OS/Net source.

If this has whetted your appetite, you can get really deep into one of the more unusual aspects of our server implementation over in my buddy meem's blog.



I'm the architect for Solaris deployment and system management, with a lot of background in networking on the side. I spend a lot of my time currently operating Solaris Engineering's OpenStack cloud. I am co-author of the OpenSolaris Bible (Wiley, 2009). I also play a lot of golf.

