Friday Apr 15, 2005

DTrace is part of this complete operating system

Earlier this week, Mr. Vaughan-Nichols at eWeek wrote a largely inaccurate and needlessly hostile article about the CDDL, and our own Andy Tucker called him on a few points. Without bothering to correct that article or respond, he went back at it again on Wednesday, this time giving air time to SCO and their blessing of the OpenSolaris program. Why Mr. McBride of SCO felt the need to give this "blessing" is unclear; Sun obviously believes it has the rights needed to make the sources to nearly all of Solaris available under whatever license(s) we choose. Without those rights, no blessing would be sufficient; with them, none is necessary. I'll chalk this up to SCO taking whatever opportunity it can to appear relevant, especially as they continue to struggle in both the marketplace and the courtroom.

Enough of that. Instead, I'd like to focus on the most obvious and significant error in this article: the assertion that

"To date, though, the only released components of OpenSolaris are programs, such as DTrace, which aren't parts of the operating system."

We don't need to be too picky about what constitutes an operating system; even the most pedantic would surely agree that a component which spans the system from user applications to the heart of the kernel is part of the operating system. Under even an extremely narrow definition, DTrace is very much a part of the Solaris operating system - and therefore also of OpenSolaris technology. Our release of DTrace includes the sources for not just the standalone program dtrace(1M), but also all of the following:

  • The userland library libdtrace(3LIB) which provides most of dtrace(1M)'s functionality
  • Three other userland programs: lockstat(1M), plockstat(1M), and intrstat(1M), which are implemented using DTrace
  • Several kernel modules: dtrace(7D), fasttrap(7D), fbt(7D), lockstat(7D), profile(7D), sdt(7D), and systrace(7D); these implement the kernel portions of DTrace
  • Code added to the kernel itself to support DTrace, such as usr/src/uts/common/os/dtrace_subr.c
  • Two additional private user libraries which provide access to Compact C Type Format (CTF) data and the proc(4) filesystem
  • Small programs demonstrating the D language and DTrace functionality
  • A variety of headers and glue

It should be apparent that this is far more complex a subsystem than just one standalone user program. In fact, the source to dtrace(1M) is a single file out of 345 we released, and constitutes only 1431 of 102,163 lines of code (about 1.4%) in this initial release. If dtrace(1M) were simply an ordinary user program, it would not require over 100,000 lines of additional code - including over 32,000 in the kernel - to make it work.

As a final example, observe this comment block from usr/src/uts/common/os/dtrace_subr.c:

 * Making available adjustable high-resolution time in DTrace is regrettably
 * more complicated than one might think it should be.  The problem is that
 * the variables related to adjusted high-resolution time (hrestime,
 * hrestime_adj and friends) are adjusted under hres_lock -- and this lock may
 * be held when we enter probe context.  One might think that we could address
 * this by having a single snapshot copy that is stored under a different lock
 * from hres_tick(), using the snapshot iff hres_lock is locked in probe
 * context.  Unfortunately, this too won't work:  because hres_lock is grabbed
 * in more than just hres_tick() context, we could enter probe context
 * concurrently on two different CPUs with both locks (hres_lock and the
 * snapshot lock) held.  As this implies, the fundamental problem is that we
 * need to have access to a snapshot of these variables that we _know_ will
 * not be locked in probe context.  To effect this, we have two snapshots
 * protected by two different locks, and we mandate that these snapshots are
 * recorded in succession by a single thread calling dtrace_hres_tick().  (We
 * assure this by calling it out of the same CY_HIGH_LEVEL cyclic that calls
 * hres_tick().)  A single thread can't be in two places at once:  one of the
 * snapshot locks is guaranteed to be unheld at all times.  The
 * dtrace_gethrestime() algorithm is thus to check first one snapshot and then
 * the other to find the unlocked snapshot.

This comment, while arcane, is clear by itself, so I will not attempt to add to it. I will only point out that if DTrace were not a part of the operating system, it would not need to concern itself with the locking rules for updates to the high-resolution system timers. Further examples of DTrace's intimate association with core features of the Solaris kernel and userland libraries can easily be found by examining the sources.
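For readers who would rather see the idea than parse the comment, here is a minimal userland sketch of the two-snapshot scheme it describes. This is not the Solaris code: the type and function names (snapshot_t, snapshot_begin, snapshot_end, record_snapshots, read_snapshot) are hypothetical, a plain boolean stands in for each snapshot lock, and the model is single-threaded, so it leaves out the real locking primitives and memory ordering the kernel needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one time snapshot. The "locked" flag stands in
 * for the lock that protects the snapshot in the real kernel.
 */
typedef struct snapshot {
    bool    locked;
    int64_t hrestime;   /* copied time value, in arbitrary units */
} snapshot_t;

static snapshot_t snaps[2];

/* Writer side: take the lock on snapshot i before updating it. */
static void
snapshot_begin(int i)
{
    /* A single writer never holds both snapshot locks at once. */
    assert(!snaps[1 - i].locked);
    snaps[i].locked = true;
}

/* Writer side: store the new value in snapshot i and release its lock. */
static void
snapshot_end(int i, int64_t now)
{
    snaps[i].hrestime = now;
    snaps[i].locked = false;
}

/* Writer side: record both snapshots in succession, one after the other. */
static void
record_snapshots(int64_t now)
{
    for (int i = 0; i < 2; i++) {
        snapshot_begin(i);
        snapshot_end(i, now);
    }
}

/*
 * Reader side: check first one snapshot and then the other, and use
 * whichever is unlocked. Returns false only if both are locked, which
 * the single-writer rule is meant to rule out.
 */
static bool
read_snapshot(int64_t *out)
{
    for (int i = 0; i < 2; i++) {
        if (!snaps[i].locked) {
            *out = snaps[i].hrestime;
            return true;
        }
    }
    return false;
}
```

In this model a reader that arrives while the first snapshot is being rewritten simply falls through to the second, which may hold a slightly older value; the sketch never blocks, which is the property probe context needs.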

Sun's DTrace experts have written extensively about their creation [more here and here to note just two] and provided a highly detailed reference manual. While much of this material may not be in a format which is accessible to the layman, even a cursory overview of the source we are offering and the breadth and depth of publications on the topic should be sufficient to satisfy one that DTrace is very much a part of the operating system. Perhaps Mr. Vaughan-Nichols was simply unfamiliar with the offering; in that case I would invite him to download the sources and inspect them himself, and to seek the opinions of expert engineers before making further claims of this sort. DTrace is very much a part of Solaris, and while we have much more to do, releasing it as open source was no trivial step.

Thursday Dec 23, 2004

Linus on Solaris

Most people have probably read the recent Linus interview, in which he has a number of things to say about Linux, Solaris, and software development. Like any interview, it contains some interesting assertions, some obvious filler, and some real head-scratchers. Many in the Solaris community have expressed dismay or anger over some of his remarks, but rather than add to that, I'd like to examine some internal contradictions in Linus's statements and try better to understand why he's made them. As we ready OpenSolaris for public consumption and contribution, it's important to observe how similar development systems work and take steps to avoid difficulties encountered by other projects. Linus's comments indicate that, indeed, the structures and processes in place to serve Linux development are imperfect. We will be well-served to learn from this.

One of the head-scratchers is his assertion that he's not interested in Solaris because he feels it offers nothing of value that isn't already in Linux. This conclusion might be less baffling, though no less disappointing, if he'd actually examined the code, the feature set, and then made up his mind. But he admitted openly that he probably won't even look at the code, and instead will rely on others to tell him if it contains ideas worth considering. I really have to wonder about this approach, especially given his later comments concerning the reason for adding a feature to a system. We certainly agree with him that system design is about solving problems, not just doing something new and different for its own sake. Features don't get added to Solaris if they don't serve some useful purpose, fill some hole for developers, users, or both. It's difficult to believe that Solaris developers and users have problems to solve that differ greatly from those of Linux developers and users. In fact, as a long-time Linux developer myself, I can say with some confidence that the challenges are the same. So why does Solaris offer tools like kmdb, dtrace, and crash dumps, while Linus either refuses to integrate similar functionality or claims he hasn't heard of the problems these tools help to solve?

One possible reason is that distributions sometimes provide parts of these feature sets, so that users never even realize their absence in Linux proper. Linus talked about the distributors serving a valuable function, buffering developers from customers. But perhaps in that process, valuable information is not making its way back to Linus. The Linux development community would be well-served by talking to ordinary systems administrators now and then. Another possibility is that users and administrators can't, won't, or don't effectively communicate the problems they are trying to solve. But why don't Solaris users seem to have this problem? Do Linux distributors simply not listen? Or perhaps these decisions are really based on ideology, as so many Linux detractors claim. Regardless, a sober assessment of users' real-world needs might well reveal that Linus and others still have much work to do (as do Solaris developers), and that some of the changes they would do well to consider have already been made in other systems. The solutions Linus might choose may well be quite different from those chosen by Sun, but disregarding or remaining ignorant of the challenges is a lost opportunity to innovate and improve. What kind of engineer willingly passes up that opportunity?

If NIH is in fact "a disease" (a point which ought to elicit universal agreement), I'm left to wonder why Linus would pass up an opportunity to examine the works of other engineers. If he does in fact rely on others to tell him about valuable features in similar systems, something in that process is broken. If he wants to make sure Linux can solve all the problems Solaris can, I'd suggest he look closely at what's been done here. The code isn't even needed for this - a quick glance at public white papers would be sufficient to understand many of the problems Solaris engineers have been working to solve. If he doesn't believe these problems exist, a reality check is in order.

There are lessons here, of course. One of them is that systems developers must not lose touch with the problems they're supposed to solve. It pays to listen. Another lesson is that a process which prevents useful features from being implemented is broken, and someone has to be willing to recognize and correct such a process. If distributions take on the work of making a usable system and interacting with customers, engineers risk losing sight of appropriate goals. This is avoidable, but that it appears to be occurring implies that the relationship among Linux (the codebase), its distributors, and its developers (many of whom work for distributors) is defective in some way.

I'm cheered by the prospects for OpenSolaris to avoid these pitfalls, especially if we recognize them and take proper action. I hope we as a community will remain cognizant that they have hindered other large projects before ours, even those with leaders of Linus's stature.

Friday Aug 13, 2004

A Sense of Entitlement

I've finally decided to write a bit about a topic that has bothered me for many years as a participant in the Free Software community (it applies equally well to Open Source if you prefer): User Entitlement.

Some of you out there know what I mean. You maintain an application in your spare time as a volunteer. You field trouble reports and RFEs and do your best to implement, at minimum, the suggestions that matter to you, all while holding down a job and meeting your personal and family obligations. But for a minority of users, that's not enough; they expect you to implement features that don't interest you and fix bugs you can't reproduce. In short, they expect you to provide support. While one tries never to be rude, at some point the urge to point out the obvious becomes overwhelming: you have the source, you obviously care a lot about this, and nobody else has the time or inclination to do anything about it! Instead of repeatedly asking when I'm going to implement your change, why not implement it yourself and send me a patch?

Of course, the inevitable response to this suggestion is that the user in question is not a programmer. This is a subtle but important fact that has changed the way the community functions over the years; in the beginning, we were all programmers. Now programmers are a minority of Free Software users, just as we are a minority of software users in general. The commons model breaks down under these conditions; many users have little to offer the community as a whole. Bug reports and testing are valuable services, true, but some users are just that - users. Not testers. Not contributors. Not developers. Just users; they use the software, expect (rightly) that it will work as advertised, and become unhappy and demanding if it does not. This looks a lot more like a customer than the fellow co-op shareholder the model would suggest.

I don't mean to suggest that this behaviour is representative, but it certainly has increased as the pool of users has expanded. How will Free Software projects in the future deal with the influx of Users? Much work has been done, mostly in economics, on the subject of managing cooperatives and commons; I believe this work is directly relevant to the Free Software community. I'll get more into some of that work in my next post.

Tuesday Aug 03, 2004

Solaris at OSCON

Last week a contingent from Sun showed up at OSCON to talk about Solaris 10, meet with some community leaders to discuss building a community around open source Solaris, and of course learn from the other conference attendees.

Simon Phipps gave a talk on open source development and the meaning of freedom which was quite interesting. One of his points was that freedom for deployers is as important as freedom for developers, and that while licenses can help to build a community around a piece of software by giving the developers freedom, this is not sufficient as an end in itself. This is one of the things we're looking at as we get ready to release Solaris under an OSI-approved license of some as yet undetermined kind. Governance and community structure are at least as important as the license we eventually choose, and we have much more direct and immediate participation in these aspects. One point I've tried to emphasize is that successful communities form spontaneously and organically; they cannot be constructed from scratch, purchased, or willed into existence. Our challenge is to get people excited about Solaris and interested in being a part of that community, then to provide them with infrastructure and a reasonable way to make their voices heard as we proceed. Many people at the conference had some helpful suggestions for doing this.

Perhaps the most important thing to remember is that developers are fundamentally attracted by two things: exciting technology and a meaningful opportunity to work on it. The success of the BOF that Adam, Andy, Bart, and Eric put on showed many people just how exciting some of these technologies are. The feedback we received was overwhelmingly positive. Rarely does such a demo-friendly piece of technology as DTrace come along, and many of the attendees were clearly impressed. Still, it's only one of many major enhancements in S10; it's fairly obvious that there will be plenty of developers attracted to our technology provided we can generate enough awareness.

But as we've seen, compelling technology and an open license are not enough to make for a successful project. GCC and XFree86 had both, but neither project was successful in building and sustaining a community (GCC of course has been reborn since the quagmire of the 2.8 era). Many things make up a project's public perception as one which is easy and fun, or frustrating and counterproductive, in which to participate. Infrastructure, governance, and developer resources each play a role. My focus at the moment is on our efforts to help developers get started; this means providing good documentation and keeping the barriers to entry as low as possible. As one of the people who will be creating the developer documentation, tutorials, and examples, I'd be very interested in your feedback. When you're attracted to a project, what type of resource increases your desire to participate? What dissipates your interest and turns it into frustration?


Everyone is doing it and you don't want to be left out, do you? Besides, the first one's free! I'll mostly be writing about my experiences becoming a part of the Solaris kernel group at Sun. One area I'll be working in is opening Solaris to the world, which is one of the most exciting things to happen in a long time. It's especially interesting because I joined Sun only a few weeks ago after being a sysadmin and maintaining SPARC Linux in my little remaining spare time. Thanks to Sun I'm now in recovery and hopefully on my way to making some kind of contribution in the real world.


