Solaris Paleontology

In a footnote a few days ago, I commented that the history of Solaris debugging could roughly be divided into three 'eras'. As someone interested in UNIX history, I decided to dig through the Solaris archives and put together a chronology of Solaris debuggability and observability tools. For fun, I divided it into eras to parallel Earth's history. And I swear I'm not out to make anyone feel like a dinosaur (or a prokaryote, for that matter).

I've only been around for one of these "dawn of a new era" arrivals, DTrace [1]. When one of these revolutionary tools arrives, it's amazing to see how quickly engineers avoid their own past. Try asking Bryan to debug a performance problem on Solaris 9, and you'll probably get some choice phrases politely explaining that while he appreciates the importance of your problem, he would rather throw himself down a slide of broken glass and into a vat of rubbing alcohol. Being the neophyte that I am, I've only ventured into the 'Paleozoic era' on one occasion. After an MDB session on a Solaris 8 crashdump (paraphrased slightly):

        $ mdb 0
        > ::print
        mdb: invalid command '::print': unknown dcmd name
        > ::help print
        mdb: unknown command: print
        > ::please print
        mdb: invalid command '::please': unknown dcmd name
        > ::ihateyou
        $

I quickly ran away screaming, never to return. I think I ended up hiding in a corner of my office for two hours, cradling my DTrace answerbook and whispering "there's no place like home" over and over. I'm still a spoiled brat, but at least I have respect and admiration for those Solaris veterans who crawled through debugging hell so that I could live a comfortable life [2]. It's also made me feel sorry for the Linux (and Windows) developers out there. Not in the Nelson Muntz "Ha ha! You don't have DTrace!" sense. More like "Poor little guy. It's not his fault his species never evolved opposable thumbs." There are a lot of brilliant Linux developers out there, stuck in a movement that doesn't embrace debugging or observability as fundamental goals. But this post is supposed to be about history, not Linux. So without further ado, my brief history of Solaris (soon to be available in refrigerator magnet form):


Year   Era          Milestone
<1989  HADEAN       SunOS 4.X
                    adb, ptrace, crash
1990   ARCHAEAN     SVr4 merge
                    /proc
                    truss(1)
1991                vtrace
                    vmstat(1M)
                    iostat(1M)
1992                SOLARIS 2.0
1993                mpstat
                    SOLARIS 2.2
1994                Kernel slab allocator
                    TNF
                    basic ptools
                    SOLARIS 2.4
1995
1996                SOLARIS 2.5.1
       PROTEROZOIC  Next generation /proc
                    Userland watchpoints
1997                lockstat(1M)
                    pkill and pgrep
                    libproc
1998                savecore on by default
                    SOLARIS 7
1999                libproc for corefiles
                    coreadm(1M)
                    prstat(1M)
                    lockstat kernel profiling
       PALEOZOIC    MDB(1)
                    ::findleaks
2000                SOLARIS 8
                    EOL of crash(1M)
2001                live process control for MDB
                    EOL of adb(1)
                    pargs and preap
       MESOZOIC     kernel CTF data
                    trapstat(1M)
2002                SOLARIS 9
                    libumem(3LIB) and umem_debug(3MALLOC)
                    ::typegraph for mdb(1)
2003                Userland CTF
                    coreadm(1M) content control
       CENOZOIC     DTrace(1M)
                    intrstat(1M)
2004                DTrace pid provider for x86
                    pfiles with pathnames
                    DTrace sched, proc providers
                    CTF for core libraries
                    DTrace I/O provider
                    KMDB(1)
                    DTrace MIB, fpuinfo providers
                    Per-thread ptools

These are my choices based on SCCS histories and putback logs. Obviously, I've failed to include some things. Leave a comment or email if you think something's not getting the recognition it deserves (keeping in mind this is a blog post, not a book).


[1] I actually started exactly one day before DTrace integrated. But I had some experience (albeit limited) as an intern the previous year.

[2] In all seriousness, it's not that I never have to debug anything, or that the problems we have today are somehow orders of magnitude simpler than those in the past. What these tools provide is a dramatic reduction in time to root-cause. You still need the same inquisitive and logical mind to debug hard problems; it's just that good tools let you form questions and get answers faster than you could before. Really good tools (like DTrace) let you ask the previously unanswerable questions. You may have been able to debug the problem before, but you would have ended up running around in circles trying to get data that's now immediately available thanks to DTrace.

Comments:

The lack of those commands in earlier Solarises (Solarii?) is one of the reasons that we find Solaris CAT so useful in addition to the other tools available. I too love the debugging features available in Solaris 10, however I don't have the luxury of not looking at problems in earlier systems (given that it's what they pay me for). Solaris CAT probably belongs in that timeline somewhere. I know that we were using it internally long before 4.0 was released.

That being said, it's a rare day when you don't hear me say something like, "if we could dtrace this, we'd have the answer by now".

Posted by Alan Hargreaves on August 14, 2004 at 08:48 AM PDT #

I'm just relieved to see that crash(1M) (deceased) didn't make the list. Or perhaps Mike ripping out crash(1M) in June 2000 should be marked as progress? Regarding CAT: it's fair to keep it (and its ilk) off the list; Eric's history is only of tools in the OS, not of tools layered on top of it.

One milestone that probably should be added, however, is Bonwick's kmem debugging facilities circa 1994. (And likewise, umem_debug(3MALLOC) in 2002.) And were I not so modest and self-effacing, I would probably also argue for ::findleaks (1998, shipped 2000) and ::typegraph (2003). ;)

Posted by Bryan Cantrill on August 14, 2004 at 12:19 PM PDT #

The EOL of crash(1M) was in my notes but never made it to the final list; I've added it along with the EOL of adb for good measure. I totally forgot about libumem (and the original kmem allocator). I also threw in ::typegraph since it's saved me a few times.

As Bryan points out, there are many useful tools like CAT, dbx, and lsof which won't make this list simply because they're not part of the OS.

Posted by Eric Schrock on August 14, 2004 at 01:25 PM PDT #

Good point Bryan. Having started crashdumps with adb and kadb, I can certainly appreciate mdb/kmdb and especially not forgetting dtrace.

Posted by Alan Hargreaves on August 14, 2004 at 02:54 PM PDT #

I'll lose the modesty: you've got to add ::findleaks. ::findleaks has found more bugs in the system than any single one of those technologies -- over 250 bugs to date! On the one hand, it's not nearly as radical as something like DTrace -- but it has made a substantial, quantifiable difference in the quality of software that we deliver. And thanks to libumem (and to the work of people like Alan to document its capabilities), ::findleaks is now finding lots of bugs outside Solaris as well. It may be hard for us to remember, but before ::findleaks, memory leaks were actually difficult to track down.

Okay, now back to my regular modest, self-effacing self... ;)

Posted by Bryan Cantrill on August 16, 2004 at 05:54 AM PDT #
