Monday Jul 12, 2004

The Solaris 10 top 11-20

Solaris 10 has way more features than any release of Solaris that I can remember, and Sun's been marketing the hell out of them. Here's my top 10 list roughly in order of how cool I think each is:

  1. DTrace - of course...
  2. ZFS - the amazing new file system
  3. AMD64 Support - Opteron is so obviously great
  4. Zones - N1 Grid Containers for those of you keeping score at home
  5. Predictive Self Healing - never worry about flaky hardware again
  6. Performance - Solaris 10 is just faster, faster networking, faster everything
  7. Linux Compatibility - run linux binaries unmodified, 'nuf said
  8. Service Management Facility - managing a box just got much easier
  9. Process Rights Management - super-user is no longer a binary proposition
  10. NFSv4 - nfs++ (++++++)

Blah blah blah. That's for sure amazing stuff, but there are dozens of places where you can read about it (I was going to include some links to news stories, but I'm sure google can find you the same stuff it found for me).

But is that it for Solaris 10? Not by a long shot. There are literally dozens of features and improvements which would have cracked the top 10 for the last few Solaris releases. Without further ado, I present my Solaris 10 top 11-20 list:

  11. libumem - the tool for debugging dynamic allocation problems; oh, and it scales as well as or better than any other memory allocator
  12. pfiles(1) with file names - you can get at the file name info through /proc too; very cool
  13. Improved coreadm(1M) - core files are now actually useful on other machines; administrators and users can specify the content of core files
  14. System V IPC - no more clumsy system tunables and reboots, it's all dynamic, and -- guess what? -- faster too
  15. kmdb - if you don't care, ok, but if you do care, you really really care: mdb(1)'s cousin replaces kadb(1M)
  16. Watchpoints - now they work and they scale
  17. pstack(1) for java - see java stack frames in a JVM or core file and through DTrace
  18. pmap(1) features - see thread stacks, and core file content
  19. per-thread p-tools - apply pstack(1) and truss(1) to just the threads you care about
  20. Event Ports - a generic API for dealing with heterogeneous event sources

There were some other features in the running (Serial ATA support, vacation(1) filtering, other p-tools improvements, etc.), but I had to draw the line somewhere. Remember this is my list; Solaris 10 has fancy new java and gnome stuff, but, while it's cool I guess, it doesn't do it for me in the way these things do. I'd be doing these features an injustice if I tried to summarize them all in one weblog entry, so I'll bite off one at a time and explain them in detail over the next few days; stay tuned.

Friday Jul 09, 2004

Inside nohup -p

I always thought it was cool, but I was surprised by the amount of interest in my recent post on nohup -p. There was even a comment asking how nohup manages the trick of redirecting the output of a running process. I'll describe in some detail how nohup -p works.

First, a little background material: Eric Schrock recently had a nice post about the history of the /proc file system; nohup makes use of Solaris's /proc and the agent LWP in particular which Eric also described in detail. All of the /proc and agent LWP tricks I describe are documented in the proc(4) man page.

Historically, nohup invoked a process with SIGHUP and SIGQUIT masked and the output directed to a file called nohup.out. When you run a command inside a terminal there can be two problems: all the output is just recorded to that terminal, and if the terminal goes away the command will receive a SIGHUP, killing it by default. You use nohup to both capture the output in a file and protect the process against the terminal being killed (e.g. if your telnet connection drops).
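
As a rough sketch of that traditional behavior (this is not the nohup source, which is C), here is the same sequence in Python: ignore SIGHUP and SIGQUIT, point stdout and stderr at nohup.out if they are terminals, then exec the command. I'm assuming "masked" means setting the disposition to ignore, which is inherited across exec. The name nohup_exec and the out_path parameter are made up for illustration.

```python
import os
import signal


def nohup_exec(argv, out_path="nohup.out"):
    """Run argv roughly the way classic nohup does; does not return on success."""
    # Ignored dispositions are inherited across exec, so the new
    # program starts out immune to SIGHUP and SIGQUIT.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
    signal.signal(signal.SIGQUIT, signal.SIG_IGN)

    # Only redirect the descriptors that actually point at a terminal;
    # output the user already sent to a file or pipe is left alone.
    tty_fds = [fd for fd in (1, 2) if os.isatty(fd)]
    if tty_fds:
        out = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        for fd in tty_fds:
            os.dup2(out, fd)
        os.close(out)

    # Replace this process image with the command.
    os.execvp(argv[0], argv)
```

Calling nohup_exec(["long-running-command"]) from a small wrapper script would behave much like running the command under nohup. Note that everything here happens before the command starts, which is exactly why it doesn't help a process that is already running.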

To "nohup" a running process we need to both mask SIGHUP and SIGQUIT and redirect the output to the file nohup.out. The agent LWP makes this possible. First we create the agent LWP and have it execute the sigaction(2) system call to mask off SIGHUP and SIGQUIT. Next we need to redirect any output intended for the controlling terminal to the file nohup.out. This is easy in principle: we find all file descriptors open to the controlling terminal, have the agent LWP close them, and then reopen them to the file nohup.out. The problem is that other LWPs (threads) in the process might be using those file descriptors (e.g. with the read(2) or write(2) system calls), and the close(2) will actually block until those operations have completed. When the agent LWP is present in a process, none of the other LWPs can run, so none of the outstanding operations on those file descriptors can complete, and the process would deadlock. Note that we can work ourselves out of the deadlock by removing the agent LWP, but we still have a problem.

The solution is this: with all LWPs in the process stopped, we identify all the file descriptors that we'll need to close and reopen, and then abort (using the PRSABORT flag listed in the proc(4) man page) any outstanding system calls on them. Once all outstanding operations have been aborted (or successfully completed) we know that there won't be any possibility of deadlocking the process. The agent LWP executes the open(2) system call to open the nohup.out file and then has the victim process dup2(3C) that file descriptor over the ones open to the process's controlling terminal (implicitly closing them). Actually, dup2(3C) is a library call so we have the agent LWP execute a fcntl(2) system call with the F_DUP2FD command.
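
The open-then-dup2 step is easier to see in isolation. The Python sketch below does it inside the calling process rather than through an agent LWP in a victim process, so it illustrates only the descriptor shuffle, not the /proc machinery. Two simplifications to flag: os.isatty() stands in for "open to the controlling terminal", and os.dup2() stands in for fcntl(2) with F_DUP2FD. The function names and the limit parameter are hypothetical.

```python
import os


def terminal_fds(limit=256):
    """Return the open descriptors below `limit` that refer to a terminal."""
    # os.isatty() returns False for descriptors that are not open,
    # so no error handling is needed while scanning.
    return [fd for fd in range(limit) if os.isatty(fd)]


def redirect_fds(fds, out_path="nohup.out"):
    """Point every descriptor in `fds` at `out_path`, keeping the fd numbers."""
    new_fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    for fd in fds:
        # dup2 implicitly closes whatever `fd` referred to before,
        # so there is no separate close step to block on.
        os.dup2(new_fd, fd)
    # The temporary descriptor is no longer needed once it has been copied.
    if new_fd not in fds:
        os.close(new_fd)
    return list(fds)
```

Something like redirect_fds(terminal_fds()) would send everything the current process writes to its terminal into nohup.out instead. The code that writes to those descriptors never notices, because the descriptor numbers don't change.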

Whew. Complicated to be sure, but at the end of it all, our precious process is protected against SIGHUP and SIGQUIT and through our arduous labors, output once intended for the terminal is now safely kept in a file. If this made sense or was even useful, I'd love to hear it...

Wednesday Jul 07, 2004

Linker alien spotting

After much cajoling, self-described linker alien Rod Evans has joined the fray with his own weblog. As Bryan Cantrill and I will assert again and again, the linker is one of the finest pieces of engineering in all of Solaris, and the Linkers and Libraries Guide is one of the finest pieces of documentation -- and it's written by Rod and his fellow linker aliens. The linker has many many subtle and useful features. See what Rod has to say about it, and go ask him questions -- I know I'm excited to see what Rod has up his sleeve.

Thursday Jul 01, 2004

old school Solaris and Solaris quality of life

I don't like to dwell on past Solaris releases, but in Solaris 9 I wrote a cool update for nohup(1). The nohup(1) utility takes a command and its arguments and makes sure that it keeps running even if your shell dies or your telnet session drops. Usually the way people use nohup(1) is they log in from home, start up a long running process, forget that they should have been running it under nohup(1) and either take their chances with their ISDN line or kill the process and restart it: nohup long-running-command ....

In Solaris 9, I implemented a -p flag that lets users apply nohup(1) to a live running process. If you've never run nohup(1) you might not care, but if you have, you know how useful this is. Solaris is full of these kinds of quality of life tools. Check out all of the p-tools which Eric Schrock has been writing about.

The other night, I was talking to some serious BSD-heads who pointed out that dmesg(1M) isn't so useful on Solaris. And that's a bug! We in Solaris take these sorts of simple quality of life issues very seriously, and welcome suggestions. If there's something in Solaris that pisses you off or is better elsewhere, let us know.


Adam Leventhal, Fishworks engineer

