Under the bootchart hood

Last week I announced our bootchart results. Dan continued with a sample of zone boot, as well as some interesting bugs that have been found already thanks to this tool. While we're working on getting the software released, I thought I'd go into some of the DTrace implementation.

To begin with, we were faced with the annoying task of creating a parsable log file. After looking at the existing implementation (which parses top output) and a bunch of groaning, Dan suggested that we should output XML data and leverage the existing Java APIs to make our life easier. Faced with the marriage between something as "lowlevel" as DTrace and something as "abstract" as XML, my first reaction was one of revulsion and guilt. But quickly we realized this was by far the best solution. Our resulting parser was 230 lines of code, compared with 670 for the set of parsers that make up the open source version.

Once we settled on an output format, we had to determine exactly what we would be tracing, and exactly how to do it. First off, we had to trace process lifetime events (fork, exec, exit, etc). With the top implementation, we cannot catch exact event times, nor can we catch short-lived processes which begin and end within a sample period. So we have the following D probes:

  • proc:::create - Fires when a new process is created. We log the parent PID, as well as the new child PID.
  • proc:::exec-success - Fires when a process calls exec(2) successfully. We log the new process name, so that we can convert between PIDs and process names at any future point.
  • proc:::exit - Logs an event when a process exits. We log the current PID.
  • exec_init:entry - This one is a little subtle. Due to the way in which init(1M) is started, we don't get a traditional proc:::create probe. So we have to use FBT and catch calls to exec_init(), which is responsible for spawning init.

This was the easy part. The harder part was to gather usage statistics on a regular basis. The approach we used leveraged the following probes:

  • sched:::on-cpu, sched:::off-cpu - Fires when a thread goes on or off CPU. We keep track of the time spent on CPU, and increment an aggregation using the sum() aggregation.
  • profile:::tick-200ms - Fires on a single CPU every 200 milliseconds. We use printa() to dump the contents of the CPU aggregation on every interval

There were several wrinkles in this plan. First of all, printa() is processed entirely in userland. Given the following script:

        #!/usr/sbin/dtrace -s

        profile:::tick-10ms
        {
                @count["count"] = count();
        }

        profile:::tick-200ms
        {
                printa(@count);
                clear(@count);
        }

One would expect that you would see 5 consecutive outputs of "20". Instead, you see one output of "100", and four more of "0". Because the default switchrate for DTrace is one second, and aggregations are processed by the dtrace(1M) process, we only see the aggregations once a second. This can be fixed by decreasing the switchrate tunable. This also means we can't make use of printa() during anonymous tracing, so we had to have two separate scripts (one for early boot, one for later).

The results are reasonable, but Ziga (the original author of bootchart) suggested a much more clever way of keeping track of samples. Instead of relying on printa(), we key the aggregation based on "sample number" (time divided by a large constant), and then dump the entire aggregation at the end of boot. Provided the amount of data isn't too large, the entire thing can be run anonymously, and we don't have the overhead of DTrace waking up every 10 milliseconds (in the realtime class, no less) to spit out data. We'll likely try this approach in the future.

There's more to be said, but I'll leave this post to be continued later by myself or Dan. In the meantime, you can check out a sample logfile produced by the D script.

Comments:

I am from TAIWAN I'm a collge student. My majority is computer science. I am a freshman. I want to learn unix kernel. I find a book"Lion's Commentary on UNIX 6th Edition"in our college's library. please tell me how to learn UNIX kernel. Do you have any friend who is kernel developer in SUN ? could they give me a direction? THANKS you for tolerating my poor english

Posted by tsj on January 23, 2005 at 01:45 AM PST #

Hey Eric, What ever happened to opensourcing the code for boot chart? Did you ever get to it? thanks Sean

Posted by Sean on December 08, 2006 at 04:13 AM PST #

Thank you very much for sharing a such information with us. I'm working on solaris 10 performance and i have to measure the boot time of solaris, the desktop environnement and the window manager. I'am working on a SPARC machine. Could you show me how to proceed.
Thank you in advance

Posted by rachid on November 22, 2007 at 01:37 AM PST #

Post a Comment:
Comments are closed for this entry.
About

Musings about Fishworks, Operating Systems, and the software that runs on them.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today