Monday Jan 10, 2011

Chime 1.5

I recently announced version 1.5 of the Chime Visualization Tool for DTrace. One of my favorite changes in this update is not listed in that announcement: Chime now loads the last selected trace category at startup.

Last used Chime category selected at startup

What this means is that Chime feels more like your own toolbox, since it loads the traces you were last using. You can customize Chime by making your own category and populating it with the traces you use most. Simply create a directory anywhere under /opt/OSOL0chime/displays/ and copy into it just the traces you want. For example:

% mkdir /opt/OSOL0chime/displays/favorites
% cp /opt/OSOL0chime/displays/syscall.xml /opt/OSOL0chime/displays/favorites
% cp /opt/OSOL0chime/displays/system_probes.xml /opt/OSOL0chime/displays/favorites/
% cp /opt/OSOL0chime/displays/toolkit/vopstat.\* /opt/OSOL0chime/displays/favorites/
% cp /opt/OSOL0chime/displays/toolkit/procsystime.\* /opt/OSOL0chime/displays/favorites/

Then copy /opt/OSOL0chime/displays/toolkit/description.xml into that same directory and edit the "name" and "description" properties to match what you'd like to appear in the Category list and the Description text pane, respectively (see picture above). For example:

% cp /opt/OSOL0chime/displays/toolkit/description.xml /opt/OSOL0chime/displays/favorites/
% vi /opt/OSOL0chime/displays/favorites/description.xml

Then you might change
  <void property="name">
  <void property="description">
   <string>Traces adapted from scripts&lt;br&gt;in the DTraceToolkit.


  <void property="name">
  <void property="description">
   <string>Frequently used traces.</string>

When you next run Chime, you would find Favorites in the Category list. After selecting it, assuming you never changed it, Chime would always start up like this:

Custom Chime category loaded at startup

You can also specify a directory with 'chime -d', in case you want commands for several different customizations. It's easy to create your own traces to populate your custom categories. Give it a try!

Tuesday Jun 30, 2009

Chime Automatic Drilldown

I think you'll like this latest update to Chime. Drilldown has been a feature of Chime since its initial release. It lets you see more detail about selected items of interest, for example a breakdown by function of a counter for a particular executable. Drilldown supports some of the dynamic capability of dtrace(1m) by launching a new display to answer a question prompted by the current display. However, even though drilldown is easy to use once enabled (simply right-click on an item of interest), it has been difficult to enable in new displays and downright tedious if you wanted to specify multiple levels of drilldown. The feature has always required the addition of placeholders in your DTrace program and an explicit description of every drilldown option you wanted to support.

This latest update removes some of the barriers to enabling drilldown in new displays, and even supports drilldown automatically for displays generated on the command line using the -n and -s options. For example:

    % chime -ktn 'syscall:::entry { @[execname] = count(); }' -T "System Calls"

To try this (Chime is available for download here), you'll first want to add /opt/OSOL0chime/bin to your PATH, or else add an alias to your shell startup script: 'alias chime=/opt/OSOL0chime/bin/chime' so you can use the command 'chime' as conveniently as you use the command 'dtrace'. The -n option specifies the DTrace program, just as it does with the dtrace command. The -k option says you want the display to include a sparkline of values over time. The -t option says you want display totals, and the -T option sets the title bar to something more informative than the default title "Display". There's nothing new here, except that now you get drilldown automatically. Run the above command to see how this works:

System Calls drilldown

The command produces a simple count of system calls per second for each executable. Imagine a mouse pointer at the top left corner of the popup menu over the highlighted "gnome-panel" item. Right-clicking brings up a long list of drilldown options labeled "gnome-panel by <something>". Selecting "... by CPU" modifies the DTrace program as follows:

program before drilldown by cpu program after drilldown by cpu

The aggregation key has been moved to the predicate and filters out all but the selected value. The new aggregation key, cpu, replaces execname in the square brackets. You can drilldown still further:

System Calls by CPU

A few things to notice here: First, the by CPU option no longer appears, since we already aggregated by cpu in the previous drilldown, and doing so again would not further expose any useful detail. Also, you can break down the sum of multiple items, rather than just a single item. In the example above, both CPU 1 and CPU 2 are selected, so breakdown by function looks like this:

program after drilldown by function System Calls by Function

Next, we could select the "read" function and request a breakdown by User Stack:

System Calls by User Stack

The Caller field shows the top of the stack, and the User Stack field shows a tag to identify the full stack trace. To see those stack traces, right-click one or more selected items and choose View Selected Stack Traces from the popup menu. Changing the row selection automatically updates the stack trace display. Multiple stack traces are arranged in a table three to a row so you can easily compare stack traces side-by-side. (Use the Ctrl key to select multiple rows, or the Shift key to select a range of rows.)

Stack Trace display

Drilldown by stack trace is the end of the line and does not permit further drilldown. (The DTrace stack() and ustack() actions are not allowed in a predicate.) Our DTrace program, after three levels of drilldown, looks like this:

program after drilldown by user stack

It would be a pain to type all of that, so you can see that drilldown adds a lot convenience! At the end we got per-second counts of gnome-panel reads on CPU 1 and CPU 2 broken down by user stack, having specified very little of that on the command line. We could have requested running totals instead using the -a option. (Run chime -h to see the complete set of available options.)

Adding the -w option saves the generated display so you can further improve it in the New Display Wizard, then add it to a Chime category so it can be run by selection from a list. (By default -w adds it to the New Displays category.) Currently you add a new category by manually creating a subdirectory of /opt/OSOL0chime/displays with a description.xml file, but eventually I hope to make that a task performed by the GUI. Creating categories with descriptive text for each display is the best way to organize useful traces and remember what they do, and make running them in the future a simple matter of point and click. (Run chime without options to access the list of categories.)

Automatic drilldown relieves you from having to specify many common drilldown options. Installed Chime displays that already supported drilldown continue to do so normally, but add all the options shown above (plus the by Executable option not shown in the example above because the original DTrace program was already aggregated by execname). You can still specify a drilldown option explicitly, and it will appear at the top of the list ahead of any automatic options (overriding an automatic option whose name or aggregation key matches that of the explicit option.)

Drilldown still requires placeholders in your DTrace program. Automatic drilldown generates those placeholders for you in programs specified on the command line (unless they're already present), but it does not do so for displays listed in the GUI. If you add a display to the GUI and want it to support drilldown, you need to ensure that your program has the expected placeholders, either by inserting them manually (the wizard has right click options to help you do that) or by saving a display generated on the command line with -w. To further remove the barrier of understanding Chime's placeholder syntax, another method, the -U option, was added to insert placeholders for you. For example:

    % chime -Un 'syscall:::entry { @[execname] = count(); }'
    { @[??T??] = count(); }

    ??T0 execname

Redirecting stdout to a file creates a file that is valid to specify with the -s option on the command line or to select in the wizard (it will even compile successfully in the wizard), or to specify with the programFileName property in the display description (selectable Chime displays are described in XML). Once you have a file with the required placeholders, turning on automatic drilldown in the wizard is as simple as clicking a check box in the Add Drilldown Support screen:

Wizard auto drilldown checkbox

Drilldown menu items can still be specified explicitly in this screen, and where they do not override an automatic option, they will appear in addition to the automatic menu items.

The Callouts screen in the Initial Displays category now supports drilldown (along with many other displays in that category that already supported it).

Chime main window

Here's one that's fun to demo: Select rwtop ... from the DTraceToolkit category (another display that supports drilldown). Notice that the drilldown options change depending on whether you start the display with the by User or by Zone button selected. Although by Project is not included in the popup menu, you can specify it manually with the Specify Drilldown ... option by entering curpsinfo->pr_projid in the input dialog. The ability to specify the drilldown variable on the fly should occasionally come in handy.

Friday Dec 19, 2008

Chime Python Category

I just added a Python category to Chime, in addition to the Ruby category added earlier.

list of Python displays in main Chime window

A PHP category is coming soon. These are all scripts from the DTraceToolkit. Chime has gotten pretty flexible in terms of supporting script options, so I was able to merge the py_cpudist.d and py_cputime.d scripts into py_calldist.d and py_calltime.d, which differ only in measuring on-CPU time instead of elapsed time.

dialog prompting for elapsed time or on-CPU time

The prompt defaults to elapsed time. Here's a trace of a Mercurial command:

py_calltime display

It shows time in two ways side-by-side: the entire time from entry to return including time spent in other functions, and time in the current stack frame excluding time spent calling other functions. Here are some other displays showing total bytes allocated alongside a malloc() size distribution, by function and by stack trace.

py_malloc display py_mallocstk display

The tagged stack traces are viewable by accessing a right-click popup menu on one or more selected rows.

selected stack traces side by side in the stack viewer

So have fun tracing your Python apps! Chime is downloadable here.

Tuesday Dec 16, 2008

Chime Data Scrolling

As I was scrolling through output of the dtrace command recently, it struck me that the command line allowed me to do something I could not do with Chime: look back at recent data. It bothered me to see Chime at a disadvantage in terms of visibility, where it actually tries to excel over the command line.

To remedy this, I added controls to provide the equivalent of scrolling in a command terminal. Since Chime already has a scrollbar to view whatever part of the current aggregate snapshot doesn't fit in the window, it didn't make sense for a scrollbar to access previous snapshots as well. Instead I added forward and back buttons next to the pause button in the lower right corner.

System Calls with back and pause buttons enabled and forward button disabled

The forward button is disabled because we're normally looking at the most recent aggregate snapshot. I replaced the text of the pause button with an icon so that the button no longer changes width when switching the text from "Pause" to "Resume" and back. Changing the button width shifts the location of the forward and back buttons by the difference in width, whereas switching same-size icons leaves the buttons in place and also looks cleaner.

Clicking the back button automatically pauses the display and steps it to the previous aggregate snapshot.

System Calls with mouse pointer over back button and back, forward, and pause buttons enabled

Clicking the back button repeatedly moves the spark backward on the sparkline until reaching the oldest snapshot in the recent data cache.

System Calls with mouse pointer over disabled back button; forward and pause buttons enabled

I chose ten as the default limit of the recent data cache, since I didn't want to consume a lot of memory, and the most recent snapshots are usually all you want to see. If you want to be able to scroll back farther, you can set the number of steps in the recent data cache using the Options menu.

Main Chime Window with Set Recent Data Count ... selected in Options menu Set Recent Data Count dialog launched by menu item

While the display is paused, you can step forward and back through recent aggregate snapshots in a temporary copy of the cache. The actual cache continues to add new snapshots at the front and roll old snapshots off the back as data comes in. To resume the display of current data, click the pause/resume button (displaying the resume icon while paused).

Another convenience of the command line is that in the case of data drops, you can easily change option values and re-run the command. That convenience is next on my list of things to add to Chime.

Get the latest version of Chime (1.4.39) to try out the forward and back buttons (link here). The latest version also includes a large number of scripts from the DTraceToolkit, including a new Ruby category.

Friday Aug 08, 2008

Netbeans DTrace GUI plugin

A new version (0.4) of the Netbeans DTrace GUI plugin is ready for download from the Netbeans Update Center. This version includes Chime alongside the DTraceToolkit.

Here we see the vmbypid.d script on the right, and output at the bottom, obtained by clicking the green "Run" arrow, waiting a few seconds, then clicking the red stop button (now grayed out) in the bottom left margin. This is exactly the kind of script that's perfect for displaying in Chime. Now, switching to Chime is just a click away:

I blogged a while ago about integrating scripts from the toolkit with Chime (here and here), so I won't go into detail about that now. It's pretty simple to copy the script into the Chime wizard and try it out:

I added a new vmbypid display and put the helpful comments from the Toolkit script on display in the Description pane (above).

This is a little nicer to read than the Toolkit output, has totals, and refreshes every second automatically, rather than requiring you to decide blindly when to stop.

It might be nice if the graph could appear in the Netbeans "Output" window at the bottom (for consistency with the Toolkit output). Possibly that's one way the Chime integration with the Netbeans DTrace GUI could be taken to the next level, but it's nice to be able to see mutiple displays side-by-side the way things are now. Double-clicking vmbypid in the list of Chime "Traces" opens a popup displaying the DTrace program rather than launching the display, as the standalone Chime does, in order for Chime to be more consistent with its Toolkit neighbor. A more thorough integration will display the D program on the right side of the Netbeans window, as happens when double-clicking a script in the Toolkit. A green arrow for Chime's "Run" button (like that of the Toolkit) would also be nice. There are probably other ways the integration could be improved, like merging Chime's menu bar into the Netbeans menu bar, but this is an excellent beginning, and it's nice to see Chime and the DTraceToolkit side by side.

Nasser Nouri wrote up some nice instructions on how to get the plugin and run it. It was very easy, and I was impressed by how little time it took. I did run into a little gotcha: In the Toolkit, my view of the vmbypid.d script did not agree with the output (the column headers in the dtrace:::END printf() differed). Nasser explained to me that the script was left over from a previous installation, and that I needed to remove the old version of DTraceScripts in my home directory and reinstall the plugin. That fixed the problem for me.

The Netbeans DTrace GUI works with Sun Studio IDE Express releases and Netbeans IDE 6.0, 6.1, and 6.5. A DTrace Editor is also planned that will enable code completion, syntax checking, and code folding.

Tuesday Oct 16, 2007

Chime In Your App

Several people have asked me if there is a way to drop a Chime display into their own application. For example, a Chime plugin for Netbeans might be nice (code, anyone?). It's not hard to drop a Chime display into any Java app. For those who want to try, here's an example: [ view]

To compile and run the example (on Solaris Nevada build 35 or later), first install Chime, then enter the following commands (syscall.xml comes installed with Chime, although you can specify any valid display description):

; javac -cp /opt/OSOL0chime/lib/java/chime.jar:/usr/share/lib/java/dtrace.jar  \\
; java -cp .:/opt/OSOL0chime/lib/java/chime.jar:/usr/share/lib/java/dtrace.jar \\
TestChime /opt/OSOL0chime/displays/syscall.xml

TestChime app

Voila! This test app creates its own frame and drops a Chime display into it. Here is the significant code that generates the display and adds it to panel p:

    DisplayDescription d = AggregationDisplay.getDescriptionFromFile(file);
    try {
    } catch (Exception e) {

    Command c = AggregationDisplay.createCommand(d);
    final AggregationDisplay display = new AggregationDisplay(c, d);
    p.add(display.getDisplay(), BorderLayout.CENTER);

As in the example, you can provide your own controls that are more suitable for your app than the "Pause" button and "Interval in seconds" spinner that appear at the bottom of a Chime display. Here are the AggregationDisplay instance methods to do that:

    setPaused(boolean paused)
    boolean isPaused()
    setIntervalMillis(long millis)
    long getIntervalMillis()
    java.util.List <JMenuItem> menuItems()

The last method supplies the items for Chime's right-click popup menu. Chime interprets a null list element as a menu item separator, so your code needs to handle null list elements. From each JMenuItem, you can call getAction() to supply the action for the control you want to provide and getName() to label that control. Anything beyond that you can figure out from the Chime source code, starting with (which implements all of Chime's command-line options).

Many of Chime's menu items invoke actions that run in a new window. For example, drilldown displays, plots over time, and Chime's DTrace program viewer all run in a new window. Chime also prompts for DTrace macro arguments in a dialog box:

Chime macro arguments prompt

Currently (as of Chime version 1.4.29), if you need these features to run within your application frame, there is no easy way to do it: you will need to work with the Chime source code. If you do, please share your work to help the Chime project!

Tuesday Oct 09, 2007

Chime Sparklines

Continuing in the same vein as my previous blog entry, Chime Non-Repeating Keys, I've added another option for generating Chime displays on the command line.

    [-k add a sparkline to the first plottable data column]

Chime has had the ability to display sparklines since its initial OpenSolaris release, as demonstrated by the System Calls display that comes installed with Chime:

System Calls sparklines

The History column provides a view of the System Calls Per Second data over time. It scales from zero to max for each executable. Because the series uses a different vertical scale for each plot it cannot be compared visually as one might expect. If the same vertical scale was used for all executables, the plots would flatten out such that individual variation would no longer be visible. Instead, relative vertical scale is indicated by the cyan line, which is positioned farther to the right the higher the maximum value of the aggregation key. If you click on the History column header, it will sort the column in descending order by maximum value (cyan indicators farthest to the right will move to the top of the table).

The tooltip displays
  1. how much time has elapsed since the first non-zero data was plotted
  2. the maximum value defining the vertical scale of the plot
The tooltip 21 seconds: min 0 .. max 11,814 appeared when resting the mouse pointer (not shown) over the Xorg sparkline; a similar tooltip will appear as the mouse pointer moves across other sparklines.

To generate such a display directly from the command line:

% /opt/OSOL0chime/bin/chime -kTn 'syscall:::entry { @[execname] = count(); }'

Chime sparklines generated with -k option

The -k option adds the History column displaying the values from the @ aggregation over time. The -T option adds a total row. To change the labels in the title bar and column headers to match the System Calls display, enter the following command instead:

% /opt/OSOL0chime/bin/chime -kTn 'syscall:::entry \\
{ @[execname] = count(); }' -t "System Calls" \\
-h "Executable, History, System Calls Per Second"

To add thousands separators and change total units from "keys" to "executables", you will need to set properties in the wizard. Add -w to the previous command to save the display, then reopen it in the wizard with -W:

% /opt/OSOL0chime/bin/chime -kTn 'syscall:::entry \\
{ @[execname] = count(); }' -t "System Calls" \\
-h "Executable, History, System Calls Per Second" -w
Wrote /opt/OSOL0chime/displays/new/system_calls.xml
To run, enter "/opt/OSOL0chime/bin/chime -C /opt/OSOL0chime/displays/new/system_calls.xml".
% /opt/OSOL0chime/bin/chime -C \\ 
/opt/OSOL0chime/displays/new/system_calls.xml -W

Chime wizard

Click Next until you reach step 4. Specify Columns.

Chime wizard, step 4. Specify Columns

Select the "Executable" column and click the Specify >> button to replace "keys" with "executables". Next, select the "System Calls Per Second" column and click the Specify >> button. Check the Value Per Second check box and select #,##0 from the Format list to request thousands separators. Click Next until you reach step 5. Test Run the Display, then click the Run Display button:

New System Calls Test Run

Close the display and click the Finish button. The next time you run Chime without any options, the finished "System Calls" display will be selectable from the "New Displays" Trace Group in the main window.

Sparklines in Chime rely on a custom renderer that you can get from the Chime source code (see src/org/opensolaris/chime/swingx/ Chime uses JFreeChart to provide larger plots of selected values. For example, in the display just created, select Xorg, Java, and mixer_applet2 by holding down the Ctrl key while left-clicking the desired values in the table.

Multiple row selection

If values are moving targets because the current sort keeps changing the row under the mouse pointer, click the Pause button first (the button label will change to Resume; click the button again when you're ready). Right click anywhere on the selection to bring up the popup menu and select Plot Over Time:

Multiple row selection

Values already accumulated in the selected sparklines are displayed in the line graph that appears. The plot accumulates new data (scrolling to the left) after the initial capture as long as the original table remains open.

Friday Sep 28, 2007

Chime Non-Repeating Keys

Someone recently asked me if there was a way to make DTrace display non-repeating aggregation keys. He wanted something like this:

% dtrace -qn 'syscall:::entry { @[execname, probefunc] = count(); }' -n \\
'tick-1sec { printf("\\n%25s %25s %12s\\n", "Executable", "Function", \\
"Count"); printf("%25s %25s %12s\\n", "----------", "--------", "-----"); \\
printa("%25s %25s %@12d\\n", @); }' -xaggsortkey

               Executable                  Function        Count
               ----------                  --------        -----
                     Xorg             clock_gettime           18
                                        lwp_sigmask            4
                                            pollsys            9
                                               read            8
                                         setcontext            4
                                          setitimer            8
                                             writev            1
                   dtrace                       brk            6
                                              ioctl           77
                                           lwp_park            1
                                               mmap            1
                                           p_online           64
                                           schedctl            1
                                          sigaction            4
                                          sysconfig            3
              firefox-bin                  lwp_park            8
         gnome-netstatus-                     ioctl           11
                                            pollsys            5
                                               read            2
                                              write            2
           gnome-terminal                     ioctl            4
                                            pollsys            4
                                              write            2
         gnome-vfs-daemon                   pollsys            1
                                             stat64            2
                    inetd                  lwp_park            1
                     java                     ioctl           22
                                      lwp_cond_wait            2
                                            pollsys           45
                                              yield            9
            mixer_applet2                     ioctl           27
                                            pollsys           17

I told him I didn't think so. (It might make a nice option for dtrace(1M).) I also told him that Chime can display non-repeating keys. I wanted to demonstrate using his DTrace script, but I realized he would have to sit patiently while I opened the wizard and fiddled with display properties, so I didn't bother. Later it occurred to me that a command line option for non-repeating keys would have made the demonstration simple. Hence -g (for group):

% /opt/OSOL0chime/bin/chime -n 'syscall:::entry { \\
@[execname, probefunc] = count(); }' -g

Chime non-repeating keys

I was pretty happy about that. For one thing, the wizard makes non-repeating keys a little tricky to specify, since you have to list two columns, one for the data and another for a non-repeating view of that data. The -g option does this for you, giving you one less thing to worry about in the wizard later on if you do decide to make further improvements.

In case you want lines to separate your non-repeating groups, you can use -G instead of -g:

% /opt/OSOL0chime/bin/chime -n 'syscall:::entry { \\
@[execname, probefunc] = count(); }' -G

Chime key separator lines

Anything you specify on the command line can be saved using -w. So the more you can specify directly on the command line (thanks to new options like -g and -G), the more -w can do for you. Other things you can specify on the command line include the title and column headers (with -t and -h respectively); see Chime's Command Line page for details. For this example just save the display with non-repeating keys:

% /opt/OSOL0chime/bin/chime -n 'syscall:::entry { \\
@[execname, probefunc] = count(); }' -gw
Wrote /opt/OSOL0chime/displays/new/display.xml
To run, enter "/opt/OSOL0chime/bin/chime -C

Then you can tweak things in the wizard:

/opt/OSOL0chime/bin/chime -C /opt/OSOL0chime/displays/new/display.xml -W

The -C option (to specify a saved Chime display) works in combination with the -W option (for Wizard) on or after version 1.4.28 (29 Sep 2007). This lets you cut and paste from the "To run..." message in the command output above and simply tack on a -W. Before 1.4.29 the wizard came up in its initial state, requiring you to find the display with the Browse ... button.

Chime Wizard

Pretty convenient, so I thought why stop there? It might be nice if a total row could be specified just as easily. (All these things you can do in the wizard, but you can't beat the command line for instant gratification.) The result is the -T option, which generates a total for every column in your display:

% /opt/OSOL0chime/bin/chime -n 'syscall:::entry { \\
@[execname, probefunc] = count(); }' -gT

Chime Totals

If you want nicer labels like "23 executables" rather than the generic "23 keys", you can specify that in the wizard, but for a one-off, it's nice just to have the numbers without any fuss.

Again, you can rerun the command with -w to save the display, then tweak it in the wizard with -W. If you click the Next button in the wizard, you'll find your DTrace program already specified in step 2. If you continue to step 4, you can see the hidden column and the non-repeating view of the execname:

Chime Wizard Columns

The green check marks indicate that all the required column properties are already specified (although you might wish to specify others, such as alignment, thousands separator, etc). For more information about what you can do in the wizard, see Chime's New Display Wizard page.

Thursday May 31, 2007

Chime at JavaOne 2007

Steven Reynolds demonstrated Chime at JavaOne this month, and he posted his presentation slides and Chime displays here. His presentation, Performance Statistics of Java(TM) Applications Using DTrace and Chime, focuses on the built-in hotspot provider in Java 6 and gives a good dissection of a D program clause using the hotspot$target:::thread-start probe for people new to DTrace. The presentation also includes a helpful "Tips" section for both DTrace and Chime, where he points out some of the bumps in the road that he encountered, such as the printa action's interference with Chime, the need for the zdefs option (-Z), and the fact that Chime is installed in a directory named /opt/OSOL0chime with two Oh's and one zero (by the way, that convention was not my idea :-)

The displays Steven created also include helpful documentation that appears in Chime's Description pane, such as notes on the use of the java -XX:+ExtendedDTraceProbes option and jinfo. I posted screenshots of his displays here. These new displays by Steven highlight Chime's ability to display aggregation data from arbitrary DTrace programs. I encourage you to look at his presentation then try creating some displays of your own (which you are welcome to share).

Thanks Steven! I hope to see you at JavaOne again next year!

On the subject of tracing Java apps, Adam Leventhal and Jarod Jenson also presented Cranking Up Java Application Performance with DTrace at this month's JavaOne. They also demonstrated the hotspot provider, and they gave a sneak preview of the forthcoming Java Statically-Defined Tracing (JSDT) (see Adam's blog). Also on that subject, check out Jarod Jensen's article, DTrace and Java: Exposing Performance Problems That Once Were Hidden.

Friday Feb 23, 2007

Chime Webcast

Today, a webcast of Chime made the "This Week..." section at the top of the Solaris Developer Center page. It also appeared at the top of the Solaris Developer Web/Video casts, and you can view the webcast here (22 min 28 sec).

The demo covers many features of Chime, a visualization tool for DTrace, including a recent feature that lets you generate displays dynamically from the command line using -n, -s, -x, and -Z options of the dtrace(1M) command and save them with the -w option. Saved displays can be further improved using a wizard.

Some comments on the demo:
  • Screen space was pretty constrained because of what fits in the viewer. I did the best I could.
  • The total row seen in the "Interrupt Statistics" display is even better than what I showed because it never scrolls out of view. I forgot to demonstrate that by changing the height of the window.
  • Alert viewers may notice a few minor bugs that I fixed in version 1.4.22 (08 Feb 2007) shortly after the demo. Also, the tooltips that occasionally appear over list items have been improved so that they appear only when needed, when the item does not fit completely in the view pane. (Tooltips over table data in Chime have always had this optimized behavior.)
If you're patient and watch to the end, you'll see the only example currently in existence of how to enable drilldown in displays you create. Drilldown depends on program placeholders, which you'll see in the "Set DTrace Program" step of the wizard. Regarding the syntax of these placeholders (more completely described here), I decided to err on the side of excessively wacky in order to avoid any conflict with D syntax. Chime expands these placeholders before the program is compiled by DTrace. As demonstrated, there are right-click menu items to simplify the task of inserting these placeholders in your program. The decision tree in the "Add Drilldown Support" screen can be as complex as you want to better answer questions prompted by the display.

Well, enough fussing from me. I hope you enjoy the webcast! :-)

Wednesday Jan 24, 2007

Chime and the DTraceToolkit, Part 2

In my previous entry, Chime and the DTraceToolkit, I demonstrated how to adapt a script from the DTraceToolkit for display in Chime (including screenshots). I chose a simple script, bitesize.d, and deferred the investigation of how easily Chime could run more complicated scripts from the toolkit.

Before I pick up that investigation again (this time using procsystime), I want to highlight a point from the previous entry that may have gone unnoticed in the larger discussion. The point is that Chime can display DTrace programs directly from the command line, provided that they satisfy a few simple requirements. You can try this with many of the OneLiners from Brendan Gregg's DTrace Tools page, for example:

# Read bytes by process,
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'

To run with Chime, simply change dtrace to chime:

/opt/OSOL0chime/bin/chime -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'

You can use -n to specify a program string and -s to specify a program file, just as you would with dtrace(1M). Similarly, you can specify DTrace options using -xoption=value and -Z. The column headers are taken from identifiers in your D program unless you specify headers with the -h option. You can specify a title for the title bar other than "Display" using the -t option. The previous entry has more detail on how to improve the display (and how to save it with the -w option).

Adapting a Shell Script

Most of the DTrace programs used in the toolkit are wrapped in shell scripts that offer extensive options. The iosnoop script, for example, has the following options:

# USAGE: 	iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename] 
#			[-m mount_point] [-n name] [-p PID]

Since the DTrace API cannot compile shell scripts, Chime cannot run them as-is. The DTrace program first needs to be removed from its wrapper. What I found is that the various "snoop" scripts (iosnoop, execsnoop, opensnoop, etc.) are not good candidates for adaptation to Chime because they do not use aggregations and instead rely on printf(). In a snoop, every probe firing is appended to the output. In Chime, that would translate into an event-driven display rather than a regular sampling of the aggregate. The Java DTrace API provides a ConsumerListener interface to listen for ProbeData in a DataEvent; the formatted elements of a DTrace printf() statement can thus be obtained from the getRecords() method of a PrintfRecord included in the probe data. The point is that it's certainly possible to add an event-driven display module to Chime, but currently Chime does not support printf(). Often it makes sense to change a script that relies on printf() so that it aggregates its data instead, but I don't think that's the case with any of the snoop scripts.

If the shell script wraps a DTrace program that does use aggregations, you still have to look at each shell option one by one and decide how it will be supported in Chime. As an example, I've done this with procsystime, a script that supports the following options:

# USAGE:	procsystime [-acehoT] [ -p PID | -n name | command ]
#		-p PID          # examine this PID
#		-n name         # examine this process name
#		-a              # print all details
#		-c              # print syscall counts
#		-e              # print elapsed times
#		-o              # print CPU times
#		-T              # print totals

I decided that the -a, -c, -e, and -o options could all be dropped, since there is no reason for Chime not to display all the fields at once. When running procsystime -a on the command line, syscall counts, elapsed times, and CPU times are displayed in separate output blocks stacked vertically, so that only syscall counts are visible without scrolling up to see the aggregated times. Chime will improve on this by placing the three value types side-by-side in a row for each process, with column headers that never scroll out of view.

Similarly, I decided to drop the -T option, since I saw no reason not to display totals every time.

That left the three mutually exclusive options: [ -p PID | -n name | command ]. Without a shell wrapper, the only way to support these in Chime is with macro arguments for the -p and -n options, and target process for the command option. The basic idea is to replace shell script option assignments with macro argument assignments. So, from procsystime:

  \* Command line arguments
 inline int OPT_elapsed  = '$opt_elapsed';
 inline int OPT_cpu      = '$opt_cpu';
 inline int OPT_counts   = '$opt_counts';
 inline int OPT_filter   = '$opt_filter';
 inline int OPT_pid      = '$opt_pid';
 inline int OPT_name     = '$opt_name';
 inline int OPT_totals   = '$opt_totals';
 inline int OPT_command  = '$opt_command';
 inline int PID          = '$pid';
 inline string NAME      = "'$pname'";
 inline string COMMAND   = "'$command'";

we change the assignments as follows in procsystime.d as it will be used by Chime:

inline string NAME = $$1;
inline int PID = $2;
inline int OPT_pid = (PID > 0);
inline int OPT_name = (NAME != "");
inline int OPT_command = ($target > 0);
inline int OPT_filter = (OPT_pid || OPT_name || OPT_command);

The -a, -c, -e, -o, and -T (totals) options are dropped. I chose not to tab-align the assignment operators, since the alignment does not display correctly in Chime's program viewer (in spite of monospace font). I think that inlining the cacheable expressions should be more efficient than assigning them to non-cacheable global variables in the BEGIN clause, but I may be missing a more efficient way to do this.

Differences such as the removal of the END clause along with its printa() statements were already explained in the previous entry; I won't repeat them here. The unwrapped program is now ready for display in Chime:

/opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \\
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time"

The above command generates a draft display (I could have omitted the -h and -t options for this first cut). The most important part of the command to note here is the -xdefaultargs option. This sets the "defaultargs" DTrace compile-time option, which tells DTrace to use zero (0) or empty string ("") as the value for unspecified macro args. Since PID, NAME, and COMMAND are mutually exclusive, we need to leave all but one unspecified. Without -xdefaultargs, that would fail with the following error:

failed to compile script /opt/OSOL0chime/displays/new/procsystime.d: line 44: macro argument $$1 is not defined

Specifying Macro Arguments

Chime reads the macro variables in your program and generates a dialog to prompt for the argument values:

You can specify labels for the macro variables using the -M option, potentially sparing people the effort of inspecting your program to decide what kind of values are expected:

/opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \\
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time" \\
-M "Process Name, Process ID"

To enforce the mutual exclusivity of these options, surround the macro argument labels after -M with square brackets:

/opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \\
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time" \\
-M "[Process Name, Process ID, \\$target]"

In order to group all three options, it is necessary to include the special string "$target" (escaping the dollar-sign), which indicates the order of the Target Process field relative to the named macro arguments. If the $target variable appears in your DTrace program, and you omit "$target" after -M, the Target Process appears first by default. The above command instead places it last (just for demonstration purposes):

Now it is only possible to specify one of the three options, as intended in the original procsystime script. The first field (in the order specified by -M) is selected initially. The remaining unselected fields are disabled.

It is possible to bracket multiple groups of argument labels after -M, resulting in multiple radio button groups. The first label in each group is selected by default. Without the square brackets, Chime will not generate radio buttons, allowing any or all of the arguments to be specified. (Macro argument grouping can also be done in Chime's display creation wizard.)

Without the -xdefaultargs option, Chime will require all options to be specified before enabling the OK button.

If the macro argument labels are not enough to make clear what is expected, a person can still inspect the DTrace program by clicking the View D Program button:

The procsystime Display

After selecting Target Process in the argument prompt and clicking OK (accepting the "date" value already specified with -c on the command line), the following display appears:

This is a rough first cut, with many opportunities for improvement:
  • Add a totals row (to satisfy the original -T option)
  • Link the ranges of the Elapsed Time and CPU Time columns so they are visually comparable
  • Add unit labels to the nanosecond time values
  • Use a different color for the Count column, to differentiate it from the time columns
  • Set the initial sort to Elapsed Time descending
  • Make the window a little wider
As with bitesize.d, we'll also want to accumulate values over time, rather than clearing them after each time interval. These changes are not yet supported from the command line, so we need to save what we've done so far by repeating the command with the -w option:

; /opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \\
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time" \\
-M "[Process Name, Process ID, \\$target]" -w
Wrote /opt/OSOL0chime/displays/new/process_system_call_time.xml
To run, enter "/opt/OSOL0chime/bin/chime -C /opt/OSOL0chime/displays/new/process_system_call_time.xml".

This writes the .xml display description that we can edit using Chime's display wizard. Both the previous entry and the New Display Wizard page give examples of how to make these kinds of changes, so I won't go through the steps here. Instead I'll skip to the final display after all the changes are finished:

I also took some text from the comments in the procsystime script to put in the Description pane when the new display is selected:

The description helps people understand what they're looking at and remains visible while the display is running as long as the new display is selected.

To try the procsystime display for yourself, add the following files to /opt/OSOL0chime/displays/new: Then run /opt/OSOL0chime/bin/chime without any options and select New Displays from the Trace Group pulldown. Finally, double-click Process System Call Time ... from the Traces list. Macro argument grouping (for mutually exclusive options) is a new feature in Chime version 1.4.20 (16 Jan 2007), so if you don't already have that version, you can download it here.

Thursday Jan 04, 2007

Chime and the DTraceToolkit

A while back someone suggested in the "Chime Comments/Questions" thread on the DTrace discussion list

"Maybe we can all chip in money and beer (or pop if he doesn't drink alcohol) to get Brendan to integrate his toolkit with chime. :)"

I've been meaning to look into this myself. Chime is a graphical tool for displaying DTrace aggregations. Its ability to sort by multiple columns, display column totals, and plot aggregation values over time might provide useful advantages over command-line output for many of the scripts in the toolkit. I wanted to see if the idea of integrating the toolkit is feasible.

Generating a Display from the Toolkit

First, I downloaded and installed the DTraceToolkit. Next I scanned Brendan's DTrace Tools page for a simple script and found bitesize.d:

"a simple program to examine the way in which processes use the disks ..."

This sounded like a good one to start with, so I located bitesize.d in the toolkit:

; find . -name bitesize.d

I diff'd the files and found that they are identical, so I ran the one in Bin to get an idea of what Chime ought to display:

; ./Bin/bitesize.d
Tracing... Hit Ctrl-C to end.

     PID  CMD


Nothing. Of course, I needed to generate some I/O in order to get any data. So I ran it again, and this time I wrote some bytes to a file in another window:

; echo "cat dog mouse" > tmp.txt

then I pressed Ctrl-C:

; ./Bin/bitesize.d
Tracing... Hit Ctrl-C to end.

     PID  CMD
       3  fsflush\\0

           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            1024 |                                         0

       0  sched\\0

           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@                     1
            8192 |@@@@@@@@@@@@@@@@@@@@                     1
           16384 |                                         0


Without even looking at the script, I could see that it's a distribution of I/O size in bytes aggregated by process ID and command. Chime has -n and -s options like those in dtrace(1M) for running arbitrary DTrace programs, so I used the -s option to specify the program file:

; /opt/OSOL0chime/bin/chime -s ./Bin/bitesize.d
WARNING: printa on line 79 makes aggregation @Size unavailable to Chime.

An empty Chime window accompanied the console warning. No problem. This issue with printa() is covered on the Chime project page under Adding New Chime Displays (a link in the left margin) in the section Adapting an Existing DTrace Program, item 2:

If you use the printa() action on an aggregation, that aggregation will thereafter be unavailable to Chime. Typically, printa() is found in a profile probe such as tick-1sec. You should delete these tick clauses from your program. Instead of printing your aggregations, you are now relying on Chime to display them.

So I made a copy of bitesize.d with the following changes:

; diff ./Bin/bitesize.d ~/bitesize.d
>  \* 02-Jan-2007        Tom Erickson    Adapted for use in Chime.
< #pragma D option quiet
<  \* Print header
<  \*/
< dtrace:::BEGIN
< {
<       printf("Tracing... Hit Ctrl-C to end.\\n");
< }
< /\*
< }
< /\*
<  \* Print final report
<  \*/
< dtrace:::END
< {
<       printf("\\n%8s  %s\\n", "PID", "CMD");
<       printa("%8d  %S\\n%@d\\n", @Size);

There is no need to print anything in the BEGIN and END clauses, so I deleted them. Chime will display the aggregated distribution on a regular interval. That interval will default to 1 second, but Chime lets you change the interval while the display is running. (It's also possible to set the initial interval by specifying the aggrate option in the display's XML description.)

It would be nice if Chime could run bitesize.d without modifications. If there was a DTrace option to ignore specified actions, Chime could tell DTrace to ignore both printa() and clear(), but for now you need to delete these actions from existing DTrace programs.

So I tried again with the modified bitesize.d:

; /opt/OSOL0chime/bin/chime -s ~/bitesize.d

empty Chime display of bitesize.d

Although the console warning about printa() was gone, I still got the same empty Chime window. Of course the reason is that I need to generate some I/O in order to get any data (just as I do when running the script on the command line). Yes, Chime should print a message to that effect on the console. I'll add that soon. (When a colleague of mine is asked for work status he responds, "It's perfect and getting better every day."). Fortunately Chime has the option to wait a given number of seconds before generating the display, just for cases like this, giving time to generate I/O in another window:

; /opt/OSOL0chime/bin/chime -s ~/bitesize.d -S 10

By default, Chime waits one second before generating the display. Waiting ten seconds lets us run the following in another window, as before:

; echo "cat dog mouse" > tmp.txt

As I write this, I realize that a better solution might be to drop the -S option and instead make Chime wait as long as it needs for non-empty aggregations before generating the display (although it would need some way to know which aggregations to wait for); but for now, a ten second countdown on the console gets the job done. This time a working display appeared:

Generated display of bitesize.d

The generated column names are taken from the DTrace program (except for "bucket" over the I/O size buckets, which lack a name within the program). Also the default title "Display" in the title bar isn't very helpful. Still this is a good start, and we can easily improve on it:

; /opt/OSOL0chime/bin/chime -s ~/bitesize.d -S 10 -h "PID,Command,Bytes,Count" \\
-t "I/O Size Distribution"

Generated display of bitesize.d

This looked good, so I ran it once more with the -w option to write the .xml display description:

; /opt/OSOL0chime/bin/chime -s ~/bitesize.d -S 10 -h "PID,Command,Bytes,Count" \\
-t "I/O Size Distribution" -w
Wrote /opt/OSOL0chime/displays/new/bitesize.d
Wrote /opt/OSOL0chime/displays/new/i_o_size_distribution.xml
To run, enter "/opt/OSOL0chime/bin/chime -C /opt/OSOL0chime/displays/new/i_o_size_distribution.xml".

Chime prints a message telling you how to run the generated display. The -S option to delay a given number of seconds will no longer be necessary, since the display is already generated (it no longer needs data to generate columns and column headers) and will, from now on, run successfully even before any data appears.


Chime also borrows -Z and -x from dtrace(1M) to specify DTrace options for the generated display. For example, if the display was failing due to drops, you could add -xaggsize=4m to increase the size of the aggregation buffer (from Chime's default 256 KB). If you wanted to change the initial interval from one second to five seconds, you would specify -xaggrate=5s.

To make further changes to the display not supported on the command line, you can edit the XML description in Chime's New Display Wizard. For example, bitesize.d does not use a tick-1sec clause to print data on a regular interval, but instead prints the accumulated data once at the very end. This probably indicates that a display of running totals would more closely match the intention of the script than a display of values per time interval. To access the wizard, run Chime without any options and click the toolbar icon to invoke the New Display Wizard icon in the toolbar:

New Display Wizard

Click the "Browse ..." button and select i_o_size_distribution.xml to load the generated display description. Then click "Next" at the bottom of the wizard twice to reach the "Set Cleared Aggregations" step:

New Display Wizard

Click "Clear only selected aggregations" and leave the "@Size" aggregation unchecked. This will result in running totals, since the aggregation values will never get reset to zero. Then click "Next" at the bottom of the wizard three more times to reach the "Provide a Description" step:

New Display Wizard

The above description is taken (mostly unchanged) from the comments in the original bitesize.d file. Enter the description and click "Finish".

To run the display, first select "New Displays" from the "Trace Group" pulldown:

New Display Wizard

Select "I/O Size Distribution" (the display title specified with -t on the command line) to view its description:

New Display Wizard

Double-click "I/O Size Distribution" to run the display:

New Display Wizard

Now the counts grow over time, just like they do when running the script on the command line, except that we can see them grow instead of blindly picking the instant for the final viewable result (the Chime display can also be paused at any time).

One script from the DTraceToolkit is now integrated with Chime. Granted, bitesize.d is probably the simplest script in the toolkit (other than the one-liners, which you can try as-is with Chime's -n option, except for a few that don't use aggregations; for example /opt/OSOL0chime/bin/chime -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'). The other scripts I looked at will not integrate so easily, since they are shell scripts with extensive options and many do not use aggregations. I'm thinking that some will need custom visualization tools to do them justice. I still think that many of them can be adapted in a way that preserves the original intention and increases usefulness, but the process will be more involved (for example, deciding if changing a script to use aggregations defeats the original performance considerations, or deciding which options are supportable by equivalent features in Chime, or which options justify multiple displays, etc.). Still I hope that this bitesize.d example gives a good idea of how to go about integrating an existing script with Chime.

After integrating a few more scripts from the toolkit, the next thing to do will be to create a separate "DTraceToolkit" subdirectory for them under /opt/OSOL0chime/displays, move the appropriate .xml and .d files from /opt/OSOL0chime/displays/new to the new subdirectory, and add a description.xml file to summarize the directory contents. Thereafter the group of displays can be selected from the "Trace Group" pulldown and a description of the group will appear in the "Description" pane. If many scripts from the toolkit are integrated, they can be organized in a subtree under the "DTraceToolkit" directory just as they are in the toolkit itself. Chime automatically generates submenus under the "Load" menu (under "File" in the menu bar) that mirror the directory subtree under /opt/OSOL0chime/displays. The "Trace Group" pulldown provides a flat list of the trace groups most recently loaded (remembered across multiple Chime sessions).

It would be nice if Chime could manage display directories for you, but for now you still need to create the directories manually. That's on my to-do list, along with a text pane to write the directory description (you'll need to copy and modify an existing description.xml file into each new directory for now).

Let me know if there's a particular script that you'd like to see integrated. If you're interested in integrating a script on your own, excellent! You are welcome to post it on the DTrace mailing list (that is the official discussion list for the Chime project).

Saturday Apr 22, 2006

Java DTrace API

Java DTrace API

Java DTrace API

The Java DTrace API is now available on Solaris Nevada (build 35 or later). This is the first public programming interface for DTrace. (The native DTrace library implemented in C is currently private.) Bindings for other programming/scripting languages should eventually arrive on the scene as well.

The Chime project page on OpenSolaris has more information about the API, including source and examples. One example to highlight here: I thought that a Java implementation of dtrace(1M) would be a great way to demonstrate how to use the new API to access functionality associated with familiar dtrace command options. Get jdtrace and give it a try. The majority of options are supported and in most cases the output of jdtrace and dtrace(1M) should be identical. I also emulated the behavior of Ctrl-C (printing unprinted aggregations before exiting) using Runtime.getRuntime().addShutdownHook() (this Java Developer Forums thread has other suggestions for handling Ctrl-C in Java).

Primarily I use jdtrace to leverage the existing DTrace test suite (and its thousand-or-so D programs) for testing of the Java DTrace API. I hope it also serves as a source of programming examples.

I should also mention the Chime Visualization Tool for DTrace, a new project on OpenSolaris that uses the Java DTrace API to display multiple running aggregations simultaneously. Please visit the project page and share your feedback on the dtrace-discuss list.

Tuesday Jun 14, 2005

Tom Erickson's Weblog

Using libdtrace

Using libdtrace

On this occasion of the full release of OpenSolaris I thought I would share some of my experiences using libdtracelibdtrace is the library to programmatically control DTrace, and I have gained some familiarity with it in the course of developing a Java DTrace API (available soon!).  While libdtrace is still private (and therefore still subject to change), I thought it would be worthwhile to at least describe what's there and some of the issues I encountered.

I first approached libdtrace with the obvious questions: How do I use it?  What functionality is available?  How do I get data out of it?  I started with existing programs that use libdtrace.  From the source for the dtrace command (dtrace.c) I learned the natural life cycle of a DTrace consumer:
open() returns a handle needed to access libdtrace functionality  (a dtrace_hdl_t pointer, the first argument to all consumer functions).  Two compile() functions both take a D program, one as a string and the other as a file.  A single consumer may compile multiple programs, as happens when you specify multiple -n or -s options to the dtrace command.  I was at first confused by exec() and go(), which sound like the same thing, until I learned that exec() is what enables probes, instrumenting code at matching probe points.  It is called once for each compiled D program and as an extra favor gives back information about the resulting program enablings such as the number of matching probes.  go() is a non-blocking call issued once to begin tracing data from all enabled probes, writing the output of probe actions into per-consumer DTrace buffers whenever a probe fires.

stop() on second thought isn't exactly a natural stage in the life of a DTrace consumer.  A consumer dies of natural causes when it encounters the exit() action in a D program, and there is no need to call stop()stop() is a programmatic way of interrupting a consumer outside of any D program action.  Whether or not the consumer dies of natural causes, it leaves any unconsumed data available to functions like dtrace_aggregate_snap() and dtrace_aggregate_walk().  Finally, close() releases all system resources held by the consumer.

After dtrace.c, I looked at the source for the intrstat command (intrstat.c), another example of a DTrace consumer. The biggest difference between it and the dtrace command is the absence of calls to dtrace_sleep() and dtrace_work().  Modeling the Java DTrace API after intrstat was a bit of a false start, based partly on the assumption that API users were in most cases interested only in  aggregations.  It took a while before I understood the difference between these two approaches to consuming DTrace data.  In particular, I remember going back to these examples one day and wondering why dtrace.c did not call dtrace_aggregate_print() until after exiting its consumer loop:

do {
        if (!g_intr && !done)

        if (g_newline) {
                 \* Output a newline just to make the output look
                 \* slightly cleaner.  Note that we do this even in
                 \* "quiet" mode...
                g_newline = 0;

        if (done || g_intr || (g_psc != 0 && g_pslive == 0)) {
                done = 1;
                if (dtrace_stop(g_dtp) == -1)
                        dfatal("couldn't stop tracing");

        switch (dtrace_work(g_dtp, g_ofp, chew, chewrec, NULL)) {
                done = 1;
                if (!g_impatient && dtrace_errno(g_dtp) != EINTR)
                        dfatal("processing aborted");

        if (fflush(g_ofp) == EOF)
} while (!done);

if (!g_impatient) {
        if (dtrace_aggregate_print(g_dtp, g_ofp, NULL) == -1 &&
            dtrace_errno(g_dtp) != EINTR)
                dfatal("failed to print aggregations");


whereas intrstat.c was calling dtrace_aggregate_snap(), dtrace_aggregate_walk(),and dtrace_aggregate_clear() repeatedly inside of its consumer loop:

while (indefinite || iter) {

        (void) sigsuspend(&set);

        (void) dtrace_status(g_dtp);

        if (g_intr == 0)


        now = gethrtime();
        g_interval = now - last;
        last = now;

        if (dtrace_aggregate_snap(g_dtp) != 0)
                fatal("failed to add to aggregate");

        g_start = g_end = 0;

        do {
                g_header = 1;

                if (dtrace_aggregate_walk_keyvarsorted(g_dtp,
                    walk, NULL) != 0)
                        fatal("failed to sort aggregate");

                if (g_start == g_end)
        } while ((g_start = g_end) < g_max_cpus);


The answer is that the various aggregate functions act on all aggregations at once - all aggregations from all enabled D programs - regardless of the actions in the probe clauses of those D programs (a programmatic alternative to relying on D actions).  That approach is not appropriate for the dtrace command, which generates output only from D program actions until interrupted with Ctrl-C, when it prints all unconsumed aggregation data before exiting.  Consider the following D program:

        @calls[execname] = count();
        @fcalls[probefunc] = count();


Without the tick probe, this program produces no output when run with dtrace -s (except the number of matching probes) until Ctrl-C, when dtrace_aggregate_print() prints both the @calls and the @fcalls aggregations together.  With the tick probe, however, dtrace prints the @calls aggregation once every second, but waits to print the @fcalls aggregation until Ctrl-C.  In fact, it skips all but unprinted aggregations (i.e. aggregations that have not already been printed at least once as a result of the printa() action), in this case printing only the @fcalls aggregation after Ctrl-C.

The D program in intrstat.c, on the other hand, has no need of a tick probe, because intrstat relies completely on aggregate functions to process all aggregations at once:

static const char \*g_prog =
"       self->ts = vtimestamp;"
"       this->devi = (struct dev_info \*)arg0;"
"       @counts[stringof(`devnamesp[this->devi->devi_major].dn_name),"
"            this->devi->devi_instance] = count();"
"       @times[stringof(`devnamesp[this->devi->devi_major].dn_name),"
"            this->devi->devi_instance] = sum(vtimestamp - self->ts);"
"       self->ts = 0;"

This is convenient for a command that collects aggregated statistics on its own programmatic interval.  It works because intrstat does not need to support DTrace options like aggrate and switchrate (so no need for dtrace_sleep() ), and because it handles all output itself and does not need any output-producing actions in its D program.

The Java DTrace API, however, needs to support all kinds of D programs, those that use output-producing actions and those that do not.  Like dtrace (and unlike intrstat) it needs to call dtrace_work(), because that is the only way to consume the output of D actions like trace(), printf(), and printa().  It also needs to support options like aggrate and switchrate, so it uses dtrace_sleep() to regulate its consumer loop, which wakes up as often as needed to honor all D options that control consumption rates.   However, this introduces another problem, because  dtrace_work()  sends its output to a file (stdout by default), whereas the Java DTrace API must not print to a file but rather generate data instances for the user of the API and leave that user in control of how (or whether) to output that data.  For example, an application using the API might not print any text at all, but instead display the information in a graph.  Like intrstat, the Java API needs to take control of what to do with the data it consumes.

libdtrace provides for this need with dtrace_handle_buffered(), which lets a user register a function to be called whenever there is buffered output that would otherwise be written to a file.  This is similar to the callback mechanism libdtrace employs to handle data drops, runtime errors, and target process completion (of interest when running dtrace -c).  For example, in dtrace.c:

if (g_mode != DMODE_LIST) {
        if (dtrace_handle_err(g_dtp, &errhandler, NULL) == -1)
                dfatal("failed to establish error handler");

        if (dtrace_handle_drop(g_dtp, &drophandler, NULL) == -1)
                dfatal("failed to establish drop handler");

        if (dtrace_handle_proc(g_dtp, &prochandler, NULL) == -1)
                dfatal("failed to establish proc handler");

These calls register three user-defined functions (implemented in dtrace.c) to call under the appropriate circumstances: errhandler(), drophandler(), and prochandler().  When the Java DTrace API registers a similar callback for buffered output with dtrace_handle_buffered(), it has the added effect of turning off output to the usual output file.  In all of these callback registration examples, the resulting calls to the registered handler function occur in the same thread that calls dtrace_work().  Here I found something that I consider especially considerate on the part of the libdtrace developers.  The NULL argument to each of the three handler registrations (above) is passed to a parameter (unused in this case) that could instead take a pointer to a user argument that is forwarded to every handler invocation.  Anything special needed by your handler function can be provided via the user arg set when the handler is registered.  In the case of the Java API, this gives the handler access to the JNI environment, including cached classes and method IDs, needed to pass data back to interested Java listeners.

Once the Java API registers a buffered output handler, it can use dtrace_work() to get probe data printed to an in-memory buffer instead of to a file.  Now that the Java API can support output-producing D actions like trace(), printf(), and printa(), we still want it to support the convenience enjoyed by intrstat, that is, the ability to request all aggregations at once on a programmatic interval outside of any D actions.  Supporting both approaches simultaneously requires a per-consumer lock to prevent the needed dtrace_aggregate_print() from interfering with dtrace_work(). (These two approaches to data consumption are inherently asynchronous.)

While on the subject of threads, it is also possible to run multiple consumers simultaneously, as long as you protect calls to libdtrace with a single global lock.  Fortunately exempt from this requirement are the most time-consuming calls, dtrace_work() and its competing dtrace_aggregate_print(), which only need to be kept from interfering with each other.  Note that the API uses dtrace_aggregate_print() instead of dtrace_aggregate_walk() in order to leverage the buffered output handler (which otherwise does not get called during aggregate traversal).

Using the libdtrace Buffered Output Handler

The purpose of the buffered output handler (bufhandler) is to make the output of the libdtrace print routines available without writing it to a file (such as stdout).  This is needed for the stack(), ustack(), and jstack() actions to get human-readable stack values, since there is no public function in libdtrace to convert the internal representation of stack values.  It is also used to get the formatted string resulting from the D printf() action.  The mechanism anticipates other similar formatting requirements for which a public libdtrace conversion function is not suitable.

dtrace_handle_buffered() takes a handler function of type dtrace_handle_buffered_f and a user argument forwarded to each callback.  In addition to the user argument, the bufhandler takes a pointer to a dtrace_bufdata_t structure:

typedef struct dtrace_bufdata {
        dtrace_hdl_t \*dtbda_handle;             /\* handle to DTrace library \*/
        const char \*dtbda_buffered;             /\* buffered output \*/
        dtrace_probedata_t \*dtbda_probe;        /\* probe data \*/
        dtrace_recdesc_t \*dtbda_recdesc;        /\* record description \*/
        dtrace_aggdata_t \*dtbda_aggdata;        /\* aggregation data, if agg. \*/
} dtrace_bufdata_t;

In addition to the formatted output (dtbda_buffered) of the libdtrace print routines normally sent to a file, the bufdata structure contains any probedata (including the current probedata record) and any aggregation data processed while generating that output.

The bufhandler is called once for each output-producing, non-aggregating D action, such as trace() or printf(), and once for each libdtrace aggregation record (whether in response to the D printa() action or an asynchronous aggregate request).   There is one libdtrace aggregation record per tuple element plus one for the corresponding value.  The bufhandler is called with a null record at the end of each complete tuple/value pair.

Using the buffered output handler leads to a few interesting observations:
  • The buffered output handler (bufhandler) is never called with action type DTRACEACT_PRINTA.
  • The bufhandler never sees the D exit() action
  • The base probedata pointer (bufdata->dtbda_probe->dtpda_data) is overwritten during probedata consumption and needs to be cached ahead of the bufhandler callback (in the probe-handling function, see below)
The points listed above highlight the need to coordinate use of the bufhandler with two other callback functions passed to dtrace_work():
Both of these functions also take a pointer to a user arg optionally passed to dtrace_work() and forwarded to each callback.

The probe-consuming function, dtrace_consume_probe_f, is called once each time a probe fires and ahead of any calls to the record consuming function or the bufhandler (for that probe's data).  It takes a probedata argument that includes an array of records corresponding to actions and statements in that probe.  In the case of the printf() action it includes a record for each unformatted printf element.  A statement that is not a D action, such as assignment to a variable, can be distinguished from an action by the fact that it has no size (rec->dtrd_size == 0).

The record-consuming function, dtrace_consume_rec_f, is called once for each unskipped probedata record and ahead of any call to the bufhandler (resulting from output associated with that record).  It differs from the bufhandler as follows:
  1. It does not have access to libdtrace-formatted output.
  2. It is called for each D program statement, including non-actions (size in probedata buffer is zero); the bufhandler is not bothered with non-actions, since they produce no output.
  3. It is called for the D exit() action; the bufhandler is not.
  4. In response to the printa() action, it is called with a record having an action of type DTRACEACT_PRINTA. The bufhandler never sees that action type, only the output-producing aggregation records (with action types such as DTRACEACT_STACK and DTRACEAGG_COUNT).
  5. It is called with a NULL record at the end of each probedata.
I said the record handler is called for each "unskipped" record because it is called only once for the printf() action, even though there may be multiple probedata records corresponding to the unformatted elements of that printf().   The knowledge of how many probedata records to consume to process a single printf() is private to libdtrace, so we cannot know ahead of time how many probedata records belong to the current printf().  Instead, we need to look back at how many records since the previous call were skipped to reach the current probedata buffer offset, for example:

for (r = prev_probedata_index;
        ((r < edesc->dtepd_nrecs) &&
        (edesc->dtepd_rec[r].dtrd_offset < rec->dtrd_offset));

Here edesc refers to the enabled probe description (metadata that includes an array of all the probedata records).  Records from prev_probedata_index up to and not including the current record index, r, are associated with the previous printf() action.  For proof of these observations, the following dtrace command gives a glimpse into the internal workings just described:

$ dtrace -n 'BEGIN { n = 1; printf("%s %d\\n", "cat", 9); trace(1/--n); exit(0); }'
dtrace: description 'BEGIN ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): divide-by-zero in action #4 at DIF offset 20

The divide-by-zero error message says it failed in action #4, even though there are only two actions in the program specified with the -n option: printf() and trace().  Internally, however, the assignment n = 1 is the first probedata record with action type DTRACEACT_DIFEXPR and size zero.  The printf() accounts for the next two probedata records, one for each unformatted element: "cat" and 9.  The fourth record (with action type DTRACEACT_DIFEXPR) is the trace() action with the divide-by-zero error (action type DTRACEACT_DIFEXPR indicates the trace() action unless size is zero).  From an implementation point of view, trace() is in this case "action #4".

By the time the action type DTRACEACT_PRINTF reaches the bufhandler, the probe- and record-consuming functions have already had their chance to see all information about that printf() except for the final formatted string.  That is now available as bufdata->dtbda_buffered.

The bufhandler should normally return DTRACE_HANDLE_OK, but in the case of an unexpected program error, returning DTRACE_HANDLE_ABORT stops the consumer prematurely.  Similarly, the record- and probe-consuming functions normally return DTRACE_CONSUME_THIS, but may return DTRACE_CONSUME_ABORT to stop the consumer.  If the current record is the exit() action  or  is  NULL (to delimit probedata), return DTRACE_CONSUME_NEXT.

Unlike the probe- and record-consuming functions, the bufhandler is called not only from dtrace_work() but also from dtrace_aggregate_print() (in an intrstat-style aggregate request).  This means that the user arg passed to dtrace_handle_buffered() is forwarded to the bufhandler in both contexts, even though it may contain state that is invalid outside of the thread that called dtrace_handle_buffered() (the same thread that calls dtrace_work() ).  To avoid the resulting (probably surprising) errors in callbacks from dtrace_aggregate_print(), you can use pthread_key_create() to create a user arg capable of obtaining user data specific to each context.  You just need to associate user data with the created key in each thread using pthread_setspecific(), then in the bufhandler use pthread_getspecific() to obtain the user data specific to the current thread (dtrace_work() or dtrace_aggregate_print() ).  In general, take care that state recorded in the bufhandler during dtrace_work() does not interfere with state recorded in the bufhandler during dtrace_aggregate_print() and vice-versa.

Technorati Tag:
Technorati Tag:
Technorati Tag:




« April 2014

No bookmarks in folder