Tuesday Jul 29, 2014

DTrace improvements in Oracle Solaris 11.2

There have been a few improvements to DTrace in Solaris 11.2.

llquantize()

DTrace has quantize() and lquantize() aggregating actions to give you, respectively, a power-of-two distribution and a linear distribution of data points that you're interested in.  While these are both useful, there may be instances in which you want to examine events whose latencies span multiple orders of magnitude, but for which you want relatively fine-grained information about the data within each order of magnitude.  With quantize(), there's likely to be insufficient detail. You could use lquantize(), but you'd need multiple aggregations to cover the multiple orders of magnitude.

In 11.2, we have added a log-linear quantize aggregation, llquantize().  This aggregating action allows you to specify a base and a range of exponents for the data, but it also allows you to specify a number of steps (or buckets, if you will) per order of magnitude.  For example, the following line will create an aggregation covering the values from 103 through 106 - 1 with 10 steps per order of magnitude.  (The buckets for 105 will include the values from 105 through 106 - 1.):

@ = llquantize(foo, 10, 3, 5, 20);

We can use this in a script to examine system call latencies:

        syscall:::entry
        {
                self->ts = timestamp;
        }

        syscall:::return
        / self->ts /
        {
                @ = llquantize(timestamp - self->ts, 10, 3, 5, 10);
                self->ts = 0;
        }

Because the timestamp is measured in nanoseconds, this script reports system call latencies in the microseconds range.  Here's sample output from this script:

           value  ------------- Distribution ------------- count
          < 1000 |@@@@@@                                   12899
            1000 |@@@@@@@@@@@@                             26357
            2000 |@                                        3202
            3000 |@                                        1869
            4000 |@                                        2110
            5000 |@@                                       4716
            6000 |@@                                       3998
            7000 |@                                        1617
            8000 |@@                                       4924
            9000 |@                                        2515
           10000 |@@@@@@@                                  15307
           20000 |@                                        2240
           30000 |@                                        1327
           40000 |@                                        1369
           50000 |                                         990
           60000 |                                         1057
           70000 |                                         631
           80000 |                                         453
           90000 |                                         434
          100000 |@                                        1570
          200000 |                                         228
          300000 |                                         45
          400000 |                                         59
          500000 |                                         60
          600000 |                                         52
          700000 |                                         30
          800000 |                                         22
          900000 |                                         17
      >= 1000000 |                                         513



Scalability

When DTrace was first conceived, 100 CPUs was a large machine.  Now, the largest machines contain over 1,000 CPUs, and the original DTrace architecture is starting to show its age.  dtrace(1M) (or specifically, libdtrace(3LIB)) was originally written to process data from the CPUs on a server in a single thread.  Unfortunately, dtrace(1M) was unable to to keep up on newer, larger servers with just a single thread.

In Solaris 11.2, we've modified libdtrace to perform this task with multiple threads.  On x86 servers, dtrace(1M) will use one thread per 8 CPUs.  On SPARC servers, it will use one thread per 16 CPUs.  We've also included an option, nworkers, to allow you to request a specific number of threads.

What does this mean for you?  The main benefit of doing this is that dtrace(1M) will actually be able to keep up on larger servers.  During testing on a 256-CPU system, generating 6,000 records per second per CPU, we were seeing hundreds of aggregation drops per second per CPU without this multi-threading framework.  Running the same test with a multi-threaded dtrace(1M), we saw no aggregation drops.

errexit option

Another minor enhancement in Solaris 11.2 is the errexit option.  This option causes dtrace(1M) to exit when it first hits an error.

As a trivial example, consider the following script:

tick-1s
{
        this->i = 0;
        this->j = 5 / this->i;
}

If run normally, this script would run until termination reporting an error once per second:

# dtrace -q -s divide-by-zero.d
dtrace: error on enabled probe ID 1 (ID 5148: profile:::tick-1s): divide-by-zero in action #2 at DIF offset 2
dtrace: error on enabled probe ID 1 (ID 5148: profile:::tick-1s): divide-by-zero in action #2 at DIF offset 2
dtrace: error on enabled probe ID 1 (ID 5148: profile:::tick-1s): divide-by-zero in action #2 at DIF offset 2
^C

#

When run with the errexit option, the script terminates after the first error:

# dtrace -x errexit -q -s divide-by-zero.d
dtrace: error on enabled probe ID 1 (ID 5148: profile:::tick-1s): divide-by-zero in action #2 at DIF offset 2

#

tracemem() enhancements

We've added an optional third argument to tracemem().  Where the second argument specifies how many bytes of memory to trace, the optional third argument specifies how many bytes of memory to display.  This can be useful in cases where the size of the data you care about is variable.  (The DTrace architecture requires that you trace a constant amount of data, but in some cases, the amount of data you're interested in is variable.  What you would currently do is trace enough to capture what you're interested in and ignore the rest.  This optional argument lets dtrace(1M) ignore the garbage for you.)

As an example, consider tracing the beginning of an SSH connection, as seen from the server side.  (This is a simplified example, so we'll just look at what the server writes.):

syscall::write:entry
/execname == "sshd"/
{
        tracemem(copyin(arg1, arg2), 1024, arg2);
}

While running this script and opening a connection from a remote server, we see this:

 CPU     ID                    FUNCTION:NAME
   0   5900                      write:entry
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 53 53 48 2d 32 2e 30 2d 53 75 6e 5f 53 53 48 5f  SSH-2.0-Sun_SSH_
        10: 32 2e 32 0a                                      2.2.

   0   5900                      write:entry
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 25 54 00 00                                      %T..

   1   5900                      write:entry
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 90 39 c4 53 d7 82 01 00 25 54 00 00              .9.S....%T..

   1   5900                      write:entry
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 90 39 c4 53 49 84 01 00 25 54 00 00              .9.SI...%T..

   1   5900                      write:entry
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 90 39 c4 53 d2 84 01 00 25 54 00 00              .9.S....%T..

[ ... ]

This is certainly much better than seeing this:

 CPU     ID                    FUNCTION:NAME
   0   5900                      write:entry
             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 53 53 48 2d 32 2e 30 2d 53 75 6e 5f 53 53 48 5f  SSH-2.0-Sun_SSH_
        10: 32 2e 32 0a 00 00 00 00 18 00 00 00 00 00 00 00  2.2.............
        20: 98 f6 d3 08 25 1f d3 08 01 21 d3 08 00 00 00 00  ....%....!......
        30: 00 00 00 00 01 00 00 00 18 00 00 00 00 00 00 00  ................
        40: c0 23 d2 08 0d 21 d3 08 10 00 00 00 01 00 00 00  .#...!..........
        50: 9d 11 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................
        60: c0 24 d2 08 25 1f d3 08 0d 21 d3 08 00 00 00 00  .$..%....!......
        70: 00 00 00 00 01 00 00 00 18 00 00 00 00 00 00 00  ................
        80: 80 25 d2 08 01 00 00 00 02 00 00 00 03 00 00 00  .%..............
        90: ff ff ff ff 04 00 00 00 18 00 00 00 00 00 00 00  ................
        a0: a0 22 d2 08 2e 21 d3 08 0e 00 00 00 01 00 00 00  ."...!..........
        b0: ab 11 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................
        c0: 00 23 d2 08 2e 21 d3 08 00 00 00 00 25 1f d3 08  .#...!......%...
        d0: 00 00 00 00 01 00 00 00 18 00 00 00 00 00 00 00  ................
        e0: b8 f6 d3 08 00 00 00 00 00 00 00 00 00 00 00 00  ................
        f0: 00 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................
       100: d8 f6 d3 08 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ ... ]
       3d0: 00 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................
       3e0: b8 f9 d3 08 00 00 00 00 00 00 00 00 00 00 00 00  ................
       3f0: 00 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00  ................

tracemem() output consistency

Another enhancement we've made to tracemem() is to modify the behavior of tracemem() for certain traced memory sizes.  Previously, when tracing 1, 2, 4 or 8 bytes, the traced memory would be treated as a signed decimal integer.  When tracing any other amount below 32 bytes, the traced memory was treated as a string if the buffer contained only printable ASCII characters.  For example:

# dtrace -qn 'BEGIN {tracemem(&`initname, 1); exit(0)}'
47
# dtrace -qn 'BEGIN {tracemem(&`initname, 4); exit(0)}'
1920169263
# dtrace -qn 'BEGIN {tracemem(&`initname, 32); exit(0)}'
/usr/sbin/init
# dtrace -qn 'BEGIN {tracemem(&`initname, 64); exit(0)}'

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 2f 75 73 72 2f 73 62 69 6e 2f 69 6e 69 74 00 00  /usr/sbin/init..
        10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        20: 80 00 00 00 01 00 00 00 84 2b ab fb ff ff ff ff  .........+......
        30: ac 2e ab fb ff ff ff ff ac 31 ab fb ff ff ff ff  .........1......

#

We've modified this behavior to be consistent across traced memory sizes.  For example:

# dtrace -qn 'BEGIN {tracemem(&`initname, 1); exit(0)}'

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 2f                                               /

# dtrace -qn 'BEGIN {tracemem(&`initname, 4); exit(0)}'

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 2f 75 73 72                                      /usr

# dtrace -qn 'BEGIN {tracemem(&`initname, 32); exit(0)}'

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 2f 75 73 72 2f 73 62 69 6e 2f 69 6e 69 74 00 00  /usr/sbin/init..
        10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

# dtrace -qn 'BEGIN {tracemem(&`initname, 64); exit(0)}'

             0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
         0: 2f 75 73 72 2f 73 62 69 6e 2f 69 6e 69 74 00 00  /usr/sbin/init..
        10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        20: 80 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        30: 88 32 b6 fb ff ff ff ff 00 36 b6 fb ff ff ff ff  .2.......6......

#

Structure alignment

DTrace has had a longstanding issue with how it calculates alignment for structures, unions, and bit-fields.  At best, this provided an annoyance requiring users to modify their scripts to manually add padding to reflect correct behavior, or at worst, would provide wrong information when using things like the sizeof() action or produce alignment errors where there shouldn't be any issue.  We have made several modifications such that structures, unions, and bit-fields are now more ABI compliant.

In the case of bit fields, depending on where the break is between the bits, some would get lost or placed where they shouldn't.  For example, having a DTrace script written as:

union u1tag
{
        unsigned char c;
        struct
        {
                unsigned int b1:1;
                unsigned int b2:7;
        } s1;
} u1;

BEGIN
{
        u1.c = 255;
        printf ("%d %d %d\n", u1.c, u1.s1.b1, u1.s1.b2);
        u1.c = 0;
        u1.s1.b1 = 1;
        u1.s1.b2 = 127;
        printf ("%d %d %d\n", u1.c, u1.s1.b1, u1.s1.b2);
        exit(0);
}

Would produce a result of:

255 1 0
128 1 127

If you were to write the same thing in C, you would see:

255 1 127
255 1 127

For structs and unions, the problem is more pronounced.  Suppose we have a DTrace script:

typedef struct _my_data {
        uint32_t a;
        uint64_t c;
} my_data;

typedef struct _more_data {
        uint32_t x;
        my_data y;
} more_data;

more_data a;

BEGIN {
        a.y.c = 30;
        printf("%lu\n", a.y.c);
}

When we run it, DTrace will report that this otherwise valid structure will result in an invalid alignment.

 # dtrace -64 -s /tmp/test.d
dtrace: script '/tmp/test.d' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid alignment
(0xffffff08bf603a0c) in action #1 at DIF offset 24

What would be done in the past to work around this issue is to add a padding variable so that the struct _more_data would look like:

typedef struct _more_data {
        uint32_t x;
        uint32_t padding;
        my_data y;
} more_data;

This means that without the padding, not only did we run into alignment issues that would produce runtime errors, but we ran the risk of reporting wrong data within the members of the struct.  This would also impact the size being reported by actions such as sizeof().

For instance suppose we have two structs:

struct s1 {
        int x;          /* 4 bytes */
        short a;        /* 2 bytes */
};

struct s2 {
        struct s1 b;    /* 6 bytes */
        short c;        /* 2 bytes */
        int d;          /* 4 bytes */
};

In this case, sizeof(s1) would report a size of 6, and sizeof(s2) would report a size of 12 instead of 8 and 16 respectively.

One last alignment issue is being addressed with this release, and that deals with the memory alignment restrictions between SPARC and x86 architectures.  In the past, DTrace would force a 4-byte alignment regardless of the architecture the script happened to be running on. With 11.2, alignment is based on the platform since SPARC still requires 4-byte alignment for memory accesses, but the x86 architecture is more lenient and doesn't require such restrictions.

Summing up, by insuring ABI compliance each structure will match their C counterpart, and when you copyin a struct from your program, and rely on the structs being defined inside a header file or your own D scripts, you can rest assured the data is reliable.

Monday Jul 04, 2011

Making yourself indispensible with DTrace

DTrace technology and tools appeared in 2005, and despite this, it is still not very well known among majority of developers and system administrators. It is especially surprising given that for five and a half years since the release of Solaris 10, where DTrace was first implemented, there was no better technology made for analyzing of operating systems and applications.[Read More]

Tuesday May 31, 2011

A D Script for Observing the Network - Take Two

About a year and a half ago I wrote about a nifty little D script I came across while working with the now defunct Crossbow Virtual Wire Demo Tool. However, the script stopped compiling somewhere between the release of OpenSolaris 2009.06 and Solaris 11 Express. 

Solaris 11 Express also included the addition of two new DTrace providers, TCP and UDP. These new providers make it possible to simplify the script and Solaris Systems Architect Andrew Gabriel did this.[Read More]

Tuesday Nov 10, 2009

A D Script for Observing the Network

A little gem included with the Crossbow Virtual Wire Demo Tool is a DTrace script and associated Chime display written by Crossbow engineer Kais Belgaied. I've augmented the script a bit by adding an END clause to format the output, which shows the number of bits transferred over the period of time the script is run:[Read More]

Wednesday Oct 28, 2009

D Script Archeology

I've read that the best way to learn DTrace is to study existing D scripts. However, upon doing so I quickly find myself asking "where is that documented?" And it turns out, with tens of thousands of probes, as is often the case, it's not. So I'd love to hear what others with more DTrace experience do, but these are the steps I seem to have stumbled upon.[Read More]

Friday Oct 23, 2009

DTrace Quick Start Guide

Sun ISV engineer and DTrace evangelist Angelo Rajadurai has just authored a new quick start guide to DTrace. The guide is based on his "Observing Applications with DTrace" presentation (slides, video) at CommunityOne West and focuses on tracing web applications. The examples from the book have also been added to a wiki page for easier copy & paste.

(click the image to open PDF)

Wednesday Oct 21, 2009

DTrace Toolkit

A hidden little gem that comes included with your installation of OpenSolaris is the DTrace Toolkit. You'll find it installed at /opt/DTT.

The DTrace Toolkit is a collection of tools written by volunteers of the DTrace community, led by Brendan Gregg. What I find most impressive is how well it has been put together. There are 230 executable DTrace scripts, each with their own man page. The scripts are organized by type (CPU analysis, I/O analysis, etc.) so they are easy to study, but they've all been symlinked into a Bin directory, so they are easy to put on your path. I've added the following to my ~/.bashrc file:

#
# DTrace Toolkit
#
PATH=$PATH:/opt/DTT/Bin
MANPATH=$MANPATH:/opt/DTT/Man

In addition to their man pages, all of the scrips have a -h help option, for example:

bleonard@os200906:~$ execsnoop -h
USAGE: execsnoop [-a|-A|-ehjsvZ] [-c command]
       execsnoop                # default output
                -a              # print all data
                -A              # dump all data, space delimited
                -e              # safe output, parseable
                -j              # print project ID
                -s              # print start time, us
                -v              # print start time, string
                -Z              # print zonename
                -c command      # command name to snoop
  eg,
        execsnoop -v            # human readable timestamps
        execsnoop -Z		# print zonename
        execsnoop -c ls         # snoop ls commands only

Note, if you want to run the DTrace scripts from your user account, run the following command to add the needed privileges, substituting in your user name:

pfexec usermod -K defaultpriv=basic,dtrace_user,dtrace_proc,dtrace_kernel <user>

To get started learning all that's available, I recommend reading the DTrace Tookkit page. Also, Stefan Parvu has an excellent presentation on DTrace and the toolkit that's worth a look.

With this toolkit there's probably little need to write a DTrace script from scratch - it's literally a treasure chest of material that I've only begun to explore.

Wednesday Jul 29, 2009

Remote Systems Tracing with Chime

Chime has a nifty little feature where you can run it in server mode and then monitor the system remotely using the Chime GUI.[Read More]

Friday Jul 24, 2009

Chime

Chime is a graphical user interface for DTrace that makes viewing DTrace output enjoyable.[Read More]

Saturday Apr 11, 2009

Another OpenSolaris presentation from Community One

Nick Solter and Dave Miner (authors of OpenSolaris Bible) presented a session called Becoming an OpenSolaris Power User, which covers topics such as ZFS, DTrace and networking at Community One. Source: Nick Solter's blog.

Tuesday Mar 03, 2009

OpenSolaris 2008.11 Training in Brno

Me, Martin Man and Lubos Kocman (all three CZOSUG leaders) organized a full day training at Faculty of Informatics Masaryk university, Brno. Sun is partnering this university and we cooperate a lot around OpenSolaris. I've been to the university about 6 times so far. The most recent training was focused on 2008.11 and it was also an install fest. Here are the slides from the event:[Read More]

Thursday Feb 05, 2009

Don't shout at your disks!

A friend of mine from CZOSUG sent me a link to this amazing video - realtime analysis of vibrations caused by sound with DTrace and FishWorks on the Sun 7000 storage (based on OpenSolaris). Lesson learned: shouting can be dangerous to your data!

Friday Aug 29, 2008

What makes OpenSolaris interesting? Reason #3: DTrace

DTrace is very cool. I was skeptical first - because I hate learning a new language for a task, but learning D language is worth it, you can get so much information from the system...

[Read More]

Tuesday Aug 26, 2008

DTrace Privileges

I wanted to learn more about DTrace, so I started at the beginning: Chapter 1 of the Solaris Dynamic Tracing Guide. Using my installation of OpenSolaris 2008.05, I typed in the first command described, but it did not work. This is because in order to use DTrace, my account needs additional privileges that were not specified when the account was created.

To add the necessary privileges to my account, I followed these instructions on the frequently helpful Solaris Internals performance wiki.

[Read More]

Tuesday Aug 05, 2008

New NetBeans plug-in for DTrace

A new version (0.4) of NetBeans DTrace GUI plug-in is ready for download from NetBeans Update Center.

New features include:

- Chime (the DTrace output visualization) is embedded into the NetBeans DTrace GUI Plugin. Chime is fully integrated with NB DTrace GUI.
- The new version of DTraceToolkit is integrated with the new release of NB DTrace GUI. Users can debug and tune their Java, JavaScripts, PyThon, Ruby, Php, C, C++, and etc. applications using the new version of DTrace GUI on Solaris platfrom.
- New & enhanced UI.
- An Open-Sourced project at netbeans.org. Developers can participate in future implementations such as DTrace Editor which includes Dtrace code completion, syntax checking and code folding.
- Works with Sun Studio IDE Express releases.
- Works with NetBeans IDE 6.0, 6.1, and 6.5.

For more information, see:
http://www.netbeans.org/kb/docs/ide/NetBeans_DTrace_GUI_Plugin_0_4.html

About

The Observatory is a blog for users of Oracle Solaris. Tune in here for tips, tricks and more as we explore the Solaris operating system from Oracle.

Search

Archives
« May 2015
SunMonTueWedThuFriSat
     
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
      
Today