Monday Dec 19, 2011

Flame on: Graphing hot paths through X server code

Last week, Brendan Gregg, who wrote the book on DTrace, published a new tool for visualizing hot sections of code by sampling the process stack every few milliseconds and then graphing which stacks were seen the most during the run: Flame Graphs. This looked neat, but my experience has been that I get the most understanding out of things like this by trying them out myself, so I did.

Fortunately, it's very easy to set up, as Brendan provided the tools as two standalone perl scripts you can download from the Flame Graph github repo. The next step is deciding what you want to run it on and capturing the data from a run of that.

It just so happened that a few days before the Flame Graph release, another mail had crossed my inbox that gave me something to look at. Several years ago, I added some Userland Statically Defined Tracing probes to the Xserver code base, creating an Xserver provider for DTrace. These were integrated first into the Solaris Xservers (both Xsun & Xorg) and then merged upstream for the X.Org community's Xserver 1.4 release, where they're available on all platforms with DTrace support.

Earlier this year, Adam Jackson enabled the dtrace hooks in the Fedora Xorg server package, using the SystemTap static probe support for DTrace probe compatibility. He then noticed a performance drop, which isn't supposed to happen, as DTrace probes are simply noops when not being actively traced, and submitted a fix for it upstream.

This was due to two of the probes I'd defined - when the Xserver processes a request from a client, there's a request-start probe just before the request is processed, and a request-done probe right after the request is processed. If you just want to see what requests a client is making you can trace either one, but if you want to measure the time taken to run a request, or determine if something is happening while a request is being processed, you need both of the probes. When they first got integrated, the code was simple:

#ifdef XSERVER_DTRACE
                XSERVER_REQUEST_START(GetRequestName(MAJOROP), MAJOROP,
                              ((xReq *)client->requestBuffer)->length,
                              client->index, client->requestBuffer);
#endif
                // skipping over input checking and auditing, to the main event,
                // the call to the request handling proc for the specified opcode:
                    result = (* client->requestVector[MAJOROP])(client);
#ifdef XSERVER_DTRACE
                XSERVER_REQUEST_DONE(GetRequestName(MAJOROP), MAJOROP,
                              client->sequence, client->index, result);
#endif

The compiler sees XSERVER_REQUEST_START and XSERVER_REQUEST_DONE as simple function calls, so it does whatever work is necessary to set up their arguments and then calls them. Later, during the linking process, the actual call instructions are replaced with noops and the addresses recorded, so that when a dtrace user enables the probe the call can be activated at that time. In these cases, that's not so bad: just a bunch of register accesses and memory loads of things that are going to be needed nearby. The one outlier is GetRequestName(MAJOROP), which looks like a function call but was really just a macro that used the opcode as the index into an array of strings and returned the string name for the opcode, so that DTrace probes could see the request names, especially for extensions which don't have static opcode mappings. For that, the compiler would just load a register with the address of the base of the array and then add the offset of the entry specified by MAJOROP.

All was well and good for a bit, until a later project came along during the Xorg 1.5 development cycle to unify all the different lists of protocol object names in the Xserver, since there were different ones in use by the DTrace probes, the security extensions, and the resource management system. That replaced the simple array lookup macro with a function call. While the function doesn't do a lot more work, it does enough to be noticed, and thus the performance hit was taken in the hot path of request dispatching. Adam's patch to fix this simply uses is-enabled probes to only make those function calls when the probes are actually enabled (a sketch of that pattern follows the numbers below). x11perf testing showed the win on an Athlon 64 3200+ test system running Solaris 11:

Before: 250000000 trep @   0.0001 msec ( 8500000.0/sec): X protocol NoOperation
After:  300000000 trep @   0.0001 msec (10400000.0/sec): X protocol NoOperation 
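For those who haven't used them, is-enabled probes add an XSERVER_REQUEST_START_ENABLED()-style test macro next to each probe macro in the header that dtrace -h generates; the test is itself a tiny patched-in check that evaluates to false until someone actually enables the probe, so all of the argument setup (including the GetRequestName() call) is skipped when nobody is tracing. Adam's change is roughly this shape - a sketch based on that generated-macro convention, not the literal upstream diff:

#ifdef XSERVER_DTRACE
                if (XSERVER_REQUEST_START_ENABLED())
                    XSERVER_REQUEST_START(GetRequestName(MAJOROP), MAJOROP,
                                  ((xReq *)client->requestBuffer)->length,
                                  client->index, client->requestBuffer);
#endif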

But numbers are boring, and this gave an excuse to try out Flame Graphs. To capture the data, I took advantage of synchronization features built into xinit and dtrace -c:

# xinit /usr/sbin/dtrace -x ustackframes=100 \
  -n 'profile-997 /execname == "Xorg"/ { @[ustack()] = count(); }' \
  -o out.user_stacks.noop -c "x11perf -noop" \
  -- -config xorg.conf.dummy
To explain this command, I'll start at the end. For xinit, everything after the double dash is a set of arguments to the Xserver it starts; in this case, Xorg is told to look in the normal config paths for xorg.conf.dummy, which it finds as this simple config file in /etc/X11/xorg.conf.dummy, setting the driver to the Xvfb-like “dummy” driver, which just uses RAM as a frame buffer to take graphics driver considerations out of this test:
Section "Device"
	Identifier  "Card0"
	Driver      "dummy"
EndSection
Since I'm using a modern Xorg version, that's all the configuration needed; all the unspecified sections are autoconfigured. xinit starts the Xserver, waits for it to signal that it's finished its start up, and then runs the first half of the command line as a client with DISPLAY set to the new Xserver. In this case it runs the dtrace command, which sets up the probes based on the examples in the Flame Graphs README and then runs the command specified as the -c argument, the x11perf benchmark tool. When x11perf exits, dtrace stops the probes, generates its report, and exits itself, which in turn causes xinit to shut down the X server and exit.
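Turning the captured stacks into a graph is then just a matter of feeding the output file through the two perl scripts from the repo, roughly as the Flame Graph README describes (file names here are just the ones from the dtrace command above):

$ ./stackcollapse.pl out.user_stacks.noop > out.user_stacks.noop.folded
$ ./flamegraph.pl out.user_stacks.noop.folded > out.user_stacks.noop.svg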

The resulting Flame Graphs are, in their native SVG interactive form:

Before

After


If your browser can't handle those as inline SVG's, or they're scrunched too small to read, try Before (SVG) or Before (PNG), and After (SVG) or After (PNG).

You can see in the first one a bar showing that a little over 10% of the time was in stacks involving LookupMajorName, which is completely gone in the second graph. Those who saw Adam's patch series come across the xorg-devel list last week may also notice the presence of XaceHook calls, which Adam optimized in another patch. Unfortunately, while I did build with that patch as well, we don't get the benefits of it, since the XC-Security extension is on by default and fills in the hooks, so the calls can't just be bypassed as they are when the hooks are empty.

I also took measurements of what Xorg did as gdm started up and a test user logged in, which produced the much larger flame graph you can see in SVG or PNG. As you can see, the recursive calls in the font catalogue scanning functions make for some really tall flame graphs. You can also see that, to no one's surprise, xf86SlowBCopy is slow, and a large portion of the time is spent “blitting” bits from one place to another. Some potential areas for improvement stand out - like the 5.7% of time spent rescanning the font path, because the Solaris gnome session startup scripts make xset fp calls to add the fonts for the current locale to the legacy font path for old clients that still use it, and another nearly 5% handling the ListFonts and ListFontsWithInfo calls, which dtracing with the request-start probes showed were coming from the Java GUI for the Solaris Visual Panels gnome-panel applet.
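For anyone wanting to reproduce that last step, the simplest approach (not the exact script I ran) is to count requests by name and client id with the request-start probe, then match the busy client ids to processes using the client-auth probe or the client-watch.d script from the original provider post further down this page:

# dtrace -n 'Xserver*:::request-start { @[copyinstr(arg0), arg3] = count(); }'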

Now, because of the way the data for these is gathered, you can't tell from looking at them alone whether a wide bar is one really long call to a function (as it is for the main() function bar in all of these) or millions of individual calls (as it was for the ProcNoOperation calls in the x11perf -noop trace). But they do give a quick and easy way to pick out which functions the program is spending most of its time in, as a first pass for figuring out where to dig deeper for potential performance improvements.

Brendan has made these scripts easy to use to generate these graphs, so I encourage you to try them out as well on some sample runs to get familiar with them, so that when you really need them, you know what cases they're good for and how to capture the data and generate the graphs for yourself. Trying really is the best method of learning.

Monday Jun 11, 2007

The Irregular X11 DTrace Companion: Issue 2

In response to a Solaris 10 7/07 beta tester question last week, I updated the Xserver provider for DTrace web page to point out that besides the original method of patching the Xorg sources and rebuilding, pre-built binaries are now included directly in Solaris...

Getting an X server with DTrace probes built in

The Xserver dtrace providers are now included in the regular X server binaries (including Xorg, Xsun, Xvfb, Xephyr, Xnest, & Xprt) in recent Solaris releases.

For those who still want to build from source, the provider is integrated into the X.Org git master repository for the Xserver 1.4 release, planned to be released in August 2007 with X11R7.3, and into the OpenSolaris X code base for Nevada builds 53 and later.

Thursday May 17, 2007

The Irregular X11 DTrace Companion: Issue 1

All the cool kids are doing unscheduled update newsletters now, and I had a collection of X11 Dtrace things to post about piling up, so I figured I'd give it a whirl. (BTW, Nouveau guys - I'd read yours more if you had an RSS feed - having to go check the wiki on an irregular basis doesn't work so well.)

Doing newsletters on a regular basis is hard, as Glynn has learned, so don't expect updates too often...

Sample script for tracing events

Jeremy asked recently for an example DTrace script using the event probes in the Xserver DTrace provider. I had lost the ones I wrote when developing/testing that probe, but dug one out of my e-mail that was written by DTrace Master Guru Bryan Cantrill and posted it as ButtonPress-events.d.

He wrote this script when he'd upgraded to Nevada build 58 and found shift-click was no longer working. Using the script he was able to show that the Xserver was sending ButtonPress events without the ShiftMask set. I wrote a corresponding script to decode the VUID events the X server was getting from the kernel mouse and keyboard drivers to determine the keyboard wasn't giving us the shift key pressed events until another key was pressed, which wasn't happening for shift-clicks with the mouse. That script can be seen in the bug report I filed with the kernel keyboard team as Solaris bug 6526932.

Using DTrace to win a race

A bug was filed with X.Org recently stating “iceauth can dump a core in auth_initialize() if a signal is caught before iceauth_filename has been malloced.” As I was fixing bugs in iceauth anyway, I figured I'd look at it to see if I could roll in a fix to the new iceauth release I'd be making for those. Fortunately, iceauth is a small program, and I could see where I thought the race condition lay between the main code and the signal handler. Testing any race condition has always been a pain in the past, since if you don't get the timing just right you don't know if you really fixed it, or just changed the timing enough to not hit it.

So I pulled up the DTrace manual and DTrace wiki and created a one-liner to force the race condition by sending the signal when I knew it was in the right spot:

 # dtrace -w -c ./iceauth -n 'pid$target::IceLockAuthFile:return { raise(1); }'

For those who haven't memorized the DTrace manual, this can be broken down as:

  • -w - allows “destructive” actions - i.e. those that change the running of the program, and don't just monitor it - in this case, allows sending a signal to the process
  • -c ./iceauth - specifies the command to run with this script, and puts the process id of the command in $target in the script
  • pid$target::IceLockAuthFile:return - sets the probe we're looking for to be the return from any call to the function IceLockAuthFile in the process matching the pid $target
  • { raise(1); } - sets the action to be run when that probe is hit to send signal 1 (aka SIGHUP) to the target process

Ran that and a few seconds later had a nice core file to verify where the race was. Turned out I was right about where the signal had to come in for the race to cause a crash, but was slightly off about the point in the signal handler where it was crashing. The fix was easy, and thanks to DTrace, easy to verify, and is now in X.Org's git repository.

DTrace Wiki: Topics: X

After last month's Silicon Valley OpenSolaris User Group meeting, Brendan asked if I'd work on the “X” topic on the DTrace Wiki. I agreed to, and have put up a small placeholder while I figure out what to flesh it out with - if you have any suggestions or requests for what you'd like to see in a short introductory article on using DTrace to probe X, especially using, but not limited to, the Xserver DTrace provider, send me e-mail or leave a comment here.

Tuesday Nov 07, 2006

Xserver DTrace probes integrated

It's been a little over a year since I posted the initial open release of the DTrace probes for the X server. I've gotten busy with a number of other things since then but after a little prodding from the JDS team [1], I've finally run them by the Sun Architecture Review Committee (PSARC case id 2006/609), got their thumbs up, completed code review, and integrated into our Solaris builds of Xorg & Xsun (which due to code sharing also means they're in Xnest, Xvfb, and Xprt). If they pass QA testing, they should appear in Nevada Build 53. I've also put them into the X.Org community git master branch, so they should be in Xorg 7.3 when it ships next May. (The autoconf code for them only checks for the dtrace command needed to build, not for the OS, so when MacOS X & FreeBSD users get DTrace, they should be able to build the probes as well.)

I've also updated http://people.freedesktop.org/~alanc/dtrace/ with the few changes made to the probes since initial release (added an event probe, and changed one of the arguments made available by the request-done probe) and to include a patch against the Xorg 7.2.RC1 tarball for the current state of the probes for those who don't want to wait.

[1] The conversation went something like this (slightly paraphrased):

  • me: These JavaScript DTrace Probes are cool! We should include them in the FireFox in Solaris!
  • jmr: Yes - and what about the XServer dtrace probes ;)
  • me: oh, um, well, yeah, okay...


Monday Aug 07, 2006

"the open source DTrace, now built into Mac OS X Leopard."

Awesome.

(which reminds me, I still have to put the Xserver dtrace probes into the X.Org git repository - though I guess they'll have to check for more than just $host_os eq "solaris" now in the autoconf script)

Update: Team DTrace speaks.


Monday Dec 05, 2005

Solaris Desktop Summit: Performance Day 1

The Performance portion of the Solaris Desktop Summit got off to a rousing start today with Bryan's talk on the secrets of the DTrace gurus (or at least all the new bits they've added recently and are still working on documenting in the DTrace Guide).

In the afternoon, we broke into working groups, and I joined the team looking at Boot Time. We split our group further, so I worked with gdm maintainer Brian Cameron on trying to see if we could figure out where we could improve the time it takes to start gdm and the X server once the rest of the system is up. We pieced together enough dtrace to determine that about half of the time was spent in gdmlogin, the program that draws the actual login dialog, about a quarter in Xorg itself, and the rest spread across the other processes involved (including the many svcprop calls from the /usr/X11/bin/Xserver script to get the X preferences from the Solaris SMF registry - an area already noted by others as ripe for improvement).
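The basic shape of that first-pass measurement is simple enough; something along these lines (a sketch, not the scripts we actually used) counts which processes are on CPU and which helper commands get spawned while the login is in progress:

# dtrace -n 'profile-997 { @oncpu[execname] = count(); }' \
         -n 'proc:::exec-success { @spawned[execname] = count(); }'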

So we started with the gdmlogin process and dug in - it wasn't long before we hit the limit of my DTrace skills, and we called in Bryan to pinch-hit. [To keep things less confusing in the next part, I'll switch to login names, and refer to Bryan as “bmc” and Brian as “yippi.”] He flew through it - finding things that looked strange and digging in, while yippi looked in the source code for explanations. We saved the scripts and DTrace one-liners on yippi's laptop, and will hopefully be able to post more later. There were three main things we found to investigate/try improving:

  • bmc noticed a lot of time being spent in the close() system call, which seemed strange. Tracking this down to the stack trace involved, yippi saw it was socket closes in the gdmlogin configuration calls. When gdmlogin needs to get a configuration value from the parent gdm daemon it opens a socket, checks the version, asks for the value it needs, then tears down the socket. Over and over again it does this, instead of simply caching the socket and reusing it for future calls, so yippi is looking into seeing if we can convert it to do so.
  • bmc also noticed a lot of stat() system calls, and found two big causes:
    • gdm has a list of possible supported locales and stats the locale directories for each one to see which are installed. We thought of two ways to reduce the cost of these. First, instead of having gdmlogin do this every time, having the parent do it and pass the list to the children could reduce the time on systems with multiple login screens, such as multiseat systems, including Sun Ray. That shouldn't make much difference on a single-user system though, but the second idea - changing the scan to read the parent directory containing all the locales and use the list of its subdirs, instead of checking a huge list of all possible subdirs - could greatly cut down the number of system calls on systems with only a few locales installed (see the sketch after this list).
    • gdm was also scanning lots of directories to find the icons for the gtk theme, including stat'ing the same directory over and over again. Another group is looking into gtk performance and will be looking into this more.
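To make that second locale idea concrete, the change is essentially replacing a stat() per candidate locale with a single scan of whatever is actually installed - a hypothetical sketch (the directory path and output are made up for illustration, not gdm's actual code):

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: list installed locales by reading the locale
 * directory once, instead of stat()ing every locale gdm might support. */
int
main(void)
{
    const char *localedir = "/usr/lib/locale";   /* assumed location */
    DIR *d = opendir(localedir);
    struct dirent *de;

    if (d == NULL)
        return 1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            printf("installed locale: %s\n", de->d_name);
    }
    closedir(d);
    return 0;
}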

We didn't get a chance to make all the changes and test the code - at least not before I left, though yippi was still debugging the first set of changes then. If he didn't finish today, we should be tackling it again tomorrow to see if we can measure improvements from the changes and find any more, or start digging into Xorg.

P.S. John has written a lot more than I did about the Usability portion of the Summit we held last week, including some of the problems we identified and the ideas we kicked around to solve them.


Tuesday Jul 19, 2005

DDC Talk: "Peeking under the Hood"

I've just finished my tutorial presentation on tools for tracing X server/client interactions here at the Desktop Developer's Conference in Ottawa. I've posted both the slides from my talk and the DTrace probes and scripts I showed as one of the tools.

Unfortunately, I didn't get to cover all the tools I wanted to (mostly because I spent a lot more time in the last week on getting the Xorg 6.9 & 7.0 releases ready for their slightly delayed Release Candidate Zero milestones than on my presentation). There are some tools we've developed for our own use in the X group at Sun that I was hoping to be able to release before the talk and cover, but didn't get around to - such as a tool that uses the X-Record extension to generate statistics on the number of times each request is called, which we used when switching from CDE to GNOME to determine which parts of the server needed more optimization attention (Motif & GTK have very different call profiles - for instance, GTK is much more image/pixmap based than Motif was, while Motif used lots of filled rectangles and straight lines to draw its window decorations and widgets).

About half the talk, and most of the interest though, was in the dtrace probes in the X server (which I wrote about in previous posts). A lot of what I showed was doable before, and in many cases, has been done before, but took much more work. For instance, the X Record program I wrote earlier, and an even older program that post-processed xscope output to produce call counts, both gave counts of the number of times a function was called, but they were a lot more work to write than the simple requests.d script. And that script was easily extended to time-requests.d to instead print the average CPU time the X server took to process each request.
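I haven't posted time-requests.d inline here, but the idea is the same as the timing one-liner quoted in the original provider announcement below: remember vtimestamp at request-start and aggregate the difference at request-done. A rough sketch (not necessarily identical to the script from the talk), run with the Xserver pid as its argument:

#!/usr/sbin/dtrace -qs
/* Sketch: average CPU time the X server spends per request type. */

Xserver$1:::request-start
{
        self->start = vtimestamp;
}

Xserver$1:::request-done
/self->start/
{
        @avgtime[copyinstr(arg0)] = avg(vtimestamp - self->start);
        self->start = 0;
}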

Tip for giving presentations using StarOffice 7 in JDS 3 on Solaris 10: When you want to switch from slide show mode to another window to run a demo, exit slide show mode first in StarOffice, and then switch apps. Trying to Alt-Tab directly did amuse my audience (especially the author of an alternative window manager watching from his comfy spot on the couch), but discovering StarOffice/Metacity interaction bugs during a presentation to a less forgiving audience may not be the best demo of our "integrated" desktop. Fortunately, pkill soffice worked well in the terminal window I had switched to.


Monday Jul 11, 2005

Can GNOME startup time be improved via ld flags?

Bryan Cantrill, Master DTrace Guru, First Class, spent some time today looking at what exactly GNOME is doing when you login to a Java Desktop System session on Solaris, and posted his findings to his weblog. (The current JDS on Solaris is based on GNOME 2.6, since that's what was the stable release last year when Solaris 10 hit feature freeze. The JDS team is working on an update to GNOME 2.10 now.)

One of the things Bryan found was that a large part of the I/O time was spent loading shared object text. I took a quick look at some of the binaries and libraries using elfdump, and noticed that there were no signs of using flags that could reduce the time needed to load shared libraries at process startup. Some of these (like -z lazyload) defer work until later - others (like -z combreloc) reduce the work needed whenever it happens.

I sent some suggestions to the JDS team on using these flags and others to improve this, and suggested especially reading the Performance Considerations chapter of the Solaris Libraries and Linkers Guide for more ideas. I also cc'ed the linker gurus, and Senior Linker Alien Rod Evans added a suggestion to try out the check_rtime perl script on the binaries to check for the recommended flags and whether any of the libraries linked against aren't really needed. It's currently set up for use in the build system of the OS/Networking consolidation (the portion of the Solaris sources already released via OpenSolaris), but should be adaptable to the JDS build system or, in fact, any project that wants to try to optimize its library/linker use on Solaris.
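As a minimal, hypothetical example of what “tweaking the flags” means in practice: the options just go on the link line (-z lazyload applies to the libraries that follow it), and elfdump shows whether the lazy-loading markers took effect - the exact output varies by Solaris release:

$ cc -z combreloc -z lazyload -o demo demo.c `pkg-config --cflags --libs gtk+-2.0`
$ elfdump -d demo | grep LAZY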

Unfortunately, just tweaking the flags will mostly help Solaris, though the GNU binutils ld used on Linux and some other platforms offers some similar functionality - it recognizes many of the same -z options, for instance, though I haven't tried them to see how they compare.

Something that may help more on both platforms is ensuring the libraries listed in the various .pc files for GNOME only list the direct requirements, not all the dependencies they depend on as well. For instance, look at what is linked into every program on Solaris that uses the gtk toolkit:

alanc@unknown:~ [2] pkg-config --libs gtk+-2.0
-lgtk-x11-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lm -lmlib -lpangoxft-1.0 
-lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0 -lglib-2.0
But if you run elfdump -d /usr/lib/libgtk-x11-2.0.so you'll see libgtk-x11-2.0.so already lists those dependencies, so duplicating them in the applications simply wastes time, as the runtime linker will load libgtk-x11-2.0.so and then have to check the same list of libraries it already checked in the application (though it should find it's already taken care of them and not duplicate all the work). Additionally, it hardcodes into the applications knowledge of the internals and backends they shouldn't need to know about, and makes it harder to change or replace one of them. While all those libraries need to be listed when statically linking, or on older systems (mainly pre-ELF I think), the pkg-config entries should be streamlined when using ELF shared libraries on modern systems.

[Now Playing: Deep Space 9 series finale (recorded today off Spike TV by our TiVo)]

Friday Jul 08, 2005

Xserver provider for DTrace

A few months ago I sent mail to Sun's dtrace and X11 internal mailing lists about something I'd been playing with:

After trying to absorb as much as possible at last week's dtrace classes, I sat down with the manual and tried things out for a bit to help it sink in before I forgot it all. One of the chapters I stumbled across was the one on adding your own probes to your own applications, which reminded me of some conversations I'd had with various people (Bart and Mahmood and others I've probably forgotten). Caffeine was consumed, and one thing led to another, and after a push in the right direction from the dtrace-interest list...

# dtrace -l -n 'Xserver*:::'
   ID   PROVIDER            MODULE                FUNCTION NAME
    4 Xserver848              Xsun                Dispatch request-start
    5 Xserver848              Xsun                Dispatch request-done
    6 Xserver848              Xsun              InitClient client-connect
    7 Xserver848              Xsun         CloseDownClient client-disconnect 

# dtrace -q -n 'Xserver*:::request-start { t = vtimestamp } \
  Xserver*:::request-done { \
   printf("Client: %3d Request: %20s Size: %5d Time: %10d\n", \
          arg3, copyinstr(arg0), arg2, vtimestamp - t)}'
Client:   4 Request:  X_SetClipRectangles Size:     5 Time:      46736
Client:   4 Request:           X_ChangeGC Size:     4 Time:      16307
Client:   4 Request:           X_CopyArea Size:     7 Time:      68328
Client:   4 Request:  X_SetClipRectangles Size:     5 Time:      26480
Client:   4 Request:           X_ChangeGC Size:     4 Time:       8833
Client:   4 Request:  X_PolyFillRectangle Size:     5 Time:      43680
Client:   4 Request:  X_SetClipRectangles Size:     5 Time:      28506
Client:   4 Request:           X_ChangeGC Size:     4 Time:      11920
Client:   4 Request:           X_CopyArea Size:     7 Time:      46566

[sitting at the dtlogin screen, watching the cursor blink - it could probably cache the GC's and clip lists and use a little less bandwidth and CPU to blink that cursor]

This script waits for the next client to connect, keeps a count of all the requests it makes, then prints the count and exits when that client does:

#!/usr/sbin/dtrace -s

string Xrequest[uintptr_t];

Xserver$1:::client-connect
/clientid == 0/
{
    clientid = arg0;
    printf("\\nClient %d connected - tracing requests from it\\n", clientid);
}

Xserver$1:::client-disconnect
/clientid == arg0/
{
    printf("\\nClient %d disconnected - ending trace\\n", clientid);
    exit(0);
}

Xserver$1:::request-start
/Xrequest[arg0] == ""/
{
    Xrequest[arg0] = copyinstr(arg0);
}

Xserver$1:::request-start
/arg3 == clientid/
{
    @counts[Xrequest[arg0]] = count();
}

This started out as a plaything to give me something to learn dtrace with, but it looks useful to me and could easily turn into something more if others see uses for it. Since some of these probe points are directly in the hottest code path of our benchmarks, we'd have to make sure that they don't affect our benchmark scores too much, but that shouldn't be much of a problem. Yes, you can do much of this with xscope, but this doesn't require tunnelling everything through the slow xscope proxy server and then finding some way to make sense of the huge output logs.

And I've just put probe points in the easiest and most obvious places - there's other interesting places we could put in probes - when a client does a grab for instance, or on outgoing events and/or errors.

So I guess this is also a request for comments - would others find this useful? What probe points would be useful to use in the X server and what data would you like to get out at those probe points?

For instance, this is what I picked to make available in the current probes:

request-start: request name/extension name, minor code (for extensions), request length, client id, client sequence
request-done: request name/extension name, minor code (for extensions), request length, client id, result code
client-connect & disconnect: client id

After some feedback from people, I made some refinements to the existing probes, and added some more probes, then made it available for internal use. Since then I've added a few more probes to help in tracking down the CDE window manager pixmap leak I blogged about earlier. And now, I've made it available outside Sun as well, so more people can try it out, see if the probes available are useful or if they should be modified or expanded, and give feedback on it. At some point I hope to integrate this directly into both the Xsun and Xorg servers delivered in Solaris, as well as into the open source Xorg server code, but I'd like to get some more experience with it from more people first.

After you install it, you should see this set of probes available: (the number after "Xserver" in the provider name is the process id of the currently running Xorg server process)

# dtrace -l -n 'Xserver*:::'
   ID   PROVIDER            MODULE                          FUNCTION NAME
    4 Xserver1335              Xorg    FreeClientNeverRetainResources resource-free
    5 Xserver1335              Xorg                FreeResourceByType resource-free
    6 Xserver1335              Xorg                      FreeResource resource-free
    7 Xserver1335              Xorg                          Dispatch request-done
    8 Xserver1335              Xorg           EstablishNewConnections client-connect
    9 Xserver1335              Xorg          AllocLbxClientConnection client-connect
   10 Xserver1335              Xorg        CloseDownRetainedResources client-disconnect
   11 Xserver1335              Xorg                   CloseDownClient client-disconnect
   12 Xserver1335              Xorg                    ProcKillClient client-disconnect
33598 Xserver1335              Xorg                          Dispatch client-disconnect
33667 Xserver1335              Xorg                       AddResource resource-alloc
33668 Xserver1335              Xorg                          Dispatch request-start
33669 Xserver1335              Xorg                  ClientAuthorized client-auth
33670 Xserver1335              Xorg                  FreeAllResources resource-free
33671 Xserver1335              Xorg               FreeClientResources resource-free

One of the example scripts I've also made available, client-watch.d reports every client that connects and when it disconnects a count of how many X requests it made and how much time the X server spent processing them. For example, I captured this from logging into and then out of a Java Desktop System (GNOME 2.6) session on Solaris 10:

   connect -> id:    8
 client id -> id:    8 is from local process 1706 (/usr/bin/gnome-session)
1706:   /usr/bin/gnome-session
[...]
   connect -> id:   14
 client id -> id:   14 is from local process 1831 (/usr/bin/nautilus)
1831:   nautilus --no-default-window --sm-client-id default3
   connect -> id:   15
 client id -> id:   15 is from local process 1833 (/usr/bin/gnome-volcheck)
1833:   gnome-volcheck -i 30 -z 3 -m cdrom,floppy,zip,jaz,dvdrom --sm-client-id default
  connect -> id:   16
 client id -> id:   16 is from local process 1857 (/usr/lib/clock-applet)
1857:   /usr/lib/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf
   connect -> id:   17
 client id -> id:   17 is from local process 1831 ()
1831:   nautilus --no-default-window --sm-client-id default3
   connect -> id:   18
 client id -> id:   18 is from local process 1859 (/usr/lib/wnck-applet)
1859:   /usr/lib/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior-
   connect -> id:   19
 client id -> id:   19 is from local process 1867 (/usr/lib/gnome-netstatus-applet)
1867:   /usr/lib/gnome-netstatus-applet --oaf-activate-iid=OAFIID:GNOME_NetstatusApplet
   connect -> id:   20
 client id -> id:   20 is from local process 1875 (/usr/lib/mixer_applet2)
1875:   /usr/lib/mixer_applet2 --oaf-activate-iid=OAFIID:GNOME_MixerApplet_Factory --oa
   connect -> id:   21
 client id -> id:   21 is from local process 1877 (/usr/lib/notification-area-applet)
1877:   /usr/lib/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationA
[... logout of JDS ...]
disconnect -> id:   16, lifetime: 8508 ms, requests: 364 (55 ms of CPU time)
disconnect -> id:   12, lifetime: 9597 ms, requests: 979 (50 ms of CPU time)
disconnect -> id:   20, lifetime: 6908 ms, requests: 286 (2 ms of CPU time)
disconnect -> id:   19, lifetime: 7157 ms, requests: 424 (53 ms of CPU time)
disconnect -> id:   14, lifetime: 9466 ms, requests: 940 (1532 ms of CPU time)
disconnect -> id:   15, lifetime: 9389 ms, requests: 119 (0 ms of CPU time)
disconnect -> id:   13, lifetime: 9542 ms, requests: 4544 (209 ms of CPU time)
disconnect -> id:   21, lifetime: 6718 ms, requests: 308 (1 ms of CPU time)
disconnect -> id:    8, lifetime: 15691 ms, requests: 1607 (1530 ms of CPU time)
disconnect -> id:    7, lifetime: 15851 ms, requests: 16 (0 ms of CPU time)
disconnect -> id:   10, lifetime: 11327 ms, requests: 130 (0 ms of CPU time)
disconnect -> id:   17, lifetime: 8483 ms, requests: 16 (0 ms of CPU time)
disconnect -> id:   18, lifetime: 7402 ms, requests: 789 (14 ms of CPU time)

I've also made the request itself available to copy in so you can get any part of it you want. An example of tracing the CreatePixmap and FreePixmap calls from a single client during a JDS session:

# ./client-pixmaps.d 1306 \"/usr/bin/nautilus\"
Creating pixmap: id: 0x1c00004 size: 1,1
Creating pixmap: id: 0x1c0001e size: 48,35
Creating pixmap: id: 0x1c0002a size: 48,48
Creating pixmap: id: 0x1c0002c size: 48,48
Freeing pixmap: id: 0x1c0001e
Freeing pixmap: id: 0x1c00025
Creating pixmap: id: 0x1c0002d size: 48,48
Creating pixmap: id: 0x1c0002f size: 48,48
Freeing pixmap: id: 0x1c0002a
Freeing pixmap: id: 0x1c0002c
Creating pixmap: id: 0x1c00030 size: 48,48
Creating pixmap: id: 0x1c00032 size: 48,48
Freeing pixmap: id: 0x1c0002d
Freeing pixmap: id: 0x1c0002f
Creating pixmap: id: 0x1c0003f size: 2,2
Creating pixmap: id: 0x1c00041 size: 1024,768
Creating pixmap: id: 0x1c00025 size: 48,35
Creating pixmap: id: 0x1c00047 size: 48,48
Creating pixmap: id: 0x1c00049 size: 48,48
Freeing pixmap: id: 0x1c00030
Freeing pixmap: id: 0x1c00032
Creating pixmap: id: 0x1c0004a size: 1024,768
Creating pixmap: id: 0x1c0004e size: 1,1
Freeing pixmap: id: 0x1c0004e
Creating pixmap: id: 0x1c00051 size: 1,1
Freeing pixmap: id: 0x1c00051
Freeing pixmap: id: 0x1c0004a
Creating pixmap: id: 0x1c00059 size: 1,1
Freeing pixmap: id: 0x1c00059
Creating pixmap: id: 0x1c0005b size: 109,496
Freeing pixmap: id: 0x1c0005b
Freeing pixmap: id: 0x1c00064
Creating pixmap: id: 0x1c00075 size: 1024,768
Freeing pixmap: id: 0x1c00075
Creating pixmap: id: 0x1c00064 size: 1024,768
Creating pixmap: id: 0x1c00086 size: 1024,768
Freeing pixmap: id: 0x1c00086
Creating pixmap: id: 0x1c00097 size: 199,384
Freeing pixmap: id: 0x1c00097
Creating pixmap: id: 0x1c000a0 size: 199,384
Freeing pixmap: id: 0x1c000a0
Creating pixmap: id: 0x1c000a9 size: 1024,768
Freeing pixmap: id: 0x1c000a9
Freeing pixmap: id: 0x1c00047
Freeing pixmap: id: 0x1c00049
Freeing pixmap: id: 0x1c00041

(This simple example just traces - but I'm sure you could enhance it or write a perl script to post-process the output to find pixmap leaks without much trouble, and even from just the trace I note nautilus is creating pixmaps the size of the entire root window (1024 x 768 due to the video card currently in this test machine) more often than I expected it to - all I did for this test was log in and choose the menu item to log out.)
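As a trivial example of that kind of post-processing (hypothetical file name, and assuming the trace output format shown above), an awk one-liner can report the pixmap ids that were created but never freed during the run:

$ awk '/Creating pixmap/ { live[$4] = $0 }
       /Freeing pixmap/  { delete live[$4] }
       END { for (id in live) print "never freed:", live[id] }' nautilus-pixmaps.out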

Please let me know if you find this useful, find more places that probes would be useful, or have other suggestions. You can reach me via e-mail, comments on this blog, or share with myself and other interested people in the OpenSolaris discussion forums/mailing lists for the dtrace and/or the X Window System.

Meanwhile, if you've got your own applications you'd like to add probes to like this, see Bart's blog on putting developer-defined DTrace probe points in an application, Alan H's blog on statically defined dtrace probes, and the Statically Defined Tracing for User Applications chapter of the DTrace manual.


Thursday Jun 23, 2005

dtracing down an X server "memory leak"

A bug came into our group today complaining of a memory leak in X. The customer and the support engineer they'd worked with had actually done a good job of narrowing down a simple repeatable test case:

#!/bin/sh

while : ; do
        /usr/openwin/bin/xterm &
        sleep 1
        kill -KILL `pgrep xterm`
done
They had also run Xsun with libumem preloaded and had captured the stack trace that was making repeated allocations and not freeing them:
After 7 minutes:

> ::umausers
1040384 bytes for 254 allocations with data size 4096:
        libumem.so.1`umem_cache_alloc+0x13c
        libumem.so.1`umem_alloc+0x44
        libumem.so.1`malloc+0x2c
        Xalloc+0x18
        MapAlloc+0x14
        libcfb.so.1`cfbCreatePixmap+0x74
        ProcCreatePixmap+0x118
        Dispatch+0x17c
        main+0x788
        _start+0x108

After 11 minutes:

> ::umausers
3506176 bytes for 856 allocations with data size 4096:
        libumem.so.1`umem_cache_alloc+0x13c
        libumem.so.1`umem_alloc+0x44
        libumem.so.1`malloc+0x2c
        Xalloc+0x18
        MapAlloc+0x14
        libcfb.so.1`cfbCreatePixmap+0x74
        ProcCreatePixmap+0x118
        Dispatch+0x17c
        main+0x788
        _start+0x108

One of the other support engineers noticed the bug was reproducible only under CDE and not in the GNOME desktop. At that point they sent the bug to our group to investigate further.

I just happened to be working on a new toy at the time the bug came in: an Xserver provider for dtrace. (I've actually been working on this for a while, and am getting it ready to release for people to experiment with - watch this blog for news on that soon.) This seemed the perfect bug to put it to the test with.

Since it had already been narrowed down to CreatePixmap, the next question was who was calling CreatePixmap and which ones were freed afterwards. Unfortunately, there's about a half dozen different ways in the X server to free a resource like a pixmap, and not all of them involve X requests - for instance, when an X client connection is closed, the X server garbage collects the resources that were specific to that client. So what I really wanted was to trace pixmap allocations and frees and match those to the clients calling CreatePixmap. This actually ended up using a good cross section of the initial set of dtrace probes I'd created in the X server.

First, to record who was making calls, I used the client-connect, client-authenticate, and client-disconnect probes. You can probably guess what each of those probes do. For connect & disconnect I simply printed that the event happened and which client id was involved. (In the X server all clients are tracked by a simple integer id, which is usually the position of the entry in the array of client information pointers that has the details for that client.) At client authenticate time, when we actually figure out what client is connecting, I also printed the string representing the address of the client and the pid of the client for local clients. This looks like:

Xserver$1:::client-connect
{
        printf("** Client Connect: id %d\n", arg0);
}

Xserver$1:::client-auth
{
        printf("** Client auth'ed: id %d => %s pid %d\n",
                arg0, copyinstr(arg1), arg2);
}

Xserver$1:::client-disconnect
{
        printf("** Client Disconnect: id %d\n", arg0);
}
(The $1 refers to the first argument to the dtrace script when it's run, which I was using for the process id of the Xserver I wanted to trace. Unfortunately, since the X server provider is a User-Defined Dtrace Provider, the available data for each probe gets generic names, hence the arg0, arg1, etc. I'll provide a chart of what those really mean when I get the provider released.)
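Until that chart exists, here is what the arguments work out to for the probes used in this entry, as inferred from the scripts themselves and the argument list quoted earlier - a working note rather than official documentation:

request-start: arg0 = request name, arg1 = extension minor code, arg2 = request length, arg3 = client id, arg4 = client sequence number
request-done: arg0 = request name, arg1 = extension minor code, arg2 = request length, arg3 = client id, arg4 = result code
client-connect / client-disconnect: arg0 = client id
client-auth: arg0 = client id, arg1 = client address string, arg2 = client pid (for local clients)
resource-alloc / resource-free: arg0 = resource XID, arg3 = resource type name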

Next I needed to record when pixmaps were created and freed, so I used the resource-alloc and resource-free probes, and printed the XID for each alloc & free of a resource of type PIXMAP:

Xserver$1:::resource-alloc
/Xrestype[arg3] == "PIXMAP"/
{
        printf("** Pixmap alloc: %08x\n", arg0);
}

Xserver$1:::resource-free
/Xrestype[arg3] == "PIXMAP"/
{
        printf("** Pixmap free:  %08x\n", arg0);
}

Then I added probes for the requests coming in to the Xserver from clients, and whenever one came in for CreatePixmap, print when it started and finished, and which client it was from, so you could see which pixmap allocs came from those requests:

Xserver$1:::request-start
/Xrequest[arg0] == "X_CreatePixmap"/
{
        printf("-> %s: client %d\n", Xrequest[arg0], arg3);
}

Xserver$1:::request-done
/Xrequest[arg0] == "X_CreatePixmap"/
{
        printf("<- %s: client %d\n", Xrequest[arg0], arg3);
}

Put it all together, with a little bit of extra glue to cache the strings being copied through the kernel for better performance and reduced overhead, and the final script looks like this:

#!/usr/sbin/dtrace -qs

string Xrequest[uintptr_t];
string Xrestype[uintptr_t];

Xserver$1:::request-start
/Xrequest[arg0] == ""/
{
	Xrequest[arg0] = copyinstr(arg0);
}

Xserver$1:::resource-alloc
/arg3 != 0 && Xrestype[arg3] == ""/
{
	Xrestype[arg3] = copyinstr(arg3);
}


Xserver$1:::request-start
/Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("-> %s: client %d\\n", Xrequest[arg0], arg3);
}

Xserver$1:::request-done
/Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("<- %s: client %d\\n", Xrequest[arg0], arg3);
}

Xserver$1:::resource-alloc
/Xrestype[arg3] == "PIXMAP"/
{
	printf("\*\* Pixmap alloc: %08x\\n", arg0);
}


Xserver$1:::resource-free
/Xrestype[arg3] == "PIXMAP"/
{
	printf("\*\* Pixmap free:  %08x\\n", arg0);
}

Xserver$1:::client-connect
{
	printf("\*\* Client Connect: id %d\\n", arg0);
}

Xserver$1:::client-auth
{
	printf("\*\* Client auth'ed: id %d => %s pid %d\\n",
		arg0, copyinstr(arg1), arg2);
}

Xserver$1:::client-disconnect
{
	printf("\*\* Client Disconnect: id %d\\n", arg0);
}

So I sent my test machine back to the dtlogin screen, started the script with the pid of the running X server and logged into CDE. I ran a few iterations of the xterm loop and found this pattern repeated for each iteration:

** Client Connect: id 17
** Client auth'ed: id 17 => local host pid 20273
-> X_CreatePixmap: client 17
** Pixmap alloc: 02200009
<- X_CreatePixmap: client 17
-> X_CreatePixmap: client 15
** Pixmap alloc: 01e00180
<- X_CreatePixmap: client 15
-> X_CreatePixmap: client 15
** Pixmap alloc: 01e00181
<- X_CreatePixmap: client 15
-> X_CreatePixmap: client 14
** Pixmap alloc: 01c004c8
<- X_CreatePixmap: client 14
** Pixmap free:  02200009
** Client Disconnect: id 17
** Pixmap free:  01e00180
** Pixmap free:  01e00181

As you can see, the xterm was assigned client id 17 - it created a single pixmap itself which was freed, client id 15 created and freed two more, but for every xterm started, client id 14 created one that wasn't freed. Looking back at the beginning of the log to see who client id 14 and 15 were revealed they were two connections from the same process:

** Client auth'ed: id 14 => local host pid 20089
** Client auth'ed: id 15 => local host pid 20089

Not surprisingly, ps showed that pid 20089 was the CDE window manager, dtwm. At this point I could have just punted the bug to our CDE team, but since I had this all set up, I decided to go one level further. So far, nothing I'd done with dtrace couldn't have been done by rebuilding the Xserver with some carefully selected printf()'s added - of course, being able to do it without having to rebuild the X server is very nice for those who aren't X server engineers and don't rebuild the X server several thousand times a year. But the true power of dtrace is that it's a whole system tracing tool, so I can see what's going on in the Xserver & dtwm processes at the same time.

My first attempt was to simply use the dtrace system() call to run pstack on the dtwm pid every time we got a CreatePixmap call from it - unfortunately, XCreatePixmap() is an asynchronous call that gets buffered up and sent later, so all that showed is that by the time the X server received the CreatePixmap request, dtwm had finished sending requests and was waiting in the X event loop for new events to come in.
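That first attempt looked roughly like this (a sketch; $2 here would be one of dtwm's client ids and $3 its process id, both passed on the command line) - system() is a destructive action, which is why it also needs dtrace -w:

Xserver$1:::request-start
/arg3 == $2 && Xrequest[arg0] == "X_CreatePixmap"/
{
        system("pstack %d", $3);
}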

So I cheated. Since I knew the function in libX11 dtwm would be calling was XCreatePixmap, I simply traced calls to it from the process id (which I made a command line argument to the script):

pid$4::XCreatePixmap:entry
{
	ustack();
}
I also changed the script to only print the CreatePixmap requests from the two client ids representing dtwm:
Xserver$1:::request-start
/(arg3 == $2 || arg3 == $3) && Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("-> %s: client %d\\n", Xrequest[arg0], arg3);
}

This expanded the detail in the output to show where dtwm was creating the pixmaps:

** Client Connect: id 19
** Client auth'ed: id 19 => local host pid 20497
** Pixmap alloc: 02600009

              libX11.so.4`XCreatePixmap
              libXm.so.4`_XmGetScaledPixmap+0x2fc
              libXm.so.4`Xm21GetPixmapByDepth+0x6b
              libXm.so.4`Xm21GetPixmap+0x31
              dtwm`MakeNamedIconPixmap+0xc4
              dtwm`ProcessWmHints+0x310
              dtwm`GetClientInfo+0x299
              dtwm`ManageWindow+0x18
              dtwm`WmDispatchWsEvent+0x6e
              dtwm`main+0x1f1
              dtwm`0x809772a

              libX11.so.4`XCreatePixmap
              libXm.so.4`_XmGetScaledPixmap+0x2fc
              libXm.so.4`Xm21GetPixmapByDepth+0x6b
              libXm.so.4`Xme21GetMask+0x43
              libXm.so.4`XmeGetMask+0x1b
              libDtSvc.so.1`_DtGetMask+0x45
              dtwm`MakeNamedIconPixmap+0xf2
              dtwm`ProcessWmHints+0x310
              dtwm`GetClientInfo+0x299
              dtwm`ManageWindow+0x18
              dtwm`WmDispatchWsEvent+0x6e
              dtwm`main+0x1f1
              dtwm`0x809772a
-> X_CreatePixmap: client 15
** Pixmap alloc: 01e001d0
-> X_CreatePixmap: client 15
** Pixmap alloc: 01e001d1

              libX11.so.4`XCreatePixmap
              dtwm`MakeIconPixmap+0x10c
              dtwm`MakeNamedIconPixmap+0x14a
              dtwm`ProcessWmHints+0x310
              dtwm`GetClientInfo+0x299
              dtwm`ManageWindow+0x18
              dtwm`WmDispatchWsEvent+0x6e
              dtwm`main+0x1f1
              dtwm`0x809772a
-> X_CreatePixmap: client 14
** Pixmap alloc: 01c006f8
** Pixmap free:  01e001d0
** Pixmap free:  01e001d1
** Pixmap free:  02600009
** Client Disconnect: id 19
** Client Connect: id 19

So now that I could point the CDE team to the exact function to look at, MakeIconPixmap in dtwm, I passed the bug on to them. (Since Sun's CDE team is overseas, they were asleep while all this happened, and should be getting to work soon, so I don't have an ending yet to this story, but since this is a blog, I can just post that later.)
