dtracing down an X server "memory leak"

A bug came into our group today complaining of a memory leak in X. The customer and the support engineer they'd worked with had actually done a good job of narrowing down a simple repeatable test case:

#!/bin/sh

while : ; do
        /usr/openwin/bin/xterm &
        sleep 1
        kill -KILL `pgrep xterm`
done
They had also run Xsun with libumem preloaded and had captured the stack trace that was making repeated allocations and not freeing them:
After 7 minutes:

> ::umausers
1040384 bytes for 254 allocations with data size 4096:
        libumem.so.1`umem_cache_alloc+0x13c
        libumem.so.1`umem_alloc+0x44
        libumem.so.1`malloc+0x2c
        Xalloc+0x18
        MapAlloc+0x14
        libcfb.so.1`cfbCreatePixmap+0x74
        ProcCreatePixmap+0x118
        Dispatch+0x17c
        main+0x788
        _start+0x108

After 11 minutes:

> ::umausers
3506176 bytes for 856 allocations with data size 4096:
        libumem.so.1`umem_cache_alloc+0x13c
        libumem.so.1`umem_alloc+0x44
        libumem.so.1`malloc+0x2c
        Xalloc+0x18
        MapAlloc+0x14
        libcfb.so.1`cfbCreatePixmap+0x74
        ProcCreatePixmap+0x118
        Dispatch+0x17c
        main+0x788
        _start+0x108

One of the other support engineers noticed the bug was reproducible only under CDE and not in the GNOME desktop. At that point they sent the bug to our group to investigate further.

I just happened to be working on a new toy at the time the bug came in: an Xserver provider for dtrace. (I've actually been working on this for a while, and am getting it ready to release for people to experiment with - watch this blog for news on that soon.) This seemed the perfect bug to put it to the test with.

Since it had already been narrowed down to CreatePixmap, the next question was who was calling CreatePixmap and which ones were freed afterwards. Unfortunately, there's about a half dozen different ways in the X server to free a resource like a pixmap, and not all of them involve X requests - for instance, when an X client connection is closed, the X server garbage collects the resources that were specific to that client. So what I really wanted was to trace pixmap allocations and frees and match those to the clients calling CreatePixmap. This actually ended up using a good cross section of the initial set of dtrace probes I'd created in the X server.

First, to record who was making calls, I used the client-connect, client-authenticate, and client-disconnect probes. You can probably guess what each of those probes do. For connect & disconnect I simply printed that the event happened and which client id was involved. (In the X server all clients are tracked by a simple integer id, which is usually the position of the entry in the array of client information pointers that has the details for that client.) At client authenticate time, when we actually figure out what client is connecting, I also printed the string representing the address of the client and the pid of the client for local clients. This looks like:

Xserver$1:::client-connect
{
	printf("\*\* Client Connect: id %d\\n", arg0);
}

Xserver$1:::client-auth
{
	printf("\*\* Client auth'ed: id %d => %s pid %d\\n",
		arg0, copyinstr(arg1), arg2);
}

Xserver$1:::client-disconnect
{
	printf("\*\* Client Disconnect: id %d\\n", arg0);
}
(The $1 refers to the first argument to the dtrace script when it's run, which I was using for the process id of the Xserver I wanted to trace. Unfortunately, since the X server provider is a User-Defined Dtrace Provider, the available data for each probe gets generic names, hence the arg0, arg1, etc. I'll provide a chart of what those really mean when I get the provider released.)

Next I needed to record when pixmaps were created and freed, so I used the resource-alloc and resource-free probes, and printed the XID for each alloc & free of a resource of type PIXMAP:

Xserver$1:::resource-alloc
/Xrestype[arg3] == "PIXMAP"/
{
	printf("\*\* Pixmap alloc: %08x\\n", arg0);
}


Xserver$1:::resource-free
/Xrestype[arg3] == "PIXMAP"/
{
	printf("\*\* Pixmap free:  %08x\\n", arg0);
}

Then I added probes for the requests coming in to the Xserver from clients, and whenever one came in for CreatePixmap, print when it started and finished, and which client it was from, so you could see which pixmap allocs came from those requests:

Xserver$1:::request-start
/Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("-> %s: client %d\\n", Xrequest[arg0], arg3);
}

Xserver$1:::request-done
/Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("<- %s: client %d\\n", Xrequest[arg0], arg3);
}

Put it all together, with a little bit of extra glue to cache the strings being copied through the kernel for better performance and reduced overhead, and the final script looks like this:

#!/usr/sbin/dtrace -qs

string Xrequest[uintptr_t];
string Xrestype[uintptr_t];

Xserver$1:::request-start
/Xrequest[arg0] == ""/
{
	Xrequest[arg0] = copyinstr(arg0);
}

Xserver$1:::resource-alloc
/arg3 != 0 && Xrestype[arg3] == ""/
{
	Xrestype[arg3] = copyinstr(arg3);
}


Xserver$1:::request-start
/Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("-> %s: client %d\\n", Xrequest[arg0], arg3);
}

Xserver$1:::request-done
/Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("<- %s: client %d\\n", Xrequest[arg0], arg3);
}

Xserver$1:::resource-alloc
/Xrestype[arg3] == "PIXMAP"/
{
	printf("\*\* Pixmap alloc: %08x\\n", arg0);
}


Xserver$1:::resource-free
/Xrestype[arg3] == "PIXMAP"/
{
	printf("\*\* Pixmap free:  %08x\\n", arg0);
}

Xserver$1:::client-connect
{
	printf("\*\* Client Connect: id %d\\n", arg0);
}

Xserver$1:::client-auth
{
	printf("\*\* Client auth'ed: id %d => %s pid %d\\n",
		arg0, copyinstr(arg1), arg2);
}

Xserver$1:::client-disconnect
{
	printf("\*\* Client Disconnect: id %d\\n", arg0);
}

So I sent my test machine back to the dtlogin screen, started the script with the pid of the running X server and logged into CDE. I ran a few iterations of the xterm loop and found this pattern repeated for each iteration:

\*\* Client Connect: id 17
\*\* Client auth'ed: id 17 => local host pid 20273
-> X_CreatePixmap: client 17
\*\* Pixmap alloc: 02200009
<- X_CreatePixmap: client 17
-> X_CreatePixmap: client 15
\*\* Pixmap alloc: 01e00180
<- X_CreatePixmap: client 15
-> X_CreatePixmap: client 15
\*\* Pixmap alloc: 01e00181
<- X_CreatePixmap: client 15
-> X_CreatePixmap: client 14
\*\* Pixmap alloc: 01c004c8
<- X_CreatePixmap: client 14
\*\* Pixmap free:  02200009
\*\* Client Disconnect: id 17
\*\* Pixmap free:  01e00180
\*\* Pixmap free:  01e00181

As you can see, the xterm was assigned client id 17 - it created a single pixmap itself which was freed, client id 15 created and freed two more, but for every xterm started, client id 14 created one that wasn't freed. Looking back at the beginning of the log to see who client id 14 and 15 were revealed they were two connections from the same process:

\*\* Client auth'ed: id 14 => local host pid 20089
\*\* Client auth'ed: id 15 => local host pid 20089

Not surprisingly, ps showed that pid 20089 was the CDE window manager, dtwm. At this point I could have just punted the bug to our CDE team, but since I had this all set up, I decided to go one level further. So far, nothing I'd done with dtrace couldn't have been done by rebuilding the Xserver with some carefully selected printf()'s added - of course, being able to do it without having to rebuild the X server is very nice for those who aren't X server engineers and don't rebuild the X server several thousand times a year. But the true power of dtrace is that it's a whole system tracing tool, so I can see what's going on in the Xserver & dtwm processes at the same time.

My first attempt was to simply use the dtrace system() call to call pstack on the dtwm pid every time we got a CreatePixmap call from it - unfortunately, XCreatePixmap() is an asynchronous call, and gets buffered up and sent later, so all that showed is that by the time the X server got the CreatePixmap call is that dtwm had finished sending requests and was waiting in the X event loop for new events to come in.

So I cheated. Since I knew the function in libX11 dtwm would be calling was XCreatePixmap, I simply traced calls to it from the process id (which I made a command line argument to the script):

pid$4::XCreatePixmap:entry
{
	ustack();
}
I also changed the script to only print the CreatePixmap requests from the two client ids representing dtwm:
Xserver$1:::request-start
/(arg3 == $2 || arg3 == $3) && Xrequest[arg0] == "X_CreatePixmap"/
{
    	printf("-> %s: client %d\\n", Xrequest[arg0], arg3);
}

This expanded the detail in the output to show where dtwm was creating the pixmaps:

\*\* Client Connect: id 19
\*\* Client auth'ed: id 19 => local host pid 20497
\*\* Pixmap alloc: 02600009

              libX11.so.4`XCreatePixmap
              libXm.so.4`_XmGetScaledPixmap+0x2fc
              libXm.so.4`Xm21GetPixmapByDepth+0x6b
              libXm.so.4`Xm21GetPixmap+0x31
              dtwm`MakeNamedIconPixmap+0xc4
              dtwm`ProcessWmHints+0x310
              dtwm`GetClientInfo+0x299
              dtwm`ManageWindow+0x18
              dtwm`WmDispatchWsEvent+0x6e
              dtwm`main+0x1f1
              dtwm`0x809772a

              libX11.so.4`XCreatePixmap
              libXm.so.4`_XmGetScaledPixmap+0x2fc
              libXm.so.4`Xm21GetPixmapByDepth+0x6b
              libXm.so.4`Xme21GetMask+0x43
              libXm.so.4`XmeGetMask+0x1b
              libDtSvc.so.1`_DtGetMask+0x45
              dtwm`MakeNamedIconPixmap+0xf2
              dtwm`ProcessWmHints+0x310
              dtwm`GetClientInfo+0x299
              dtwm`ManageWindow+0x18
              dtwm`WmDispatchWsEvent+0x6e
              dtwm`main+0x1f1
              dtwm`0x809772a
-> X_CreatePixmap: client 15
\*\* Pixmap alloc: 01e001d0
-> X_CreatePixmap: client 15
\*\* Pixmap alloc: 01e001d1

              libX11.so.4`XCreatePixmap
              dtwm`MakeIconPixmap+0x10c
              dtwm`MakeNamedIconPixmap+0x14a
              dtwm`ProcessWmHints+0x310
              dtwm`GetClientInfo+0x299
              dtwm`ManageWindow+0x18
              dtwm`WmDispatchWsEvent+0x6e
              dtwm`main+0x1f1
              dtwm`0x809772a
-> X_CreatePixmap: client 14
\*\* Pixmap alloc: 01c006f8
\*\* Pixmap free:  01e001d0
\*\* Pixmap free:  01e001d1
\*\* Pixmap free:  02600009
\*\* Client Disconnect: id 19
\*\* Client Connect: id 19

So now that I could point the CDE team to the exact function to look at, MakeIconPixmap in dtwm, I passed the bug on to them. (Since Sun's CDE team is overseas, they were asleep while all this happened, and should be getting to work soon, so I don't have an ending yet to this story, but since this is a blog, I can just post that later.)

[] [] [] []
Comments:

Hi Alan, Very good use case showing the power of dtrace. Can you shed a little light on time involved to produce this output? On a scale of 1 to 10 (10 being Bryan's level of dtrace knowledge) how would you rate yourself in the dtrace jedi arts? I'm trying to communicate to my customer the knowledge base barriers to entry so they can perform the same level of magic with their apps as you have done in this scenario. Thanks in advance...

Posted by Wayne Abbott on June 24, 2005 at 12:42 AM PDT #

Post a Comment:
Comments are closed for this entry.
About

Engineer working on Oracle Solaris and with the X.Org open source community.

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle, the X.Org Foundation, or anyone else.

See Also
Follow me on twitter

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today