A few months ago I sent mail to Sun's dtrace and X11 internal mailing lists about something I'd been playing with:
After trying to absorb as much as possible at last week's dtrace classes, I sat down with the manual and tried things out for a bit to help it sink in before I forgot it all. One of the chapters I stumbled across was the one on adding your own probes to your own applications, which reminded me of some conversations I'd had with various people (Bart and Mahmood and others I've probably forgotten). Caffeine was consumed, and one thing led to another, and after a push in the right direction from the dtrace-interest list...
# dtrace -l -n 'Xserver*:::' ID PROVIDER MODULE FUNCTION NAME 4 Xserver848 Xsun Dispatch request-start 5 Xserver848 Xsun Dispatch request-done 6 Xserver848 Xsun InitClient client-connect 7 Xserver848 Xsun CloseDownClient client-disconnect # dtrace -q -n 'Xserver*:::request-start { t = vtimestamp } \ Xserver*:::request-done { \ printf("Client: %3d Request: %20s Size: %5d Time: %10d\n", \ arg3, copyinstr(arg0), arg2, vtimestamp -t)}' Client: 4 Request: X_SetClipRectangles Size: 5 Time: 46736 Client: 4 Request: X_ChangeGC Size: 4 Time: 16307 Client: 4 Request: X_CopyArea Size: 7 Time: 68328 Client: 4 Request: X_SetClipRectangles Size: 5 Time: 26480 Client: 4 Request: X_ChangeGC Size: 4 Time: 8833 Client: 4 Request: X_PolyFillRectangle Size: 5 Time: 43680 Client: 4 Request: X_SetClipRectangles Size: 5 Time: 28506 Client: 4 Request: X_ChangeGC Size: 4 Time: 11920 Client: 4 Request: X_CopyArea Size: 7 Time: 46566
[sitting at the dtlogin screen, watching the cursor blink - it could probably cache the GC's and clip lists and use a little less bandwidth and CPU to blink that cursor]
This script waits for the next client to connect, keeps a count of all the requests it makes, then prints the count and exits when that client does:
#!/usr/sbin/dtrace -s string Xrequest[uintptr_t]; Xserver$1:::client-connect /clientid == 0/ { clientid = arg0; printf("\nClient %d connected - tracing requests from it\n", clientid); } Xserver$1:::client-disconnect /clientid == arg0/ { printf("\nClient %d disconnected - ending trace\n", clientid); exit(0); } Xserver$1:::request-start /Xrequest[arg0] == ""/ { Xrequest[arg0] = copyinstr(arg0); } Xserver$1:::request-start /arg3 == clientid/ { @counts[Xrequest[arg0]] = count(); }
This started out as a plaything to give me something to learn dtrace with, but it looks useful to me and could easily turn into something more if others see uses for it. Since some of these probe points are directly in the hottest code path of our benchmarks, we'd have to make sure that they don't affect our benchmark scores too much, but that shouldn't be much of a problem. Yes, you can do much of this with xscope, but this doesn't require tunnelling everything through the slow xscope proxy server and then finding some way to make sense of the huge output logs.
And I've just put probe points in the easiest and most obvious places - there's other interesting places we could put in probes - when a client does a grab for instance, or on outgoing events and/or errors.
So I guess this is also a request for comments - would others find this useful? What probe points would be useful to use in the X server and what data would you like to get out at those probe points?
For instance, this is what I picked to make available in the current probes:
request-start: request name/extension name, minor code (for extensions), request length, client id, client sequence
request-done: request name/extension name, minor code (for extensions), request length, client id, result code
client-connect & disconnect: client id
After some feedback from people, I made some refinements to the existing probes, and added some more probes, then made it available for internal use. Since then I've added a few more probes to help in tracking down the CDE window manager pixmap leak I blogged about earlier. And now, I've made it available outside Sun as well, so more people can try it out, see if the probes available are useful or if they should be modified or expanded, and give feedback on it. At some point I hope to integrate this directly into both the Xsun and Xorg servers delivered in Solaris, as well as into the open source Xorg server code, but I'd like to get some more experience with it from more people first.
After you install it, you should see this set of probes available: (the number after "Xserver" in the provider name is the process id of the currently running Xorg server process)
# dtrace -l -n 'Xserver*:::' ID PROVIDER MODULE FUNCTION NAME 4 Xserver1335 Xorg FreeClientNeverRetainResources resource-free 5 Xserver1335 Xorg FreeResourceByType resource-free 6 Xserver1335 Xorg FreeResource resource-free 7 Xserver1335 Xorg Dispatch request-done 8 Xserver1335 Xorg EstablishNewConnections client-connect 9 Xserver1335 Xorg AllocLbxClientConnection client-connect 10 Xserver1335 Xorg CloseDownRetainedResources client-disconnect 11 Xserver1335 Xorg CloseDownClient client-disconnect 12 Xserver1335 Xorg ProcKillClient client-disconnect 33598 Xserver1335 Xorg Dispatch client-disconnect 33667 Xserver1335 Xorg AddResource resource-alloc 33668 Xserver1335 Xorg Dispatch request-start 33669 Xserver1335 Xorg ClientAuthorized client-auth 33670 Xserver1335 Xorg FreeAllResources resource-free 33671 Xserver1335 Xorg FreeClientResources resource-free
One of the example scripts I've also made available, client-watch.d reports every client that connects and when it disconnects a count of how many X requests it made and how much time the X server spent processing them. For example, I captured this from logging into and then out of a Java Desktop System (GNOME 2.6) session on Solaris 10:
connect -> id: 8 client id -> id: 8 is from local process 1706 (/usr/bin/gnome-session) 1706: /usr/bin/gnome-session [...] connect -> id: 14 client id -> id: 14 is from local process 1831 (/usr/bin/nautilus) 1831: nautilus --no-default-window --sm-client-id default3 connect -> id: 15 client id -> id: 15 is from local process 1833 (/usr/bin/gnome-volcheck) 1833: gnome-volcheck -i 30 -z 3 -m cdrom,floppy,zip,jaz,dvdrom --sm-client-id default connect -> id: 16 client id -> id: 16 is from local process 1857 (/usr/lib/clock-applet) 1857: /usr/lib/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf connect -> id: 17 client id -> id: 17 is from local process 1831 () 1831: nautilus --no-default-window --sm-client-id default3 connect -> id: 18 client id -> id: 18 is from local process 1859 (/usr/lib/wnck-applet) 1859: /usr/lib/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior- connect -> id: 19 client id -> id: 19 is from local process 1867 (/usr/lib/gnome-netstatus-applet) 1867: /usr/lib/gnome-netstatus-applet --oaf-activate-iid=OAFIID:GNOME_NetstatusApplet connect -> id: 20 client id -> id: 20 is from local process 1875 (/usr/lib/mixer_applet2) 1875: /usr/lib/mixer_applet2 --oaf-activate-iid=OAFIID:GNOME_MixerApplet_Factory --oa connect -> id: 21 client id -> id: 21 is from local process 1877 (/usr/lib/notification-area-applet) 1877: /usr/lib/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationA [... logout of JDS ...] disconnect -> id: 16, lifetime: 8508 ms, requests: 364 (55 ms of CPU time) disconnect -> id: 12, lifetime: 9597 ms, requests: 979 (50 ms of CPU time) disconnect -> id: 20, lifetime: 6908 ms, requests: 286 (2 ms of CPU time) disconnect -> id: 19, lifetime: 7157 ms, requests: 424 (53 ms of CPU time) disconnect -> id: 14, lifetime: 9466 ms, requests: 940 (1532 ms of CPU time) disconnect -> id: 15, lifetime: 9389 ms, requests: 119 (0 ms of CPU time) disconnect -> id: 13, lifetime: 9542 ms, requests: 4544 (209 ms of CPU time) disconnect -> id: 21, lifetime: 6718 ms, requests: 308 (1 ms of CPU time) disconnect -> id: 8, lifetime: 15691 ms, requests: 1607 (1530 ms of CPU tim disconnect -> id: 7, lifetime: 15851 ms, requests: 16 (0 ms of CPU time) disconnect -> id: 10, lifetime: 11327 ms, requests: 130 (0 ms of CPU time) disconnect -> id: 17, lifetime: 8483 ms, requests: 16 (0 ms of CPU time) disconnect -> id: 18, lifetime: 7402 ms, requests: 789 (14 ms of CPU time)
I've also made the request itself available to copy in so you can get any part of it you want. An example of tracing the CreatePixmap and FreePixmap calls from a single client during a JDS session:
# ./client-pixmaps.d 1306 \"/usr/bin/nautilus\" Creating pixmap: id: 0x1c00004 size: 1,1 Creating pixmap: id: 0x1c0001e size: 48,35 Creating pixmap: id: 0x1c0002a size: 48,48 Creating pixmap: id: 0x1c0002c size: 48,48 Freeing pixmap: id: 0x1c0001e Freeing pixmap: id: 0x1c00025 Creating pixmap: id: 0x1c0002d size: 48,48 Creating pixmap: id: 0x1c0002f size: 48,48 Freeing pixmap: id: 0x1c0002a Freeing pixmap: id: 0x1c0002c Creating pixmap: id: 0x1c00030 size: 48,48 Creating pixmap: id: 0x1c00032 size: 48,48 Freeing pixmap: id: 0x1c0002d Freeing pixmap: id: 0x1c0002f Creating pixmap: id: 0x1c0003f size: 2,2 Creating pixmap: id: 0x1c00041 size: 1024,768 Creating pixmap: id: 0x1c00025 size: 48,35 Creating pixmap: id: 0x1c00047 size: 48,48 Creating pixmap: id: 0x1c00049 size: 48,48 Freeing pixmap: id: 0x1c00030 Freeing pixmap: id: 0x1c00032 Creating pixmap: id: 0x1c0004a size: 1024,768 Creating pixmap: id: 0x1c0004e size: 1,1 Freeing pixmap: id: 0x1c0004e Creating pixmap: id: 0x1c00051 size: 1,1 Freeing pixmap: id: 0x1c00051 Freeing pixmap: id: 0x1c0004a Creating pixmap: id: 0x1c00059 size: 1,1 Freeing pixmap: id: 0x1c00059 Creating pixmap: id: 0x1c0005b size: 109,496 Freeing pixmap: id: 0x1c0005b Freeing pixmap: id: 0x1c00064 Creating pixmap: id: 0x1c00075 size: 1024,768 Freeing pixmap: id: 0x1c00075 Creating pixmap: id: 0x1c00064 size: 1024,768 Creating pixmap: id: 0x1c00086 size: 1024,768 Freeing pixmap: id: 0x1c00086 Creating pixmap: id: 0x1c00097 size: 199,384 Freeing pixmap: id: 0x1c00097 Creating pixmap: id: 0x1c000a0 size: 199,384 Freeing pixmap: id: 0x1c000a0 Creating pixmap: id: 0x1c000a9 size: 1024,768 Freeing pixmap: id: 0x1c000a9 Freeing pixmap: id: 0x1c00047 Freeing pixmap: id: 0x1c00049 Freeing pixmap: id: 0x1c00041
(This simple example just traces - but I'm sure you could enhance it or write a perl script to post-process the output to find pixmap leaks without much trouble, and even from just the trace I note nautilus is creating pixmaps the size of the entire root window (1024 x 768 due to the video card currently in this test machine) more often than I expected it to - all I did for this test was login and choose the menu item to log out.)
Please let me know if you find this useful, find more places that probes would be useful, or have other suggestions. You can reach me via e-mail, comments on this blog, or share with myself and other interested people in the OpenSolaris discussion forums/mailing lists for the dtrace and/or the X Window System.
Meanwhile, if you've got your own applications you'd like to add probes to like this, see Bart's blog on putting developer-defined DTrace probe points in an application, Alan H's blog on statically defined dtrace probes, and the Statically Defined Tracing for User Applications chapter of the DTrace manual.
A Brown math PhD?
If so, Hi!
- Randy
I am trying to use Solaris Express and having an annoying problem in which Xorg freezes on my i85x Intel video chipset. Can you give me some guidance as how to try to debug it? Can dtrace help me?
Do you have any other idea I can try?
Thanks in advance.
-- Douglas