Finding application memory leaks
By user12614486 on Sep 24, 2007
Today, I unlocked my screensaver and found that I couldn't launch anything from the GNOME panel launchbar. Every attempt reported that fork had failed with "out of memory".
A coworker had hit a similar problem, which went away when we had to kill gnome-panel while trying to debug it.
Well, let's look at gnome-panel's pmap output: gee, its heap is over 1GB. (!)
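For anyone repeating that check, here is a minimal sketch of pulling just the heap size out of pmap output. It assumes the Solaris `pmap -x` layout, where the second column is the mapping size in kilobytes and the heap mapping is labelled `[ heap ]`; the `heap_kb` function name is made up for this example.

```shell
# Hypothetical helper: sum the Kbytes column of every "[ heap ]" line
# in `pmap -x` output read from standard input.
# Assumes column 2 is the mapping size in kilobytes (Solaris pmap -x layout).
heap_kb() {
    awk '/\[ heap \]/ { sum += $2 } END { print sum + 0 }'
}

# Usage on a Solaris box (not run here):
#   pmap -x "$(pgrep -x gnome-panel)" | heap_kb
```

A result in the neighborhood of a million kilobytes for a panel process is a strong hint that something is leaking.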
So, that's probably a memory leak. How to find it? After some quick IRC chats with some helpful GNOME engineers (GMan and Laca, to the rescue, as so often happens), I discovered that I could remove gnome-panel from my session so it doesn't auto-restart when I kill it (from a command line, of course, because...I can't launch anything). First, though, I copied down its current arguments for restart, which were --session default1. Then I was ready to try out libumem.
Adam has a wonderful introduction, Jonathan Adams wrote a really nice reference on the topic of libumem and mdb, and there are plenty of other examples of use around. My particular method was to enter the command
LD_PRELOAD=/usr/lib/libumem.so.1 UMEM_DEBUG=default gnome-panel --session default1
and then use
mdb -p $(pgrep gnome-panel) to start up an mdb and access libumem's debugging features.
::findleaks immediately showed some false hits, and ::umem_verify was clean, so I waited for a little while and saw no further leaks in ::findleaks. Then I started a terminal from my custom terminal launcher on the panel. Bam! New leaks from ::findleaks! After messing about with manual ::bufctl_audits on the bufctl addresses in ::findleaks output, I decided that ::help findleaks might be handy. Indeed, it showed the -d option, which put it all together: it prints stack backtraces for the leaked buffers, and those let us examine the code and spot the leaks.
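The same checks can also be run in one shot rather than interactively. The sketch below assumes the target process was started with libumem preloaded and UMEM_DEBUG set, as above, and that mdb reads dcmds from standard input when it is not attached to a terminal; the `leak_report` function name is invented for illustration.

```shell
# Hypothetical helper: attach mdb to a running process, verify the umem
# caches, then list leaked buffers with their allocation stack traces.
# Assumes the process is running with libumem and UMEM_DEBUG enabled.
leak_report() {
    pid=$1
    # Feed the dcmds on stdin; mdb exits and releases the process at end of input.
    printf '%s\n' '::umem_verify' '::findleaks -d' | mdb -p "$pid"
}

# Usage on a Solaris box (not run here):
#   leak_report "$(pgrep -x gnome-panel)"
```

Running it once before and once after a suspect action (launching a terminal from the panel, in my case) makes it easy to diff the two reports and see which leaks are new.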
GMan and I were quickly able to find a bug in libgnome-desktop and another in gnome-panel itself (a fix to one of the Sun patches for gnome-panel, in panel_lockdown_is_forbidden_key_file). I was able to verify that all the leaked buffers sprang from one of those two sources.
I can't even begin to tell you how much faster and more fun that was with libumem than without. If you have any sort of allocation problem at all, you should seriously consider libumem debugging...free with Solaris.