Memory Leak Detection with libumem

I recently had the opportunity to do some memory leak detection with libumem, so I decided to share some thoughts and examples on its use.  The issue started with a call for help from a colleague who was working primarily on Linux and OS X.  His application's memory footprint was growing over time, and he had used Valgrind and DTrace (on OS X) to try to find a leak, but had reached a dead end.  I offered to run the application on Solaris, using the libumem(3LIB) library and mdb(1) to search for leaks, and quickly found a leak in the open source SIGAR library that his application uses.  For details and the current status of that specific leak, see bug SIGAR-132.  For this discussion, I'll focus on a simple example program that highlights libumem.

What is libumem?

The libumem(3LIB) library is a highly scalable memory allocation library that supports the standard malloc(3C) family of functions as well as its own umem_alloc(3MALLOC) functions and umem_cache_create(3MALLOC) object caching services.  It also provides debugging support, including detection of memory leaks and many other common programming errors.  The debugging capabilities are described in umem_debug(3MALLOC).  This discussion will focus primarily on using the debugging capabilities with standard malloc(3C).  For a performance comparison between libumem and several other memory allocators, have a look at Tim Cook's memory allocator bake-off from a few weeks back.

What is a memory leak?

Before I get started, let me clarify what I mean by a memory leak.  To me, a pure memory leak occurs when you allocate memory but then fail to retain a pointer to that memory.  For example, you might overwrite a pointer with a new value, or allow an automatic variable to be discarded without first freeing the memory that it references.  Without a pointer to the memory, you can't use it any more or free it, and it has leaked out of your control.  Some people also refer to situations where memory is held longer than necessary as a memory leak, but to me that is a memory hog, not a memory leak.  The debugging tools in libumem can help with both issues, but the techniques are different.  I will focus on what I consider a pure memory leak for today.
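To make the distinction concrete, here is a minimal C sketch; the function names are hypothetical and are not taken from any program discussed here.  The first two functions are pure leaks of the two kinds just described.  The third keeps its buffer reachable through a static pointer for the life of the program: that is a hog, not a leak, and a reachability-based leak check would not flag it.

```c
#include <stdlib.h>
#include <string.h>

/* Pure leak: the only pointer to the first buffer is overwritten. */
char *leak_by_overwrite(void)
{
    char *p = malloc(1024);
    if (p != NULL)
        memset(p, 0, 1024);
    p = malloc(1024);   /* the first buffer is now unreachable: leaked */
    return p;           /* the caller must free() this second buffer */
}

/* Pure leak: the automatic variable tmp is discarded without freeing
 * the buffer it references. */
size_t leak_by_scope(const char *s)
{
    char *tmp = malloc(strlen(s) + 1);
    if (tmp == NULL)
        return 0;
    strcpy(tmp, s);
    return strlen(tmp); /* tmp goes away here; its buffer leaks */
}

/* Memory hog, not a leak: the buffer is never released, but a static
 * pointer still refers to it, so it remains reachable. */
static char *cached_buf;

char *hog_buffer(void)
{
    if (cached_buf == NULL)
        cached_buf = malloc(1024);
    return cached_buf;
}
```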

How do I enable libumem?

If you are compiling a new application and want libumem as your default memory allocator, just add -lumem to your compile or link command.  If you want to use any of the libumem-specific functions, you should also #include <umem.h> in your program.  If you want to enable libumem on an existing application, you can use the LD_PRELOAD environment variable (or LD_PRELOAD_64 for 64-bit applications) to interpose the library on the application, so that it uses the malloc() family of functions from libumem instead of libc.

For example, with sh/ksh/bash:

LD_PRELOAD=libumem.so your_command

With csh/tcsh:

(setenv LD_PRELOAD libumem.so; your_command)

To confirm that you are using libumem, you can use the pldd(1) command to list the dynamic libraries being used by your application.  For example:

$ pgrep -l my_app
 2239 my_app
$ pldd 2239
2239:    my_app
/lib/libumem.so.1
/usr/lib/libc/libc_hwcap2.so.1
$

How do I enable libumem debugging?

As described in umem_debug(3MALLOC), the activation of run-time debugging features is controlled by the UMEM_DEBUG and UMEM_LOGGING environment variables.  For memory leak detection, all we need to enable is the audit feature of UMEM_DEBUG.

For example, with sh/ksh/bash:

LD_PRELOAD=libumem.so UMEM_DEBUG=audit your_command

With csh/tcsh:

(setenv LD_PRELOAD libumem.so; setenv UMEM_DEBUG audit; your_command)
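If you want a program to warn when it has been started without auditing, a small check of the environment is enough.  The sketch below is my own illustration, not part of libumem: umem_debug_has_audit() and umem_audit_enabled() are hypothetical names.  It assumes the option syntax described in umem_debug(3MALLOC), namely a comma-separated list in which audit may carry a frame count (audit=frames) and default stands for audit,contents,guards; check the man page on your release.  Note that it only inspects UMEM_DEBUG.  It cannot tell whether libumem was actually preloaded, so pldd(1) is still the way to confirm that.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if a UMEM_DEBUG value requests auditing, else 0.
 * Assumes the comma-separated option syntax of umem_debug(3MALLOC). */
int umem_debug_has_audit(const char *value)
{
    const char *p = value;

    if (p == NULL)
        return 0;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");       /* whole option, e.g. "audit=30" */
        size_t name_len = strcspn(p, ",="); /* option name only, e.g. "audit" */

        /* "default" is documented as audit,contents,guards. */
        if ((name_len == 5 && strncmp(p, "audit", 5) == 0) ||
            (name_len == 7 && strncmp(p, "default", 7) == 0))
            return 1;
        p += len;                           /* skip this option */
        if (*p == ',')
            p++;                            /* and its separator */
    }
    return 0;
}

/* Check this process's own environment, e.g. to print a warning at
 * startup when leak auditing has not been requested. */
int umem_audit_enabled(void)
{
    return umem_debug_has_audit(getenv("UMEM_DEBUG"));
}
```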

How do I access the debug data?

The libumem library provides a set of mdb(1) dcmds to inspect the debug data collected while the program runs.  To use the dcmds, you can either run your program under the control of mdb, attach to the program with mdb, or generate a core dump (for example with gcore(1)) and examine the dump with mdb.  The latter is the simplest, and looks like this:

$ pgrep -l my_app
1603 my_app
$ gcore 1603
gcore: core.1603 dumped
$ mdb core.1603
Loading modules: [ libumem.so.1 ld.so.1 ]
>

The commands above assume that your program runs long enough for you to generate the core dump, and that the memory leak has been triggered before the core dump is generated.  For a fast-running program, or to examine the image just before program exit, you can run the program under mdb and stop it on entry to the _exit system call:

$ LD_PRELOAD=libumem.so UMEM_DEBUG=audit mdb ./your_app
> ::sysbp _exit
> ::run
mdb: stop on entry to _exit
mdb: target stopped at:
0xfee3301a: addb %al,(%eax)
> ::load libumem.so.1
>

Once you are in mdb, you can get a listing of the libumem dcmds by running ::dmods -l libumem.so.1 and can get help on an individual dcmd with ::help dcmd.  For example:

> ::dmods -l libumem.so.1

libumem.so.1
dcmd allocdby - given a thread, print its allocated buffers
dcmd bufctl - print or filter a bufctl
dcmd bufctl_audit - print a bufctl_audit
dcmd findleaks - search for potential memory leaks
...
> ::help findleaks

NAME
findleaks - search for potential memory leaks

SYNOPSIS
[ addr ] ::findleaks [-dfv]

DESCRIPTION

Does a conservative garbage collection of the heap in order to find
potentially leaked buffers. Similar leaks are coalesced by stack
trace, with the oldest leak picked as representative. The leak
table is cached between invocations.
...

You can now use the various dcmds to look for memory leaks and other common problems with memory allocation, or to simply better understand how your application uses memory.
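The help text's phrase "similar leaks are coalesced by stack trace, with the oldest leak picked as representative" is easier to picture with a toy version.  The C below is not libumem's code; the struct leak and struct leak_summary types and coalesce_leaks() are hypothetical, and a simple linear search stands in for whatever the real dcmd uses.  Each output row corresponds roughly to one line of ::findleaks output: a representative leak plus a count of the leaks that share its stack.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 8

/* Hypothetical record for one leaked buffer. */
struct leak {
    uint64_t  timestamp;           /* when the buffer was allocated */
    size_t    depth;               /* number of valid entries in pcs */
    uintptr_t pcs[MAX_FRAMES];     /* allocation stack, innermost first */
};

/* Hypothetical summary row, similar in spirit to one ::findleaks line. */
struct leak_summary {
    const struct leak *rep;        /* oldest leak with this stack */
    size_t             count;      /* leaks sharing this stack */
};

static int same_stack(const struct leak *a, const struct leak *b)
{
    if (a->depth != b->depth)
        return 0;
    for (size_t i = 0; i < a->depth; i++)
        if (a->pcs[i] != b->pcs[i])
            return 0;
    return 1;
}

/* Group leaks by identical stack, keeping the oldest as representative.
 * Writes at most max_out rows; leaks with new stacks beyond that are
 * dropped. Returns the number of rows written. */
size_t coalesce_leaks(const struct leak *leaks, size_t n,
                      struct leak_summary *out, size_t max_out)
{
    size_t rows = 0;

    for (size_t i = 0; i < n; i++) {
        size_t r;

        for (r = 0; r < rows; r++)
            if (same_stack(out[r].rep, &leaks[i]))
                break;
        if (r < rows) {
            out[r].count++;
            if (leaks[i].timestamp < out[r].rep->timestamp)
                out[r].rep = &leaks[i];
        } else if (rows < max_out) {
            out[rows].rep = &leaks[i];
            out[rows].count = 1;
            rows++;
        }
    }
    return rows;
}
```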

A complete example

The attached mem_leak.c program includes three simple memory leaks.  The first is within main(), where we overwrite a pointer after allocating memory.  The second is within a function, where we allow an automatic variable to be discarded before freeing memory that it references.  The last is a nested function call that includes a logic bug that causes it to return early, also allowing an automatic variable to be discarded before freeing memory that it references.
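The attachment isn't reproduced here, so the C below is only a sketch of what the two function-based leaks could look like.  The function names match the stack traces shown later; the bodies, BUF_SIZE, and the particular logic bug in nested_leak_l3() (a return where a break was intended) are my assumptions.  The leak in main() is simply two malloc() calls whose results are stored in the same pointer variable, so it is not repeated here.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 1024   /* assumption: matches malloc(0x400) in the disassembly */

/* Leak 2: an automatic variable is discarded before the memory it
 * references is freed. */
void func_leak(void)
{
    char *buf = malloc(BUF_SIZE);
    if (buf != NULL)
        memset(buf, 'x', BUF_SIZE);
    /* missing free(buf): buf goes out of scope and the buffer leaks */
}

/* Allocates a buffer; the caller is responsible for freeing it. */
static char *buf_create(void)
{
    return malloc(BUF_SIZE);
}

/* Leak 3: a logic bug causes an early return that skips free(). */
static int nested_leak_l3(void)
{
    char *buf = buf_create();
    int n = 0;

    if (buf == NULL)
        return -1;
    for (int i = 0; i < 3; i++) {
        buf[n++] = (char)('a' + i);
        if (i == 1)
            return n;   /* BUG: meant "break"; returns before free(buf) */
    }
    free(buf);
    return n;
}

static int nested_leak_l2(void) { return nested_leak_l3(); }
static int nested_leak_l1(void) { return nested_leak_l2(); }

int nested_leak(void)
{
    return nested_leak_l1();
}
```

If you compile something like this with optimization, the compiler may inline the nested_leak_l* wrappers or turn them into tail calls, so the audit stack traces could show fewer frames than the ones below.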

To get started, compile the program and start it up with libumem and its audit feature enabled:

$ /opt/SunStudioExpress/bin/cc -o mem_leak mem_leak.c
$ LD_PRELOAD=libumem.so UMEM_DEBUG=audit ./mem_leak
Memory allocated, hit enter to continue:
Memory freed, hit enter to exit:

With the program waiting at the second prompt, go to another window to generate a core dump and examine the results with mdb:

$ pgrep -l mem_leak
1714 mem_leak
$ gcore 1714
gcore: core.1714 dumped
$ mdb core.1714
Loading modules: [ libumem.so.1 ld.so.1 ]
> ::findleaks
CACHE LEAKED BUFCTL CALLER
08072c90 1 0807dd08 buf_create+0x12
08072c90 1 0807dca0 func_leak+0x12
08072c90 1 0807dbd0 main+0x12
------------------------------------------------------------------------
Total 3 buffers, 3456 bytes
>

The output from ::findleaks shows that we have leaked three memory buffers, as expected.  We can now obtain the allocation stack trace for each one by running ::bufctl_audit against its BUFCTL address:

> 0807dbd0::bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
807dbd0 807bb00 f5c5bb73837 1
8072c90 0 0
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
main+0x12
_start+0x7d

> 0807dca0::bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
807dca0 807b180 f5c5bb74120 1
8072c90 0 0
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
func_leak+0x12
main+0x2f
_start+0x7d

> 0807dd08::bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
807dd08 807acc0 f5c5bb7446e 1
8072c90 0 0
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
buf_create+0x12
nested_leak_l3+0xb
nested_leak_l2+8
nested_leak_l1+8
nested_leak+8
main+0x34
_start+0x7d

>

Note that if you have leaked any "oversized" allocations (currently anything over 16 KB), the output will include a list of these leaked buffers, with a byte count and vmem_seg address for each.  You can obtain the stack traces for these allocations by running ::vmem_seg -v against each vmem_seg address.

Looking at the stack traces, the entry just below libumem.so.1`malloc in each stack is the function that allocated the leaked buffer.  If it isn't clear which malloc() call allocated the leaked buffer, it may help to use the ::dis dcmd to disassemble the code.  For example:

> main+0x12::dis
main: pushl %ebp
main+1: movl %esp,%ebp
main+3: subl $0x28,%esp
main+6: pushl $0x0
main+8: pushl $0x400
main+0xd: call -0x256 <PLT=libumem.so.1`malloc>
main+0x12: addl $0x8,%esp
main+0x15: movl %eax,-0x8(%ebp)
main+0x18: pushl $0x0
main+0x1a: pushl $0x400
main+0x1f: call -0x268 <PLT=libumem.so.1`malloc>
main+0x24: addl $0x8,%esp
main+0x27: movl %eax,-0x8(%ebp)
main+0x2a: call -0xff <func_leak>
main+0x2f: call -0x44 <nested_leak>
main+0x34: pushl $0x0
main+0x36: pushl $0x8050e70
>

The example above shows that there were two calls to malloc() near the beginning of main().  The caller reported by ::findleaks, main+0x12, is the return address of the first call (at main+0xd), so we have leaked the memory allocated by the first one: its pointer was stored at -0x8(%ebp) and then overwritten with the result of the second call at main+0x27.  Note that the second malloc() is not reported as a leak, even if the core is generated while that buffer is still active, because we still have a reference to it and it has not actually been leaked.  Whether the buffer is eventually freed doesn't matter here.  As long as we have a reference to the buffer when the core is generated, or when mdb examines the running program, it will not be reported as a leak.
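A toy mark-and-scan may help explain why a still-referenced buffer is not reported.  This is not libumem's implementation: struct block, find_leaks(), and the choice to count pointers into the middle of a block as references are assumptions made for illustration.  The roots array stands in for the memory a real scanner treats as roots (for example, stacks and global data).  Any word whose value falls inside a block marks it, and the contents of marked blocks are scanned in turn until nothing new is found.

```c
#include <stddef.h>
#include <stdint.h>

/* A hypothetical allocated block and its scan state. */
struct block {
    uintptr_t addr;
    size_t    size;
    int       marked;   /* 0 = unseen, 1 = referenced, 2 = contents scanned */
};

/* Conservatively mark every block that some word points into. */
static void scan_words(const uintptr_t *words, size_t nwords,
                       struct block *blocks, size_t nblocks)
{
    for (size_t i = 0; i < nwords; i++)
        for (size_t b = 0; b < nblocks; b++)
            if (!blocks[b].marked &&
                words[i] >= blocks[b].addr &&
                words[i] < blocks[b].addr + blocks[b].size)
                blocks[b].marked = 1;
}

/* Return the number of blocks not reachable from the roots. */
size_t find_leaks(const uintptr_t *roots, size_t nroots,
                  struct block *blocks, size_t nblocks)
{
    size_t leaked = 0;
    int changed = 1;

    for (size_t b = 0; b < nblocks; b++)
        blocks[b].marked = 0;
    scan_words(roots, nroots, blocks, nblocks);

    /* Scan the contents of newly marked blocks until nothing changes. */
    while (changed) {
        changed = 0;
        for (size_t b = 0; b < nblocks; b++) {
            if (blocks[b].marked == 1) {
                blocks[b].marked = 2;
                scan_words((const uintptr_t *)blocks[b].addr,
                           blocks[b].size / sizeof(uintptr_t),
                           blocks, nblocks);
                changed = 1;
            }
        }
    }
    for (size_t b = 0; b < nblocks; b++)
        if (!blocks[b].marked)
            leaked++;
    return leaked;
}
```

Because the scan is conservative, any word that happens to hold a value inside a block, even an unrelated integer, keeps that block "reachable", so this kind of check errs on the side of not reporting a buffer.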

Even with the information obtained from libumem and mdb, you will still have some detective work to do to determine exactly why you have leaked a particular buffer.  However, knowing which buffer has been leaked, and the point in the code where it was allocated, is more than half the battle.

Keep in mind that the allocation of the leaked memory may occur in a system library, not in the code for your program. This could mean you have found a leak in a system library, but more likely it means that you requested an object from the library and were supposed to call another function to discard that object when you were finished with it. For example, in the SIGAR leak that I mentioned at the start of this discussion, the leaks were related to buffers allocated by libnsl, but the real bug was a failure by sigar_rpc_ping() to call clnt_destroy(3NSL) to clean up a CLIENT handle it had created with clntudp_create(3NSL).
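The general fix for that kind of leak is to pair every create with its destroy on every exit path.  Below is a sketch with a hypothetical handle_create()/handle_destroy() pair standing in for clntudp_create()/clnt_destroy(); it is not the SIGAR code or the actual fix.  If the destroy call were skipped, ::findleaks would report buffers allocated inside handle_create(), that is, inside the "library" rather than in the caller whose bug it really is.

```c
#include <stdlib.h>

/* Hypothetical library object: allocated by the library, and the
 * caller must hand it back to handle_destroy(). */
struct handle {
    char *scratch;
};

static int live_handles;   /* illustration only: handles not yet destroyed */

struct handle *handle_create(void)
{
    struct handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->scratch = malloc(256);
    if (h->scratch == NULL) {
        free(h);
        return NULL;
    }
    live_handles++;
    return h;
}

void handle_destroy(struct handle *h)
{
    if (h == NULL)
        return;
    free(h->scratch);
    free(h);
    live_handles--;
}

/* Every exit path after a successful create goes through one cleanup
 * label, so the handle is always destroyed. */
int ping(int simulate_failure)
{
    int rc = -1;
    struct handle *h = handle_create();

    if (h == NULL)
        return -1;
    if (simulate_failure)
        goto out;           /* a bare "return -1" here would leak h */
    rc = 0;
out:
    handle_destroy(h);
    return rc;
}

int handles_outstanding(void)
{
    return live_handles;
}
```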

Comments:

Can I use a similar technique on RHEL? We are debugging leaks too.

Posted by Mohan on August 19, 2013 at 08:34 AM PDT #

@Mohan - I am not aware of any tools on Linux that are as simple and effective as libumem for memory leak detection.

Posted by David Lutz on August 20, 2013 at 06:43 AM PDT #
