The implementation of ::findleaks

Now that OpenSolaris is available, it's time to explain how it all works. I thought I'd start with some of the mdb(1) dcmds I've worked on, since they are relatively self-contained, and have very strong interface boundaries.

I'll start with my favorite dcmd, ::findleaks, a memory leak detector originally written by Bryan Cantrill, which I re-factored substantially late in Solaris 10. There were a few reasons the refactoring was necessary:

  • When I'd done the original ::findleaks for libumem implementation, I simply copied the original leaky.c and re-wrote it into submission. This worked fine, but was an additional maintenance burden; two dissimilar copies of the same code is to be avoided if possible.
  • The original code reported oversize leaks in an obscure way:
    findleaks: Oversize leak at seg 0x12345678!
    which, unless you are very familiar with the way the allocator works internally, was unlikely to lead you to the stack trace or size of the leaked buffer. (for the curious, 12345678$<vmem_seg was the necessary incantation in the old days)
  • The original ::findleaks algorithm was designed with running in a user process in mind. In particular, it assumed it had a lot of memory to play with. Unfortunately, with the coming of kmdb(1), memory space for dcmds was getting very tight. For ::findleaks to work under kmdb(1), it needed substantial belt-tightening.
  • There were some enhancements I wanted to make; in particular, there was no simple way to dump all of the leaked stack traces. Internally, the "leakbot" tool (also written by Bryan) automatically extracted the set of stack traces, but this didn't help people without access to that tool.
So I started a side project to re-work ::findleaks. The following bugs were part of it:
  4840780 ::findleaks oversized leak reporting needs overhaul
  4849669 ::findleaks' cleanup code needs work
  4931271 kmem walkers and ::findleaks use large amounts of memory
  5030758 ::findleaks should have an option to display the stack traces
but this note only covers the generic ::findleaks implementation.

The files of ::findleaks

Let's start with a lay of the land. The following files encompass the implementation of ::findleaks (to shorten things, I'm assuming a prefix of usr/src/cmd/mdb/common for all file references):

.../modules/genunix/leaky.h The dcmd, dcmd help, and walker function declarations, used by .../modules/genunix/genunix.c and .../modules/libumem/libumem.c to create the dcmd linkage for mdb(1).
.../modules/genunix/leaky_impl.h An implementation header, which defines (and documents, via a large block comment) the interface between the main engine, .../modules/genunix/leaky.c and the two target implementations.
.../modules/genunix/leaky.c The common ::findleaks engine.
The two target implementations, one for the kernel, one for libumem(3lib).

I'll use the term "target" when talking about the other side of ::findleaks's implementation. The non-static target interfaces all start with leaky_subr_.

The target interface is a link-time, functional binding; during the compilation process, alternate versions of the target routines are linked against essentially the same "generic" framework. This is less flexible than doing some kind of dynamic binding (i.e. using function pointers), but is sufficiently general as long as only one target interfaces is required for a given dmod.

The public interface

The top layer of interface consists of the dcmds (D commands) and walkers exported by the genunix and libumem dmods. In the genunix dmod, these are defined in .../modules/genunix/genunix.c:

static const mdb_dcmd_t dcmds[] = {
	/\* from leaky.c + leaky_subr.c \*/
	{ "findleaks", FINDLEAKS_USAGE,
	    "search for potential kernel memory leaks", findleaks,
	    findleaks_help },
static const mdb_walker_t walkers[] = {
	/\* from leaky.c + leaky_subr.c \*/
	{ "leak", "given a leaked bufctl or vmem_seg, find leaks w/ same "
	    "stack trace",
		leaky_walk_init, leaky_walk_step, leaky_walk_fini },
	{ "leakbuf", "given a leaked bufctl or vmem_seg, walk buffers for "
	    "leaks w/ same stack trace",
		leaky_walk_init, leaky_buf_walk_step, leaky_walk_fini },
(the structures used are part of the Debugger Module Linkage, defined in the Solaris Modular Debugger Guide)
The walkers are straightforward; they just walk the cached leak table. ::findleaks is the main event.

First up: initialize, estimate, slurp in the buffers

When ::findleaks is invoked, mdb(1) calls findleaks(), which validates its arguments, calls leaky_cleanup() to clean up any interrupted state, then checks if there is cached data. If not, we start the run in earnest.

The first call into the target interface is here:

	if ((ret = leaky_subr_estimate(&est)) != DCMD_OK)
		return (ret);
This calls over to here or here, which first check that finding memory leaks is possible (i.e. savecore -L dumps are not a consistent snapshot, so we refuse to run on them), then calculate an upper bound on the number of allocated segments in the system.

Shiny new estimate in hand, ::findleaks allocates a (possibly huge) array of leak_mtab_ts, calls into the target to fill it, and sorts the result:

	lk_mtab = mdb_zalloc(est \* sizeof (leak_mtab_t), UM_SLEEP | UM_GC);
	lmp = lk_mtab;

	if ((ret = leaky_subr_fill(&lmp)) != DCMD_OK)
		return (ret);

	lk_nbuffers = lmp - lk_mtab;

	qsort(lk_mtab, lk_nbuffers, sizeof (leak_mtab_t), leaky_mtabcmp);
leaky_subr_fill(), defined here and here, fills in the array with the details of every allocated buffer in the system. Each entry has a base, limit, and control pointer, the last being completely target-defined. The qsort(3C) call sorts the array by base address.

Stage two: find everything reachable

::findleaks is, at its heart, a simulation of a conservative garbage collection run. All of the lk_mtab entries start in the "unmarked" state; that is, they are not known to be reachable. We call into the target:

	if ((ret = leaky_subr_run()) != DCMD_OK)
		return (ret);
( here and here) to scan the "root set"; that is, all statically defined objects. For each such object, the target implementation calls leaky_grep() or one of its variants. These look for pointers, and mark each lk_mtab entry they find, along with any (recursively) reachable entries. When the run is finished, any unmarked buffers must be unreachable leaks.

While the interface for leaky_grep() has a simple description, it invites a spectrum of implementation possibilities, each with different trade-offs in speed, memory usage, and maintainability.[1] Any findleaks run spends virtually all of its time and energy in leaky_grep(), so it is worth the effort of optimizing. The most straightforward implementation is something like:

leaky_grep(addr, size)
	allocate a buffer of size bytes
	read [addr, size) into the buffer
	for each pointer in buffer {
		look up mtab entry
		if it is marked, continue
		mark it
		leaky_grep(mtab's base, mtab's size);
	free the buffer
The immediate problem with this is the recursive depth (hundreds of thousands of call frames are typical in a normal dump.)[2] The next is memory usage; each recursive step requires additional storage for the entire contents of a memory buffer; with hundreds of thousands of recursive invocations occurring at once, that quickly adds up to tens of megabytes of peak storage.

Instead of storing a buffer for each recursive step, the current algorithm uses a single small (16 kbyte) buffer. The following comment describes how we use the buffer:

		 \* The buffer looks like:  ('+'s are unscanned data)
		 \* -----------------------------++++++++++++++++
		 \* |				|		|
		 \* buf				cur		end
		 \* cur scans forward.  When we encounter a new buffer, and
		 \* it will fit behind "cur", we read it in and back up cur,
		 \* processing it immediately.
The idea is that we read in a chunk of data at the end of the buffer, then start scanning for pointers. When we find a pointer to an unmarked mtab entry, we mark it, then either read it in to the buffer immediately, or put it on a stack of unfinished work. Since each stack entry only has to name an entry in the lk_mtab array, the stack is quite compact. Small buffers will almost always be processed in line, and the overall memory usage is quite small; the only large allocation is the lk_mtab array itself.

Stage three: collect the leaks

Once leaky_subr_run() returns successfully, we're on the home stretch. First, we scan the lk_mtab array for unmarked buffers, and report them to the target implementation:

	for (i = 0; i < lk_nbuffers; i++) {
		if (LK_MARKED(lk_mtab[i].lkm_base))
leaky_subr_add_leak(), defined here and here, reads in the available debugging information about the leak, and invokes leaky_add_leak() with the details. leaky_add_leak() adds each leak to a global hash table, coalescing any duplicates. Each entry in the table is a leak_bufctl_t, which is used later for leak reporting. Once all of the leaks have been processed, the final processing is done:
which groups all of the representative leaks by type, and sorts each type's list of buffers using another target interface, leaky_subr_bufctl_cmp(), which defines a "display" order for the leaks.

Final stage: report the leaks

Once leaky_sort has completed, the results have been stored and cached in static data. leaky_dump() dumps the data, using the algorithm explained in .../modules/genunix/leaky_impl.h:

 \* void leaky_subr_dump_start(type)
 \* void leaky_subr_dump(lkb, verbose)
 \* void leaky_subr_dump_end(type)
 \*	Used to dump the table of discovered leaks.  invoked as:
 \*		for i in 0 .. LK_NUM_TYPES
 \*			leaky_subr_dump_start(i)
 \*			for lkb in (possibly a subset of) the type i leaks
 \*				leaky_subr_dump(lkb, 0)
 \*			leaky_subr_dump_end(i)
 \*	leaky_subr_dump_start()/end() are always invoked for each type, even
 \*	if there are no leaks of that type.  If '-d' was passed to
 \*	::findleaks, then we loop through the sequence of lkbs:
 \*		for i in 0 .. LK_NUM_TYPES
 \*			for lkb of type i, same subset and order as above
 \*				leaky_subr_dump(lkb, 1)
 \*	leaky_subr_dump() can use the leaks chained off of lkb_next to
 \*	access coalesced leaks.  lkb_dups is the length of the dup list.
The targets use these invocations to produce tabular output.

Conclusion and Final Thoughts

The generic findleaks code is compact (913 lines), and for the most part without external dependency.[3] It is a useful shell for running the leak processing, but knows nothing of how to discover allocated buffers, root sets, or any other implementation details. All such external dependencies are covered by the target implementations. (It should be possible to do a target implementation for any allocator which has enough state to track all allocated buffers.)

One interesting aspect of this re-working is that the "target" interface places a huge set of requirements on the generic framework; given just the contents of .../modules/genunix/leaky_impl.h, someone writing a clean-room implementation of leaky.c would be constrained to writing something extremely similar to what is there today. On the other hand, the target implementation is less constrained by the target interface. Instead, it is mostly determined by how it must retrieve the needed data from the process, memory image, or core file mdb(1) is running against. This trade off of flexibility on two sided of a software interface is a common pattern, and interesting to see on a smaller scale.


[1] Interestingly, speed is pretty low on the priority list, especially in the post-mortem case: ::findleaks, when run against a dump, is almost entirely I/O bound.
[2] Which will blow out the stack, causing a core dump. Simple recursion removal techniques lead you to the original leaky_grep() implementation: an iterative form of the same algorithm, with the stack of pending work stored on the heap. This implementation served ::findleaks well, until kmdb's memory constraints came along.
[3] Ignoring the "verbose" output, the only external calls not part of the target interface is the call to mdb_vread() in leaky_grep(), which reads from the program's address space, and mdb_getopts(), used to parse the arguments to ::findleaks.

Technorati Tag:
Technorati Tag:
Technorati Tag:


Interestingly, speed is pretty low on the priority list, especially in the post-mortem case: ::findleaks, when run against a dump, is almost entirely I/O bound.

Posted by idp4u on October 09, 2010 at 12:28 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed



Top Tags
« July 2016