On ::findleaks -- wot is it?

A few days ago I was asked by a partner engineer:


When we were looking at a core file we ran ::findleaks and it came back with some buffers from our driver. However, the buffers are from our buffer pool. They are allocated at driver attach and will be freed at driver detach. The core file was from a Leadville stack panic during device discovery at boot, so I expected the driver to be fully present. The ::findleaks is identifying 5 buffers.

We actually have allocated and pooled 512 of these buffers. I am wondering why ::findleaks considers these buffers as "leaked". Could you help me understand what this means?



I'm glad you asked.



To start with, the ::findleaks dcmd (which also has a manual page entry) has a ::help entry within mdb:



> ::help findleaks

NAME
findleaks - search for potential kernel memory leaks
SYNOPSIS
[ addr ] ::findleaks [-dfv]
DESCRIPTION
   Does a conservative garbage collection of the heap in order to find potentially leaked buffers. Similar leaks are coalesced by stack trace, with the oldest leak picked as representative. The leak table is cached between invocations.
   addr, if provided, should be a function or PC location. Reported leaks will then be limited to those with that function or PC in their stack trace.
   The 'leak' and 'leakbuf' walkers can be used to retrieve coalesced leaks.
OPTIONS
    -d    detail each representative leak (long)
    -f    throw away cached state, and do a full run
    -v    report verbose information about the findleaks run
ATTRIBUTES

   Target: kvm
   Module: genunix
   Interface Stability: Unstable

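The 'leak' and 'leakbuf' walkers mentioned in the help text can feed other dcmds directly. As a sketch (the addresses and counts will of course depend on your dump), after a ::findleaks run you could do:

```
> ::walk leak | ::bufctl -v      ! verbose allocation stack for each coalesced leak
> ::walk leakbuf                 ! just the leaked buffer addresses
```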




You can follow along with the findleaks code via
http://cvs.opensolaris.org/source/xref/on/usr/src/cmd/mdb/common/modules/genunix/leaky.c
but be warned --- the code is more than just a tad tricky to follow.




Ok, as to how you really make use of ::findleaks, have a look through this transcript:



$ mdb -k 12
mdb: warning: dump is from SunOS 5.11 onnv-gate:2006-01-12; dcmds and macros may not match kernel implementation
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp usba s1394 fctl nca lofs random nfs fcip cpc ipc ptm sppp ]
> ::status
debugging crash dump vmcore.12 (64-bit) from doppio
operating system: 5.11 onnv-gate:2006-01-12 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=fffffffffbc58a90 addr=a0 occurred in module "" due to a NULL pointer
dereference
dump content: kernel pages only
> ::msgbuf
### snippage ###
kernel memory allocator:
invalid free: buffer not in cache
buffer=ffffffff87a8cda0 bufctl=ffffffff879d70d8 cache: streams_mblk
previous transaction on buffer ffffffff87a8cda0:
thread=ffffffff94795c00 time=T-15569.454683674 slab=ffffffff879b2780 cache: streams_mblk
kmem_cache_free_debug+148
kmem_cache_free+51
dblk_destructor+73
kmem_cache_free+1c0
dblk_lastfree+7e
freeb+b0
struiocopyout+95
kstrgetmsg+973
sotpi_recvmsg+20d
socktpi_read+9c
fop_read+29
read+2a4

panic[cpu0]/thread=fffffe800000bc80:
kernel heap corruption detected


fffffe800000baa0 genunix:kmem_error+4ab ()
fffffe800000baf0 genunix:kmem_slab_free+dd ()
fffffe800000bb50 genunix:kmem_magazine_destroy+127 ()
fffffe800000bb90 genunix:kmem_depot_ws_reap+9d ()
fffffe800000bbc0 genunix:kmem_cache_reap+35 ()
fffffe800000bc60 genunix:taskq_thread+200 ()
fffffe800000bc70 unix:thread_start+8 ()

#### snippage ####

> ::findleaks
BYTES LEAKED VMEM_SEG CALLER
16729 1 ffffffff848bf528 AcpiOsTableOverride+0x15f
------------------------------------------------------------------------
Total 1 kmem_oversize leak, 16729 bytes

CACHE LEAKED BUFCTL CALLER
ffffffff82a1d008 55 fffffed4cc4c89e0 allocb+0x65
ffffffff82a1d008 128 ffffffff879ecab0 allocb+0x65
ffffffff80040008 128 ffffffff879e6578 dblk_constructor+0x57
ffffffff80040008 55 ffffffff935f9040 dblk_constructor+0x57
ffffffff8003a748 1 ffffffff93a5de80 devi_attach+0x94
ffffffff8002f748 1 ffffffff89c92678 kobj_alloc+0x88
ffffffff8002e008 2 ffffffff8a188ce8 kobj_alloc+0x88
ffffffff80039008 1 ffffffff8970c4c0 kobj_alloc+0x88
ffffffff84896748 116 ffffffff983da998 rootnex_dma_allochdl+0x5a
ffffffff84896748 128 ffffffff879ec9d8 rootnex_dma_allochdl+0x5a
------------------------------------------------------------------------
Total 615 buffers, 997408 bytes
> ffffffff879e6578::bufctl -v
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffffff879e6578 ffffffff87a8a4e0 b7d45a53f ffffffff82aefe80
ffffffff80040008 ffffffff808dc340 0
kmem_cache_alloc_debug+0x2a6
kmem_cache_alloc+0x1fc
dblk_constructor+0x57
kmem_cache_alloc+0x237
allocb+0x65
FillTxDescriptor+0x2b
FillTxRing+0x2e
0xfffffffff5136697
SkGeAttachReq+0x116
SkGeProto+0x8d
SkGeWsrv+0x26
runservice+0x62
queue_service+0x5b
stream_runservice+0x10a
strput+0x1f3




So what we do is take the bufctl address for each of the leaked buffers and run it through the ::bufctl dcmd with the -v option, which shows us the stack trace of the thread that allocated that particular buffer. From that we can see which function (and where within it) leaked the memory.
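If you'd rather not feed each bufctl through ::bufctl by hand, the -d flag from the help text above does much of this in one pass, printing the detail (including the allocation stack) for each representative leak:

```
> ::findleaks -d      ! detail each representative leak (long output)
```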

In the msgbuf for a panic like the one above you see the address of the suspect bufctl, so you can do this:



> ffffffff879d70d8::bufctl -v
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffffff879d70d8 ffffffff87a8cda0 6c5b3b526b33 ffffffff94795c00
ffffffff80040008 ffffffff80b770c0 ffffffff820514c8
kmem_cache_free_debug+0x148
kmem_cache_free+0x51
dblk_destructor+0x73
kmem_cache_free+0x1c0
dblk_lastfree+0x7e
freeb+0xb0
struiocopyout+0x95
kstrgetmsg+0x973
sotpi_recvmsg+0x20d
socktpi_read+0x9c
fop_read+0x29
read+0x2a4



Note that we have two thread addresses (right-hand column):



> ffffffff820514c8::thread
ADDR STATE FLG PFLG SFLG PRI EPRI PIL INTR
ffffffff820514c8 inval/badd cafe badd cafe -13570 -17699 0 ffffffff91af94e0
> ffffffff94795c00::thread
ADDR STATE FLG PFLG SFLG PRI EPRI PIL INTR
ffffffff94795c00 run 1002 104 3 59 0 0 n/a
The first thread structure has clearly been freed and reused (note the 0xbaddcafe debug patterns in its fields), so it's the second, still-runnable thread that is of interest.


Ok, we know that we're going through streams code, and that the panic stack went through dblk_destructor -- therefore it's reasonable to look first at the two leaks reported against dblk_constructor:




> ffffffff879e6578::bufctl -v
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffffff879e6578 ffffffff87a8a4e0 b7d45a53f ffffffff82aefe80
ffffffff80040008 ffffffff808dc340 0
kmem_cache_alloc_debug+0x2a6
kmem_cache_alloc+0x1fc
dblk_constructor+0x57
kmem_cache_alloc+0x237
allocb+0x65
FillTxDescriptor+0x2b
FillTxRing+0x2e
0xfffffffff5136697
SkGeAttachReq+0x116
SkGeProto+0x8d
SkGeWsrv+0x26
runservice+0x62
queue_service+0x5b
stream_runservice+0x10a
strput+0x1f3

> ffffffff935f9040::bufctl -v
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffffff935f9040 ffffffff934fe7c0 7a840278b8ac fffffe80000b3c80
ffffffff80040008 ffffffff805b7e00 ffffffff8268c6e8
kmem_cache_alloc_debug+0x2a6
kmem_cache_alloc+0x2ac
dblk_constructor+0x57
kmem_cache_alloc+0x237
allocb+0x65
FillRxRing+0x46
ReceiveIrq+0xad3
SkGeIsrOnePort+0xa5
av_dispatch_autovect+0x97




Which for me means logging a call with SysKonnect.de, since the SkGe functions are their code.

You can find out more about ::findleaks by reading Jonathan Adams' blog entry on the implementation, and by reading the
umem_debug(3MALLOC) manpage --- you can use ::findleaks on userland processes too, which is a real boon.


I hope the above - although rambling - gives you enough to get started on the magic of ::findleaks.

About

I work at Oracle in the Solaris group. The opinions expressed here are entirely my own, and neither Oracle nor any other party necessarily agrees with them.
