A parting MDB challenge

Like most of Sun's US employees, I'll be taking the next week off for vacation. On top of that, I'll be back in my hometown in MA for the next few weeks, alternately working remotely and attending my brother's wedding. I'll leave you with an MDB challenge, this time much more involved than past "puzzles". I don't have any prizes lying around, but this one would certainly be worth one if I had anything to give.

So what's the task? To implement munges as a dcmd. Here's the complete description:

Implement a new dcmd, ::stacklist, that will walk all threads (or all threads within a specific process when given a proc_t address) and summarize the different stacks by frequency. By default, it should display output identical to 'munges':

> ::stacklist
73      ##################################  tp: fffffe800000bc80
        swtch+0xdf()
        cv_wait+0x6a()
        taskq_thread+0x1ef()
        thread_start+8()

38      ##################################  tp: ffffffff82b21880
        swtch+0xdf()
        cv_wait_sig_swap_core+0x177()
        cv_wait_sig_swap+0xb()
        cv_waituntil_sig+0xd7()
        lwp_park+0x1b1()
        syslwp_park+0x4e()
        sys_syscall32+0x1ff()

...

The first number is the frequency of the given stack, and the 'tp' pointer should be a representative thread of the group. The stacks should be organized by frequency, with the most frequent ones first. When given the '-v' option, the dcmd should print out all threads containing the given stack trace. For extra credit, the ability to walk all threads with a matching stack (addr::walk samestack) would be nice.
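To make the expected behavior concrete, here is a rough sketch of the grouping logic in Python. It is not the munges source and not a dcmd; it only illustrates the summarize-by-frequency step. It assumes the text format that `::walk thread | ::findstack` prints (a "stack pointer for thread" header followed by one frame per line, with no arguments inside the parentheses). The names `summarize`, `render` and `include_bracketed` are made up for this example, and the thread addresses in `SAMPLE` are invented.

```python
import re

# Assumed ::findstack text format: a header line per thread, then frames.
HEADER = re.compile(r"stack pointer for thread ([0-9a-f]+)")
FRAME = re.compile(r"^\[?\s*[0-9a-f]+\s+(\S+\(\))\s*\]?$")

# Invented sample input; only the shape matters.
SAMPLE = """\
stack pointer for thread 2a10027fcc0: 2a10027f151
[ 000002a10027f151 cv_wait+0x5c() ]
  000002a10027f201 taskq_thread+0x13c()
  000002a10027f2d1 thread_start+4()
stack pointer for thread 2a100017cc0: 2a100017151
[ 000002a100017151 cv_wait+0x5c() ]
  000002a100017201 taskq_thread+0x13c()
  000002a1000172d1 thread_start+4()
stack pointer for thread 2a100aaacc0: 2a100aaa151
  000002a100aaa201 lwp_park+0x1b1()
"""


def summarize(text, include_bracketed=False):
    """Return [(count, [thread, ...], (frame, ...))], most frequent first."""
    per_thread = []  # list of (thread address, list of frames)
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            per_thread.append((header.group(1), []))
            continue
        frame = FRAME.match(line)
        if frame and per_thread:
            # Bracketed frames are skipped unless asked for (hypothetical flag).
            if line.startswith("[") and not include_bracketed:
                continue
            per_thread[-1][1].append(frame.group(1))

    groups = {}  # tuple of frames -> threads sharing that stack
    for thread, frames in per_thread:
        groups.setdefault(tuple(frames), []).append(thread)

    # sorted() is stable, so equally frequent stacks keep first-seen order.
    ordered = sorted(groups.items(), key=lambda item: -len(item[1]))
    return [(len(threads), threads, frames) for frames, threads in ordered]


def render(groups, verbose=False):
    """Format groups roughly like the output above; verbose lists every thread."""
    out = []
    for count, threads, frames in groups:
        out.append(f"{count}\t{'#' * 34}  tp: {threads[0]}")
        if verbose:
            out.extend(f"\ttp: {t}" for t in threads[1:])
        out.extend(f"\t{frame}" for frame in frames)
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    print(render(summarize(SAMPLE)))
```

The first thread seen with a given stack serves as the representative 'tp', and the verbose flag stands in for '-v'. A real dcmd would of course work on the stacks directly rather than on printed text.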

This is not an easy dcmd to write, at least when doing it correctly. The first key is to use as little memory as possible. This dcmd must be capable of being run within kmdb(1M), where we have limited memory available. The second key is to leverage existing MDB functionality without duplicating code. You should not be copying code from ::findstack or ::stack into your dcmd. Ideally, you should be able to invoke ::findstack without worrying about its inner workings. Alternatively, restructuring the code to share a common routine would also be acceptable.
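One possible way to keep memory use down (my own suggestion for illustration, not a required design) is to store only a fixed-size fingerprint, a count, and a representative thread for each distinct stack, instead of the stack text itself. The Python sketch below shows the idea with a 64-bit FNV-1a hash over the program counters; `fingerprint`, `StackGroup` and `tally` are hypothetical names, and the addresses in the usage note are invented.

```python
from dataclasses import dataclass

FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fingerprint(pcs):
    """64-bit FNV-1a hash over the program counters of one stack."""
    h = FNV_OFFSET
    for pc in pcs:
        for byte in pc.to_bytes(8, "little"):
            h = ((h ^ byte) * FNV_PRIME) & MASK64
    return h


@dataclass
class StackGroup:
    count: int
    representative: int  # address of the first thread seen with this stack


def tally(threads):
    """threads: iterable of (thread address, [pc, ...]) pairs.

    Per distinct stack we keep one hash key, one counter and one
    address, no matter how deep the stack is or how many threads share it.
    """
    groups = {}
    for tp, pcs in threads:
        key = fingerprint(pcs)
        group = groups.get(key)
        if group is None:
            groups[key] = StackGroup(count=1, representative=tp)
        else:
            group.count += 1
    # Most frequent stacks first; ties keep first-seen order.
    return sorted(groups.values(), key=lambda g: -g.count)
```

For example, `tally([(0x1000, [0x10, 0x20]), (0x2000, [0x10, 0x20]), (0x3000, [0x30])])` yields two groups, the first with a count of 2 and 0x1000 as its representative. Two different stacks could in principle hash to the same value, so a careful implementation would re-read the representative thread's stack and compare before counting a match. The frames for display can be regenerated from the representative thread when printing.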

This command would be hugely beneficial when examining system hangs or other "soft failures," where there is no obvious culprit (such as a panicking thread). Having this functionality in KMDB (where we cannot invoke 'munges') would make debugging a whole class of problems much easier. This is also a great RFE to get started with OpenSolaris. It is self-contained and low risk, yet non-trivial, and it gets you familiar with MDB at the same time. Personally, I have always found the observability tools a great place to start working on Solaris: the risk is low, yet the work requires (and therefore teaches) internal knowledge of the kernel.

If you do manage to write this dcmd, please email me (Eric dot Schrock at sun dot com) and I will gladly be your sponsor to get it integrated into OpenSolaris. I might even be able to dig up a prize somewhere...

Comments:

I ran munges on my computer and found the output of ::walk thread | ::findstack is like this:

stack pointer for thread 2a10027fcc0: 2a10027f151
[ 000002a10027f151 cv_wait+0x5c() ]
  000002a10027f201 taskq_thread+0x13c()
  000002a10027f2d1 thread_start+4()
...

And there are many of them, yet the result of ::walk thread | ::findstack ! ./munges is as follows:

52      ##################################  tp: 2a100017cc0
        taskq_thread+0x13c()
        thread_start+4()

cv_wait() is gone. Is it a bug? Or is it supposed to be like this? Thank you!

Posted by yxn on July 14, 2005 at 08:09 PM PDT #

It is supposed to be like that; if you want to include the bracketed line, use munges -b.

That said, I always use munges -b and have absolutely no recollection why I made it an option...

Posted by Dave Powell on July 20, 2005 at 08:39 AM PDT #

When is Sun going to release Solaris Core Analysis Tool (SCAT) for Solaris 10? In my opinion, SCAT is a far superior tool for analyzing crashdumps than MDB. For example, for a system hang, I would like to sort all threads based on idle time in the sd driver - can MDB do that?

Posted by S-CAT Addict on June 27, 2006 at 01:59 AM PDT #

2 years later, and no response to my question?

Posted by S-CAT Addict on May 21, 2008 at 05:59 AM PDT #
