more on gcore

Trawling through b.s.c I noticed Fintan Ryan talking about gcore(1), and I realized that I hadn't sufficiently promoted this cool utility. As part of my work adding variable core file content, I rewrote gcore from scratch (it used to be a real pile) to add a few new features and to make it use libproc (i.e. make it slightly less of a pile).

You use gcore to take a core dump of a live, running process without actually causing the process to crash. It's not completely noninvasive because gcore stops the process you're taking the core of to ensure a consistent snapshot, but unless the process is huge or it's really cranky about timing, the perturbation isn't noticeable. There are a lot of places where taking a snapshot with gcore is plenty useful. Let's say a process is behaving strangely, but you can't attach a debugger because you don't want to take down the service, or you want to have a core file to send to someone who can debug it when you yourself can't -- gcore is perfect. I use it to take cores of mozilla when it's chugging away on the processor but not making any visible progress.

I mentioned that big processes can take a while to gcore -- not surprising because we have to dump that whole image out to disk. One of the cool uses of variable core file content is the ability to take faster core dumps by only dumping the sections you care about. Let's say there's some big ISM segment or a big shared memory segment: exclude it and gcore will go faster:

hedge /home/ahl -> gcore -c default-ism 256755
gcore: core.256755 dumped
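
If you want to see how much a trimmed content set actually saves, here's a rough sketch that dumps the same process twice and lists both files. The function name compare_dumps and the file prefixes full and trimmed are made up for illustration, and it assumes gcore takes the same content tokens as coreadm(1M), including all. Remember that each dump stops the process briefly.

```shell
#!/bin/sh
# Sketch only: compare a full dump against one with ISM segments excluded.
# GCORE can be overridden; it defaults to the gcore found on PATH.
GCORE=${GCORE:-gcore}

# compare_dumps PID
# Writes full.PID and trimmed.PID in the current directory (hypothetical
# prefixes), then lists them so the sizes can be compared.
compare_dumps() {
    pid=$1
    # Assumption: "all" is accepted as a content token, as in coreadm(1M).
    "$GCORE" -c all -o full "$pid" || return 1
    # Same content description as the example above: default minus ISM.
    "$GCORE" -c default-ism -o trimmed "$pid" || return 1
    ls -l "full.$pid" "trimmed.$pid"
}
```

Something like compare_dumps 256755 would then show the two sizes side by side.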

Pretty handy, but the coolest use I've been making of gcore lately is mixing it with DTrace and the new(ish) system() action. This script snapshots my process once every ten seconds and names the files according to the time they were produced:

# cat gcore.d
#pragma D option destructive
#pragma D option quiet

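/* Every ten seconds, arm the snapshot. */
tick-10s
{
        doit = 1;
}

/* On the target's next system call, freeze it, dump core, and resume. */
syscall:::
/doit && pid == $1/
{
        stop();
        system("gcore -o core.%%t %d", pid);
        system("prun %d", pid);
        doit = 0;
}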
# dtrace -s gcore.d  256755
gcore: core.1097724567.256755 dumped
gcore: core.1097724577.256755 dumped
gcore: core.1097724600.256755 dumped
^C

WARNING! When you specify destructive in DTrace, it means destructive. The system() and stop() actions can be absolutely brutal (I've rendered at least one machine unusable by my indelicate use of that Ramirez-Ortiz-ian one-two combo). That said, if you screw something up, you can break into the debugger and set dtrace_destructive_disallow to 1.

OK, so be careful, but that script can give you some pretty neat results. Maybe you have some application that seems to be taking a turn for the worse around 2 a.m. -- put together a DTrace script that detects the problem and use gcore to take a snapshot so you can figure out what was going on when you get to the office in the morning. Take a couple of snapshots to see how things are changing. You do like debugging from core dumps, right?
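
To make the "detect the problem, then snapshot" idea concrete, here's a minimal sketch wrapped in a shell function. The trigger is purely an assumption for illustration: a write(2) in the target process failing with errno 28 (ENOSPC). Swap in whatever probe and predicate actually describe your problem. The names snapshot_on_error, D_PROG, and dumped are made up too. The destructive actions are the same stop(), system(), prun sequence as in gcore.d, so the same warning applies.

```shell
#!/bin/sh
# Sketch only: dump core once, the first time a hypothetical trigger fires.
# DTRACE can be overridden; it defaults to the dtrace found on PATH.
DTRACE=${DTRACE:-dtrace}

# The D program lives in single quotes so the shell leaves $1 and %d alone.
# Inside it, $1 is DTrace's first macro argument (the pid), not a shell
# parameter. "dumped" is a D global that starts at 0, so the clause fires
# at most once per dtrace run.
D_PROG='
syscall::write:return
/pid == $1 && errno == 28 && !dumped/
{
        dumped = 1;
        stop();
        system("gcore -o core.%%t %d", pid);
        system("prun %d", pid);
}'

# snapshot_on_error PID
# -w permits destructive actions and -q keeps the output quiet, matching
# the two pragmas in gcore.d.
snapshot_on_error() {
    "$DTRACE" -w -q -n "$D_PROG" "$1"
}
```

Run it as root with something like snapshot_on_error 256755 and hit ^C once the core has been dumped.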

Comments:

Dtrace not needed really :)

#!/usr/bin/bash
N=$1
PID=$2
if [ -z "$N" ] || [ -z "$PID" ]; then
        echo "usage: $0 (interval) (pid)"
        exit 0
fi
EPOCH=`truss -t time date 2>&1 | awk '/^time/ {print $3; exit}'`
while [ $N -gt 0 ]; do
        /usr/bin/pstop $PID
        /usr/bin/gcore -o core.$EPOCH $PID
        /usr/bin/prun $PID
        let N=$N-1
done

Posted by Rodrick Brown on October 13, 2004 at 03:46 PM PDT #

Rodrick,

True, we don't need DTrace -- this was an example of how you can mix DTrace and gcore. The idea is that you can grab a core when you see any series of events. One other thing to note is that the %t format character can be used with the -o option to gcore(1) -- while your use of truss(1) is exciting, there's a simpler way.

Posted by Adam Leventhal on October 13, 2004 at 03:51 PM PDT #

Yes, good point Adam. Now that I think about it, this will be very good for situations where you could have a probe in dtrace trigger an event that calls gcore, to further troubleshoot an issue with mdb :) Very cool indeed. One thing: #pragma D option destructive scares me lol :)

Posted by Rodrick Brown on October 13, 2004 at 04:11 PM PDT #

Rodrick,

I basically think you can't be too afraid of destructive actions. Consider the following D script:
syscall:::entry
{
        stop();
}
Ouch... and it's real tough to recover from a system where every process is stopped...
Destructive actions are incredibly useful, but wield them with care.

Posted by Adam Leventhal on October 13, 2004 at 07:46 PM PDT #

Very cool, I have a bunch of situations where this will be handy.

Posted by Fintan Ryan on October 13, 2004 at 08:24 PM PDT #

whoever adam leventhal is I ran across his web page on accident and he is hot.

Posted by E on November 24, 2004 at 04:23 PM PST #

About

Adam Leventhal, Fishworks engineer
