Monday Dec 12, 2011

Dumping Stacks

Stack dumping seems like it should be an easy thing.  After all, you're just walking a linked list.  What the truth is that stack dumping in Oracle Solaris Crash Analysis Tool is one its most complicated parts.

Sure, generating a list of function/frame pointers is pretty easy, but it turns out there are vast amounts of useful data and clues in that stack that it's useful to dig up and display (cue the digging pirate).

The first thing to consider is that when Oracle Solaris takes a trap (any kind of fault, exception or interrupt, which includes system calls), it has to save all of the registers to the stack.  This happens on system calls, interrupts, and of course, traps caused by approaching system crashes.  The reason this is done is to remember the state of the current thread before totally switching contexts to a different stack, and the operating system, therefore, needs to preserve all of the registers should it ever return so it can restore state and continue.

That's the reason you will often see register dumps in the stack.  Getting the register values at a precise point in time is valuable information - particularly when it contains state data about something going wrong and we're preparing to crash the operating system.  

You can see this trap information by enabling the stk_trap scatenv setting, which is on by default.  You can also see the user thread registers for userland threads in a core by displaying the full stack data with stack -l.

Other things which arise are architecture-specific, and related to how the compiler tries to optimize things for a particular architecture.  On SPARC, for example, a useful optimization is re-using stack space for functions we'll never return to.

For example, functions which look like:

  funcA() {
    return (funcB());
  callerFunc() {

have no reason to return to funcA ever again.  What the compiler does in such situations is to pop the stack space it reserved for funcA for funcB's use.  We coined the term "recycled frame" for this.

The way this would normally appear in the stack is this:


which means the person looking at this stack has no information about funcA ever having been called.  If they then went to look at callerFunc, they'd see it never calls funcB.  But, if you look at callerFunc+offset, you can see a call to funcA, which isn't the next thing in the stack.  Hence you can see why the tool was enhanced to show:

  funcA() - frame recycled

for these cases.  Display of the stack with recycled frames is controlled by the stk_recycled scatenv setting, which is on by default.

Note that if there are two such optimizations in the calling sequence, there's no useful way to dig up anything but the first step.  The codepath command was written to search for call linkages between such functions.

Something similar is also done for leaf functions.  Leaf functions are functions which never call anything else.  An optimization done for those is that they don't have to necessarily do a save if they're short enough to operate in the volatile global and output registers (%g and %o on SPARC).

Knowledge of this is used when walking the stack as the "next" function down the stack will use the same stack pointer as we are using now.

There are other interesting bits done in the stack which are worth noting.  For example, as part of the ABI on SPARC 64-bit, stack pointers are offset by a number so as to make them easily distinguishable from 32-bit stack pointers.  That is the STACK_BIAS, which is 2047.  So if you find an address that's in the range of stack addresses, but is misaligned (in this case, it ends up with a "1" as the last digit), it probably needs the STACK_BIAS added to it to get the number you actually want.  Note that 32-bit SPARC, and x86/x64 use a STACK_BIAS of 0.

For convenience, the frame pointers that Oracle Solaris Crash Analysis Tool uses already have the STACK_BIAS applied. 

Another useful concept to know is MINFRAME.  The ABI defines how functions pass arguments between functions, and on SPARC, when a save instruction is run, it saves some space for input (%i) and local (%l) registers of the caller to be stored, plus space for output registers (%o) for any functions it calls (note that for leaf functions, this is sometimes optimized to skip the output registers - something we've tagged a MINIFRAME).

Knowing that a function uses MINFRAME is a good clue that this is a short function, and makes no use of local variables beyond what it can get away with using the local and input registers.

Knowing when we switch between stacks is also useful information.  This means we've switched from the userland stack space to kernel, or from one kernel stack to another.  This happens on traps, and also happens when we've detected a stack overflow - if we're out of stack space where we are, we still need stack space to deal with it.  In Oracle Solaris Crash Analysis Tool you can see stack switches with the stk_switch scatenv setting.  It also tells you details about whose stack it is, such as the thread's kernel stack, a CPU's interrupt or idle thread's stack, or the ptl1_stk.

The ptl1_stk is a special space used for dealing with panics when we're already processing a trap.  When we process a trap, we first switch to the kernel's nucleus, which is low-level code for handling all the details required switching between userland and kernel.  However, when we are already in the nucleus on SPARC, and we take a trap, we could be processing a kernel stack overflow, which means we're in trouble in the low-level code, and Oracle Solaris sets aside a special stack space for dealing with those on SPARC - the ptl1_stk.

Stack overflows are another interesting area where the stack dumper can help.  Kernel stack space is a limited resource - typically only 1-2 pages of memory.  Kernel code needs to be aware of this, and not allocate too much stack space for local variables.  Problems still arise here, and you can examine stack space usage by enabling scatenv settings stk_s_fromend and stk_s_size (both disabled by default).  stk_s_fromend shows how far each frame is from the end of the stack.  stk_s_size shows the size of each frame. Each kernel stack also has an unmapped page of vmem assigned to it at the end so that any accesses past the end of the stack trigger a page fault which can't be resolved and thus results in a panic.  That page is referred to as the redzone.  This prevents stack overflows from corrupting a neighboring thread stack.

One of the most useful things the stack dumper can do is display arguments passed into a function.  On SPARC, arguments are passed in registers. However, those registers are re-used by the callee, and thus can't be relied upon to determine what was passed to the function. The passed-in values can often be determined by examining the assembly code in the caller to see what it put in the output registers (input registers for leaf functions).  Doing that manually is time-consuming, even if you've had a lot of practice.

The scatenv setting stk_args causes the stack dumper to attempt to calculate those passed-in values for you, and display it in the stack. It isn't perfect, and can't always determine the arguments, but saves a lot of time in most cases.  It only works for SPARC at this time.

There are a few other more obscure scatenv settings which control how the stack dumper behaves. 

  • stk_l_sym - decodes any numbers to a kernel symbol if possible in the long stack output
  • stk_l_symonly - any numbers that can be decoded as a kernel symbol are displayed as only the kernel symbol - without the  number
  • stk_s_addr - displays the address of each frame in the normal stack output
  • stk_s_regs - displays the values of the input registers (%i0 - %i5) in the stack output (less useful with the stack arguments available)
  • stk_s_sym - decodes any numbers to a kernel symbol if possible in the normal stack output
  • stk_trap_mmu_sfsr - display and decode the mmu_sfsr information available in SPARC trap frames
  • stk_trap_tstate - display and decode the tstate information available in SPARC trap frames
There is a vast amount useful information available in a thread stack.  The Oracle Solaris Crash Analysis Tool stack dumper has many options to control how much and which information is displayed - probably more than anyone will ever need.  However, new pieces come up all the time and will be added.

Monday Dec 05, 2011


Have you ever wondered if there was a way to get the Oracle Solaris Crash Analysis Tool to always run a set of commands when it starts?  And have you ever seen one user's command output look different from the output printed when you run the same command? 


When the tool starts, it checks your home directory for a file named .scatrc.  If it finds one, it runs all the commands listed.  For example, here's the author's .scatrc:

export HISTFILE=.scathist$USER
alias less=more
set -o vi
scatenv human_readable on
alias ibase="base -i"
alias obase="base -o"
scatenv minsize 0x1000000
scatenv dis_synth_only on
scatenv dis_synth_cc on
scatenv dis_br_label on
scatenv stk_switch on
scatenv str_syncq on
scatenv str_data on
scatenv sym_size_full on
scatenv thr_stkdata off
scatenv thr_pri on
scatenv thr_lwp off
scatenv thr_cpu off
scatenv thr_age on
scatenv thr_syscall off
scatenv thr_flags off
scatenv table_whitespace off
scatenv scroll 0
color background light
alias cpuc=cpu | grep "cpu id"

As you can see, the purpose is to initialize Oracle Solaris Crash Analysis Tool with the settings I'm accustomed to using.  Given that the tool based on ksh, you can also set ksh environment settings, such as the editing mode, set ksh environment variables for later use, or create command aliases.


In the .scatrc, you see many settings made using the scatenv command which is used to query and change the settings of the environment settings used by the tool.  To see the complete list of settings available, simply issue the scatenv command with no options. Since the scatenv settings can be either boolean, a number value, or a string, the variable type is provided in the command output.  Luckily, one can search for applicable settings using the -? option.  For example, to find all settings that involve threads, one would use:

 CAT(vmcore.0)> scatenv -? threads   
    Flag Name    Current  Type  Description
    dispq_empty  on       on    When displaying dispatch queues, also show  
                                CPUs that have no threads in their dispatch 
    thr_age      on       on    Show age information when dumping threads  
    thr_cpu      off      on    Show CPU information when dumping threads
    thr_flags    off      on    Show flag information when dumping threads
    thr_idle     on       on    Show idle time when dumping threads
    thr_lwp      off      on    Show lwp information when dumping threads
    thr_pri      on       on    Show priority information when dumping threads
    thr_proc     on       on    Show process information when dumping threads
    thr_stime    off      on    Show t_stime when dumping threads.
    thr_stkdata  off      on    Show stack related information when dumping
    thr_syscall  off      on    Show syscall information when dumping threads
    thr_wchan    on       on    Show wchan information when dumping threads

 To set an environment flag, simply enter:

scatenv flag_name setting


  • flag_name  - the name of the flag in question.
  • setting - the value to assign to the flag.




« December 2011 »