Tuesday May 19, 2015

Oracle Solaris Crash Analysis Tool 5.5 Release

The Oracle Solaris Crash Analysis Engineering Team is happy to announce that Oracle Solaris CAT 5.5 is available for download.

The 5.5 release patches are available on MOS, and can be found by searching on the patchIDs 21099218 (Combined package supporting SPARC and X86/X64) and 21099215 (platform specific packages).

Go to MOS at https://support.oracle.com and log in.
Click the Patches and Updates tab at the top.
Enter 21099218 and 21099215 as the patch numbers.
Click Search.

From here, click the patchID link you want, and then click the Readme or Download button.

Release Notes

These release notes include all changes to the tool since the 5.4 release.

General

Oracle Solaris 12 Support

As of Oracle Solaris Crash Analysis Tool 5.5, support for Oracle Solaris 12 has been added.

Version 11 dumphdr Support

The new multi-part version 11 crashdump format and its dumphdr are now supported. The tool will open all available vmcore files with the same number by default.

Running with No Crashdump or Live Kernel

The tool can now be run without opening a crashdump or live kernel. This is useful for running the calc command or some of the data conversion or display commands.

To open the tool with no crashdump, use the --nocore command-line option. It will also open in this mode if no arguments are provided and /dev/kmem cannot be opened (typically due to not running the tool as root and thus not having permissions to open /dev/kmem).

Only a limited set of commands is available when run this way:

2base
2dec
2double
2float
2hex
2ip
2neg
2string
2time
base
bits
calc
clock (partially)
decode
demangle
dis
help
legal
pctcpu
scat_version
scatenv
sim
size

When running with no crashdump or live kernel, the scatenv calc_ishex setting is enabled by default.

Branch Targets

Branch targets are now displayed on x86 if the dis_br_label setting is enabled.

Explore

The --scat_explore option has been renamed to just --explore. The files will still be written into a directory that starts with the name scat_explore.

Write-locked Page Locks

When printing a thread waiting for a write-locked page, the tool can find the owner only in some situations.

The lock is marked by part of the lock owner thread's address. So to find the matching owner thread, a list of all the threads is required. That list is created the first time the tlist command is run. So if a thread is displayed prior to running tlist, that lock owner cannot be determined.

For example:

CAT(vmcore.0/11V)> thread 0x1000a7e08aa0
==== user (LWP_SYS) thread: 0x1000a7e08aa0  PID: 2203 ====
cmd: XXXXXXXXXXXXXX
fmri: lrc:/etc/rc3_d/XXXXXXXX
t_wchan: 0x3002a631144  sobj: condition var (from unix:page_lock_es+0x27c) 
t_procp: 0x1000a6677020
…

CAT(vmcore.0/11V)> tlist pagelock      
  thread         pri  pctcpu                     idle   PID          wchan command
  0x10006c34fc20  60   0.000          3m17.784433135s  2201  0x3002a631144 XXXXXXXXXXXXXX
  0x1000a7e08aa0  60   0.000          3m17.828439725s  2203  0x3002a631144 XXXXXXXXXXXXXX
  0x1000a23f4020 101   0.000          3m17.828442535s  2205  0x3002a631144 XXXXXXXXXXXXXX
  0x1000a8c9bc00 101   0.000          3m17.828442215s  2209  0x3002a631144 XXXXXXXXXXXXXX
  0x100078da7c20 101   0.000          3m17.784432255s  2213  0x3002a631144 XXXXXXXXXXXXXX
  0x1000a6400e00 101   0.000          3m17.828441575s  2217  0x3002a631144 XXXXXXXXXXXXXX
  0x1000a8c9b880 101   0.000          3m17.784441635s  2225  0x3002a631144 XXXXXXXXXXXXXX

   7 threads in page_lock_es() found.

threads in page_lock_es() by page:
7 threads: 0x10006c34fc20 0x1000a7e08aa0 0x1000a23f4020 0x1000a8c9bc00 0x100078da7c20...
page @ 0x3002a631100
  vnode:           0x10006b0d3740(*genunix(bss):swap_vnodeops)
  offset:          0x20015a3e000
  state:           !FREE|SWAP
  p_selock:        0x87e083a0 (EXCL owner:0x1000a7e083a0)
  p_lckcnt:        1
  p_slckcnt:       0
  p_cowcnt:        0
  p_mapping:       0x30009c639e8
  p_szc:           0 (8K)
  p_nrm:           0x3 MOD|REF
  calculated PA:   0x78f944000 (pagenum: 0x3c7ca2)

CAT(vmcore.0/11V)> thread 0x1000a7e08aa0
==== user (LWP_SYS) thread: 0x1000a7e08aa0  PID: 2203 ====
cmd: XXXXXXXXXXXXXX
fmri: lrc:/etc/rc3_d/XXXXXXXX
t_wchan: 0x3002a631144  sobj: condition var (from unix:page_lock_es+0x27c)  p_selock owner: 0x1000a7e083a0
top owner (0x1000a7e083a0) is waiting for read-locked rwlock 0x10009987a798
t_procp: 0x1000a6677020
…

Piping sdump/sarray/slist

Background

The rationale for scat pipes was mainly performance. The added convenience is a bonus, but not enough to justify the work of implementing it.

The main problem for writing scripts for scat is the time it takes to fork(). Even though only the page tables are copied, the time spent for a single fork() can be greater than 500ms for a large crashdump.

As an example, take a crashdump with 500 disks multipathed with 8 paths each. We want to check one attribute per path that can be directly looked up. This requires 500 commands (fork()s) for the disks and 4000 for the paths. With a fork() time of 500ms, the 4500 fork()s alone take 2250 seconds, plus all the time spent computing. If awk, sed and other utilities are used for each entry, the time is multiplied for each step. This is clearly not feasible, as runtime quickly moves into hours.

The approach to solve this is to minimize the calls to fork(). This is done by allowing sdump and other commands to take an array of values to dump on standard input. This way the number of fork()s no longer depends on the number of elements to parse, only on the number of steps required to access the data for a single path.

For the above example, the script would be something like:

sdump ... | sarray ... | filter | sdump ... | slist ... | filter | sdump ...
With a few more steps we get ~20 fork()s and a runtime of roughly 10s. Each additional step adds 0.5s, as opposed to 2250s with the old approach.

The complexity of the steps adds very little additional time, as all steps of the pipeline run in parallel and can be executed on different CPUs.
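
The arithmetic behind these figures can be reproduced with a few lines of shell. This is only an illustrative sketch: fork_seconds is a made-up helper name, and the 500ms fork() time and the 20-step pipeline are the example values quoted above, not measurements.

```shell
# Illustrative arithmetic only; the figures are the ones quoted in the text.
# fork_seconds <forks> <ms_per_fork> prints the time spent in fork() alone.
fork_seconds() {
  echo $(( $1 * $2 / 1000 ))
}

disks=500
paths_per_disk=8
per_entry_forks=$(( disks + disks * paths_per_disk ))   # 500 + 4000 = 4500

echo "one command per entry: $(fork_seconds "$per_entry_forks" 500)s"   # 2250s
echo "20-step pipeline:      $(fork_seconds 20 500)s"                   # 10s
```

The second figure does not change when the number of disks or paths grows, which is the point of the pipe-based approach.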

Implementation

One major goal in the implementation was to keep everything working exactly as before unless the new features are used. This prevents any issues with existing scripts. For most things that was easy to implement; the only issues appear with slist and sarray, but more about this later.

To tell a command to read one of its arguments from stdin, the argument is replaced by a "-". For example:

CAT(live/10U)> echo "sd_state\nssd_state" | sdump - long
0x600104ee8a0
0x60015eb7b90
dumps the values for sd_state and ssd_state. The requested action will always be repeated for each input line.

Some commands like slist and sarray can take multiple values from stdin, like:

echo <addr> <start> <stop> | sarray - - - <type>
Lines starting with "#" are treated as comments and are passed through. This may seem unimportant, but by manipulating comments you can implement if/then/else constructs. It also allows drilling down different paths in the structures in one go.
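
As a rough sketch of how comments can act as a conditional, the filter below comments out every data line that does not match a pattern, so that later stages pass it through instead of acting on it. The helper name keepif and the sample input lines are hypothetical; they only stand in for the output of an earlier pipeline stage.

```shell
# keepif <pattern>: hypothetical helper, not part of the tool.
# Lines already starting with "#" pass through untouched; other lines
# are kept only if they match <pattern>, otherwise they are commented
# out so that later pipeline stages treat them as comments.
keepif() {
  awk -v pat="$1" '
    /^#/      { print; next }
    $0 ~ pat  { print; next }
              { print "#" $0 }'
}

# Made-up input standing in for the output of an earlier stage.
printf '%s\n' '# sdump 0x1000 foo state' 'state = 0x1' 'state = 0x0' |
  keepif '= 0x1$'
# prints:
#   # sdump 0x1000 foo state
#   state = 0x1
#   #state = 0x0
```

An "else" branch can be built the same way by inverting the test in a second filter.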

All commands that take values from stdin accept a -d switch that prints, as a comment, the command-line equivalent of each stdin line. Besides helping to debug scripts, this makes all intermediate steps and results available to later steps in the pipeline. This can be made the default by:

scatenv typedb_dump_comment on

Type info, offsets and field names from sdump output are ignored. Trailing text after the number of fields requested is also ignored.

Setting typedb_dump_comment is required if you want to parse the output of slist or sarray.

Example

As an example, let's print the svl_transient field of the scsi_vhci_lun_t structures of all ssd devices. First we need some helper functions:

# we may encounter NULL pointers while drilling down the tree, comment
# these out
function comnull {
  nawk '
        /^#/    {print $0; next}
        /NULL/  {print "#" $0; next}
                {print $0; next} '
}

# for sarray, we may need <address> and <count> on one line, but sdump
# will print the requested fields in two separate lines, so we merge them
function merge {
  nawk '
        /^#/            { print $0 }
        /'$1' =/        { printf("%s",$3) }
        /'$2' =/        { printf(" %s\n",$3)}'
}

# print the ssd or sd state array
function statearray {
  [ $# -ne 1 ] && echo "Usage: statearray <statearray>" && return
  sdump *$1 i_ddi_soft_state array,n_items | merge array n_items | sarray - - "unsigned long";
}

# drill down from the [s]sd device to the vhci structure and print
# either the whole structure or the fields passed as argument
function vhcilun {
  sdump - sd_lun un_sd |
  sdump - scsi_device sd_dev |
  sdump - dev_info devi_mdi_client |
  comnull |
  sdump - mdi_client ct_vprivate |
  sdump - scsi_vhci_lun $1;
}

With these functions we can now write:

statearray ssd_state | vhcilun svl_transient
# sarray 0x60015e2e000 0x100 unsigned long
# sarray 0x60015e2e000 0x100 unsigned long
# ssd18, address 0x60015e2e090:
# sdump 0x60015ff2cc0 sd_lun un_sd
# sdump 0x60015ed0108 scsi_device sd_dev
# sdump 0x60015ff1ce8 dev_info devi_mdi_client
# sdump 0x60015f3fc80 mdi_client ct_vprivate
# sdump 0x60015e58bc0 scsi_vhci_lun svl_transient
   svl_transient = 0
# ssd19, address 0x60015e2e098:
# sdump 0x60015ffb340 sd_lun un_sd
# sdump 0x60015ff7e00 scsi_device sd_dev
# sdump 0x60015ff1888 dev_info devi_mdi_client
# sdump 0x60015f3fa40 mdi_client ct_vprivate
# sdump 0x60015e5eb40 scsi_vhci_lun svl_transient
   svl_transient = 0
…

The script executes 9 fork()s. Done in a conventional way with 500 LUNs, this would involve a minimum of 4500 fork()s.
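
The merge helper above can be tried outside the tool by feeding it text shaped like sdump output. The sketch below is a variant that uses awk instead of nawk and compares whole fields rather than matching a regular expression, so it is not identical to the original helper. The two input lines are an assumption about what sdump prints for the array and n_items fields, modelled on the "svl_transient = 0" line in the output above.

```shell
# merge <field1> <field2>: join the values of two "name = value" lines
# into one "value1 value2" line; comment lines pass through unchanged.
merge() {
  awk -v a="$1" -v b="$2" '
    /^#/                 { print; next }
    $1 == a && $2 == "=" { printf("%s", $3) }
    $1 == b && $2 == "=" { printf(" %s\n", $3) }'
}

# Assumed shape of the sdump output for the two requested fields.
printf '%s\n' '   array = 0x60015e2e000' '   n_items = 0x100' |
  merge array n_items
# prints: 0x60015e2e000 0x100
```

The resulting line has the address and count in the order that "sarray - - <type>" reads them from stdin.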

Examples
CAT(vmcore.0/12X)> sdump 0xffffc100229fc9c0 kthread_t t_procp | sdump - proc u_psargs
    char [0x50] u_psargs = [ '/' 'u' 's' 'r' '/' 's' 'b' 'i' 'n' '/' 'z' 'p' 'o' 'o' 'l' ' ' 'c' 'r' 'e' 'a' 't' 'e' ' ' 't' 'e' 's' 't' 'p' 'o' 'o' 'l' '.' '2' '2' '9' '2' '.' '1' ' ' '/' 'd' 'e' 'v' '/' 'z' 'v' 'o' 'l' '/' 'd' 's' 'k' '/' 't' 'e' 's' 't' 'p' 'o' 'o' 'l' '.' '2' '2' '9' '2' '/' 't' 'e' 's' 't' 'v' 'o' 'l' '.' '9' '3' '3' '3' '\0' ]

CAT(vmcore.0/12X)> sdump *cpu_list cpu cpu_thread | sdump - kthread_t t_procp | sdump - proc u_psargs
    char [0x50] u_psargs = [ 'z' 'p' 'o' 'o' 'l' '-' 't' 'e' 's' 't' 'p' 'o' 'o' 'l' '.' '2' '2' '9' '2' '.' '1' '\0' … ]

CAT(vmcore.0/12X)> sdump *cpu_list cpu disp_q | sarray - 100 dispq_t

Array element #0, address 0xffffc100000ce080:
dispq_t {
  kthread_t *dq_first = NULL
  kthread_t *dq_last = NULL
  int dq_sruncnt = 0
}
…
Array element #60, address 0xffffc100000ce620:
dispq_t {
  kthread_t *dq_first = 0xfffffffc80251bc0
  kthread_t *dq_last = 0xfffffffc80251bc0
  int dq_sruncnt = 1
}
…

CAT(vmcore.0/12X)> sdump *cpu_list cpu disp_q | sarray -d - 60 1 dispq_t dq_first | sdump -d - kthread_t t_procp | sdump -d - proc u_psargs
# sarray 0xffffc100000ce080 60 1 dispq_t dq_first
# Array element #60, address 0xffffc100000ce620:
# sdump 0xfffffffc80251bc0 kthread_t t_procp
# sdump 0xfffffffffc039eb0 proc u_psargs
    char [0x50] u_psargs = [ 's' 'c' 'h' 'e' 'd' '\0' … ]

Recycled Frames

Recycled frames are now displayed on x86 if the stk_recycled setting is enabled. They are now also searched by tlist call.

System Duty Cycle Scheduling Class

Support has been added for the System Duty Cycle (SDC) scheduling class. This includes output of the thread command with the scatenv thr_cldata setting enabled, and the classtbl command now has an SDC subcommand.

STABS Support in Oracle Solaris 9+

Because of the availability of CTF, support for STABS files has been removed from the tool for Oracle Solaris 9+. This includes the typedb command, which was used to load and reorder type databases.

Startup Files

Previously, the tool included a scat_env file to set aliases for all users of the tool. That file is now named scatstartup and is run just before the user's ~/.scatstartup file.

Additionally, the tool provides for a scatinit file, but doesn't include one. Like scatstartup, the scatinit file is run just before the user's ~/.scatinit file, but it is run earlier during the tool's initialization process. It is intended to store scatenv settings so that they can affect how the tool starts up and what is displayed in the tool's banner.

Thus, the startup files are processed in this order:

$SCAT_HOME/scatinit
~/.scatinit
(banner and sanity checks)
$SCAT_HOME/scatstartup
~/.scatstartup

Note that previous names of the ~/.scatinit (.scatrc and .fmrc) and ~/.scatstartup (.fmstartup and .fmlogin) files will still be run if present, but only if the files with the new names are not present.
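
The fallback to the older file names can be pictured with a small shell sketch. The helper name pick_user_file is made up, and the sketch assumes that when both legacy files exist only the first one listed is used; the text above does not say which legacy name wins or whether both are run.

```shell
# pick_user_file <dir> <new_name> <legacy_name>...
# Prints the first file that exists, trying the new name first and
# then the legacy names in the order given (that order is an assumption).
pick_user_file() {
  dir=$1
  shift
  for name in "$@"; do
    if [ -f "$dir/$name" ]; then
      echo "$dir/$name"
      return 0
    fi
  done
  return 1
}

pick_user_file "$HOME" .scatinit .scatrc .fmrc || echo "no user init file"
pick_user_file "$HOME" .scatstartup .fmstartup .fmlogin || echo "no user startup file"
```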




New Commands

ereport [-raw|-dump]

This command works similarly to memerr, but only displays the ereport errorqs.

softstate [-c] [<softstate addr>]

softstate [-aosmntig] <softstate addr> [<module>:]<type> [[!]<field>[,<field>[,...]]]

This command dumps the contents of an i_ddi_soft_state.

It can display all such structures found in the symbol table whose names contain "_soft_state" or "_softstate" and which are part of any module's data or bss segments. The i_ddi_soft_state is displayed, along with any non-NULL array elements.

A single argument is treated as the address of an i_ddi_soft_state structure, which is displayed as above.

If the -c flag is used, instead of a list of non-NULL elements, a count of them is printed.

If an address and a type are provided, each element of the array is displayed as the specified type, just as if sdump was run on it. An optional list of fields can be used to specify which fields of the structure to display or exclude, just as with sdump.

This version of the command also allows the -aosmntig flags of sdump to be used to control whether the address, offset, size, module, type, array index, and gaps are displayed, and whether ntoh*() is performed on the numeric structure fields.

vnode <vnode_addr>

This command displays a vnode in the same format as findfiles.

vnode summary

This command displays a summary of vnodes open or mapped by processes. The list is sorted by the number of references seen, and those whose reference count is less than the scatenv vnode_summary_min setting are not shown. vnode_summary_min defaults to 10.

This limit is not applied strictly: if only one remaining entry is below vnode_summary_min, it is displayed anyway.
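
One possible reading of this rule is sketched below. The helper name summarize and the sample data are hypothetical, and the exact condition the tool uses is not documented here; the sketch assumes the input is already sorted by descending count and that entries below the minimum are hidden unless exactly one such entry remains.

```shell
# summarize <min>: reads "count name" lines, sorted by descending count,
# and prints those with count >= <min>. If exactly one entry falls below
# <min>, it is printed as well (an assumed reading of the rule).
summarize() {
  awk -v min="$1" '
    $1 >= min { print; next }
              { rest[++n] = $0 }
    END       { if (n == 1) print rest[1] }'
}

printf '%s\n' '42 vnodeA' '17 vnodeB' '3 vnodeC' | summarize 10
# prints all three lines, since only one entry is below 10

printf '%s\n' '42 vnodeA' '4 vnodeB' '3 vnodeC' | summarize 10
# prints only "42 vnodeA"
```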

This only works on Oracle Solaris 9+.

vnode vn_cache

This command displays a summary of vnodes which are allocated in the vn_cache. The list is sorted by the vnodes' v_count, and any with a v_count less than the scatenv vnode_summary_min setting are not shown. vnode_summary_min defaults to 10.

This limit is not applied strictly: if only one remaining entry is below vnode_summary_min, it is displayed anyway.

This only works on Oracle Solaris 9+.




Previously Undocumented Commands


These commands were present in the tool previously, but never documented.

2ip

This command displays the specified value(s) as an IP address.

An IPv4 address is a 32b value, and thus if one argument is provided, it is treated as an IPv4 address. An IPv6 address is a 128b value, and thus two separate 64b values must be provided, one representing the upper 64b, and the second the lower 64b.

The <value> will have ntoh() performed on it, which will reverse the byte order on x86.
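
The IPv4 case amounts to splitting a 32-bit value into four bytes, optionally after a byte swap. The following shell sketch illustrates only that arithmetic; the helper names ipv4_dotted and swap32 are made up, and the tool's actual output format is not shown here.

```shell
# ipv4_dotted <value>: print a 32-bit value as a dotted quad,
# most significant byte first.
ipv4_dotted() {
  v=$(( $1 ))
  printf '%d.%d.%d.%d\n' \
    $(( (v >> 24) & 255 )) $(( (v >> 16) & 255 )) \
    $(( (v >> 8) & 255 ))  $(( v & 255 ))
}

# swap32 <value>: reverse the four bytes of a 32-bit value, which is
# what ntoh() amounts to on a little-endian host such as x86.
swap32() {
  v=$(( $1 ))
  printf '0x%08x\n' $(( ((v & 255) << 24) | ((v & 65280) << 8) |
                        ((v >> 8) & 65280) | ((v >> 24) & 255) ))
}

ipv4_dotted 0xc0a80001               # 192.168.0.1
ipv4_dotted "$(swap32 0x0100a8c0)"   # 192.168.0.1
```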

2neg <value>

This command performs a two's complement operation on the <value> and displays the results.

ire <ire_addr>

This command displays the ire_t structure at the specified <ire_addr> and decodes the fields therein.

pctcpu <value>

This command converts that <value> to a percent and displays it.

The value stored in a kthread_t's t_pctcpu is a 32-bit scaled integer less than or equal to 1 with the binary point to the right of the high-order bit.
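
With the binary point to the right of the high-order bit, 0x80000000 represents 1.0, so the conversion is a division by 2^31 followed by a multiplication by 100. A minimal sketch of that arithmetic follows; the helper name pctcpu_percent and the three-decimal output format are assumptions, not the tool's documented behavior.

```shell
# pctcpu_percent <value>: interpret a 32-bit scaled integer whose
# high-order bit represents 1.0 and print it as a percentage.
pctcpu_percent() {
  awk -v v="$(( $1 ))" 'BEGIN { printf("%.3f\n", v / 2147483648 * 100) }'
}

pctcpu_percent 0x80000000   # 100.000
pctcpu_percent 0x40000000   # 50.000
```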

var

Displays the var structure v. It is equivalent to running the command sdump v var. Note that the configuration information therein in most cases has been deprecated.




Interface Changes

base [dec|hex|<number>]

This command has been un-deprecated and changed slightly. While you can still change the input base and output base with scatenv ibase and scatenv obase respectively, this command changes both simultaneously.

The base can be specified as dec to get decimal (base 10), hex to get hexadecimal (base 16), or any number from 2 to 36.

The default input and output base is 16. The default base for <number> is decimal.

buf -a|-b

The buf command now has a -a flag to follow the buf's av_forw field, or a -b flag to follow the buf's b_forw field, to print the list of bufs.

bufs with zero b_flags are considered the end of the list since the hbuf and dwbuf structures end lists for the hbuf and dwbuf arrays, and have a zero b_flags.

calc Base Specification

A number in the input may include a base specifier by following the number by a hash symbol (#) followed by a decimal base.

This base is ignored if the number starts with 0 which specifies an octal number, 0x which specifies a hexadecimal number, or 0b which specifies a binary number.

An output base may also now be specified in the calc command by ending the expression with two hash symbols (##) followed by a decimal base.

In the result, the output base is indicated by a leading 0b for binary, a leading 0 for octal, a leading 0x for hexadecimal, no specifier for decimal, or a trailing ##<base> for all other bases.

The input and output bases may be any number in the range 2 to 36.

Examples

CAT> calc 'ffff&1010##2'
ffff&1010##2 = 0b1000000010000
CAT> calc 'ffff&1010##3'
ffff&1010##3 = 12122022##3
CAT> calc 'ffff&1010##32'
ffff&1010##32 = 40g##32
CAT> calc 'ffff&1010##16'
ffff&1010##16 = 0x1010
CAT> calc 'ffff&1010##10'
ffff&1010##10 = 4112

This example converts 332 base 5 to base 8:

CAT> calc '332#5##8'
332#5##8 = 0134

This example converts 12 (default base 16) plus 12 base 12 to default base 16:

CAT> calc '12+12#12'
12+12#12 = 0x20 

In effect, this allows the calc command to supersede the 2hex, 2dec, and 2base commands.

Because the hash symbol is also the shell comment character, it must be used inside quotes, similar to the multiplication (*), division (/), and AND (&) symbols.
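
The conversions in the examples above can be checked independently with a short shell sketch. The helper names from_base and to_base are made up for illustration; they reproduce only the digit arithmetic for bases 2 to 36, not calc's expression parsing or its 0b, 0, 0x and ## prefix and suffix conventions.

```shell
digits=0123456789abcdefghijklmnopqrstuvwxyz

# from_base <digits> <base>: print the decimal value of <digits> read in <base>.
from_base() {
  awk -v s="$1" -v b="$2" -v d="$digits" 'BEGIN {
    n = 0
    s = tolower(s)
    for (i = 1; i <= length(s); i++)
      n = n * b + index(d, substr(s, i, 1)) - 1
    print n
  }'
}

# to_base <decimal> <base>: print <decimal> written in <base>.
to_base() {
  awk -v n="$1" -v b="$2" -v d="$digits" 'BEGIN {
    if (n == 0) { print 0; exit }
    out = ""
    while (n > 0) {
      out = substr(d, n % b + 1, 1) out
      n = int(n / b)
    }
    print out
  }'
}

to_base "$(from_base 332 5)" 8   # 134, matching '332#5##8' apart from the leading 0
to_base 4112 32                  # 40g, matching 'ffff&1010##32'
```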

calc offset() Specification

A new operator has been added to calc to obtain the offset of a field within a structure. The format is:

offset(<type>, <field>)

classtbl SDC

The classtbl command now includes an SDC subcommand. This displays the contents of the System Duty Cycle scheduling table, including the current and target duty cycles (DutyC), the minimum, current, and maximum priority (min:pri:max), and whether the thread is asleep (s).

codepath on x86

The codepath command is now supported on x86. The -o (tail-recursion or leaf optimization) and -j (jmpl) flags are not supported.

dev snode

This new subcommand displays the table of special device files (snodes) in the system.

dis on x86

The dis command now works on x86/x64. Because an x86/x64 instruction may be up to 14 bytes in length, it will not fit into a C type. Thus, the instruction must be expressed as a hexadecimal string, with or without the leading 0x. A leading 0t for decimal or a leading 0 for octal is not recognized and may give unexpected results.

Since the instruction size on x86/x64 is variable length, the length of the instruction that was disassembled is also displayed.

dispq -t

A new -t flag was added to the dispq command to display how long a thread waiting in a dispatch queue has been waiting. This is obtained from the t_waitrq field in the kthread_t structure.

door data <addr>
door node <addr>

A new data subcommand was added to allow displaying a door_data structure. For consistency, a node subcommand was also added to display a door_node structure, although with no subcommand, the <addr> is still displayed as a door_node.

findfiles -n <vnode>
findfiles -v <vfs>

In addition to open files, these commands will now also search memory segments for the specified <vnode> or <vfs>. This only works on Oracle Solaris 9+.

findfiles -l

The -l flag has been removed. Structure addresses are now always displayed.

findfiles -s

A new -s flag was added. If this flag is included, memory segments' vnodes are also displayed. This only works on Oracle Solaris 9+.

flip -c on x86

The flip command now works on x86/x64 instructions. Because an x86/x64 instruction may be up to 14 bytes in length, it will not fit into a C type. Thus, the instruction must be expressed as a hexadecimal string, with or without the leading 0x. A leading 0t for decimal or a leading 0 for octal is not recognized and may give unexpected results.

Since the instruction size on x86/x64 is variable length, the length of the instruction that was disassembled is also displayed.

The instruction provided is padded to 14 bytes with zeroes, so warnings are displayed if the resulting disassembled instruction is longer than the instruction provided, or if a bit was flipped beyond the provided instruction that resulted in a valid disassembled instruction.

ipc -a

This new flag for the ipc command adds the address of the relevant structure to the short/table output.

mem

This command was renamed from meminfo.

A new -s option has been added to allow sorting of the output for the user subcommand. The supported sort fields are:

sort field  description
pid         sort by PID
command     sort by the command. This is either the u_psargs or u_comm, depending on the scatenv proc_comm setting.
assize      sort by the proc.p_as.as_size
rss         sort by the calculated rss
swresv      sort by the swap reserved
anon        sort by the anonymous memory in use
file        sort by the memory in use by files
swap        sort by the amount of data on swap

If no sort type is specified, the default is anon.

This command does not work on Oracle Solaris 8.

mem log

This new subcommand for mem displays the log of writes to /dev/mem or /dev/kmem, or reads or writes to /dev/allkmem. This is typically run when a sanity check such as:

WARNING: 26 writes to /dev/kmem (run "mem log")
is seen. The RW+ column is used to indicate whether the memory was read from or written to, and a + indicates the operation was successful.

nfs rnode

This subcommand displays the NFS rnodes in memory.

pdump [rpc|smb]

The pdump command now has basic support for rpc and smb headers.

The never-implemented tr and fddi command-line options were removed.

pkma streams_dblk

If the cache name provided is streams_dblk, then all caches whose names start with streams_dblk are scanned instead of an individual cache.

pkma -p <cachename> <filename>

The pkma command can now write buffers to a file instead of trying to interpret/summarize them itself. To specify this, use the -p flag, and specify the <filename>. The file is written in libpcap format. Since there are no timestamps in the caches, all times will be written as zero.

As described above, using a <cachename> of streams_dblk will process all of the caches whose name starts with streams_dblk.

By default, at most the first 1024 bytes of the buffer are written. This value may be changed using the -m flag.

By default, an offset of two bytes from the start of the buffer is used to start copying, based on the alignment of the size of an ether_header, which is 14 bytes. This can be altered using the -o option.

If the -a flag is used, the file is appended to instead of overwritten.

The -f flag is also supported to write free buffers in addition to allocated buffers.

proc -s

Processes may now be sorted by p_lwpcnt by supplying lwpcnt as the sort type.

If no sort type is provided, it now defaults to sorting by PID (equivalent to -s pid).

rd/rdh/rd16/rd32/rd64/rdf/rdd/rdw -n

The new -n flag for these commands causes the values read to be converted from network byte order to host byte order prior to display.

rdi -m

The ability to read all instructions in a kernel module has been removed.

rwlock -n <krnumalock_addr>

A new option has been added to the rwlock command. If the -n flag is given, the address is treated as the address of a krwnumalock instead of a rwlock.

This is normally unnecessary, as the tool will detect a rwlock being a krwnumalock and display it appropriately.

Sanity Check Changes

The following sanity checks were added:

  • zfs_arc_max limited
  • entries in the mm_*mem*_log of writes to memory
  • ZFS dataset over or near quota
  • DTrace error injection in use
  • tmpfs using more than 1GB and more than 10% of memory
  • str_ftnever is 0
  • freebs_list is non-NULL
  • squeues with non-zero sq_count (Oracle Solaris 10+ only)

stack find Selectors

stack find now has a set of selector subcommands available. The now-deprecated selection flags have been replaced with strings similar to tlist's selectors.

selector           flag            description
align 4|8|16       -i 4|8|16       select threads whose stack frames are aligned by the specified value - 4|8 on 32-bit crashdumps, or 8|16 on 64-bit crashdumps
arg <value>        -a <value>      select thread stacks which include <value> as an argument to a stack function
call <function>    -c <function>   select thread stacks which include <function> (exact match)
frames <value>     -f <value>      select thread stacks which include at least <value> frames
module <module>    -m <module>     select thread stacks which include <module> (exact match)
stkbase            -g              select threads whose final frame is at the thread's t_stkbase

stack summary Selectors

stack summary now uses selectors similar to tlist. The selector flags have been replaced with strings similar to tlist's selectors and more selectors added.

selector             flag            description
call <function>      -f <function>   select thread stacks which include <function> (exact match)
callstr <function>   -F <function>   select thread stacks which include <function> (substring match)
module <module>      -m <module>     select thread stacks which include <module> (exact match)
modulestr <module>   -M <module>     select thread stacks which include <module> (substring match)
proc <proc>          (none)          select threads whose t_procp matches the specified <proc>. The <proc> may be specified as a decimal PID or the process address.
cmd <string>         (none)          select threads whose process's command contains <string> (substring match). This matches u_comm or u_psargs depending on the scatenv proc_comm setting.
state <state>        (none)          select threads whose t_state matches the specified state. The <state> may be any one of free, sleep, run, onproc, zomb, stopped, or wait.
sobj <sobj>          (none)          select threads which are waiting on the specified type of synchronization object. The <sobj> may be any one of none, mutex, reader, writer, cv, sema, rwlock, locks, user, or shuttle.
wchan <wchan>        (none)          select threads whose t_wchan or t_wchan0 matches the specified <wchan>.

An optional ! before a specifier causes it to exclude stacks which match that specifier.

These specifiers may be chained together to further narrow which thread stacks are displayed in the summary; chained specifiers are ANDed together. For example:

CAT(vmcore.0/11X)> stacks call zfs_write !call zfs_range_lock
would summarize the stacks of threads which include a call to zfs_write but not a call to zfs_range_lock.

stream [-s] [-c] [-n] mblk

The stream mblk command interface was changed to allow for summarizing mblks as well as to provide more control as to whether the b_cont and b_next fields are followed.

The -l flag was previously used to cause stream mblk to follow the b_next fields for more mblks. This no longer works. Instead, the -c flag causes the b_cont field to be followed, and the -n flag causes the b_next field to be followed. If neither is provided, a single mblk is displayed. If both are provided, b_cont is followed first.

Instead of displaying the mblks, the new -s flag to stream mblk prints a summary of the mblks seen, by count and by size between db_base and db_lim of the linked dblks.

This flag will also give a summary by count and size of the db_frtnp->free_func if db_frtnp is set.

stream [-d] summary

The stream summary command now accepts a -d flag. If -d is included, only streams, queues and syncqs with data in them are displayed.

tlist Subcommands

tlist subcommands can now be chained together to indicate that the specifiers be ANDed together. For example:
tlist module zfs call cdev_ioctl
will find all threads that have the zfs module in their stack AND the function cdev_ioctl.

As part of this change, aliases were added as follows:

subcommand  alias
arg         arg
call        call
cmd         cmd
module      module

The original subcommands will be removed in a future release in favor of the aliases.

Additionally, the subcommands may be preceded by a !, which causes the selection to be reversed, e.g. !swapped would select threads which are not swapped out instead of those that are.

This can also be useful for further specifying a call subcommand, such as:

CAT(vmcore.0/11X)> tlist call zfs_write !call zfs_range_lock
…
  52 matching threads found
    with function "zfs_write" in its stack AND
    without function "zfs_range_lock" in its stack
Thus it selects threads which have zfs_write in their stacks but not zfs_range_lock.

tlist -f

The -f flag was removed from tlist in favor of the global scatenv thr_ignore_free setting.

tlist arg on x86

This command now works on x86 using the function arguments stored in the stack. Since stack arguments are only saved on Oracle Solaris 11+, it only works on those versions.

On x86, there is no concept of "local" registers, so the "-l" flag does not enable searching local registers stored in the stack as it does on SPARC.

sdump/sarray/slist/slistt/savl/stree/shash/skma/softstate -n

This new flag for these commands causes the data dumped to be converted from network byte order to host byte order via ntoh*(). The flag may be enabled for all commands by enabling the scatenv typedb_dump_ntoh setting.

Note that this is only performed on numeric fields, and is ignored for fields which are known pointers.

sdump/stype/sarray/slist/slistt/savl/stree/shash/skma/softstate and fields

Fields may be specified by name or offset, and multiple entries are specified with a comma-separated list. If any field specification starts with a '!', the whole list is treated as a list of fields to exclude.
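
The include/exclude rule can be illustrated with a small shell sketch. The helper name select_fields is hypothetical, only field names are handled (not offsets), and the field names in the usage lines are taken from the dispq_t output shown earlier in this post.

```shell
# select_fields <spec> <field>...
# <spec> is a comma-separated field list. If any entry starts with "!",
# the whole list is treated as fields to exclude; otherwise it is the
# list of fields to show. Matching fields are printed one per line.
select_fields() {
  spec=$1
  shift
  case ",$spec" in
    *",!"*) exclude=1 ;;
    *)      exclude=0 ;;
  esac
  list=",$(printf '%s' "$spec" | tr -d '!'),"
  for f in "$@"; do
    case "$list" in
      *",$f,"*) [ "$exclude" -eq 0 ] && echo "$f" ;;
      *)        [ "$exclude" -eq 1 ] && echo "$f" ;;
    esac
  done
  return 0
}

select_fields 'dq_first,dq_last' dq_first dq_last dq_sruncnt   # dq_first, dq_last
select_fields '!dq_sruncnt'      dq_first dq_last dq_sruncnt   # dq_first, dq_last
select_fields 'dq_first,!dq_last' dq_first dq_last dq_sruncnt  # dq_sruncnt
```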

thread locks

This new subcommand displays a list of locks being waited for by threads. Only mutexes, rwlocks, UPI mutexes, and page locks are included.

The threadlist is walked, and any threads waiting to get a lock are summarized by lock address. The output includes any lock on which more threads are waiting than the scatenv thr_locks_min setting.

After each displayed lock, a list of caller functions is included, showing the functions which called the locking code. This list is also limited by the thr_locks_min setting. The limit is not applied strictly: if only one remaining entry is below thr_locks_min, it is displayed anyway.

This command does not work on Oracle Solaris 8.

thread summary

A count of threads reading or writing a vnode and vfs is now included in the output. vnodes and vfses with fewer threads than the scatenv thr_rw_min setting are not displayed individually. The limit is not applied strictly: if only one remaining entry is below thr_rw_min, it is displayed anyway.

thread tree

A new subcommand for thread was added to help display the interrelationships between threads. This tree shows dependencies that come from locks, being pinned by an interrupt, doors client/server relationships, and threads in dispatch queues to onproc threads.

This command does not work on Oracle Solaris 8.

zone -L

Some of the information previously displayed with zone -l is now only displayed if the new -L flag is used. This was changed to reduce the output from zone -l.

The zone_zsd data is currently the only part added by -L.

zfs -lz <zio_addr> zio

The new -l flag for zfs -z <zio_addr> zio causes the single zio at <zio_addr> to be displayed in more detail.





scatenv Changes

builtins_only

If a command is given which is not a shell builtin, your PATH is searched for the command. If the new scatenv builtins_only setting is enabled, such non-builtins result in an error instead.

If builtins_only is enabled, this can be bypassed by starting a command with an exclamation point (!) to indicate that the external command is intentional.

This setting is disabled by default.
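
As a rough illustration of the lookup order described above, here is a minimal Python sketch. It is not how the tool is implemented; `resolve_command`, the `builtins` set, and the error text are hypothetical, and the sketch assumes a leading '!' always forces the external lookup.

```python
import shutil


def resolve_command(line, builtins, builtins_only=False):
    """Classify a command line as ('builtin', name) or ('external', path).

    Raises ValueError when builtins_only blocks an external command.
    """
    line = line.strip()
    if not line:
        raise ValueError("empty command")
    # A leading '!' marks an intentional external command.
    forced_external = line.startswith("!")
    if forced_external:
        line = line[1:].lstrip()
    name = line.split()[0]

    if not forced_external and name in builtins:
        return ("builtin", name)
    if builtins_only and not forced_external:
        raise ValueError(f"{name}: not a builtin (builtins_only is on)")
    # shutil.which searches PATH; None means nothing was found there.
    return ("external", shutil.which(name))
```

With builtins_only off, an unknown name simply falls through to the PATH search, which matches the default behavior described above.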

cpu_walk_array

This flag was renamed from alternate_cpu_walk.

func_arg_types

Function argument types are now used for more than the stack, such as the callout, taskq, iommu, intr, clock cyclic, and syscall display in thread output.

Therefore, the setting was renamed from stk_arg_types to func_arg_types. The old name will still work for now, but will be removed in a future release.

kma_stat_allocations

In kma stat output, the allocations columns make the output wider than a standard 80-column screen, yet add very little useful information.

This new setting controls whether those columns are displayed. Hiding them allowed the other relevant columns to be widened so that they are not misaligned for large caches.

This setting is disabled by default.

pdump_max_scan

Since the packet headers should be in the early part of a dblk's data, a limit has been added on how much of the data to scan for network headers. This both provides a limited sanity check, and improves the performance on large buffers being scanned.

At most, the first pdump_max_scan bytes of a buffer will be scanned for network headers. This defaults to 200.
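
The effect of the limit can be pictured with a toy Python scan. This is a sketch of the idea only: `find_header` and the byte signature are assumptions for illustration (EtherType 0x0800 followed by 0x45, the first byte of an IPv4 header without options), and the tool's real header detection is not described here.

```python
def find_header(buf, signature, max_scan=200):
    """Return the offset of signature within the first max_scan bytes of buf,
    or -1 if it is not fully contained in that window."""
    window = buf[:max_scan]  # never look past the scan limit
    return window.find(signature)


# Toy signature: EtherType IPv4 (0x0800) followed by version/IHL byte 0x45.
SIG = b"\x08\x00\x45"

near = b"\x00" * 12 + SIG + b"\x00" * 300
far = b"\x00" * 250 + SIG

print(find_header(near, SIG))                # found inside the window
print(find_header(far, SIG))                 # beyond the default limit
print(find_header(far, SIG, max_scan=300))   # found once the limit is raised
```

Bounding the window keeps the cost of a scan constant no matter how large the buffer is, which is the performance benefit mentioned above.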

pkma_min

This scatenv setting was renamed from pdump_min_pkt since it controls pkma -s behavior rather than pdump, and it limits all results rather than just packets.

This value is no longer applied strictly: if printing one more line of results would print all of the remaining results, they are printed instead of a line with a count of how many more there are.

Additionally, the limit is not enforced if there is room to print a few more results on a results line already started.

proc_contract

This flag causes the -c flag to be forced on for the proc command, meaning the contract information is always displayed.

By default, this is disabled.

proc_lwpcnt

This flag causes the time column in proc output to be replaced by the process's p_lwpcnt.

It is enabled by default.

proc_project

This flag causes the -j flag to be forced on for the proc command, meaning the project information is always displayed.

By default, this is disabled.

proc_task

This flag causes the -k flag to be forced on for the proc command, meaning the task information is always displayed.

By default, this is disabled.

proc_zone

This flag causes the -z flag to be forced on for the proc command, meaning the zone information is always displayed.

By default, this is disabled.

sanity_zfs_quota_pct

Any ZFS filesystems using more than this percentage of their quota are reported. This is in addition to reporting any filesystems that are over their quota.

The default is 95%.
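
The check amounts to simple percentage arithmetic. The Python sketch below shows one way to express it; `check_quota`, its parameters, and the message wording are hypothetical and do not reproduce the tool's actual output.

```python
def check_quota(name, used, quota, pct_limit=95):
    """Return a warning string for a filesystem, or None if it is fine.

    used and quota are in bytes; a quota of 0 is taken to mean that
    no quota is set (an assumption of this sketch).
    """
    if quota == 0:
        return None
    if used > quota:
        return f"{name}: over quota"
    pct = used * 100 / quota
    # "More than this percentage": exactly pct_limit is not reported.
    if pct > pct_limit:
        return f"{name}: using {pct:.0f}% of quota"
    return None
```

For example, a filesystem using 96 of 100 units is reported at the default setting, while one using exactly 95 is not.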

str_flag_short

Flags in stream previously were displayed one per line with an explanation of each. With this flag set, all the values are displayed on a single line with no explanation. This is a way to shorten the overall stream output.

It is enabled by default.

sym_near

This scatenv setting was renamed from near_symbol.

thr_flag_short

Flags in thread/tlist output previously were displayed one per line with an explanation of each. With this flag set, all the values are displayed on a single line with no explanation. This is a way to shorten the overall thread/tlist output.

It is enabled by default.

thr_ignore_free

This new setting causes tlist, thread [summary|locks|tree], and stack summary to ignore threads which have their t_state set to TS_FREE.

It is enabled by default.

thr_ignore_idle_pause

Each CPU has an assigned idle and pause thread. Most of the time, these threads are uninteresting, and can distract from threads which might be relevant. If this flag is set, CPU idle and pause threads are ignored by tlist, thread [summary|locks|tree], and stack summary.

It is disabled by default.

thr_locks_min

The new thread locks command will only display locks which have more than thr_locks_min threads trying to acquire them. It also limits the list of caller functions listed with any displayed lock to caller functions with this number or more threads at that point in the function.

The limit is not applied strictly: if only one entry is left that has fewer than thr_locks_min threads, it is displayed anyway.

The default value for this setting is 10.

thr_rw_min

This setting changes how thread summary displays its count of threads in read/write per vnode/vfs. If a given vnode or vfs has fewer than thr_rw_min threads reading or writing it, it is not displayed, but is instead included in a summary count at the end of the list.

It can be set to a high value to only see the summary, or to 0 to see all vnodes/vfses being read from or written to. The default is 10.

The limit is not applied strictly: if only one entry is left that has fewer than thr_rw_min threads, it is displayed anyway.
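
The thresholding described for thr_rw_min can be modeled as follows. This Python sketch is a reading of the behavior as documented, not the tool's implementation; the function name `summarize` and the dictionary input are assumptions.

```python
def summarize(counts, minimum=10):
    """counts: dict mapping an object label (vnode/vfs) to a thread count.

    Returns (shown, others): shown is a list of (label, count) pairs
    sorted by count, highest first; others is (num_objects, num_threads)
    folded into the trailing summary line.
    """
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    shown = [kv for kv in ordered if kv[1] >= minimum]
    rest = [kv for kv in ordered if kv[1] < minimum]
    # Not pedantic: a summary line for a single leftover entry saves
    # nothing, so that entry is shown instead.
    if len(rest) == 1:
        shown.extend(rest)
        rest = []
    return shown, (len(rest), sum(c for _, c in rest))
```

Setting `minimum` to 0 shows every entry, and a very large value leaves only the summary count, which mirrors the two extremes described above.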

thr_use_panic

Much of the tool checks whether it's displaying the panic_thread, and changes its behavior to use panic-related data to display its stack. However, in some cases, this obscures important data for that thread. Disabling the thr_use_panic flag causes it to treat the panic_thread as a regular thread for the purposes of displaying its stack.

It is enabled by default.

typedb_pointer

The typedb_charp flag was replaced with this one, and now controls whether pointer types have further information displayed about them when available.

This includes:

type information
char * string
refstr_t * string
struct vnode * v_path string
struct vfs * vfs_resource and vfs_mntpt strings
dev_info_t * devi_node_name strings up through devi_parents

It is enabled by default.

typedb_dump_ntoh

This setting forces the sdump/sarray/slist/slistt/savl/stree/shash/skma/softstate -n flag to be turned on.

It is disabled by default.

vnode_summary_min

This setting changes whether vnode summary and vnode vn_cache display a given vnode. If a given vnode is seen fewer times than this setting, it is not displayed.

It can be set to a high value to only see the summary, or to 0 to see all vnodes. The default is 10.

The limit is not applied strictly: if only one entry is left that has fewer than vnode_summary_min, it is displayed anyway.






Deprecated Command Removal

The following previously-deprecated commands have been removed:

Removed command     Replacement
base                scatenv base
bigdump bufc        buf list
bigdump inode       inode list
bigdump idleq       inode idleq
bigdump dnlc        dnlc list
bigdump dwbuf       buf dw
bigdump tmpfs       tmpfs
bigdump ssfcplog    ssfcplog
bigdump fplog       qlcfc
bigdump vfs         findfiles







Monday Dec 12, 2011

Dumping Stacks

Stack dumping seems like it should be an easy thing.  After all, you're just walking a linked list.  The truth is that stack dumping is one of the most complicated parts of Oracle Solaris Crash Analysis Tool.

Sure, generating a list of function/frame pointers is pretty easy, but it turns out there are vast amounts of useful data and clues in that stack that it's useful to dig up and display (cue the digging pirate).

The first thing to consider is that when Oracle Solaris takes a trap (any kind of fault, exception or interrupt, which includes system calls), it has to save all of the registers to the stack.  This happens on system calls, interrupts, and of course, traps caused by approaching system crashes.  This is done to preserve the state of the current thread before switching to a different stack, so that if the operating system ever returns to that thread, it can restore the registers and continue.

That's the reason you will often see register dumps in the stack.  Getting the register values at a precise point in time is valuable information - particularly when it contains state data about something going wrong and we're preparing to crash the operating system.  

You can see this trap information by enabling the stk_trap scatenv setting, which is on by default.  You can also see the user thread registers for userland threads in a core by displaying the full stack data with stack -l.

Other things which arise are architecture-specific, and related to how the compiler tries to optimize things for a particular architecture.  On SPARC, for example, a useful optimization is re-using stack space for functions we'll never return to.

For example, functions which look like:

  funcA() {
    return (funcB());
  } 
  callerFunc() {
    funcA();
  }

have no reason to return to funcA ever again.  What the compiler does in such situations is to pop the stack space it reserved for funcA for funcB's use.  We coined the term "recycled frame" for this.

The way this would normally appear in the stack is this:

  funcB+offset()
  callerFunc+offset()

which means the person looking at this stack has no information about funcA ever having been called.  If they then went to look at callerFunc, they'd see it never calls funcB.  But, if you look at callerFunc+offset, you can see a call to funcA, which isn't the next thing in the stack.  Hence you can see why the tool was enhanced to show:

  funcB+offset()
  funcA() - frame recycled
  callerFunc+offset()

for these cases.  Display of the stack with recycled frames is controlled by the stk_recycled scatenv setting, which is on by default.
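
The reasoning above (the call instruction at callerFunc+offset targets funcA, yet funcB is what sits on the stack) can be sketched in Python. This is a simplified model under stated assumptions, not the tool's algorithm: `add_recycled_frames`, the `(function, call_site)` frame tuples, and the `call_targets` lookup table are all hypothetical.

```python
def add_recycled_frames(stack, call_targets):
    """Insert 'frame recycled' entries into a simple stack listing.

    stack:        list of (function, call_site) from innermost to outermost;
                  call_site is the "caller+offset" string this frame returns
                  to, or None for the outermost frame.
    call_targets: dict mapping a "caller+offset" call site to the function
                  that the call instruction there actually targets.
    """
    out = []
    for func, site in stack:
        out.append(func)
        target = call_targets.get(site)
        # The caller called `target`, but `func` is what we find on the
        # stack: target's frame must have been reused for func.
        if target is not None and target != func:
            out.append(f"{target}() - frame recycled")
    return out


stack = [("funcB", "callerFunc+0x10"), ("callerFunc", None)]
targets = {"callerFunc+0x10": "funcA"}
for line in add_recycled_frames(stack, targets):
    print(line)
```

As the following paragraph notes, a single lookup like this can only recover one level of recycling.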

Note that if there are two such optimizations in the calling sequence, there's no useful way to dig up anything but the first step.  The codepath command was written to search for call linkages between such functions.

Something similar is also done for leaf functions.  Leaf functions are functions which never call anything else.  An optimization done for those is that they don't necessarily have to do a save if they're short enough to operate in the volatile global and output registers (%g and %o on SPARC).

Knowledge of this is used when walking the stack as the "next" function down the stack will use the same stack pointer as we are using now.

There are other interesting bits done in the stack which are worth noting.  For example, as part of the ABI on SPARC 64-bit, stack pointers are offset by a number so as to make them easily distinguishable from 32-bit stack pointers.  That is the STACK_BIAS, which is 2047.  So if you find an address that's in the range of stack addresses, but is misaligned (in this case, it ends up with a "1" as the last digit), it probably needs the STACK_BIAS added to it to get the number you actually want.  Note that 32-bit SPARC, and x86/x64 use a STACK_BIAS of 0.

For convenience, the frame pointers that Oracle Solaris Crash Analysis Tool uses already have the STACK_BIAS applied. 
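
As a worked example of the bias arithmetic, the short Python sketch below converts a biased 64-bit SPARC stack pointer into the real frame address. The pointer value is made up for illustration, and the helper names `looks_biased` and `unbias` are not part of the tool.

```python
STACK_BIAS = 2047  # 0x7ff on 64-bit SPARC; 0 on 32-bit SPARC and x86/x64


def looks_biased(addr):
    """A biased 64-bit SPARC stack pointer is odd, so it cannot be
    an aligned stack address as-is."""
    return addr % 2 == 1


def unbias(addr):
    """Add STACK_BIAS to a biased pointer to get the real frame address."""
    return addr + STACK_BIAS


sp = 0x2A100F0B801          # hypothetical biased %sp value
print(hex(unbias(sp)))      # 0x2a100f0c000
```

The result ends in zeros again, which is a quick way to confirm that the bias was the reason the original value looked misaligned.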

Another useful concept to know is MINFRAME.  The ABI defines how functions pass arguments between functions, and on SPARC, when a save instruction is run, it saves some space for input (%i) and local (%l) registers of the caller to be stored, plus space for output registers (%o) for any functions it calls (note that for leaf functions, this is sometimes optimized to skip the output registers - something we've tagged a MINIFRAME).

Knowing that a function uses MINFRAME is a good clue that this is a short function, and makes no use of local variables beyond what it can get away with using the local and input registers.

Knowing when we switch between stacks is also useful information.  This means we've switched from the userland stack space to kernel, or from one kernel stack to another.  This happens on traps, and also happens when we've detected a stack overflow - if we're out of stack space where we are, we still need stack space to deal with it.  In Oracle Solaris Crash Analysis Tool you can see stack switches with the stk_switch scatenv setting.  It also tells you details about whose stack it is, such as the thread's kernel stack, a CPU's interrupt or idle thread's stack, or the ptl1_stk.

The ptl1_stk is a special space used for dealing with panics when we're already processing a trap.  When we process a trap, we first switch to the kernel's nucleus, which is low-level code for handling all the details required for switching between userland and kernel.  However, when we are already in the nucleus on SPARC, and we take a trap, we could be processing a kernel stack overflow, which means we're in trouble in the low-level code.  Oracle Solaris sets aside a special stack space for dealing with those on SPARC: the ptl1_stk.

Stack overflows are another interesting area where the stack dumper can help.  Kernel stack space is a limited resource - typically only 1-2 pages of memory.  Kernel code needs to be aware of this, and not allocate too much stack space for local variables.  Problems still arise here, and you can examine stack space usage by enabling scatenv settings stk_s_fromend and stk_s_size (both disabled by default).  stk_s_fromend shows how far each frame is from the end of the stack.  stk_s_size shows the size of each frame. Each kernel stack also has an unmapped page of vmem assigned to it at the end so that any accesses past the end of the stack trigger a page fault which can't be resolved and thus results in a panic.  That page is referred to as the redzone.  This prevents stack overflows from corrupting a neighboring thread stack.
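
The two measurements can be pictured with a little arithmetic. The Python sketch below assumes a stack that grows toward lower addresses, takes the "end" of the stack to be its lowest usable address (next to the redzone), and measures a frame's size as the distance to the next frame up. Those definitions, the function name `frame_usage`, and the sample addresses are assumptions for illustration; the tool's exact calculations may differ.

```python
def frame_usage(frame_ptrs, stack_end):
    """frame_ptrs: frame addresses from innermost to outermost (ascending,
    since the stack is assumed to grow toward lower addresses).
    stack_end: lowest usable stack address; the redzone lies just below it.

    Returns a list of (frame, size, from_end) tuples. The outermost frame's
    size is None because its upper bound is not in the list.
    """
    rows = []
    for i, fp in enumerate(frame_ptrs):
        nxt = frame_ptrs[i + 1] if i + 1 < len(frame_ptrs) else None
        size = nxt - fp if nxt is not None else None
        rows.append((fp, size, fp - stack_end))
    return rows


# Hypothetical addresses, for illustration only.
for fp, size, from_end in frame_usage([0x10400, 0x104B0, 0x10700], 0x10000):
    size_txt = hex(size) if size is not None else "?"
    print(f"{fp:#x}  size={size_txt}  fromend={from_end:#x}")
```

A frame whose distance from the end is small, or a single frame with an unusually large size, is the kind of clue these two settings are meant to surface.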

One of the most useful things the stack dumper can do is display arguments passed into a function.  On SPARC, arguments are passed in registers. However, those registers are re-used by the callee, and thus can't be relied upon to determine what was passed to the function. The passed-in values can often be determined by examining the assembly code in the caller to see what it put in the output registers (input registers for leaf functions).  Doing that manually is time-consuming, even if you've had a lot of practice.

The scatenv setting stk_args causes the stack dumper to attempt to calculate those passed-in values for you, and display them in the stack. It isn't perfect, and can't always determine the arguments, but saves a lot of time in most cases.  It only works for SPARC at this time.

There are a few other more obscure scatenv settings which control how the stack dumper behaves. 

  • stk_l_sym - decodes any numbers to a kernel symbol if possible in the long stack output
  • stk_l_symonly - any numbers that can be decoded as a kernel symbol are displayed as only the kernel symbol - without the  number
  • stk_s_addr - displays the address of each frame in the normal stack output
  • stk_s_regs - displays the values of the input registers (%i0 - %i5) in the stack output (less useful with the stack arguments available)
  • stk_s_sym - decodes any numbers to a kernel symbol if possible in the normal stack output
  • stk_trap_mmu_sfsr - display and decode the mmu_sfsr information available in SPARC trap frames
  • stk_trap_tstate - display and decode the tstate information available in SPARC trap frames

There is a vast amount of useful information available in a thread stack.  The Oracle Solaris Crash Analysis Tool stack dumper has many options to control how much and which information is displayed - probably more than anyone will ever need.  However, new pieces come up all the time and will be added.

Monday Dec 05, 2011

scatenv

Have you ever wondered if there was a way to get the Oracle Solaris Crash Analysis Tool to always run a set of commands when it starts?  And have you ever seen one user's command output look different from the output printed when you run the same command? 

The SCATRC File

When the tool starts, it checks your home directory for a file named .scatrc.  If it finds one, it runs all the commands listed.  For example, here's the author's .scatrc:

export HISTFILE=.scathist$USER
alias less=more
set -o vi
scatenv human_readable on
alias ibase="base -i"
alias obase="base -o"
scatenv minsize 0x1000000
scatenv dis_synth_only on
scatenv dis_synth_cc on
scatenv dis_br_label on
scatenv stk_switch on
scatenv str_syncq on
scatenv str_data on
scatenv sym_size_full on
scatenv thr_stkdata off
scatenv thr_pri on
scatenv thr_lwp off
scatenv thr_cpu off
scatenv thr_age on
scatenv thr_syscall off
scatenv thr_flags off
scatenv table_whitespace off
scatenv scroll 0
color background light
alias cpuc='cpu | grep "cpu id"'

As you can see, the purpose is to initialize Oracle Solaris Crash Analysis Tool with the settings I'm accustomed to using.  Given that the tool is based on ksh, you can also set ksh options such as the editing mode, set ksh environment variables for later use, or create command aliases.

SCATENV

In the .scatrc, you see many settings made using the scatenv command, which is used to query and change the environment settings used by the tool.  To see the complete list of settings available, simply issue the scatenv command with no options. Since scatenv settings can be boolean, numeric, or string values, the variable type is provided in the command output.  Luckily, one can search for applicable settings using the -? option.  For example, to find all settings that involve threads, one would use:

 CAT(vmcore.0)> scatenv -? threads   
    Flag Name    Current  Type  Description
                 Setting        
    dispq_empty  on       on    When displaying dispatch queues, also show  
                                CPUs that have no threads in their dispatch 
                                queues.
    thr_age      on       on    Show age information when dumping threads  
    thr_cpu      off      on    Show CPU information when dumping threads
    thr_flags    off      on    Show flag information when dumping threads
    thr_idle     on       on    Show idle time when dumping threads
    thr_lwp      off      on    Show lwp information when dumping threads
    thr_pri      on       on    Show priority information when dumping threads
    thr_proc     on       on    Show process information when dumping threads
    thr_stime    off      on    Show t_stime when dumping threads.
    thr_stkdata  off      on    Show stack related information when dumping
    thr_syscall  off      on    Show syscall information when dumping threads
    thr_wchan    on       on    Show wchan information when dumping threads

 To set an environment flag, simply enter:

scatenv flag_name setting

where:

  • flag_name  - the name of the flag in question.
  • setting - the value to assign to the flag.

Tuesday Nov 29, 2011

Oracle Solaris Crash Analysis Tool 5.3 now available

Oracle Solaris Crash Analysis Tool 5.3

The Oracle Solaris Crash Analysis Tool Team is happy to announce the availability of release 5.3.  This release addresses bugs discovered since the release of 5.2 plus enhancements to support Oracle Solaris 11 and updates to Oracle Solaris versions 7 through 10.

The packages are available on My Oracle Support - simply search for Patch 13365310 to find the downloadable packages.

Release Notes

General

blast support

The blast GUI has been removed and is no longer supported.

Oracle Solaris 2.6 Support

As of Oracle Solaris Crash Analysis Tool 5.3, support for Oracle Solaris 2.6 has been dropped. If you have systems running Solaris 2.6, you will need to use Oracle Solaris Crash Analysis Tool 5.2 or earlier to read its crash dumps.

New Commands

Sanity Command

Though one can re-run the sanity checks that are run at tool start-up using the coreinfo command, many users were unaware that they could. These checks can still be run using that command, but a new command, sanity, can now be used to re-run the checks at any time.

Interface Changes

scat_explore -r and -t options

The -r option has been added to scat_explore so that a base directory can be specified, and the -t option was added to enable color tagging of the output. Usage is:

scat --scat_explore [-atv] [-r base_dir] [-d dest] [unix.N] [vmcore.]N
Where:


-v Verbose Mode: The command will print messages highlighting what it's doing.
-a Auto Mode: The command does not prompt for input from the user as it runs.
-d dest Instructs scat_explore to save its output in the directory dest instead of the present working directory.
-r base_dir Instructs scat_explore to save its output under the directory base_dir instead of the present working directory. If the destination is not specified using the -d option, scat_explore names its output "scat_explore_system_name_hostid_lbolt_value_corefile_name".
-t Enable color tags. When enabled, scat_explore tags important text with colors that match the level of importance. These colors correspond to the color normally printed when running Oracle Solaris Crash Analysis Tool in interactive mode.

Tag Name   Definition
FATAL      An extremely important message which should be investigated.
WARNING    A warning that may or may not have anything to do with the crash.
ERROR      An error, usually printed with a suggested command.
ALERT      Used to indicate something the tool discovered.
INFO       Purely informational message.
INFO2      A follow-up to an INFO tagged message.
REDZONE    Usually used when printing memory info showing something is in the kernel's REDZONE.

N The number of the crash dump. Specifying unix.N and the vmcore. prefix is optional.

Example:


$ scat --scat_explore -a -v -r /tmp vmcore.0
#Output directory: /tmp/scat_explore_oomph_833a2959_0x28800_vmcore.0
#Tar filename:     scat_explore_oomph_833a2959_0x28800_vmcore.0.tar
#Extracting crash data...
#Gathering standard crash data collections...
#Panic string indicates a possible hang...
#Gathering Hang Related data...
#Creating tar file...
#Compressing tar file...
#Successful extraction
SCAT_EXPLORE_DATA_DIR=/tmp/scat_explore_oomph_833a2959_0x28800_vmcore.0
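
The naming pattern visible in this output can be reproduced with a short Python sketch. It only illustrates how the pieces fit together; `explore_paths` is a hypothetical helper, and the assumption that lbolt is rendered as a 0x-prefixed hex value is taken from the example above rather than from the tool's source.

```python
import posixpath


def explore_paths(base_dir, system, hostid, lbolt, corefile):
    """Return (output_directory, tar_filename) following the pattern
    scat_explore_<system>_<hostid>_<lbolt>_<corefile>."""
    name = f"scat_explore_{system}_{hostid}_{lbolt:#x}_{corefile}"
    return posixpath.join(base_dir, name), name + ".tar"


out_dir, tar_name = explore_paths("/tmp", "oomph", "833a2959", 0x28800, "vmcore.0")
print(out_dir)
print(tar_name)
```

Because the system name, hostid, lbolt value, and core file name are all part of the directory name, results from different dumps under the same base directory do not collide.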

Sending scat_explore results

The .tar.gz file that results from a scat_explore run may be sent using Oracle Secure File Transfer. The Oracle Secure File Transfer User Guide describes how to use it to send a file.

The send_scat_explore script now has a -t option for specifying the address to send the results to. This option is mandatory.

Known Issues

There are a couple of known issues that we are addressing in release 5.4, which you should expect to see soon:

  • Display of timestamps in threads and clock information is incorrect in some cases.
  • There are alignment issues with some of the tables produced by the tool.


Friday Aug 28, 2009

Crash Dump Info Extractor

Though the Solaris Crash Analysis Tool (CAT) script scat_explore has existed in prior releases of Solaris CAT, release 5.1 and now 5.2 include a stand-alone mode that allows users to extract crash data without having to run Solaris CAT first, as well as a send_scat_explore feature which automates the process of data collection and transmits that data to Sun.  The following describes how to use send_scat_explore and how to run scat_explore in "stand-alone" mode.

send_scat_explore

Sending crash data to Sun requires an open Sun Service Request (SR). Once a valid SR is open, the customer can then run /opt/SUNWscat/bin/send_scat_explore to send the crash data.

send_scat_explore usage is as follows:

send_scat_explore [-n sr_number] [-e email] [unix.x] vmcore.x

    Where:

        * -n sr_number      - sets the Sun Service Request number
        * -e email          - sets the reply-to email address that Sun
                              should use to acknowledge the receipt of the
                              data.
        * [unix.x] vmcore.x - the crash dump from which crash data should
                              be gathered. Please note that unix.x need not
                              be supplied and the core number, x, can be
                              specified with or without the vmcore. prefix.

If the above -n and -e options are not specified, the user is prompted for Sun SR number and reply-to address.

Note that if the reply-to address is not specified on the command line, send_scat_explore checks whether a reply-to address has been saved in the Sun Explorer configuration and, if so, offers that address as the default response (see Example 1 below).

Example 1: Using send_scat_explore without options:

#/opt/SUNWscat/bin/send_scat_explore vmcore.0
   Found a reply address in Sun Explorer settings...
   Email address for replies? [someone@a.com]: me@a.com
   Sun Service Request number: XXXXXXXX
   Collecting scat_explore from core file instance 0...
   #Extracting crash data...
   #Successful extraction
   Sending scat_explorer data file ./oomph_1775b1b7_0xf93_vmcore.0 for SR 51234567 to dreap@sun.com...


Example 2: Using send_scat_explore with command line options:

# /opt/SUNWscat/bin/send_scat_explore -n XXXXXXXX -e me@a.com vmcore.0
  Collecting scat_explore from core file instance 0...
  #Extracting crash data...
  #Successful extraction
  Sending scat_explorer data file ./oomph_1775b1b7_0xf93_vmcore.0 for SR XXXXXXXX to dreap@sun.com...


scat_explore

scat_explore is a script included with Solaris CAT which extracts crash data from a crash dump.  When the --scat_explore option is issued to Solaris CAT, the crash dump is opened and scat_explore is run. The collected crash data is saved in a directory with the crash dump and the directory name is displayed. scat_explore also saves a compressed tar archive of the crash data in this directory.

The scat_explore usage is:

scat --scat_explore [-v] [-a] [-d dest] [unix.N] [vmcore.]N
Where:


-v Verbose Mode: The command will print messages highlighting what it's doing.
-a Auto Mode: The command does not prompt for input from the user as it runs.
-d dest Instructs scat_explore to save its output in the directory dest instead of the present working directory.
N The number of the crash dump. Specifying unix.N and the vmcore. prefix is optional.

Example:

$ scat --scat_explore -a -v 0
#Output directory: ./scat_explore_ebsmro2_808cc87b_0xde2d09d_vmcore.0
#Tar filename:     scat_explore_ebsmro2_808cc87b_0xde2d09d_vmcore.0.tar
#Extracting crash data...
#Gathering standard crash data collections...
#Panic string indicates a possible hang...
#Gathering Hang Related data...
#Creating tar file...
#Compressing tar file...
#Successful extraction
SCAT_EXPLORE_DATA_DIR=./scat_explore_ebsmro2_808cc87b_0xde2d09d_vmcore.0


Solaris CAT 5.2 Released

We're happy to announce the release of Solaris CAT 5.2.  It can be downloaded from the Sun Download Center.  Here are the release notes:

General

Solaris 2.6 Support

This version of Solaris CAT is the last release to support Solaris 2.6. If you have systems running Solaris 2.6, please note that that release reached its end-of-service-life in July 2006. You should plan an upgrade soon.

New Commands

ctf

When a type is specified on the command line, it can include an optional module to retrieve the type information from. If no module is specified, the modules in the core are searched in order for the specified type.

This new command allows specifying a list of modules to be searched before the modules in the core are scanned. Modules are specified using a colon-separated list.

It can also be used to display the current list, or to clear it.
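
The lookup order described for ctf can be sketched in Python. This is an illustrative model only: `find_type`, `parse_module_list`, and the dictionary standing in for the modules in a core are hypothetical, and the real tool's data structures are not shown here.

```python
def parse_module_list(spec):
    """Split a colon-separated module list such as 'zfs:genunix'."""
    return [m for m in spec.split(":") if m]


def find_type(type_name, core_modules, search_list=(), module=None):
    """Return the name of the module that supplies type_name, or None.

    core_modules: dict of module name -> set of type names it defines,
                  in the order the modules appear in the core.
    search_list:  module names to try first (what the ctf command sets).
    module:       explicit module given with the type on the command line.
    """
    if module is not None:
        return module if type_name in core_modules.get(module, ()) else None
    # Preferred modules first, then the remaining modules in core order.
    order = list(search_list) + [m for m in core_modules if m not in search_list]
    for name in order:
        if type_name in core_modules.get(name, ()):
            return name
    return None
```

When two modules define a type with the same name, the search list decides which definition wins, which is the situation the command is meant to address.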

qlcfc fplog|ssfcplog

This new command dumps the qlc logs. If the fplog subcommand is specified, the fp_logq is displayed. If the ssfcplog subcommand is specified, the ssfcp_logq is displayed.

scat_version

scat_version has been added so that the release/version of the currently running Solaris CAT instance can be easily retrieved.

Interface Changes

dev id

This new subcommand was added to simplify decoding ddi_devid_t (impl_devid_t) structures in the kernel and display the string representation of the devid.

refclock

On a live system, the default is now to use the (possibly scaled) results of gethrtime(). Additionally, under some circumstances, lbolt, and lbolt64 are now inappropriate, and under those conditions, will be disallowed. Using curtime is also now disallowed where the ct_curtime field is not available in the callout_table structure.

tlist affinity <cpu>

This new subcommand of tlist displays the threads that have an affinity set for a CPU. If <cpu> is specified, then only threads with affinity for the specified <cpu> are displayed.

scat_explore options

The scat_explore sub-command now accepts new options. Usage is:

scat --scat_explore [-v] [-a] [-d dest] [unix.N] [vmcore.]N
Where:


-v Verbose Mode: The command will print messages highlighting what it's doing.
-a Auto Mode: The command does not prompt for input from the user as it runs.
-d dest Instructs scat_explore to save its output in the directory dest instead of the present working directory.
N The number of the crash dump. Specifying unix.N and the vmcore. prefix is optional.


Tuesday Mar 03, 2009

Release 5.1 re-spun

A few folks inside Sun asked if there was anything we could do to make the Solaris Crash Analysis Tool package(s) smaller.  Sure! We had left the package "fat" to ease debugging, but folks don't always have the disk space to install a monster analysis tool.  We've therefore re-spun release 5.1 as 5.1b with a much smaller footprint.  The bonus is that since we had to spin new packages, we decided to add the bug fixes that we've made since the original release.

Therefore, if you pulled a copy of 5.1, you might want to revisit the download page and pull a copy of 5.1b. 

The installed packages are now:

Solaris CAT 5.1b x86/x64     ~20MB
Solaris CAT 5.1b SPARC       ~50MB
Solaris CAT 5.1b Combined    ~70MB


Tuesday Feb 10, 2009

Stand-alone Sanity Checks

One new feature in Solaris CAT 5.1 is the ability to run the tool's sanity checks without having to fully start the tool.  You can do this using the --sanity_checks option. For example:

# scat --sanity_checks vmcore.0
sanity checks: settings...
NOTE: /etc/system: ce:ce_taskq_disable set to 0x1 2 times
NOTE: /etc/system: module ge not loaded for "set ge:ge_intr_mode=0x833"
vmem...CPU...
WARNING: CPU0 has cpu_intr_actv for PIL 1
WARNING: TS thread 0x3000f36f520 on CPU3 using 98%CPU
WARNING: TS thread 0x33732a2a7e0 on CPU515 using 98%CPU

sysent...clock...misc...
WARNING: 213 severe kstat errors (run "kstat xck" )
WARNING: tmpfs filesystem on /tmp using 4.66G virtual memory

NOTE: kcage_freemem < kcage_lotsfree
WARNING: 1 pending softcalls (no softlevel1 interrupt queued)
done

These are the same checks that are run when the tool first reads the crash dump, as well as when the coreinfo command is used.

What is even better is that, with the proper permissions (e.g., you are running the command as root), this can be run on the live system at any time to get a snapshot of the overall "health" of the system.  For example, sanity check output for a healthy system would look like:

#scat --sanity_checks
sanity checks: settings...vmem...CPU...sysent...misc...done


Once you've installed the tool, complete details on the sanity checks can be found in /opt/SUNWscat/docs/sanity.html.

NOTE: In some cases things like disk drivers generate a few errors in kstats during initialization.  If Solaris CAT reports kstat counts of one or two hits for a device, be sure to research the device's kstats before assuming there's a problem. Within Solaris CAT, a simple thing to try is "kstat xck" to run a cross check of the kstats. You can also display kstats outside Solaris CAT using Solaris' kstat(1M) command.


Thursday Feb 05, 2009

Solaris CAT 5.1 Now Available

It's always fun when you can beat your own goals and release something early.  The bits made it through the tests, the legal tasks got completed, and the packaging was done so why wait until next week when one could "pull the trigger" today.  We're, therefore, happy to announce that Solaris CAT 5.1 is now available for download here. The release notes are provided in /opt/SUNWscat/docs/relnotes_5.1.html after you install the package or in our last blog entry.

If you have comments, questions, want to report a bug, or request an RFE, please feel free to send us a note at SolarisCAT_Feedback@sun.com.

Wednesday Feb 04, 2009

Solaris CAT 5.1 Release

As promised, we'll be releasing Solaris Crash Analysis Tool updates every six months or so.  Yes!  The final release process for Solaris CAT 5.1 is reaching its end and you'll soon be able to download the latest and greatest bits. Come back here on Feb 15th and you'll likely see the release announcement.

Though this release mostly fixes bugs and adds functionality to support the latest changes in the Nevada/OpenSolaris kernel, there are a few new features.  Here are the release notes:

General

Solaris 2.4, 2.5, and 2.5.1 Support

This version of Solaris CAT no longer supports Solaris 2.4, 2.5, or 2.5.1. Please use the 5.0 version of the tool if support for those Solaris releases is required.

FMRI Reporting

In Solaris Nevada/OpenSolaris build 86 and up, the FMRI string for the SMF service is maintained with each proc. Therefore, the FMRI string for that service is now displayed with the command name for all threads and procs.

Solaris Volume Manager Scans for Active Data Set

On Solaris 9 and up, the svm command now scans for an MD set that has active devices. This means that the command increments the set it is using until it finds a set that contains metadevices. The command starts with the set defined, which is 0 by default, and displays a message when the set number is changed. For example:

SolarisCAT(vmcore.14/10U)> svm
Solaris Volume Manager Status (md_status):
    MD_GBL_DAEMONS_LIVE (Master daemon has been started)
    MD_GBL_OPEN (Administration is open)

Active Metadata Set(s):
    Set   Address     Name  Status
    0     0x7003ef08  null  MD_SET_SNARFED MD_SET_NM_LOADED
    1     0x7003ef78  foo   MD_SET_SNARFED MD_SET_NM_LOADED

 SVM's md_set 0 is empty, trying set 1 instead.

d0 (ms_unit @ 0x6001468c8c8) (md_set 1) Concat/Stripe
    actual # blocks:      113207296 (53.9G)
    Unit Status:          Okay
    Stripe 0:
    Device       Starting Block  State
    239(did),96  0               Okay
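
The scan described above can be pictured with a short Python sketch. This is only an illustration of the "increment until a set with metadevices is found" idea, not the tool's code; the function name and the list-of-lists input are made up for the example.

```python
def find_active_set(metadevices_by_set, start=0):
    """Return the first set number at or after `start` that has metadevices.

    metadevices_by_set is a hypothetical list indexed by set number, where
    each entry is the list of metadevice names found in that set.
    """
    for number in range(start, len(metadevices_by_set)):
        if metadevices_by_set[number]:
            if number != start:
                # Mirrors the spirit of the message shown in the output above.
                print(f"md_set {start} is empty, trying set {number} instead.")
            return number
    return None  # no set contains metadevices


# Set 0 is empty and set 1 holds d0, as in the example output above.
chosen = find_active_set([[], ["d0"]])
print("using set", chosen)
```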

New Commands

CYCLIC

cyclic

This new command displays the cyclic at the address specified. That address is sometimes referred to as a cyclic_id.

This command ONLY works on Solaris 9+.

SLISTT

slistt

This new command displays kernel list_t linked lists and the structures linked by them. The address provided needs to be the address of a list_t structure. This is often embedded in other structures. In those cases, the offset into the structure of the list_t should be added to the structure address for printing.

This command ONLY works with CTF data.

Interface Changes

callout

The command-line interface of this command was substantially changed, both for internal consistency and for consistency with the kernel code.

By default, all tables are displayed. Instead of the rt and ts options, there are now the flags -r for selecting the realtime callouts and -n for selecting the normal callouts.

Options were added to include relevant structure addresses (-a), to decode the callout's argument into the thread or process it represents (-t) or to display only expired callouts (-e).

Finally, an option was added to display only a specified callout. The callout is selected using its XID.

scat --sanity_checks

The new command line argument --sanity_checks can be used to run a quick check of a running system or crash dump file. The intent is to allow easy access to these extensive checks without having to run Solaris CAT interactively.

send_scat_explore
scat --scat_explore

In release 5.1, one can now run scat_explore in a quasi standalone mode using the --scat_explore option to scat. For customers who open a Sun Service Request, crash data can now be gathered and transmitted to Sun using the send_scat_explore command. The syntax for send_scat_explore is:
send_scat_explore [-n service_number] [-e email_address] [unix.x] vmcore.x

Where:

  • -n service_number - sets the Sun Service Request number to assign to the crash data
  • -e email_address - sets the reply-to email address that Sun should use to acknowledge receipt of the data.
  • [unix.x] vmcore.x - the crash dump from which crash data should be gathered. Please note that unix.X need not be supplied and the core number, X, can be specified with or without the vmcore. prefix.

If the above -n and -e options are not specified, the user is prompted for them at run time.

If the system in question is not configured to send email directly to Sun, the crash data can be collected manually using scat --scat_explore. The scat_explore feature will print the name of the directory in which the data was placed and will also place in that directory a compressed tar archive of the crash data.

Wednesday Jun 18, 2008

Solaris Crash Analysis Tool 5.0 Release Notes

Are you sitting down?  We're very close to releasing Solaris CAT 5.0 to the public.  We're waiting for one more approval and we'll push it out.  The plan is to release the package at the end of June but I think we'll make mid-July. As an appetizer, here are the release notes for 5.0 so you can see what's coming.

General

x86/x64 Support

The tool now supports the analysis of x86 and x64 core dumps from Solaris 10 and above. 32 bit core dumps must be analyzed on an x86 or x64 system, and 64 bit core dumps must be analyzed on an x64 system.

Currently, the stack display does not include arguments. The disassembler is also under construction.

Sun Cluster Support

The clust command has been added to retrieve cluster specific information. The new command syntax is as follows:

clust [flag] <cmd> [arg]
Where:


abuf reports available dbg options which can be enabled
addr reports important cluster addresses
ebuf reports enabled dbg buf which can be displayed
dbuf [-a] <dbg_buf> dumps specified dbg_buf, -a dumps possible previous data from the buffer when it has wrapped
delta <tm> calculates delta between hrtime and tm (timestamp) found in the debug buffers.
did finds and reports our did devices
hb reports heartbeat data
invo [-t|-s|-i] reports invocation flags add thread reporting
members reports current cluster members
path reports current paths and associated data
pm reports path manager data
rdbg <*dbg_buf> dumps specified dbg_buf (no size calculation)


Zone Support

A new zone command has been added to display zone information in the core. Additionally, the proc and vfstab commands now have options to include the zone name for the process/vfstab entry, or to select processes or vfstab entries by zone. The new project, rctl, and task commands also have per-project options.

New Structure Command Flags

New flags were added to the sdump, slist, sarray, shash, and skma commands to enable display of more details of structure elements. Those include:

option function
-a display the address of each structure element
-o display the offset of each structure element
-s display the size of each structure element
-m display the CTF module of each structure element
-t display the type of each structure element
-i display the index of each structure array element
-g display any gaps between structure elements
-e <size> allows specification of the array element size (sarray and shash only)
These options are only available on the CTF versions of structures, and not the stabs-based definitions.

Additionally, new options were added to the stype command:

option function
-o display the origin of the type. This could be from the CTF data in the core, or the stabs file name.
-d display all of the CTF versions of the type, including the names of the modules which include that version
-m display the module for each element of the type shown
-g display gaps between structure elements

stype xck will compare all available CTF versions of a type and display a list of the differing versions of the type in the core with a list of the modules which use that version. The parenthesized numbers show the number of elements, followed by the size of the type.

Additionally, types which are "dissonant" are shown. If two versions of a type are dissonant, then they are not only different, but they have differing offsets or sizes for the same named members of that type.

Finally, if there is a region of memory where a pointer to another known type was found, the "stype field" subcommand will help search the CTF data for types which have that type at a known offset.

Deprecated Commands

Some commands which have been documented as deprecated for a long time have now been eliminated from the tool. Specifically:

deprecated command    replacement
summary               thread summary
findstk               stack find
fminfo                toolinfo

The findstk command still exists as an alias, although it cannot be searched for with the help command.

Additionally, the bigdump command is now deprecated.

deprecated command    replacement
bigdump bufc          buf list
bigdump inode         inode list
bigdump idleq         inode idleq
bigdump ireuse        inode ireuse
bigdump dnlc          dnlc list
bigdump dwbuf         buf dw
bigdump tmpfs         tmpfs


User-defined Symbols

Support has been added for user-defined symbols. These will supersede any in-core symbols that would normally be shown. This allows specification of names that might be useful to the user for regions of memory, including treating a region as an array. See the symbols section for further details.



New Commands

autofs

This new command displays the autofs configuration in the core by traversing the fnnode tree.

cv

This new command lists the threads waiting for a condition variable.

clust

The clust command has been added to retrieve cluster specific information.

nfs

This new command lists the NFS shares for the system.

pool

This new command displays the pools present in the system. It can also be used to get information about a specific pool. Pools may be selected by name, pool id, or pool structure address.

project

This command prints a table of projects on the system. It can also be used to list the tasks in a project. A project may be selected by name, project id, or kproject structure address, or those in a zone specified by name, zone id, or zone structure address.

rctl

This command is used to display resource controls. They can be displayed by rctl structure address, rctl_set structure address, or for a process specified by PID or proc structure address.

sleepq

This command simply lists the threads waiting in a specified sleepq specified by the provided sleepq_head structure address.

task

This command prints a table of tasks on the system. It can also be used to display processes in the task. A task may be specified by task id, task structure address, project id, or kproject structure address.

xmmu

This command is similar to the sfmmu command for displaying kernel virtual to physical address translation structures, but is for the x86/x64 architecture. It can be used to display PTEs, hments, htables, ptables, hat structures, or searching for translations for a specified virtual address.

zfs

The zfs command has been added to retrieve zfs specific information.

zone

This command lists the zones, including their id, name, root path, and status. If the -l flag is included, more data about the zone is displayed. If the -z <zone> option is included, then only the specified zone is displayed. The <zone> may be specified as a zone address, id, or name.

Interface Changes

color

The color command's options for fatal, warning, info, info2, error, and alert now include the ability to set background colors and attributes.

Two new color classifications were added: redzone and bad. The former is used to indicate the portion of data which is supposed to be the redzone. The latter indicates "bad" data such as data which should be 0xdeadbeef but is not in "kma buf" output.

The basic command format is:

color <class> <foreground> <background> <attributes>
The <foreground> and <background> may be any one of black, red, green, yellow, blue, magenta, cyan, and white. The options are positional, so the word none can be used in place of a color to indicate not to set a color.

Attributes supported are bold, faint, italic, underline, blink, reverse, and strikethrough. These may be combined in a comma-separated list.

Some examples are:

color fatal red none bold
color warning none none underline,bold
color bad white red
color alert green
color info white green
Note that attribute support is dependent on the terminal emulator in use. For example, some of the attributes, faint, italic and strikethrough in particular, may not be supported by the terminal emulator you are using. Some may display these attributes differently, such as using a brighter color instead of making the font bold. Another possibility is that the terminal emulator may not support certain combinations of attributes.

cpu -c [<pgroup>]

In Solaris Nevada, the chip structure and notation was eliminated. A new facility called the processor group (pgroup) was implemented in its place. The -c flag's meaning was changed to mean a pgroup specifier instead of a chip specifier on that version of Solaris.

cpu -h [<pghw_type>]

This new flag allows display of CPUs by their hardware sharing relationship. This matches against the pghw_hw type field, and dumps the CPUs by this grouping. Valid <pghw_type>s are PGHW_IPIPE, PGHW_CACHE, PGHW_FPU, PGHW_MPIPE, PGHW_MEMORY, and PGHW_CHIP. This feature only works on Solaris Nevada.

If the <pghw_type> is omitted, then a summary of the hardware relationships is displayed, only listing the CPU IDs and their groups. For example, from a system with an UltraSPARC-T1:

PGR_PHYSICAL, class:cmt(id:1), PGHW_IPIPE
  0/0: 0 1 2 3
  3/68: 4 5 6 7
  4/70: 8 9 10 11
  5/72: 12 13 14 15
  6/74: 16 17 18 19
  7/76: 20 21 22 23
  8/78: 24 25 26 27
  9/80: 28 29 30 31
PGR_PHYSICAL, class:cmt(id:1), PGHW_FPU
  1/0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
PGR_PHYSICAL, class:cmt(id:1), PGHW_MPIPE
  2/0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
The X/Y represents the pgroup ID and the instance number.

dev node [<addr>]

This new option for dev displays the tree of in_node structures rooted at e_ddi_inst_state.ins_root, which is a fully-populated parallel tree to the dev_info tree.

findfiles

Two new options have been added to this command to allow searching for files based on vnode, and based on vnode's v_vfsp.

The new -n option for findfiles allows searching for processes using files whose vnode matches the supplied <vnode_addr>. This matching includes a namenode's nm_filevp and nm_mountvp, a fifonode's fn_realvp and fn_dest, and an snode's s_realvp.

The new -f option for findfiles allows searching for processes using files whose vnode's v_vfsp matches the supplied <vfs_addr>. This matching includes a namenode's nm_filevp's v_vfsp and nm_mountvp's v_vfsp, a fifonode's fn_realvp's v_vfsp and fn_dest's v_vfsp, and an snode's s_realvp's v_vfsp.

findval

The flags for findval have been altered to make the size specifiers into single-character flags.

old (bits)    new (bytes)
-8            -1
-16           -2
-32           -4
-64           -8

A new flag, -L has also been added. If a match is found in a kmem cache, that cache's slab freelist and magazines are scanned to determine whether the matching buffer is free or allocated.

flip

The -va option has been replaced by -V. Correspondingly, the -pa option has been replaced by -P.

A new option was added to allow specification of the size of the object to be read for the (implicit) -a case where a value is read from the core. By default, sizeof(ulong) is read and displayed. The -S <size> flag allows specification of 1, 2, 4, or 8-byte chunks of data to be read instead.

ifconf

The ifconf command's output now includes IPMP group names.

kma stat

In the kma stat output, an asterisk at the end of the line for a cache is now used to indicate that setting KMF_AUDIT in kmem_flags would not make it into this cache's cache_flags due to the cache_flags and cache_cflags set. This means that setting KMF_AUDIT in kmem_flags wouldn't allow kma users to be run against this cache.

If KMF_AUDIT is already set in kmem_flags, the asterisk then indicates whether KMF_AUDIT is set for that cache.

kma users -b <ltstamp> <htstamp>

This new flag for kma users allows selection of only data from the specified time range [ <ltstamp>, <htstamp> ). The specified range is compared against the kmem_bufctl_audit_t's bc_timestamp. The values are in the same units as the bc_timestamp field.
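
The bracket notation above means the range is half-open: <ltstamp> itself is included and <htstamp> is excluded. A minimal Python sketch of that comparison follows; the record layout and the sample timestamps are invented for illustration.

```python
def select_by_timestamp(records, ltstamp, htstamp):
    """Keep records whose bc_timestamp lies in [ltstamp, htstamp)."""
    return [r for r in records if ltstamp <= r["bc_timestamp"] < htstamp]


# Hypothetical audit records; only bc_timestamp matters for the selection.
records = [
    {"addr": 0x1000, "bc_timestamp": 100},
    {"addr": 0x2000, "bc_timestamp": 200},
    {"addr": 0x3000, "bc_timestamp": 300},
]
selected = select_by_timestamp(records, 100, 300)
print([hex(r["addr"]) for r in selected])
```

The record stamped exactly at the lower bound is kept, while the one stamped exactly at the upper bound is dropped.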

kma users -f

By default, kma users processes allocated buffers. The new flag -f causes it to process free buffers instead. This is accomplished by walking the cache's magazines and slab freelists. Note that the cache must have one of the KMF_BUFTAG flags enabled to be able to get from a given free buffer to its kmem_bufctl_audit structure which contains the stack and thread data which kma users needs.

kstat xck [<filename>]

The xck subcommand causes the entire set of kstats to be walked, looking for any that match a class/module/name specification. Any that match are checked against a condition, and if they fail that check, they cause a message to be displayed, and the values which triggered it.

By default, the list of specifications is kept in the file <installdir>/lib/kstat_xck. You can specify a different rules file on the command line. When the command successfully loads a rules file, it reports the number of rules found, then runs the checks.

The format of the rules file is a colon-separated list of the following six fields:

class       the kstat class
module      the kstat module
name        the kstat name
rule        the rule checked - see below for details
severity    the severity of the rule if it fails, 0-9. This may be used to emphasize higher (numeric) severities over lower.
message     the error message to be displayed if the rule fails

The comment character is '#'. If a kstat matches the class, module, and name specified, then the rule is applied to the kstat. If the class, module, or name are empty, then they match anything.

The rule is matched against the kstat_named_t entries in the kstat. A rule can specify <name><comparator><value> where <name> is used to match against the kstat_named_t's name, and its value is then compared against <value>. <comparator> can be any of ">", "<", ">=", "<=", "==", or "!=". <value> can be an integer, the name of another kstat_named_t within that kstat, or a string enclosed in quotes.

<value> may also include a percentage calculation for a name.

Some examples:

device_error:::Hard Errors>0:9:device had hard errors
For any kstat whose class is "device_error" (matching any module or name) and which has a kstat_named_t named "Hard Errors" with a value greater than zero, report "device had hard errors" at severity 9.

An example of a hit on this rule would be:

st54,err:Hard Errors>0(33):device had hard errors
This shows the name of the kstat that triggered the report, the check that failed, the value in parentheses, and the error message.

Another example:

net:::duplex!="full":9:not full duplex
This matches any kstat of class "net" which has a kstat_named_t named "duplex" whose string value is not "full", and reports "not full duplex" at severity 9.

A third example:

::biostats:buffers_locked_by_someone>1%buffer_cache_lookups:9:
This matches any kstat named "biostats" and compares a kstat_named_t with the name "buffers_locked_by_someone" against 1% of one named "buffer_cache_lookups", and reports if it's greater at severity 9. In this case, no specific error message is printed, just the rule and values.
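
To make the rule format more concrete, here is a rough Python sketch that parses the three example rules above and evaluates them against hand-made kstat data. It is not the tool's parser; the function names, the dictionary layout, and the sample module names and values are all assumptions made for this illustration.

```python
import re

# Comparators from the text. In the regex, two-character forms are listed
# first so that ">=" is preferred over ">".
COMPARATORS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}
RULE_RE = re.compile(r"^(.*?)(>=|<=|==|!=|>|<)(.*)$")


def parse_rule_line(line):
    """Split one rules-file line into its six colon-separated fields."""
    cls, module, name, rule, severity, message = line.split(":", 5)
    return {"class": cls, "module": module, "name": name,
            "rule": rule, "severity": int(severity), "message": message}


def resolve_value(text, named):
    """Turn the <value> part of a rule into something comparable."""
    text = text.strip()
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]                       # quoted string
    pct = re.match(r"^(\d+(?:\.\d+)?)%(.+)$", text)
    if pct:                                     # e.g. 1%buffer_cache_lookups
        return float(pct.group(1)) / 100.0 * named[pct.group(2)]
    if text in named:
        return named[text]                      # another kstat_named_t
    return int(text, 0)                         # plain integer


def rule_hits(spec, kstat):
    """True when the kstat matches the spec and the comparison holds."""
    for key in ("class", "module", "name"):
        if spec[key] and spec[key] != kstat[key]:
            return False                        # empty fields match anything
    lhs, op, rhs = RULE_RE.match(spec["rule"]).groups()
    named = kstat["named"]
    if lhs not in named:
        return False
    return COMPARATORS[op](named[lhs], resolve_value(rhs, named))


# Hypothetical kstats; module names and values are made up.
tape = {"class": "device_error", "module": "sterr", "name": "st54,err",
        "named": {"Hard Errors": 33}}
nic = {"class": "net", "module": "bge", "name": "bge0",
       "named": {"duplex": "half"}}
bio = {"class": "misc", "module": "unix", "name": "biostats",
       "named": {"buffers_locked_by_someone": 5,
                 "buffer_cache_lookups": 1000}}

hard = parse_rule_line("device_error:::Hard Errors>0:9:device had hard errors")
duplex = parse_rule_line('net:::duplex!="full":9:not full duplex')
locked = parse_rule_line(
    "::biostats:buffers_locked_by_someone>1%buffer_cache_lookups:9:")

for spec, kstat in ((hard, tape), (duplex, nic), (locked, bio)):
    print(kstat["name"], spec["rule"], rule_hits(spec, kstat))
```

With these sample values the first two rules hit, and the third does not because 5 is not greater than 1% of 1000.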

mdump

mdump now prints a leading asterisk (*) instead of a line which is a repeat of the previous line. This is done to help point out "interesting" data. See the scatenv mdump_compression section for further details.

Additionally, mdump now displays the "next" address at the end of its output. This is done to show what the "next" address would be, and, in the case where we ended with multiple compressed lines, to show that it continued to the end of the requested data.

mdump/rd* -P

The new -P option to the mdump and all of the rd commands allows reading from the core or system by physical address instead of virtual address.

memerr

Two new flags were added to this command which can be useful when the errorq data isn't in the "normal" places. The -raw flag causes all queues to be processed, which may include old errors, or possibly even junk.

The -dump flag causes the errorqs to be processed using the eq_dump element, which should contain crashdump-related elements.

meminfo tree <proc>

This version of the meminfo command works similarly to the user option, but only walks the process tree under the specified <proc>, giving that tree's totals.

meminfo user <cmd>

Previously, you could select processes by their process address or PIDs. You can now instead match by the command string. This is done by matching the psargs for the process against the provided <cmd> substring.

The process address/PID specification is distinguished from the command substring based on the first character. If it is numeric, it is considered a process address/PID list, otherwise, a command substring.
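
The first-character test can be sketched in a few lines of Python. This is only a guess at the shape of the check based on the description above; the function name and return labels are invented.

```python
def classify_selector(arg):
    """Classify a meminfo user argument by its first character."""
    if arg and arg[0] in "0123456789":
        return "proc_or_pid"        # process address or PID list
    return "command"                # substring matched against psargs


for arg in ("1234", "0x3000f36f520", "sshd"):
    print(arg, "->", classify_selector(arg))
```

One consequence of a check like this is that a command substring beginning with a digit would be read as an address or PID rather than as a command.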

meminfo -m user

This new option to the meminfo command will display the memory layout for all the processes on the system.

nvlist

This command was added to display the contents of an nvlist structure.

page color

This new option to the page command will examine the page_freelists and page_cachelists and display a list of how many pages are in each of the per-color buckets, organized by page size.

page frag [<sizecode>]

This new option to the page command walks the page_counters arrays and counts the free base constituent pages for a given pagesize. The output shows a number of free base constituent pages, followed by a count of the number of pages of size <sizecode> that have that many constituent pages free.

For example, if the chart showed:

free page count
==== ==========
 512 35
 511 112
...
that would mean that there are 35 large pages that have 512 of their constituent pages free, and 112 that have 511 free.

The last line of output for pages with 0 constituent pages free also includes a count of those pages by their pagesize. Since they're entirely in-use, it's possible that they're in-use as large pages already. The list shows how many are at each pagesize. Note that pages larger than the sizecode being displayed are factored by their size.

For example, when requesting size 1(64KB):

   0 978433 8K:5927, 64K:284434, 512K:143528, 4M:104064, 32M:413150 (213M unknown)
would appear to show 413150 32MB pages, which is more than there are in the system. So instead, that value is divided by 512, since there are 512 64KB pages in a 32MB page. The result is:
   0 978433 8K:5927, 64K:284434, 512K:17941, 4M:1626, 32M:808 (209M unknown)
But in this case, the sum of the pages doesn't equal 978433.

The "unknown" entry is for entries in the page_counters for which a page cannot be found. These are typically parts of memory for which no page structure is allocated (memory used during boot), and thus the page size cannot be easily determined.

Two fragmentation percentages are calculated:
coalesce fragmentation - how fragmented completely-free large pages are into free constituent pages
relocate fragmentation - how non-free constituent pages are fragmenting large pages of the selected size

The coalesce fragmentation measure is based on 100% representing all constituent pages being the smallest pages. For the scaled fragmentation calculation, constituent pages which are sized between the smallest and the one being examined are counted at 1/rate where "rate" is how many smallest pages make up the current size. For example, fragmentation into 512K pages is counted at 1/64 the fragmentation of 8K pages.

For example, from the following fragmentation of 4M pages:

 512 12964 free 8K:2067328, 64K:13552, 512K:4508, 4M:8151
the calculation would be:
(2067328 + 13552 + 4508) / (12964 * 512)
Which would make 31.4179% of the total possible fragmentation. For the scaled fragmentation, using the same numbers, the calculation would be:
(2067328 + 13552 / 8 + 4508 / 64) / (12964 * 512)
which would make 31.1724% scaled coalesce fragmentation.
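
The two calculations above can be reproduced with a short Python sketch. The counts come from the 4M example line; the variable names and the dictionary layout are just for illustration.

```python
# Counts taken from the 4M example line above.
SLOTS_PER_LARGE_PAGE = 512           # number of 8K pages in one 4M page
free_large_pages = 12964
free_by_size = {"8K": 2067328, "64K": 13552, "512K": 4508}

# How many smallest (8K) pages make up each constituent size.
rate = {"8K": 1, "64K": 8, "512K": 64}

total_slots = free_large_pages * SLOTS_PER_LARGE_PAGE

# Unscaled: every free constituent page counts the same.
raw = sum(free_by_size.values()) / total_slots

# Scaled: larger constituent pages count at 1/rate.
scaled = sum(count / rate[size]
             for size, count in free_by_size.items()) / total_slots

print(f"coalesce fragmentation:        {raw * 100:.4f}%")
print(f"scaled coalesce fragmentation: {scaled * 100:.4f}%")
```

The printed values agree with the percentages quoted above, apart from possible rounding in the last digit.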

Relocate fragmentation is measured by summing the slots using this formula:

  (count of pages in this slot / total number of partially-free pages) *
  (count of subpages free for this slot / number of subpages in a page)
Fully in-use (non-free) pages at slot 0, are not counted.
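
As a rough illustration of the relocate formula, the Python sketch below sums the per-slot terms over a small invented histogram. It skips slot 0 as described above. Whether the fully-free slot counts as "partially-free" is not stated here, so the sketch assumes it does; the function name and the sample numbers are hypothetical.

```python
def relocate_fragmentation(histogram, subpages_per_page):
    """Sum the per-slot terms of the relocate fragmentation formula.

    histogram maps "free constituent pages" to "number of large pages
    with that many free". Slot 0 (fully in-use pages) is not counted.
    """
    counted = {free: n for free, n in histogram.items() if free != 0}
    total = sum(counted.values())
    if total == 0:
        return 0.0
    return sum((n / total) * (free / subpages_per_page)
               for free, n in counted.items())


# Invented example: large pages made of 4 constituent pages each.
sample = {4: 2, 2: 2, 0: 10}
result = relocate_fragmentation(sample, 4)
print(f"{result * 100:.1f}%")
```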

Unless specified, the <sizecode> used is mmu_ism_pagesize, or mmu_page_sizes-1 if mmu_ism_pagesize is not available. This command is only supported on Solaris 10 and above.

page mappers -P

The -pa flag for page mappers has been replaced by the -P flag.

pkma -fslL

Four new options were added to the pkma command.

By default, only allocated buffers from the specified cache are scanned for packet matches. The -f flag causes both allocated and free buffers to be scanned.

The -s flag causes pkma to display a summary of the packet data seen. This includes counts of the following:

ethernet type
ethernet source
ethernet destination
IPv4 protocol
IPv4 source
IPv4 destination
IPv6 protocol
IPv6 source
IPv6 destination
ICMPv4 type/code
ICMPv6 type/code
TCP source port
TCP destination port
UDP source port
UDP destination port
(R)ARP op
ARP sender ethernet
ARP target ethernet
RARP sender ethernet
RARP target ethernet
ARP sender IP
ARP target IP
RARP sender IP
RARP target IP

The output for each is shown with a count of occurrences in brackets. For example:

ethernet type: 0x800(IPv4)[316983], 0x806(ARP)[4738]
The new scatenv setting pdump_min_pkt affects how many of a given item must be seen before it is displayed. That is, if less than pdump_min_pkt would be displayed in the brackets, the entry is not displayed. See the pdump_min_pkt section for more details on this setting.
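
The effect of pdump_min_pkt on the summary can be sketched in Python as follows. This is only an illustration of the thresholding; the function name is invented, the packet counts are made up, and ordering entries by descending count is an assumption.

```python
from collections import Counter


def summarize(label, values, min_pkt=100):
    """Build one summary line, hiding entries seen fewer than min_pkt times."""
    counts = Counter(values)
    shown = [f"{value}[{n}]"
             for value, n in counts.most_common() if n >= min_pkt]
    return f"{label}: " + ", ".join(shown)


# Made-up sample: the IPv6 entry falls below the default threshold of 100.
seen = ["0x800(IPv4)"] * 150 + ["0x806(ARP)"] * 100 + ["0x86dd(IPv6)"] * 3
line = summarize("ethernet type", seen)
print(line)
```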

The -l flag adds two more categories:

source packet half
destination packet half
The -L flag further adds the whole packet as a category.

proc

The proc command has some options added to display processes' zone (-z), project ID (-j), task ID (-k), or contract ID (-c).

Additionally, processes may be selected by using the uppercase versions of the above flags. To select processes by zone, use -Z <zone_name>. Processes within a project may be selected using -J <proj_id>. Similarly, processes within a task may be selected with -K <task_id>. Finally, processes may be selected by contract ID with -C <contract_id>.

sarray/shash -e <size>

When dumping arrays or the array of hash buckets, the array members may be separated by something other than the element size. This new option allows specification of the size by which the command will increment the address from one array element to the next.

This will be used for both the calculation of the start element, and all the rest of the elements.
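
In other words, the supplied size acts as the stride for every address calculation, including that of the first element displayed. A small Python sketch of that arithmetic, with invented names and addresses:

```python
def element_addresses(base, start, count, stride):
    """Addresses of `count` array elements beginning at index `start`.

    `stride` plays the role of the -e <size> value: it is used both to
    find the start element and to step to each following element.
    """
    return [base + (start + i) * stride for i in range(count)]


# Hypothetical array at 0x1000 whose elements are spaced 0x40 bytes apart.
addrs = element_addresses(0x1000, start=2, count=3, stride=0x40)
print([hex(a) for a in addrs])
```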

scatenv alternate_cpu_walk

By default, most of the time when the CPU structures are walked by the tool, it starts at cpu_list and walks the list of CPUs via the cpu_next links.

If that list is broken, then this setting may be turned on to change how the list of CPUs is obtained. With this enabled, it instead uses the cpu array of pointers to CPU structures.

scatenv dis_instr_bytes

Due to x86/x64 instructions being variably-sized, it can sometimes be useful to see the actual bytes that make up the instruction. When this is enabled, the bytes of the instruction are displayed with it.

scatenv dis_instr_size

Due to x86/x64 instructions being variably-sized, it can sometimes be useful to see the size of the instruction. When this is enabled, the size of the instruction is displayed with it.

scatenv dispq_empty

The dispq command previously displayed only CPUs with threads in their dispatch queues. This change shows all CPUs unless the scatenv setting for dispq_empty is enabled. By default it is disabled.

Additionally, if any threads are pinned by the cpu_thread, they are also shown in the dispq output. Finally, if the cpu_dispthread doesn't match the cpu_thread, and hasn't shown up in the pinned thread(s), it is also shown.

scatenv mdump_compression

mdump now compresses repeated lines to help find "interesting" data in the output. It now prints a leading asterisk (*) instead of a line which is a repeat of the previous line. Further repetitions are omitted. This behavior can be reverted to the original by disabling the mdump_compression setting.

This also affects the output of kma buf and panic kmem similarly.

mdump_compression is enabled by default.
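
The compression idea is the same one used by common hex dump utilities. The Python sketch below shows it on invented data; it does not reproduce mdump's actual output format, and the function name is made up.

```python
def compress_repeats(rows):
    """Yield display lines, collapsing runs of identical data into one '*'.

    rows is a list of (address, data) tuples in address order.
    """
    previous = None
    in_run = False
    for address, data in rows:
        if data == previous:
            if not in_run:          # first repeat: print the marker once
                yield "*"
                in_run = True
            continue                # further repetitions are omitted
        in_run = False
        previous = data
        yield f"{address:#x}: {data}"


rows = [
    (0x1000, "00000000 00000000"),
    (0x1010, "00000000 00000000"),
    (0x1020, "00000000 00000000"),
    (0x1030, "deadbeef deadbeef"),
]
lines = list(compress_repeats(rows))
print("\n".join(lines))
```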

scatenv pdump_min_pkt

This setting affects how many of an item in pkma -s must be seen before it is shown in the output. The default is 100. See the pkma -s section for more details on that command.

scatenv thr_microstate

Some of the microstate information is now present in thread/lwp output if the TP_MSACCT flag is set in the t_proc_flag - the t_mstate, ms_prev, ms_state_start, and ms_start.

These are also displayed if the thr_microstate setting is enabled.

Enabling thr_microstate also causes the ms_term, and the ms_acct fields to be displayed, detailing the time spent by the thread in various states.

seg softlock

This new subcommand lists segments with softlocks, and a total of softlocks active in the system.

sema -L

Similar to rwlock -L, this new option searches segkp for threads with the specified semaphore, eliminates those in the sleep queue for that semaphore, and lists those remaining as possible semaphore holders.

slist

The slist command now accepts a type of "none". This is intended to allow walking linked lists where the offset of the link pointer is known, but the type is not.

Only the element number and address are displayed for each. The -c flag also works with this type name.

For type "none", the link offset must be specified as a number.
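
Walking a list with only a link offset amounts to repeatedly reading the pointer stored at that offset. A minimal Python sketch, using a dictionary as a stand-in for memory; the addresses, the offset, the loop limit, and the function name are all invented for the example.

```python
def walk_links(read_pointer, head, link_offset, limit=1000):
    """Follow the pointer at addr + link_offset until it is NULL.

    `limit` guards against walking a corrupted or circular list forever.
    """
    seen = []
    addr = head
    while addr != 0 and len(seen) < limit:
        seen.append(addr)
        addr = read_pointer(addr + link_offset)
    return seen


# Fake memory: each structure keeps its next pointer at offset 8.
memory = {0x1008: 0x2000, 0x2008: 0x3000, 0x3008: 0}
elements = walk_links(memory.__getitem__, 0x1000, 8)
for number, addr in enumerate(elements):
    print(number, hex(addr))    # element number and address only
```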

skma -f

skma dumps buffers which are allocated by default, and if the -f flag is used, then both free and allocated buffers are displayed. This new option to skma allows dumping only the buffers within a specified cache which are free.

stream [-l|-s] [-d] squeue [<squeue_addr>]

Solaris 10 has a new streams feature called squeues. This new subcommand of the stream command allows examination of the squeues present.

In its base form, stream squeue lists all the squeues on the system, including the decoded fields in the structures.

The mblks present in the squeues may be displayed by specifying stream -l squeue. This display honors the str_data and str_pdump flags - thus it will not display the mblks unless both -l is specified and str_data is enabled.

A one-line summary of squeues present is displayed if the -s flag is specified. This includes a summary of the mblks present on the squeue. This flag overrides -l.

If -d is included, only squeues which have data (mblks) in them are displayed.

An squeue may be specified by including <squeue_addr>. This will still honor any flags included, and if -d is specified for an squeue which has no data, then nothing will be shown.

svm -i

Sometimes the crash copy of the SVM data may have invalid device sizes. The "-i" option was added to instruct svm to ignore these errors and display info on the meta device.

symbols

New subcommands were added to the symbols command to allow the user to specify their own symbols. The command has three options: add, del, and list.

The add subcommand is how user-defined symbols are created. By default it creates a word-sized symbol. The user can additionally specify a size if something other than word size is desired.

If the symbol size is specified, the number of elements can be specified. In this case, the region is treated as an array with each element the size specified.

The del subcommand is used to remove symbols from the user-specified symbol table.

Finally, the list subcommand is used to list the entries in the user-specified symbol table.

tlist killed

This new subcommand for tlist dumps threads whose process has SKILLED set in its p_flag, indicating that a SIGKILL has been posted to the process.

This is implemented in tlist rather than proc since often the process won't be present in the process list any longer, yet the threads linger in the thread list, holding a pointer to the involved process.
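The reason for walking the thread list can be illustrated with a small Python model. Everything here is hypothetical: the dictionaries stand in for kernel structures, and the SKILLED value is a placeholder for illustration only (the real bit is defined in <sys/proc.h>).

```python
SKILLED = 0x100  # placeholder bit for illustration; not taken from <sys/proc.h>


def killed_threads(threads):
    """Return threads whose owning process has SKILLED set in p_flag."""
    return [t for t in threads if t["proc"]["p_flag"] & SKILLED]


# A process that has already left the process list, but whose thread
# still holds a pointer to it.
dying = {"pid": 42, "p_flag": SKILLED | 0x1}
alive = {"pid": 7, "p_flag": 0x1}

process_list = [alive]  # the dying process is no longer listed here
thread_list = [
    {"tid": 1, "proc": alive},
    {"tid": 2, "proc": dying},
]

found = killed_threads(thread_list)
```

In this model, a search of process_list would miss pid 42, while the walk over thread_list still reaches it through the thread's process pointer.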

tlist pctcpu

This new subcommand for tlist dumps threads which have a t_pctcpu greater than the specified value. If no value is specified, 90% is used.
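The selection rule is a simple threshold filter with a default. In the sketch below, the busy_threads function and the thread records are hypothetical, and pctcpu is assumed to already be expressed as a percentage rather than in the kernel's internal representation.

```python
DEFAULT_PCTCPU = 90.0  # default threshold stated in the text above


def busy_threads(threads, threshold=None):
    """Return threads whose pctcpu is strictly greater than the threshold."""
    limit = DEFAULT_PCTCPU if threshold is None else threshold
    return [t for t in threads if t["pctcpu"] > limit]


threads = [
    {"name": "a", "pctcpu": 95.0},
    {"name": "b", "pctcpu": 90.0},
    {"name": "c", "pctcpu": 40.0},
]
```

Because the comparison is "greater than", a thread sitting at exactly the threshold is not selected in this model.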

vfstab

The vfstab command now includes an option (-F <fstype>) to specify the filesystem type. This will only list filesystems of the specified type <fstype>.

The device or mount point may now also be specified directly in the command. If specified, only filesystems which exactly match the string will be displayed.

An option (-z) was also added to display the zone associated with each entry. Entries in the vfstab may also be selected by zone using the -Z <zone_name> option.
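The three selection criteria can be pictured as independent filters over the entries. The Python sketch below is a hypothetical model: the VfsEntry type, the select_entries function, and the sample data are invented for illustration, and combining the filters with a logical AND is an assumption rather than documented behavior.

```python
from typing import NamedTuple


class VfsEntry(NamedTuple):
    """Hypothetical vfstab entry with only the fields discussed above."""
    device: str
    mount_point: str
    fstype: str
    zone: str


def select_entries(entries, fstype=None, target=None, zone=None):
    """Filter entries the way -F, a device/mount-point string, and -Z might."""
    selected = []
    for e in entries:
        if fstype is not None and e.fstype != fstype:
            continue  # -F <fstype>: keep only this filesystem type
        if target is not None and target not in (e.device, e.mount_point):
            continue  # string must exactly match the device or mount point
        if zone is not None and e.zone != zone:
            continue  # -Z <zone_name>: keep only entries in this zone
        selected.append(e)
    return selected


entries = [
    VfsEntry("/dev/dsk/c0t0d0s0", "/", "ufs", "global"),
    VfsEntry("/dev/dsk/c0t0d0s7", "/export/home", "ufs", "global"),
    VfsEntry("swap", "/tmp", "tmpfs", "zone1"),
]
```

Note that the string match is exact in this model, so "/export" would not select the "/export/home" entry.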

whatis -P

The -pa flag for whatis has been replaced by the -P flag.

svm [-s <set>] [-d <devnum>]

The -d option was added to support translation of Solaris Volume Manager's minor device numbers. In some cases these device numbers will seem odd, and this option should help by providing the metaset in which the device is defined and the device instance within that metaset.

On Solaris 9 and up, the metaset name can now be provided as an argument to the -s option.

The structures associated with SVM change frequently. This causes Solaris CAT to dump incorrect information. We are in the process of changing the SVM code to use CTF instead of static structures. In this release the metaset and metadb info is now read using CTF.

Monday May 22, 2006

Who or what is Solaris CAT?

Solaris CAT is a crash analysis tool (C.A.T, get it?) that a small team of dedicated engineers in Sun's Support Services kernel team have been developing over the past 10 years. It's been available internally for ages, and with the 4.0 release we got the go-ahead to provide it to you, our customers, via SDLC.

We've also got version 4.1 on SDLC but that doesn't support Solaris 10.

We've had a few hiccups along the path to getting 4.2 out for you and we also branched to start a 5.0 series. The 5.0 series will also contain support for machines running Solaris 10 and Solaris Nevada. That Solaris 10 / Nevada support includes x86 and x64/amd64 too, btw.

Let me say now, no, we're not going to provide x86 support for Solaris releases prior to Solaris 10. The engineering work involved is too large for a volunteer team such as ourselves to do in addition to our day jobs. Believe me, we've tried. In fact, the x86 porting effort first got started in 2001 but never really made any headway until late in 2003 when Solaris 10 unified a whole heap of kernel interfaces and structures.

More news when there's stuff that's fit to print.... and don't forget there'll be film at 11.

Ciaociao.

About

Danaf-Oracle
