What’s New in the Solaris Modular Debugger (MDB) in the Oracle Solaris 11.4.81 CBE

The Common Build Environment (CBE) release for Oracle Solaris 11.4 SRU 81 is now available via “pkg update” from the release repository or by downloading the install images from the Oracle Solaris Downloads page. As with the first Oracle Solaris 11.4 CBE, this is licensed for free/open source developers and non-production personal use. It is not the final, supported version of the 11.4.81 SRU, but the pre-release version on which the SRU was built. It contains all of the new features and interfaces, but not all of the final rounds of bug fixes, from the 11.4.81 SRU.

The previous version was the CBE for 11.4.42, so there are more than 3 years’ worth of changes between these two releases. A list of changes throughout the system is provided in the earlier blog What’s New in Oracle Solaris 11.4.81 CBE. But there are so many changes in the Solaris Modular Debugger (MDB) in this release that we split them out into a blog of their own. Further details on many of these can be found in the mdb(1) or kmdb(1) man pages. For those not familiar with MDB, we also provide the Oracle Solaris Modular Debugger Guide, but many of these changes are not yet reflected in it.

The main What’s New blog also includes a section of CTF changes that make it easier to build binaries containing the CTF type data that MDB uses to decipher data in programs.

Base behavior

mdb handling of ambiguous symbol names

MDB now has better handling of ambiguous symbol names in Solaris 11.4.75 and later. There are 4 kinds of ambiguous symbol names that are now handled:

1. The same name is used for a global symbol in multiple modules.
2. The same name is used for multiple local symbols in the same module.
3. Capability symbols, where the linker chooses the symbol based on the runtime environment.
4. Filter symbols, where a module provides a name that can be resolved but the implementation is provided in a different module.

In prior releases, when looking up symbols by name, mdb would handle cases 1 and 2 by returning the first symbol that matched the criteria given. Because of the default way the address space is laid out, that tended to resolve names in the a.out first and then in libraries.

Now, attempts to access symbols using names that are not sufficiently qualified to be unique will result in an error that includes the fully qualified names that can be used:

mdb_target> head=p
mdb: head: not a unique symbol name
mdb: libc.so.1`popen.c`head                                : 0x7fc919fb7a10
mdb: libc.so.1`yp_match.c`head                             : 0x7fc919fc9608
mdb: mdb_target`head                                       : 0x7fc91a79eaa0
mdb_target>

If obj`file`name does not uniquely describe a symbol then an additional index is added after the file:

mdb_target> substr::dis
mdb: substr: not a unique symbol name
mdb: libc.so.1`collstr.c`1`substr                          : 0x7fe5f5a8b980
mdb: libc.so.1`collwstr.c`substr                           : 0x7fe5f5a8d410
mdb: libc.so.1`collstr.c`2`substr                          : 0x7fe5f5aa8480
mdb_target>
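The naming scheme in the two listings above can be modelled outside mdb. The following Python sketch is illustrative only and is not mdb code: the function name qualified_names and the tuple layout are made up for this example, and it assumes the index counts duplicates in symbol-table order starting from 1, which matches the sample output but is not stated explicitly. It does not model global symbols, which the first listing shows without a file component.

```python
from collections import Counter


def qualified_names(symbols):
    """Build obj`file`name strings, adding an index only for duplicates.

    symbols is a list of (obj, file, name) tuples in symbol-table order.
    """
    totals = Counter(symbols)   # how often each obj`file`name occurs
    seen = Counter()            # running index for duplicated names
    names = []
    for obj, file, name in symbols:
        key = (obj, file, name)
        if totals[key] == 1:
            # obj`file`name is already unique, so no index is needed.
            names.append(f"{obj}`{file}`{name}")
        else:
            # Assumption: indices start at 1 and follow table order.
            seen[key] += 1
            names.append(f"{obj}`{file}`{seen[key]}`{name}")
    return names
```

Fed the three substr symbols from the listing, in the same order, this produces the same three qualified names.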

To exclude the symbols of a module from the check for ambiguous symbol names, add the module to the ignore_mods list:

::set -o ignore_mods=+car.so

This will ignore all the symbols from the car.so module.

The search_defer_mods option is a colon separated list of modules to defer searching when performing a symbol name lookup. Modules can be specified as full paths or the basename of the module. Symbols from these modules are only used if no non-zero sized symbols are found in other modules with the name given.

When debugging userspace programs, ld.so.1 is added to the search_defer_mods option by default. It can be removed using:

::set -o search_defer_mods=-ld.so.1

If you try to access a symbol that has capability symbols, mdb will report this:

mdb_target> memcpy::dis
mdb: memcpy: not a unique symbol name
mdb: libc.so.1``memcpy                                     : 0x7ff1e37410c0
mdb: libc.so.1`memcpy%avx2                                 : 0x7ff1e38a9660
mdb_target>

Attempting to set a breakpoint with the unqualified symbol name will work as the user expects, selecting the same capability for the symbol as the runtime linker does. If you need to select a particular symbol you can now use the qualified name:

mdb_target> libc.so.1``memcpy::dis -n 4
      libc.so.1``memcpy:        movq   %rdx,%r8
    libc.so.1``memcpy+3:        movq   %rdi,%rcx
    libc.so.1``memcpy+6:        movq   %rsi,%rdx
    libc.so.1``memcpy+9:        movq   %rdi,%rax
mdb_target> libc.so.1`memcpy%avx2
      libc.so.1`_memcpy%avx2:        movq   %rdx,%r8
    libc.so.1`_memcpy%avx2+3:        movq   %rdi,%rcx
    libc.so.1`_memcpy%avx2+6:        movq   %rsi,%rdx
    libc.so.1`_memcpy%avx2+9:        movq   %rdi,%rax
mdb_target>

The behaviour can be controlled and even disabled using the setting symname_index. That can be set to one of three values:

full	This is the default for mdb.
ambiguous	This is the default for kmdb. Only the ambiguous names are stored.
off	This turns off the name index and disables ambiguous name checking.

Command line editing modes

You can now select the style of command line editing to use in Solaris 11.4.57 and later. On startup mdb reads the following environment variables: MDB_EDITOR, EDITOR, FCEDIT, VISUAL, until it finds one that contains either the string emacs or vi and sets that as the default command line editor. This can be overridden with either the -o editor={emacs|vi} command line or ::set option.

Vi-mode editing supports all the common mdb movement commands, search, and yank and delete buffers, both named and automatic. The ::editor dcmd lists any editor-specific subcommands and help.
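The startup lookup described above can be sketched as a small Python model. This is not mdb source: the function name pick_editor is invented for the example, and two details are assumptions because the text does not specify them, namely that "emacs" is tested before "vi" within a single value, and that the function returns None when no variable matches.

```python
import os

# Order in which the environment variables are described as being read.
EDITOR_VARS = ("MDB_EDITOR", "EDITOR", "FCEDIT", "VISUAL")


def pick_editor(env=None):
    """Return "emacs" or "vi" from the first variable naming one, else None."""
    env = os.environ if env is None else env
    for name in EDITOR_VARS:
        value = env.get(name, "")
        # Assumption: "emacs" wins if a value happens to contain both strings.
        if "emacs" in value:
            return "emacs"
        if "vi" in value:
            return "vi"
    return None
```

For example, pick_editor({"EDITOR": "/usr/bin/vim"}) selects vi because the value contains the string vi, while a value such as nano is skipped and the next variable is consulted.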

Context (thread) local variables

In Solaris 11.4.69 and later, mdb now supports context (thread) local variables. By default if you create a variable with ::typeset it will be created as a context local variable.

7::typeset bike will create a context local variable called bike with a default value of 7. If no default value is provided, the value defaults to 0. If you then write to that variable while stopped in thread 1, e.g. 8>bike, and then continue and stop in thread 2, the value will be 7. If you then stop in thread 1 again, the value is 8.

This changes the behaviour of ::typeset. If you want to use ::typeset to create a global variable you must use the +C option or set the option typeset_global with ::set or the command line. ::vars -C will display all the context local variables for all contexts that don’t have the default value.
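The semantics described above can be illustrated with a toy Python model. It is only a sketch of the behaviour as described in this post, not of how mdb stores variables; the class name ContextLocalVar and its methods are hypothetical.

```python
class ContextLocalVar:
    """Toy model of a context (thread) local variable with a default value."""

    def __init__(self, default=0):
        self.default = default
        self._values = {}  # context id -> value written in that context

    def write(self, context, value):
        """Model of writing the variable while stopped in a given context."""
        self._values[context] = value

    def read(self, context):
        """Contexts that never wrote the variable see the default."""
        return self._values.get(context, self.default)

    def non_default(self):
        """Model of a ::vars -C style listing: only non-default contexts."""
        return {c: v for c, v in self._values.items() if v != self.default}


bike = ContextLocalVar(default=7)  # like 7::typeset bike
bike.write(1, 8)                   # like 8>bike while stopped in thread 1
```

After these two steps, bike.read(2) yields the default 7, bike.read(1) yields 8, and only thread 1 appears in bike.non_default().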

Context-local variables are useful when debugging multi-threaded programs and the kernel where you wish to save away data for later use. For example you can save the first argument to a function for use later:

bike> ::typeset bike
bike> gear_select::bp -c '<rdi>bike;:c'
bike> gear_select::bp -oc '<bike::printf "%s Gear %d\n" bike_t b_name
<MDB_RETVAL::time -a;:c'

Named constants

Starting in the 11.4.51 release, mdb now has the concept of named constants. These are immediate values that can be declared on the mdb command line, and more commonly by modules, and then referenced by name, similar to symbols. In both the kernel and user space, constants have been created for all the errno values, signals, and dkio ioctls. The new ::constant dcmd can be used to add, delete, or list constants.

This means you can now do things like:

::walk zio | if zio_t io_error == ECKSUM | ....

::walk zio | ::grep "*/zio_t io_error/. == ECKSUM" | ...

New options for handling forked processes

When you debug a live process that forks or spawns a new process in Solaris 11.4.78 and later, you will be offered an option to leave the process that you don’t follow stopped, allowing you to attach another debugging session to that:

$ mdb /usr/bin/ksh
Loading modules: [ libc.so.1 ]
ksh> ::run
mdb: spawn detected: follow parent/child: p/c, or stop in parent/child: P/C P
mdb: target spawned child process 3169 (debugger following parent)
mdb: spawn detected, following parent: (l)eave stopped or (r)elease child process? l
Leaving child process 3169 stopped. You must continue it using prun 3169 or mdb -p 3169
mdb: target stopped at:
libc.so.1`__systemcall6+0x1e:   popq   %r10
mdb: You've got symbols!
Loading modules: [ ld.so.1 ]
ksh:3168*>

Then in another session you can attach to the other process and debug that as well:

$ mdb -p 3169
mdb: target performed spawn of /usr/bin/tty
Loading modules: [ ld.so.1 ]
tty:3169*> ::status
debugging PID 3169 (64-bit)
file: /usr/bin/tty
status: stopped on exit from spawn system call
tty:3169*> :c

If you don’t want this behaviour then you can set the option other_fork_branch_mode=release and the other process will be released to run as it used to.

The follow_fork_mode option accepts two new values in Solaris 11.4.78 and later:

stop_parent: The debugger follows the parent process, detaches from the child process and then stops the parent on exit from the system call.
stop_child: The debugger follows the child process, detaches from the parent process and then stops the child on exit from the system call.

The process id of the child most recently created while mdb was attached is available in the MDB_CHILD_PID variable in Solaris 11.4.57 and later. The process id of the target process is available in the MDB_PID variable in Solaris 11.4.48 and later.

Output redirection

Starting in Solaris 11.4.81, the mdb command interpreter now supports output redirection to files in command pipelines, using > and >> much like most shells, to overwrite or append to the specified file.

For example, instead of the previously supported method of:

::echo "I want this output in a file" ! cat > /tmp/echo.log

You can now omit the cat command and just run:

::echo "I want this output in a file" ! > /tmp/echo.log

ptl1_regs alias

A new ::ptl1_regs alias was added to help debug TL1 panics on SPARC systems. It will print all the stored registers on all the CPUs where the ptl1 handler was called. You may also directly specify the cpu_t address if you want the contents from the specified CPU.

Return breakpoints and Below events

In Solaris 11.4.69 and later, mdb and kmdb now have support for return breakpoints and below events. Return breakpoints will fire when the function on which they are set returns. To match the syntax for other events that support firing after the event has occurred they are set using the -o flag. For example:

bicycle_wait::bp -o

will fire when bicycle_wait returns, in much the same way as if you had stopped in bicycle_wait and then done ::step out. However, this can be automated such that:

bicycle_wait::bp -oc "<MDB_retval::if -e :c == 0"

will only stop if bicycle_wait returns non-zero. The variable MDB_retval is a new alias for the register that contains the return value from a function, so this will work on SPARC and Intel, 32-bit and 64-bit.

Below events are events (breakpoints, system call breakpoints, watchpoints, fault breakpoints and signal breakpoints) that will only fire if you are below a point on the stack.

bicycle_wait::bp -B

will only fire if bicycle_wait is called below the current stack frame in the current thread. Once the thread returns from the current frame the event is automatically cancelled.

bicycle_wait::bp -b bicycle_io

will only stop when bicycle_wait is called as a descendant of bicycle_io. Putting the two together:

bicycle_wait::bp -ob bicycle_io -c "<MDB_retval::if -e :c == 0"

will only stop when a call to bicycle_wait does not return zero as a descendant of bicycle_io.

Additionally there are 3 new mdb variables and a new option:

`MDB_retaddr`	This is the address to which the current frame will return.
`MDB_retbp`	When stopped on a return breakpoint, this is the address that was used to trigger the return breakpoint. For example, when this event fires: `bicycle_wait::bp -o`, <MDB_retbp will be the address of `bicycle_wait`.
`MDB_retval`	This is an alias for the register that contains the return value from a function, as discussed above.

There is a new option that can be set with ::set -o:

context_events Set the default for new events to be per context (thread). To get per process events use the +C option when specifying the event.

Searching for kernel crash dumps

Before Solaris 11.4.69, when given a numerical argument, mdb only searched the current directory for a system crash dump with that suffix. The 11.4.69 release extended that so that it also searches in the current directory for a directory named after that suffix, and then searches in there. If that fails, it then searches in the directory specified for crash dumps by dumpadm(8) and again looks in a subdirectory named after the suffix. Only then will it declare that it cannot find a dump.

If mdb is given the argument ‘latest’ and there is no file called ‘latest’ in the current directory, mdb follows a similar search to the one above: it looks in the current directory, and then in the system crash dump directory, for a directory called ‘latest’, and then finds the newest system crash dump in that directory.

Type casting

When de-referencing pointers in Solaris 11.4.66 and later, you can now cast a pointer’s type:

bike:101327*> <rdi::array shed_bikes shed_nbikes | map *. \
    | if b_type == PEARSON | print -t b_data
void *b_data = 0x504380
bike:101327*> <rdi::array shed_bikes shed_nbikes | map *. \
    | if b_type == PEARSON | print -t b_data[]
mdb: cannot dereference void type
bike:101327*> <rdi::array shed_bikes shed_nbikes | map *. \
    | if b_type == PEARSON | print -t b_data(pearson_t *)[]
pearson_t b_data(pearson_t *) = {
     const char [10] b_data(pearson_t *)->pearson_name = [ "Fixie" ]
}
bike:101327*>

Although print is used in this example, casting is much more useful when used with ::if and */type member/. operations. Unlike C casting, in mdb the cast comes after the member that is being cast. See Casting in the mdb(1) man page and the help for ::if, ::print, et al.

dcmd changes

A debugger command, or dcmd, is a routine in the debugger that can access any of the properties of the current target. MDB parses commands from standard input, then executes the corresponding dcmds. Each dcmd can also accept a list of string or numerical arguments. MDB contains a set of built-in dcmds that are always available. You can also extend the capabilities of MDB by writing dcmds using a programming API provided with MDB.

MDB output formatting

The following new options are now provided for the ::set dcmd starting in Solaris 11.4.48:

output_format
output_level

The option output_format takes the type of output desired, currently either output_format=xml or output_format=plain. The option output_level can take three values: default, verbose, and debug, where default is the least output and debug the most.

The ::print, ::enum, ::status, ::stack, and ::stackregs dcmds have been updated to produce parsable output.

Once output_format is set to xml then the output at the default output_level from a command that has not been modified to support parsable output is encoded so that it is parsable. The switch of format always happens after the current commands have all completed. For that reason, it’s not possible to write an alias that switches to XML output and then produces XML.

j format character

The formatting dcmds in Solaris 11.4.78 and later now take a format character of j for “jazzed-up” binary unsigned long long (8 bytes), similar to the illumos mdb. Using =j, each set bit is shown with its associated ‘mask’ value, which is useful for quickly checking whether a specific bit of a hexadecimal value is set:

> 1234=j
                1001000110100
                |  |   || |
                |  |   || +---- bit  2 mask 0x0004
                |  |   |+------ bit  4 mask 0x0010
                |  |   +------- bit  5 mask 0x0020
                |  +----------- bit  9 mask 0x0200
                +-------------- bit 12 mask 0x1000
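The bit-to-mask relationship in this output can be reproduced with a few lines of Python. This is an illustrative sketch rather than anything mdb runs; the function name set_bits is made up here. Note that the binary string in the listing corresponds to the input 1234 being read as hexadecimal (0x1234).

```python
def set_bits(value):
    """Return (bit, mask) pairs for every set bit, lowest bit first."""
    pairs = []
    bit = 0
    while value >> bit:          # stop once no higher bits remain
        mask = 1 << bit
        if value & mask:
            pairs.append((bit, mask))
        bit += 1
    return pairs


if __name__ == "__main__":
    for bit, mask in set_bits(0x1234):
        print(f"bit {bit:2d} mask {mask:#06x}")
```

For 0x1234 this lists bits 2, 4, 5, 9 and 12 with masks 0x0004, 0x0010, 0x0020, 0x0200 and 0x1000, and the masks sum back to 0x1234.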

z format character

The new z format in Solaris 11.4.72 and later will byte swap 64 bit values in the same way H and h byte swap 32 and 16 bit values respectively:

mdb_target:108641*> ::formats swap
H - swap bytes and shorts (4 bytes)
h - swap bytes (2 bytes)
z - swap bytes, shorts, and ints (8 bytes)
mdb_target:108641*> 4379636c696e672e=z
                 2e676e696c637943
mdb_target:108641*>
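As a cross-check of the example above, the same byte swap can be expressed in Python. This is a minimal sketch with a hypothetical helper name, byte_swap; it simply reverses the byte order of a value of the given width, which is what the listing shows for the 8-byte case.

```python
def byte_swap(value, size):
    """Reverse the byte order of an unsigned integer that is size bytes wide."""
    # Serialise as big-endian, then reinterpret the same bytes as little-endian.
    return int.from_bytes(value.to_bytes(size, "big"), "little")


if __name__ == "__main__":
    print(f"{byte_swap(0x4379636c696e672e, 8):016x}")
```

Calling byte_swap with a size of 4 or 2 gives the corresponding 32-bit and 16-bit swaps.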

::ctfdump – dumping CTF types

A new dcmd named ::ctfdump was added in the 11.4.57 release. This dcmd primarily dumps the CTF types and IDs from ctf_file_t containers in various places. These CTF containers may be part of mdb itself or its target modules, or may come from the target, or directly from the target process address space containing a ctf_file_t container in memory. See ::help ctfdump for details and examples.

::history – Persistent history for mdb

mdb has long had a command line history, but it was not preserved over invocations of mdb. In the 11.4.57 release, mdb was extended to have a ::history command that allows the listing, saving and loading of command line history.

There is a new option savehist that when set results in the history being saved. For mdb it will be saved when mdb exits and for kmdb it will be saved when the kernel is continued. The savehist option is set by default.

For mdb, the name of the history file can be set via the MDB_HISTFILE environment variable or the histfile option. The name of the history file for kmdb is /var/share/user/0/.kmdb_history and cannot be changed.

::if & ::sort control over ADI normalization

Starting in 11.4.66, the ::if and ::sort dcmds take arguments of +a or -a to force enabling or disabling the ADI normalization of a member.

::kill added to kmdb

In Solaris 11.4.78, kmdb now has a ::kill dcmd that allows sending a signal to a given process when the kernel is continued. When the signal is delivered it will have an si_code of SI_KMDB. See the kmdb(1) man page for details and available options.

::mgrep new search capabilities

Starting in 11.4.72, ::mgrep can now do misaligned searches and can be used to search for strings and byte sequences. (Everywhere you use ::kgrep or ::ugrep you can use ::mgrep so we only discuss ::mgrep here.)

Misaligned searches use the -f flag (for “fuzzy”).

mdb_target:108641*> 0x73656c63796369::mgrep -s 8 -f
0x7fdacd2f04c2f
mdb_target:108641*>

On x86 systems the default search is now misaligned; to do an aligned search, use the +f flag.
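To illustrate the difference between aligned and misaligned searches, here is a small Python model that scans a byte buffer for an 8-byte value. It is a sketch only: find_value is a hypothetical name, the buffer stands in for target memory, and the little-endian byte order is an assumption that matches x86.

```python
def find_value(memory, value, size=8, aligned=False, byteorder="little"):
    """Return the offsets in memory at which value is stored."""
    needle = value.to_bytes(size, byteorder)
    # An aligned search only looks at multiples of size; a misaligned
    # ("fuzzy") search tries every byte offset.
    step = size if aligned else 1
    return [off for off in range(0, len(memory) - size + 1, step)
            if memory[off:off + size] == needle]


# The value from the example above, stored at odd offset 3.
memory = b"\x00\x00\x00" + (0x73656c63796369).to_bytes(8, "little") + b"\x00" * 5
```

With this buffer, the misaligned search finds the value at offset 3, while the aligned search finds nothing because it only checks offsets 0 and 8.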

Searching for strings is done via the -S flag:

::mgrep -v -S "campagnolo is the answer"

searches for the string "campagnolo is the answer". Adding -E searches for the NUL terminated string. Adding -i makes the search case insensitive.

To search for byte sequences do:

::mgrep -bvS 0x62696379636c657320617265206265737421004a6f686e20537461726c657900
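It can help to see which bytes such a hexadecimal argument stands for. The Python sketch below decodes the sequence used above; the helper name hex_to_bytes is invented for the example, and it assumes the digits are read left to right as successive bytes.

```python
def hex_to_bytes(text):
    """Turn a string of hex digits, with or without a 0x prefix, into bytes."""
    if text.startswith("0x"):
        text = text[2:]
    return bytes.fromhex(text)


SEQUENCE = hex_to_bytes(
    "0x62696379636c657320617265206265737421004a6f686e20537461726c657900")
```

Decoded this way, the sequence is 32 bytes long and contains two NUL-terminated strings back to back, which is something a plain string search cannot express.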

See ::help mgrep, which now includes examples, for details.

::nm updates for non-unique symbol names

The ::nm dcmd has been updated to add the ability to report symbols that do not have unique names (::nm -U); report symbols by name or names (::nm -N sym[,sym1,...]); sort symbols using any or all of the keys: bind, ctype, data, file, uqn, name, object, sz, type, val (::nm -s key[,key2,...]); and now has the following formats for the output:

`objbase`	The basename of the object.
`file`	The file for the symbol.
`filebase`	The basename of the file.
`uqn`	The unique qualified name for the symbol.

Also, the default symbol table for ::nm is now the same as the target. So in the case of a stripped object it will use the dynsym table automatically. If you need to look at the symtab then there is a new -T flag that limits ::nm to just that symbol table.

See ::help nm for details.

::operators prints help on mdb operators

The new ::operators dcmd, added in Solaris 11.4.78, prints all the mdb operators with short descriptions.

::poke – a type-aware mdb dcmd for poking memory

mdb(1) contains 4 format specifiers for writing to memory:

`v`	write decimal signed int (1 byte)
`w`	write default radix unsigned short (2 bytes)
`W`	write default radix unsigned int (4 bytes)
`Z`	write hexadecimal long long (8 bytes)

and they are a semi-regular cause of errors when engineers use the wrong format for the size of the memory object that they wish to patch.

The 11.4.54 release updated mdb so that, where it can, mdb now checks the size of an element being written and verifies that the size of that element matches the size being written. If the size does not match then the write will fail with a message to use ::poke to write the data. It is possible to override the error using a -f flag to the format specifier.
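The size check described above amounts to comparing the width of the format specifier with the size of the object being patched. The following Python sketch models that rule using the four sizes from the table; the function name check_write, the use of None for an unknown element size, and the exception type are all assumptions made for illustration, not mdb behaviour or messages.

```python
# Bytes written by each format specifier, from the table above.
WRITE_SIZES = {"v": 1, "w": 2, "W": 4, "Z": 8}


def check_write(fmt, element_size, force=False):
    """Return the number of bytes written, or raise on a size mismatch.

    element_size is None when the size of the target cannot be determined,
    in which case no check is possible.
    """
    written = WRITE_SIZES[fmt]
    if force or element_size is None or written == element_size:
        return written
    raise ValueError(
        f"format {fmt!r} writes {written} bytes but the element is "
        f"{element_size} bytes; use ::poke instead")
```

In this model, writing a 4-byte element with W succeeds, writing it with Z is rejected, and passing force=True plays the role of the -f override.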

The preferred way to write to memory now is the new ::poke dcmd. Like ::print, ::poke is fully type aware allowing engineers to write individual data structure members, automatically writing the correct number of bytes or even bits in the case of bit fields. It also supports writing floating point values in mdb, but not in kmdb as there is no floating point support in kmdb.

Additionally ::poke is able to save the values being overwritten into mdb variables, which can then be used to put the values back. It also allows updating multiple addresses in a single command.

See ::help poke from within mdb for more information, including examples.

::printf options added

The -e option, added to ::printf in Solaris 11.4.69, sends the output to standard error, instead of the usual standard output stream.

Starting in Solaris 11.4.72, the option -i tells ::printf to treat the subsequent arguments as immediate values. The +i option stops ::printf from treating arguments as immediate values. There can be multiple -i and +i arguments allowing immediate and non-immediate arguments to be mixed in a single line.
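The way -i and +i toggle through an argument list can be pictured with a short Python model. It only captures the switching between the two modes, not how ::printf evaluates either kind of argument; classify_args is a hypothetical name, and starting in non-immediate mode is an assumption based on -i being the option that turns immediate handling on.

```python
def classify_args(args):
    """Pair each argument with True if it is to be treated as immediate."""
    immediate = False  # assumption: arguments are non-immediate until -i
    result = []
    for arg in args:
        if arg == "-i":
            immediate = True
        elif arg == "+i":
            immediate = False
        else:
            result.append((arg, immediate))
    return result
```

For the argument list -i 1 +i bike_t -i 2, the model marks 1 and 2 as immediate and bike_t as non-immediate.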

::stack, $c, & $C changes

Starting in Solaris 11.4.48, ::stack, $c, and $C will now append a ‘?’ to any function arguments that they can’t be certain are correct (i.e. for 64-bit x86 targets if they haven’t been built with preserve_args and CTF).

They also now take the following extra flags:

`-A`	Do not print annotations
`-r`	Print registers if available
`-t`	Print type information if it is available
`-v`	Verbose, print the frame pointer
`-x`	Always print values as hex

By default, when they can determine the type of an argument, they will print it correctly, with annotations. Signed values are printed correctly in decimal unless the -x option is specified. They all take a -v option, which is already the default for the $C dcmd.

::status distinguishing live dumps

Starting in the 11.4.63 release, when using mdb on a kernel core dump created with savecore -L, the ::status dcmd will print Live system dump instead of claiming the savecore process triggered a kernel panic.

::time and ::hrtime enhancements

In Solaris 11.4.57 and later, ::time and ::hrtime are now available for all targets. ::time now takes a -o flag to describe precisely what you want it to output. The clocks supported are:

The target’s real time clock. (Time since the Epoch)
The target’s high res clock (the time returned by gethrtime() et al)
The target’s unscaled high res clock. (The raw cpu high res clock)
The target’s value of lbolt (kvm targets only)
The target’s process system cpu time (proc targets only)
The target’s process user cpu time (proc targets only)
The system running mdb’s real time clock
The system running mdb’s high res clock

There are also options to convert between scaled & unscaled high res time and between lbolt & res time. See ::help time.

There are a number of mdb variables that contain useful times providing shortcuts to the times available to ::time:

`MDB_CSYSTIME`	The cumulative system time of the target’s children.
`MDB_CUSERTIME`	The cumulative user time of the target’s children.
`MDB_HRTIME`	The high resolution time when the target stopped.
`MDB_STARTHRTIME`	The high resolution time when the target started.
`MDB_STARTTOD`	The real time in seconds when the target started.
`MDB_STARTTOD_NSEC`	The real time in nanoseconds when the target started.
`MDB_SYSHRTIME`	The system running mdb’s high res time.
`MDB_SYSTIME`	The system time of the target.
`MDB_SYSTOD`	The system running mdb’s real time in seconds.
`MDB_SYSTOD_NSEC`	The system running mdb’s real time in nanoseconds.
`MDB_TOD`	Shorthand for `MDB_SYSTOD`
`MDB_TOD_NSEC`	Shorthand for `MDB_SYSTOD_NSEC`
`MDB_USERTIME`	The user time of the target.

::typemap – manage and display type and name mappings

A common programming strategy is to have void pointers in data structures that contain object specific data. For example, the vnode_t structure contains void *v_data; and the type of that is unknown but will be known by the module that creates this vnode.

Another common case is where structures or unions are embedded and then macros are used to hide the extra data structure. This makes commands like <addr>::print tcp_t tcp_pipe fail, as tcp_pipe is a macro that expands to tcp_sack_info.tcp_pipe.

Most mappings will be added by dcmd modules using the type and name mapping APIs discussed below. The 11.4.66 release also added a new dcmd, ::typemap that can be used to add, remove, enable, disable, and list mappings:

::typemap -a -t <type> -c "expression" -m <member> <membertype>

When mdb encounters a <member> in a type <type> it will run the mdb expression “expression” with dot set to the address of <type> and if that returns non-zero it will set the type of the member to <membertype>. For example:

> ::typemap -a -t "struct vfs" -m vfs_data -c '.>vfs;vfssw::print [
   $[*/vfs_fstype/<vfs ]] | if vsw_name streq ctfs' "ctfs_vfs_t *"
> ::fsinfo| if vfs_fstype == 0x12 | print -t vfs_data
ctfs_vfs_t *vfs_data = 0xffffb90000952a60
>

If using variables in the expression best practice would be to use an alias so that the variable has local scope. This has not been done here for clarity.

With this change, users of mdb will be able to directly de-reference opaque pointers for which mappings have been established. The poster children for these are the vfs_data and v_data pointers in the vfs_t and vnode_t data structures in the kernel, but there are potentially many others.

> ::fs -m /var | print -t vfs_data->z_os vfs_data->z_root
objset_t *vfs_data->z_os = 0xffffb900008d85c0 (rpool/ROOT/vdev_cache-1/var)
uint64_t vfs_data->z_root = 4
>

Users will also be able to directly access structure members using their macro names:

> ffffb9000144f088::print -ta ipif_t ipif_v6lcl_addr.s6_addr
ffffb9000144f09c uint8_t [16] ipif_v6lcl_addr._S6_un._S6_u8 = [ 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0xff,
0xff, 0xa, 0xa3, 0xc0, 0x1c ]
>

See ::help typemap for more information.

Module API changes

The MDB Debugger Module Programming API allows writing your own dcmd modules that mdb can use.

Oracle Solaris 11.4.42 introduced annotations to the mdb module API, allowing modules to register help functions that can write an annotation into a buffer that represents the data structure, which will then be displayed by the ::print dcmd. For example, these items show annotations in parentheses at the end of each line:

> ::walk zfs | print zfsvfs
{
    z_vfs = root ("/")
    z_parent = 0xffffa70019df0d40 ("rpool/ROOT/11.4.39.104.0")
    z_rootzp = 0xffff81066daa0d28 ("/")
    z_os = 0xffffa700265a3180 ("rpool/ROOT/11.4.39.104.0")
    ....

Modules can use mdb_register_annotation() to register callback functions that will write a string representation of the object into the supplied buffer. Callback routines can use mdb_get_annotation() to obtain annotations of external objects.

This release includes a number of enhancements to the annotation support, to make it easier for dcmd authors to provide annotations and to improve the information shown.

Annotation callback context functions

So that annotation callbacks can better determine what to display, new helper functions have been added in 11.4.72:

boolean_t mdb_annotation_is_signed(const mdb_annotation_arg_t *)

Returns TRUE if the item being annotated has a signed integer type.

boolean_t mdb_annotation_is_stack(const mdb_annotation_arg_t *)

Returns TRUE if the annotation is for a stack trace. Stack trace annotations should be kept brief.

boolean_t mdb_annotation_is_printf(const mdb_annotation_arg_t *)

Returns TRUE if the annotation is for ::printf.

boolean_t mdb_annotation_is_alt(const mdb_annotation_arg_t *);

Returns TRUE if the alternate form of a %s expansion was requested.

uint_t mdb_annotation_nbits(const mdb_annotation_arg_t *)

Returns the size in bits of the integer value that was read. If the value being read is a non-integer type it will return 0.

uint_t mdb_annotation_radix(const mdb_annotation_arg_t *)

Returns the radix that has been used to display the value. If the value has not been displayed or the base type is not an integer or pointer type it will return 0.

Array and bit field annotations

After the introduction of the annotations API, two issues were found where the behaviour of annotations could be improved for arrays and bit fields. The initial callback API was:

typedef size_t (*mdb_annotate_f)(uintptr_t addr, char *buf, size_t buflen,
    void *arg);

First, there was no way for a callback to know if addr represents a single object or refers to an array of objects. The poster child for this is an annotation for “char”. Given a structure like:

struct bike {
    char gears;
    char name[128];
    ...
};

When annotating gears and name[0], the same callback would be called, with the only difference being that addr will be X for gears and X+1 for name[0]. For gears the annotation should just print a letter, whereas for name[0] the useful annotation is to print the string. In this simple case there were ways the callback could use the type cache to find the type of X, but in more complex data structures, specifically ones that contain unions, it was not possible to do that with 100% accuracy.

The other area was annotating bit fields. The API only provided a byte-aligned address, with no way for the callback to determine the bit offset or size in bits of the member.

The 11.4.63 release has added a new argument to the callback function:

typedef size_t (*mdb_annotate_f)(uintptr_t addr, char *buf, size_t buflen,
    void *arg, const mdb_annotation_arg_t *);

The mdb_annotation_arg_t is an opaque pointer that can only be operated on using access functions. Four access functions are provided in 11.4.63 and later:

uint_t mdb_annotation_get_index(const mdb_annotation_arg_t *);

returns the array index of this element. For gears in the above example it is 0 and for name[X] it will be X.

uint_t mdb_annotation_get_nelem(const mdb_annotation_arg_t *);

returns the number of elements that can be annotated. For gears that will be 1 and for name[X] it will be 128 – X.

If the length of the “array” cannot be determined then both mdb_annotation_get_index and mdb_annotation_get_nelem will return 0.

int mdb_annotation_get_uintval(const mdb_annotation_arg_t *, uintmax_t *);

reads the correct value for this member into the uintmax_t pointer argument returning 0 on success and -1 on failure. It will only succeed if the type being annotated is an integer value. If the value is signed then the value is sign extended.

int mdb_annotation_get_float(const mdb_annotation_arg_t *, long double *);

reads the correct value for this member into the long double pointer argument returning 0 on success and -1 on failure. It will only succeed if the type being annotated is a floating point value. This interface is not available to kmdb.
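The sign extension mentioned for mdb_annotation_get_uintval() matters most for narrow members such as bit fields. The Python sketch below shows the arithmetic involved; it is not the mdb implementation, the function name to_uintmax is made up, and it assumes that uintmax_t is 64 bits wide.

```python
UINTMAX_BITS = 64  # assumption: uintmax_t is 64 bits wide


def to_uintmax(raw, nbits, signed):
    """Widen an nbits-wide member value to uintmax_t, sign extending if signed."""
    raw &= (1 << nbits) - 1              # keep only the member's own bits
    if signed and raw >> (nbits - 1):    # top bit set means a negative value
        raw -= 1 << nbits
    # Wrap the (possibly negative) value into the unsigned 64-bit range.
    return raw & ((1 << UINTMAX_BITS) - 1)
```

For example, a signed 3-bit field holding the bit pattern 101 (the value -3) widens to 0xfffffffffffffffd, while the same bits in an unsigned field stay 5.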

Constant handling

Constants are added by default when modules register annotations that use any of the mdb_annotate_uint8, mdb_annotate_uint16, mdb_annotate_uint32, mdb_annotate_uint64, or mdb_annotate_uint callbacks.

In a small number of cases it is desirable to be able to add constants explicitly, for example when your annotation function does more than just decode a bitmask. Also, there may be cases where you do not wish your annotation to result in constants being created. The 11.4.72 release added a new function to the mdb module API:

void mdb_register_constants(const mdb_bitmask_t **, size_t);

The first argument is an array of pointers to mdb_bitmask_t arrays and the second is the number of entries in the first array. Constants are automatically unloaded if the module that loaded them is unloaded.

To support mdb_bitmask_t arrays used for annotations but not for exporting constants a new BM_NOCONSTANT option was added to the options that can be set with BM_SET_OPTIONS() in the 11.4.72 release.

mdb_annotate_enum helper function for annotating enums

There are a number of places where enum values are stored in non-enum types, typically where space is at a premium and all of the enum values fit in a smaller type. To annotate those values, rather than using mdb_annotate_uint(), which requires an mdb_bitmask_t with the enum values, a new helper, mdb_annotate_enum(), was added in 11.4.69.

size_t mdb_annotate_enum(uintptr_t, char *, size_t, const void *,
    const mdb_annotation_arg_t *);

The “void *” arg should be set to the name of the enum or a type that refers to an enum.

mdb_annotation_set_annotated_array function

When an annotation callback annotates an entire array in one call, it needs a way to tell mdb that it has done so. A common example is annotating arrays of char as strings: the callback is passed the address of the first element and annotates the whole array, so mdb must not go on to annotate the subsequent elements. To this end a new API is provided in Solaris 11.4.78 and later:

void mdb_annotation_set_annotated_array(const mdb_annotation_arg_t *);

which can be called by annotation callbacks if they are annotating more than one element in an array. If the annotation calls this routine ::print will not attempt to print subsequent members of an array.

mdb_annotate_str helper function for annotating strings

A common requirement for annotation callbacks is to read a string into the supplied buffer and return the length of that string. We added a helper function that does this to the module API in 11.4.78.

extern size_t mdb_annotate_str(uintptr_t addr, char *buf,
    size_t len, const void *arg, const mdb_annotation_arg_t *);

mdb_annotate_str will write a string of up to len bytes, including the terminating NUL, into buf and return the length of the string that would have been written had there been enough space. If the address supplied cannot be read or does not contain a string, the function returns MDB_ANNOTATE_FAIL.
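The return convention resembles that of snprintf(3C). The sketch below is a plain C model rather than MDB source: copy_annotation() and was_truncated() are hypothetical names, and it assumes the returned length excludes the terminating NUL. It shows how a caller could detect that the buffer was too small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model (not MDB source): copy src into buf, writing at
 * most len bytes including the terminating NUL, and return the length
 * the complete string needs, excluding the NUL.
 */
size_t
copy_annotation(char *buf, size_t len, const char *src)
{
	size_t need = strlen(src);

	assert(buf != NULL || len == 0);
	if (len > 0) {
		size_t n = (need < len - 1) ? need : len - 1;

		memcpy(buf, src, n);
		buf[n] = '\0';
	}
	return (need);
}

/* The output was cut short if the needed length does not fit in len. */
int
was_truncated(size_t ret, size_t len)
{
	return (ret >= len);
}
```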

The string is read from the virtual address addr + (size_t)arg, meaning it is possible to define an annotation for the structure:

struct foo {
    int foo_int;
    char *foo_string;
    ....
};

thus:

static const mdb_annotation_t ma[] = {
     {
       .ma_type = BUILD_STRING(struct foo),
       .ma_func = mdb_annotate_str,
       .ma_arg = (void *)offsetof (struct foo, foo_string)
     }
};

mdb_annotate_time helper function for annotating time

The 11.4.78 release added a new helper function for annotating times:

size_t mdb_annotate_time(uintptr_t addr, char *buf, size_t len,
    const void *arg, const mdb_annotation_arg_t *maa);

The arg must be one of:

`TAT_NSEC_RT`	The value is nanoseconds since the epoch.
`TAT_NSEC_HR`	The value is nanosecond value from gethrestime().
`TAT_NSEC_HR_REL`	As with `TAT_NSEC_HR` but the value is reported relative to the time returned by `mdb_gettime(MTT_HRSTOP, ...)`.
`TAT_NSEC_ABS`	The value is an absolute nanosecond time, e.g. a time interval.
`TAT_NSEC_UHR`	The value is an unscaled hrtime.
`TAT_NSEC_UHR_REL`	As with `TAT_NSEC_UHR` but the value is reported relative to the time returned by `mdb_gettime(MTT_HRUSTOP, ...)`.
`TAT_SEC_RT`	The value is seconds since the epoch.
`TAT_SEC_ABS`	The value is an absolute second time, e.g. a time interval.

Developers can use this to annotate time fields in data structures:

static const mdb_annotation_t zfs_annotations[] = {
        {
                .ma_type = MDB_BUILD_TYPE_STRING(uint64_t),
                .ma_struct = MDB_BUILD_TYPE_STRING(struct spa),
                .ma_member = MDB_BUILD_MEMBER_STRING(spa_load_txg_ts),
                .ma_callback = mdb_annotate_time,
                .ma_arg = (void *)TAT_SEC_RT
        }, ...
};

mdb_annotate_uint helper function for annotating integers

The 11.4.63 release adds a new helper function to annotate all integer types:

size_t mdb_annotate_uint(uintptr_t, char *, size_t,
    const void *, const mdb_annotation_arg_t *);

This one routine can and should be used in place of mdb_annotate_uint8, mdb_annotate_uint16, mdb_annotate_uint32, and mdb_annotate_uint64.

A new return code for annotation callbacks is also defined. Any annotation function that returns MDB_ANNOTATE_STOP results in no further attempts to annotate that type. The use case here is where there is an annotation for a base type, e.g. “char”, but you don’t want int8_t values annotated with that annotation. By adding an annotation for int8_t that returns MDB_ANNOTATE_STOP, the annotation engine will not attempt to use the annotation for the base type.

mdb_bitmask_t options

mdb supports a "%b" conversion specifier which will decode a bitmap into a human readable string as described in an array of mdb_bitmask_t.

typedef struct mdb_bitmask {
        const char *bm_name;    /* String name to print */
        u_longlong_t bm_mask;   /* Mask for bits */
        u_longlong_t bm_bits;   /* Result required for value & mask */
} mdb_bitmask_t;

For each entry where (value & bm_mask) == bm_bits is true, bm_name is printed as part of a comma-separated list of values. Since many of the bits being printed are declared as PREFIX_VALUE, developers often use "VALUE" as the name, for example:

{
        .bm_name = "ONE",
        .bm_mask = PREFIX_ONE,
        .bm_bits = PREFIX_ONE
},
{
        .bm_name = "TWO",
        .bm_mask = PREFIX_TWO,
        .bm_bits = PREFIX_TWO
}

which when the value (PREFIX_ONE|PREFIX_TWO) is rendered results in the string: ONE,TWO.
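The matching rule can be modelled in a few lines of standalone C. This is an illustrative sketch, not the MDB implementation: demo_bitmask_t is a local copy of the structure above, demo_decode() is a hypothetical function, and the PREFIX_ONE and PREFIX_TWO values are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the structure shown above, renamed to mark it as a demo. */
typedef struct demo_bitmask {
	const char *bm_name;
	unsigned long long bm_mask;
	unsigned long long bm_bits;
} demo_bitmask_t;

#define	PREFIX_ONE	0x1ULL	/* hypothetical flag values */
#define	PREFIX_TWO	0x2ULL

/*
 * Append the name of every entry for which (value & bm_mask) == bm_bits
 * to buf as a comma separated list. The table ends at a NULL bm_name.
 */
void
demo_decode(const demo_bitmask_t *bm, unsigned long long value,
    char *buf, size_t len)
{
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (; bm->bm_name != NULL; bm++) {
		if ((value & bm->bm_mask) != bm->bm_bits)
			continue;
		int n = snprintf(buf + off, len - off, "%s%s",
		    off == 0 ? "" : ",", bm->bm_name);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* out of room: the output is truncated */
		off += (size_t)n;
	}
}
```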

This is fine if, in the context where this is printed, it is obvious to the user that “ONE” is really PREFIX_ONE. However in many cases, particularly in annotations and constants, that is not the case.

It is helpful to be able to display bitmasks in a similar way to the way enums are displayed:

PREFIX_{ONE,TWO}

Using the fact that a bm_mask of zero is only valid when bm_bits is also zero, since Solaris 11.4.51 mdb allows the first entry in an mdb_bitmask_t array to take the following form:

{
        .bm_name = "",
        .bm_mask = 0,
        .bm_bits = bitmask_options
}

where bitmask_options will be set with:

#define BM_PREFIX_STRIP         (1ULL<<63)
#define BM_PREFIX_COLLECT       (1ULL<<62)
#define BM_PREFIX_LENMASK       0xff
#define BM_PREFIX_SETLEN(len)   ((len) & BM_PREFIX_LENMASK)

BM_PREFIX_STRIP will result in the values being collected together and printed without a prefix or braces:

ONE,TWO

BM_PREFIX_COLLECT will result in the values being collected together and printed with a single prefix. Eg:

PREFIX_{ONE,TWO}

BM_PREFIX_SETLEN(len) will set the length of the prefix to be len, which is an 8 bit value and must be less than the length of each bm_name in the mdb_bitmask_t array. If the length of the prefix is not set then the prefix length is determined by the system.

As programming aids the following macros are provided for setting the mdb_bitmask_t options and creating entries:

#define BM_SET_OPTIONS(OPT) { "", 0ULL, (OPT) }
#define BM_BITS(B) { #B, B, B }
#define BM_VALUE(B) { #B, ~0ULL, B }
#define BM_NULL { NULL, 0ULL, 0ULL }
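The difference between BM_BITS and BM_VALUE is easiest to see side by side. The sketch below is standalone C with local copies of three of the macros and of the structure; GEAR_NONE, GEAR_LOCKED, gear_bits and demo_matches() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of mdb_bitmask_t, renamed to mark it as a demo. */
typedef struct demo_bitmask {
	const char *bm_name;
	unsigned long long bm_mask;
	unsigned long long bm_bits;
} demo_bitmask_t;

/* Local copies of the macros listed above. */
#define	BM_BITS(B)	{ #B, B, B }
#define	BM_VALUE(B)	{ #B, ~0ULL, B }
#define	BM_NULL		{ NULL, 0ULL, 0ULL }

#define	GEAR_NONE	0x0ULL	/* hypothetical values */
#define	GEAR_LOCKED	0x4ULL

const demo_bitmask_t gear_bits[] = {
	BM_VALUE(GEAR_NONE),	/* matches only when the whole value is 0 */
	BM_BITS(GEAR_LOCKED),	/* matches whenever the 0x4 bit is set */
	BM_NULL
};

/* The test applied to each entry: (value & bm_mask) == bm_bits. */
int
demo_matches(const demo_bitmask_t *bm, unsigned long long value)
{
	assert(bm->bm_name != NULL);
	return ((value & bm->bm_mask) == bm->bm_bits);
}
```

Because the # operator stringifies the macro argument before it is expanded, the entry names are the symbolic names, not the numeric values.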

mdb_disable_pager added

The 11.4.54 release added mdb_disable_pager() to the API to give MDB modules the ability to disable the output pager. The pager will be enabled again after dcmd completion if it was enabled before.

For example, this API was used to add the dcmd option ::klog -f that behaves like the Unix command tail -f, i.e. without the need to prompt the user for output continuation.

mdb_disable_vmem_cache added

As mentioned in the previous item, the ::klog dcmd now has a -f option which continuously reads from the running target and needs to disable the vmem cache, otherwise it never sees any changes. To support that, the 11.4.66 release added a new function mdb_disable_vmem_cache() that disables mdb’s vmem cache while this dcmd or walker is running. The cache reverts to its previous state once the dcmd or walker completes.

mdb_get_annotation_byval added

MDB has an API, mdb_get_annotation(), that allows dcmds and annotation callbacks to get the annotations for other types. In the common case, it takes a target address to read the value whose annotation is to be obtained. However, as it can also be useful to compute an annotation for an immediate value that is not obtained from the target address space, a new function was added in 11.4.72:

size_t mdb_get_annotation_byval(const char *type, uintmax_t val,
    mdb_get_annotation_flags_t flags, char *buf, size_t len);

The new interface can then be used without reading from the target:

char buf[256];

if (mdb_get_annotation_byval("mode_t",
    S_IFDIR|S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH,
    MGAF_ALT, buf, sizeof (buf)) != MDB_ANNOTATE_FAIL) {
        /* now buf will contain "drwxr-xr-x" */
        ....
}

mdb_get_annotation_flags added

This new function added in 11.4.72 is identical to mdb_get_annotation() but with the addition of the flags argument. mdb_get_annotation() behaves like a call to mdb_get_annotation_flags() with a flags value of MGAF_NONE.

size_t mdb_get_annotation_flags(const char *type, const char *structname, const char *member,
    uintptr_t addr, mdb_get_annotation_flags_t flags, char *buf, size_t len);

The flags argument consists of the following bits:

MGAF_NONE	No flags are set.
MGAF_ALT	Get the alternate annotation.
MGAF_PRINTF	Get the annotation used by `printf`.
MGAF_STACK	Get the annotation used when annotating function arguments during stack printing.

mdb_getopts extended to support flags starting with ‘+’

The mdb module API defines mdb_getopts() for dcmds to parse command line arguments. Previously, it only supported options beginning with ‘-’. The 11.4.66 release introduced a new option flag MDB_OPT_PLUSBITS. When it is specified, the option may be introduced by either ‘-’ or ‘+’.

When the flag is ‘-’ the debugger will OR bits into the integer referenced by the specified pointer. When the flag is ‘+’ the debugger will clear bits in the integer referenced by the specified pointer.
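The effect on the target integer can be modelled in standalone C. This is a sketch of the semantics described above, not of mdb_getopts() itself; demo_plusbits() is a hypothetical name.

```c
#include <assert.h>

/*
 * Hypothetical model of one parsed option: a leading '-' ORs the bits
 * into the integer, a leading '+' clears them.
 */
void
demo_plusbits(char lead, unsigned int bits, unsigned int *flagsp)
{
	assert(lead == '-' || lead == '+');
	if (lead == '-')
		*flagsp |= bits;
	else
		*flagsp &= ~bits;
}
```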

mdb_getopts option MDB_OPT_FLAGS added

The mdb module API defines mdb_getopts() for dcmds to parse command line arguments. Up until now, if an option was unrecognized then mdb_getopts() reported an error and stopped processing, returning the count of processed options. There were a number of situations where it would be helpful for mdb_getopts() to return silently on failure. The caller could then check if the next unprocessed argument begins with ‘-’ or ‘+’ and from that deduce if an error should be produced.

For that reason, a new mdb_getopts() option MDB_OPT_FLAGS was added in 11.4.69. This takes as its argument an opaque mdb_getopt_opt_t * that must be initialised using mdb_getopts_init(). This will not read any arguments but will set the option. It should be the first variable option passed to mdb_getopts(). mdb_getopts_init() returns 0 on success. If its arguments contain any unrecognized bits, it returns -1 and sets errno to ENOTSUP.

mdb_getopt_fini() can be called to release the mdb_getopt_opt_t. If it is not called, the mdb_getopt_opt_t will be garbage collected when the dcmd completes.

mdb_getopts option MDB_OPT_CLRSETBITS added

A new option MDB_OPT_CLRSETBITS was added in 11.4.72 to mdb_getopts(). It behaves like MDB_OPT_SETBITS and MDB_OPT_CLRBITS but takes three arguments:

bits to clear
bits to set
pointer to the uint_t to work on

For example, the following allows the ::bike dcmd to be supplied with all four drive options, but only the last given to be used.

#define ALL_DRIVES (BELT_DRIVE|CHAIN_DRIVE|DIRECT_DRIVE|SHAFT_DRIVE)

int
cmd_bike(uintptr_t addr, uint_t flags, int argc,
    const mdb_arg_t *argv)
{
    uint_t d = 0;    /* selected drive; no drive chosen yet */
    int c;

    c = mdb_getopts(argc, argv,
        'b', MDB_OPT_CLRSETBITS, ALL_DRIVES, BELT_DRIVE, &d,
        'c', MDB_OPT_CLRSETBITS, ALL_DRIVES, CHAIN_DRIVE, &d,
        'd', MDB_OPT_CLRSETBITS, ALL_DRIVES, DIRECT_DRIVE, &d,
        's', MDB_OPT_CLRSETBITS, ALL_DRIVES, SHAFT_DRIVE, &d,
        NULL);

    /* Skip past the options that mdb_getopts() consumed. */
    argv += c;
    argc -= c;
    ....
}

mdb_maa2mgaf helper function to get mdb_get_annotation_flags_t from mdb_annotation_arg_t

If calling mdb_get_annotation_flags() or mdb_get_annotation_byval() from an annotation callback you need to create the mdb_get_annotation_flags_t from the mdb_annotation_arg_t. The 11.4.78 release added:

extern mdb_get_annotation_flags_t mdb_maa2mgaf(const mdb_annotation_arg_t *);

With typical usage being:

/*
 * Annotate bike_t structures with the bike type.
 */
static size_t
bike_annotate(uintptr_t addr, char *buf, size_t len, const void *arg,
    const mdb_annotation_arg_t *maa)
{
        NOTE(ARGUNUSED(arg))
        return (mdb_get_annotation_flags("bike_type_t", "bike_t",
            "bike_type", addr, mdb_maa2mgaf(maa), buf, len));
}

mdb_printf sub-second time formatter

MDB already had the %Y format specifier for mdb_printf et al. to print a time in seconds since the epoch. ISO 8601 specifies that the last element of a time represents the timezone, but existing formats using "%Y.%09d" place the fractional seconds after it, producing:

YYYY-MM-DDThh:mm:ss[+-]O0OO.FFFFFFFFF

The 11.4.60 release introduced a %Z specifier to print struct timespec as follows:

YYYY-MM-DDThh:mm:ss.FFFFFFFFF[+-]O0OO
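The layout can be reproduced with the standard C library. The following is a sketch only, not the MDB formatter: demo_format_timespec() is a hypothetical function, and it always formats in UTC so that the offset is fixed, which may differ from what %Z prints for a given target.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Illustrative only: format ts in UTC with the nanoseconds placed before
 * the zone offset. Returns the snprintf-style length, or -1 on error.
 */
int
demo_format_timespec(const struct timespec *ts, char *buf, size_t len)
{
	struct tm tm;
	char date[32];

	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
	if (gmtime_r(&ts->tv_sec, &tm) == NULL)
		return (-1);
	if (strftime(date, sizeof (date), "%Y-%m-%dT%H:%M:%S", &tm) == 0)
		return (-1);
	/* UTC is used here, so the offset is always +0000. */
	return (snprintf(buf, len, "%s.%09ld+0000", date,
	    (long)ts->tv_nsec));
}
```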

mdb_printf supports the ‘z’ modifier to specify the size to be a size_t

The printf(3C) function has long supported the z length modifier in a % conversion specification to indicate that an argument is of type size_t. Starting in 11.4.51, the MDB API also supports z to specify that the integer argument for mdb_printf et al. is a size_t.

Time information

A common question when debugging user space core files and processes is what was the UTC time when the target stopped and what was the value of the time returned by gethrtime() at that time. Additionally being able to approximate the UTC time from an hrtime_t is useful when comparing core file data to log files or external servers.

The 11.4.57 release added a generic interface to allow mdb modules to access the start times, stop times and CPU times of a target when it is available:

int mdb_gettime(mdb_time_type_t type, mdb_hrtime_t *resolution, mdb_hrtime_t *timep);

A new mdb_printf conversion specification of y was also added that formats an hrtime_t as:

DD day[s] hh:mm:ss.nsec

or when the ‘H’ (human) modifier is added:

DD day[s] hh hour[s] mm min[s] ss.nsec sec[s]

with each of the days, hours and mins values only printed if they are non-zero and with correct plurals. When the alternate form is used the value is printed relative to the target’s current hrtime.
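The arithmetic behind such a format is a straightforward split of a nanosecond count. The sketch below is standalone C and only models that split and the plural choice. demo_elapsed_t, demo_split_hrtime() and demo_plural() are hypothetical names, and the exact padding MDB uses is not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the fields the format prints. */
typedef struct demo_elapsed {
	int64_t days;
	int hours;
	int mins;
	int secs;
	long nsec;
} demo_elapsed_t;

/* Split a non-negative nanosecond count into days, h, m, s and ns. */
demo_elapsed_t
demo_split_hrtime(int64_t ns)
{
	demo_elapsed_t e;

	assert(ns >= 0);
	e.nsec = (long)(ns % 1000000000LL);

	int64_t s = ns / 1000000000LL;

	e.secs = (int)(s % 60);
	s /= 60;
	e.mins = (int)(s % 60);
	s /= 60;
	e.hours = (int)(s % 24);
	e.days = s / 24;
	return (e);
}

/* Choose the singular or plural unit word for a count. */
const char *
demo_plural(int64_t n, const char *one, const char *many)
{
	return (n == 1 ? one : many);
}
```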

Type support for dcmds and walkers

The 11.4.54 release added support for dmodules to declare the types that dcmds and walkers will accept as input and will produce as output.

The mdb_dcmd_t and mdb_walker_t now each have two new members added. For dcmds:

const char *dcmd_types_in;
const char *dcmd_types_out;

and walkers:

const char *walk_types_in;
const char *walk_types_out;

Each of these is a comma-separated list of types that a dcmd or walker will accept as input or output to a pipe. If they are NULL then it is assumed the dcmd or walker is not type aware or will provide type information for outputs at run-time via the new API calls described below.

Additionally the {dcmd,walk}_types_out value can take the sentinel values:

MDB_TYPES_OUT_IS_IN: for use by filtering dcmds, like ::grep, which only ever output the same type as their input.
MDB_TYPES_OUT_FIXED: the dcmd/walker does not know the type of the output, but any one instance of the dcmd/walker will produce the same fixed type. If mdb knows the type of a single output address it is safe to apply that type to all the output addresses. Walkers like ::walk avl and ::walk list can use this value.

Modules that call mdb_add_walker and wish to pass type information must set the MDB_MI_HAS_TYPES flag in mi_flags. Modules setting MDB_MI_HAS_TYPES flag must ensure that all of the fields in the mdb_walker_t are initialised. Not doing so could result in a failure in the future if the API is extended.

New APIs are also provided:

extern int mdb_set_output_type(const char *);

This allows walkers and dcmds that can produce more than one type to inform mdb which type they are currently producing. If the dcmd/walker has a non-NULL {dcmd,walk}_types_out member then the type given must be in the list of types. This sets the default output type for the entire dcmd and as such will set the types of addresses that have already been output. If a single dcmd instance will produce more than one type this should not be used. If a NULL name is passed in then the type of the output will be unset.

extern int mdb_set_addr_type(uintptr_t, const char *);

Set the type of an address. If a dcmd or walker instance can produce more than one type then this routine should be used to set the type of each address. It can also be used to set the type of any intermediate addresses a dcmd or walker reads.

extern ssize_t mdb_get_addr_type(uintptr_t addr, char *buf, size_t size);

Write the type of addr into the buffer of length size pointed to by buf. This will write at most size – 1 characters leaving the buffer NUL terminated. On failure it will return -1. On success it will return the length of the string that will be in buf if it is long enough.

extern int mdb_compat_addr_type(uintptr_t addr, const char *type);

Return 0 if the type of addr is compatible with type. If the type of addr can’t be established or type is not valid return -1. Otherwise return 1.

A type is compatible if it is one of the possible types at addr. For example,

typedef struct frame {
    enum material f_material;
    ...
} frame_t;
typedef struct bike {
    frame_t b_frame;
    ....
} bike_t;

if addr points to type bike_t, it will be compatible with “bike_t *”, “struct bike *”, “frame_t *”, “struct frame *”, and “enum material *”.

This routine can be used by dcmds that can accept more than one type to determine which type they have been given.

With these changes:

MDB will populate the type cache. This allows type aware dcmds to use addresses without the need for the user to explicitly state the type. For the common case of dcmds and walkers that only produce one type, the only modification required will be to specify the type on the .{walk,dcmd}_types_out string and mdb will then handle everything else.
::help, ::dcmds, and ::walkers will be able to display the type information about dcmds and walkers.
::dcmds and ::walkers can be modified to tell the user which dcmds and walkers are appropriate for a type.
mdb will be able to check that the types of addresses are appropriate for a dcmd or walker.

Converting an existing dcmd to understand types is in most cases as simple as updating the mdb_dcmd_t for the dcmd:

{
    .dc_name = "ulwp2tid",
    .dc_usage = ":|",
    .dc_descr = "convert ulwp_t address to TID",
    .dc_funcp = ulwp2tid,
    .dc_types_in = MDB_BUILD_TYPE_STRING(ulwp_t *),
    .dc_types_out = MDB_BUILD_TYPE_STRING(thread_t)
},

Dcmds and walkers that do not declare any input types are assumed to accept any type.

Type and name mapping for mdb

In addition to the ::typemap dcmd described above, the 11.4.66 release added two sets of APIs to enable modules to define these mappings. The first set of APIs is used to establish mappings. The second set establishes a way for dmodules to publish helper functions that other dmodules can call.

Modules can use:

extern void mdb_register_typemap(const mdb_typemap_t *, size_t);

to register a series of mappings by supplying an array of mdb_typemap_t entries and the number of entries.

Callbacks can use the following helper functions:

mdb_typemap_save_handle(const mtm_handle_t *, void *key, void *handle, void (*fini)(void *));

This saves handle indexed by key so that future callbacks can look up handle with mdb_typemap_get_handle(). If fini is defined then it is called with handle as its argument to release any resources.

const void *mdb_typemap_get_handle(const mtm_handle_t *, void *key);

Look up any handle stored with mdb_typemap_save_handle().

const char *mdb_typemap_get_type(const mtm_handle_t *);

Returns the mtm_type from the mdb_typemap_t.

uintptr_t mdb_typemap_get_member_addr(const mtm_handle_t *);

Returns address of the member that is being mapped.

const char *mdb_typemap_get_member_name(const mtm_handle_t *);

Returns the mtm_member from the mdb_typemap_t.

const char *mdb_typemap_get_mapped_type(const mtm_handle_t *);

Returns the mtm_mapped_type from the mdb_typemap_t.

A common use case for these interfaces is the file system APIs, where vfs_t and vnode_t are defined in genunix but the file system modules need to define the rules to map from vfs_data and v_data to types. To support this, a general purpose helper subsystem has been added which allows one module to call routines in another module. A module can register helper callbacks that it exports via:

mdb_register_helpers(const mdb_helper_t *, size_t);

Modules wishing to use another module’s helper functions can call:

void *mdb_helper_get(const char *, const char *, mdb_helper_type_t);

To look up the ("vfs_t", MDB_MAPTYPE) helper from the genunix module you would do:

func = mdb_helper_get("genunix", "vfs_t", MDB_MAPTYPE);

These inter-module helpers are private interfaces agreed between module developers.

To simplify adding mappings to modules the following helper functions are provided:

extern boolean_t mdb_typemap_member_uint(uintptr_t,
    const mtm_handle_t *, mdb_typemap_arg_t, mdb_typemap_arg_t);
extern boolean_t mdb_typemap_member_enum(uintptr_t,
    const mtm_handle_t *, mdb_typemap_arg_t, mdb_typemap_arg_t);
extern boolean_t mdb_typemap_member_str(uintptr_t,
    const mtm_handle_t *, mdb_typemap_arg_t, mdb_typemap_arg_t);
extern boolean_t mdb_typemap_member_sym(uintptr_t,
    const mtm_handle_t *, mdb_typemap_arg_t, mdb_typemap_arg_t);

The helpers handle caching of types and symbols for maximum efficiency.

Additionally a strfree_gc(char *) function has been added that frees strings allocated with UM_GC.