drgn
drgn is a powerful and flexible debugger. With drgn, one can write Python scripts to analyze a live kernel, a vmcore, or a running program. drgn exposes the symbols and types of the program under analysis, so the programmer can use those symbols to access data easily. With drgn, vmcore analysis feels like writing ordinary code. The extensive collection of Python libraries also helps, as we can use complex algorithms and data structures to aid with system analysis.
As of now, if the debuginfo RPMs for the kernel are not installed, we need to pass the debug symbols to drgn as an argument in order to analyze the vmcore in detail.
drgn is available as a package for Oracle Linux 8 onwards through the EPEL repository. To enable EPEL:
# sudo dnf config-manager --enable ol8_developer_EPEL
Once the repository is enabled, drgn can be installed using the command:
# sudo dnf install drgn
Debuginfo packages contain the kernel image (vmlinux) and the kernel modules built with debugging information. drgn uses this information to resolve symbols and types from the vmcore. We can use the following steps to install the debuginfo files for the current kernel:
Add the debuginfo repository, for example by saving the following in a .repo file under /etc/yum.repos.d/:
[debuginfo]
name=Oracle Linux 8 Debuginfo Packages
baseurl=https://oss.oracle.com/ol8/debuginfo/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
Install using:
# sudo dnf install kernel-uek-debuginfo-$(uname -r)
If the user does not want to install the RPM, the files can be extracted from it instead. To do that, download the debuginfo RPM and extract it:
# sudo dnf install --downloadonly kernel-uek-debuginfo-$(uname -r)
# rpm2cpio kernel-uek-debuginfo-$(uname -r).rpm | cpio -idmv
If the user only wants to extract the vmlinux file from the RPM:
# rpm2cpio kernel-uek-debuginfo-$(uname -r).rpm | cpio -ivd '*vmlinux'
Note: The RPM(s) downloaded by the dnf command are stored in the /var/cache/dnf/debuginfo*/packages directory. Before running rpm2cpio, copy the RPM to another directory and extract it there; otherwise the files may get cleaned up and deleted later.
After extracting, vmlinux can be found in <extract-dir>/usr/lib/debug/lib/modules/$(uname -r)/, and the module debug files in the kernel directory under this path.
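If you prefer to script this lookup, the following is a minimal Python sketch. It assumes the extracted layout described above and that the module debug files end in .ko.debug; find_debug_files is a hypothetical helper, not part of drgn or dnf.

```python
from pathlib import Path


def find_debug_files(extract_dir, release):
    """Return (vmlinux, [module debug files]) from an extracted debuginfo RPM.

    Assumes the layout <extract_dir>/usr/lib/debug/lib/modules/<release>/,
    with vmlinux at the top and *.ko.debug files somewhere under kernel/.
    """
    base = Path(extract_dir) / "usr/lib/debug/lib/modules" / release
    vmlinux = base / "vmlinux"
    if not vmlinux.is_file():
        raise FileNotFoundError(f"no vmlinux under {base}")
    # rglob searches kernel/ recursively; sorting gives a stable order
    modules = sorted((base / "kernel").rglob("*.ko.debug"))
    return vmlinux, modules
```

For example, find_debug_files("/tmp/debuginfo", "4.14.35-2047.511.5.5.3.el7uek.x86_64") would return the vmlinux path plus every module debug file found under that release, where /tmp/debuginfo stands for whatever directory the RPM was extracted into.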
drgn on a Live kernel
To run drgn on a live kernel, just run the drgn command without the -c option, and it will open the running kernel for debugging. However, the "more correct" way to debug the running kernel is to pass the -c /proc/kcore option to drgn. We also need to specify the correct vmlinux to load along with the module debug files that we need, giving one -s option per file:
# sudo drgn -c /proc/kcore -s path/to/vmlinux -s path/to/module1.ko.debug -s path/to/module2.ko.debug ...
Note: If the debuginfo RPM is installed via dnf, it is not necessary to use the -s option.
drgn on a vmcore
In this case, the program argument is the vmcore, and the command is:
# drgn -c path/to/vmcore -s path/to/vmlinux -s path/to/module1.ko -s path/to/module2.ko ...
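Each -s option takes a single path, and a bare path after the options would be parsed by drgn as a script to run rather than as a symbol file. When many modules are involved, it can help to build the argument list programmatically. A minimal sketch, with drgn_command and drgn_command_line as hypothetical helper names:

```python
import shlex


def drgn_command(core, vmlinux, modules=()):
    """Return a drgn argument list with one -s option per symbol file."""
    args = ["drgn", "-c", str(core), "-s", str(vmlinux)]
    for mod in modules:
        # every additional debug file gets its own -s
        args += ["-s", str(mod)]
    return args


def drgn_command_line(core, vmlinux, modules=()):
    """Return the same command as a shell-quoted string (Python 3.8+)."""
    return shlex.join(drgn_command(core, vmlinux, modules))
```

For example, drgn_command_line("path/to/vmcore", "path/to/vmlinux", ["path/to/module1.ko"]) gives a command line that can be pasted into a shell, and the list form can be passed directly to subprocess.run().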
A program is the entity under analysis. It could be a live kernel, a vmcore or any running program. The drgn CLI is initialized with a Program object named prog.
A program is used to look up symbols, type definitions, etc. We can access variables in a program using the prog['name'] syntax. For example, to read the slab_caches symbol in a vmcore (here the vmcore is the program), we can simply do:
>>> prog['slab_caches']
(struct list_head){
    .next = (struct list_head *)0xffff91e770e31e60,
    .prev = (struct list_head *)0xffff912dc7c08060,
}
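Since slab_caches is a struct list_head, it can be walked with drgn's list helpers. The following sketch assumes struct kmem_cache is linked into this list through its list member, as in the SLUB and SLAB allocators; slab_cache_names is a hypothetical helper. Recent drgn versions also ship a for_each_slab_cache() helper in drgn.helpers.linux.slab that performs a similar walk.

```python
def slab_cache_names(prog):
    """Yield the name of every slab cache on the kernel's slab_caches list.

    Assumes struct kmem_cache links into slab_caches via its 'list' member.
    """
    # Imported here so the function can be defined without drgn installed
    from drgn.helpers.linux.list import list_for_each_entry

    head = prog["slab_caches"].address_of_()
    for cache in list_for_each_entry("struct kmem_cache", head, "list"):
        # name is a const char *; string_() reads it up to the NUL byte
        yield cache.name.string_().decode("utf-8", "replace")
```

Inside the drgn CLI, for name in slab_cache_names(prog): print(name) prints one cache name per line.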
All variables, constants, and functions are objects in drgn. An object can exist in memory or can be a simple value. Objects can be used in drgn scripts the same way they are used in the source code. For example, suppose ib_dev is an object of type struct ib_device. To obtain the name of the driver for this device, the ib_dev.dma_device.driver.name field can be read in a drgn script just as it is written in the source code (drgn follows pointers automatically, so . is used where C uses ->):
>>> ib_dev.dma_device.driver.name
(const char *)0xffffffffc06cef5e = "mlx4_core"
drgn provides a set of helpers for common tasks, like walking a list or getting stack traces. A detailed list of helpers is provided here: https://drgn.readthedocs.io/en/latest/helpers.html.
drgn script
As mentioned in the section describing Objects, it is simple to write scripts in drgn and access fields of complex structures. This capability, together with Python's immense set of libraries, makes a programmer's life easy. However, it also helps to be familiar with the crash utility in order to be comfortable with drgn. One can refer to the source code of crash to understand how it extracts information from a vmcore and try to replicate that in drgn. In this article, we will write a script to get the sysinfo details from a vmcore using drgn.
Crash reads the symbol init_uts_ns to get most of the information related to a system.
Now, we will replicate the crash code that gets the system details in drgn. As crash reads init_uts_ns to get sysinfo, we will read the same symbol in drgn:
>>> uts_namespace_detail = prog['init_uts_ns']
Printing this variable will give us the values of this symbol.
>>> uts_namespace_detail
(struct uts_namespace){
    .kref = (struct kref){
        .refcount = (refcount_t){
            .refs = (atomic_t){
                .counter = (int)4,
            },
        },
    },
    .name = (struct new_utsname){
        .sysname = (char [65])"Linux",
        .nodename = (char [65])"scao08adm03.us.oracle.com",
        .release = (char [65])"4.14.35-2047.511.5.5.3.el7uek.x86_64",
        .version = (char [65])"#2 SMP Thu May 5 19:33:38 PDT 2022",
        .machine = (char [65])"x86_64",
        .domainname = (char [65])"(none)",
    },
    .user_ns = (struct user_namespace *)init_user_ns+0x0 = 0xffffffffa9458780,
    .ucounts = (struct ucounts *)0x0,
    .ns = (struct ns_common){
        .stashed = (atomic_long_t){
            .counter = (long)0,
        },
        .ops = (const struct proc_ns_operations *)utsns_operations+0x0 = 0xffffffffa8e2af00,
        .inum = (unsigned int)4026531838,
    },
}
Now we have much of the useful information that the sys command in crash displays, like release, nodename, machine, etc.
We can write the following code to print these in a more readable format:
>>> print(f'NODENAME : {uts_namespace_detail.name.nodename.string_().decode("utf-8")}')
NODENAME : scao08adm03.us.oracle.com
>>> print(f'RELEASE : {uts_namespace_detail.name.release.string_().decode("utf-8")}')
RELEASE : 4.14.35-2047.511.5.5.3.el7uek.x86_64
>>> print(f'VERSION : {uts_namespace_detail.name.version.string_().decode("utf-8")}')
VERSION : #2 SMP Thu May 5 19:33:38 PDT 2022
>>> print(f'MACHINE : {uts_namespace_detail.name.machine.string_().decode("utf-8")}')
MACHINE : x86_64
The string_() method reads the character array as a Python bytes object, which decode() then turns into a string.
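To print the collected values in the style of crash's sys command, right-aligning the labels is enough. A minimal sketch; format_sys is a hypothetical helper, and the alignment only approximates crash's layout:

```python
def format_sys(fields):
    """Format (label, value) pairs with labels right-aligned, crash-style."""
    # width of the longest label; 0 for an empty list
    width = max((len(label) for label, _ in fields), default=0)
    return "\n".join(f"{label:>{width}}: {value}" for label, value in fields)
```

For example, format_sys([("NODENAME", uts_namespace_detail.name.nodename.string_().decode("utf-8")), ("CPUS", int(prog["nr_cpu_ids"]))]) lines up both labels on the colon.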
But usually this is not all we get from sys. We also get all of the following:
      KERNEL: /share/linuxrpm/vmlinux_repo/64/4.14.35-2047.511.5.5.3.el7uek.x86_64/vmlinux
    DUMPFILE: vmcore [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Jun 30 13:13:39 PDT 2022
      UPTIME: 9 days, 02:37:13
LOAD AVERAGE: 0.12, 0.12, 0.09
       TASKS: 716
    NODENAME: scao08adm03.us.oracle.com
     RELEASE: 4.14.35-2047.511.5.5.3.el7uek.x86_64
     VERSION: #2 SMP Thu May 5 19:33:38 PDT 2022
     MACHINE: x86_64 (2693 Mhz)
      MEMORY: 512 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"
         PID: 11422
     COMMAND: "bash"
        TASK: ffff9a31787697c0 [THREAD_INFO: ffff9a31787697c0]
         CPU: 45
       STATE: TASK_RUNNING (PANIC)
The date and time information can be obtained by reading the kernel variable shadow_timekeeper:
>>> prog['shadow_timekeeper']
(struct timekeeper){
    .tkr_mono = (struct tk_read_base){
        .clock = (struct clocksource *)clocksource_tsc+0x0 = 0xffffffffa942f460,
        .mask = (u64)18446744073709551615,
        .cycle_last = (u64)2126713617356984,
        .mult = (u32)6228816,
        .shift = (u32)24,
        .xtime_nsec = (u64)2390728062303892,
        .base = (ktime_t)787032911659616,
    },
    .tkr_raw = (struct tk_read_base){
        .clock = (struct clocksource *)clocksource_tsc+0x0 = 0xffffffffa942f460,
        .mask = (u64)18446744073709551615,
        .cycle_last = (u64)2126713617356984,
        .mult = (u32)6228758,
        .shift = (u32)24,
        .xtime_nsec = (u64)16194257266886124,
        .base = (ktime_t)787024000000000,
    },
    .xtime_sec = (u64)1656620019,
    .ktime_sec = (unsigned long)787033,
    .wall_to_monotonic = (struct timespec){
        .tv_sec = (__kernel_time_t)-1655832987,
        .tv_nsec = (long)911659616,
    },
    .offs_real = (ktime_t)1655832986088340384,
    .offs_boot = (ktime_t)0,
    .offs_tai = (ktime_t)1655832986088340384,
    .tai_offset = (s32)0,
    .clock_was_set_seq = (unsigned int)5,
    .cs_was_changed_seq = (u8)3,
    .next_leap_ktime = (ktime_t)9223372036854775807,
    .raw_sec = (u64)787024,
    .cycle_interval = (u64)2693509,
    .xtime_interval = (u64)16777371955344,
    .xtime_remainder = (s64)268178,
    .raw_interval = (u64)16777215731822,
    .ntp_tick = (u64)4295007633211392,
    .ntp_error = (s64)-228547328,
    .ntp_error_shift = (u32)8,
    .ntp_err_mult = (u32)0,
}
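The fields xtime_sec and ktime_sec hold the wall-clock time in seconds since the epoch and the monotonic time in seconds (the uptime, not counting time spent in suspend), respectively. Here, we can use Python libraries to convert these values to a human-readable format. The time.ctime() function converts a time in seconds since the epoch to a string in local time, that is, the time zone of the machine running drgn, which is not necessarily that of the crashed system. The datetime library is useful for manipulating dates and times.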
>>> import time
>>> timekeeper = prog['shadow_timekeeper']
>>> xtime_sec = int(timekeeper.xtime_sec)
>>> date = time.ctime(xtime_sec)
>>> date
'Thu Jun 30 13:13:39 2022'
And uptime:
>>> import datetime
>>> uptime = str(datetime.timedelta(seconds=int(timekeeper.ktime_sec)))
>>> uptime
'9 days, 2:37:13'
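datetime.timedelta prints the hours without zero padding, and time.ctime() depends on the time zone of the machine running drgn. The following sketch reproduces the uptime layout of the crash output (9 days, 02:37:13) and prints the date in UTC instead; crash_uptime and utc_date are hypothetical helper names.

```python
import time


def crash_uptime(seconds):
    """Format an uptime in seconds as 'D days, HH:MM:SS' with zero-padded hours."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    unit = "day" if days == 1 else "days"
    return f"{days} {unit}, {hours:02d}:{minutes:02d}:{secs:02d}"


def utc_date(epoch_seconds):
    """Format seconds since the epoch in UTC, independent of the local time zone."""
    return time.strftime("%a %b %d %H:%M:%S UTC %Y", time.gmtime(int(epoch_seconds)))
```

With this vmcore, crash_uptime(int(timekeeper.ktime_sec)) returns '9 days, 02:37:13' and utc_date(int(timekeeper.xtime_sec)) returns 'Thu Jun 30 20:13:39 UTC 2022', which is 13:13:39 PDT.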
Load average calculation is slightly more complicated, because avenrun stores exponentially decaying averages as fixed-point numbers, so the values need to be scaled and rounded.
Crash code:
static char *
get_loadavg(char *buf)
{
    int a, b, c;
    long avenrun[3];

    readmem(symbol_value("avenrun"), KVADDR, &avenrun[0],
        sizeof(long)*3, "avenrun array", FAULT_ON_ERROR);

    a = avenrun[0] + (FIXED_1/200);
    b = avenrun[1] + (FIXED_1/200);
    c = avenrun[2] + (FIXED_1/200);
    sprintf(buf, "%d.%02d, %d.%02d, %d.%02d",
        LOAD_INT(a), LOAD_FRAC(a),
        LOAD_INT(b), LOAD_FRAC(b),
        LOAD_INT(c), LOAD_FRAC(c));

    return buf;
}
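The macros come from the kernel: avenrun stores each average as a fixed-point number with FSHIFT = 11 fractional bits, LOAD_INT(x) is x >> FSHIFT, and LOAD_FRAC(x) is LOAD_INT((x & (FIXED_1 - 1)) * 100). Adding FIXED_1/200 first adds about 0.005, so the two printed decimals are rounded rather than truncated. A worked example with a hypothetical raw value avenrun[0] = 246:

```latex
\[
\begin{aligned}
\texttt{FIXED\_1} &= 2^{11} = 2048,
  \qquad \left\lfloor \tfrac{2048}{200} \right\rfloor = 10 \approx 0.005 \times 2048 \\
a &= 246 + 10 = 256 \\
\texttt{LOAD\_INT}(a) &= \left\lfloor \tfrac{256}{2048} \right\rfloor = 0 \\
\texttt{LOAD\_FRAC}(a) &= \left\lfloor \tfrac{(256 \bmod 2048) \times 100}{2048} \right\rfloor
  = \left\lfloor 12.5 \right\rfloor = 12
\end{aligned}
\]
```

so this value would be printed as 0.12.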
Here's an example Python snippet that replicates the crash code calculating the load average:
>>> FSHIFT = 11                # fractional bits in avenrun
>>> FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point
>>> def load_int(x):
...     return x >> FSHIFT
...
>>> def load_frac(x):
...     return load_int((x & (FIXED_1 - 1)) * 100)
...
>>> def load_avg(avenrun):
...     # add FIXED_1/200 (about 0.005) so the output is rounded, not truncated
...     vals = [int(avenrun[i]) + FIXED_1 // 200 for i in range(3)]
...     return ', '.join(f'{load_int(v)}.{load_frac(v):02d}' for v in vals)
...
>>> load_avg(prog['avenrun'])
Memory also needs some basic conversion:
>>> pages = int(prog['totalram_pages'])
>>> pagesize = int(prog['PAGE_SIZE'])
>>> memory_bytes = pages * pagesize
>>> memory_KB = memory_bytes / 1024
>>> memory_MB = memory_KB / 1024
>>> memory_GB = memory_MB / 1024
>>> memory_GB
503.36669158935547
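The totalram_pages variable used above exists on older kernels such as this 4.14 UEK kernel. Assuming that from Linux 5.0 onwards the count is kept in the atomic_long_t _totalram_pages (with totalram_pages() becoming an inline function), a version-tolerant sketch looks like this; total_ram_bytes and to_gib are hypothetical helpers. The page count covers memory managed by the page allocator, which is likely why it comes out below the 512 GB that crash reports.

```python
def total_ram_bytes(prog):
    """Return the kernel's total RAM page count converted to bytes.

    Assumption: kernels >= 5.0 use the atomic_long_t _totalram_pages,
    older kernels the plain unsigned long totalram_pages.
    """
    try:
        pages = int(prog["_totalram_pages"].counter)
    except KeyError:
        # drgn's prog[name] raises KeyError when the symbol does not exist
        pages = int(prog["totalram_pages"])
    return pages * int(prog["PAGE_SIZE"])


def to_gib(nbytes):
    """Convert bytes to GiB (1024**3 bytes)."""
    return nbytes / 1024 ** 3
```

In the drgn CLI, to_gib(total_ram_bytes(prog)) gives the same figure as memory_GB above on this kernel.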
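Similarly, the number of CPUs:
>>> int(prog["nr_cpu_ids"])
48
Note that nr_cpu_ids is the number of possible CPU IDs; on systems with offline or hot-pluggable CPUs it can be larger than the number of CPUs that were online at the time of the crash.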
Task-related information, such as the PID, command name and task address, can be obtained from the current task on the crashing CPU, crashing_cpu:
>>> task = per_cpu(prog["runqueues"], prog["crashing_cpu"]).curr
>>> task.comm
(char [16])"bash"
>>> task.pid
(pid_t)11422
>>> hex(task.value_())
'0xffff9a31787697c0'
Now, let's use some more drgn helpers. For this sample task, we will list the files opened by every task. The helper provided by drgn to walk through each task is for_each_task(prog). The code below lists all open files:
>>> for task in for_each_task(prog):
...     print(f'PID: {task.pid.value_()} COMM: {task.comm.string_().decode("utf-8")}')
...     for fd, filp in for_each_file(task):
...         path = d_path(filp.f_path.address_of_()).decode('utf-8')
...         print(f'[{fd}:] {path}')
...
The first loop iterates through all tasks in the program using the helper for_each_task (https://drgn.readthedocs.io/en/latest/helpers.html#process-ids). The second loop iterates through the files opened by each task using the helper for_each_file (https://drgn.readthedocs.io/en/latest/helpers.html#virtual-filesystem-layer), which yields each file descriptor together with its struct file pointer. The drgn helper d_path (https://drgn.readthedocs.io/en/latest/helpers.html#virtual-filesystem-layer) returns the full path, given either a struct path pointer, as used here, or a mount and a dentry.
The address_of_() method returns a pointer to the object, like the & operator in C.
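The crash output earlier marks this vmcore as a partial dump, so some pages may be missing, and drgn raises drgn.FaultError when it touches them. The following sketch of the same walk skips such files instead of aborting the whole loop; open_files is a hypothetical helper.

```python
def open_files(prog):
    """Yield (pid, comm, fd, path) for every open file of every task.

    Memory missing from a partial dump raises drgn.FaultError; this sketch
    marks such paths as unreadable or skips the task's file table.
    """
    # Imported lazily so the function can be defined without drgn installed
    from drgn import FaultError
    from drgn.helpers.linux.fs import d_path, for_each_file
    from drgn.helpers.linux.pid import for_each_task

    for task in for_each_task(prog):
        pid = task.pid.value_()
        comm = task.comm.string_().decode("utf-8", "replace")
        try:
            for fd, filp in for_each_file(task):
                try:
                    path = d_path(filp.f_path.address_of_())
                except FaultError:
                    path = b"<unreadable>"
                yield pid, comm, fd, path.decode("utf-8", "replace")
        except FaultError:
            # the task's file table itself is not in the dump
            continue
```

Usage: for pid, comm, fd, path in open_files(prog): print(f'PID: {pid} COMM: {comm} [{fd}] {path}').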
Here is an example of how a complex data structure can be analyzed in drgn. For this example, let's get the tasks in a runqueue and how long they have been waiting for the CPU.
To get the runqueue of a CPU (CPU 29 in this example), we can use the per_cpu helper:
>>>runq = per_cpu(prog["runqueues"], 29)
Now that we have the runqueue, we can get the current task details from the curr field of struct rq:
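>>> print("CURRENT: PID: ", runq.curr.pid.value_(), "\tTASK: ", hex(runq.curr.address_of_()), "\tCOMMAND: ", runq.curr.comm.string_().decode("utf-8"))
CURRENT: PID: 84519 TASK: 0xffff8ce480d6b608 COMMAND: ora_lms1_so_sis
Because runq.curr is itself a pointer, runq.curr.address_of_() yields the address of the curr field inside struct rq, not the address of the task_struct. Use hex(runq.curr.value_()) to print the task address itself.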
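We can see the current running task is ora_lms1_so_sis. Now, we can look for the other tasks, i.e. the tasks in the RT PRIO_ARRAY and those in the CFS tree. The RT runqueue is the rt field of struct rq, so, in a similar manner as for the currently running task, we can go through it:
>>> print("CURRENT: PID: ", runq.rt.rq.curr.pid.value_(), "\tTASK: ", hex(runq.rt.rq.curr.address_of_()), "\tCOMMAND: ", runq.rt.rq.curr.comm.string_().decode("utf-8"))
CURRENT: PID: 84519 TASK: 0xffff8ce480d6b608 COMMAND: ora_lms1_so_sis
The output is identical because rt_rq.rq (a field present when CONFIG_RT_GROUP_SCHED is enabled) points back to the struct rq that owns the RT runqueue, so runq.rt.rq.curr is the same task as runq.curr. This confirms that the RT task ora_lms1_so_sis holds the CPU and is running, but it does not enumerate the queued RT tasks; those are linked into the priority array runq.rt.active, one list per priority in its queue[] array.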
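To actually list the queued RT tasks, the priority array can be walked directly. A sketch, assuming each runq.rt.active.queue[] entry is a list of struct sched_rt_entity linked through run_list, and that RT group scheduling is not in use, so every entity is the rt member of a task_struct; rt_queued_tasks is a hypothetical helper.

```python
def rt_queued_tasks(runq):
    """Yield (priority_index, task) for tasks on the RT priority array of runq.

    Assumes no RT group scheduling: each sched_rt_entity on the lists is
    embedded in a task_struct as its 'rt' member.
    """
    # Imported lazily so the function can be defined without drgn installed
    from drgn import container_of
    from drgn.helpers.linux.list import list_for_each_entry

    queue = runq.rt.active.queue
    # queue is an array of list_heads; its length is MAX_RT_PRIO
    for prio in range(queue.type_.length):
        head = queue[prio].address_of_()
        for rt_se in list_for_each_entry("struct sched_rt_entity", head, "run_list"):
            yield prio, container_of(rt_se, "struct task_struct", "rt")
```

Usage: for prio, t in rt_queued_tasks(runq): print(prio, t.pid.value_(), t.comm.string_().decode("utf-8")). The running RT task should appear in the output as well, since RT tasks stay on their priority list while they run.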
Now, to get the tasks in the CFS tree, we can refer to the cfs_tasks field of struct rq. cfs_tasks is a list, so we can use drgn's list helper list_for_each_entry:
>>> for t in list_for_each_entry("struct task_struct", runq.cfs_tasks.address_of_(), "se.group_node"):
...     print(t.comm)
...
(char [16])"ksoftirqd/29"
(char [16])"kworker/29:0"
As can be observed, there are two tasks in the CFS list. Now let's find out how long these tasks have been waiting to be scheduled on the CPU. This can be obtained from the last_queued field of each task's sched_info structure, which records the timestamp at which the task was queued to run. To get the wait time, we subtract this timestamp from the runqueue clock, struct rq->clock (both values are in nanoseconds, so we convert the difference to seconds):
>>> for t in list_for_each_entry("struct task_struct", runq.cfs_tasks.address_of_(), "se.group_node"):
...     wait_time = (runq.clock - t.sched_info.last_queued) / (1000 * 1000 * 1000)
...     print("Task : %s, waiting for CPU since : %d sec" % (t.comm.string_().decode("utf-8"), wait_time))
...
Task : ksoftirqd/29, waiting for CPU since : 241 sec
Task : kworker/29:0, waiting for CPU since : 4 sec
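The same arithmetic can be wrapped in a small function that also handles tasks that are not waiting. It assumes, as in the kernel's sched_info accounting, that last_queued is reset to 0 when the task starts running, and that the sched_info fields are present (CONFIG_SCHED_INFO); wait_seconds is a hypothetical helper.

```python
NSEC_PER_SEC = 1_000_000_000


def wait_seconds(rq_clock_ns, last_queued_ns):
    """Seconds a task has waited on the runqueue, or None if it is not waiting.

    Both arguments are plain integers in nanoseconds.
    """
    if last_queued_ns == 0:
        # assumed: last_queued == 0 means there is no pending wait
        return None
    return (rq_clock_ns - last_queued_ns) / NSEC_PER_SEC
```

In the loop above it would be called as wait_seconds(runq.clock.value_(), t.sched_info.last_queued.value_()).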
In drgn it is very easy to obtain the values of function arguments and even local variables of a function in a stack trace. With crash, one may need to analyze the full stack trace in order to find the stack addresses of the variables and calculate addresses and offsets, a process that is error-prone. With drgn, however, it is much simpler and more straightforward. For example, look at this stack trace:
#0 sysrq_handle_crash (drivers/tty/sysrq.c:147)
#1 __handle_sysrq (drivers/tty/sysrq.c:559)
#2 write_sysrq_trigger (drivers/tty/sysrq.c:1106)
#3 proc_reg_write (fs/proc/inode.c:231)
#4 __vfs_write (fs/read_write.c:480)
#5 vfs_write (fs/read_write.c:544)
#6 SYSC_write (fs/read_write.c:590)
#7 SyS_write (fs/read_write.c:582)
#8 do_syscall_64 (arch/x86/entry/common.c:298)
#9 entry_SYSCALL_64+0x191/0x293 (arch/x86/entry/entry_64.S:238)
This is a sysrq-triggered crash. Say the user wants the name of the file passed as the file argument of the function __vfs_write:
ssize_t __vfs_write(struct file *file, const char __user *p, size_t count, loff_t *pos)
To get this value using drgn, the user can simply access the stack frame of __vfs_write() and use the argument name as the key:
>>> task = per_cpu(prog["runqueues"], prog["crashing_cpu"]).curr
>>> trace = prog.stack_trace(task)
>>> trace
#0 sysrq_handle_crash (drivers/tty/sysrq.c:147)
#1 __handle_sysrq (drivers/tty/sysrq.c:559)
#2 write_sysrq_trigger (drivers/tty/sysrq.c:1106)
#3 proc_reg_write (fs/proc/inode.c:231)
#4 __vfs_write (fs/read_write.c:480)
#5 vfs_write (fs/read_write.c:544)
#6 SYSC_write (fs/read_write.c:590)
#7 SyS_write (fs/read_write.c:582)
#8 do_syscall_64 (arch/x86/entry/common.c:298)
#9 entry_SYSCALL_64+0x191/0x293 (arch/x86/entry/entry_64.S:238)
>>> file = trace[4]["file"]
>>> file.f_path.dentry.d_iname
(unsigned char [32])"sysrq-trigger"
Similarly, suppose the user wants the local variable struct thread_info *ti of the function do_syscall_64():
>>> ti = trace[8]["ti"]
>>> ti
*(struct thread_info *)0xffff9a31787697c0 = {
    .flags = (unsigned long)2147483776,
    .status = (u32)0,
}
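Frame indices such as trace[4] and trace[8] are specific to this particular stack trace. Looking a frame up by function name is more robust when reusing a script. The sketch below assumes a drgn version whose StackFrame objects have a name attribute; frame_by_name is a hypothetical helper.

```python
def frame_by_name(trace, name):
    """Return the first frame in a stack trace whose function is `name`.

    Assumes each frame exposes .name (the function name, or None).
    """
    for frame in trace:
        if frame.name == name:
            return frame
    raise LookupError(f"{name} not found in stack trace")
```

With it, frame_by_name(trace, "__vfs_write")["file"] returns the same object as trace[4]["file"] above.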
In this article, we went through the steps to install drgn and use it with the correct debug symbols, and we wrote some drgn code. This is a very high-level introduction. Very complex analysis can be done using the helpers and Python libraries not discussed in this article, and users can write their own helpers as well.
Hope you have fun with drgn!