Enter the drgn

May 25, 2023 | 15 minute read
What is drgn?

drgn is a powerful and flexible programmable debugger. With drgn, one can write Python scripts to analyze a live system, a vmcore, or a running program. drgn exposes the symbols and types of the program under analysis, so the programmer can use them to access data directly. As a result, vmcore analysis feels like natural coding. The extensive collection of Python libraries also helps, as we can use complex algorithms and data structures to aid system analysis.

As of now, if the debuginfo RPMs for the kernel are not installed, we need to provide debug symbols as an argument to drgn to analyze the vmcore in detail.

Steps to install drgn

drgn is available as a package in Oracle Linux 8 onwards through the EPEL repository. To enable EPEL:

# sudo dnf config-manager --enable ol8_developer_EPEL

Once the repository is enabled, drgn can be installed using the command:

# sudo dnf install drgn

Installing and Extracting debuginfo

debuginfo files contain the debugging information for the kernel and its modules, which are compiled with debug options enabled. These help drgn resolve symbols and types from the vmcore. We can use the following commands to install debuginfo files for the current kernel:

Add the debuginfo repository by placing the following in a repository file under /etc/yum.repos.d/:

[debuginfo]
name=Oracle Linux 8 Debuginfo Packages
baseurl=https://oss.oracle.com/ol8/debuginfo/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

Install using:

# sudo dnf install kernel-uek-debuginfo-$(uname -r)

If the user does not want to install the RPM, they can extract the files from it instead. To do that, download the debuginfo RPM and extract it:

# sudo dnf install --downloadonly kernel-uek-debuginfo-$(uname -r)
# rpm2cpio kernel-uek-debuginfo-$(uname -r).rpm | cpio -idmv

If the user just wants to extract the vmlinux file from the RPM:

# rpm2cpio kernel-uek-debuginfo-$(uname -r).rpm | cpio -ivd --rename '*vmlinux'

Note: The RPM(s) downloaded by the dnf command will be downloaded to the /var/cache/dnf/debuginfo*/packages folder. Before running rpm2cpio, please copy the RPM to another folder, otherwise the extracted files may get cleaned up and deleted later.

After extracting, vmlinux can be found in <extract-dir>/usr/lib/debug/lib/modules/$(uname -r)/ and the modules in the kernel directory under this path.
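The layout described above can be expressed as a short Python sketch, which may be handy when a script has to locate the extracted files. The function name debuginfo_paths and the /tmp/extract directory are made up for this illustration; only the usr/lib/debug/lib/modules/<release> layout comes from the text.

```python
import platform
from pathlib import Path


def debuginfo_paths(extract_dir, release=None):
    """Return (vmlinux, modules_dir) under an extracted debuginfo RPM.

    release defaults to the running kernel, like $(uname -r).
    """
    if release is None:
        release = platform.release()
    base = Path(extract_dir) / "usr" / "lib" / "debug" / "lib" / "modules" / release
    return base / "vmlinux", base / "kernel"


# Hypothetical extraction directory; the release string is the one
# used by the vmcore in this article.
vmlinux, modules = debuginfo_paths(
    "/tmp/extract", "4.14.35-2047.511.5.5.3.el7uek.x86_64"
)
print(vmlinux)
print(modules)
```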

Running drgn on a Live kernel

To run drgn on a live kernel, just run the drgn command without the -c option, and it will open the running kernel for debugging. However, the “more correct” way to debug the running kernel is to pass the -c /proc/kcore option to drgn. We also need to specify the correct vmlinux to load, along with the module symbol files that we need, each with its own -s option:

# sudo drgn -c /proc/kcore -s path/to/vmlinux -s path/to/module1.ko.debug -s path/to/module2.ko.debug ...

Note: If the debuginfo RPM is installed via dnf, it is not necessary to use the -s option.

Running drgn on a vmcore

In this case, the core dump passed with -c is the vmcore, so the command is:

# drgn -c path/to/vmcore -s path/to/vmlinux -s path/to/module1.ko -s path/to/module2.ko ...
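The same setup can also be done from a standalone Python script instead of the drgn CLI. The following is a minimal sketch, assuming the drgn Python package is installed; the function name open_vmcore and the paths in the usage comment are placeholders, not part of drgn.

```python
def open_vmcore(core_path, debug_files):
    """Create a drgn Program for a vmcore and load debug symbols.

    debug_files is a list of paths (vmlinux and any module debug files),
    the script-level counterpart of the -c and -s command line options.
    """
    import drgn  # imported lazily so this file loads even without drgn

    prog = drgn.Program()
    prog.set_core_dump(core_path)      # counterpart of: drgn -c core_path
    prog.load_debug_info(debug_files)  # counterpart of: -s file, per entry
    return prog


# Usage (placeholder paths, not run here):
# prog = open_vmcore("path/to/vmcore", ["path/to/vmlinux"])
# print(prog["init_uts_ns"].name.release)
```

The returned object plays the same role as the prog variable that the CLI sets up automatically.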

Some Core concepts

Program

A program is the entity under analysis. It could be a live kernel, a vmcore or any running program. The drgn CLI is initialized with a Program named prog.

A program is used to look up symbols, type definitions, etc. We can access a variable in a program using the prog['name'] syntax. For example, to read the slab_caches symbol in a vmcore (the vmcore is our program), we can simply do:

>>> prog['slab_caches']
(struct list_head){
        .next = (struct list_head *)0xffff91e770e31e60,
        .prev = (struct list_head *)0xffff912dc7c08060,
}

Objects

All variables, constants, and functions are objects in drgn. An object can either refer to a location in the program's memory or be a plain value. Objects can be used in drgn scripts the same way they are used in the source code. For example, suppose ib_dev is an object of type struct ib_device. To obtain the name of the driver for this device, the ib_dev.dma_device.driver.name field can be read in a drgn script much as it would be in C, with . used for both member access and pointer dereference:

>>> ib_dev.dma_device.driver.name
(const char *)0xffffffffc06cef5e = "mlx4_core"

Helpers

drgn provides a set of helpers for common tasks, such as walking a list or getting stack traces. A detailed list of helpers is available here: https://drgn.readthedocs.io/en/latest/helpers.html.

Writing your first drgn script

As mentioned in the section describing Objects, it is simple to write scripts in drgn and access fields of complex structures. This capability, together with Python’s immense set of libraries, makes a programmer’s life easy. However, it also helps to be familiar with the crash utility in order to be comfortable with drgn. One can refer to the source code of crash to understand how it extracts information from a vmcore and try to replicate that in drgn. In this article, we will write a script to get the sysinfo details from a vmcore in drgn.

crash reads the init_uts_ns symbol to get most of the information related to a system.

Now we will replicate this part of the crash utility in drgn. Since crash reads init_uts_ns to get the sysinfo details, we will write code to read the same symbol in drgn:

>>> uts_namespace_detail = prog['init_uts_ns']
>>> 

Printing this variable will give us the values of this symbol.

>>> uts_namespace_detail
(struct uts_namespace){
        .kref = (struct kref){
                .refcount = (refcount_t){
                        .refs = (atomic_t){
                                .counter = (int)4,
                        },
                },
        },
        .name = (struct new_utsname){
                .sysname = (char [65])"Linux",
                .nodename = (char [65])"scao08adm03.us.oracle.com",
                .release = (char [65])"4.14.35-2047.511.5.5.3.el7uek.x86_64",
                .version = (char [65])"#2 SMP Thu May 5 19:33:38 PDT 2022",
                .machine = (char [65])"x86_64",
                .domainname = (char [65])"(none)",
        },
        .user_ns = (struct user_namespace *)init_user_ns+0x0 = 0xffffffffa9458780,
        .ucounts = (struct ucounts *)0x0,
        .ns = (struct ns_common){
                .stashed = (atomic_long_t){
                        .counter = (long)0,
                },
                .ops = (const struct proc_ns_operations *)utsns_operations+0x0 = 0xffffffffa8e2af00,
                .inum = (unsigned int)4026531838,
        },
}
>>> 

Now we have much of the useful information that the sys command in crash displays - like release, nodename, machine etc.

We can write the following code to print these in a more readable format:

>>> print(f'NODENAME : {uts_namespace_detail.name.nodename.string_().decode("utf-8")}\n')
NODENAME : scao08adm03.us.oracle.com

>>> print(f'RELEASE : {uts_namespace_detail.name.release.string_().decode("utf-8")}\n')
RELEASE : 4.14.35-2047.511.5.5.3.el7uek.x86_64

>>> print(f'VERSION : {uts_namespace_detail.name.version.string_().decode("utf-8")}\n')
VERSION : #2 SMP Thu May 5 19:33:38 PDT 2022

>>> print(f'MACHINE : {uts_namespace_detail.name.machine.string_().decode("utf-8")}\n')
MACHINE : x86_64

The string_() method reads the character array as a C string and returns it as a Python bytes object, which decode("utf-8") then converts to a str.
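To make the conversion concrete, here is a plain-Python sketch of what happens to a fixed-size character array such as char [65]: the bytes up to the first NUL terminator are kept and then decoded. The helper name c_string is made up for this illustration, and this is an emulation on a bytes buffer, not drgn's implementation.

```python
def c_string(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a NUL-padded C character array into a Python string."""
    end = raw.find(b"\0")
    if end != -1:
        raw = raw[:end]  # drop the terminator and any padding after it
    return raw.decode(encoding, errors="replace")


# A char [65] array holding "x86_64", padded with NUL bytes.
machine = b"x86_64".ljust(65, b"\0")
print(f"MACHINE : {c_string(machine)}")
```

Passing errors="replace" is a choice made for this sketch: a vmcore may contain corrupted strings, and a replacement character is usually preferable to an exception in the middle of a report.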

But usually, this is not all we get from sys. We get all of the following as well:

      KERNEL: /share/linuxrpm/vmlinux_repo/64/4.14.35-2047.511.5.5.3.el7uek.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Thu Jun 30 13:13:39 PDT 2022
      UPTIME: 9 days, 02:37:13
LOAD AVERAGE: 0.12, 0.12, 0.09
       TASKS: 716
    NODENAME: scao08adm03.us.oracle.com
     RELEASE: 4.14.35-2047.511.5.5.3.el7uek.x86_64
     VERSION: #2 SMP Thu May 5 19:33:38 PDT 2022
     MACHINE: x86_64  (2693 Mhz)
      MEMORY: 512 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at           (null)"
         PID: 11422
     COMMAND: "bash"
        TASK: ffff9a31787697c0  [THREAD_INFO: ffff9a31787697c0]
         CPU: 45
       STATE: TASK_RUNNING (PANIC)

To get date, uptime

The date-time related information can be obtained by reading the exported symbol shadow_timekeeper:

>>> prog['shadow_timekeeper']
(struct timekeeper){
        .tkr_mono = (struct tk_read_base){
                .clock = (struct clocksource *)clocksource_tsc+0x0 = 0xffffffffa942f460,
                .mask = (u64)18446744073709551615,
                .cycle_last = (u64)2126713617356984,
                .mult = (u32)6228816,
                .shift = (u32)24,
                .xtime_nsec = (u64)2390728062303892,
                .base = (ktime_t)787032911659616,
        },
        .tkr_raw = (struct tk_read_base){
                .clock = (struct clocksource *)clocksource_tsc+0x0 = 0xffffffffa942f460,
                .mask = (u64)18446744073709551615,
                .cycle_last = (u64)2126713617356984,
                .mult = (u32)6228758,
                .shift = (u32)24,
                .xtime_nsec = (u64)16194257266886124,
                .base = (ktime_t)787024000000000,
        },
        .xtime_sec = (u64)1656620019,
        .ktime_sec = (unsigned long)787033,
        .wall_to_monotonic = (struct timespec){
                .tv_sec = (__kernel_time_t)-1655832987,
                .tv_nsec = (long)911659616,
        },
        .offs_real = (ktime_t)1655832986088340384,
        .offs_boot = (ktime_t)0,
        .offs_tai = (ktime_t)1655832986088340384,
        .tai_offset = (s32)0,
        .clock_was_set_seq = (unsigned int)5,
        .cs_was_changed_seq = (u8)3,
        .next_leap_ktime = (ktime_t)9223372036854775807,
        .raw_sec = (u64)787024,
        .cycle_interval = (u64)2693509,
        .xtime_interval = (u64)16777371955344,
        .xtime_remainder = (s64)268178,
        .raw_interval = (u64)16777215731822,
        .ntp_tick = (u64)4295007633211392,
        .ntp_error = (s64)-228547328,
        .ntp_error_shift = (u32)8,
        .ntp_err_mult = (u32)0,
}

The fields xtime_sec and ktime_sec denote the time since the epoch and the current uptime, respectively, both in seconds. Here, we can use Python libraries to convert these times to a human-readable format. The time.ctime() function in Python converts a time in “seconds since the epoch” to a string in local time, that is, the time zone of the machine running drgn. The datetime library is useful for manipulating dates and times.

>>> import time
>>> timekeeper = prog['shadow_timekeeper']
>>> xtime_sec = int(timekeeper.xtime_sec)
>>> date = time.ctime(xtime_sec)
>>> date
'Thu Jun 30 13:13:39 2022'
>>> 

And uptime:

>>> import datetime
>>> uptime = str(datetime.timedelta(seconds=int(timekeeper.ktime_sec)))
>>> uptime
'9 days, 2:37:13'
>>> 
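Since time.ctime() depends on the time zone of the analysis machine, a script may prefer a time-zone-independent date and the zero-padded uptime layout that crash prints. The sketch below works on plain integers, using the xtime_sec and ktime_sec values from the dump above; the function names are made up for this illustration.

```python
from datetime import datetime, timezone


def format_date(xtime_sec: int) -> str:
    """Seconds since the epoch -> ISO 8601 string in UTC."""
    return datetime.fromtimestamp(xtime_sec, tz=timezone.utc).isoformat()


def format_uptime(ktime_sec: int) -> str:
    """Uptime in seconds -> 'D days, HH:MM:SS', similar to crash."""
    days, rest = divmod(ktime_sec, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days} days, {hours:02d}:{minutes:02d}:{seconds:02d}"


# Values taken from the shadow_timekeeper output in this article.
print(format_date(1656620019))
print(format_uptime(787033))
```

This prints 2022-06-30T20:13:39+00:00 and 9 days, 02:37:13. The first is the same instant as 13:13:39 PDT, and the second matches the UPTIME line shown by crash.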

Load average

Load average calculation is slightly more complicated as we need to adjust for exponentials and precision.

Crash code:

static char *
get_loadavg(char *buf)
{
        int a, b, c;
        long avenrun[3];

        readmem(symbol_value("avenrun"), KVADDR, &avenrun[0],
                sizeof(long)*3, "avenrun array", FAULT_ON_ERROR);

        a = avenrun[0] + (FIXED_1/200);
        b = avenrun[1] + (FIXED_1/200);
        c = avenrun[2] + (FIXED_1/200);
        sprintf(buf, "%d.%02d, %d.%02d, %d.%02d",
                LOAD_INT(a), LOAD_FRAC(a),
                LOAD_INT(b), LOAD_FRAC(b),
                LOAD_INT(c), LOAD_FRAC(c));

        return buf;
}

Here’s an example Python code snippet that replicates the crash code above for calculating the load average.

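>>> FSHIFT = 11             # bits of fixed-point precision
>>> FIXED_1 = 1 << FSHIFT   # 1.0 in fixed-point
>>> def load(a):
...     # integer part plus two-digit fraction, like LOAD_INT and LOAD_FRAC
...     whole = a >> FSHIFT
...     frac = ((a & (FIXED_1 - 1)) * 100) >> FSHIFT
...     return whole + frac / 100
...
>>> def load_avg(avenrun):
...     # adding FIXED_1/200 (0.005) rounds to two decimal places
...     vals = [int(avenrun[i]) + FIXED_1 // 200 for i in range(3)]
...     return ', '.join('{:0.2f}'.format(load(v)) for v in vals)
...
>>> load_avg(prog['avenrun'])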
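The fixed-point arithmetic is easier to follow on plain integers. In the sketch below, the load values are stored with 11 fractional bits, so 2048 represents 1.00. The raw values in sample are invented for illustration and are not read from the vmcore in this article; format_load is a made-up name.

```python
FSHIFT = 11            # number of fractional bits
FIXED_1 = 1 << FSHIFT  # 1.0 in fixed-point (2048)


def format_load(raw: int) -> str:
    """Format one fixed-point load value as 'I.FF'."""
    raw += FIXED_1 // 200                           # add 0.005 to round
    whole = raw >> FSHIFT                           # LOAD_INT
    frac = ((raw & (FIXED_1 - 1)) * 100) >> FSHIFT  # LOAD_FRAC
    return f"{whole}.{frac:02d}"


# Hypothetical raw values: 2048 is 1.0, 1024 is 0.5, 246 is about 0.12.
sample = [2048, 1024, 246]
print(", ".join(format_load(v) for v in sample))
```

This prints 1.00, 0.50, 0.12. Formatting the integer and fractional parts separately, as crash does with %d.%02d, avoids going through floating point at all.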

Memory

Memory also needs some basic conversion:

>>> pages = int(prog['totalram_pages'])
>>> pagesize = int(prog['PAGE_SIZE'])
>>> memory_bytes = pages*pagesize
>>> memory_KB = memory_bytes/1024
>>> memory_MB = memory_KB/1024
>>> memory_GB = memory_MB/1024
>>> memory_GB
503.36669158935547
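The three successive divisions by 1024 amount to a single division by 2^30. A small sketch on plain integers, where the function name pages_to_gib and the 4096-byte default page size are assumptions made for this illustration (in drgn the page size should be read from prog['PAGE_SIZE'] as above):

```python
def pages_to_gib(pages: int, page_size: int = 4096) -> float:
    """Convert a page count to GiB (1 GiB = 2**30 bytes)."""
    return pages * page_size / (1 << 30)


# 262144 pages of 4096 bytes are exactly 1 GiB.
print(f"{pages_to_gib(262144):.2f} GB")
```

This prints 1.00 GB.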

Similarly, the number of CPUs:

>>> int(prog["nr_cpu_ids"])
48

Task information:

Task related information, such as PID, comm and address can be obtained by extracting the current process on the crashing_cpu:

>>> task = per_cpu(prog["runqueues"], prog["crashing_cpu"]).curr
>>> task.comm
(char [16])"bash"
>>> task.pid
(pid_t)11422
>>> hex(task.value_())
'0xffff9a31787697c0'

Using Helpers

Now, let’s try to use some helpers in drgn. For this sample task, we will list all the files opened by all tasks. The helper provided by drgn to walk through each task is for_each_task(prog).

The code below lists all open files:

>>> for task in for_each_task(prog):
...    print(f'PID: {task.pid.value_()}    COMM: {task.comm.string_().decode("utf-8")}')
...    for fd, filp in for_each_file(task):
...        path = d_path(filp.f_path.address_of_()).decode('utf-8')
...        print(f'[{fd}:] {path}')
...

The first loop iterates through all tasks in the program using the helper for_each_task (https://drgn.readthedocs.io/en/latest/helpers.html#process-ids). The second loop iterates through the files opened by each task using the helper for_each_file (https://drgn.readthedocs.io/en/latest/helpers.html#virtual-filesystem-layer), which yields each file descriptor together with its struct file. The drgn helper d_path (https://drgn.readthedocs.io/en/latest/helpers.html#virtual-filesystem-layer) returns the full path of a file, given either a struct path or a mount and a dentry.

The address_of_() method returns a pointer to the object.
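The drgn CLI imports the helpers automatically, but a standalone script has to import them itself. The following sketch wraps the loop above in a function that collects the results instead of printing them. It assumes the drgn package is installed and that prog is an already initialized Program; the function name open_files_by_task is made up for this illustration.

```python
def open_files_by_task(prog):
    """Map (pid, comm) -> list of (fd, path) for every task in prog."""
    # Imported lazily so this file loads even where drgn is missing.
    from drgn.helpers.linux.fs import d_path, for_each_file
    from drgn.helpers.linux.pid import for_each_task

    result = {}
    for task in for_each_task(prog):
        pid = task.pid.value_()
        comm = task.comm.string_().decode("utf-8", "replace")
        files = []
        for fd, filp in for_each_file(task):
            path = d_path(filp.f_path.address_of_())
            files.append((fd, path.decode("utf-8", "replace")))
        result[(pid, comm)] = files
    return result


# Usage inside a drgn script (not run here):
# for (pid, comm), files in open_files_by_task(prog).items():
#     print(pid, comm, len(files))
```

Returning a dictionary rather than printing makes it easy to post-process the data with ordinary Python, for example to find the task with the most open files.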

Analysing complex data structures

Here is an example of how a complex data structure can be analysed in drgn. For this example, let’s try to get the various tasks in a runqueue and the time they have been waiting to get the CPU.

To get the runqueue pointer, we can use the per_cpu helper:

>>> runq = per_cpu(prog["runqueues"], 29)

Now that we have the runqueue, we can get the current task details by referring to the curr field of struct rq:

>>> print("CURRENT: PID: ", runq.curr.pid.value_(), "\tTASK: ", hex(runq.curr.address_of_()), "\tCOMMAND: ", runq.curr.comm.string_().decode("utf-8"))
CURRENT: PID:  84519    TASK:  0xffff8ce480d6b608       COMMAND:  ora_lms1_so_sis

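We can see that the currently running task is ora_lms1_so_sis. Note that address_of_() is applied here to the curr pointer itself, so the TASK value printed above is the address of the curr field inside struct rq; runq.curr.value_() gives the address of the task_struct. Now we can get the other tasks, i.e. the tasks in the RT PRIO_ARRAY and those in the CFS tree. To get the tasks in the RT PRIO_ARRAY, we can refer to the rt field in struct rq. So, in the same manner as we obtained the currently running task, we can get the tasks in the RT PRIO_ARRAY: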

>>> print("CURRENT: PID: ", runq.rt.rq.curr.pid.value_(), "\tTASK: ", hex(runq.rt.rq.curr.address_of_()), "\tCOMMAND: ", runq.rt.rq.curr.comm.string_().decode("utf-8"))
CURRENT: PID:  84519    TASK:  0xffff8ce480d6b608       COMMAND:  ora_lms1_so_sis

Here runq.rt.rq points back to the same runqueue, so this prints the same task as before: the RT task is the one that has acquired the CPU and is running.

Now, to get the tasks in the CFS tree, we can refer to the cfs_tasks field of struct rq. cfs_tasks is a list, so we can use the list helper of drgn:

>>> for t in list_for_each_entry("struct task_struct", runq.cfs_tasks.address_of_(), "se.group_node"):
...    print(t.comm)
...
(char [16])"ksoftirqd/29"
(char [16])"kworker/29:0"

As can be observed, there are two tasks in the CFS tree. Now let’s try to find out how long the tasks in the CFS tree have been waiting to be scheduled and get the CPU. This can be obtained by reading the last_queued field in the sched_info structure of each task. The last_queued field denotes the timestamp at which the task was queued to run. To get the wait time, we subtract this timestamp from the current runqueue clock, struct rq->clock (the values are in nanoseconds, so we convert them to seconds to make them easier to read):

>>> for t in list_for_each_entry("struct task_struct", runq.cfs_tasks.address_of_(), "se.group_node"):
...    wait_time = (runq.clock - t.sched_info.last_queued) / (1000 * 1000  * 1000)
...    print("Task : %s, waiting for CPU since : %d sec" %(t.comm.string_().decode("utf-8"), wait_time))
...
Task : ksoftirqd/29, waiting for CPU since : 241 sec
Task : kworker/29:0, waiting for CPU since : 4 sec
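The wait-time arithmetic can be isolated in a small function that works on plain integers, which makes it easy to reuse and to sort tasks by how long they have waited. This is a sketch under stated assumptions: the timestamps below are invented, the names are made up, and treating a last_queued value of 0 as "not currently queued" is an assumption that should be checked against the kernel version being analysed.

```python
NSEC_PER_SEC = 1_000_000_000


def wait_seconds(rq_clock_ns: int, last_queued_ns: int):
    """Seconds a task has been waiting, or None if it is not queued."""
    if last_queued_ns == 0:  # assumption: 0 means "not waiting"
        return None
    return (rq_clock_ns - last_queued_ns) / NSEC_PER_SEC


# Hypothetical nanosecond timestamps, for illustration only.
clock = 787_033_000_000_000
last_queued = {
    "task-a": 786_792_000_000_000,
    "task-b": 787_029_000_000_000,
    "task-c": 0,
}

waits = {name: wait_seconds(clock, ts) for name, ts in last_queued.items()}
queued = {name: w for name, w in waits.items() if w is not None}
# Longest-waiting task first.
for name, w in sorted(queued.items(), key=lambda item: item[1], reverse=True):
    print(f"Task : {name}, waiting for CPU since : {w:.0f} sec")
```

With real data, the two arguments would come from int(runq.clock) and int(t.sched_info.last_queued).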

Getting function arguments and Local variables of a function in a stack trace

It is very easy to obtain the values of function arguments and even local variables of a function in a stack trace in drgn. While using crash, one may need to analyze the full stack trace in order to get the correct stack address of the variables and calculate addresses and offsets by hand. This process is error-prone. With drgn, however, it is much simpler and more straightforward. For example, look at this stack trace:

#0  sysrq_handle_crash (drivers/tty/sysrq.c:147)
#1  __handle_sysrq (drivers/tty/sysrq.c:559)
#2  write_sysrq_trigger (drivers/tty/sysrq.c:1106)
#3  proc_reg_write (fs/proc/inode.c:231)
#4  __vfs_write (fs/read_write.c:480)
#5  vfs_write (fs/read_write.c:544)
#6  SYSC_write (fs/read_write.c:590)
#7  SyS_write (fs/read_write.c:582)
#8  do_syscall_64 (arch/x86/entry/common.c:298)
#9  entry_SYSCALL_64+0x191/0x293 (arch/x86/entry/entry_64.S:238)

This is a sysrq-triggered crash. Say the user wants to get the name of the file, which is an argument of the function __vfs_write:

ssize_t __vfs_write(struct file *file, const char __user *p, size_t count, loff_t *pos)

To get this value using drgn, the user can simply access the stack trace frame of __vfs_write() and use the argument name as key:

>>> task = per_cpu(prog["runqueues"], prog["crashing_cpu"]).curr
>>> trace = prog.stack_trace(task)
>>> trace
#0  sysrq_handle_crash (drivers/tty/sysrq.c:147)
#1  __handle_sysrq (drivers/tty/sysrq.c:559)
#2  write_sysrq_trigger (drivers/tty/sysrq.c:1106)
#3  proc_reg_write (fs/proc/inode.c:231)
#4  __vfs_write (fs/read_write.c:480)
#5  vfs_write (fs/read_write.c:544)
#6  SYSC_write (fs/read_write.c:590)
#7  SyS_write (fs/read_write.c:582)
#8  do_syscall_64 (arch/x86/entry/common.c:298)
#9  entry_SYSCALL_64+0x191/0x293 (arch/x86/entry/entry_64.S:238)
>>> file = trace[4]["file"]
>>> file.f_path.dentry.d_iname
(unsigned char [32])"sysrq-trigger"

Similarly, consider that the user wants to get the struct thread_info *ti local variable of function do_syscall_64():

>>> ti = trace[8]["ti"]
>>> ti
*(struct thread_info *)0xffff9a31787697c0 = {
        .flags = (unsigned long)2147483776,
        .status = (u32)0,
}
>>>
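Hard-coded frame numbers such as trace[4] change from one vmcore to the next, so a script may prefer to look a frame up by function name. The sketch below relies only on iterating over the trace, the name attribute of each frame, and indexing a frame by variable name. The helper names find_frame and frame_variable are made up for this illustration, and the availability of the name attribute on stack frames is an assumption that depends on the drgn version in use.

```python
def find_frame(trace, func_name):
    """Return the first stack frame whose function name matches, else None."""
    for frame in trace:
        if frame.name == func_name:
            return frame
    return None


def frame_variable(trace, func_name, var_name):
    """Look up an argument or local variable by name in a named frame."""
    frame = find_frame(trace, func_name)
    if frame is None:
        raise LookupError(f"{func_name} not found in stack trace")
    return frame[var_name]


# Usage inside drgn (not run here):
# trace = prog.stack_trace(task)
# file = frame_variable(trace, "__vfs_write", "file")
# ti = frame_variable(trace, "do_syscall_64", "ti")
```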

Conclusion

In this article, we went through the steps to install drgn and use it with the correct debug symbols, and we also wrote some drgn code. This is a very high-level introduction. Much more complex analysis can be done using the helpers and Python libraries not discussed in this article, and users can write their own helpers as well.

Hope you have fun with drgn!

Anand Khoje

