Live Debugging Techniques for the Linux Kernel, Part 3 of 3

November 10, 2021 | 18 minute read

Introduction

In this three-part series of blog posts, I will attempt to detail many of the various methods that I have used to debug running kernels. In the previous two posts in this series, I explained how to set up an Oracle Linux virtual machine where you can experiment with various debugging techniques, and covered some of the basic techniques that I have used. This final post will discuss some more advanced techniques which use tools specifically designed for kernel debugging.

Getting Started

Before we begin, you'll want to start the debugging VM that we set up in the first post, using a command like this:

sudo virsh start ol8.4

Once the VM has booted, log into it with ssh. We're now ready to start experimenting with the kernel's built-in debugging tools, and the crash utility.

kdb and kgdboc

These two utilities are built directly into the kernel. kdb allows you to perform some fairly simple debugging tasks like setting breakpoints and examining/modifying memory locations. kgdboc builds upon this by letting you attach gdb to a running system. kgdboc provides pretty much everything that qemu's gdbserver functionality will provide, aside from the qemu-specific monitor commands.

Attach kdb to the VM's Additional Serial Device

First log into your VM as root and issue the command:

sed -i 's/GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 kgdboc=ttyS1\"/' /etc/default/grub

And then run:

grub2-mkconfig -o /boot/grub2/grub.cfg

This will bind the kdb and kgdboc interfaces to the ttyS1 serial device that we created during the initial virt-install in the first post of this series. This is why that command needed two --serial switches.

Once this is done, run the following command in the VM:

shutdown now

Once that's completed, run the following command from the host system to bring the VM back up:

sudo virsh start ol8.4

After that, we can query the VM to determine the path to the new character device that was created for ttyS1, using a command like this:

sudo virsh qemu-monitor-command --pretty --domain ol8.4 --cmd '{"execute": "query-chardev"}' | grep -B1 charserial1

The output should look something like this:

      "filename": "pty:/dev/pts/3",
      "label": "charserial1"

Connect to the kdb Console

First we need to install screen:

sudo dnf --repo ol8_developer_EPEL install screen

Note that other utilities, such as minicom, can also be used to connect to the character device associated with the serial port. I'm using screen because it does what I need without any extra configuration.

Now use screen to attach to the pty device path that we gathered above:

sudo screen /dev/pts/3

Set and Trigger a Breakpoint Using kdb

At this point, the screen console should be blank, and won't respond to any keyboard input, other than the various Ctrl-A commands that screen understands. In order to do something useful here, we need to log into the VM and issue the following command:

echo g > /proc/sysrq-trigger

This will halt execution on the VM and activate a kdb prompt in the screen session where we attached to the VM's extra serial port. You'll note that the SSH connection will become unresponsive until we resume execution.

From here, we can run the help command, which will show us all of the other possible commands we can run. Most of these are fairly self-explanatory, so we won't go into great detail about the individual commands here. Instead, we'll examine how we can use kdb to set a breakpoint, and achieve a similar effect to what we did previously with qemu's gdbserver.
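
Before we do that, for quick reference, here are a few of the kdb commands that I find most useful. This list is from memory rather than captured from this VM, and the exact set available can vary between kernel versions:

md <addr>      Display memory contents starting at <addr>
rd             Display CPU register contents
bt             Stack traceback for the current process
btc            Stack traceback for the active task on each CPU
ps             Display the process list
lsmod          List loaded kernel modules
dmesg          Print the kernel log buffer
bp <symbol>    Set a breakpoint at <symbol>
go             Resume execution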

To do this, enter the following commands into the kdb prompt:

bp show_cpuinfo
go

This sets a breakpoint on show_cpuinfo, and then resumes execution. Now that the breakpoint is set, we can go back to our SSH connection, which should be responsive again, and enter the following command:

cat /proc/cpuinfo

At this point, we'll see a response similar to this in our screen session:

Entering kdb (current=0xffff88806b828000, pid 904) on processor 0 due to Breakpoint @ 0xffffffff8104c650
[0]kdb>

At this point we can enter the bt command to show that we have indeed hit the expected breakpoint:

[0]kdb> bt
Stack traceback for pid 904
0xffff88806b828000      904      860  1    0   R  0xffff88806b8293c0 *cat
Call Trace:
 ? show_cpuinfo+0x1/0x3f1
 ? seq_read+0x157/0x435
 proc_reg_read+0x3e/0x60
 __vfs_read+0x1b/0x34
 vfs_read+0x99/0x152
 ksys_read+0x61/0xd2
 __x64_sys_read+0x1a/0x1c
 do_syscall_64+0x60/0x1cb
 entry_SYSCALL_64_after_hwframe+0x170/0x0
RIP: 0033:0x7fbe902e95b5
Code: fe ff ff 50 48 8d 3d 82 f7 09 00 e8 85 fe 01 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 6f 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
RSP: 002b:00007ffebcd106c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fbe902e95b5
RDX: 0000000000020000 RSI: 00007fbe90766000 RDI: 0000000000000003
RBP: 00007fbe90766000 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 00007fbe90766000
R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000

We can now enter the go command again to continue on from here. Note that the breakpoint will be triggered once for each VCPU configured on the VM. If you set up your VM exactly the same as I did in the first post (with two VCPUs), you'll need to issue the go command a second time here to fully resume execution.
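
If you want to remove the breakpoint afterwards, drop back into kdb (echo g > /proc/sysrq-trigger from the VM again) and use kdb's breakpoint-management commands. A minimal sketch, assuming ours is breakpoint number 0:

bl
bc 0
go

Here bl lists the installed breakpoints, bc 0 clears breakpoint 0, and go resumes execution.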

Set and Trigger a Breakpoint Using kgdboc

At this point, basically all of the necessary setup to use kgdboc has already been done. The main difference here is that we'll connect gdb directly to the additional serial port on the VM, instead of using the kdb console on that serial port.

In order to do this, you'll need to first determine the baud rate of your additional serial port. To do that, run the following command on the VM:

stty < /dev/ttyS1

Your output should be similar to the following:

speed 9600 baud; line = 0;
-brkint -imaxbel

Here we can see that the baud rate on the second serial port in my VM is 9600. With that, we now have enough information to connect gdb directly to the VM using kgdboc.

First we need to start gdb on the host system, pointed at your kernel with debuginfo, like this:

sudo gdb /usr/lib/debug/usr/lib/modules/5.4.17-2102.201.3.el8uek.x86_64/vmlinux

Note that we need to use sudo when starting gdb this time, because we need to be able to read and write to the devices under /dev/pts.

Once gdb is running, we can go ahead and drop the VM into kdb by running the following command on it:

echo g > /proc/sysrq-trigger

Back in the gdb session, enter the following commands:

set serial baud <YOUR_SPEED>
set architecture i386:x86-64:intel
target remote <YOUR_PTS_DEVICE>

If you have not shut down your VM since the previous steps, the pts device path should be the same. Otherwise, you'll need to run the qemu-monitor-command from above to determine the proper path again. In my case, the commands looked like this:

set serial baud 9600
set architecture i386:x86-64:intel
target remote /dev/pts/3

If everything was successful, you should get a response from gdb that looks like this:

Remote debugging using /dev/pts/3
kgdb_breakpoint () at kernel/debug/debug_core.c:1139
1139            wmb(); /* Sync point after breakpoint */

From here, you have access to all of the regular kdb commands seen above via gdb's monitor command (just prefix any kdb command with monitor, e.g. monitor help), but you also have access to the full functionality of gdb for source-level debugging and more.
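
For example, any of the following, entered at the gdb prompt while the target is stopped, will pass the corresponding kdb command through to the VM:

monitor ps
monitor lsmod
monitor dmesg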

For now, we'll just set a breakpoint at the same place as we did in the kdb example, using the following commands:

break show_cpuinfo
continue

At this point, you can go back to the ssh session connected to your VM and run:

cat /proc/cpuinfo

gdb will now indicate that the breakpoint has been hit, with a message similar to:

Thread 130 hit Breakpoint 1, show_cpuinfo (m=0xffff888035c48300, v=0xffff88806c410260) at arch/x86/kernel/cpu/proc.c:58
58      {

Here you can issue the continue command (once for each VCPU), to get your system running normally again.

Going Further With kdb and kgdboc

Now that we've seen how to set these utilities up at a basic level, it's probably worth discussing some more "practical" applications for these two techniques. At the end of the day, when operating on a local VM, neither of these tricks provides us with much utility beyond what we could have expected from qemu's gdbserver functionality. In my opinion, these utilities are much more useful for debugging bare-metal systems.

With kdb, you're able to do quite a bit of debugging, as long as you have some level of access to a serial console for your system. It's important to note here that if you only have one serial port available, that port can serve as both the regular serial console and the kdb console. You can even use kgdboc over that same port, with the understanding that the regular serial console won't be usable while gdb is connected. Using a bit of ssh or socat trickery, it's also often possible to bounce the serial communications from a remote machine over the network, and connect gdb to a network socket instead of a serial device.
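
As a rough sketch of that network-bounce approach (the device path, port number, baud rate, and hostname below are all placeholders to adapt to your setup): on the machine that is physically attached to the target's serial port, expose the device over TCP with socat:

sudo socat TCP-LISTEN:5551,reuseaddr,fork /dev/ttyUSB0,raw,echo=0,b115200

Then, from gdb on your workstation, connect to that TCP endpoint instead of a local device:

set architecture i386:x86-64:intel
target remote serial-host.example.com:5551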

It's also worth mentioning that any of the Python scripts we used previously when connected to qemu's gdbserver can be used here as well. You just need to pull the scripts in with gdb's source command, and they all function the same as they would with qemu.
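
As a quick reminder of the general form (the script path here is just a placeholder for whichever script you want to load), from the gdb prompt:

source /root/my_debug_script.py

Anything the script defines then behaves exactly as it would in a qemu gdbserver session.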

crash

The final utility we'll discuss here is crash. While generally used for kernel memory dumps taken after a system has panicked (or explicitly triggered by a user), crash can also be used on a live system, to perform debugging tasks similar to what can be achieved with gdb pointed at /proc/kcore. The major difference with using crash is that it has quite a bit of built-in kernel knowledge and functionality.

First off, it's not necessary to disable KASLR when using crash: it's KASLR-aware and can match symbol locations appropriately without any special boot parameters. This is handy when you want to do some debugging but aren't able to reboot the system or modify the kernel command line. crash also has built-in knowledge of /proc/kallsyms, so even if you don't have a debug kernel handy, you can still easily examine anything whose symbol address is recorded in /proc/kallsyms, without having to grep for the symbol name and copy/paste the address into your gdb session. Beyond this extra knowledge about locating symbols, crash also provides a number of kernel-specific commands to aid in debugging.
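
For example, once you have a crash session running (which we'll set up next), you can resolve a symbol's runtime address, or map an address back to a symbol, straight from the crash prompt, with KASLR already accounted for. A couple of illustrative invocations (output omitted, since the addresses will differ on every boot):

crash> sym show_cpuinfo
crash> sym ffffffff8104c650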

crash Basics

To get started with crash, we first install it like this:

dnf install crash

Since we already have the debug kernel available, I'm going to point my crash session at it, but, as mentioned before, this is not entirely necessary. Start crash like this:

crash /path/to/vmlinux

When debugging a kernel dump generated by kdump or virsh dump, you invoke crash like this:

crash /path/to/vmlinux /path/to/kernel/dump

Note that, in the latter case, it is necessary to provide the path to the debug kernel binary. Without it, crash is unable to properly interpret the memory layout of the kernel dump.
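
As a sketch of that second form (the dump file path here is arbitrary; the domain name and vmlinux path are the ones we've been using throughout), you could capture a dump of the running VM from the host and then open it like this:

sudo virsh dump --memory-only --format elf ol8.4 /var/tmp/ol8.4.core
crash /usr/lib/debug/usr/lib/modules/5.4.17-2102.201.3.el8uek.x86_64/vmlinux /var/tmp/ol8.4.core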

Once crash has loaded, we can run help to see the various commands available to us:

crash> help

*              extend         log            rd             task
alias          files          mach           repeat         timer
ascii          foreach        mod            runq           tree
bpf            fuser          mount          search         union
bt             gdb            net            set            vm
btop           help           p              sig            vtop
dev            ipcs           ps             struct         waitq
dis            irq            pte            swap           whatis
eval           kmem           ptob           sym            wr
exit           list           ptov           sys            q

For a detailed explanation of each command you can run:

help <command>

I won't attempt to explain all of the available commands here, but a few simple ones that might be of interest are ps and kmem. The ps command does exactly what one might expect: it provides information similar to what the userspace ps utility provides, along with some extra details that can be useful for debugging:

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>     0      0   0  ffffffff82414780  RU   0.0       0      0  [swapper/0]
      0      0   1  ffff88800fdd1800  RU   0.0       0      0  [swapper/1]
      1      0   0  ffff88800fd96000  IN   0.5  186316  10160  systemd
      2      0   0  ffff88800fd91800  IN   0.0       0      0  [kthreadd]
      3      2   0  ffff88800fd93000  ID   0.0       0      0  [rcu_gp]
      4      2   0  ffff88800fd94800  ID   0.0       0      0  [rcu_par_gp]
...

There are a number of switches that can be passed to ps to filter and modify this output in various ways. See help ps for more information.
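
A few of the switches that I tend to reach for are listed below; this is just a sampling from memory, so treat help ps as the authoritative reference on your version:

ps -k          Restrict the output to kernel threads
ps -u          Restrict the output to user tasks
ps -t          Show run time information for each task
ps -a          Show command-line arguments and environment strings
ps -p <pid>    Show the parental hierarchy of a task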

The kmem command provides kernel memory usage and page table information. kmem requires a switch to operate. The most basic flavor of the kmem command is probably kmem -i, which just provides an overview of memory usage on the system:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   434685       1.7 GB         ----
         FREE    19598      76.6 MB    4% of TOTAL MEM
         USED   415087       1.6 GB   95% of TOTAL MEM
       SHARED    15938      62.3 MB    3% of TOTAL MEM
      BUFFERS        1         4 KB    0% of TOTAL MEM
       CACHED   261038    1019.7 MB   60% of TOTAL MEM
         SLAB    12755      49.8 MB    2% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  1048575         4 GB         ----
    SWAP USED     6051      23.6 MB    0% of TOTAL SWAP
    SWAP FREE  1042524         4 GB   99% of TOTAL SWAP

 COMMIT LIMIT  1265917       4.8 GB         ----
    COMMITTED   169034     660.3 MB   13% of TOTAL LIMIT
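
Beyond -i, a few other kmem invocations are worth knowing about; again, this is just a sampling, and help kmem has the full details:

kmem -s            Display kmem_cache (slab) statistics
kmem -f            Display the free page lists
kmem -v            Display the vmalloc'd address ranges
kmem <address>     Report what kind of memory the given address belongs to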

This is just a small example of what can be done with the various crash commands. I'd strongly encourage anyone who is unfamiliar with crash to at least skim the help output for each of the commands. There is a lot of useful functionality lurking here - many of the switches might more accurately be described as "subcommands", as they can completely alter the output or behavior of a particular command. In the past, I have definitely caught myself using gdb commands to investigate something manually when there was a crash command that could have done a lot of the work for me.

It is also important to note that crash is basically a wrapper around gdb, so most basic gdb commands work as expected in crash. For instance, we can look at the contents of linux_banner in the exact same manner that we used previously:

crash> print linux_banner
$1 = 0xffffffff81e001c0 <linux_banner> "Linux version 5.4.17-2102.201.3.el8uek.x86_64 (mockbuild@host-100-100-224-44) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.8) (GCC)) #2 SMP Fri Apr 23 09:05:57 PDT 2021\n"

crash Python Scripts

crash also has extensions which enable you to run Python scripts against your running kernel, similar to what can be done with the plain gdb Python scripts. This was where I originally learned how to use Python scripts for kernel debugging, so I personally find the crash variety more intuitive than the plain gdb counterparts, but I believe that much of the "helper" functionality provided by the Python interpreter module is also now provided by the scripts that ship with the kernel.

In order to use Python scripts with crash, you need to download the module from: https://sourceforge.net/projects/pykdump/files/mpykdump-x86_64/. This tarball contains an mpykdump64.so and a crash64 binary. In general, I use the crash that ships with my distro and then load in the mpykdump64 module. You can also use the crash binary that comes in this tarball, but it may not work well for all kernels. Either way, in order to load the mpykdump module, use the extend command:

crash> extend /root/usr/local/lib/mpykdump64.so
Setting scroll off while initializing PyKdump
/root/usr/local/lib/mpykdump64.so: shared object loaded

At this point, you can see several newly available commands in the help output, but the one we're concerned with here is the epython command. You can pass the path of a Python script to this command, and crash will run it. Here's an example Python script that I wrote to walk the VMA list of a process:

import argparse
from pykdump.API import *

parser = argparse.ArgumentParser()
parser.add_argument('task_struct', action='store', help='Task struct address')

args = parser.parse_args()

task_struct_addr = int(args.task_struct, 16)
print('Dumping VMA info for task struct: {:x}'.format(task_struct_addr))
task = readSU('struct task_struct', task_struct_addr)
mm = readSU('struct mm_struct', task.mm)

table_hdr_fmt_str = '{:32}{:32}{:32}'
table_fmt_str = '{:<32x}{:<32x}{:<32x}'
print(table_hdr_fmt_str.format('VMA', 'Start', 'End'))

cur_vma = readSU('struct vm_area_struct', mm.mmap)
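# Follow the vm_next pointers until we reach the end of the VMA list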
while True:
        print(table_fmt_str.format(cur_vma, cur_vma.vm_start, cur_vma.vm_end))

        if cur_vma.vm_next != 0:
                cur_vma = readSU('struct vm_area_struct', cur_vma.vm_next)
        else:
                break

I've saved this script off to /root/vma_walk.py in my VM. Here's an example of what it looks like when run on the task_struct address of a process running in my VM:

crash> epython /root/vma_walk.py ffff888034da4800
Dumping VMA info for task struct: ffff888034da4800
VMA                             Start                           End
ffff888067c039f8                55e5765ef000                    55e576d33000
ffff888067c03740                55e576f33000                    55e576fd0000
ffff8880679b4000                55e576fd0000                    55e576ff5000
ffff888067c033a0                55e576ff5000                    55e5772b1000
ffff88806ab933a0                55e57884e000                    55e582870000
ffff88805f97a000                7f83a8c78000                    7f83a8eb8000
ffff88805f97aae0                7f83a8eb8000                    7f83a8ebf000
ffff88805f97a2b8                7f83a8ebf000                    7f83a90be000
ffff88805f97a488                7f83a90be000                    7f83a90bf000
ffff88805f97a910                7f83a90bf000                    7f83a90c0000
ffff88805f97a9f8                7f83a90c0000                    7f83a90c2000
...

It's probably good to note that the vm command provided by crash will do this same thing for you, but this is a good, simple example to give you an idea of what these types of scripts can do. For complete information about all the Python functionality provided by mpykdump64.so, you can refer to the documentation provided here: https://pykdump.readthedocs.io/en/latest/developerdoc/reference.html

There is also quite a bit more useful information in other parts of these documents that I won't attempt to cover here, but it's another thing that's certainly valuable to at least skim.

For even more examples of what can be done here, take a look at: https://github.com/neilbrown/lustre/tree/master/contrib/debug_tools/epython_scripts

These scripts are mostly specific to Lustre debugging (aside from uniqueStacktrace.py), but they are still interesting examples of how powerful these Python scripts can be.

Conclusion

While I've barely scratched the surface of what can be done using live debugging utilities, I hope that this series of blog posts provides a useful introduction to these various tools and techniques, and gives you information about where to go to learn more.

Alex Thorlton

