This is the final post in a three-part series detailing the various methods I have used to debug running kernels. In the first two posts, I explained how to set up an Oracle Linux virtual machine for experimenting with different debugging techniques, and covered some of the basic techniques that I use. This final post discusses some more advanced techniques that rely on tools specifically designed for kernel debugging.
Before we begin, you'll want to start the debugging VM that we set up in the first post, using a command like this:
sudo virsh start ol8.4
Once the VM has booted, log into it with ssh. We're now ready to start experimenting with the kernel's built-in debugging tools, and the crash utility.
kdb and kgdboc are two debugging interfaces built directly into the kernel. kdb allows you to perform some fairly simple debugging tasks, like setting breakpoints and examining or modifying memory locations. kgdboc builds on this by letting you attach gdb to a running system. kgdboc provides pretty much everything that qemu's gdbserver functionality provides, aside from the qemu-specific monitor commands.
First log into your VM as root and issue the command:
sed -i 's/GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 kgdboc=ttyS1\"/' /etc/default/grub
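This appends kgdboc=ttyS1 to whatever kernel arguments are already listed in GRUB_CMDLINE_LINUX. As a purely hypothetical example, if your existing line were:

GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 rhgb quiet"

it would end up as:

GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 rhgb quiet kgdboc=ttyS1"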
And then run:
grub2-mkconfig -o /boot/grub2/grub.cfg
This will bind the kdb and kgdboc interfaces to the ttyS1 serial device that we created when we ran the initial virt-install in the first post. This is why that command needed two --serial switches.
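As an aside, the kgdboc parameter can also be set or verified at runtime through sysfs, without touching the boot configuration; a minimal sketch (run as root on the VM):

echo ttyS1 > /sys/module/kgdboc/parameters/kgdboc
cat /sys/module/kgdboc/parameters/kgdboc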
Once this is done, run the following command in the VM:
shutdown now
Once that's completed, run the following command from the host system to bring the VM back up:
sudo virsh start ol8.4
After that, we can query the VM to determine the path to the new character device that was created for ttyS1, using a command like this:
sudo virsh qemu-monitor-command --pretty --domain ol8.4 --cmd '{"execute": "query-chardev"}' | grep -B1 charserial1
The output should look something like this:
"filename": "pty:/dev/pts/3", "label": "charserial1"
To connect to that character device, we first need to install screen:
sudo dnf --repo ol8_developer_EPEL install screen
Note that other utilities, such as minicom, can also be used to connect to the character device associated with the serial port. I'm using screen because it does what I need without any extra configuration.
Now use screen to attach to the pty device path that we gathered above:
sudo screen /dev/pts/3
At this point, the screen console should be blank, and won't respond to any keyboard input, other than the various Ctrl-A commands that screen understands. In order to do something useful here, we need to log into the VM and issue the following command:
echo g > /proc/sysrq-trigger
This will halt execution on the VM and activate a kdb prompt in the screen session where we attached to the VM's extra serial port. You'll note that the SSH connection will become unresponsive until we resume execution.
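If echoing g here doesn't drop the VM into kdb, the magic SysRq interface may be restricted on your system; you can check the current setting and, if necessary, enable all SysRq functions with sysctl:

sysctl kernel.sysrq
sysctl -w kernel.sysrq=1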
From here, we can run the help command, which will show us all of the other possible commands we can run. Most of these are fairly self-explanatory, so we won't go into great detail about the individual commands here. Instead, we'll examine how we can use kdb to set a breakpoint, and achieve a similar effect to what we did previously with qemu's gdbserver.
To do this, enter the following commands into the kdb prompt:
bp show_cpuinfo
go
This sets a breakpoint on show_cpuinfo, and then resumes execution. Now that the breakpoint is set, we can go back to our SSH connection, which should be responsive again, and enter the following command:
cat /proc/cpuinfo
At this point, we'll see a response similar to this in our screen session:
Entering kdb (current=0xffff88806b828000, pid 904) on processor 0 due to Breakpoint @ 0xffffffff8104c650
[0]kdb>
At this point we can enter the bt command to show that we have indeed hit the expected breakpoint:
[0]kdb> bt
Stack traceback for pid 904
0xffff88806b828000      904      860  1    0   R  0xffff88806b8293c0 *cat
Call Trace:
 ? show_cpuinfo+0x1/0x3f1
 ? seq_read+0x157/0x435
 proc_reg_read+0x3e/0x60
 __vfs_read+0x1b/0x34
 vfs_read+0x99/0x152
 ksys_read+0x61/0xd2
 __x64_sys_read+0x1a/0x1c
 do_syscall_64+0x60/0x1cb
 entry_SYSCALL_64_after_hwframe+0x170/0x0
RIP: 0033:0x7fbe902e95b5
Code: fe ff ff 50 48 8d 3d 82 f7 09 00 e8 85 fe 01 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 6f 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
RSP: 002b:00007ffebcd106c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fbe902e95b5
RDX: 0000000000020000 RSI: 00007fbe90766000 RDI: 0000000000000003
RBP: 00007fbe90766000 R08: 00000000ffffffff R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 00007fbe90766000
R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000
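While the system is stopped at a breakpoint like this, the rest of the kdb command set is available for poking around; for example, rd dumps the CPU registers and md displays memory at a given address (the address below is just the breakpoint address from my session):

[0]kdb> rd
[0]kdb> md 0xffffffff8104c650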
We can now enter the go command again to continue on from here. Note that the breakpoint will be triggered once for each VCPU configured on the VM. If you set up your VM exactly as described in the first post, you'll need to issue the go command a second time here to fully resume execution.
At this point, basically all of the necessary setup to use kgdboc has already been done. The main difference here is that we'll connect gdb directly to the additional serial port on the VM, instead of using the kdb console on that serial port.
In order to do this, you'll need to first determine the baud rate of your additional serial port. To do that, run the following command on the VM:
stty < /dev/ttyS1
Your output should be similar to the following:
speed 9600 baud; line = 0;
-brkint -imaxbel
Here we can see that the baud rate on the second serial port in my VM is 9600. With that, we now have enough information to connect gdb directly to the VM using kgdboc.
First we need to start gdb on the host system, pointed at your kernel with debuginfo, like this:
sudo gdb /usr/lib/debug/usr/lib/modules/5.4.17-2102.201.3.el8uek.x86_64/vmlinux
Note that we need to use sudo when starting gdb this time, because we need to be able to read and write to the devices under /dev/pts.
Once gdb is running, we can go ahead and drop the VM into kdb by running the following command on it:
echo g > /proc/sysrq-trigger
Back in the gdb session, enter the following commands:
set serial baud <YOUR_SPEED>
set architecture i386:x86-64:intel
target remote <YOUR_PTS_DEVICE>
If you have not shut down your VM since the previous steps, the pts device path should be the same. Otherwise, you'll need to run the qemu-monitor-command from above to determine the proper path again. In my case, the commands looked like this:
set serial baud 9600
set architecture i386:x86-64:intel
target remote /dev/pts/3
If everything was successful, you should get a response from gdb that looks like this:
Remote debugging using /dev/pts/3
kgdb_breakpoint () at kernel/debug/debug_core.c:1139
1139            wmb(); /* Sync point after breakpoint */
From here, you have access to all of the regular kdb commands seen above, via gdb's monitor command (just prepend any kdb command with monitor to run it, i.e. monitor help), but you also have access to the functionality of gdb to do source level debugging, etc.
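For example, kdb's ps and lsmod commands can be run from within the gdb session like this:

(gdb) monitor ps
(gdb) monitor lsmod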
For now, we'll just set a breakpoint at the same place as we did in the kdb example, using the following commands:
break show_cpuinfo
continue
At this point, you can go back to the ssh session connected to your VM and run:
cat /proc/cpuinfo
gdb will now indicate that the breakpoint has been hit, with a message similar to:
Thread 130 hit Breakpoint 1, show_cpuinfo (m=0xffff888035c48300, v=0xffff88806c410260)
    at arch/x86/kernel/cpu/proc.c:58
58      {
Here you can issue the continue command (once for each VCPU), to get your system running normally again.
Now that we've seen how to set up this utility on a basic level, it's probably worth discussing some more "practical" applications for these two techniques. At the end of the day, when operating on a local VM, neither of these tricks provides us with much utility beyond what we could have expected from qemu's gdbserver functionality. In my opinion, these utilities are much more useful for debugging bare-metal systems.
With kdb, you can do quite a bit of debugging as long as you have some level of access to a serial console for your system. It's important to note here that if you're only able to use one serial port on your system, that port can serve as both the regular serial console and the kdb console. You can even use kgdboc over that same serial port, with the understanding that the regular serial console won't be usable while gdb is connected. With a bit of ssh or socat trickery, it's also often possible to bounce the serial communications from a remote machine over the network and connect gdb to a network socket instead of a serial device, as shown in the sketch below.
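That socat approach might look something like the following (the serial device, baud rate, host name, and TCP port are all placeholders). On the machine with physical access to the serial port, export it over TCP:

socat TCP-LISTEN:5551,reuseaddr FILE:/dev/ttyUSB0,b115200,raw

Then, in gdb on your workstation, connect to that socket instead of a pty:

(gdb) target remote <serial-host>:5551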
It's also worth mentioning that any of the Python scripts we used in the previous post when connected to qemu's gdbserver can be used here as well. You just need to pull them in using gdb's source command, and they function exactly the same as they would with qemu.
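For instance, the gdb helper scripts that ship with the kernel source tree can be loaded the same way; assuming you have a source tree that matches your running kernel and the scripts have been built, something like this gives you commands such as lx-ps and lx-dmesg:

(gdb) source /path/to/kernel-source/vmlinux-gdb.py
(gdb) lx-ps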
The final utility we'll discuss here is crash. While generally used for kernel memory dumps taken after a system has panicked (or explicitly triggered by a user), crash can also be used on a live system, to perform debugging tasks similar to what can be achieved with gdb pointed at /proc/kcore. The major difference with using crash is that it has quite a bit of built-in kernel knowledge and functionality.
First off, it's not really necessary to disable KASLR when using crash, as it's KASLR-aware, and able to match symbol locations appropriately without having to boot the system with KASLR disabled. This is handy when you want to do some debugging, but maybe aren't able to reboot the system, or modify the kernel command line. crash also has some built-in knowledge of /proc/kallsyms, so even if you don't have a debug kernel handy, you can still easily examine anything for which the symbol address is stored in /proc/kallsyms, without having to grep for the symbol name and copy/paste the address into your gdb session. Aside from the few bits of extra kernel knowledge that crash has built in regarding locating symbols, there are also a number of kernel-specific commands that crash provides to aid in debugging.
To get started with crash, we first install it like this:
dnf install crash
Since we already have the debug kernel available, I'm going to point my crash session at it, but, as mentioned before, this is not entirely necessary. Start crash like this:
crash /path/to/vmlinux
When debugging a kernel dump generated by kdump or virsh dump, you invoke crash like this:
crash /path/to/vmlinux /path/to/kernel/dump
Note that, in the latter case, it is necessary to provide the path to the debug kernel binary. Without it, crash is unable to properly interpret the memory layout of the kernel dump.
Once crash has loaded, we can run help to see the various commands available to us:
crash> help

*              extend         log            rd             task
alias          files          mach           repeat         timer
ascii          foreach        mod            runq           tree
bpf            fuser          mount          search         union
bt             gdb            net            set            vm
btop           help           p              sig            vtop
dev            ipcs           ps             struct         waitq
dis            irq            pte            swap           whatis
eval           kmem           ptob           sym            wr
exit           list           ptov           sys            q
For a detailed explanation of each command you can run:
help <command>
I won't attempt to explain all of the available commands here, but a few simple ones that might be of interest are ps and kmem. The ps command does exactly what you might expect: it provides output similar to the userspace ps utility, along with some extra information that can be useful for debugging:
crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>     0      0   0  ffffffff82414780  RU   0.0       0      0  [swapper/0]
      0      0   1  ffff88800fdd1800  RU   0.0       0      0  [swapper/1]
      1      0   0  ffff88800fd96000  IN   0.5  186316  10160  systemd
      2      0   0  ffff88800fd91800  IN   0.0       0      0  [kthreadd]
      3      2   0  ffff88800fd93000  ID   0.0       0      0  [rcu_gp]
      4      2   0  ffff88800fd94800  ID   0.0       0      0  [rcu_par_gp]
...
There are a number of switches that can be passed to ps to filter and modify this output in various ways. See help ps for more information.
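For example, ps -k restricts the listing to kernel threads, while ps -u shows only user-space tasks:

crash> ps -k
crash> ps -u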
The kmem command provides kernel memory usage and page table information. kmem requires a switch to operate. The most basic flavor of the kmem command is probably kmem -i, which just provides an overview of memory usage on the system:
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   434685       1.7 GB         ----
         FREE    19598      76.6 MB    4% of TOTAL MEM
         USED   415087       1.6 GB   95% of TOTAL MEM
       SHARED    15938      62.3 MB    3% of TOTAL MEM
      BUFFERS        1         4 KB    0% of TOTAL MEM
       CACHED   261038    1019.7 MB   60% of TOTAL MEM
         SLAB    12755      49.8 MB    2% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP  1048575         4 GB         ----
    SWAP USED     6051      23.6 MB    0% of TOTAL SWAP
    SWAP FREE  1042524         4 GB   99% of TOTAL SWAP

 COMMIT LIMIT  1265917       4.8 GB         ----
    COMMITTED   169034     660.3 MB   13% of TOTAL LIMIT
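Other switches dig deeper; for instance, kmem -s summarizes the slab caches, and passing a bare address asks crash to identify the page or slab object that the address falls within:

crash> kmem -s
crash> kmem <address>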
This is just a small example of what can be done with the various crash commands. I'd strongly encourage anyone who is unfamiliar with crash to at least skim the help output for each command. There is a lot of useful functionality lurking here; many of the switches might more accurately be described as "subcommands", since they can completely alter the output or behavior of a particular command. More than once I've realized, after the fact, that I had been manually investigating something with gdb commands when a single crash command could have done most of the work for me.
It's also important to note that crash is essentially a wrapper around gdb, so most basic gdb commands work as expected within crash. For instance, we can look at the contents of linux_banner in exactly the same manner that we used previously:
crash> print linux_banner
$1 = 0xffffffff81e001c0 <linux_banner> "Linux version 5.4.17-2102.201.3.el8uek.x86_64 (mockbuild@host-100-100-224-44) (gcc version 8.3.1 20190507 (Red Hat 8.3.1-4.5.0.8) (GCC)) #2 SMP Fri Apr 23 09:05:57 PDT 2021\n"
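crash's own struct command goes a step further than gdb's print, formatting a structure (or selected members of it) at an arbitrary kernel address; for example, using the systemd task address from the ps output above:

crash> struct task_struct.pid,comm ffff88800fd96000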
crash also has extensions which enable you to run Python scripts against your running kernel, similar to what can be done with the plain gdb Python scripts. This was where I originally learned how to use Python scripts for kernel debugging, so I personally find the crash variety more intuitive than the plain gdb counterparts, but I believe that much of the "helper" functionality provided by the Python interpreter module is also now provided by the scripts that ship with the kernel.
In order to use Python scripts with crash, you need to download the module from https://sourceforge.net/projects/pykdump/files/mpykdump-x86_64/. This tarball contains an mpykdump64.so and a crash64 binary. In general, I use the crash that ships with my distro and then load in the mpykdump64 module. You can also use the crash binary that comes in the tarball, but it may not work well for all kernels. Either way, to load the mpykdump module, use the extend command:
crash> extend /root/usr/local/lib/mpykdump64.so
Setting scroll off while initializing PyKdump
/root/usr/local/lib/mpykdump64.so: shared object loaded
At this point, you can see several newly available commands in the help output, but the one we're concerned with here is the epython command. You can pass the path of a Python script to this command, and crash will run it. Here's an example Python script that I wrote to walk the VMA list of a process:
import argparse
from pykdump.API import *

# Parse the task_struct address passed on the epython command line
parser = argparse.ArgumentParser()
parser.add_argument('task_struct', action='store', help='Task struct address')
args = parser.parse_args()

task_struct_addr = int(args.task_struct, 16)
print('Dumping VMA info for task struct: {:x}'.format(task_struct_addr))

# Read the task_struct and its mm_struct from kernel memory
task = readSU('struct task_struct', task_struct_addr)
mm = readSU('struct mm_struct', task.mm)

table_hdr_fmt_str = '{:32}{:32}{:32}'
table_fmt_str = '{:<32x}{:<32x}{:<32x}'
print(table_hdr_fmt_str.format('VMA', 'Start', 'End'))

# Walk the singly-linked VMA list, printing each mapping's address range
cur_vma = readSU('struct vm_area_struct', mm.mmap)
while True:
    print(table_fmt_str.format(cur_vma, cur_vma.vm_start, cur_vma.vm_end))
    if cur_vma.vm_next != 0:
        cur_vma = readSU('struct vm_area_struct', cur_vma.vm_next)
    else:
        break
I've saved this script off to /root/vma_walk.py in my VM. Here's an example of what it looks like when run on the task_struct address of a process running in my VM:
crash> epython /root/vma_walk.py ffff888034da4800
Dumping VMA info for task struct: ffff888034da4800
VMA                             Start                           End
ffff888067c039f8                55e5765ef000                    55e576d33000
ffff888067c03740                55e576f33000                    55e576fd0000
ffff8880679b4000                55e576fd0000                    55e576ff5000
ffff888067c033a0                55e576ff5000                    55e5772b1000
ffff88806ab933a0                55e57884e000                    55e582870000
ffff88805f97a000                7f83a8c78000                    7f83a8eb8000
ffff88805f97aae0                7f83a8eb8000                    7f83a8ebf000
ffff88805f97a2b8                7f83a8ebf000                    7f83a90be000
ffff88805f97a488                7f83a90be000                    7f83a90bf000
ffff88805f97a910                7f83a90bf000                    7f83a90c0000
ffff88805f97a9f8                7f83a90c0000                    7f83a90c2000
...
It's probably good to note that the vm command provided by crash will do this same thing for you, but this is a good simple example to give you an idea of what these types of scripts can do. For complete information about all the Python functionality provided by mpykdump64.so, you can refer to the documentation provided here: https://pykdump.readthedocs.io/en/latest/developerdoc/reference.html
There is also quite a bit more useful information in other parts of these documents that I won't attempt to cover here, but it's another thing that's certainly valuable to at least skim.
For even more examples of what can be done here, take a look at: https://github.com/neilbrown/lustre/tree/master/contrib/debug_tools/epython_scripts
These scripts are mostly specific to Lustre debugging (aside from uniqueStacktrace.py), but they are still interesting examples of how powerful these Python scripts can be.
While I've barely scratched the surface of what can be done using live debugging utilities, I hope that this series of blog posts provides a useful introduction to these various tools and techniques, and gives you information about where to go to learn more.