Live Debugging Techniques for the Linux Kernel, Part 1 of 3

October 27, 2021 | 12 minute read

Introduction

When investigating Linux kernel bugs, or developing new kernel features, it is often incredibly valuable to be able to examine the state of the system as it is running, rather than to use log messages and other bits of information gathered "after the fact." In this three-part series of blog posts, I will attempt to detail many of the various methods that I have used to debug running kernels. In this first post, I will explain how to set up an Oracle Linux virtual machine where you can experiment with the debugging techniques that will be discussed. The second and third posts will cover "basic" debugging techniques which utilize gdb, and some more advanced techniques which use tools specifically designed for kernel debugging.

Setting up a Virtual Machine for Testing

These instructions will show how to set up an Oracle Linux virtual machine (VM) for debugging, but most of what I will cover can be generalized to any modern Linux distribution. Setting up a VM is an easy way to experiment with these methods, without the need for spare machines and other hardware, such as serial cables, USB adapters, etc. For this reason, I will mainly focus on debugging VMs. However, I will attempt to give details and examples of how these techniques can be used on bare metal systems, where appropriate.

Preparing to Create a VM

Before creating a VM, you'll need to install a few dependencies. I'll show how to do this on a bare metal OCI instance, running the most recent Oracle Linux 8 OCI image. The process should be similar for Linux distributions that use the yum or dnf package manager. Commands and package names will vary for other distributions.

Note that I'm doing this on a bare metal OCI server, as these are readily available to me, and the procedure used should closely match what one would do when setting this up on their personal machine.

Install qemu, libvirt and Other Tools

First we need to install the required qemu and libvirt packages:

sudo dnf install qemu-kvm qemu-img libvirt libvirt-client virt-install libguestfs-tools

Start the libvirtd Service

Once the required packages are installed, you'll need to start libvirtd:

sudo systemctl restart libvirtd

Note that this service is enabled by default on Oracle Linux. However, on other distributions, this may not be the case, so you might want to do the following to ensure the service is enabled after every reboot:

sudo systemctl enable libvirtd

Creating a VM

Once libvirt and qemu are properly installed, you'll need to download a base image for your virtual machine. I'm using the latest OL8 image, which is currently found here:
    http://yum.oracle.com/templates/OracleLinux/OL8/u4/x86_64/OL8U4_x86_64-olvm-b85.qcow2

Links to the most up-to-date images should always be available here:
    http://yum.oracle.com/oracle-linux-templates.html

Download the VM Image

In order to create the VM, the image needs to be downloaded to your machine. I do this using wget:

wget http://yum.oracle.com/templates/OracleLinux/OL8/u4/x86_64/OL8U4_x86_64-olvm-b85.qcow2

Move the Image to the Appropriate Location

By default, libvirt stores VM images at /var/lib/libvirt/images. While your image does not necessarily need to be stored in that directory, it's probably best to just go ahead and put it there:

sudo mv OL8U4_x86_64-olvm-b85.qcow2 /var/lib/libvirt/images

Set the root Password in the VM

These VMs do not come with a default root password set, so you will need to set one before bringing up the VM for the first time:

sudo virt-customize -a /var/lib/libvirt/images/OL8U4_x86_64-olvm-b85.qcow2 --root-password password:INSERT_PASSWORD_HERE

Use virt-install to Import the Image

Now you can use the following virt-install incantation to create a VM from the qcow2 image:

sudo virt-install                                                       \
        --disk /var/lib/libvirt/images/OL8U4_x86_64-olvm-b85.qcow2      \
        --os-variant ol8.4                                              \
        --memory 2048                                                   \
        --vcpus 2                                                       \
        --console pty,target_type=virtio                                \
        --serial pty                                                    \
        --serial pty                                                    \
        --network default                                               \
        --noautoconsole                                                 \
        --import

Test the VM's root Login

Once the VM has been created, you should ensure that you can log into it. In order to find the IP address for your VM, use:

sudo virsh domifaddr ol8.4

If this command doesn't return anything, the issue is likely that the VM has not booted up far enough to request an IP. Give it a minute and try again.

The output from that command will look something like this:

 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet9      52:54:00:4c:70:e5    ipv4         192.168.122.202/24

The IP is in the address column. You can try logging into your VM using that IP, like this:

ssh root@192.168.122.202

Note that you need to strip off the subnet mask (i.e. the /24 portion) here. If everything went well, you should now be able to log in with the password you provided to virt-customize.
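
If you'd rather not copy the address by hand, the stripping step can be scripted. This is only a sketch: the function name vm_ip_from_domifaddr is made up for illustration, and it assumes the four-column output layout shown above.

```shell
# Print the first IPv4 address (without the /prefix) found in
# `virsh domifaddr` output read from stdin.
# Assumes the layout shown above: Name, MAC address, Protocol, Address.
vm_ip_from_domifaddr() {
    awk '$3 == "ipv4" { sub(/\/.*/, "", $4); print $4; exit }'
}
```

With that defined, something like ssh root@"$(sudo virsh domifaddr ol8.4 | vm_ip_from_domifaddr)" should log you in directly. If the VM has no address yet, the function prints nothing.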

Copy SSH Public Key to the VM

To make logging into the VM easier in the future, you can run:

ssh-copy-id root@VM_IP

Where VM_IP is the IP address of your VM. If you get an error like this:

/usr/bin/ssh-copy-id: ERROR: No identities found

Then you need to run ssh-keygen first to create an SSH key to copy.
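
The two steps can be combined so that a key is only generated when one is missing. This is a rough sketch: both function names are hypothetical, the id_*.pub pattern is my approximation of what ssh-copy-id looks for, and ed25519 is just one reasonable key type.

```shell
# Succeed if the given directory (default: ~/.ssh) holds at least one
# public key named id_*.pub.
has_ssh_identity() {
    dir=${1:-$HOME/.ssh}
    for key in "$dir"/id_*.pub; do
        [ -e "$key" ] && return 0
    done
    return 1
}

# Generate a key only when none exists, then copy it to the VM.
# $1 is the VM's IP address (VM_IP in the text above).
copy_key_to_vm() {
    has_ssh_identity || ssh-keygen -t ed25519 || return 1
    ssh-copy-id "root@$1"
}
```

Calling copy_key_to_vm 192.168.122.202 would then prompt for a new key only if you don't already have one.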

Disable KASLR in the VM and Adjust the console Parameter

This step is not strictly necessary, but it is extremely helpful, as it ensures that symbol addresses found in the kernel binary will match up with what is found on the running system. It is easy enough to determine the KASLR offset and add it to the symbol addresses found in the debuginfo binary, but when doing live debugging we usually have full control of the system, meaning that it's trivial to disable KASLR.
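
For the curious, the offset arithmetic looks roughly like this. It's a sketch under a few assumptions: the function name is made up, the addresses are passed as 16-digit hex strings (the format printed by /proc/kallsyms and nm), and both addresses share the same upper 32 bits, so only the low halves are subtracted. The example addresses in the comments are illustrative, not taken from a real system.

```shell
# Print the KASLR offset as hex, given the runtime address and the
# link-time address of the same symbol (for example _text).
# Assumption: both are 16-digit hex strings with identical upper 32 bits,
# so subtracting the low 8 digits is enough. This also keeps the values
# well inside the range of signed shell arithmetic.
kaslr_offset() {
    runtime_low=${1#????????}   # e.g. ffffffff9a000000 -> 9a000000
    static_low=${2#????????}    # e.g. ffffffff81000000 -> 81000000
    printf '0x%x\n' "$(( 0x$runtime_low - 0x$static_low ))"
}
```

The runtime address of _text can be read (as root) from /proc/kallsyms, and the link-time address from running nm on the vmlinux binary. With those two hypothetical values, kaslr_offset ffffffff9a000000 ffffffff81000000 prints 0x19000000. Once nokaslr is in effect, the two addresses should be equal and the offset 0x0.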

Unfortunately, grub2-editenv, the tool normally used to modify the kernel command line stored in the grub configuration, does not provide a straightforward method for appending to the kernel command line. Instead, we'll use sed to add the nokaslr argument to the existing arguments in /etc/default/grub, and then regenerate the grub configuration so that the kernelopts variable picks up the updated list of arguments.

First, ssh to the testing VM:

ssh root@<VM_IP>

Next run this command to properly adjust the kernel's console parameter:

sed -i 's/console=tty0/console=ttyS0,115200/' /etc/default/grub

Then run the following command to append nokaslr to the kernel command line:

sed -i 's/GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 nokaslr\"/' /etc/default/grub
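
One caveat with that sed command is that it appends nokaslr again each time it is run. If you expect to re-run these steps, a guarded variant along these lines may be handy. This is a sketch with a made-up function name; it assumes GNU sed and an argument that contains no slashes or regex metacharacters.

```shell
# Append ARG to GRUB_CMDLINE_LINUX in FILE unless it is already present.
# Usage: append_grub_cmdline_arg FILE ARG
append_grub_cmdline_arg() {
    file=$1
    arg=$2
    # Already there (delimited by a quote or a space on both sides)?
    if grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]$arg[\" ]" "$file"; then
        return 0
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 $arg\"/" "$file"
}
```

Running append_grub_cmdline_arg /etc/default/grub nokaslr twice leaves a single nokaslr on the line.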

Now update the actual grub configuration:

grub2-mkconfig -o /boot/grub2/grub.cfg

Finally, we can check to make sure this actually worked as expected:

# grep 'set kernelopts' /boot/grub2/grub.cfg
  set kernelopts="root=/dev/mapper/vg_main-lv_root ro console=ttyS0,115200 no_timer_check net.ifnames=0 biosdevname=0 crashkernel=auto resume=/dev/mapper/vg_main-lv_swap rd.lvm.lv=vg_main/lv_root rd.lvm.lv=vg_main/lv_swap nokaslr "

At this point, the VM needs to be rebooted in order to pick up the new command line arguments. Issue the reboot command and then log back into the VM.

Once you've logged back into the VM, issue the following command:

cat /proc/cmdline

You should be able to see the nokaslr argument at the end of the command line string.
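
If you want a check that can be dropped into a script rather than eyeballed, something like the following should work. The function name is hypothetical, and the optional second argument exists only so the check can be pointed at a file other than /proc/cmdline.

```shell
# Succeed if ARG appears as a whole word in a kernel command line file.
# Usage: cmdline_has_arg ARG [FILE]   (FILE defaults to /proc/cmdline)
cmdline_has_arg() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, cmdline_has_arg nokaslr && echo "KASLR is disabled" prints the message only when the argument made it onto the running kernel's command line.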

Gathering Kernel Source and Debug Info

Two important pieces of the "debugging puzzle" are the kernel source code, and a kernel binary that matches the kernel you intend to debug, which has been built with debug info.

When debugging a hand-built kernel, one can easily get the debug info built into the kernel by setting CONFIG_DEBUG_INFO=y in the kernel config, and running a build. However, in order to debug vendor-built kernels, you sometimes need to go a bit out of your way to locate a kernel that has been built with debug info. Note that most distributions will provide a "debug kernel" package, which is generally not what we're looking for here. The "debug kernel" is usually a kernel that has been built with various debugging features turned on, but, in most cases, this does not imply that the debug info has been built into that kernel. Oracle, and most other vendors, provide debuginfo packages, which contain binaries that have not been stripped, so that they still contain data useful for debugging tools such as gdb.

Kernel Source

The kernel running in our VM is a recent UEK kernel. There are two potential methods for acquiring the source for this kernel: either by downloading and partially preparing the source RPM, or by cloning the linux-uek repository from GitHub and checking out the appropriate tag.

Source RPM

I'm usually looking to hack on a kernel when doing work like this, so I prefer to go the source RPM route. This gives you access to the source, and allows you to easily build a kernel that will exactly match the running kernel. However, it is important to note that the code contained in the source RPM does not include git history, so you'll want to clone the git repository if you need that.

I'm running the following commands on the host system, not inside the guest VM. In some cases you will want the debuginfo binary available inside the VM, but for those cases, I find that it's best to just copy it in from the host system.

Anyway, in order to acquire the source RPM for your kernel, you can run a command similar to this:

dnf download --source kernel-uek-5.4.17-2102.201.3.el8uek

You'll need to replace the exact kernel version with whatever is appropriate for your scenario, but this is what I used to grab the source for my kernel.

Now the source RPM needs to be installed:

rpm -i kernel-uek-5.4.17-2102.201.3.el8uek.src.rpm

Once you've got the source installed, you'll need to run the %prep step of the rpmbuild command, to get it extracted and ready to be built.

First install rpm-build and dnf-utils on your host system:

sudo dnf install rpm-build dnf-utils

Now run yum-builddep on the source RPM's specfile:

sudo yum-builddep rpmbuild/SPECS/kernel-uek.spec

Note that, while this script still carries the yum naming scheme, it does, in fact, use dnf under the hood, so it will "do the right thing" for our OS.

Once the build dependencies are installed, you can use rpmbuild to run the %prep step of the build, which will extract and prepare the source for us:

rpmbuild -bp rpmbuild/SPECS/kernel-uek.spec

Once this is finished, you can find the kernel source at:

/root/rpmbuild/BUILD/kernel-5.4.17/linux-5.4.17-2102.201.3.el8uek

And the appropriate config for building a kernel that matches the running kernel at:

/root/rpmbuild/SOURCES/config-x86_64

You can copy in this config if you want to do something like make cscope to generate a cscope database, or make menuconfig to modify the config before building a custom kernel.

For our purposes there is really no need to build the kernel from source, but if you want to do so, you can run something like:

rpmbuild --short-circuit --define '_smp_mflags -j22' -bc rpmbuild/SPECS/kernel-uek.spec

This will compile the kernel, but the build will not generate an actual binary RPM. You can replace -bc with -bb if you want to build an installable RPM.

Note that -j22 tells make how many jobs to run concurrently. You should replace the 22 with a value appropriate for your system, such as the number of CPUs it has available.
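
Rather than hard-coding the job count, you can derive it from the machine. This is a small sketch; the function name is made up, and it assumes nproc (from coreutils) is available, which it should be on Oracle Linux.

```shell
# Print a make job flag sized to this machine: "-j" followed by whatever
# number nproc reports.
smp_mflags() {
    printf '%s\n' "-j$(nproc)"
}
```

The build command then becomes something like rpmbuild --short-circuit --define "_smp_mflags $(smp_mflags)" -bc rpmbuild/SPECS/kernel-uek.spec.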

Git Repository

The source for UEK kernels can also be acquired by cloning the linux-uek repository from GitHub.

If you choose to go this route, you'll first need to clone the repo with:

git clone https://github.com/oracle/linux-uek

Once this is complete, you'll need to check out the appropriate tag. In my case, I'm working with the 5.4.17-2102.201.3.el8uek.x86_64 kernel, so I'll check out the v5.4.17-2102.201.3 tag, like so:

git checkout v5.4.17-2102.201.3

At this point you should have the appropriate source for your kernel checked out. If you need to run a build of this kernel, you'll need to ensure that all of the appropriate tools are installed on your system (as would be done by yum-builddep for the source RPM), and that you have the appropriate kernel config. Generally the easiest way to locate the proper config is to just look in /boot on a system where the kernel is installed.
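
The mapping from kernel release to tag name can be sketched as a one-liner. Note the assumption: I'm generalizing from the single example above (drop everything from ".el" onward and prepend "v"), and the function name is made up. Check the result against git tag --list if the checkout fails.

```shell
# Map a kernel release string (as printed by uname -r) to a linux-uek tag
# name, assuming tags follow the pattern of the example above.
uek_tag_from_release() {
    printf 'v%s\n' "${1%%.el*}"
}
```

Inside the VM, uek_tag_from_release "$(uname -r)" prints the tag to check out. The matching config is typically the file named /boot/config-$(uname -r) on that same system.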

Kernel Debug Info

First, add the debuginfo repository to your dnf configuration:

sudo yum-config-manager --add-repo https://oss.oracle.com/ol8/debuginfo/

Note that OL6 and OL7 debuginfo RPMs can be found at these locations, respectively:

The debuginfo for most packages that ship on Oracle Linux, including the Red Hat compatible kernels, can be found at these locations.

Now you'll need to locate the debuginfo package for your kernel. You can search for all the available debuginfo packages for the UEK kernel by doing something like this:

dnf search --showduplicates 'kernel-uek-debuginfo'

There will be quite a few results here, so it helps to grep for the kernel version you're interested in. In my case, the kernel I'm after is the same one I'm running, so I can do:

dnf search --showduplicates 'kernel-uek-debuginfo' | grep -F "$(uname -r)"

This will likely return 2 results that look something like this:

kernel-uek-debuginfo-5.4.17-2102.201.3.el8uek.x86_64 : Debug information for package kernel-uek
kernel-uek-debuginfo-common-5.4.17-2102.201.3.el8uek.x86_64 : Kernel source files used by kernel-uek-debuginfo packages

Here we're only interested in the kernel-uek-debuginfo RPM. When installed, it will pull the debuginfo-common RPM in as a dependency, so we do not need to explicitly install that one. In order to install the RPM, do the following:

sudo dnf install kernel-uek-debuginfo-5.4.17-2102.201.3.el8uek.x86_64

After the debuginfo RPMs are installed, the binaries containing the debug info can be found at:

/usr/lib/debug/usr/lib/modules/5.4.17-2102.201.3.el8uek.x86_64

Note that the debug info binaries for any kernel modules that ship with the base kernel package can be found here as well.
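
Since both the package name and the install directory are built from the kernel release string, they can be derived rather than typed out. This is a sketch with made-up function names, following the naming seen in the output above.

```shell
# Print the debuginfo package name for a kernel release string.
debuginfo_pkg() {
    printf 'kernel-uek-debuginfo-%s\n' "$1"
}

# Print the directory where that package installs its debug binaries.
debuginfo_dir() {
    printf '/usr/lib/debug/usr/lib/modules/%s\n' "$1"
}
```

For example, sudo dnf install "$(debuginfo_pkg "$(uname -r)")" followed by ls "$(debuginfo_dir "$(uname -r)")". Keep in mind that uname -r reports the kernel of the machine it runs on, so if your host runs a different kernel than the VM, pass the VM's release string explicitly.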

Shut Down the VM

At this point, the VM is ready for us to begin trying out some debugging techniques. For now though, we want to shut it down, to avoid wasting system resources. This can be done by logging into the VM through ssh and issuing the command:

shutdown now
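
If you'd like to confirm from the host that the VM has actually powered off, you can poll its state. The helper below is a generic sketch with a made-up name: it re-runs whatever command you give it until the output matches. It assumes the domain is named ol8.4, as in the virsh domifaddr example earlier.

```shell
# Run a command once per second until it prints WANTED, or give up after
# TRIES attempts.
# Usage: wait_for_output WANTED TRIES COMMAND [ARGS...]
wait_for_output() {
    wanted=$1
    tries=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        [ "$("$@" 2>/dev/null)" = "$wanted" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For instance, wait_for_output "shut off" 60 sudo virsh domstate ol8.4 returns successfully once libvirt reports the domain as shut off. As an alternative to logging in over ssh, sudo virsh shutdown ol8.4 asks the guest to power down from the host side.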

Until next time

In the next post I'll discuss how to use gdb both directly on the guest OS, and from the host system, using Qemu's gdbserver functionality.

Alex Thorlton

