When a major customer kernel issue happens, the Oracle Linux Support team is ready to take action. They receive memory dumps of the crashed kernel (called vmcores) and use them to identify the root cause of bugs. Unfortunately, these vmcore files can be quite large, since they scale with the size of a server’s memory. For our largest customers, this can be many hundreds of gigabytes. These files can take quite a lot of time to upload, and that’s time we can’t spend diagnosing the issue.

For a while now, support engineers have been asking for a way for customers to distill the information provided by a vmcore into something small, which could be uploaded immediately, allowing support to resolve their issues even before the vmcore is uploaded. To meet this need, we’ve created a tool called Corelens, which provides a first look at a vmcore. We can now have customers run Corelens and upload the resulting report immediately, helping to speed resolutions. What’s more, there’s no fiddling with debuginfo: Corelens can provide these reports without installing any debuginfo packages!

While we may never eliminate the need for a full core dump, especially when diagnosing never-before-seen issues, Corelens addresses an important need for many other kernel-related problems.

Information Provided by Corelens

Corelens provides a report containing several pieces of useful information for diagnosing kernel issues. Its report is split into several modules that each provide a specific kind of output. Some modules provide information similar to what you might get from userspace applications at the time of the crash:

  • The dmesg module provides the contents of the kernel log.
  • The bt module provides stack traces of the tasks running on each CPU, as well as all tasks in an uninterruptible sleep.
  • The meminfo module provides information similar to what you’d find in the /proc/meminfo file.
  • The slabinfo module provides information similar to that provided by /proc/slabinfo or slabtop.

System administrators familiar with these facilities can already find useful information here to get an idea of why a kernel crash occurred. However, Corelens also provides more in-depth information that can help solve specific kernel issues.

For instance, customers may experience issues related to the dentry cache or to page cache usage. Corelens can show which files are taking up the most space in the page cache:

$ corelens ./vmcore -M filecache --limit 5
PAGES  SIZE       FS_TYPE  FILE
46025  179.8 MiB  xfs      /var/lib/rpm/Packages
11265  44.0 MiB   xfs      /usr/lib/debug/usr/lib/modules/5.4.17-2136.321.4.el8uek.x86_64/kernel/fs/xfs/xfs.ko.debug
10107  39.5 MiB   xfs      /var/oled/crash/127.0.0.1-2023-11-14-01:07:22/vmcore
7840   30.6 MiB   xfs      /boot/initramfs-5.4.17-2136.321.4.el8uek.x86_64kdump.img
4455   17.4 MiB   xfs      /usr/libexec/oracle-cloud-agent/plugins/oci-wlp/oci-wlp

$ corelens /proc/kcore -M filecache --limit 5
warning: Running corelens against a live system.
         Data may be inconsistent, or corelens may crash.
PAGES   SIZE       FS_TYPE  FILE
135821  530.6 MiB  ext4     /home/stepbren/repos/linux.git/objects/pack/pack-2d56cdac769d71ccf0f21ad5791aebcee3883316.pack
95230   372.0 MiB  ext4     /home/stepbren/repos/linux.git/objects/pack/pack-ffbcf6f230a68d3d895da62090f7fdf9dcad1fd8.idx
81979   320.2 MiB  ext4     /home/stepbren/repos/linux.git/objects/pack/pack-ffbcf6f230a68d3d895da62090f7fdf9dcad1fd8.pack
62469   244.0 MiB  ext4     /home/stepbren/.config/Slack/IndexedDB/https_app.slack.com_0.indexeddb.blob/1/10/1095
38779   151.5 MiB  ext4     /usr/lib/slack/slack
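The SIZE column is PAGES multiplied by the page size. As a quick sanity check, the minimal sketch below reproduces the sizes from the vmcore example. It assumes 4 KiB pages (the x86_64 default) and is not Corelens’s own code:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86_64 default


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> str:
    """Convert a page-cache page count into a MiB string like the SIZE column."""
    return f"{pages * page_size / (1024 * 1024):.1f} MiB"


# Page counts taken from the vmcore example above
for pages in (46025, 11265, 10107, 7840, 4455):
    print(pages, pages_to_mib(pages))
```

Running it prints 179.8, 44.0, 39.5, 30.6 and 17.4 MiB, which matches the report.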

Similarly, the dentrycache module shows which directory entries are present in the dentry cache. The ls module, which runs only when explicitly requested with -M, can count the positive and negative dentries in a given directory.

$ corelens /proc/kcore -M dentrycache --limit 5
warning: Running corelens against a live system.
         Data may be inconsistent, or corelens may crash.
00001 /home/stepbren/.emacs.d/.local/straight/build-29.1/cmake-mode/evil-snipe.elc
00002 /home/stepbren/.emacs.d/.local/straight/build-29.1/counsel/vc-git.elc
00003 /usr/share/man/man3/probe::sunrpc.svc.register.3stap.gz
00004 /usr/lib64/libKPipeWire.so.5.27.11
00005 /home/stepbren/repos/linux-upstream/fs/iomap/swapfile.c

$ corelens /proc/kcore -M ls /usr/lib64/ -c
warning: Running corelens against a live system.
         Data may be inconsistent, or corelens may crash.
3474 positive, 85 negative dentries
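In the kernel, a dentry whose d_inode pointer is NULL is a negative dentry. It caches the fact that a looked-up name does not exist. The sketch below models that counting rule in plain Python. The Dentry class and the sample names are hypothetical stand-ins for the kernel’s struct dentry. They are not the structures Corelens actually walks:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Dentry:
    """Hypothetical stand-in for struct dentry: a name and an inode pointer."""
    name: str
    d_inode: Optional[int]  # None models a NULL d_inode


def count_dentries(children: Iterable[Dentry]) -> Tuple[int, int]:
    """Return (positive, negative) counts for a directory's child dentries."""
    positive = negative = 0
    for dentry in children:
        if dentry.d_inode is None:
            negative += 1  # name was looked up, but no such file exists
        else:
            positive += 1  # dentry refers to a real inode
    return positive, negative


# Hypothetical children of a directory
children = [
    Dentry("libc.so.6", 0xFFFF888012345678),
    Dentry("libmissing.so.1", None),
    Dentry("libz.so.1", 0xFFFF888012349ABC),
]
pos, neg = count_dentries(children)
print(f"{pos} positive, {neg} negative dentries")
```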

Like most Corelens modules, these commands don’t just work on vmcores! They can also be used to get the same information from a currently running system, using the /proc/kcore file. Some of the above examples are (perhaps obviously) taken from a developer’s laptop, rather than a server vmcore.

Identifying Potential Issues

Beyond simply extracting information from the core dump, Corelens includes modules that attempt to identify the root cause of some bugs. For instance, some system crashes result from I/O requests that have been in flight for a very long time. Corelens can identify outstanding I/O requests and summarize them, including the device, the operation, and how long each request has been pending.

$ corelens ./vmcore -M inflight-io
device               hwq                  request              cpu              op
flags                offset               len                  inflight-time
sdb                  ffff88d9c2a2ec00     ffff88da10354c80     2                REQ_OP_WRITE
0b100011000011000010 23134208             1048576              0 00:00:03.172
sdb                  ffff88d9c2a2ec00     ffff88da103550c0     2                REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23136256             1048576              0 00:00:03.171
sdb                  ffff88d9c2a2ec00     ffff88da1035c380     0                REQ_OP_WRITE|__REQ_SYNC
0b100011000011000010 89131040             24576                0 00:00:00.000
sdb                  ffff88d9c2a2ec00     ffff88da10355500     2                REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23138304             1048576              0 00:00:03.171
...
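The inflight-time column uses a days hh:mm:ss.ms layout. The sketch below formats an elapsed time in nanoseconds into that layout and flags requests that have been pending longer than a threshold. It is a hedged illustration, not Corelens’s implementation. The Request fields and the 1-second threshold are hypothetical choices:

```python
from dataclasses import dataclass
from typing import List

NSEC_PER_MSEC = 1_000_000


def format_inflight(ns: int) -> str:
    """Format nanoseconds as 'D HH:MM:SS.mmm', like the inflight-time column."""
    total_ms = ns // NSEC_PER_MSEC
    ms = total_ms % 1000
    total_s = total_ms // 1000
    days, rem = divmod(total_s, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days} {hours:02}:{minutes:02}:{seconds:02}.{ms:03}"


@dataclass
class Request:
    """Hypothetical summary of one pending block I/O request."""
    device: str
    op: str
    inflight_ns: int


def slow_requests(reqs: List[Request], threshold_ns: int) -> List[Request]:
    """Return the requests that have been in flight for at least threshold_ns."""
    return [r for r in reqs if r.inflight_ns >= threshold_ns]


reqs = [
    Request("sdb", "REQ_OP_WRITE", 3_172_000_000),
    Request("sdb", "REQ_OP_WRITE|__REQ_SYNC", 0),
]
for r in slow_requests(reqs, threshold_ns=1_000_000_000):
    print(r.device, r.op, format_inflight(r.inflight_ns))
```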

Finally, one innovative feature of Corelens is its ability to highlight all mutexes, semaphores, and rw_semaphores in use by the kernel, showing which tasks own each one and which tasks are waiting for it. This is a complex problem, since it requires accurately locating the mutex or semaphore in each waiting task’s stack. While this would normally require external DWARF debuginfo (see below), we’ve enabled this analysis without that dependency. This can help uncover deadlocks and performance issues. Here is a simple example, where we used a kernel module to artificially create a deadlock:

$ corelens ./vmcore -M lock
Scanning Mutexes...

Mutex: 0xffffffffc122a000 (object symbol: lock2+0x0)
Mutex OWNER: deadlock_thread PID : 11098
Mutex WAITERS (Index, cpu, comm, pid, state, wait time (d hr:min:sec:ms)):
         [1]: cpu: 0    deadlock_thread  11097    D      0 00:00:40.512

Mutex: 0xffffffffc122a020 (object symbol: lock1+0x0)
Mutex OWNER: deadlock_thread PID : 11097
Mutex WAITERS (Index, cpu, comm, pid, state, wait time (d hr:min:sec:ms)):
         [1]: cpu: 0    deadlock_thread  11098    D      0 00:00:40.522

Scanning Semaphores...

Scanning RWSemaphores...

This output reveals a deadlock between PIDs 11098 and 11097. PID 11098 holds the mutex named “lock2” and waits for “lock1”, while PID 11097 holds “lock1” and waits for “lock2”.
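Recognizing this kind of deadlock amounts to finding a cycle in a wait-for graph. In that graph, each waiting task points at the task that owns the lock it wants. The sketch below applies that standard technique to the owner and waiter data from the output above. It is an illustration only, not necessarily how the lock module reasons internally:

```python
from typing import Dict, List, Optional


def build_wait_for(owners: Dict[str, int], waiters: Dict[str, List[int]]) -> Dict[int, int]:
    """Map each waiting PID to the PID that owns the lock it is waiting on."""
    graph: Dict[int, int] = {}
    for lock, pids in waiters.items():
        owner = owners.get(lock)
        if owner is None:
            continue  # owner unknown, so no edge can be drawn
        for pid in pids:
            graph[pid] = owner  # a blocked task waits on one lock at a time
    return graph


def find_cycle(graph: Dict[int, int]) -> Optional[List[int]]:
    """Follow wait-for edges from each PID and return the first cycle found."""
    for start in graph:
        path: List[int] = []
        position: Dict[int, int] = {}
        pid = start
        while pid in graph and pid not in position:
            position[pid] = len(path)
            path.append(pid)
            pid = graph[pid]
        if pid in position:
            return path[position[pid]:]  # the tasks forming the cycle
    return None


# Data from the lock module output above
owners = {"lock2": 11098, "lock1": 11097}
waiters = {"lock2": [11097], "lock1": [11098]}
print(find_cycle(build_wait_for(owners, waiters)))
```

This prints [11097, 11098], the two deadlocked tasks.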

With information like this, in some cases our support engineers may never need to look at a vmcore, as the information they need may be included in the Corelens report. And for more technically minded users, the report could provide the visibility they need to understand how best to proceed when they encounter kernel issues.

Corelens contains several additional modules, each of which provides detailed information about a relevant kernel subsystem. You can see all of the modules in the current version of Corelens (version 1.1.2) below, and you can find the latest list by installing Corelens and running corelens -L.

$ corelens -L
inflight-io       Display I/O requests that are currently pending
blockinfo         Corelens Module for block device info
ps                Corelens Module for ps
bt                Print a stack trace for a task, CPU, or set of tasks
meminfo           Show various details about the memory management subsystem
buddyinfo         This module shows details about the per-zone buddy page allocator.
cmdline           Display the kernel command line
cpuinfo           Corelens Module for cpuinfo
net               Display network related information.
ls                List or count child dentries given a file path
                  Manually run module (only run with -M)
dentrycache       List dentries from the dentry hash table
dm                Display info about device mapper devices
                  Skipped unless "dm_mod" kernel module is loaded.
ext4_dirlock_scan Scan processes hung by ext4 directory inode lock
                  Skipped unless "ext4" kernel module is loaded.
filecache         Prints files from page cache, sorted by the amount of cached pages
fsnotify          Print details about the fsnotify subsystem
lock              Display active mutex and semaphores and their waiters
lsmod             List loaded modules, dependencies and their parameter value.
md                Display info about "Multiple device" software RAID
mounts            Print info about all mount points
rds               Print info about RDS devices, sockets, connections, and stats
                  Skipped unless "rds" kernel module is loaded.
nfsshow           Print summary information on NFS client side
                  Skipped unless "nfs" kernel module is loaded.
numastat          Show various details about the memory management subsystem for all
partitioninfo     Corelens Module for partition information
dmesg             Display the kernel log
runq              List process that are in RT and CFS queue.
slabinfo          Print info about each slab cache
smp               Display the state of the SMP IPI subsystem
                  Live kernel not supported.
sys               Corelens Module for sysinfo

For any given Corelens module, you can learn more about the command-line arguments it accepts by using the -h option, as shown below.

$ corelens -M ls -h
usage: ls [-h] [--count] [--depth DEPTH] directory

List or count child dentries given a file path

positional arguments:
  directory             directory to list

optional arguments:
  -h, --help            show this help message and exit
  --count, -c           only print counts, rather than every element
  --depth DEPTH, -d DEPTH
                        Print the dentries of subdirectories recursively up to the specified depth.

Manually run module (only run with -M)
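This help text has the shape of output from Python’s argparse module. The parser below is a hypothetical illustration that declares the same arguments. We are not showing the ls module’s actual source. It produces the same usage line and parses the earlier ls /usr/lib64/ -c invocation:

```python
import argparse

# Hypothetical parser mirroring the arguments documented in the help text above.
parser = argparse.ArgumentParser(
    prog="ls",
    description="List or count child dentries given a file path",
)
parser.add_argument("directory", help="directory to list")
parser.add_argument(
    "--count", "-c",
    action="store_true",
    help="only print counts, rather than every element",
)
parser.add_argument(
    "--depth", "-d",
    type=int,
    help="Print the dentries of subdirectories recursively up to the specified depth.",
)

# Arguments from the earlier "corelens /proc/kcore -M ls /usr/lib64/ -c" example
args = parser.parse_args(["/usr/lib64/", "-c"])
print(parser.format_usage().strip())
print(args.directory, args.count, args.depth)
```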

Inside the Microscope

Corelens is built on the foundation of drgn, a Python-based debugger library which is well-tuned for debugging the Linux kernel. In addition to the helpers that drgn itself provides, we’ve created several more which are specific to Oracle Linux. We’ve contributed much of our code to drgn itself.

One of the most important constraints for Corelens is that it must function without any debuginfo packages installed. Traditional debuginfo packages provide DWARF information, which is quite large, and most customers cannot install these packages on their production systems. Instead, Corelens relies on Compact C Type Format (CTF) information for Oracle’s UEK (Unbreakable Enterprise Kernel). We’re continuing to upstream CTF support in drgn, as outlined in this blog post, but in the meantime, we offer CTF support for drgn in Oracle Linux today, and Corelens takes advantage of it.

In practical terms, without CTF support, customers would normally need to download RPMs of around 800 MiB, which require 4 GiB of disk space once installed. They would then need considerable expertise with a tool like crash or drgn to extract similar information from a vmcore. Thanks to Corelens and CTF, we can produce these reports without any of that.

Reducing Dependence on the Vmcore

Corelens is already exciting because it can produce a small report in place of a large vmcore file. However, we hope to take this further in the future.

Creating a vmcore is a costly process. In Oracle Linux, a section of memory is typically reserved for emergency situations. A “kdump kernel” is loaded into this memory and left ready in case of a crash. If the kernel panics, control is transferred to the kdump kernel. In this environment, the file /proc/vmcore is present, representing the memory image of the crashed kernel. The makedumpfile program then creates a smaller version of this memory image by filtering out irrelevant pages and compressing the remaining data.

There are two key issues in this process. First, memory must be reserved ahead of time. Memory reserved for the kdump kernel cannot be used during normal operation, and the kdump kernel cannot reclaim any memory from the crashed kernel (due to the risk of corruption from pending DMA operations). As a result, the kdump kernel is confined to a small fraction of the system’s memory, which greatly limits what it can do. Second, disk space must be available for the core dump. While makedumpfile substantially reduces the typical size of vmcores, they can still be large enough that disk space becomes an issue.
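On a running system, you can check how much memory has been set aside for the kdump kernel. The minimal sketch below assumes a kernel built with kexec support, which exposes the reservation size in bytes through /sys/kernel/kexec_crash_size. The parsing step is separated out so it can be checked without that file:

```python
from pathlib import Path

# Assumption: kernels with kexec support expose the crashkernel= reservation here, in bytes.
KEXEC_CRASH_SIZE = Path("/sys/kernel/kexec_crash_size")


def parse_crash_size(text: str) -> float:
    """Convert the byte count read from kexec_crash_size into MiB."""
    return int(text.strip()) / (1024 * 1024)


def reserved_kdump_mib(path: Path = KEXEC_CRASH_SIZE) -> float:
    """Read the reserved kdump memory size; 0.0 typically means nothing is reserved."""
    return parse_crash_size(path.read_text())


try:
    print(f"{reserved_kdump_mib():.1f} MiB reserved for the kdump kernel")
except (OSError, ValueError) as err:
    # File absent (no kexec support, or not Linux), unreadable, or unexpected contents.
    print(f"could not read {KEXEC_CRASH_SIZE}: {err}")
```

The reported value is memory that the running kernel cannot use for anything else. That is the first of the two costs described above.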

Because analyzing a vmcore with CTF uses less memory than with DWARF, Corelens could run directly within the kdump environment to produce a report. DWARF could be used to produce similar information, but it would require a larger kdump memory allocation, and customers would need DWARF debuginfo installed at all times. Since Corelens reports are orders of magnitude smaller than vmcores, they could also be created more quickly than a full vmcore, reducing downtime for customers. At the same time, our diagnostic scripts would have access to information that makedumpfile normally filters out, such as userspace memory pages, so these reports may give our support engineers more useful information than ever before.
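makedumpfile decides which pages to drop via its -d option, which takes a dump level. A dump level is a bitmask of page types to exclude. The sketch below decodes a dump level using the five bit values documented in the makedumpfile(8) man page. Newer releases define further bits, which are omitted here. Level 31 is a common choice in kdump configurations and excludes all five types, including user process data pages:

```python
from typing import List

# Page types makedumpfile can exclude with -d, per the makedumpfile(8) man page.
# Newer makedumpfile releases define additional bits not listed here.
EXCLUDE_BITS = {
    1: "zero pages",
    2: "cache pages without private",
    4: "cache pages with private",
    8: "user process data pages",
    16: "free pages",
}


def excluded_page_types(dump_level: int) -> List[str]:
    """Return the page types filtered out at the given -d dump level."""
    return [name for bit, name in EXCLUDE_BITS.items() if dump_level & bit]


print(excluded_page_types(31))
```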

Of course, no Corelens report can contain the same volume of data as a vmcore would, so it’s still likely that creating and uploading a vmcore will be a necessary part of resolving some kernel issues. However, we hope Corelens can minimize this as much as possible, enabling faster and easier resolutions for customers with lower downtime. One way we intend to do this is by extending sosreport with the ability to collect Corelens reports.

Getting Started

If you’d like to get started with Corelens, you can do so now. Corelens is available for Oracle Linux 8 and 9 as part of our drgn-tools package. Ensure that you have the Add-ons repository enabled (ol8_addons or ol9_addons, respectively), and then install it with:

dnf install drgn-tools

You can see more detailed installation and usage instructions in our documentation for Oracle Linux 8 and Oracle Linux 9. In particular, the corelens command reference provides a high-level overview of how to use the CLI.