When a major customer kernel issue happens, the Oracle Linux Support team is ready to take action. They receive memory dumps of the crashed kernel (called vmcores) and use them to identify the root cause of bugs. Unfortunately, these vmcore files can be quite large, since they scale with the size of a server’s memory. For our largest customers, this can be many hundreds of gigabytes. These files can take quite a lot of time to upload, and that’s time we can’t spend diagnosing the issue.
For a while now, support engineers have been asking for a way for customers to distill the information provided by a vmcore into something small, which could be uploaded immediately, allowing support to resolve their issues even before the vmcore is uploaded. To meet this need, we’ve created a tool called Corelens, which provides a first look at a vmcore. We can now have customers run Corelens and upload the resulting report immediately, helping to speed resolutions. What’s more, there’s no fiddling with debuginfo: Corelens can provide these reports without installing any debuginfo packages!
While we may never eliminate the need for a full core dump, especially when diagnosing never-before-seen issues, Corelens meets an important need for many other kernel-related problems.
Information Provided by Corelens
Corelens provides a report containing several pieces of useful information for diagnosing kernel issues. Its report is split into several modules that each provide a specific kind of output. Some modules provide information similar to what you might get from userspace applications at the time of the crash:
- The dmesg module provides the contents of the kernel log.
- The bt module provides kernel stack traces of the tasks running on each CPU, as well as all tasks in uninterruptible sleep.
- The meminfo module provides information similar to what you’d find in the /proc/meminfo file.
- The slabinfo module provides information similar to that provided by /proc/slabinfo or slabtop.
Already, system administrators familiar with these facilities can find useful information to help get an idea why a kernel crash occurred. However, Corelens also provides more in-depth information which could be used to solve specific kernel issues.
For instance, customers may experience issues related to the dentry cache, or page cache usage. Corelens can provide information about which files are taking up the most space in the file cache:
$ corelens ./vmcore -M filecache --limit 5
PAGES SIZE FS_TYPE FILE
46025 179.8 MiB xfs /var/lib/rpm/Packages
11265 44.0 MiB xfs /usr/lib/debug/usr/lib/modules/5.4.17-2136.321.4.el8uek.x86_64/kernel/fs/xfs/xfs.ko.debug
10107 39.5 MiB xfs /var/oled/crash/127.0.0.1-2023-11-14-01:07:22/vmcore
7840 30.6 MiB xfs /boot/initramfs-5.4.17-2136.321.4.el8uek.x86_64kdump.img
4455 17.4 MiB xfs /usr/libexec/oracle-cloud-agent/plugins/oci-wlp/oci-wlp
$ corelens /proc/kcore -M filecache --limit 5
warning: Running corelens against a live system.
Data may be inconsistent, or corelens may crash.
PAGES SIZE FS_TYPE FILE
135821 530.6 MiB ext4 /home/stepbren/repos/linux.git/objects/pack/pack-2d56cdac769d71ccf0f21ad5791aebcee3883316.pack
95230 372.0 MiB ext4 /home/stepbren/repos/linux.git/objects/pack/pack-ffbcf6f230a68d3d895da62090f7fdf9dcad1fd8.idx
81979 320.2 MiB ext4 /home/stepbren/repos/linux.git/objects/pack/pack-ffbcf6f230a68d3d895da62090f7fdf9dcad1fd8.pack
62469 244.0 MiB ext4 /home/stepbren/.config/Slack/IndexedDB/https_app.slack.com_0.indexeddb.blob/1/10/1095
38779 151.5 MiB ext4 /usr/lib/slack/slack
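The SIZE column follows directly from the PAGES column. With the usual 4 KiB page size on x86_64, 46025 pages × 4096 bytes comes to about 179.8 MiB. The sketch below reproduces that arithmetic and the `--limit` truncation. It works on a hypothetical dictionary of per-file page counts and is not Corelens’s implementation, which walks kernel data structures rather than receiving a ready-made mapping.

```python
# Sketch: rank files by cached pages and convert page counts to MiB.
# Assumptions: a 4 KiB page size (typical on x86_64); the input dict is
# hypothetical sample data, not a structure Corelens exposes.
PAGE_SIZE = 4096


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to mebibytes."""
    return pages * page_size / (1024 * 1024)


def top_cached_files(cached_pages: dict[str, int], limit: int) -> list[str]:
    """Return report lines for the `limit` files with the most cached pages."""
    ranked = sorted(cached_pages.items(), key=lambda item: item[1], reverse=True)
    return [
        f"{pages:>6} {pages_to_mib(pages):.1f} MiB {path}"
        for path, pages in ranked[:limit]
    ]


if __name__ == "__main__":
    sample = {
        "/var/lib/rpm/Packages": 46025,
        "/boot/initramfs-5.4.17-2136.321.4.el8uek.x86_64kdump.img": 7840,
        "/var/oled/crash/127.0.0.1-2023-11-14-01:07:22/vmcore": 10107,
    }
    for line in top_cached_files(sample, limit=2):
        print(line)
```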
Similarly, the dentrycache module shows which directory entries are present in the dentry cache. The ls module, which only runs when explicitly requested with -M, can count the positive and negative dentries in a given directory of the vmcore.
$ corelens /proc/kcore -M dentrycache --limit 5
warning: Running corelens against a live system.
Data may be inconsistent, or corelens may crash.
00001 /home/stepbren/.emacs.d/.local/straight/build-29.1/cmake-mode/evil-snipe.elc
00002 /home/stepbren/.emacs.d/.local/straight/build-29.1/counsel/vc-git.elc
00003 /usr/share/man/man3/probe::sunrpc.svc.register.3stap.gz
00004 /usr/lib64/libKPipeWire.so.5.27.11
00005 /home/stepbren/repos/linux-upstream/fs/iomap/swapfile.c
$ corelens /proc/kcore -M ls /usr/lib64/ -c
warning: Running corelens against a live system.
Data may be inconsistent, or corelens may crash.
3474 positive, 85 negative dentries
Like most Corelens modules, these commands don’t just work on vmcores! They can also be used to get the same information from a currently running system, using the /proc/kcore file. Some of the above examples are (perhaps obviously) taken from a developer’s laptop, rather than a server vmcore.
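In the dentry cache, a positive dentry points at an inode. A negative dentry records that a name was looked up and found not to exist, so its d_inode is NULL. The sketch below models that distinction with a hypothetical `Dentry` class and counts children the way the `ls -c` output above reports them, recursing a chosen number of levels in the spirit of `--depth`. How Corelens interprets the depth exactly is an assumption here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dentry:
    """Hypothetical stand-in for the kernel's struct dentry."""
    name: str
    inode: Optional[int] = None  # None models a NULL d_inode, i.e. a negative dentry
    children: list["Dentry"] = field(default_factory=list)

    @property
    def positive(self) -> bool:
        return self.inode is not None


def count_dentries(directory: Dentry, depth: int = 1) -> tuple[int, int]:
    """Count (positive, negative) dentries below `directory`, `depth` levels deep."""
    positive = negative = 0
    if depth <= 0:
        return positive, negative
    for child in directory.children:
        if child.positive:
            positive += 1
        else:
            negative += 1
        # Descend into the child with one fewer level remaining.
        sub_pos, sub_neg = count_dentries(child, depth - 1)
        positive += sub_pos
        negative += sub_neg
    return positive, negative


if __name__ == "__main__":
    lib64 = Dentry("lib64", inode=1, children=[
        Dentry("libc.so.6", inode=2),
        Dentry("libmissing.so"),  # a failed lookup cached as a negative dentry
        Dentry("pkgconfig", inode=3, children=[Dentry("zlib.pc", inode=4)]),
    ])
    pos, neg = count_dentries(lib64)
    print(f"{pos} positive, {neg} negative dentries")
```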
Identifying Potential Issues
Beyond simply extracting information from the core dump, Corelens comes with modules that attempt to get to the root cause of some bugs. For instance, some system crashes are a result of pending I/O requests which may be in-flight for a very long time. Corelens can identify outstanding I/O requests and give a summary, including the device, the operation, and the length of time that the request has been pending.
$ corelens ./vmcore -M inflight-io
device  hwq               request           cpu  op                          flags                 offset    len      inflight-time
sdb     ffff88d9c2a2ec00  ffff88da10354c80  2    REQ_OP_WRITE                0b100011000011000010  23134208  1048576  0 00:00:03.172
sdb     ffff88d9c2a2ec00  ffff88da103550c0  2    REQ_OP_WRITE|__REQ_NOMERGE  0b100011000011000010  23136256  1048576  0 00:00:03.171
sdb     ffff88d9c2a2ec00  ffff88da1035c380  0    REQ_OP_WRITE|__REQ_SYNC     0b100011000011000010  89131040  24576    0 00:00:00.000
sdb     ffff88d9c2a2ec00  ffff88da10355500  2    REQ_OP_WRITE|__REQ_NOMERGE  0b100011000011000010  23138304  1048576  0 00:00:03.171
...
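The inflight-time column gives days followed by hours, minutes, seconds, and milliseconds. For example, 0 00:00:03.172 means a request pending for about 3.2 seconds. The sketch below shows that formatting and flags long-pending requests. It assumes we already know when each request was issued and when the kernel crashed, both in nanoseconds. The input dictionary and the one-second threshold are hypothetical, not Corelens internals.

```python
from typing import Iterator

NS_PER_MS = 1_000_000
MS_PER_DAY = 24 * 60 * 60 * 1000


def format_inflight(elapsed_ns: int) -> str:
    """Format a duration in nanoseconds as 'd hh:mm:ss.mmm'."""
    total_ms = elapsed_ns // NS_PER_MS
    days, rem = divmod(total_ms, MS_PER_DAY)
    hours, rem = divmod(rem, 60 * 60 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    seconds, millis = divmod(rem, 1000)
    return f"{days} {hours:02}:{minutes:02}:{seconds:02}.{millis:03}"


def stale_requests(issued_ns: dict[str, int], now_ns: int,
                   threshold_ns: int = 1_000_000_000) -> Iterator[tuple[str, str]]:
    """Yield (request, age) for requests pending longer than `threshold_ns`."""
    for request, started in issued_ns.items():
        age = now_ns - started
        if age > threshold_ns:
            yield request, format_inflight(age)


if __name__ == "__main__":
    # Hypothetical issue times (ns) for two requests and the time of the crash.
    issued = {"ffff88da10354c80": 1_000_000_000, "ffff88da1035c380": 4_172_000_000}
    crash_time = 4_172_000_000
    for request, age in stale_requests(issued, crash_time):
        print(request, age)
```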
Finally, one innovative feature of Corelens is its ability to highlight all mutexes, semaphores, and rw_semaphores in use by the kernel. For each one, it shows which tasks own it and which tasks are waiting for it. This is a complex problem, since it requires accurately finding the mutex or semaphore in the waiting tasks’ stacks. While this would normally require external DWARF debuginfo (see below), we’ve enabled this analysis without that dependency. This can help uncover deadlocks and performance issues. Here is a simple example, where we used a kernel module to artificially create a deadlock:
$ corelens ./vmcore -M lock
Scanning Mutexes...
Mutex: 0xffffffffc122a000 (object symbol: lock2+0x0)
Mutex OWNER: deadlock_thread PID : 11098
Mutex WAITERS (Index, cpu, comm, pid, state, wait time (d hr:min:sec:ms)):
[1]: cpu: 0 deadlock_thread 11097 D 0 00:00:40.512
Mutex: 0xffffffffc122a020 (object symbol: lock1+0x0)
Mutex OWNER: deadlock_thread PID : 11097
Mutex WAITERS (Index, cpu, comm, pid, state, wait time (d hr:min:sec:ms)):
[1]: cpu: 0 deadlock_thread 11098 D 0 00:00:40.522
Scanning Semaphores...
Scanning RWSemaphores...
This output shows a deadlock between PIDs 11098 and 11097. PID 11098 holds the mutex named “lock2” and waits for “lock1”, while PID 11097 holds “lock1” and waits for “lock2”.
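Corelens reports owners and waiters, and concluding “deadlock” from that output amounts to finding a cycle in a wait-for graph. In that graph, each waiting task points at the task that owns the lock it wants. The sketch below uses a depth-first search on the PIDs from the output above. The dictionary layout is our own and is not a Corelens data format.

```python
from typing import Optional


def build_wait_for(locks: dict[str, tuple[int, list[int]]]) -> dict[int, set[int]]:
    """Map each waiting PID to the PIDs that own the locks it is waiting on."""
    graph: dict[int, set[int]] = {}
    for owner, waiters in locks.values():
        for waiter in waiters:
            graph.setdefault(waiter, set()).add(owner)
    return graph


def find_cycle(graph: dict[int, set[int]]) -> Optional[list[int]]:
    """Return one cycle of PIDs (a deadlock) if present, else None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: dict[int, int] = {}
    path: list[int] = []

    def visit(pid: int) -> Optional[list[int]]:
        color[pid] = GRAY
        path.append(pid)
        for owner in sorted(graph.get(pid, ())):
            state = color.get(owner, WHITE)
            if state == GRAY:
                # Back edge: the PIDs from `owner` onward form a cycle.
                return path[path.index(owner):]
            if state == WHITE:
                found = visit(owner)
                if found:
                    return found
        path.pop()
        color[pid] = BLACK
        return None

    for start in sorted(graph):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Owners and waiters taken from the `corelens ./vmcore -M lock` output above.
    locks = {"lock2": (11098, [11097]), "lock1": (11097, [11098])}
    print("deadlock between PIDs:", find_cycle(build_wait_for(locks)))
```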
With information like this, in some cases our support engineers may never need to look at a vmcore, as the information they need may be included in the Corelens report. And for more technically-minded users, the report could help them get the visibility they need to understand how best to proceed when they encounter kernel issues.
Corelens contains several additional modules, each of which provides detailed information about a relevant kernel subsystem. The modules in the current version of Corelens (version 1.1.2) are listed below. You can find the latest list by installing Corelens and running corelens -L.
$ corelens -L
inflight-io Display I/O requests that are currently pending
blockinfo Corelens Module for block device info
ps Corelens Module for ps
bt Print a stack trace for a task, CPU, or set of tasks
meminfo Show various details about the memory management subsystem
buddyinfo This module shows details about the per-zone buddy page allocator.
cmdline Display the kernel command line
cpuinfo Corelens Module for cpuinfo
net Display network related information.
ls List or count child dentries given a file path
Manually run module (only run with -M)
dentrycache List dentries from the dentry hash table
dm Display info about device mapper devices
Skipped unless "dm_mod" kernel module is loaded.
ext4_dirlock_scan Scan processes hung by ext4 directory inode lock
Skipped unless "ext4" kernel module is loaded.
filecache Prints files from page cache, sorted by the amount of cached pages
fsnotify Print details about the fsnotify subsystem
lock Display active mutex and semaphores and their waiters
lsmod List loaded modules, dependencies and their parameter value.
md Display info about "Multiple device" software RAID
mounts Print info about all mount points
rds Print info about RDS devices, sockets, connections, and stats
Skipped unless "rds" kernel module is loaded.
nfsshow Print summary information on NFS client side
Skipped unless "nfs" kernel module is loaded.
numastat Show various details about the memory management subsystem for all
partitioninfo Corelens Module for partition information
dmesg Display the kernel log
runq List process that are in RT and CFS queue.
slabinfo Print info about each slab cache
smp Display the state of the SMP IPI subsystem
Live kernel not supported.
sys Corelens Module for sysinfo
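The notes under some entries describe three run conditions:

- Some modules run only when named with -M.
- Some modules are skipped unless a particular kernel module is loaded.
- Some modules cannot run against a live kernel.

The hypothetical sketch below shows one way those conditions could combine when deciding what to run. The `ModuleSpec` fields and `skip_reason` function are invented for illustration and are not Corelens’s API. The sketch also assumes that naming modules with -M restricts the run to those modules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModuleSpec:
    """Hypothetical description of a report module and its run conditions."""
    name: str
    manual_only: bool = False         # "Manually run module (only run with -M)"
    needs_kmod: Optional[str] = None  # 'Skipped unless "<kmod>" kernel module is loaded.'
    live_ok: bool = True              # False models "Live kernel not supported."


def skip_reason(spec: ModuleSpec, requested: set[str],
                loaded_kmods: set[str], live: bool) -> Optional[str]:
    """Return why `spec` would be skipped, or None if it would run.

    Assumption: when modules are named with -M, only those modules run.
    """
    if requested and spec.name not in requested:
        return "not requested with -M"
    if spec.manual_only and spec.name not in requested:
        return "manual module; request it with -M"
    if spec.needs_kmod is not None and spec.needs_kmod not in loaded_kmods:
        return f'kernel module "{spec.needs_kmod}" is not loaded'
    if live and not spec.live_ok:
        return "live kernel not supported"
    return None


if __name__ == "__main__":
    modules = [
        ModuleSpec("dmesg"),
        ModuleSpec("ls", manual_only=True),
        ModuleSpec("dm", needs_kmod="dm_mod"),
        ModuleSpec("smp", live_ok=False),
    ]
    # Simulate a default run against /proc/kcore on a system without dm_mod.
    for spec in modules:
        reason = skip_reason(spec, requested=set(), loaded_kmods={"xfs"}, live=True)
        print(f"{spec.name:6} {'run' if reason is None else 'skip: ' + reason}")
```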
For any given Corelens module, you can learn more about the command-line arguments it accepts by using the -h option, as shown below.
$ corelens -M ls -h
usage: ls [-h] [--count] [--depth DEPTH] directory
List or count child dentries given a file path
positional arguments:
directory directory to list
optional arguments:
-h, --help show this help message and exit
--count, -c only print counts, rather than every element
--depth DEPTH, -d DEPTH
Print the dentries of subdirectories recursively up to the specified depth.
Manually run module (only run with -M)
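This help text has the layout produced by Python’s argparse module. The “optional arguments:” heading is what Python versions before 3.10 print. As a hedged illustration, a parser declared like the one below produces the same usage line and accepts the ls /usr/lib64/ -c invocation shown earlier. Whether Corelens declares its arguments exactly this way is an assumption.

```python
import argparse


def make_ls_parser() -> argparse.ArgumentParser:
    """Build a parser whose usage line matches the ls module's help output."""
    parser = argparse.ArgumentParser(
        prog="ls",
        description="List or count child dentries given a file path",
    )
    parser.add_argument("directory", help="directory to list")
    parser.add_argument(
        "--count", "-c", action="store_true",
        help="only print counts, rather than every element",
    )
    parser.add_argument(
        "--depth", "-d", type=int,
        help="Print the dentries of subdirectories recursively up to the specified depth.",
    )
    return parser


if __name__ == "__main__":
    parser = make_ls_parser()
    print(parser.format_usage(), end="")
    # Options may follow the positional argument, as in `ls /usr/lib64/ -c`.
    print(parser.parse_args(["/usr/lib64/", "-c"]))
```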
Inside the Microscope
Corelens is built on the foundation of drgn, a Python-based debugger library which is well-tuned for debugging the Linux kernel. In addition to the helpers that drgn itself provides, we’ve created several more which are specific to Oracle Linux. We’ve contributed much of our code to drgn itself.
One of the most important constraints for Corelens is that it must function without needing to install any debuginfo packages. Traditional debuginfo packages provide DWARF information, which is quite large. Most customers cannot install these packages onto their production systems. Instead, Corelens relies on Compact Type Format (CTF) information for Oracle’s UEK (Unbreakable Enterprise Kernel). We’re continuing to upstream the support for CTF in drgn, as outlined in this blog post, but in the meantime, we offer CTF support for drgn in Oracle Linux today, which Corelens takes advantage of.
In practical terms, without CTF support, customers would normally need to download debuginfo RPMs of around 800 MiB, which require 4 GiB of disk space once installed. They would then need considerable expertise with a tool like Crash or drgn to extract similar information from a vmcore. Thanks to Corelens and CTF, we can produce these reports without any of that.
Reducing Dependence on the Vmcore
Corelens is already exciting because it can produce a small report in place of the large vmcore file. However, we hope to take this further in the future.
Creating a vmcore is a costly process. In Oracle Linux, a section of memory is typically reserved for emergencies. A “kdump kernel” is loaded into this memory and left ready in case of a crash. If the kernel panics, control is transferred to the kdump kernel. In this environment, the file /proc/vmcore is present, representing the memory image of the crashed kernel. The makedumpfile program then creates a smaller version of this memory image by filtering out irrelevant pages and compressing the remaining data.
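makedumpfile chooses which pages to filter using its dump level (the -d option). The dump level is a bitmask, documented in the makedumpfile man page, in which each bit excludes a class of pages:

- 1 excludes zero-filled pages.
- 2 excludes cache pages without private data.
- 4 excludes cache pages both with and without private data.
- 8 excludes user process data pages.
- 16 excludes free pages.

The sketch below decodes a dump level into the page classes it leaves out. It only interprets the number and does not invoke makedumpfile.

```python
# Page classes in the column order of the makedumpfile man page's dump-level table.
PAGE_CLASSES = (
    "zero pages",
    "cache pages without private data",
    "cache pages with private data",
    "user process data pages",
    "free pages",
)

# Page classes excluded by each dump-level bit. Bit 4 excludes both kinds
# of cache page, matching the man page table.
EXCLUDED_BY_BIT = {
    1: {"zero pages"},
    2: {"cache pages without private data"},
    4: {"cache pages without private data", "cache pages with private data"},
    8: {"user process data pages"},
    16: {"free pages"},
}


def excluded_page_classes(dump_level: int) -> list[str]:
    """Return the page classes makedumpfile leaves out at `dump_level`."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump_level must be between 0 and 31")
    excluded: set[str] = set()
    for bit, classes in EXCLUDED_BY_BIT.items():
        if dump_level & bit:
            excluded |= classes
    # Report in the man page's column order rather than set order.
    return [name for name in PAGE_CLASSES if name in excluded]


if __name__ == "__main__":
    for level in (1, 17, 31):
        print(level, excluded_page_classes(level))
```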
There are two key issues with this process. First, memory must be reserved ahead of time. Memory reserved for the kdump kernel cannot be used during normal operation. The kdump kernel also cannot reclaim any memory from the crashed kernel, because pending DMA operations could corrupt it. As a result, the kdump kernel is confined to a small fraction of the system’s memory, which greatly limits what it can do. Second, disk space must be available for the core dump. While makedumpfile substantially reduces the typical size of vmcores, they can still be large enough that disk space becomes an issue.
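The up-front reservation is configured with the crashkernel= kernel parameter. Besides a fixed size, the kernel’s kdump documentation describes a range syntax, crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset], which picks a reservation size based on how much RAM the system has. Each range includes its start and excludes its end. The sketch below evaluates that syntax for a given RAM size, ignoring the optional @offset, to show how small the reservation can be relative to total memory. The example string follows the kernel documentation’s example and is not an Oracle Linux default.

```python
# Size suffixes accepted in this sketch (binary multiples, as in the kernel's memparse).
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Parse a size such as '512M' or '2G' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def crashkernel_size(spec: str, system_ram: int) -> int:
    """Return the bytes reserved by a crashkernel= value for `system_ram` bytes of RAM."""
    spec = spec.split("@", 1)[0]  # ignore the optional @offset
    if ":" not in spec:
        return parse_size(spec)  # plain fixed size, e.g. "256M"
    for entry in spec.split(","):
        ram_range, size = entry.split(":")
        start_text, end_text = ram_range.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # "2G-" has no upper bound
        # A range matches when start <= RAM < end.
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size)
    return 0  # no range matched: nothing is reserved


if __name__ == "__main__":
    spec = "512M-2G:64M,2G-:128M"  # example form from the kernel kdump documentation
    ram = 1 << 40                  # a hypothetical 1 TiB server
    reserved = crashkernel_size(spec, ram)
    print(f"reserve {reserved >> 20} MiB ({reserved / ram:.4%} of RAM)")
```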
With CTF, vmcore analysis uses less memory than it does with DWARF. This means Corelens could run directly within the kdump environment to produce a report. DWARF could be used to produce similar information, but it would require a larger kdump memory reservation, and customers would need DWARF debuginfo installed at all times. Because Corelens reports are orders of magnitude smaller than vmcores, they could also be created more quickly than a full vmcore. At the same time, our diagnostic scripts would have access to information that makedumpfile normally filters out, such as userspace memory pages. The resulting reports would be smaller and faster to produce, reducing downtime for customers. Yet they may also provide more useful information to our support engineers than ever before.
Of course, no Corelens report can contain the same volume of data as a vmcore would, so it’s still likely that creating and uploading a vmcore will be a necessary part of resolving some kernel issues. However, we hope Corelens can minimize this as much as possible, enabling faster and easier resolutions for customers with lower downtime. One way we intend to do this is by extending sosreport with the ability to collect Corelens reports.
Getting Started
If you’d like to get started with Corelens, you can do so now. Corelens is available for Oracle Linux 8 and 9 as part of our drgn-tools package. Ensure that you have the Add-ons repository enabled (ol8_addons or ol9_addons, respectively), and then install it with:
dnf install drgn-tools
You can see more detailed installation and usage instructions in our documentation for Oracle Linux 8 and Oracle Linux 9. In particular, the corelens command reference provides a high-level overview of how to use the CLI.