Introduction

Linux Kernel Core Extractor, lkce, is a program included in the oled-tools package that will automatically extract important information from a kernel vmcore when one is created during a kernel panic from the kexec environment. A vmcore is a crash dump file that may be generated when there is a kernel crash. It contains a snapshot of the system’s memory at the time of the kernel panic and can be used during post-mortem debugging to identify the root cause of the crash.

In addition to running from the kexec environment, lkce can also be run manually against an existing vmcore file during normal system operation.

lkce has two modes it can run under: corelens (recommended) and crash (legacy). In this blog, we will only discuss corelens, as the crash functionality is likely to be deprecated in the future.

Some customers find it difficult to upload large vmcore files to Oracle Support. These files often have to be split into smaller files, uploaded, and then reassembled. This can lead to errors and a delay in root cause analysis. lkce can provide some necessary diagnostic information that lessens the need for uploading the vmcore file. It is important to note that there are still circumstances where we will need the vmcore file uploaded. Some bugs require a deeper dive, or the use of other tools like drgn.

Restrictions

DWARF debugging symbols are used when kernel debuginfo packages are installed. When using DWARF debugging symbols, at least 768MB of memory should be available in the kexec environment. You can ensure at least 768MB is available by setting crashkernel=1G on the kernel command line via the /etc/grub2-efi.cfg file.
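As a quick sanity check on the restriction above, the following sketch reads a kernel command line and reports the requested crashkernel reservation in MB. The function name crashkernel_mb is hypothetical, and the parser is deliberately limited: it assumes the simple crashkernel=<number>[K|M|G] form (with an optional @offset) and reports anything else, such as range syntax or auto, as unparsed. Note that the reservation is only an upper bound on what is available inside the kexec environment.

```shell
#!/bin/sh
# Sketch: report the crashkernel reservation requested on a kernel command
# line, in MB. Only the simple "crashkernel=<number>[K|M|G][@offset]" form is
# handled; anything else (ranges, "auto") is reported as unparsed.
crashkernel_mb() {
    # $1: kernel command line text (defaults to the running kernel's)
    cmdline=${1:-$(cat /proc/cmdline)}
    value=
    for word in $cmdline; do
        case $word in
            crashkernel=*) value=${word#crashkernel=} ;;
        esac
    done
    [ -n "$value" ] || { echo "unset"; return 1; }
    value=${value%%@*}            # drop an optional @offset suffix
    num=${value%[KMGkmg]}         # strip one trailing unit letter, if any
    case $num in
        *[!0-9]*|'') echo "unparsed:$value"; return 1 ;;
    esac
    case $value in
        *[Gg]) echo $((num * 1024)) ;;
        *[Mm]) echo "$num" ;;
        *[Kk]) echo $((num / 1024)) ;;
        *)     echo $((num / 1024 / 1024)) ;;   # plain byte count
    esac
}
```

Called with no argument it inspects /proc/cmdline, so a check such as [ "$(crashkernel_mb)" -ge 768 ] can be used in a script before relying on DWARF debuginfo.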

Installation

oled-tools

For installation instructions please see Oracle Linux Enhanced Diagnostics.

Configuration

kdump

First things first, ensure that kdump is configured, tested, and working. Note that for lkce we recommend you set crashkernel=512M unless using DWARF debug symbols, in which case, the recommendation in the Restrictions section above should be followed. crashkernel=auto is not recommended.
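One way to confirm that kdump is actually armed is to look at the kernel's own view of it. The sketch below reads the sysfs files kexec_crash_loaded and kexec_crash_size under /sys/kernel; the function name kdump_ready and the optional directory argument (useful for testing against a copy) are assumptions made for illustration, not part of lkce or kdump.

```shell
#!/bin/sh
# Sketch: report whether a crash kernel is loaded and how much memory is
# reserved for it, based on /sys/kernel/kexec_crash_loaded and
# /sys/kernel/kexec_crash_size.
kdump_ready() {
    # $1: directory holding the two sysfs files (defaults to /sys/kernel)
    sysdir=${1:-/sys/kernel}
    loaded=$(cat "$sysdir/kexec_crash_loaded" 2>/dev/null)
    size=$(cat "$sysdir/kexec_crash_size" 2>/dev/null)
    if [ "$loaded" = "1" ]; then
        echo "crash kernel loaded, $(( ${size:-0} / 1024 / 1024 )) MB reserved"
    else
        echo "crash kernel not loaded"
        return 1
    fi
}
```

This only shows that a crash kernel is loaded; it does not replace an end-to-end kdump test.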

lkce

Because it’s part of the OLED bundle, when we run lkce commands we run them like this:

# oled lkce <COMMAND>

If you want to view the man page for lkce, do the following:

# man oled-lkce

Configuration is very straightforward; you’ve done most of the hard stuff already by configuring and testing kdump. Run the following commands to configure and enable lkce:

# oled lkce configure --default
# oled lkce enable_kexec

Here’s a sample successful run:

# oled lkce configure --default
LKCE is not currently enabled in kexec mode
LKCE has been configured with default values.

# oled lkce enable_kexec
Restarting kdump service...
done!
enabled LKCE in kexec mode

Let’s check the config:

# oled lkce configure --show
        report_cmd : corelens
corelens_args_file : /etc/oled/lkce/corelens_args_file
            vmcore : yes
      vmlinux path : /usr/lib/debug/lib/modules/5.15.0-207.156.6.el9uek.x86_64/vmlinux
   crash_cmds_file : /etc/oled/lkce/crash_cmds_file
       lkce_outdir : /var/oled/lkce
     lkce_in_kexec : True
     max_out_files : 50
  • report_cmd – This is the program used to process the vmcore and create a report. By default, the corelens command is used.
  • corelens_args_file – This file contains command line arguments that will be passed to the corelens command. The default value is -a.
  • vmlinux path – This is the location of the vmlinux file that was installed with the kernel debuginfo package. This file contains necessary information for the crash command to use when accessing the vmcore file.
  • crash_cmds_file – This file contains all the commands that crash will run when the vmcore is generated. This file can be updated with additional commands. Here are the default commands:
bt
bt -a
bt -FF
dev
kmem -s
foreach bt
log
mod
mount
net
ps -m
ps -S
runq
quit

If additional crash commands are added, they must be added before the quit command.
  • vmcore – Indicates whether the vmcore file should be collected. This is an important configuration option. If you set this to no, then the vmcore file will not be created. That will save some time getting your system back up, but may hinder root cause analysis. It’s always best to get a vmcore file if you can. The default value for this parameter is yes.
  • lkce_outdir – The directory where lkce will store its reports.
  • lkce_in_kexec – Indicates whether lkce is configured in the kexec kernel.
  • max_out_files – The maximum number of reports that will be created. Once this limit is reached lkce will begin rotating reports by removing the oldest.
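For the crash_cmds_file rule described above (extra commands must come before quit), a small helper can avoid appending a command in the wrong place. This is only a sketch: add_crash_cmd is a hypothetical function, it inserts before the first line that is exactly quit, and it refuses to change a file that has no such line. The sys command in the usage note is just an example of a crash command.

```shell
#!/bin/sh
# Sketch: insert one crash command before the "quit" line of a commands file.
add_crash_cmd() {
    # $1: path to the commands file, $2: crash command to insert
    file=$1
    cmd=$2
    tmp=$(mktemp) || return 1
    # Print the new command just before the first "quit" line; fail if none.
    if awk -v cmd="$cmd" '
            $0 == "quit" && !done { print cmd; done = 1 }
            { print }
            END { if (!done) exit 1 }
        ' "$file" > "$tmp"
    then
        # Overwrite in place so ownership and permissions are preserved.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "no quit line found in $file" >&2
        return 1
    fi
}
```

For example, add_crash_cmd /etc/oled/lkce/crash_cmds_file sys would add the sys command ahead of quit.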

Testing

As you did when you set up and tested kdump, you can use /proc/sysrq-trigger to trigger a crash that will cause a vmcore file to be dumped.

# sync;sync;sync; echo c > /proc/sysrq-trigger

Here’s what you can expect to see on the console:

         Starting Kdump Vmcore Save Service...
[    4.358680] kdump.sh[578]: LKCE_KDUMP_SCRIPTS=/etc/oled/lkce/lkce_kdump.d/*
[    4.361356] kdump.sh[578]: Executing /etc/oled/lkce/lkce_kdump.d/kdump_report
[   21.217294] kdump.sh[579]: kdump_report: kdump_report is enabled to run
[   21.219329] kdump.sh[579]: kdump_report: Executing '/usr/bin/corelens /proc/vmcore -a'; output file '/var/oled/lkce/corelens_20240917-195513.out'
[   21.397761] kdump[594]: saving to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12/
[   21.404363] kdump[599]: saving vmcore-dmesg.txt to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12/
[   21.423019] kdump[605]: saving vmcore-dmesg.txt complete
[   21.425937] kdump[607]: saving vmcore
Copying data                                      : [100.0 %] |           eta: 0s
[   22.209594] kdump.sh[608]: The dumpfile is saved to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12//vmcore-incomplete.
[   22.211275] kdump.sh[608]: makedumpfile Completed.
[   22.835359] kdump[612]: saving vmcore complete
[   22.838381] kdump[614]: saving the /run/initramfs/kexec-dmesg.log to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12//
[   22.866103] kdump[620]: Executing final action systemctl reboot -f
[   22.874385] systemd[1]: Shutting down.

Note that lkce took only about 20 seconds to run. On larger or busier systems it may take a little longer, but in general it should be quick, often quicker than dumping the vmcore itself.

You can use the oled lkce list command to see what files have been created. The files have a timestamp that denotes the time of their creation as part of their name.

# oled lkce list
The following are the reports found in /var/oled/lkce:
/var/oled/lkce/corelens_20240909-191552.out
/var/oled/lkce/corelens_20240909-191805.out
/var/oled/lkce/corelens_20240910-195513.out
/var/oled/lkce/corelens_20240910-200003.out
#

On a small OCI instance these files are a few hundred KB each, compared with 126MB for the vmcore file. The difference is much more dramatic on large or busy systems. If you have a crash and lkce is enabled, just tar up the /var/oled/lkce directory and upload it to the service request (SR).
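The "tar up the directory" step can be sketched as two small helpers. Both function names and the default archive path /tmp/lkce-reports.tar.gz are hypothetical; the only things taken from this post are the default report directory and the corelens_YYYYMMDD-HHMMSS.out naming seen in the listing.

```shell
#!/bin/sh
# Sketch: bundle the lkce report directory into one compressed archive.
bundle_lkce_reports() {
    # $1: report directory, $2: archive to create (both optional)
    outdir=${1:-/var/oled/lkce}
    archive=${2:-/tmp/lkce-reports.tar.gz}
    [ -d "$outdir" ] || { echo "no such directory: $outdir" >&2; return 1; }
    # Archive relative to the parent so the tarball unpacks into "lkce/".
    tar -czf "$archive" -C "$(dirname "$outdir")" "$(basename "$outdir")" || return 1
    echo "$archive"
}

# Sketch: print the most recent corelens report. The file names embed
# YYYYMMDD-HHMMSS, so a plain lexical sort puts the newest one last.
newest_lkce_report() {
    outdir=${1:-/var/oled/lkce}
    ls "$outdir"/corelens_*.out 2>/dev/null | sort | tail -n 1
}
```

Running bundle_lkce_reports with no arguments prints the path of the archive it created, which is the file to attach.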

Troubleshooting

If your system has an existing kdump_pre configuration parameter in /etc/kdump.conf, manual intervention is necessary in order to run both your original script and lkce. If you attempt to enable lkce in kexec mode, you will see the following output:

# oled lkce enable_kexec
The LKCE 'kdump_pre' entry is not set in /etc/kdump.conf.
The current 'kdump_pre' entry in kdump.conf is:
kdump_pre /etc/your_script.sh
Manual intervention is necessary.
Please see the oled-lkce manual page for more information.
error: Unable to enable LKCE in kexec mode

On OL8 and OL9, if it is not problematic for the lkce kdump script to run before the existing one, you may:
  • Comment out the line beginning with kdump_pre in /etc/kdump.conf
  • Create a symlink to your original script in /etc/kdump/pre.d/:

# ln -s <path_to_your_script> /etc/kdump/pre.d/20-kdump_pre.sh
  • Then, run:
# oled lkce enable_kexec

After completing the steps above, both scripts will run.
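The first of those steps, commenting out the kdump_pre line, can be scripted. The sketch below is an illustration only: both function names are hypothetical, and it assumes the directive sits at the start of a line followed by whitespace, as in the error output shown earlier. A .bak copy of the file is kept next to the original.

```shell
#!/bin/sh
# Sketch: print the script currently configured as kdump_pre, if any.
kdump_pre_script() {
    # $1: kdump configuration file (defaults to /etc/kdump.conf)
    awk '$1 == "kdump_pre" { print $2 }' "${1:-/etc/kdump.conf}"
}

# Sketch: comment out the kdump_pre directive, keeping a .bak backup.
comment_out_kdump_pre() {
    conf=${1:-/etc/kdump.conf}
    # "&" re-inserts the matched text, so only a leading "#" is added.
    sed -i.bak 's/^kdump_pre[[:space:]]/#&/' "$conf"
}
```

Recording the output of kdump_pre_script before running comment_out_kdump_pre gives you the path to use as the symlink target.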

For OL7 and above, if your kdump_pre script is on the root filesystem or on another filesystem that will get mounted by either kdump or the lkce script, you may do the following:
  • Edit /etc/oled/lkce/kdump_pre_sh_body
  • At around line 139, before the line that starts umount "$LKCE_DIR/dev", call your script, for example, by adding the line:

/lkce_kdump/path/to/your/script
  • Force the lkce kdump script to be regenerated:
# oled lkce disable_kexec
Restarting kdump service...
done!
disabled LKCE in kexec mode
# oled lkce enable_kexec
Restarting kdump service...
done!
enabled LKCE in kexec mode
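Because "around line 139" may shift between oled-tools versions, it can help to locate the umount line rather than count. This sketch prints the line number of the first line containing umount "$LKCE_DIR/dev"; the function name find_umount_line is hypothetical, and the search string is taken from the step above.

```shell
#!/bin/sh
# Sketch: print the line number of the first 'umount "$LKCE_DIR/dev"' line,
# which is where a call to your own script would be inserted before.
find_umount_line() {
    # $1: script body file (defaults to the lkce template named above)
    body=${1:-/etc/oled/lkce/kdump_pre_sh_body}
    # -F treats the pattern as a fixed string, so "$" and "/" are literal.
    grep -n -F 'umount "$LKCE_DIR/dev"' "$body" | head -n 1 | cut -d: -f1
}
```

If nothing is printed, the template does not contain that line and should be reviewed by hand before editing.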

Conclusion

We’ve taken a look at lkce and demonstrated how it can save time by providing smaller, easily uploaded files that summarize a crash dump. We may still need more information from the vmcore file in some cases, but the output from lkce is a good starting point.