Introduction
Linux Kernel Core Extractor, lkce, is a program included in the oled-tools package that will automatically extract important information from a kernel vmcore when one is created during a kernel panic from the kexec environment. A vmcore is a crash dump file that may be generated when there is a kernel crash. It contains a snapshot of the system’s memory at the time of the kernel panic and can be used during post-mortem debugging to identify the root cause of the crash.
In addition to running from the kexec environment, lkce can also be run manually against an existing vmcore file during normal system operation.
lkce has two modes it can run under: corelens (recommended) and crash (legacy). In this blog, we will only discuss corelens, as the crash functionality is likely to be deprecated in the future.
Some customers find it difficult to upload large vmcore files to Oracle Support. These files often have to be split into smaller files, uploaded, and then reassembled. This can lead to errors and a delay in root cause analysis. lkce can provide some necessary diagnostic information that lessens the need for uploading the vmcore file. It is important to note that there are still circumstances where we will need the vmcore file uploaded. Some bugs require a deeper dive, or the use of other tools like drgn.
Restrictions
DWARF debugging symbols are used when kernel debuginfo packages are installed. When using DWARF debugging symbols, at least 768MB of memory should be available in the kexec environment. You can ensure at least 768MB is available by setting crashkernel=1G on the kernel command line via the /etc/grub2-efi.cfg file.
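To see what is currently reserved, you can read the crashkernel setting from the running kernel's command line. The following is a minimal sketch and not part of lkce: the helper name crashkernel_mb is hypothetical, and it only understands the simple crashkernel=<size> form with a K, M, or G suffix (optionally followed by @offset). Range syntax and crashkernel=auto are deliberately rejected.

```shell
#!/bin/sh
# Hypothetical helper: print the crashkernel reservation in MB for a simple
# "crashkernel=<number><K|M|G>[@offset]" setting in a kernel command line string.
# Range syntax and "auto" are not handled; the function returns 1 for those.
crashkernel_mb() {
    # Take the last crashkernel= value if several appear
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | tail -n 1)
    value=${value%%@*}                 # drop an optional @offset suffix
    case "$value" in
        ''|*[!0-9KMGkmg]*) return 1 ;; # missing, "auto", or range syntax
    esac
    num=${value%[KMGkmg]}              # numeric part
    unit=${value#"$num"}               # unit suffix
    case "$num" in
        ''|*[!0-9]*) return 1 ;;
    esac
    case "$unit" in
        G|g) echo $((num * 1024)) ;;
        M|m) echo "$num" ;;
        K|k) echo $((num / 1024)) ;;
        *)   return 1 ;;               # no unit given: not handled here
    esac
}
```

On a live system this could be called as crashkernel_mb "$(cat /proc/cmdline)" and the result compared against 768. Note that the reserved size is only an upper bound on what is free inside the kexec environment.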
Installation
oled-tools
For installation instructions please see Oracle Linux Enhanced Diagnostics.
Configuration
kdump
First things first, ensure that kdump is configured, tested, and working. Note that for lkce we recommend you set crashkernel=512M unless using DWARF debug symbols, in which case, the recommendation in the Restrictions section above should be followed. crashkernel=auto is not recommended.
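One quick sanity check that a crash kernel is actually loaded is to read /sys/kernel/kexec_crash_loaded, which contains 1 when a crash kernel has been loaded. The sketch below is illustrative only: the function name is made up, and the optional path argument exists purely so the check can be pointed at another file.

```shell
#!/bin/sh
# Illustrative helper: succeed if a crash (kdump) kernel is loaded.
# $1 optionally overrides the sysfs path that is read.
crash_kernel_loaded() {
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}
```

For example, crash_kernel_loaded && echo "crash kernel is loaded". This does not replace a real test crash; it only confirms that kdump got as far as loading its kernel.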
lkce
Because it’s part of the OLED bundle, when we run lkce commands we run them like this:
# oled lkce <COMMAND>
If you want to view the man page for lkce, do the following:
# man oled-lkce
Configuration is very straightforward; you’ve done most of the hard stuff already by configuring and testing kdump. Run the following commands to configure and enable lkce:
# oled lkce configure --default
# oled lkce enable_kexec
Here’s a sample successful run:
# oled lkce configure --default
LKCE is not currently enabled in kexec mode
LKCE has been configured with default values.
# oled lkce enable_kexec
Restarting kdump service...
done!
enabled LKCE in kexec mode
Let’s check the config:
# oled lkce configure --show
report_cmd : corelens
corelens_args_file : /etc/oled/lkce/corelens_args_file
vmcore : yes
vmlinux path : /usr/lib/debug/lib/modules/5.15.0-207.156.6.el9uek.x86_64/vmlinux
crash_cmds_file : /etc/oled/lkce/crash_cmds_file
lkce_outdir : /var/oled/lkce
lkce_in_kexec : True
max_out_files : 50
- report_cmd – This is the program used to process the vmcore and create a report. By default, the corelens command is used.
- corelens_args_file – This file contains command line arguments that will be passed to the corelens command. The default value is -a.
- vmlinux path – This is the location of the vmlinux file that was installed with the kernel debuginfo package. This file contains necessary information for the crash command to use when accessing the vmcore file.
- crash_cmds_file – This file contains all the commands that crash will run when the vmcore is generated. This file can be updated with additional commands. Here are the default commands:
  bt
  bt -a
  bt -FF
  dev
  kmem -s
  foreach bt
  log
  mod
  mount
  net
  ps -m
  ps -S
  runq
  quit
  If additional crash commands are added, they must be added before the quit command.
- vmcore – Indicates whether the vmcore file should be collected. This is an important configuration option. If you set this to no, then the vmcore file will not be created. That will save some time getting your system back up, but may hinder root cause analysis. It’s always best to get a vmcore file if you can. The default value for this parameter is yes.
- lkce_outdir – The directory where lkce will store its reports.
- lkce_in_kexec – Indicates whether lkce is configured in the kexec kernel.
- max_out_files – The maximum number of reports that will be created. Once this limit is reached, lkce will begin rotating reports by removing the oldest.
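For the legacy crash mode, the rule that extra commands must come before quit can be applied mechanically. The following is a minimal sketch, assuming crash_cmds_file holds one command per line with quit on a line by itself. The helper name add_crash_cmd is hypothetical, and it writes the result to standard output rather than editing the file in place.

```shell
#!/bin/sh
# Hypothetical helper: print a crash commands file with one extra command
# inserted immediately before the first "quit" line.
# Assumes one command per line; the input file is not modified.
# If there is no "quit" line, the file is printed unchanged.
add_crash_cmd() {
    cmd=$1
    file=$2
    awk -v cmd="$cmd" '
        $0 == "quit" && !inserted { print cmd; inserted = 1 }
        { print }
    ' "$file"
}
```

For example, add_crash_cmd "sys" /etc/oled/lkce/crash_cmds_file > /tmp/crash_cmds_file.new produces a candidate file that you can review before copying it into place.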
Testing
As you did when you set up and tested kdump, you can use /proc/sysrq-trigger to trigger a crash that will cause a vmcore file to be dumped.
# sync;sync;sync; echo c > /proc/sysrq-trigger
Here’s what you can expect to see on the console:
Starting Kdump Vmcore Save Service...
[ 4.358680] kdump.sh[578]: LKCE_KDUMP_SCRIPTS=/etc/oled/lkce/lkce_kdump.d/*
[ 4.361356] kdump.sh[578]: Executing /etc/oled/lkce/lkce_kdump.d/kdump_report
[ 21.217294] kdump.sh[579]: kdump_report: kdump_report is enabled to run
[ 21.219329] kdump.sh[579]: kdump_report: Executing '/usr/bin/corelens /proc/vmcore -a'; output file '/var/oled/lkce/corelens_20240917-195513.out'
[ 21.397761] kdump[594]: saving to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12/
[ 21.404363] kdump[599]: saving vmcore-dmesg.txt to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12/
[ 21.423019] kdump[605]: saving vmcore-dmesg.txt complete
[ 21.425937] kdump[607]: saving vmcore
Copying data : [100.0 %] | eta: 0s
[ 22.209594] kdump.sh[608]: The dumpfile is saved to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12//vmcore-incomplete.
[ 22.211275] kdump.sh[608]: makedumpfile Completed.
[ 22.835359] kdump[612]: saving vmcore complete
[ 22.838381] kdump[614]: saving the /run/initramfs/kexec-dmesg.log to /kdumproot/var/oled/crash/127.0.0.1-2024-09-17-19:55:12//
[ 22.866103] kdump[620]: Executing final action systemctl reboot -f
[ 22.874385] systemd[1]: Shutting down.
Note that lkce took only about 20 seconds to run. Larger/busier systems may run a little longer, but in general it should be quick, often quicker than dumping the vmcore itself.
You can use the oled lkce list command to see what files have been created. The files have a timestamp that denotes the time of their creation as part of their name.
# oled lkce list
The following are the reports found in /var/oled/lkce:
/var/oled/lkce/corelens_20240909-191552.out
/var/oled/lkce/corelens_20240909-191805.out
/var/oled/lkce/corelens_20240910-195513.out
/var/oled/lkce/corelens_20240910-200003.out
#
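Because the timestamp in each file name has the form YYYYMMDD-HHMMSS, sorting the names alphabetically also sorts them chronologically. A small sketch for picking the newest corelens report follows. latest_lkce_report is a hypothetical name, and the directory argument defaults to the lkce_outdir value shown in the configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the newest corelens report in a directory.
# Relies on the corelens_YYYYMMDD-HHMMSS.out naming seen in "oled lkce list".
# Returns 1 if the directory holds no matching report.
latest_lkce_report() {
    dir=${1:-/var/oled/lkce}
    newest=""
    for f in "$dir"/corelens_*.out; do
        [ -e "$f" ] || continue   # the pattern matched nothing
        newest=$f                 # glob results are sorted, so the last one wins
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, less "$(latest_lkce_report)" opens the most recent report.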
On a small OCI instance these files are a few hundred KB, compared to the vmcore file that is 126MB. These differences are much more dramatic on large or busy systems. If you have a crash, and lkce is enabled, just tar up the /var/oled/lkce directory and upload it to the SR.
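The archive step might look like the sketch below. The function name and the archive path are placeholders; only the /var/oled/lkce location comes from the configuration shown earlier.

```shell
#!/bin/sh
# Placeholder helper: bundle an lkce report directory into a gzip-compressed tarball.
# $1 = output archive path, $2 = report directory (default /var/oled/lkce)
archive_lkce_reports() {
    out=$1
    dir=${2:-/var/oled/lkce}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # -C keeps the paths inside the archive relative (e.g. lkce/...) rather than absolute
    tar -czf "$out" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```

For example, archive_lkce_reports /tmp/lkce-reports.tar.gz creates a single file to attach to the SR, and tar -tzf /tmp/lkce-reports.tar.gz lists its contents before you upload it.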
Troubleshooting
If your system has an existing kdump_pre configuration parameter in /etc/kdump.conf, manual intervention is necessary in order to run both your original script and lkce. If you attempt to enable lkce in kexec mode, you will see the following output:
# oled lkce enable_kexec
The LKCE 'kdump_pre' entry is not set in /etc/kdump.conf.
The current 'kdump_pre' entry in kdump.conf is:
kdump_pre /etc/your_script.sh
Manual intervention is necessary. Please see the oled-lkce manual page for more information.
error: Unable to enable LKCE in kexec mode
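To find out ahead of time whether this applies to your system, you can look for an active kdump_pre line before running enable_kexec. This is a minimal sketch with an illustrative function name. It only prints uncommented kdump_pre lines, and once lkce is enabled its own entry would presumably match as well.

```shell
#!/bin/sh
# Illustrative helper: print any active (uncommented) kdump_pre lines.
# $1 = kdump configuration file (default /etc/kdump.conf)
# The exit status is 0 if at least one such line exists, non-zero otherwise.
show_kdump_pre() {
    conf=${1:-/etc/kdump.conf}
    grep -E '^[[:space:]]*kdump_pre[[:space:]]' "$conf"
}
```

If show_kdump_pre prints a line that points at your own script, plan on one of the manual steps described in this section.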
On OL8 and OL9, if it is not problematic for the lkce kdump script to run before the existing one, you may:
- Comment out the line beginning with kdump_pre in /etc/kdump.conf
- Create a symlink to your original script in /etc/kdump/pre.d/:
# ln -s <path_to_your_script> /etc/kdump/pre.d/20-kdump_pre.sh
- Then, run:
# oled lkce enable_kexec
After completing the steps above, both scripts will run.
For OL7 and above, if your kdump_pre script is on the root filesystem or on another filesystem that will get mounted by either kdump or the lkce script, you may do the following:
- Edit /etc/oled/lkce/kdump_pre_sh_body
- At around line 139, before the line that starts with umount "$LKCE_DIR/dev", call your script, for example, by adding the line:
/lkce_kdump/path/to/your/script
- Force the lkce kdump script to be regenerated:
# oled lkce disable_kexec
Restarting kdump service...
done!
disabled LKCE in kexec mode
# oled lkce enable_kexec
Restarting kdump service...
done!
enabled LKCE in kexec mode
Conclusion
We’ve taken a look at lkce and demonstrated how it can save time by providing smaller, easily uploaded files that summarize a crash dump. In some cases we may still need more information from the vmcore file, but the output from lkce is a good starting point.