Many system administrators are hesitant to set up core dumping on their Linux systems because of the large amount of disk space it can consume or the time it takes to write out large memory segments to disk. On today's enormous enterprise systems, a vmcore can easily reach several gigabytes. However, sometimes a core dump of kernel memory is the only way to discover and fix a bug on a Linux system. Furthermore, plenty of options are available that make a kernel memory dump much smaller than you might think!
In this article, we show how to estimate the size of a core dump with different vmcore size-reducing features enabled, and we present a tool that automates that estimation. With these features, the resulting vmcore is streamlined to only the data required to debug a kernel failure, with most user-space and cache content removed. We hope this makes you more likely to enable vmcore dumping on your systems!
About kdump
kdump is a service that provides a crash dumping mechanism: it saves the contents of system memory for later analysis. kdump uses kexec functionality to boot into a second kernel whenever the system experiences a crash. This second kernel is often referred to as the "capture kernel," while the original kernel is termed the "crash kernel." When a crash is triggered, the kernel exports a memory image known as vmcore, which can be analyzed for debugging purposes and to find the cause of the crash. The dumped image of main memory is exported in the Executable and Linkable Format (ELF) object format. This image can be accessed through /proc/vmcore while the kernel crash is being handled, or it can be automatically saved to a locally accessible file system, a raw device, or a remote system over the network. Typically the makedumpfile program, specified as the "core_collector" in /etc/kdump.conf, reads this formatted memory and writes it out in a custom format that can exclude and compress pages.
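For reference, the dump target and the core collector are configured in /etc/kdump.conf. The excerpt below is only an illustrative sketch; the exact shipped defaults (dump level, message level, target path) vary between distributions and releases.
# Illustrative excerpt from /etc/kdump.conf (not the exact shipped defaults)
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31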
kdump is triggered by the kernel's panic function when the kernel encounters a serious error. A panic can also be triggered manually from user space by running echo c > /proc/sysrq-trigger, which is useful for understanding the kernel's state when applications have issues with kernel services. The vmcore files generated by the kdump process can be analyzed by tools such as crash, drgn, and gdb to extract valuable debugging data. The kdump infrastructure, which generates vmcore files upon fatal kernel errors, plays a pivotal role in the analysis of kernel-related issues and is used extensively by the kernel community. This article shows how to estimate the space needed for a crash dump and provides a script to aid in the estimation process, primarily on Oracle Linux for x86_64.
Estimating the size of a vmcore file
Tools for understanding the factors mentioned previously include:
- The /proc/kcore pseudo file serves as an interface for accessing kernel memory, presented in the form of an ELF core file that can be easily navigated.
- makedumpfile generates a dump file by compressing dump data, by excluding pages that are unnecessary for analysis, or both. For this, makedumpfile requires access to the debug information of the "crash kernel." This information is essential for the tool to differentiate unnecessary pages based on its analysis of how the "crash kernel" uses memory.
Checking the memory usage of the current kernel:
#/usr/sbin/makedumpfile --mem-usage /proc/kcore
TYPE            PAGES        EXCLUDABLE  DESCRIPTION
----------------------------------------------------------------------
ZERO            27,641       yes         Pages filled with zero
NON_PRI_CACHE   3,336,797    yes         Cache pages without private flag
PRI_CACHE       32           yes         Cache pages with private flag
USER            78,583       yes         User process pages
FREE            39,704       yes         Free pages
KERN_DATA       132,127      no          Dumpable kernel data

page size: 4096
Total pages on system: 3614884
Total size on system: 14,806,564,864 Byte
----------------------------------------------------------------------
Most of the data can be easily correlated with the table provided in the output listed previously. Let’s now discuss the details of the information that hasn’t been explained yet.
- page size : The page size exported by the OS.
- Maximum PFN : The maximum page frame number, that is, total RAM converted to a number of pages.
- Total pages on system : The maximum number of pages usable by the OS, that is, maximum page frames in the system minus the number of pages in memory holes.
- Total size on system : Total RAM size in bytes, that is, Total pages on system * page size.
- KERN_DATA : The minimum number of pages that can't be avoided during a kdump or vmcore collection, that is, Total pages on system - (zero pages + private and non-private cache pages + user pages + free pages). See the arithmetic check after this list.
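As a quick sanity check, the figures in the --mem-usage output above can be reproduced with simple shell arithmetic (the numbers are from this article's example system; substitute your own):
# Total size on system = Total pages on system * page size
echo $(( 3614884 * 4096 ))                                    # 14806564864
# KERN_DATA = Total pages - (ZERO + NON_PRI_CACHE + PRI_CACHE + USER + FREE)
echo $(( 3614884 - (27641 + 3336797 + 32 + 78583 + 39704) ))  # 132127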
By understanding the various elements and objects that can be included in the vmcore, we can be selective about which of them we include. In Oracle Linux this is achieved by setting the dump_level when using the makedumpfile command, for example makedumpfile -d 31. The dump_level can also be set in the /etc/kdump.conf configuration file. Setting another dump level is unusual, but certain issues demand that the vmcore include other types of pages. Let's experiment with some of these options.
From the makedumpfile(8) manual page:
1 : Exclude the pages filled with zero.
2 : Exclude the non-private cache pages.
4 : Exclude all cache pages.
8 : Exclude the user process data pages.
16 : Exclude the free pages.
The dump level passed to the makedumpfile command is the sum of these values, so -d 31 (1 + 2 + 4 + 8 + 16) excludes every excludable page type: zero pages, non-private cache, private cache, user data, and free pages. See the sketch after this paragraph for how the bits combine.
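Because the dump level is a bitmask, the values above can be combined freely. A hypothetical example: to drop only zero pages and free pages while keeping cache and user-space pages (useful when application data is needed for debugging), the level would be 1 + 16 = 17.
echo $(( 1 + 16 ))        # 17
# Illustrative invocation inside the capture kernel, where /proc/vmcore exists:
makedumpfile -l -d 17 /proc/vmcore /var/crash/vmcore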
Considering the factors discussed, these page categories can help in estimating the size of the dump.
Let's experiment:
----------------------------------------------------------------------
ZERO                   27,641
NON_PRI_CACHE          3,336,797
PRI_CACHE              32
USER                   78,583
FREE                   39,704
KERN_DATA              132,127
Page size:             4096
Total pages on system: 3,614,884
Total size on system:  14,806,564,864 Bytes
makedumpfile -l -d 0
Where did we go wrong with the experiment?
Note the "-l" in the command used to collect the dump data: makedumpfile -l -d 0. Referring back to the makedumpfile(8) manual page:
-c, -l, -p, -z : Compress dump data by the page using the following compression libraries respectively: -c : zlib, -l : lzo, -p : snappy, -z : zstd
As a heuristic, these compression techniques typically give 8% to 12% compression, depending on the amount of memory to be dumped, the dump level, the platform, the OS version, and the compression method. In the best-case scenarios the compression might reach an average of 18%. The exploration of the various compression techniques and their specific outcomes falls beyond the scope of this discussion, but the topic might be explored in a future blog post.
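Treating those percentages as the size saving relative to an uncompressed dump, a rough compressed-size estimate can be sketched with shell arithmetic. The uncompressed figure below is the -d 2 estimate computed later in this article, and the 10% saving is an assumption in the middle of the quoted range; treat the result as an approximation only.
UNCOMPRESSED=13486018560                              # assumed input: the -d 2 estimate below
SAVING_PCT=10                                         # assumed saving, middle of the 8-12% range
echo $(( UNCOMPRESSED * (100 - SAVING_PCT) / 100 ))   # 12137416704 bytes, roughly 11.3 GiB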
Let’s continue the experiments without compression:
ZERO                   27,641
NON_PRI_CACHE          3,336,797
PRI_CACHE              32
USER                   78,583
FREE                   39,704
KERN_DATA              132,127
Page size:             4096
Total pages on system: 3,614,884
Total size on system:  14,806,564,864 Bytes
makedumpfile -d 0
(Includes all possible pages, results in a vmcore with maximum size.)
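Since -d 0 keeps every page, the expected size (applying the same arithmetic used for the later dump levels) is simply the total RAM size:
: 3,614,884 pages
: 3,614,884 * 4096 Bytes
: 14,806,564,864 Bytes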
ZERO                   25,335
NON_PRI_CACHE          192,412
PRI_CACHE              38
USER                   76,249
FREE                   3,188,441
KERN_DATA              132,409
Page size:             4096
Total pages on system: 3,614,884
Total size on system:  14,806,564,864 Bytes
makedumpfile -d 31
(Rejects all possible pages, resulting in a vmcore with minimum size.)
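With -d 31 only the dumpable kernel data remains, so the expected size follows from the KERN_DATA count alone:
: 132,409 pages (KERN_DATA)
: 132,409 * 4096 Bytes
: 542,347,264 Bytes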
ZERO                   24,995
NON_PRI_CACHE          322,399
PRI_CACHE              38
USER                   73,507
FREE                   3,059,945
KERN_DATA              134,000
Page size:             4096
Total pages on system: 3,614,884
Total size on system:  14,806,564,864 Bytes
makedumpfile -d 2
(Includes all possible pages without NON_PRI_CACHE.)
: 3,059,945 (FREE) + 134,000 (KERN_DATA) + 73,507 (USER) + 24,995 (ZERO) + 38 (PRI_CACHE)
: 3,292,485 pages
: 3,292,485 * 4096 Bytes
: 13,486,018,560 Bytes
ZERO                   25,804
NON_PRI_CACHE          1,456,213
PRI_CACHE              38
USER                   75,267
FREE                   1,921,546
KERN_DATA              136,016
Page size:             4096
Total pages on system: 3,614,884
Total size on system:  14,806,564,864 Bytes
makedumpfile -d 16
(Includes all possible pages without free pages.)
: 25,804 (ZERO) + 1,456,213 (NON_PRI_CACHE) + 75,267 (USER) + 136,016 (KERN_DATA) + 38 (PRI_CACHE)
: 1,693,338 pages
: 1,693,338 * 4096 Bytes
: 6,935,912,448 Bytes
Consideration Notes:
- This document focuses on vmcores generated through the kdump infrastructure in Linux. Note that this scheme can’t be used to estimate the size of vmcores obtained through alternative methods, such as:
- Xen: xm dumpcore
- QEMU: dump, dump-guest-memory
- Libvirt: virsh dump
- Vmcores obtained from hypervisors or other methodologies and processes.
- The method and experiments outlined in this article can serve as a guide for administrators to arrive at an approximate value for the vmcore size, but not an exact one.
- The dump is not just an exact replica of the RAM; it also contains other metadata.
- The size also depends on factors such as the file system, disk layout, and so on.
- We recommend that administrators be cautious when exploring other options used in the kdump process.
- Memory is a highly dynamic subsystem of the kernel, and statistics can change rapidly. The values used to estimate the dump size might differ from the actual result of the kdump process. We recommend that you:
- Estimate the size while a production-like workload is running, ideally at peak usage.
- Avoid estimating the size immediately after boot, as caches might still be cold. Allow some time, possibly hours or even a few days, for the file system and other kernel caches to warm up.
- Understand that undersizing the vmcore storage area can have significant consequences: an incomplete dump is often irrecoverable, making it challenging to debug the issue. Resizing the storage area, especially during an incident, may be difficult or even impossible. Hence, maintaining a generous margin is of great importance.
- A recommended approach is to periodically estimate the dump size on a system and use the maximum size to cater for all types of workloads.
- Heuristic data on the estimated dump size, as previously suggested, can be applied to resize the dump on similar architectures and with similar workloads.
- Some memory is used in the kdump process itself, which might have a marginal impact on the size of the vmcore.
- Considering the factors mentioned above, we recommend maintaining a 200 MB buffer beyond the estimated vmcore size; a quick free-space check is sketched below.
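As a practical example of these notes, the sketch below compares an estimated vmcore size plus a safety margin with the free space on the dump target. The estimate, the /var/crash target path, and the margin are assumptions based on this article; adjust them to your configuration.
EST_BYTES=6935912448                        # assumed estimate (the -d 16 example above)
MARGIN=$(( 200 * 1024 * 1024 ))             # the ~200 MB buffer recommended above
NEED=$(( EST_BYTES + MARGIN ))
AVAIL=$( df --output=avail -B1 /var/crash | tail -1 | tr -d '[:space:]' )
if [ "$AVAIL" -lt "$NEED" ]; then
    echo "WARNING: need $NEED bytes under /var/crash, only $AVAIL available"
fi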
Automating the Scheme:
The vmcore_sz command in the Oracle Linux oled tools package automates the scheme described above. vmcore_sz takes the dump_level as an argument and estimates the vmcore size that would result if a kernel dump were taken at that moment. It displays the total number of pages, the number of pages to be excluded at that dump level, and the expected vmcore size in bytes. If the dump level is not specified, the default configured in /etc/kdump.conf is used.
Experiments with oled vmcore_sz:
Command: oled vmcore_sz -d 2
TYPE            PAGES       EXCLUDABLE  DESCRIPTION
----------------------------------------------------------------------
ZERO            26312       yes         Pages filled with zero
NON_PRI_CACHE   3270478     yes         Cache pages without private flag
PRI_CACHE       38          yes         Cache pages with private flag
USER            78994       yes         User process pages
FREE            103083      yes         Free pages
KERN_DATA       135979      no          Dumpable kernel data

page size: 4096
Total pages on system: 3614884
Total size on system: 14806564864 Byte
----------------------------------------------------------------------
Exclude non private caches : 3270478
Total Pages                : 3614884
Pages to be dumped         : 344406
----------------------------------------------------------------------
Expected vmcore size in bytes : 1410686976
Command: oled vmcore_sz
TYPE            PAGES       EXCLUDABLE  DESCRIPTION
----------------------------------------------------------------------
ZERO            26299       yes         Pages filled with zero
NON_PRI_CACHE   3270464     yes         Cache pages without private flag
PRI_CACHE       38          yes         Cache pages with private flag
USER            79045       yes         User process pages
FREE            103142      yes         Free pages
KERN_DATA       135896      no          Dumpable kernel data

page size: 4096
Total pages on system: 3614884
Total size on system: 14806564864 Byte
----------------------------------------------------------------------
Dump level not specified. Using default/configured i.e. 16
Exclude free pages : 103142
Total Pages        : 3614884
Pages to be dumped : 3511742
----------------------------------------------------------------------
Expected vmcore size in bytes : 14384095232
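One way to act on the earlier recommendation to estimate periodically is to log the estimate at the configured dump level over several days of normal workload and size the dump target for the peak value. A minimal sketch, with an arbitrarily chosen log path:
# Append a timestamped estimate at the configured dump level to a log file:
date >> /var/log/vmcore_sz.log
oled vmcore_sz >> /var/log/vmcore_sz.log 2>&1
# Schedule these two lines (for example, from cron) and review the peak value.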
Comparing the script with a trial vmcore dump:
Note: Here we're experimenting in the "crash kernel" with /proc/kcore. Typically, makedumpfile runs in the "capture kernel" on /proc/vmcore.
Excluded pages : 0x00000000003cc797
Pages filled with zero : 0x000000000000aeb7
Non-private cache pages: 0x00000000000855b5
Private cache pages : 0x000000000000000b
User process data pages : 0x0000000000021c0d
Free pages : 0x000000000031a713
Hwpoison pages : 0x0000000000000000
Offline pages: 0x0000000000000000
Remaining pages : 0x0000000000036d0d
(The number of pages is reduced to 5%.)
Memory Hole : 0x000000000003cb5c
--------------------------------------------------
Total pages : 0x0000000000440000
Write bytes : 865872724
Total pages on system: 4136100
Total size on system: 16941465600 Byte
----------------------------------------------------------------------
Exclude zero pages         : 28943
Exclude non private caches : 546261
Exclude private cache      : 11
Exclude user pages         : 136745
Exclude free pages         : 3255570
Total Pages                : 4136100
Pages to be dumped         : 168570
----------------------------------------------------------------------
Expected vmcore size in bytes : 690462720
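To compare the two reports, the hexadecimal page counts printed by makedumpfile can be converted to decimal pages and to bytes, for example for the "Remaining pages" value above:
printf '%d\n' 0x36d0d              # 224525 pages remain after page exclusion
echo $(( 0x36d0d * 4096 ))         # 919654400 bytes (224525 pages * 4096 bytes per page)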
The outputs indicate:
- The pages dumped by the makedumpfile command closely resemble the values indicated by the oled vmcore_sz script.
- As the kernel's page dynamics change quickly, we should consider the evaluation of the vmcore size an estimate.
- The actual size of the vmcore produced by the makedumpfile command might vary slightly in the experiment above, because the experiment involves running makedumpfile within the "crash kernel," which includes pages used by the file system, the caches, and the makedumpfile process itself.
- When we generate the vmcore through the kdump infrastructure (kexec) and run makedumpfile within the "capture kernel," those pages (the pages used to run the makedumpfile program) are attributed to the "capture kernel" rather than to the "crash kernel," as they were in the preceding experiment.
Conclusion
Based on the discussed scheme, system administrators can estimate the size of a vmcore and reserve a sufficient amount of disk space to safely dump it.