A vmcore for your system may be smaller than you think!

January 3, 2024 | 15 minute read

Many system administrators are hesitant to set up core dumping on their Linux systems because of the large amount of space it can consume or the amount of time it takes to write out large memory segments to disk. With the evolution of enormous enterprise systems, sometimes the vmcore size can scale to several GB. However, sometimes a core dump of kernel memory is the only way to discover and fix a bug on a Linux system. Furthermore, plenty of options are available that make a kernel memory dump much smaller than you might think!

In this article, we show how to estimate the size of a core dump with different vmcore size-reducing features enabled, and present a tool that automates the estimate. With these features enabled, the resulting vmcore is streamlined to only the data required to debug a kernel failure, with most user space and cache content removed. We hope this makes you more likely to enable vmcore dumping on a system!

About kdump

kdump is a service that provides a crash dumping mechanism by saving the contents of system memory for later analysis. kdump uses kexec functionality to boot into a second kernel whenever the system experiences a crash. This second kernel is often referred to as the “capture kernel,” while the original kernel is termed the “crash kernel.” When a crash is triggered, the kernel exports a memory image known as a vmcore, which can be analyzed to debug the crash and find its cause. The dumped image of main memory is exported in the Executable and Linkable Format (ELF). This image can be accessed through /proc/vmcore during the handling of a kernel crash, or it can be automatically saved to a locally accessible file system, a raw device, or a remote system over the network. Typically, the makedumpfile program, specified as the “core_collector” in /etc/kdump.conf, reads this ELF-formatted memory and writes it out in a custom format that can exclude and compress pages.

kdump is triggered through the kernel's panic function when the kernel encounters a serious error. A panic can also be triggered manually from userspace by running echo c > /proc/sysrq-trigger, which is useful for understanding the kernel’s state when applications have issues with kernel services. The vmcore files generated by the kdump process can be analyzed with tools such as crash, drgn, and gdb to extract valuable debugging data. The kdump infrastructure, which generates vmcore files upon fatal kernel errors, plays a pivotal role in the analysis of kernel-related issues and is used extensively by the kernel community. This article shows how to estimate the space needed for a crash dump and provides a script to aid in the estimation process, primarily on Oracle Linux for x86_64.

Estimating the size of a vmcore file

 

The following objects affect the size of a vmcore:

  • Free Pages
    Description: These pages are unused, or were used and have since been freed.
    Comments: Because the page is free, no object should hold an active reference to it. These pages can be skipped.
  • Memory Hole
    Description: An unusable part of memory.
    Comments: A reserved or unusable segment of memory that can’t be used by the OS. This segment of memory must be skipped by the vmcore.
  • Private cache and non-private cache
    Description: The page cache contains entire pages from recently accessed files during a page I/O operation. If the data is in the page cache, the kernel can return the cached page rather than operate on the data on disk. A file mapping can be private or shared, which determines how updates made to the contents in memory are handled. In a private mapping, the updates are not committed to disk or made visible to other processes. In a shared mapping, updates are visible to other processes and end up on disk.
    Comments: These pages can be skipped by the vmcore.
  • Total RAM
    Description: The total memory of the system assigned to the OS.
    Comments: The maximum size of a hypothetical vmcore is a snapshot of the complete RAM.
  • User Pages
    Description: Pages used by applications in userspace.
    Comments: If the number of applications is high, these pages can contribute significantly to the size of the vmcore. Admins often skip them unless they explicitly need application data from the vmcore for debugging.
  • Zero Pages
    Description: Pages containing no data.
    Comments: These pages can be skipped and mostly don’t play a major role in debugging.

 

Tools for understanding the factors mentioned previously include:

  • The /proc/kcore pseudo file serves as an interface for accessing kernel memory, presented in the form of an ELF core file that can be easily navigated.
  • makedumpfile generates a dump file by compressing dump data or excluding unnecessary pages for analysis, or both. For this, makedumpfile requires access to the debug information of the “crash kernel.” This information is essential for the tool to differentiate unnecessary pages based on its analysis of how the “crash kernel” uses memory.

Checking the memory usage of the current kernel:

# /usr/sbin/makedumpfile --mem-usage /proc/kcore

TYPE             PAGES       EXCLUDABLE   DESCRIPTION
----------------------------------------------------------------------
ZERO             27,641      yes          Pages filled with zero
NON_PRI_CACHE    3,336,797   yes          Cache pages without private flag
PRI_CACHE        32          yes          Cache pages with private flag
USER             78,583      yes          User process pages
FREE             39,704      yes          Free pages
KERN_DATA        132,127     no           Dumpable kernel data

page size: 4096
Total pages on system: 3614884
Total size on system: 14,806,564,864 Byte

-------------------------------------------------------------------------------------------------------

Most of this output maps directly onto the objects described earlier. Let’s now discuss the fields that haven’t been explained yet.

  • page size: The page size exported by the OS.
  • Maximum PFN: The maximum page frame number, that is, total RAM converted to a number of pages.
  • Total pages on system: The maximum number of pages usable by the OS, that is, maximum page frames in the system - number of pages in memory holes.
  • Total size on system: The total RAM size in bytes, that is, Total pages on system * page size.
  • KERN_DATA: The minimum number of pages that can’t be avoided during a kdump or vmcore collection, that is, Total pages on system - (zero pages + private and non-private cache pages + user pages + free pages).
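The relationships in this list can be checked with a few lines of arithmetic. The sketch below uses the page counts from the sample --mem-usage output above; the variable names are illustrative and are not part of makedumpfile.

```python
# Page counts copied from the sample `makedumpfile --mem-usage` output above.
PAGE_SIZE = 4096
TOTAL_PAGES = 3_614_884

excludable = {
    "ZERO": 27_641,
    "NON_PRI_CACHE": 3_336_797,
    "PRI_CACHE": 32,
    "USER": 78_583,
    "FREE": 39_704,
}

# KERN_DATA is what remains once every excludable page type is subtracted.
kern_data = TOTAL_PAGES - sum(excludable.values())

# Total size on system = Total pages on system * page size.
total_bytes = TOTAL_PAGES * PAGE_SIZE

print(f"KERN_DATA pages:      {kern_data:,}")
print(f"Total size on system: {total_bytes:,} bytes")
```

Both results agree with the KERN_DATA and "Total size on system" values reported by makedumpfile for this sample.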

By understanding the various elements and objects that can be included in the vmcore, we can be selective about which of them we include. In Oracle Linux, this is achieved by setting the dump_level when using the makedumpfile command, for example: makedumpfile -d 31. The dump_level can also be set in the /etc/kdump.conf configuration file. Setting another dump level is unusual, but certain issues demand that the vmcore include other types of pages. Let’s experiment with some of these options.

From the makedumpfile(8) manual page:

-d dump_level
    Specify the type of unnecessary page for analysis. Pages of the specified type are not copied to DUMPFILE. The page type marked in the following table is excluded. A user can specify multiple page types by setting the sum of each page type for dump_level. The maximum of dump_level is 31.

    Base level: dump_level consists of five bits, so there are five base levels to specify the type of unnecessary page.

     1 : Exclude the pages filled with zero.
     2 : Exclude the non-private cache pages.
     4 : Exclude all cache pages.
     8 : Exclude the user process data pages.
    16 : Exclude the free pages.

Any bitwise-OR combination of these flags can be used to select a dump level. The complete bitmap for every dump level is listed in the makedumpfile manual page, in a table with these columns: dump level, zero pages, non-private cache, private cache, user data, free pages.
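As an illustration of how the five bits combine, here is a minimal sketch that decodes a dump level into the page types it excludes. The function name is hypothetical, and the page-type labels are borrowed from the --mem-usage output. Bit 4 is treated as covering both private and non-private cache pages, following the "Exclude all cache pages" wording above.

```python
# Bit values taken from the makedumpfile(8) excerpt above.
EXCLUDE_ZERO = 1
EXCLUDE_NON_PRIVATE_CACHE = 2
EXCLUDE_ALL_CACHE = 4
EXCLUDE_USER = 8
EXCLUDE_FREE = 16


def excluded_page_types(dump_level):
    """Return the set of --mem-usage TYPE labels excluded at dump_level."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump_level must be between 0 and 31")
    excluded = set()
    if dump_level & EXCLUDE_ZERO:
        excluded.add("ZERO")
    if dump_level & EXCLUDE_NON_PRIVATE_CACHE:
        excluded.add("NON_PRI_CACHE")
    if dump_level & EXCLUDE_ALL_CACHE:
        # "All cache pages" covers both the private and non-private kinds.
        excluded.update({"NON_PRI_CACHE", "PRI_CACHE"})
    if dump_level & EXCLUDE_USER:
        excluded.add("USER")
    if dump_level & EXCLUDE_FREE:
        excluded.add("FREE")
    return excluded


for level in (0, 2, 16, 31):
    print(level, sorted(excluded_page_types(level)))
```

Level 0 excludes nothing and level 31 excludes every excludable type, which leaves only KERN_DATA in the dump.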

 

Given the dump level, the page counts for the objects discussed above let us estimate the size of the dump.

Let’s experiment:

Dump Command: makedumpfile -l -d 0

  Memory Usage:
    TYPE             PAGES
    --------------------------
    ZERO             27,641
    NON_PRI_CACHE    3,336,797
    PRI_CACHE        32
    USER             78,583
    FREE             39,704
    KERN_DATA        132,127

    Page size: 4096
    Total pages on system: 3,614,884
    Total size on system: 14,806,564,864 bytes
  Estimated vmcore size: “-d 0” means that all possible objects are included, so the size should be 14,806,564,864 bytes, or around 14.8 GB.
  Actual vmcore size: 12,393,866,016 bytes
  Conclusion: The actual size is about 2.4 GB less than the estimated size of the vmcore, or around 590 K pages less than estimated.

 

Where did we go wrong with the experiment?

Please note the “-l” in the command to collect dump data: makedumpfile -l -d 0. Referring back to the makedumpfile(8) manual page:

makedumpfile with the -c, -l, -p, or -z option: compress dump data page by page using the following compression library, respectively:
-c : zlib, -l : lzo, -p : snappy, -z : zstd

As a rule of thumb, compression typically reduces the dump size by 8% to 12%, depending on the amount of memory to be dumped, the dump level, the platform, the OS version, and the compression method. In best-case scenarios, the reduction might average up to 18%. The exploration of various compression techniques and their specific outcomes falls beyond the scope of this discussion. However, this topic might be explored in another blog post in the future.
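To see what compression did in the lzo experiment above, the reduction can be computed from the two sizes it reported. This is a rough sketch: it treats the -d 0 estimate as the uncompressed size, which is an approximation, and the function name is illustrative.

```python
def size_reduction_percent(uncompressed_bytes, actual_bytes):
    """Percentage by which the dump shrank relative to the uncompressed size."""
    return 100.0 * (uncompressed_bytes - actual_bytes) / uncompressed_bytes


# Values from the `makedumpfile -l -d 0` experiment above.
estimated = 14_806_564_864  # estimate with every page included
actual = 12_393_866_016     # size of the lzo-compressed vmcore

saved = estimated - actual
percent = size_reduction_percent(estimated, actual)
print(f"Saved {saved:,} bytes ({percent:.1f}% smaller than the estimate)")
```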

Let’s continue the experiments without compression:

Dump Command: makedumpfile -d 0
(Includes all possible pages, resulting in a vmcore of maximum size.)

  Memory Usage:
    ZERO             27,641
    NON_PRI_CACHE    3,336,797
    PRI_CACHE        32
    USER             78,583
    FREE             39,704
    KERN_DATA        132,127

    Page size: 4096
    Total pages on system: 3,614,884
    Total size on system: 14,806,564,864 bytes
  Estimated vmcore size: Because all possible pages are included, the vmcore size should be 14,806,564,864 bytes, or around 14.8 GB.
  Actual vmcore size: 14,845,745,984 bytes
  Conclusion: This matches our estimated value.

Dump Command: makedumpfile -d 31
(Excludes all possible pages, resulting in a vmcore of minimum size.)

  Memory Usage:
    ZERO             25,335
    NON_PRI_CACHE    192,412
    PRI_CACHE        38
    USER             76,249
    FREE             3,188,441
    KERN_DATA        132,409

    Page size: 4096
    Total pages on system: 3,614,884
    Total size on system: 14,806,564,864 bytes
  Estimated vmcore size: Because all excludable pages are excluded, the vmcore should contain only the KERN_DATA pages: 132,409 pages, or 132,409 * 4096 = 542,347,264 bytes, or around 0.5 GB.
  Actual vmcore size: 500,535,832 bytes
  Conclusion: This matches our estimated value.

Dump Command: makedumpfile -d 2
(Includes all possible pages except NON_PRI_CACHE.)

  Memory Usage:
    ZERO             24,995
    NON_PRI_CACHE    322,399
    PRI_CACHE        38
    USER             73,507
    FREE             3,059,945
    KERN_DATA        134,000

    Page size: 4096
    Total pages on system: 3,614,884
    Total size on system: 14,806,564,864 bytes
  Estimated vmcore size: The total number of pages included in the vmcore is:
    3,059,945 + 134,000 + 73,507 + 24,995 + 38
    = 3,292,485 pages
    = 3,292,485 * 4096 bytes
    = 13,486,018,560 bytes
  Actual vmcore size: 13,518,405,584 bytes
  Conclusion: This matches our estimated value.

Dump Command: makedumpfile -d 16
(Includes all possible pages except free pages.)

  Memory Usage:
    ZERO             25,804
    NON_PRI_CACHE    1,456,213
    PRI_CACHE        38
    USER             75,267
    FREE             1,921,546
    KERN_DATA        136,016

    Page size: 4096
    Total pages on system: 3,614,884
    Total size on system: 14,806,564,864 bytes
  Estimated vmcore size: The total number of pages included in the vmcore is:
    25,804 + 1,456,213 + 75,267 + 136,016 + 38
    = 1,693,338 pages
    = 1,693,338 * 4096 bytes
    = 6,935,912,448 bytes
  Actual vmcore size: 691987126 bytes
  Conclusion: This matches our estimated value.
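The estimates in these experiments all follow the same recipe: drop the page types excluded by the dump level, add up what is left, and multiply by the page size. The sketch below applies that recipe to the -d 0, -d 31, and -d 2 experiments and compares each result with the reported actual size. The function and variable names are illustrative, and the bit-to-type mapping is the one quoted from makedumpfile(8) earlier.

```python
PAGE_SIZE = 4096

# Page types removed by each base dump-level bit, keyed by --mem-usage TYPE.
BIT_TO_TYPES = {
    1: ("ZERO",),
    2: ("NON_PRI_CACHE",),
    4: ("NON_PRI_CACHE", "PRI_CACHE"),
    8: ("USER",),
    16: ("FREE",),
}


def estimate_vmcore_bytes(pages, dump_level, page_size=PAGE_SIZE):
    """Estimate the uncompressed vmcore size for the given page counts."""
    excluded = set()
    for bit, types in BIT_TO_TYPES.items():
        if dump_level & bit:
            excluded.update(types)
    kept = sum(count for name, count in pages.items() if name not in excluded)
    return kept * page_size


# (dump level, page counts, actual vmcore size) from the experiments above.
experiments = [
    (0, {"ZERO": 27641, "NON_PRI_CACHE": 3336797, "PRI_CACHE": 32,
         "USER": 78583, "FREE": 39704, "KERN_DATA": 132127}, 14_845_745_984),
    (31, {"ZERO": 25335, "NON_PRI_CACHE": 192412, "PRI_CACHE": 38,
          "USER": 76249, "FREE": 3188441, "KERN_DATA": 132409}, 500_535_832),
    (2, {"ZERO": 24995, "NON_PRI_CACHE": 322399, "PRI_CACHE": 38,
         "USER": 73507, "FREE": 3059945, "KERN_DATA": 134000}, 13_518_405_584),
]

for level, pages, actual in experiments:
    estimate = estimate_vmcore_bytes(pages, level)
    deviation = 100.0 * (actual - estimate) / estimate
    print(f"-d {level:<2}  estimate {estimate:>14,}  actual {actual:>14,}  "
          f"deviation {deviation:+.1f}%")
```

The deviation column shows how far each actual dump landed from its estimate, which is a useful input when deciding how much headroom to reserve.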

 

Consideration Notes:

  1. This document focuses on vmcores generated through the kdump infrastructure in Linux. Note that this scheme can’t be used to estimate the size of vmcores obtained through alternative methods, such as:
    • Xen: xm dumpcore
    • QEMU: dump, dump-guest-memory
    • Libvirt: virsh dump
    • Vmcores obtained from hypervisors or other methodologies and processes.
  2. The method and experiments outlined in this article can serve as a guide for administrators to arrive at an approximate value for the vmcore size, but not an exact one.
  3. The dump not only represents an exact replica of the RAM but also contains other metadata.
  4. The size also depends on factors such as the file system, disk layout and so on.
  5. We recommend that administrators be cautious when exploring other options used in the kdump process.
  6. Memory is a highly dynamic subsystem of the kernel, and statistics can change rapidly. The values used to estimate the dump size might differ from the actual result of the kdump process. We recommend that you:
    • Estimate the size while a production-like workload is running, ideally at peak usage.
    • Avoid estimating the size immediately after boot, as caches might still be cold. Allow some time, possibly hours or even a few days, for the file system and other kernel caches to warm up.
    • Maintain a generous margin. Undersizing the vmcore storage area can have significant consequences: an incomplete dump is often unrecoverable, making it challenging to debug the issue, and resizing the storage area, especially during an incident, may be difficult or even impossible.
    • Periodically estimate the dump size on a system and use the maximum size to cater for all types of workloads.
    • Apply heuristic data on the estimated dump size, gathered as suggested above, when sizing the dump storage on similar architectures with similar workloads.
  7. Some memory is used in the kdump process itself, which might have a marginal impact on the size of the vmcore.
  8. Considering the factors mentioned above, we recommend maintaining a 200 MB buffer beyond the estimated vmcore size.
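A small sketch of how the recommended buffer could be applied when checking a dump target. It assumes the buffer is 200 MiB and uses "/" only as a placeholder path; substitute the directory or mount point where your system actually writes vmcores. The function names are hypothetical.

```python
import shutil

# The recommended buffer, interpreted here as 200 MiB (an assumption).
MARGIN_BYTES = 200 * 1024 * 1024


def required_bytes(estimated_vmcore_bytes, margin_bytes=MARGIN_BYTES):
    """Estimated vmcore size plus the safety buffer."""
    return estimated_vmcore_bytes + margin_bytes


def has_room_for_vmcore(dump_dir, estimated_vmcore_bytes):
    """True if the file system holding dump_dir has enough free space."""
    free = shutil.disk_usage(dump_dir).free
    return free >= required_bytes(estimated_vmcore_bytes)


# Example: the -d 31 estimate from the experiments (542,347,264 bytes).
estimate = 542_347_264
print(f"Space to reserve: {required_bytes(estimate):,} bytes")
print("Enough free space on '/':", has_room_for_vmcore("/", estimate))
```

Running a check like this periodically, with the largest estimate observed so far, follows the advice in item 6 about catering for all types of workloads.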

Automating the Scheme:

The vmcore_sz command in the Oracle oled tools automates the above scheme. vmcore_sz takes the dump_level as an argument and estimates the size of the vmcore that would result if a kernel dump were taken at that moment. It displays the total number of pages, the pages to be excluded for the given dump level, and the expected vmcore size in bytes. If the dump level is not specified, the default configured in /etc/kdump.conf is used.

Experiments with oled vmcore_sz:

Command: oled vmcore_sz -d 2

TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
----------------------------------------------------------------------
ZERO            26312                   yes             Pages filled with zero
NON_PRI_CACHE   3270478                 yes             Cache pages without private flag
PRI_CACHE       38                      yes             Cache pages with private flag
USER            78994                   yes             User process pages
FREE            103083                  yes             Free pages
KERN_DATA       135979                  no              Dumpable kernel data

page size:              4096
Total pages on system:  3614884
Total size on system:   14806564864      Byte

----------------------------------------------------------------------------------------------

Exclude non private caches : 3270478
Total Pages : 3614884
Pages to be dumped: 344406

----------------------------------------------------------------------------------------------
Expected vmcore size in bytes : 1410686976

Command: oled vmcore_sz

TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
----------------------------------------------------------------------
ZERO            26299                   yes             Pages filled with zero
NON_PRI_CACHE   3270464                 yes             Cache pages without private flag
PRI_CACHE       38                      yes             Cache pages with private flag
USER            79045                   yes             User process pages
FREE            103142                  yes             Free pages
KERN_DATA       135896                  no              Dumpable kernel data

page size:              4096
Total pages on system:  3614884
Total size on system:   14806564864      Byte

----------------------------------------------------------------------------------------------

Dump level not specified. Using default/configured i.e. 16

Exclude free pages  : 103142
Total Pages : 3614884
Pages to be dumped: 3511742

----------------------------------------------------------------------------------------------
Expected vmcore size in bytes : 14384095232
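The summary lines in these outputs can be reproduced from the --mem-usage table alone. The following sketch parses that table from text and recomputes the -d 2 figures. It illustrates the arithmetic only and is not the actual vmcore_sz implementation; the parsing function and the regular expressions are assumptions based on the output format shown above.

```python
import re

# Table copied from the `oled vmcore_sz -d 2` output above.
MEM_USAGE_OUTPUT = """\
TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
----------------------------------------------------------------------
ZERO            26312                   yes             Pages filled with zero
NON_PRI_CACHE   3270478                 yes             Cache pages without private flag
PRI_CACHE       38                      yes             Cache pages with private flag
USER            78994                   yes             User process pages
FREE            103083                  yes             Free pages
KERN_DATA       135979                  no              Dumpable kernel data

page size:              4096
Total pages on system:  3614884
Total size on system:   14806564864      Byte
"""

# A data row is: TYPE, a page count (commas allowed), then yes/no.
ROW = re.compile(r"^([A-Z_]+)\s+([\d,]+)\s+(?:yes|no)\b", re.MULTILINE)


def parse_mem_usage(text):
    """Return (page counts by TYPE, page size, total pages)."""
    pages = {name: int(count.replace(",", ""))
             for name, count in ROW.findall(text)}
    page_size = int(
        re.search(r"^page size:\s*(\d+)", text, re.MULTILINE).group(1))
    total = re.search(r"^Total pages on system:\s*([\d,]+)", text,
                      re.MULTILINE).group(1)
    return pages, page_size, int(total.replace(",", ""))


pages, page_size, total_pages = parse_mem_usage(MEM_USAGE_OUTPUT)

# Dump level 2 excludes only the non-private cache pages.
pages_to_dump = total_pages - pages["NON_PRI_CACHE"]
expected_bytes = pages_to_dump * page_size

print("Pages to be dumped:", pages_to_dump)
print("Expected vmcore size in bytes:", expected_bytes)
```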

Comparing the script with a trial vmcore dump:

Note: Here we’re experimenting in the “crash kernel” with /proc/kcore. Typically, makedumpfile runs in the “capture kernel” on /proc/vmcore.

Command: makedumpfile -f --message-level 24 -d 31 /proc/kcore /dev/null

Original pages          : 0x00000000004034a4
Excluded pages          : 0x00000000003cc797
Pages filled with zero  : 0x000000000000aeb7
Non-private cache pages : 0x00000000000855b5
Private cache pages     : 0x000000000000000b
User process data pages : 0x0000000000021c0d
Free pages              : 0x000000000031a713
Hwpoison pages          : 0x0000000000000000
Offline pages           : 0x0000000000000000
Remaining pages         : 0x0000000000036d0d
(The number of pages is reduced to 5%.)
Memory Hole             : 0x000000000003cb5c
--------------------------------------------------
Total pages             : 0x0000000000440000
Write bytes             : 865872724

Command: oled vmcore_sz -d 31

page size:              4096
Total pages on system:  4136100
Total size on system:   16941465600      Byte
--------------------------------------------------
Exclude zero pages : 28943
Exclude non private caches : 546261
Exclude privae cache  : 11
Exclude user pages : 136745
Exclude free pages  : 3255570
Total Pages : 4136100
Pages to be dumped: 168570
--------------------------------------------------
Expected vmcore size in bytes : 690462720

 

The outputs indicate:

  • The pages dumped by the makedumpfile command closely resemble the values indicated by the oled vmcore_sz script.
  • Because kernel page usage changes quickly, the computed vmcore size should be treated as an estimate.
  • The actual size of the vmcore produced by the makedumpfile command might vary slightly in this experiment. The experiment involves running makedumpfile within the “crash kernel,” so the dump includes pages used by the file system, cache, and the makedumpfile process itself.
  • When we generate the vmcore by using the kdump infrastructure (kexec) and run makedumpfile within the “capture kernel,” the aforementioned pages (pages used to run the makedumpfile program) are attributed to the “capture kernel” rather than to the “crash kernel” as in the preceding experiment.
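makedumpfile reports its page counts in hexadecimal while vmcore_sz prints decimal values, so a side-by-side comparison needs a conversion first. The sketch below converts the makedumpfile figures from the trial above and lists them next to the vmcore_sz values; the dictionary keys are illustrative labels.

```python
# Hexadecimal page counts from the makedumpfile report above.
makedumpfile_pages = {
    "zero": 0x000000000000aeb7,
    "non_private_cache": 0x00000000000855b5,
    "private_cache": 0x000000000000000b,
    "user": 0x0000000000021c0d,
    "free": 0x000000000031a713,
    "remaining": 0x0000000000036d0d,
}

# Decimal page counts from the `oled vmcore_sz -d 31` output above.
# "remaining" corresponds to its "Pages to be dumped" line.
vmcore_sz_pages = {
    "zero": 28943,
    "non_private_cache": 546261,
    "private_cache": 11,
    "user": 136745,
    "free": 3255570,
    "remaining": 168570,
}

print(f"{'type':<18} {'makedumpfile':>12} {'vmcore_sz':>10} {'difference':>11}")
for name, dumped in makedumpfile_pages.items():
    estimated = vmcore_sz_pages[name]
    print(f"{name:<18} {dumped:>12,} {estimated:>10,} {dumped - estimated:>+11,}")
```

The difference column makes it easy to see which page types moved between the two runs, which is expected given how quickly memory statistics change.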

Conclusion

Based on the scheme discussed here, system admins can estimate the size of a vmcore and reserve a sufficient amount of disk space to safely dump it.

Reference:

Partha Satapathy

