Pstore, The Linux Kernel Persistent Storage File System

May 3, 2022 | 6 minute read

Introduction

With Linux, the primary method for obtaining debugging information about a serious error or fault is the kdump mechanism. Kdump captures a wealth of kernel and machine state and writes it to a file for post-mortem debugging. But if kdump writes to a file on a remote server and networking is down, kdump cannot work. (In this context, networking includes the guest’s network driver and stack, the host’s network driver(s) and stack, and the network hardware both on the host and in the surrounding data center.)

In addition, upon panic, Linux dumps the faulting call-chain backtrace to the console, but if the console is not captured, monitored, or scrollable, then important parts of the dump may be lost.

Thus, with certain types of faults, it can be challenging to gain visibility into the problem. Yet it is important to debug these faults to prevent recurrence.

Enter the Linux pstore filesystem.

Linux provides a persistent storage file system, pstore[1], that can store error records when the kernel dies (or reboots or powers off). These records can in turn be referenced to debug kernel problems (currently the kernel stuffs the tail of the dmesg, which also contains a stack backtrace, into pstore). Pstore is backed by local non-volatile memory and presented to the running system via traditional filesystem interfaces. Since it uses local non-volatile memory, pstore works even when kdump cannot.

Oracle has been working to improve the usability of the Linux kernel pstore feature.

Overview of pstore

Pstore was introduced into Linux[1] to record information (e.g. the dmesg tail) upon panics and shutdowns. Pstore is independent of, and can run before, kdump. In certain scenarios (e.g. hosts or guests with root filesystems on NFS/iSCSI where networking software and/or hardware has failed), pstore may contain information for post-mortem debugging that is not otherwise captured.

Two common storage backends for the pstore filesystem are ACPI ERST[2] and UEFI[3]. Most BIOSes implement ACPI ERST and, more recently, Oracle provided QEMU support for ACPI ERST[5]. Similarly UEFI is available on most modern servers as well as guests via OVMF. Thus a usable pstore storage backend for bare metal and virtual machines is now possible in almost all configurations.

For example, if the Linux kernel dies, the dmesg tail is written to pstore.

The snippet below illustrates the dmesg tail stored into the pstore ACPI ERST backend.

# ls -al /sys/fs/pstore
total 0
drwxr-x--- 2 root root     0 Jul 28 11:35 .
drwxr-xr-x 9 root root     0 Jul 28 11:35 ..
-r--r--r-- 1 root root 17711 Jul 28 11:35 dmesg-erst-6990001317951307777
-r--r--r-- 1 root root 17755 Jul 28 11:35 dmesg-erst-6990001317951307778
-r--r--r-- 1 root root 17736 Jul 28 11:35 dmesg-erst-6990001317951307779
-r--r--r-- 1 root root 17746 Jul 28 11:35 dmesg-erst-6990001317951307780

With the UEFI backend, the listing may look more like the following:

# ls -al /sys/fs/pstore
total 0
drwxr-x--- 2 root root 0 May 9 09:50 .
drwxr-xr-x 7 root root 0 May 9 09:50 ..
-r--r--r-- 1 root root 1610 May 9 09:49 dmesg-efi-155741337601001
-r--r--r-- 1 root root 1778 May 9 09:49 dmesg-efi-155741337602001
-r--r--r-- 1 root root 1726 May 9 09:49 dmesg-efi-155741337603001
-r--r--r-- 1 root root 1746 May 9 09:49 dmesg-efi-155741337604001
-r--r--r-- 1 root root 1686 May 9 09:49 dmesg-efi-155741337605001
-r--r--r-- 1 root root 1690 May 9 09:49 dmesg-efi-155741337606001
-r--r--r-- 1 root root 1775 May 9 09:49 dmesg-efi-155741337607001
-r--r--r-- 1 root root 1811 May 9 09:49 dmesg-efi-155741337608001
-r--r--r-- 1 root root 1817 May 9 09:49 dmesg-efi-155741337609001
-r--r--r-- 1 root root 1795 May 9 09:49 dmesg-efi-155741337710001
-r--r--r-- 1 root root 1770 May 9 09:49 dmesg-efi-155741337711001
-r--r--r-- 1 root root 1796 May 9 09:49 dmesg-efi-155741337712001
-r--r--r-- 1 root root 1787 May 9 09:49 dmesg-efi-155741337713001
-r--r--r-- 1 root root 1808 May 9 09:49 dmesg-efi-155741337714001
-r--r--r-- 1 root root 1754 May 9 09:49 dmesg-efi-155741337715001

The dmesg tail is fragmented (based on the underlying storage exchange buffer size) into several error records[3] which are presented as files, and can be re-assembled. Of course, the most important thing is that the dmesg tail, and thus the kernel panic call trace, has been captured to determine where things went badly wrong.
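The re-assembly mentioned above can be sketched in a few lines of shell. This is only an illustration, not how any particular tool does it: the function name `reassemble_dmesg` is hypothetical, and it assumes that the fragment with the highest number in its file name holds the oldest text, so that concatenating in reverse name order restores chronological order. Each fragment may also begin with a short header line, which this sketch leaves in place.

```shell
#!/bin/sh
# Sketch: re-assemble pstore dmesg fragments into one text stream.
# Assumption (not stated in the article): the highest-numbered fragment
# holds the oldest text, so reverse name order is chronological order.

# reassemble_dmesg DIR PREFIX
#   DIR    - directory holding the fragments (e.g. /sys/fs/pstore)
#   PREFIX - common file name prefix (e.g. dmesg-efi-155741337)
reassemble_dmesg() {
    dir=$1
    prefix=$2
    # Emit the matching fragment paths, sort them in reverse order,
    # then concatenate the files in that order.
    for f in "$dir"/"$prefix"*; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done | sort -r | while IFS= read -r f; do
        cat "$f"
    done
}
```

For the UEFI listing above, a call such as `reassemble_dmesg /sys/fs/pstore dmesg-efi-155741337 > /tmp/dmesg.txt` would write the joined text to a regular file.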

The size of the dmesg tail is tunable via CONFIG_PSTORE_DEFAULT_KMSG_BYTES, and is 10KiB by default.

As the local non-volatile storage itself tends to be small, typically on the order of tens of kilobytes, Oracle provided the systemd-pstore[6] service to help manage the pstore space. In short, upon boot (or when systemd-pstore is started or restarted), it archives the contents of the pstore to other storage (e.g. the regular filesystem), thus preserving the existing information and clearing pstore for future error events. Oh, and systemd-pstore will re-assemble the dmesg too!

Systemd-pstore first appeared in systemd v243 and is present and enabled in OL7.9, and in OL8.2 and newer.

The systemd-pstore service is enabled by default. It can be re-run by issuing:

systemctl restart systemd-pstore

You can find the archive of past pstore contents under /var/lib/systemd/pstore, for example:

# ls -al /var/lib/systemd/pstore/155741337/
total 92
drwxr-xr-x 2 root root 4096 May 9 09:50 .
drwxr-xr-x 4 root root 40 May 9 09:50 ..
-rw-r--r-- 1 root root 1610 May 9 09:50 dmesg-efi-155741337601001
-rw-r--r-- 1 root root 1778 May 9 09:50 dmesg-efi-155741337602001
-rw-r--r-- 1 root root 1726 May 9 09:50 dmesg-efi-155741337603001
-rw-r--r-- 1 root root 1746 May 9 09:50 dmesg-efi-155741337604001
-rw-r--r-- 1 root root 1686 May 9 09:50 dmesg-efi-155741337605001
-rw-r--r-- 1 root root 1690 May 9 09:50 dmesg-efi-155741337606001
-rw-r--r-- 1 root root 1775 May 9 09:50 dmesg-efi-155741337607001
-rw-r--r-- 1 root root 1811 May 9 09:50 dmesg-efi-155741337608001
-rw-r--r-- 1 root root 1817 May 9 09:50 dmesg-efi-155741337609001
-rw-r--r-- 1 root root 1795 May 9 09:50 dmesg-efi-155741337710001
-rw-r--r-- 1 root root 1770 May 9 09:50 dmesg-efi-155741337711001
-rw-r--r-- 1 root root 1796 May 9 09:50 dmesg-efi-155741337712001
-rw-r--r-- 1 root root 1787 May 9 09:50 dmesg-efi-155741337713001
-rw-r--r-- 1 root root 1808 May 9 09:50 dmesg-efi-155741337714001
-rw-r--r-- 1 root root 1754 May 9 09:50 dmesg-efi-155741337715001
-rw-r--r-- 1 root root 26754 May 9 09:50 dmesg.txt

Here, dmesg.txt is the dmesg tail re-assembled from the dmesg-efi-155741337* fragment files.
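To locate the re-assembled logs across all archived events, something like the following may help. It is a minimal sketch: the helper name `list_archived_dmesg` is hypothetical, and it assumes only the layout shown above (one sub-directory per event under /var/lib/systemd/pstore, each possibly containing a dmesg.txt), which could differ between systemd versions.

```shell
#!/bin/sh
# Sketch: list every re-assembled dmesg.txt in the systemd-pstore archive.

# list_archived_dmesg [ARCHIVE_ROOT]
#   ARCHIVE_ROOT defaults to /var/lib/systemd/pstore
list_archived_dmesg() {
    root=${1:-/var/lib/systemd/pstore}
    # Nothing archived yet (or the service is not in use): print nothing.
    [ -d "$root" ] || return 0
    find "$root" -type f -name dmesg.txt | sort
}
```

Running `list_archived_dmesg` as root prints one path per archived event, and each path can then be opened with a pager.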

Pstore Enablement

To enable Linux pstore, which UEK does by default, ensure the following kernel configuration options are set:

# Enable PSTORE support
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS=y

# Enable UEFI pstore backend
CONFIG_EFI_VARS_PSTORE=y
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE=y

# Enable ACPI ERST pstore backend
CONFIG_ACPI=y
CONFIG_ACPI_APEI=y

With the above, pstore is enabled in the kernel and the ACPI ERST and UEFI storage backends, if present on the machine, are available.
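One way to confirm these options on a running system is to grep the kernel's configuration file. The sketch below makes a few assumptions: the helper name `check_pstore_config` is hypothetical, the config file location (often /boot/config-$(uname -r)) varies by distribution, and the check accepts `=m` as well as `=y` for options that a kernel may build as modules.

```shell
#!/bin/sh
# Sketch: report whether the pstore-related options listed in this
# article are set in a kernel configuration file.

# check_pstore_config CONFIG_FILE
#   Prints "<option> set" or "<option> missing" for each option and
#   returns non-zero if any option is missing.
check_pstore_config() {
    config=$1
    missing=0
    for opt in CONFIG_PSTORE CONFIG_PSTORE_COMPRESS \
               CONFIG_PSTORE_DEFLATE_COMPRESS CONFIG_EFI_VARS_PSTORE \
               CONFIG_ACPI CONFIG_ACPI_APEI; do
        # The "=" right after the name keeps CONFIG_PSTORE from matching
        # CONFIG_PSTORE_COMPRESS. Accepting "=m" is an assumption here.
        if grep -q "^${opt}=[ym]" "$config"; then
            echo "$opt set"
        else
            echo "$opt missing"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_pstore_config "/boot/config-$(uname -r)"`, assuming the distribution installs the config file there.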

The selection of the pstore backend is done at kernel boot time. By default, ACPI ERST is selected as the storage backend, and is preferred as it was designed for this function.

The UEFI backend is disabled by default, and to use the UEFI backend it must be explicitly selected at kernel boot time.

For UEK5 (4.14) era kernels, either of the following kernel parameters[4] selects the UEFI pstore backend:

pstore.backend=efi
  
or

erst_disable=1

However, for UEK6 (5.4) era kernels, an additional kernel parameter[4] is needed:

efi_pstore.pstore_disable=0

Without this kernel parameter, the EFI backend is never attempted.
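After rebooting, it can be worth confirming that the parameters actually reached the kernel. The following is a small sketch that looks for an exact parameter on the kernel command line; the helper name `cmdline_has` is hypothetical, and reading /proc/cmdline is simply the usual place to find the booted command line.

```shell
#!/bin/sh
# Sketch: test whether a parameter is present on the kernel command line.

# cmdline_has PARAM [CMDLINE_FILE]
#   Succeeds if PARAM appears as a whole word in CMDLINE_FILE
#   (default: /proc/cmdline).
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each whitespace-separated word on its own line, then look for
    # an exact, fixed-string match of the whole line.
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$param"
}

# Example use on a UEK6-era kernel (commented out so that sourcing this
# file has no side effects):
#   for p in pstore.backend=efi efi_pstore.pstore_disable=0; do
#       if cmdline_has "$p"; then echo "$p present"; else echo "$p absent"; fi
#   done
```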

To see which backend is active, query the pstore module parameter:

# cat /sys/module/pstore/parameters/backend

The result is one of:

  • (null): pstore does not have a valid storage backend
  • efi: pstore is using UEFI variable storage backend
  • erst: pstore is using ACPI ERST backend
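For scripts or health checks, the three values above can be mapped to a message with a simple `case` statement. This is a sketch only: the function name `describe_backend` is hypothetical, and any value other than the three listed above is reported as not covered here rather than interpreted.

```shell
#!/bin/sh
# Sketch: translate the pstore backend parameter into a readable message.

# describe_backend VALUE
#   VALUE is the content of /sys/module/pstore/parameters/backend
describe_backend() {
    case $1 in
        efi)      echo "pstore is using the UEFI variable storage backend" ;;
        erst)     echo "pstore is using the ACPI ERST backend" ;;
        "(null)") echo "pstore does not have a valid storage backend" ;;
        *)        echo "backend not covered here: $1" ;;
    esac
}
```

It could be called as `describe_backend "$(cat /sys/module/pstore/parameters/backend)"`.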

At this point, pstore is active and ready for the next event.

Pstore Kernel Parameters

Two kernel parameters impact the writing of data into pstore.

The printk.always_kmsg_dump parameter enables writing to pstore at kernel shutdown or reboot.

The crash_kexec_post_notifiers parameter enables writing to pstore before kdump is attempted. Be aware of the kernel documentation's warning for this parameter.

These parameters can be passed at kernel boot time, or set via the sysfs interface:

echo Y > /sys/module/printk/parameters/always_kmsg_dump

echo Y > /sys/module/kernel/parameters/crash_kexec_post_notifiers

To persist a change to these settings, un-comment the appropriate line(s) in /usr/lib/tmpfiles.d/systemd-pstore.conf:

#w /sys/module/printk/parameters/always_kmsg_dump - - - - Y
#w /sys/module/kernel/parameters/crash_kexec_post_notifiers - - - - Y

At next reboot, systemd will process this file and apply the changes.
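The un-commenting step can also be scripted. The sketch below prints a copy of the configuration file with the line for one parameter un-commented; the helper name `enable_pstore_param` is hypothetical, and it assumes the commented lines look exactly like the two shown above.

```shell
#!/bin/sh
# Sketch: un-comment the tmpfiles.d "w" line for one pstore-related
# parameter, writing the result to stdout instead of editing in place.

# enable_pstore_param PARAM_NAME CONF_FILE
#   PARAM_NAME - always_kmsg_dump or crash_kexec_post_notifiers
#   CONF_FILE  - path to a systemd-pstore.conf tmpfiles.d file
enable_pstore_param() {
    name=$1
    conf=$2
    # Drop the leading "#" only from the line that writes PARAM_NAME.
    sed "s|^#\(w /sys/module/[a-z]*/parameters/${name} \)|\1|" "$conf"
}
```

One possible design choice is to redirect the output to /etc/tmpfiles.d/systemd-pstore.conf rather than editing the file under /usr/lib, since a file of the same name under /etc/tmpfiles.d takes precedence and is not overwritten by package updates. For example: `enable_pstore_param always_kmsg_dump /usr/lib/tmpfiles.d/systemd-pstore.conf > /etc/tmpfiles.d/systemd-pstore.conf`.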

Summary

By enabling pstore to run prior to kdump, you’re assured of capturing the kernel call backtrace, even in difficult scenarios. With that dmesg tail and kernel call trace in hand, you’re one step closer to finding the cause of the panic!

References

Eric DeVolder

