An introduction to pvpanic

April 5, 2022 | 10 minute read

What is pvpanic

This blog entry explores a little-known feature called pvpanic, a paravirtualized device emulated by QEMU that the guest OS uses to tell the VMM that it has experienced a panic or crash. The guest kernel signals the hypervisor as soon as it panics, before any crash dump is collected.

We will describe the behavior and settings required for a Linux guest. Windows guests behave similarly in most scenarios, but a separate blog entry will be dedicated to the work needed to accomplish this.

How does pvpanic help

If you are a cloud operator (or run a particularly cool home lab), you will occasionally experience issues connecting to the Virtual Machines in your fleet. This can be caused by problems in your infrastructure like a hypervisor crash or network configuration issue, or it could be due to a guest OS level issue like a hang or a kernel panic. We can typically figure out exactly what happened by inspecting logs and checking the status of various services, but it might take some time before either the customer or the operator notices that something has gone wrong.

A cloud operator using QEMU will usually have a monitoring service on the hypervisor listening for QMP events. If a pvpanic event is received, this provides a clear and immediate signal that there is a problem originating at the guest OS level: the guest kernel has experienced a panic. Depending on the event received, it also tells us whether or not a crash dump will be attempted.

From a fleet management perspective, pvpanic provides us with the ability to categorize and root cause unresponsive instances. It helps determine when the instance is down due to a guest level issue, as opposed to an infrastructure (e.g. hypervisor crash) problem, and that information can be used to update metrics and/or alert a customer that their instance has suffered a guest level issue and requires corrective action.

It is important to keep in mind that pvpanic functionality in Linux is limited to notifying the VMM (QEMU in this case) that a panic has occurred in the guest; no additional data is provided to the VMM. This is why pvpanic is normally used in conjunction with the -no-shutdown option in QEMU, or the equivalent -action shutdown=pause, which ensures that the QEMU process does not terminate and makes it possible to collect relevant debugging data like the register state or a memory dump for further analysis.

How to use pvpanic

There are two versions of the pvpanic device: the original, which is implemented as an emulated ISA device (-device pvpanic), and a more recent one that is implemented as a PCI device (-device pvpanic-pci). The PCI version was required in order to use pvpanic on Arm instances, which do not support ISA devices.

The examples shown here were captured in an aarch64 guest launched on an OCI Ampere A1-2c Bare Metal instance, and therefore use the -device pvpanic-pci form. The QMP events generated are the same, regardless of the interface that is emulated by the device.

The short version

If you are an advanced user who just needs to deploy pvpanic in a hurry, here is a summary that shows how the various parameters determine the pvpanic QMP event that is emitted when the guest kernel panics:

kdump active   crash_kexec_post_notifiers   Guest kernel supports CRASHLOADED event   pvpanic QMP event
N              Y|N                          Y|N                                       GUEST_PANICKED
Y              N                            Y|N                                       None
Y              Y                            N                                         GUEST_PANICKED
Y              Y                            Y                                         GUEST_CRASHLOADED

Most Linux distributions enable crash dump functionality by default, and support for the CRASHLOADED event has been available since Linux 5.6.
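The table above can also be read as a small decision function. The following is a minimal sketch in Python; the function name and its parameters are hypothetical and exist only to restate the table, they are not part of QEMU or the kernel.

```python
from typing import Optional


def expected_pvpanic_event(kdump_active: bool,
                           crash_kexec_post_notifiers: bool,
                           guest_supports_crashloaded: bool) -> Optional[str]:
    """Return the pvpanic QMP event the summary table predicts, or None."""
    if not kdump_active:
        # No capture kernel is loaded, so the panic notifiers always run.
        return "GUEST_PANICKED"
    if not crash_kexec_post_notifiers:
        # The guest kexecs into the capture kernel before pvpanic is notified.
        return None
    if guest_supports_crashloaded:
        return "GUEST_CRASHLOADED"
    # Older guest kernels only know how to report a plain panic.
    return "GUEST_PANICKED"
```

For the recommended settings (kdump active, crash_kexec_post_notifiers set, a guest kernel with CRASHLOADED support) the function returns "GUEST_CRASHLOADED".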

Use QEMU parameters:

  • -device pvpanic-pci
  • -qmp unix:/tmp/qmp.sock,server,wait=no
  • -action shutdown=pause,panic=none

Linux guest settings:

  • Kernel compiled with:
      CONFIG_PVPANIC=y
      CONFIG_PVPANIC_PCI=m
  • Boot with the crash_kexec_post_notifiers kernel command line parameter.
  • kdump service is enabled and running.

For this example we use a Linux guest running OL8.5, which enables the kdump service by default, and a UEK6-U3 kernel (5.4.17-2136.304.4.5.el8uek.aarch64) built with the required Kconfig options. With the above settings, when a kernel panic is triggered in the guest, we will receive the GUEST_CRASHLOADED QMP event:

{"timestamp": {"seconds": 1648072351, "microseconds": 626053}, "event": "GUEST_CRASHLOADED", "data": {"action": "run"}}

The guest then kexecs into a secondary kernel and collects a crash dump. Once the crash dump is collected, the guest reboots back to its default kernel, which produces a RESET event:

{"timestamp": {"seconds": 1648072358, "microseconds": 10547}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

If the kdump service is not running, or we have booted an older guest kernel that does not support the GUEST_CRASHLOADED event, we instead receive a GUEST_PANICKED event:

{"timestamp": {"seconds": 1648067630, "microseconds": 165464}, "event": "GUEST_PANICKED", "data": {"action": "run"}}

In this case the guest does not automatically reboot, but the QEMU process does not terminate, giving us a chance to collect debugging data.

The remainder of this blog entry elaborates on the recommended settings above and provides more detailed examples.

QEMU parameters

The QEMU command line parameters that are required in order to use and detect the pvpanic events are:

  • -device pvpanic-pci
    Requests that QEMU emulates a pvpanic device which is exposed to the guest as a PCI device.
  • -qmp unix:/tmp/qmp.sock,server,wait=no
    Configures a QMP monitor using a local Unix socket. We can connect to the socket using utilities like nc, socat (optionally wrapped with rlwrap), or qmp-shell (provided by QEMU). Use -qmp-pretty instead of -qmp to display server responses as pretty-printed JSON.

Recommended:

  • -action shutdown=pause,panic=none
    These are actually two options:
    • -action shutdown=pause
      Equivalent to -no-shutdown, requests that QEMU does not immediately terminate when the guest shuts down.
    • -action panic=none
      Necessary to address the case of older guests (Linux and Windows) using a pvpanic device but missing support for the newer CRASHLOADED event, and Windows guests using the hv-crash enlightenment. In these cases, the default QEMU action is to pause the guest, whereas this option allows the guest to proceed to capture a crash dump and automatically reboot without intervention by a management layer.

Example QEMU command line

Given the requirements above, a basic QEMU command looks like the following:

qemu-system-aarch64 \
-nodefaults \
-device pvpanic-pci \
-action shutdown=pause,panic=none \
-machine virt,gic-version=3 \
-cpu host \
-accel kvm \
-smp 2 \
-m 4G \
-drive file=/usr/share/AAVMF/AAVMF_CODE.pure-efi.fd,if=pflash,unit=0,format=raw,readonly=on \
-drive file=/usr/share/AAVMF/AAVMF_VARS.pure-efi.fd,if=pflash,unit=1,format=raw \
-hda /home/pvpanic-example/ol8.qcow2 \
-qmp unix:/tmp/qmp.sock,server,wait=no \
-serial mon:stdio
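A management layer that launches QEMU programmatically might assemble the pvpanic-related arguments separately from the rest of the command line. The sketch below is one way to do that in Python; the helper name and its parameters are hypothetical, and it only produces the three options discussed in this section.

```python
from typing import List


def pvpanic_qemu_args(use_pci: bool = True,
                      qmp_socket: str = "/tmp/qmp.sock") -> List[str]:
    """Build the pvpanic-related part of a QEMU argument list."""
    # pvpanic-pci is needed on Arm guests; pvpanic is the original ISA device.
    device = "pvpanic-pci" if use_pci else "pvpanic"
    return [
        "-device", device,
        "-qmp", f"unix:{qmp_socket},server,wait=no",
        "-action", "shutdown=pause,panic=none",
    ]
```

The returned list can be appended to the machine, CPU, memory, and drive arguments before handing the full list to something like subprocess.run.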

Connecting to the QMP socket

There are various methods for connecting to a Unix socket. Here we show an example using the nc (netcat) utility:

# nc -U /tmp/qmp.sock
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 6}, "package": "v6.2.0"}, "capabilities": ["oob"]}}

{"execute": "qmp_capabilities"}
{"return": {}}

Note that we need to issue the qmp_capabilities command to exit the capability negotiation mode and enter command mode. Otherwise no QMP events will be received.
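A monitoring service would do the same handshake from code rather than from nc. The following Python sketch assumes the socket was created with -qmp (one JSON message per line, not -qmp-pretty) and that no other client is connected. The function names are hypothetical; only the qmp_capabilities command and the event names come from QMP itself.

```python
import json
import socket
from typing import Iterator, TextIO

PVPANIC_EVENTS = {"GUEST_PANICKED", "GUEST_CRASHLOADED"}


def read_message(reader: TextIO) -> dict:
    """Return the next non-empty line from the QMP stream, parsed as JSON."""
    for line in reader:
        if line.strip():
            return json.loads(line)
    raise ConnectionError("QMP connection closed")


def negotiate(reader: TextIO, writer: TextIO) -> dict:
    """Consume the greeting, then leave capability negotiation mode."""
    greeting = read_message(reader)
    writer.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    writer.flush()
    reply = read_message(reader)
    if "return" not in reply:
        raise RuntimeError(f"qmp_capabilities failed: {reply}")
    return greeting


def pvpanic_events(reader: TextIO) -> Iterator[dict]:
    """Yield only the pvpanic-related events, skipping everything else."""
    for line in reader:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("event") in PVPANIC_EVENTS:
            yield message


def watch(socket_path: str = "/tmp/qmp.sock") -> None:
    """Connect to the QMP socket and print pvpanic events as they arrive."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        reader = sock.makefile("r", encoding="utf-8")
        writer = sock.makefile("w", encoding="utf-8")
        negotiate(reader, writer)
        for event in pvpanic_events(reader):
            print(event["event"], event["timestamp"]["seconds"])
```

Calling watch() blocks until QEMU closes the socket, so a real service would run it in its own thread or process.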

How to trigger a panic in the guest

A kernel panic can be triggered in multiple ways. Since our UEK6-U3 guest kernel enables the Magic SysRq key option (CONFIG_MAGIC_SYSRQ), we use this method.
See the SysRq documentation for details.

Triggering a panic looks like this:

$ echo 1 > /proc/sys/kernel/sysrq
$ echo c > /proc/sysrq-trigger
[  625.468565] sysrq: Trigger a crash
[  625.469186] Kernel panic - not syncing: sysrq triggered crash
[  625.470103] CPU: 1 PID: 1985 Comm: bash Not tainted 5.4.17-2136.304.4.5.el8uek.aarch64 #2
[  625.471452] Hardware name: QEMU KVM Virtual Machine, BIOS 1.4.1 12/03/2020
[  625.472636] Call trace:
[  625.473094]  dump_backtrace+0x0/0x18c
[  625.473768]  show_stack+0x24/0x30
[  625.474392]  dump_stack+0xbc/0xe0
[  625.475003]  panic+0x15c/0x368
[  625.475620]  sysrq_handle_crash+0x1c/0x1c
[  625.476358]  __handle_sysrq+0x88/0x17c
[  625.477043]  write_sysrq_trigger+0x10c/0x174
[  625.477822]  proc_reg_write+0x88/0xe8
[  625.478511]  __vfs_write+0x48/0x8c
[  625.479136]  vfs_write+0xb8/0x1d8
[  625.479743]  ksys_write+0x74/0xf8
[  625.480349]  __arm64_sys_write+0x24/0x30
[  625.481065]  el0_svc_common+0xbc/0x19c
[  625.481749]  el0_svc_handler+0x38/0x88
[  625.482442]  el0_svc+0x8/0xc
[  625.482974] SMP: stopping secondary CPUs

The pvpanic QMP events

For modern versions of QEMU and guest kernels, the type of QMP event emitted when a guest panic occurs is generally determined by two factors:

  • Whether or not a crash dump mechanism (kdump) is active in the guest.

  • If kdump is active, the use of the crash_kexec_post_notifiers kernel parameter in the guest.

Depending on the combination of the factors above, QEMU might emit a GUEST_PANICKED, GUEST_CRASHLOADED, or no event at all. See the following sections for an explanation of the possible scenarios.

GUEST_PANICKED event

This QMP event is emitted by QEMU to indicate that the guest OS kernel panicked and a kernel crash dump mechanism (kdump) was not active. Linux kernels older than the 5.6 release support only this event, so they send it even when kdump is active, as long as crash_kexec_post_notifiers is also set on the guest kernel command line.

The other special case is Windows guests using the hv-crash enlightenment, but we’ll discuss that in a separate blog entry.

Let's see an example:

First we stop the kdump service on our guest:

[root@localhost ~]# systemctl stop kdump
[root@localhost ~]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2022-03-22 17:45:06 EDT; 4s ago
  Process: 1607 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 1089 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 1089 (code=exited, status=0/SUCCESS)
 [...]

And trigger a panic using SysRq as described earlier.

[   30.613584] sysrq: Trigger a crash
[   30.614245] Kernel panic - not syncing: sysrq triggered crash
[   30.615193] CPU: 1 PID: 1457 Comm: bash Not tainted 5.4.17-2136.304.4.5.el8uek.aarch64 #2
[   30.616516] Hardware name: QEMU KVM Virtual Machine, BIOS 1.4.1 12/03/2020
[   30.617689] Call trace:
[   30.618148]  dump_backtrace+0x0/0x18c
[   30.618822]  show_stack+0x24/0x30
[   30.619431]  dump_stack+0xbc/0xe0
[   30.620041]  panic+0x15c/0x368
[   30.620604]  sysrq_handle_crash+0x1c/0x1c
[   30.621339]  __handle_sysrq+0x88/0x17c
[   30.622041]  write_sysrq_trigger+0x10c/0x174
[   30.622828]  proc_reg_write+0x88/0xe8
[   30.623500]  __vfs_write+0x48/0x8c
[   30.624123]  vfs_write+0xb8/0x1d8
[   30.624730]  ksys_write+0x74/0xf8
[   30.625338]  __arm64_sys_write+0x24/0x30
[   30.626026]  el0_svc_common+0xbc/0x19c
[   30.626673]  el0_svc_handler+0x38/0x88
[   30.627360]  el0_svc+0x8/0xc
[   30.627889] SMP: stopping secondary CPUs
[   30.628864] Kernel Offset: disabled
[   30.629522] CPU features: 0x00002,20802008
[   30.630257] Memory Limit: none
[   30.630822] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---

Entering the QEMU monitor and checking the status:

QEMU 6.2.0 monitor - type 'help' for more information
(qemu) info status
VM status: running

We can see that the VM is still running. This is because we are using -action shutdown=pause,panic=none in the QEMU parameters. The QMP event received is:

{"timestamp": {"seconds": 1648067630, "microseconds": 165464}, "event": "GUEST_PANICKED", "data": {"action": "run"}}

After this event is received, a system_reset command can be issued on the monitor or via QMP to reboot the guest.
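Issuing system_reset over QMP means writing one JSON command and waiting for its reply, while keeping any events that arrive in between. The sketch below assumes an already negotiated, line-delimited QMP connection exposed as a reader and a writer. The helper name execute is hypothetical; system_reset is the QMP command mentioned above.

```python
import json
from typing import List, TextIO, Tuple


def execute(reader: TextIO, writer: TextIO, command: str) -> Tuple[object, List[dict]]:
    """Send a QMP command without arguments and wait for its reply.

    Returns the value of the "return" key together with any other
    messages (typically events) that arrived before the reply.
    """
    writer.write(json.dumps({"execute": command}) + "\n")
    writer.flush()
    pending: List[dict] = []
    for line in reader:
        if not line.strip():
            continue
        message = json.loads(line)
        if "return" in message:
            return message["return"], pending
        if "error" in message:
            raise RuntimeError(f"{command} failed: {message['error']}")
        # Not a reply to our command, so keep it for the caller.
        pending.append(message)
    raise ConnectionError("QMP connection closed before a reply arrived")
```

A caller that has just seen GUEST_PANICKED could run execute(reader, writer, "system_reset") after collecting whatever debugging data it needs.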

GUEST_CRASHLOADED event

The GUEST_CRASHLOADED event indicates that the guest kernel has hit a panic but will handle it by itself, i.e. the kdump service is running on the guest.

This event is emitted when the guest kernel has panicked, kdump is active, and the crash_kexec_post_notifiers kernel parameter is set in the guest. Let’s explain how these two settings interact.

kdump

The majority of Linux distributions enable the kdump service, provided by the kexec-tools package in OL8. kdump is the Linux kernel crash-dump mechanism. In the event of a system crash, kdump uses kexec to boot into a second kernel loaded in memory reserved for this purpose, and captures the contents of the crashed kernel’s memory (vmcore) to aid in determining the cause of the crash. After the crash dump has been collected, the guest will reboot using the default kernel.

For more information on kexec, see this blog entry.

crash_kexec_post_notifiers

This kernel parameter requests that the callbacks registered with the panic notifier chain are called before the kernel kexecs into the kdump capture kernel.

A notifier chain is a mechanism provided by the Linux kernel to allow a subsystem to be notified when an event occurs in another subsystem, and provide a callback function to invoke.

During initialization, the pvpanic module registers a callback with the panic_notifier_list notifier chain. When the kernel panics, the value of crash_kexec_post_notifiers determines whether or not the callbacks registered in the panic_notifier_list are invoked before kexec’ing into the capture kernel which will collect the crash dump. The panic() code in kernel/panic.c is extensively commented and worth a look if you are interested in the details.

Therefore, if we want the pvpanic signal to be sent to QEMU when kdump is enabled, it is necessary to use the crash_kexec_post_notifiers kernel parameter.
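The ordering described above can be illustrated with a toy model. This Python sketch is a deliberate simplification of the panic path and is not the kernel code; the function name and the step descriptions are hypothetical and only capture the relative order of the notifier callbacks and the kexec.

```python
from typing import List


def panic_steps(kdump_loaded: bool, crash_kexec_post_notifiers: bool) -> List[str]:
    """Return the order of the two steps that matter for pvpanic."""
    notify = "call panic_notifier_list callbacks (pvpanic notifies QEMU)"
    kexec = "kexec into the capture kernel"
    if kdump_loaded and not crash_kexec_post_notifiers:
        # The kexec happens first and never returns, so the
        # notifier callbacks are not reached.
        return [kexec]
    steps = [notify]
    if kdump_loaded:
        # With crash_kexec_post_notifiers set, the kexec is deferred
        # until after the notifier callbacks have run.
        steps.append(kexec)
    return steps
```

Only the combinations whose result includes the notifier step produce a pvpanic QMP event on the host.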

NOTE: The crash_kexec_post_notifiers parameter can also be set at runtime using:

echo Y > /sys/module/kernel/parameters/crash_kexec_post_notifiers
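Before triggering a test panic, it can be useful to confirm both settings from inside the guest. The Python sketch below reads the parameter file shown above and, as an assumption not covered in this article, /sys/kernel/kexec_crash_loaded, which the kernel sets to 1 when a capture kernel is loaded. The function name and the injectable read_text parameter are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict

POST_NOTIFIERS = "/sys/module/kernel/parameters/crash_kexec_post_notifiers"
# Assumption: this sysfs file reports whether a kdump capture kernel is loaded.
CRASH_LOADED = "/sys/kernel/kexec_crash_loaded"


def _read_file(path: str) -> str:
    return Path(path).read_text()


def guest_panic_settings(read_text: Callable[[str], str] = _read_file) -> Dict[str, bool]:
    """Report the two guest settings that decide which QMP event is sent."""
    return {
        # Boolean module parameters read back as "Y" or "N".
        "crash_kexec_post_notifiers": read_text(POST_NOTIFIERS).strip() == "Y",
        # kexec_crash_loaded reads back as "1" or "0".
        "kdump_loaded": read_text(CRASH_LOADED).strip() == "1",
    }
```

Run inside the guest, guest_panic_settings() returns both values as booleans; the read_text parameter exists only so the function can be exercised without a real /sys tree.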

QMP events received

With the guest booted with crash_kexec_post_notifiers and kdump active, triggering a panic in the guest produces the following QMP events:

{"timestamp": {"seconds": 1648069594, "microseconds": 946575}, "event": "GUEST_CRASHLOADED", "data": {"action": "run"}}
{"timestamp": {"seconds": 1648069601, "microseconds": 309809}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

Note that the RESET event is emitted after kdump finishes collecting the crash dump and reboots back to the default kernel.

If kdump is active but crash_kexec_post_notifiers is not set, the callbacks on the panic notifier list are not called before the kexec, so pvpanic never notifies QEMU. The crash dump is collected and the guest reboots automatically, and the only event emitted is the standard RESET event:

{"timestamp": {"seconds": 1647987955, "microseconds": 455659}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
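A fleet monitoring service could turn the event sequences shown in this section into a coarse label per instance. The following Python sketch is one possible classification; the function name and the label strings are hypothetical, and a RESET seen on its own remains ambiguous because an ordinary guest reboot looks the same.

```python
from typing import Iterable


def classify_guest_outcome(event_names: Iterable[str]) -> str:
    """Map an ordered sequence of QMP event names to a coarse label."""
    names = list(event_names)
    if "GUEST_CRASHLOADED" in names:
        after = names[names.index("GUEST_CRASHLOADED") + 1:]
        if "RESET" in after:
            return "panic, kdump ran, guest rebooted"
        return "panic, kdump in progress"
    if "GUEST_PANICKED" in names:
        return "panic reported without CRASHLOADED"
    if "RESET" in names:
        # Could be kdump without crash_kexec_post_notifiers, or a normal reboot.
        return "reset only, cause unknown"
    return "no relevant events"
```

For the two sequences captured above, the function returns "panic, kdump ran, guest rebooted" and "reset only, cause unknown" respectively.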

Wrapping up

In this blog entry we explained the basics of what pvpanic is and how it can be useful for quickly determining whether an outage is due to a guest OS kernel panic. At its core it is a very simple feature, but the interactions with other components (kdump, guest kernel command line parameters, QEMU options, and legacy kernels with incomplete feature sets) make for a somewhat convoluted explanation of how to use it.

This blog entry does not discuss the implementation details of the different backends in QEMU or how the Linux driver has evolved over time. Windows guests using the Oracle VirtIO Drivers for Microsoft Windows also have support for pvpanic functionality and emit the same events. These topics will be discussed in future blog posts.

Alejandro Jimenez

