Deterring a new class of kernel exploits using Ksplice

For a long time, null-dereference bugs in the kernel were not considered security vulnerabilities: in most cases, dereferencing NULL in the kernel, or any virtual address close to zero, happens in a context where the kernel can easily recover from the bug without doing much harm to the rest of the system. However, recent research from Google Project Zero has demonstrated a way to turn null-pointer dereferences into an attack again.

Fortunately, the fix for this exploit is relatively straightforward, and we are able to deliver it with Ksplice as well. With Ksplice, we can patch this vulnerability without a reboot, even on some kernels up to 5 years old, so users can apply the fix without incurring any downtime. It’s not often that we can patch an entire class of security vulnerabilities, and this post takes you through the mechanics of (re)fixing null-pointer dereferences in the kernel.

A novel way of exploiting null-dereference bugs

The time when null-deref bugs could simply be dismissed by security researchers because of their innocuous appearance seems to have come to an end. Indeed, work from Google Project Zero published in January 2023 introduced a new way of exploiting these seemingly low-impact bugs.

When kernel code dereferences a NULL pointer while running in the context of a process, the kernel simply destroys the process, outputs an “oops” message, and continues execution as best it can. Had the null-dereference bug happened in any other context, the kernel would have deemed it impossible to continue execution and ceased all operation with what’s called a “panic”.

When the kernel decides to destroy a task because it encounters an error that prevents it from continuing to execute safely, it must do so immediately. It doesn’t have the luxury of tracking down all the kernel resources that the task may have left behind. In other words, if a process encounters an error while running in the kernel, such as a NULL pointer dereference, after it has allocated some memory with kmalloc, for example, that memory will be lost. The kernel will not be able to reclaim it after the oops handler has run. The same is true if the process was holding a lock or had incremented a reference count before being destroyed by the oops handler.

This research by Google Project Zero demonstrates that it is possible to create a use-after-free (UAF) condition by overflowing a reference counter, which is done by repeatedly making a task oops. In this paper, the exploit makes use of a user-controlled null-deref bug to trigger an oops while the kernel code holds a reference to an object.
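The arithmetic behind this overflow can be sketched with a small model. This is an illustrative simulation, not kernel code: the `LeakyRefcount` class, its method names, and the 8-bit width used in the usage lines are hypothetical choices made to keep the example fast, and the model assumes a plain counter that wraps around on overflow.

```python
class LeakyRefcount:
    """Toy model of a wrapping reference counter (hypothetical, not kernel code)."""

    def __init__(self, bits: int = 32, initial: int = 1) -> None:
        self.modulus = 1 << bits
        self.value = initial % self.modulus
        self.freed = False

    def get(self) -> None:
        # Taking a reference increments the counter, wrapping on overflow.
        self.value = (self.value + 1) % self.modulus

    def put(self) -> None:
        # Dropping a reference decrements the counter; the object is freed at zero.
        self.value = (self.value - 1) % self.modulus
        if self.value == 0:
            self.freed = True

    def oops_while_holding_reference(self) -> None:
        # The task takes a reference, then oopses before it can call put().
        self.get()


def oopses_until_wrap(bits: int, initial: int = 1) -> int:
    """Number of leaked references needed for the counter to wrap to zero."""
    return (1 << bits) - initial


# Usage: an 8-bit counter with one legitimate holder.
rc = LeakyRefcount(bits=8, initial=1)
for _ in range(oopses_until_wrap(8)):
    rc.oops_while_holding_reference()
# The counter now reads 0 although the legitimate holder still exists.
rc.get()
rc.put()
print("object freed while still in use:", rc.freed)
```

In this model, one ordinary get/put pair after the wrap frees the object even though the original holder still has a pointer to it, which is the use-after-free condition described above.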

The null-deref bug used in this proof-of-concept had already been fixed upstream at the time the paper was published. Nonetheless, it is fair to say that it is not difficult for security researchers to find a bug that meets the following criteria, which together make an exploit possible:

Causes the kernel to oops.
Can be triggered in a controlled fashion.
Happens while a reference to an object is held.

That’s about all it takes to open the door to a use-after-free condition, another class of bug whose exploitation techniques are well documented. In the worst case, this can lead to the execution of arbitrary code on the targeted system.

The mitigation

In December 2022, a commit called “exit: Put an upper limit on how often we can oops” was added to the upstream Linux kernel source. This commit made it into the Linux v6.2 release, and was also backported to the stable branches. Many distro kernel maintainers also chose to include this patch in their own fork as a security update. Limiting the total number of oopses effectively prevents this attack from being weaponized.

For example, on Oracle Linux 9, with UEKR7, you’ll benefit from this patch if you upgrade the kernel to version v5.15.0-8.91.1, available as of April 2023. For Ubuntu 22.04 (Jammy Jellyfish), the same patch is present in the linux-image package version 5.15.0-70.77, released at around the same time.

This change adds a clever but simple mitigation for the exploitation technique mentioned earlier: the kernel counts the number of oopses that have occurred, and if that count reaches a certain (configurable) limit, it panics.

In the exploit mentioned earlier, overflowing a reference count requires oopsing the kernel a very large number of times (close to 2^32). This mitigation ensures that the kernel crashes before the UAF condition occurs, while still leaving the kernel the opportunity to recover if the attack stops before reaching the limit.
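The effect of the limit can be illustrated with a rough model. It is a sketch under stated assumptions, not the upstream implementation: the `OopsLimiter` and `KernelPanic` names are hypothetical, the default of 10,000 is assumed from the upstream default for the limit (a distribution may ship a different value), and a limit of 0 is modelled as "no limit".

```python
class KernelPanic(Exception):
    """Stands in for the kernel halting the whole system."""


class OopsLimiter:
    """Counts oopses and 'panics' once a configurable limit is reached."""

    def __init__(self, limit: int = 10_000) -> None:
        self.limit = limit  # 0 is modelled as "no limit" (assumption)
        self.count = 0

    def record_oops(self) -> None:
        self.count += 1
        if self.limit and self.count >= self.limit:
            raise KernelPanic(f"oopsed too often (limit is {self.limit})")


def attack_succeeds(oopses_needed: int, limit: int) -> bool:
    """True if `oopses_needed` oopses can be triggered before a panic."""
    return limit == 0 or oopses_needed < limit


# Usage: the refcount overflow needs about 2^32 oopses, far above the limit.
print(attack_succeeds(2**32 - 1, limit=10_000))
```

With any limit several orders of magnitude below 2^32, the model panics long before the counter could wrap, while a handful of unrelated oopses leaves the system running.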

The downside of the mitigation is that if there is a bug in the kernel that is not a security issue, but happens to generate oopses repeatedly and at a high frequency, it will cause your system to crash. Still, it is a good trade-off, because if you hit such a bug, chances are that the system is already unusable because of all the randomly crashing processes.

For the most conservative users, the Linux kernel provides functionality to force the system to crash early, at the first sign of a problem. The kernel has the panic_on_oops feature for that, though few customers actually enable it. When this feature is enabled, for example by setting the kernel.panic_on_oops sysctl to 1 or by putting oops=panic on the kernel command line at boot time, the kernel will panic whenever an oops occurs, preventing any additional exploit behavior.

Unfortunately, while crashing the machine may deter this kind of attack, using the panic_on_oops feature trades the exploit for a denial-of-service exposure: any small null-dereference bug in the kernel that an attacker can trigger will bring the whole system down.

panic_on_oops is more useful to kernel developers than end users, because it makes bugs more visible to them when they’re putting kernel code under test. Using panic_on_oops on production workloads should be considered carefully. On most Linux distributions you’ll find this option disabled by default.
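To see how a given system is configured, the relevant settings can be read from procfs. The sketch below assumes that the kernel exposes them as `/proc/sys/kernel/panic_on_oops` and `/proc/sys/kernel/oops_limit`; the helper names are hypothetical, and a missing `oops_limit` file only means that this sysctl is not exposed by the running kernel.

```python
from pathlib import Path
from typing import Optional


def read_kernel_setting(name: str, proc_sys_kernel: str = "/proc/sys/kernel") -> Optional[int]:
    """Return the integer value of a kernel sysctl, or None if it is unavailable."""
    try:
        return int((Path(proc_sys_kernel) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_oops_policy(proc_sys_kernel: str = "/proc/sys/kernel") -> str:
    """Summarize what the kernel is configured to do when tasks oops."""
    panic_on_oops = read_kernel_setting("panic_on_oops", proc_sys_kernel)
    oops_limit = read_kernel_setting("oops_limit", proc_sys_kernel)
    if panic_on_oops:
        return "panic on first oops"
    if oops_limit is None:
        return "no oops_limit sysctl found"
    if oops_limit == 0:
        return "oops limit disabled"
    return f"panic after {oops_limit} oopses"


if __name__ == "__main__":
    print(describe_oops_policy())
```

The directory is a parameter so that the function can be pointed at a test directory instead of the live `/proc` tree.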

Protecting the system without a reboot — with Ksplice

Ksplice is a service that allows patches for security vulnerabilities in the Linux kernel to be applied at run-time, saving our users the need to reboot their systems to benefit from protection against the latest CVEs.

In the Ksplice updates team, the people behind the analysis of security fixes and the preparation of the run-time patches, we usually only focus on isolated security bugs. We rarely have the occasion to provide mitigations for an entire class of exploits. But after consideration, there’s no technical limitation to our technology that would prevent us from adding new features to a running kernel, even if they’re not security related.

Given the level of additional protection that it would provide to our users, we chose to create a security update for the oops limit mitigation that can be applied at run-time without the need to reboot the system. No need to reboot also means no downtime for servers running critical workloads.

Another upside of using Ksplice is that this mitigation is made available for all the kernel versions that we support. So even if you are running a 5-year-old kernel that you have never updated, you can benefit from this protection.

If you’ve enjoyed this post and you think you would enjoy working on this kind of subject, feel free to drop us a line at ksplice-support_ww@oracle.com. We are a diverse, fully remote, worldwide team. We look at a ton of Linux kernel patches and ship run-time updates for many distros, totalling more than 600 unique vulnerabilities a year.

Julian Pidancet
