On May 29th, Qualys published CVE-2025-4598, a TOCTOU information disclosure vulnerability affecting systemd-coredump.
The CVE description is:
A vulnerability was found in systemd-coredump. This flaw allows an attacker to force a SUID process to crash and replace it with a non-SUID binary to access the original’s privileged process coredump, allowing the attacker to read sensitive data, such as /etc/shadow content, loaded by the original process. A SUID binary or process has a special type of permission, which allows the process to run with the file owner’s permissions, regardless of the user executing the binary. This allows the process to access more restricted data than unprivileged users or processes would be able to. An attacker can leverage this flaw by forcing a SUID process to crash and force the Linux kernel to recycle the process PID before systemd-coredump can analyze the /proc/pid/auxv file. If the attacker wins the race condition, they gain access to the original’s SUID process coredump file. They can read sensitive content loaded into memory by the original binary, affecting data confidentiality.
In the same report, Qualys also published CVE-2025-5054, which affects apport, Ubuntu’s core dump handler.
Core dumps & systemd-coredump
Before delving into the specifics of the vulnerability, let’s talk about core dumps and systemd-coredump.
Core dumps are memory snapshots of a process at a specific point in time, typically taken when the process crashes. These dumps allow developers to analyze the state of the program, such as the stack, backtrace, registers, and memory contents. This information is key to understanding the underlying issue, as it accurately reflects the context of the program at the time of the crash. When a crash occurs (e.g., due to a segmentation fault), the kernel detects it and creates the core dump by invoking the userspace handler specified in /proc/sys/kernel/core_pattern.
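As a small illustration of the handler mechanism described above, the following C sketch checks whether a core_pattern value pipes dumps to a userspace program. The function names are hypothetical helpers written for this example, not part of the kernel or systemd; the only facts they rely on are the /proc/sys/kernel/core_pattern path and the leading | that marks a piped handler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a core_pattern value pipes the dump to a
 * userspace handler (it starts with '|'), 0 if it is a file name template. */
int core_pattern_is_piped(const char *pattern)
{
    return pattern != NULL && pattern[0] == '|';
}

/* Hypothetical helper: reads the current pattern into buf.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_core_pattern(char *buf, size_t len)
{
    assert(buf != NULL && len > 1);

    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
    return 0;
}
```

On a system that uses systemd-coredump, calling read_core_pattern() and then core_pattern_is_piped() on the result would be expected to return 1, since the configured value starts with a pipe character.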
Whether a core dump is created depends on the dumpable attribute of the process. It is set to 1 by default for a regular user process, or to the value of suid_dumpable for a SUID process. suid_dumpable can be:
- 0: non dumpable
- 1: dumpable and accessible by the user. Meant only for debugging purposes.
- 2: dumpable but accessible by root only.
// 6.15 fs/exec.c#L1321
// int begin_new_exec(struct linux_binprm * bprm)
/*
 * Figure out dumpability. Note that this checking only of current
 * is wrong, but userspace depends on it. This should be testing
 * bprm->secureexec instead.
 */
if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
    !(uid_eq(current_euid(), current_uid()) &&
      gid_eq(current_egid(), current_gid())))
        set_dumpable(current->mm, suid_dumpable);
else
        set_dumpable(current->mm, SUID_DUMP_USER);
When core dump generation is triggered in do_coredump, the dumpable attribute is checked at the beginning of the process. If it is 0, the kernel does not generate the dump.
// 6.15 fs/coredump.c#L524
// void do_coredump(const kernel_siginfo_t *siginfo)
if (!__get_dumpable(cprm.mm_flags))
        goto fail;
At this point, the kernel generates the core and passes it, along with the configured arguments (see the /proc/sys/kernel/core_pattern section of the core(5) man page), to the userspace handler.
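To make the dumpable logic described above concrete, here is a simplified userspace model in C. It is only a sketch: the function and enum names are invented for this example, the BINPRM_FLAGS_ENFORCE_NONDUMP case is left out, and plain integer comparisons stand in for the kernel’s uid_eq()/gid_eq() helpers.

```c
#include <assert.h>
#include <sys/types.h>

/* The three dumpable levels described in the text. */
enum { DUMP_DISABLE = 0, DUMP_USER = 1, DUMP_ROOT = 2 };

/* Models the decision taken at exec time: matching real and effective IDs
 * give dumpable = 1, anything else inherits the suid_dumpable setting. */
int dumpable_after_exec(uid_t uid, uid_t euid, gid_t gid, gid_t egid,
                        int suid_dumpable)
{
    assert(suid_dumpable >= DUMP_DISABLE && suid_dumpable <= DUMP_ROOT);

    if (uid == euid && gid == egid)
        return DUMP_USER;
    return suid_dumpable;
}

/* Models the early exit in do_coredump(): dumpable == 0 means no core. */
int kernel_generates_core(int dumpable)
{
    return dumpable != DUMP_DISABLE;
}
```

For example, a process started by UID 1000 from a root-owned SUID binary (effective UID 0) on a system with suid_dumpable set to 2 ends up with dumpable 2, so a core is generated but is meant to be readable by root only. With suid_dumpable set to 0, the same process produces no core at all.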
systemd-coredump is a component of the systemd suite responsible for capturing and processing core dumps. It is usually set in /proc/sys/kernel/core_pattern as |/usr/lib/systemd/systemd-coredump ARGS.
systemd-coredump stores the cores in a centralized journal or on disk, depending on configuration, while also handling permissions and recovering other metadata from the process, such as the executable path or arguments.
The whole workflow looks like this:
Problem analysis
systemd-coredump retrieves process metadata by inspecting the /proc files of the PID passed by the kernel in %P. It sets the access to the core dump based on the information collected after the crash. In the code, this happens in grant_user_access, which queries the process information through /proc/$PID/auxv.
User access is granted by verifying that AT_SECURE is 0 and that UID == EUID and GID == EGID:
/* We allow access if we got all the data and at_secure is not set and
 * the uid/gid matches euid/egid. */
bool ret =
        at_secure == 0 &&
        uid != UID_INVALID && euid != UID_INVALID && uid == euid &&
        gid != GID_INVALID && egid != GID_INVALID && gid == egid;
This means that:
- If it is a regular user process, the user can access the core dump
- If it is a SUID process, only root can access the core dump
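The check above can be sketched as a standalone C function that walks auxv entries in the layout exposed by /proc/$PID/auxv (pairs of native-word type and value, terminated by AT_NULL). This is a simplified model rather than the systemd implementation: struct auxv_pair, AUXV_UNSET and auxv_allows_user_access are names made up for the example, and the AT_* constants come from <elf.h>.

```c
#include <assert.h>
#include <elf.h>    /* AT_NULL, AT_UID, AT_EUID, AT_GID, AT_EGID, AT_SECURE */
#include <stddef.h>

#define AUXV_UNSET ((unsigned long)-1) /* marks a field not seen in the auxv */

/* One auxv entry as stored in /proc/<pid>/auxv on the native word size. */
struct auxv_pair {
    unsigned long type;
    unsigned long value;
};

/* Returns 1 if the auxv describes an ordinary (non-SUID) process, following
 * the condition quoted above; returns 0 otherwise or if a field is missing. */
int auxv_allows_user_access(const struct auxv_pair *auxv, size_t n)
{
    unsigned long uid = AUXV_UNSET, euid = AUXV_UNSET;
    unsigned long gid = AUXV_UNSET, egid = AUXV_UNSET;
    unsigned long at_secure = AUXV_UNSET;

    assert(auxv != NULL || n == 0);

    for (size_t i = 0; i < n && auxv[i].type != AT_NULL; i++) {
        switch (auxv[i].type) {
        case AT_UID:    uid = auxv[i].value;       break;
        case AT_EUID:   euid = auxv[i].value;      break;
        case AT_GID:    gid = auxv[i].value;       break;
        case AT_EGID:   egid = auxv[i].value;      break;
        case AT_SECURE: at_secure = auxv[i].value; break;
        default:        break; /* other entries are irrelevant here */
        }
    }

    return at_secure == 0 &&
           uid != AUXV_UNSET && euid != AUXV_UNSET && uid == euid &&
           gid != AUXV_UNSET && egid != AUXV_UNSET && gid == egid;
}
```

An auxv with UID and EUID both 1000, matching GIDs and AT_SECURE 0 yields 1 (user access granted), while an auxv with EUID 0 and AT_SECURE 1, as a SUID-root process would have, yields 0.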
CVE-2025-4598 is a race condition at the time of reading the process metadata and setting the permissions for the core dump file.
There is a small window between the kernel invoking systemd-coredump and systemd-coredump reading the process metadata. Within that window, an attacker can replace the original SUID process with a regular process owned by the user that has the same PID.
If this happens, when systemd-coredump reads /proc/$PID/auxv in grant_user_access, it reads the auxv of the new process. The process therefore looks like a normal one (AT_SECURE is 0 and both UID and GID match the effective ones), so systemd-coredump grants the user access to the core dump while storing the memory dump of the SUID process.
As noted by Qualys, winning this race condition against systemd-coredump is rather hard, since it is written in C and its startup time is quite short. The crashed process has to be killed and replaced at just the right moment:
- If the process is killed too fast, the dump is not correctly generated, as the kernel fails to get all the core content.
- If it is killed too slowly, the original /auxv is read, in which case the core dump is generated with root-only access.
The window for an attacker to hit the perfect scenario is a matter of milliseconds. While this sounds almost impossible, it is a realistic scenario for an attacker who gets the timing right.
To make it even more complicated, in order to reuse the PID of the SUID process, the attacker needs to wrap around the PID space. The maximum value is controlled by /proc/sys/kernel/pid_max and is usually 0x400000 on 64-bit systems. This wraparound, in the best-case scenario, takes several minutes, as pointed out by Qualys.
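For a rough feel of the numbers, the C sketch below estimates how long one pass over the PID space takes at a given process-creation rate. The function name is hypothetical, and the rate used in the example afterwards is an illustrative assumption, not a measurement from Qualys or from any real system.

```c
#include <assert.h>

/* Rough number of seconds needed to cycle through the whole PID space once,
 * given a sustained process-creation rate in processes per second. */
double pid_wrap_seconds(long pid_max, double forks_per_second)
{
    assert(pid_max > 0 && forks_per_second > 0.0);
    return (double)pid_max / forks_per_second;
}
```

With pid_max at 0x400000 (4,194,304) and an assumed rate of 10,000 new processes per second, pid_wrap_seconds(0x400000, 10000.0) gives about 419 seconds, or roughly seven minutes, which is in line with the "several minutes" order of magnitude mentioned above.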
But if an attacker manages to win the race condition and trigger the issue, sensitive information may be leaked, as the attacker is able to inspect the memory of the SUID process. That memory could contain the contents of root-only files like /etc/shadow, in-memory secrets or other sensitive information.
Patch & Mitigations
The patch updates the code in grant_user_access to use %d (from core_pattern, provided by the kernel) to verify whether permissions should be granted. %d specifies the dumpable level we talked about before:
%d Dump mode—same as value returned by prctl(2) PR_GET_DUMPABLE
The relevant part of the patch is:
-        /* We allow access if we got all the data and at_secure is not set and
-         * the uid/gid matches euid/egid. */
+        /* We allow access if %d/dumpable on the command line was exactly 1, we got all the data,
+         * at_secure is not set, and the uid/gid match euid/egid. */
         bool ret =
+                context->dumpable == 1 &&
                 at_secure == 0 &&
                 uid != UID_INVALID && euid != UID_INVALID && uid == euid &&
                 gid != GID_INVALID && egid != GID_INVALID && gid == egid;
This makes the decision depend on the kernel-supplied dumpable argument, which is determined before the crash and cannot be altered by recycling the PID: unless it is exactly 1, the core dump is not accessible to the user.
Oracle Linux released patched versions that fix this issue on the same day the vulnerability was published. The fixed versions are:
- Oracle Linux 8: systemd version 239-82.0.4.5 announced in ELSA-2025-20343
- Oracle Linux 9: systemd version 252-51.0.2 announced in ELSA-2025-20344
As pointed out by Qualys, if updating is not desired or feasible, a possible mitigation is to set suid_dumpable to 0, as that blocks the generation of core dumps for any SUID process at the kernel level.
In an effort to fix PID reuse race conditions in core dump handling once and for all, a new kernel feature was made available in 6.15.1. It makes the pidfd of the process being dumped available to the userspace helper (systemd-coredump, for example) as fd 3. A new core_pattern parameter, %F, was added to support this. A pidfd refers to one specific process and is therefore not affected by PID reuse attacks, as pointed out by Qualys. systemd has already added support for the new %F parameter, so future systems running a kernel with this feature can start using it.
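As an illustration of why a pidfd helps, the C sketch below shows one way a helper could look up the PID behind a pidfd it has been handed, by reading the "Pid:" line the kernel exposes in /proc/self/fdinfo/<fd>. This is not necessarily how systemd-coredump uses the descriptor; both function names are hypothetical, and the descriptor number is taken as a parameter rather than assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts the value of the "Pid:" line from pidfd fdinfo text.
 * Returns the PID, or -1 if no such line is present. */
long pid_from_fdinfo_text(const char *text)
{
    assert(text != NULL);

    for (const char *line = text; line != NULL && *line != '\0'; ) {
        long pid;
        if (sscanf(line, "Pid: %ld", &pid) == 1)
            return pid;
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Reads /proc/self/fdinfo/<fd> and returns the PID the pidfd refers to,
 * or -1 if the file cannot be read or has no "Pid:" line. */
long pid_behind_pidfd(int fd)
{
    char path[64];
    char buf[1024];

    snprintf(path, sizeof path, "/proc/self/fdinfo/%d", fd);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return pid_from_fdinfo_text(buf);
}
```

Unlike a numeric PID passed on the command line, the descriptor keeps pointing at the process that actually crashed, so a lookup like this cannot be redirected to a process that later reuses the same PID number.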