Issue
Systems may experience recurring RPM database (rpmdb) corruption or package management failures, for example:
# rpm -qa
error: rpmdb: BDB0113 Thread/process 9773/140219155286080 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 - (-30973)
error: cannot open Packages database in /var/lib/rpm
In many cases, the root cause is that a process using the RPM database is terminated abruptly (for example via SIGKILL), leaving the database in an inconsistent state.
To understand who is killing RPM-related processes and when, we can use rpm_db_snooper from the Oracle Linux Enhanced Diagnostics (OLED) toolkit.
What is rpm_db_snooper?
rpm_db_snooper is a systemd service and DTrace-based diagnostic tool for Oracle Linux that:
- Monitors access to the RPM database directory /var/lib/rpm by rpm, yum, and dnf.
- Watches for signals (such as SIGKILL) sent to those processes.
- Logs relevant events (open/close, signals, exits) via the standard system journal.
It is implemented using DTrace and packaged as part of the Oracle Linux Enhanced Diagnostics (OLED) tools for Oracle Linux.
Note:
rpm_db_snooper is a diagnostics/forensics utility. It does not repair corruption; it helps you capture evidence about what was happening to the RPM database when problems occur.
Why rpm_db_snooper is Useful
RPM database corruption and hangs are often intermittent and hard to reproduce. Once the database is already in a bad state, there may be no standard logging that explains who killed yum/dnf/rpm or which process interrupted a transaction.
rpm_db_snooper addresses this by providing:
- An audit trail of which RPM-related processes were using /var/lib/rpm.
- A record of signals (especially SIGKILL) delivered to those processes.
- Timestamps that let you correlate events with other system activity (automation jobs, monitoring agents, OOM kills, etc.).
By reviewing this data when corruption reoccurs, you can often identify:
- Automation or monitoring tools that time out and kill yum/dnf/rpm.
- Operators manually sending kill -9 to package processes.
- Other system components (e.g., the OOM killer) terminating those processes.
Temporary Recovery vs Root Cause
When rpmdb corruption is already present, you typically need to:
- Ensure no processes are using /var/lib/rpm:
  $ fuser -v /var/lib/rpm
- Stop or kill any processes reported by fuser.
- Perform the appropriate rpmdb recovery/repair for your distribution, which may include removing transient Berkeley DB environment files (__db.*) or following vendor-specific recovery procedures.
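The manual steps above can be sketched as a small shell helper. This is illustrative only: recover_rpmdb is a hypothetical function name (not an OLED tool), the target directory is a parameter so the sketch can be exercised safely, and the final rebuild is printed rather than executed because rpm --rebuilddb requires root and a real rpmdb.

```shell
# Sketch of the temporary recovery steps, parameterized so the target
# directory can be passed explicitly (on a real system: /var/lib/rpm).
recover_rpmdb() {
    db_dir="$1"

    # Steps 1-2: refuse to continue while any process still holds the
    # directory open (fuser exits 0 only when the directory is in use).
    if command -v fuser >/dev/null 2>&1 && fuser "$db_dir" >/dev/null 2>&1; then
        echo "Processes are still using $db_dir; stop them first." >&2
        return 1
    fi

    # Step 3: remove the transient Berkeley DB environment files.
    rm -f "$db_dir"/__db.*

    # The rebuild itself is printed, not run: it needs root and a real rpmdb.
    echo "Next step: rpm --rebuilddb"
}
```

On a live system you would call it as recover_rpmdb /var/lib/rpm, then run the printed rebuild command yourself.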
These steps can temporarily clear the issue, but if the underlying cause is not fixed, corruption can reoccur. That is where rpm_db_snooper helps: it needs to be running before the issue in order to capture signal and access activity around the problematic time.
Installing rpm_db_snooper (OLED tools)
rpm_db_snooper is included in the Oracle Linux Enhanced Diagnostics (OLED) tools collection.
- For installation and repository setup details, see the Oracle Linux Enhanced Diagnostics (OLED) documentation for your release.
Once the OLED tools are installed, rpm_db_snooper will be provided as a systemd service and supporting DTrace script.
Service Architecture
Key components:
- rpm_db_snooper.service
  - A systemd unit that runs the monitoring DTrace script as a persistent root-level service.
- monitor_rpmdb.d
  - The DTrace script that:
    - Detects when yum, dnf, or rpm open or interact with database files under /var/lib/rpm and, where applicable, history.sqlite.
    - Tracks active RPM DB consumer PIDs.
    - Logs signal delivery (kill) targeted at those consumers.
    - Monitors consumer exit events.
- Key paths
  - /var/lib/rpm – RPM database directory.
  - /usr/libexec/oled-tools/scripts.d/monitor_rpmdb.d – DTrace script.
  - /var/log/messages – standard system log, if configured.
  - journalctl -u rpm_db_snooper.service – primary journal view (SYSLOG_IDENTIFIER: rpm_db_snooper).
Service ExecStart (typical):
/usr/sbin/dtrace -Cs /usr/libexec/oled-tools/scripts.d/monitor_rpmdb.d $RPM_DB_SNOOPER_DEBUG
$RPM_DB_SNOOPER_DEBUG can be set to enable additional debug output.
Starting and Managing the Service
On Oracle Linux systems with OLED tools installed, rpm_db_snooper is managed via systemd.
The service is not enabled by default when OLED tools are installed; you must explicitly enable and start it.
Start the service:
systemctl start rpm_db_snooper.service
Check the status:
systemctl status rpm_db_snooper.service
Restart if you change configuration or enable debug:
systemctl restart rpm_db_snooper.service
View logs:
journalctl -u rpm_db_snooper.service
# or
grep rpm_db_snooper /var/log/messages
Important: The service must be running before rpmdb corruption or hangs occur in order to capture useful diagnostic data.
To have it start automatically on boot, also run:
systemctl enable rpm_db_snooper.service
Enabling Debug Output (for troubleshooting rpm_db_snooper itself)
In normal use, you do not need debug output enabled to diagnose rpmdb corruption. The standard service logs are sufficient for tracking signals and basic rpmdb access patterns.
Enable debug mode only if you need more verbose output to troubleshoot the rpm_db_snooper script or service behavior.
Temporary debug via environment variable
export RPM_DB_SNOOPER_DEBUG=-DDEBUG
systemctl restart rpm_db_snooper.service
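Note that a variable exported in an interactive shell is not inherited by a systemd-managed service unless the unit is configured to read it (for example from an EnvironmentFile). A more reliable way to set the flag persistently is a systemd drop-in; the sketch below uses the standard drop-in location, and the variable name comes from the ExecStart line shown earlier:

```
# /etc/systemd/system/rpm_db_snooper.service.d/debug.conf
[Service]
Environment=RPM_DB_SNOOPER_DEBUG=-DDEBUG
```

After creating the file, run systemctl daemon-reload followed by systemctl restart rpm_db_snooper.service for the setting to take effect.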
Or, running the script directly for low-level debugging:
export DEBUG=1
dtrace -Cs /usr/libexec/oled-tools/scripts.d/monitor_rpmdb.d -DDEBUG
How rpm_db_snooper Works (High-Level)
At a high level, the DTrace script:
- Attaches to RPM-related processes (rpm, yum, dnf).
- Monitors operations on files under /var/lib/rpm (and related history files).
- Tracks which PIDs currently hold the RPM DB open.
- Observes signals sent to those PIDs (e.g., SIGKILL, SIGTERM).
- Logs structured lines with:
- Process name and PID
- Parent process name and PID
- Signal number (if applicable)
- Target process PID
- Open/close or exit events
This information lets you reconstruct a timeline around rpmdb activity and identify when a package manager was killed.
Example: Using rpm_db_snooper When Corruption Is Discovered Later
In many real-world cases, rpmdb corruption is not noticed immediately. You might only discover it much later, for example when a new yum/dnf/rpm command fails.
To make rpm_db_snooper useful in such scenarios, it should be enabled and kept running in the background before any issues occur. Then, when corruption is eventually detected, you can look back in time through its logs.
- Ensure rpm_db_snooper is running (and preferably enabled at boot):
  $ systemctl status rpm_db_snooper.service
  If it is not active, start (and optionally enable) it:
  $ systemctl start rpm_db_snooper.service
  $ systemctl enable rpm_db_snooper.service   # optional, for automatic start on boot
- When rpmdb corruption is detected (even if long after the triggering event), do not discard logs. Instead, review rpm_db_snooper output for a time window that reasonably covers when the corruption might have occurred. For example:
  journalctl -u rpm_db_snooper.service --since "2 days ago"
  Adjust the --since window based on how long ago you suspect the problematic activity might have taken place.
- Look for signal events or unexpected exits of rpm, yum, or dnf around the time of any suspicious package operations (maintenance windows, automation runs, scripted updates, etc.). Correlate these with other system logs (cron jobs, monitoring, OOM killer, operator actions) to narrow down the root cause.
Sample Log Output and Interpretation
Below is an example snippet of what rpm_db_snooper-related activity might look like in the system logs when monitoring RPM database access and a kill signal sent to a package manager process:
Feb 11 10:28:37 test_setup systemd[1]: Started Dtrace Program to track kill signals sent to yum/dnf/rpm processes.
Feb 11 10:28:40 test_setup dtrace[3081414]: -------------------------
Feb 11 10:28:40 test_setup dtrace[3081414]: Rpm db snooper - start
Feb 11 10:29:26 test_setup dtrace[3081414]: Process(PID) Parent Process(PID) Signal No Target Process(PID)
Feb 11 10:29:26 test_setup dtrace[3081414]: pkill(3081985) bash(3081982) 9 yum(3081878)
Feb 11 10:31:00 test_setup dtrace[3081414]: Rpm db snooper - end
Feb 11 10:31:00 test_setup dtrace[3081414]: -------------------------
Feb 11 10:31:00 test_setup systemd[1]: Stopping Dtrace Program to track kill signals sent to yum/dnf/rpm processes...
Feb 11 10:31:00 test_setup systemd[1]: rpm_db_snooper.service: Succeeded.
Feb 11 10:31:00 test_setup systemd[1]: Stopped Dtrace Program to track kill signals sent to yum/dnf/rpm processes.
From this example, you can see:
- systemd starts the DTrace-based monitoring service.
- rpm_db_snooper announces its start and end.
- A header line documents the columns for subsequent signal events.
- A pkill command (PID 3081985), running under a bash shell (PID 3081982), sends signal 9 (SIGKILL) to a yum process (PID 3081878).
If rpmdb corruption is later detected, these lines show that yum was killed with SIGKILL by the pkill process while accessing the RPM database. You can then investigate why pkill was run (automation, scripts, operator action, etc.) and adjust timeouts or behavior to avoid abrupt termination.
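When scanning a long journal window, the signal lines can be pulled out mechanically. The sketch below is an illustration, not part of OLED: snooper_signals is a hypothetical helper name, and the regular expression simply follows the column layout of the header line in the sample log above.

```shell
# Extract signal events from rpm_db_snooper log lines read on stdin.
# Matches lines shaped like:  pkill(3081985) bash(3081982) 9 yum(3081878)
snooper_signals() {
    sed -n -E 's/.*[[:space:]]([A-Za-z0-9_.-]+)\(([0-9]+)\)[[:space:]]+([A-Za-z0-9_.-]+)\(([0-9]+)\)[[:space:]]+([0-9]+)[[:space:]]+([A-Za-z0-9_.-]+)\(([0-9]+)\).*/sender=\1(\2) parent=\3(\4) signal=\5 target=\6(\7)/p'
}
```

Typical use would be to pipe the journal through it, for example: journalctl -u rpm_db_snooper.service --since "2 days ago" | snooper_signals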
Summary
- Recurring rpmdb corruption is often caused by package manager processes (rpm, yum, dnf) being killed while they hold the RPM database open.
- Standard system logs usually do not record who sent the kill signal.
- rpm_db_snooper, part of the Oracle Linux Enhanced Diagnostics tools, uses DTrace to monitor rpmdb activity and capture relevant signals and exits.
- Run rpm_db_snooper before the problem occurs, then use its logs to correlate corruption events with the processes or automation responsible for terminating RPM-related workloads.
This information can then be used to adjust automation, monitoring, or operational practices to prevent future rpmdb corruption.