Introducing Oracle Linux Enhanced Diagnostics

Introduction

At Oracle, we’re always looking for ways to improve our methodology for debugging and resolving issues quickly. We sometimes find ourselves writing tools for targeted debugging that is beyond the scope of existing tooling, or because the raw data needs to be processed and analyzed differently. Oracle Linux Enhanced Diagnostics (OLED) is a project that aims to consolidate those tools in a single repository, both for the benefit of our team and to make them available to the community at large. OLED is not intended to replace other community projects; it is a home for tools that don’t fit easily under existing projects. We are also heavy users of, and contributors to, Performance Co-Pilot and drgn [2]. In this blog, we’ll take a look at what you can find in oled-tools and how these tools might help you.

oled-tools

All the tools and scripts included in this rpm were developed in-house to debug, analyze and resolve real issues for our customers, in cases where existing debug methodologies fell short. For instance, we’ve seen issues where many processes were stuck in the uninterruptible (D) state for a long time, driving up the load average and leading to device timeouts or soft lockups. We wrote kstack to capture the kernel stack traces of D state processes. We have also seen memory growth bugs where all available memory on the system was slowly used up over weeks or months until either the OOM killer was invoked or the system crashed. oled memstate was written to debug those issues by tracking which category of memory was growing and how fast. We have since used memstate to debug a host of other issues, including memory fragmentation, incorrect hugepage or DB PGA configurations, and kernel memory leaks. We also have a handful of dtrace scripts that were written to debug one specific problem in one specific environment. They are included in this rpm because we think those corner cases aren’t so rare, and someone else in the Oracle Linux community may find them useful. Each of these tools and scripts is discussed in the following two sections.

The tools are primarily written in Python and C, and the scripts are mostly dtrace. We will continue to add tools and scripts to this toolset in future releases.

oled-tools is licensed under GPLv2.

Overview Of The Tools

oled-tools currently supports the following toolset:

# oled --help
usage: oled {-h | -v | COMMAND [ARGS]}

Valid commands:
     kstack      -- Gather kernel stack based on the process status or PID
     lkce        -- Linux Kernel Core Extractor
     memstate    -- Capture and analyze memory usage statistics
     scanfs      -- Scan KVM images for corruption, supports XFS and EXT4
     scripts     -- Run additional oled-tools scripts
     syswatch    -- Execute user-provided commands based on the CPU utilization
     vmcore_sz   -- Estimating vmcore size before kernel dump

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

NOTE: Must run as root.

Here’s a short summary of each of these tools, invoked as oled <command>:

LKCE

LKCE stands for Linux Kernel Core Extractor. Core here refers to the vmcore files that are generated when a system crashes. Vmcores are the most critical pieces of data for most issues we debug. Yet they are also the most controversial, owing to their large size (many GBs) and the downtime required to collect them. LKCE is one solution to this problem. You can use it to generate a report from a vmcore, and uploading or sharing this (much) smaller report is easier than sharing the vmcore file itself. LKCE can also run user scripts in the kdump kernel (i.e., when the system crashes). In this latter scenario, you can skip generating a vmcore entirely and still get the LKCE report for the current crash. This reduces downtime when vmcore generation takes too long.

memstate

The memstate tool gathers data about memory usage on the system where it’s run. It then analyzes the raw data to compute various statistics, including total memory used by kernel and userspace allocations, per-process memory usage, slab cache usage and memory usage per NUMA node. If any abnormalities are detected, for instance heavy fragmentation, large page tables or NUMA imbalance, the tool calls them out as warnings.

This tool can be run in background mode with an interval argument, so that it captures data periodically over a long period of time.
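To illustrate the kind of check behind a warning such as the NUMA imbalance one, here is a minimal Python sketch. It is not memstate’s actual code. It reads the per-node counters the kernel exposes in /sys/devices/system/node/nodeN/meminfo. It then flags a counter whose largest per-node value is more than twice its smallest. The 2x threshold, the list of fields checked and all function names are hypothetical choices for this example, and memstate’s real criteria may differ.

```python
import re
from pathlib import Path

# Lines in /sys/devices/system/node/nodeN/meminfo look like:
#   "Node 0 AnonPages:          704208 kB"
_LINE = re.compile(r"^Node\s+\d+\s+(\w+):\s+(\d+)")

# HYPOTHETICAL threshold for this sketch: flag a counter when its largest
# per-node value is more than twice its smallest.
IMBALANCE_RATIO = 2.0


def parse_node_meminfo(text):
    """Return {field: value} (values in kB where the kernel reports kB)."""
    stats = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def read_all_nodes(root="/sys/devices/system/node"):
    """Parse every nodeN/meminfo file, ordered by node number."""
    nodes = sorted(Path(root).glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    return [parse_node_meminfo((n / "meminfo").read_text()) for n in nodes]


def imbalanced_fields(nodes, fields=("AnonPages", "FilePages", "Slab", "Shmem")):
    """Return the fields whose max/min ratio across nodes exceeds IMBALANCE_RATIO."""
    if len(nodes) < 2:
        return []  # nothing to compare on a single-node system
    flagged = []
    for field in fields:
        values = [node.get(field, 0) for node in nodes]
        lo, hi = min(values), max(values)
        if hi > 0 and (lo == 0 or hi / lo > IMBALANCE_RATIO):
            flagged.append(field)
    return flagged


if __name__ == "__main__":
    for field in imbalanced_fields(read_all_nodes()):
        print(f"[WARN]  {field} is imbalanced across NUMA nodes.")
```

Take the per-node AnonPages, FilePages, Slab and Shmem values from the sample memstate output later in this post. With those values, this sketch flags only AnonPages, whose node 1 value is about 2.8 times its node 0 value.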

kstack

kstack is used to capture the kernel stack trace of a selected process or group of processes. It captures all needed data from the /proc filesystem, and parses that data based on the arguments given to the tool. A common use case is debugging a system with too many uninterruptible (D state) processes. kstack can capture the stack traces of those processes to shed light on what they’re blocked on. It can also collect the stack traces of specific PIDs if they’re behaving abnormally.

Like memstate, this tool can also be run in the background to capture data periodically, over many samples.
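kstack’s basic approach, collecting data from /proc, can be sketched in a few lines of Python. This is a simplified illustration, not kstack’s implementation. It reads the state field from /proc/<pid>/stat. For tasks in the D state, it then reads /proc/<pid>/stack, which only root can read. For brevity it only looks at thread-group leaders, not every thread under /proc/<pid>/task. The function names are hypothetical.

```python
import os


def proc_state(stat_line):
    """Return (comm, state) parsed from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    left, _, right = stat_line.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]  # single-letter state, e.g. R, S, D
    return comm, state


def d_state_stacks(proc="/proc"):
    """Yield (pid, comm, kernel_stack) for each process in the D state.

    Reading /proc/<pid>/stack requires root; processes that exit mid-scan
    or whose files cannot be read are skipped.
    """
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, state = proc_state(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            continue
        yield int(entry), comm, stack
```

Run this loop as root:

for pid, comm, stack in d_state_stacks(): print(f"{comm}({pid})\n{stack}")

It prints each blocked process followed by its kernel stack. Calling it repeatedly with a sleep in between gives a crude version of periodic sampling.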

syswatch

syswatch can be used to execute user-specified commands or scripts when a user-specified CPU utilization threshold is reached. This is useful for debugging spikes in %sys CPU usage by kernel threads, which might happen if there’s lock contention. In such cases, we might want to run some commands (like perf) at the moment of the spike, since such spikes typically last only seconds.
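The core of a threshold trigger like this can be sketched by sampling the aggregate cpu line of /proc/stat twice. From the two samples you compute the share of elapsed CPU time spent in system mode. The sketch below assumes this general approach and is not syswatch’s code or command-line interface. The function names, the 1-second polling interval and the run-once behavior are hypothetical.

```python
import subprocess
import time


def read_cpu_times(path="/proc/stat"):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError(f"no aggregate cpu line in {path}")


def sys_percent(before, after):
    """Percentage of elapsed CPU time spent in system mode between two samples.

    Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest and guest_nice are already included in user and nice, so only the
    first eight columns are summed to get the total.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[2] / total if total > 0 else 0.0


def watch(threshold, command, interval=1.0):
    """Poll %sys every `interval` seconds; run `command` once when it reaches `threshold`."""
    prev = read_cpu_times()
    while True:
        time.sleep(interval)
        cur = read_cpu_times()
        if sys_percent(prev, cur) >= threshold:
            subprocess.run(command, check=False)
            return
        prev = cur
```

For example, consider this call:

watch(30.0, ["perf", "record", "-a", "-g", "--", "sleep", "5"])

Once %sys reaches 30%, it records system-wide call graphs for five seconds. Both the threshold and the perf command line are illustrative.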

scanfs

scanfs scans KVM image files for corruption without requiring VM downtime. It supports XFS and EXT4 filesystems. It takes as an argument the path to the directory containing the image files, and makes a reflink copy of those files. It then sets up the volume groups and scans the XFS and EXT4 volumes using xfs_repair and e2fsck, respectively.
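The copy and check steps might look like the Python sketch below. It assumes the checks run in no-modify mode (xfs_repair -n, e2fsck -n -f). It also assumes the copy is made with cp --reflink=always, which fails rather than falling back to a full copy when reflinks are unsupported. These flags and the helper names are assumptions for illustration. The sketch does not cover attaching the image or activating its volume groups, and it does not reflect scanfs’s actual implementation.

```python
import subprocess


def reflink_copy_cmd(src, dst):
    """Copy-on-write clone; --reflink=always fails rather than doing a full copy."""
    return ["cp", "--reflink=always", src, dst]


def check_cmd(fstype, device):
    """Return a no-modify check command for a volume of the given filesystem type."""
    if fstype == "xfs":
        return ["xfs_repair", "-n", device]      # -n: no-modify mode, only report
    if fstype == "ext4":
        return ["e2fsck", "-n", "-f", device]    # -n: open read-only, answer "no"; -f: force
    raise ValueError(f"unsupported filesystem type: {fstype}")


def run_check(fstype, device):
    """Run the check and return (exit_status, output); non-zero means problems or failure."""
    result = subprocess.run(check_cmd(fstype, device),
                            capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

Working on a reflinked copy means the running VM’s image is never opened by the checker. The copy also shares data blocks with the original until either one is modified.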

vmcore_sz

vmcore_sz estimates the approximate size of a vmcore file before it is actually captured. A vmcore file can be quite big (a few GBs, sometimes more). An estimate of how big it would be at that moment helps the user decide whether downtime needs to be scheduled (for particularly large cores). It also helps verify that the target filesystem has enough space left. The size of a vmcore file depends on the dump level, which decides which pages are included in the dump and which are excluded. Hence vmcore_sz takes the desired dump level as an argument.
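Assume the dump level follows the makedumpfile convention used by kdump, where -d takes a bitmask from 0 to 31 and each bit excludes one type of page. Under that assumption, the following Python sketch decodes a level into the page types left out of the dump. The helper name is hypothetical.

```python
# Page-exclusion bits of makedumpfile's -d (dump level) option.
EXCLUDE_BITS = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user process data pages",
    16: "free pages",
}


def excluded_page_types(dump_level):
    """Return the page types a given dump level leaves out of the vmcore."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in EXCLUDE_BITS.items() if dump_level & bit]
```

For instance, excluded_page_types(31), a commonly used kdump setting, excludes all five page types. By contrast, excluded_page_types(1) drops only zero pages and therefore produces a much larger vmcore.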

scripts

The tools described above are useful for system-wide debugging, for instance when we need memory usage stats for the entire system, or stack traces of all processes that have been stuck in the uninterruptible (D) state for a while. Sometimes we need finer targeting: a scalpel vs. a knife. In those cases, data collection is brief, the amount of data collected is small, and collection happens only when specific conditions are met. For those cases we’ve developed dtrace and bash scripts.

As mentioned earlier, the need for all these scripts (and tools) arose organically from real customer issues that the Linux Sustaining team at Oracle has debugged and resolved. Although each script was developed to debug a specific customer issue, it might help someone else diagnose similar behavior. It might also spark an interesting discussion in the community about what other functionality would be useful in a similar debug script.

For a complete list of all the scripts and how to use them, see Oracle Linux Enhanced Diagnostics (OLED) – scripts.

Using oled-tools

RPM install

You can download and install this rpm from the same public yum/ULN channel as other add-on Oracle Linux software, using yum (OL7) or dnf (OL8, OL9):

$ sudo dnf install oled-tools

On OL8, the ol8_addons repo needs to be enabled to install oled-tools.

$ dnf whatprovides oled
oled-tools-0.7-1.el8.x86_64 : Diagnostic tools for more efficient and faster debugging on Oracle Linux
Repo        : ol8_addons
Matched from:
Filename    : /usr/sbin/oled

oled-tools is supported on x86_64 and aarch64 platforms.

Local build and install

$ git clone https://github.com/oracle/oled-tools.git
$ cd oled-tools
$ sudo make install

Note that LKCE also requires drgn-tools. If you’re doing a local build of oled-tools, you’ll need to either clone and build drgn-tools manually or install the drgn-tools rpm.

Usage examples

Both the Python/C tools and the dtrace/bash scripts are invoked using the oled master command, like so:

# oled memstate
    KERNEL: 5.4.17-2136.330.7.5.el8uek.x86_64
  HOSTNAME: nshqap04adm09.us.oracle.com
      TIME: 10/18/2024 01:42:52

MEMORY USAGE SUMMARY (in GB):
Total memory                            1007.0
Free memory                               23.9
Used memory                              983.1
  Userspace                               53.7
    Processes                              2.6
    Page cache                            51.2
    Shared mem                             0.0
  Kernel                                   6.8
    Slabs                                  1.8
    RDS                                    0.1
    Unknown                                2.8
  Total Hugepages (2048 KB)              922.6
  Free Hugepages (2048 KB)               586.6
Swap used                                  0.0

NUMA STATISTICS:
NUMA is enabled on this system; number of NUMA nodes is 2.
Per-node memory usage summary (in KB):
                                       NODE 0         NODE 1
MemTotal                            527501284      528464316
MemFree                              10305468       14724044
FilePages                            29379312       24341524
AnonPages                              704208        1963068
Slab                                   968728         873960
Shmem                                   11628           8404
Total Hugepages (2048 KB)           483721216      483721216
Free Hugepages (2048 KB)            307560448      307560448
[WARN]  AnonPages is imbalanced across NUMA nodes.

TOP 10 SLAB CACHES (in KB):
SLAB CACHE                           SIZE (KB)            ALIASES
proc_inode_cache                        180416            (null)
radix_tree_node                         154560            (null)
dentry                                  149016            (null)
inode_cache                             140736            (null)
kmalloc-8k                              118976            (null)
xfs_inode                               118112            (null)
kmalloc-512                              67296            (null)
kmalloc-2k                               57184            (null)
kmalloc-1k                               55296            (null)
kmalloc-96                               35120            (null)

>> Total memory used by all slab caches: 1.5 GB

TOP 10 MEMORY CONSUMERS (in KB):
PROCESS(PID)                               RSS
qemu-kvm(72182)                        1558904
java(109988)                            867368
perl(137318)                             89736
libvirtd(227360)                         59136
systemd-journal(2263)                    42752
dbrsMain(109719)                         37552
bash(381704)                             30104
bash(11006)                              28936
bash(381705)                             27868
dbrsBackup(109752)                       27264

TOP 10 SWAP SPACE CONSUMERS:
No swap usage found.

HEALTH CHECKS:
[OK]    The value of vm.min_free_kbytes is: 5279836 KB.

[OK]    The value of vm.watermark_scale_factor is: 10.

[OK]    Page tables size is: 0.0 GB.

[OK]    RDS receive cache size is: 0.1 GB.

[OK]    Unaccounted kernel memory is: 2.8 GB.

Buddyinfo:
  (Low orders are 0-3, high orders are 4-10).
  Node 0, zone   Normal   1100   1666   1336    705    412    187    110     72     49     73   2138
  Total: 9133920 KB;            Low: 61664 KB (0.68%);          High: 9072256 KB (99.32%)
  Node 1, zone   Normal    961    836    782    481    248    110     47     32     31     27   3550
  Total: 14724644 KB;           Low: 38436 KB (0.26%);          High: 14686208 KB (99.74%)

Vmstat:
  allocstall_normal 0
  zone_reclaim_failed 0
  kswapd_low_wmark_hit_quickly 0
  kswapd_high_wmark_hit_quickly 0
  drop_pagecache 0
  drop_slab 0
  oom_kill 0
  compact_migrate_scanned 0
  compact_free_scanned 0
  compact_isolated 0
  compact_stall 0
  compact_fail 0
  compact_success 0
  compact_daemon_wake 0
  compact_daemon_migrate_scanned 0
  compact_daemon_free_scanned 0

# oled scripts run rds_tx_funccount.d
2024-10-18 01:49:54.697 INFO - Running script '/usr/libexec/oled-tools/scripts.d/rds_tx_funccount.d '...
2024 Oct 18 01:50:00: rate of calls for sendmsg, send_xmit, ib_xmit and send_cqe_handler. ctrl+c to stop
--- 2024 Oct 18 01:50:10 ---
[<192.168.8.7,192.168.8.8,0> 01send_msg] 9
[<192.168.8.8,192.168.8.7,0> 01send_msg] 9
[<192.168.8.8,192.168.8.7,0> 04send_cqe_handler] 17
[<192.168.8.7,192.168.8.8,0> 03ib_xmits] 18
[<192.168.8.7,192.168.8.8,0> 04send_cqe_handler] 18
[<192.168.8.8,192.168.8.7,0> 03ib_xmits] 18
[<192.168.8.8,192.168.8.7,0> 02send_xmits] 34
[<192.168.8.7,192.168.8.8,0> 02send_xmits] 36
^C
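The Low and High figures in the Buddyinfo section of the memstate output above can be reproduced from /proc/buddyinfo. Each column after the zone name is the number of free blocks of order 0, 1, 2 and so on, and a block of order n spans 2^n base pages. The Python sketch below assumes 4 KB base pages (the x86_64 default; aarch64 kernels may use larger pages). It uses the same split as the output: orders 0-3 count as low and orders 4-10 as high. It is an illustration, not memstate’s code. For the Node 0 line in the sample output it returns a total of 9133920 KB, with 61664 KB in low orders and 9072256 KB in high orders. These match the figures memstate printed.

```python
def buddyinfo_split(line, page_kb=4):
    """Split one /proc/buddyinfo line into low-order (0-3) and high-order (4+) free memory.

    page_kb defaults to 4 (ASSUMPTION: 4 KB base pages).
    Returns (node, zone, total_kb, low_kb, high_kb).
    """
    fields = line.split()
    # "Node 0, zone   Normal   1100 1666 ..." -> ["Node", "0,", "zone", "Normal", "1100", ...]
    node = fields[1].rstrip(",")
    zone = fields[3]
    counts = [int(c) for c in fields[4:]]
    # A free block of order n covers 2**n base pages.
    kb_per_order = [count * page_kb * (1 << order) for order, count in enumerate(counts)]
    low_kb = sum(kb_per_order[:4])
    high_kb = sum(kb_per_order[4:])
    return node, zone, low_kb + high_kb, low_kb, high_kb
```

On a live system, pass os.sysconf("SC_PAGE_SIZE") // 1024 as page_kb to use the running kernel’s page size instead of assuming 4 KB. A high share of low-order memory signals fragmentation, because large contiguous allocations can only be served from the high orders.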

To learn more about each tool, including what data it collects, what options it supports and their defaults, simply run the command with --help:

# oled lkce --help

or check the man page:

# man oled-lkce

To learn more about each script, refer to the corresponding <scriptname>_example.txt file in the doc directory where the script lives. It contains a short description of the script, instructions for running it, and some sample output.

Future Work

We’re actively working on adding more tools and scripts to oled-tools, and we’re continuing to add helpers to the drgn-tools repo. We’d like to hear from you if you’re running Oracle Linux and have a use case for a debug script, tool or feature that’s currently missing. That could be missing in Linux broadly, or in pcp, drgn-tools or oled-tools specifically. Please open an SR with Oracle Linux Support if you run into any issues with these tools.

Aruna Ramakrishna
