Introduction
At Oracle, we’re always looking for ways to improve our methodology for debugging and resolving issues quickly. We sometimes find ourselves writing tools to do targeted debugging that is beyond the scope of existing tooling, or because the raw data needs to be processed and analyzed in a different manner. Oracle Linux Enhanced Diagnostics (OLED) is a project whose aim is to consolidate those tools in a single repository for the benefit of our team and to make them public for the community at large. OLED does not intend to replace other community projects; it’s a home for tools that don’t fit easily under existing projects. We are also heavy users of and contributors to Performance Co-Pilot and drgn [2]. In this blog, we’ll take a look at what you can find in oled-tools and how these tools might help you.
oled-tools
All the tools and scripts included in this rpm were developed in-house to address real issues that we had to debug, analyze and resolve for our customers. The need for them arose organically, in cases where existing debug methodologies fell short. For instance, we’ve seen issues where a lot of processes were stuck in the uninterruptible (D) state for a long time, driving up the load average and leading to device timeouts or soft lockups. We wrote kstack to capture kernel stack traces of D state processes. We have seen memory growth bugs where all available memory on the system was slowly used up over weeks or months until either the OOM-killer was invoked or the system crashed. oled memstate was written to debug those issues, to keep an eye on what category of memory was growing and how fast. We have used memstate to debug a host of other issues, including memory fragmentation, incorrect hugepage or DB PGA configurations, kernel memory leaks, etc. We have a handful of dtrace scripts that were written to debug one specific problem in one specific environment, but they have been included in this rpm because we think those corner cases aren’t so rare – maybe someone else in the Oracle Linux community will find them useful. Each of these tools and scripts is discussed in the following two sections.
The tools are primarily written in python and C, and the scripts are mostly dtrace. We will continue to add other tools and scripts to this toolset in future releases.
oled-tools is licensed under GPLv2.
Overview Of The Tools
oled-tools currently supports the following toolset:
# oled --help
usage: oled {-h | -v | COMMAND [ARGS]}

Valid commands:
    kstack      -- Gather kernel stack based on the process status or PID
    lkce        -- Linux Kernel Core Extractor
    memstate    -- Capture and analyze memory usage statistics
    scanfs      -- Scan KVM images for corruption, supports XFS and EXT4
    scripts     -- Run additional oled-tools scripts
    syswatch    -- Execute user-provided commands based on the CPU utilization
    vmcore_sz   -- Estimating vmcore size before kernel dump

optional arguments:
    -h, --help      show this help message and exit
    -v, --version   show program's version number and exit

NOTE: Must run as root.
Here’s a short summary of each of these tools, invoked as oled <command>:
LKCE
LKCE stands for Linux Kernel Core Extractor. Core here refers to the vmcore files that are generated when a system crashes. Vmcores are the most critical pieces of data for most issues we debug. Yet they are also the most controversial, owing to their large size (many GBs) and the downtime required to collect them. One solution to this problem is LKCE. You can use it to generate a report from a vmcore, which is useful since uploading/sharing a (much) smaller report extracted from the same vmcore is easier than sharing the vmcore file itself. LKCE can also run user scripts in the kdump kernel (i.e. when the system crashes). In this latter scenario, one can entirely skip generating a vmcore (thus reducing downtime if vmcore generation takes too long) and just get the report from the current crash, using LKCE.
memstate
The memstate tool gathers data about memory usage on the system where it’s run, and analyzes the raw data to compute various statistics, including total memory used by kernel/userspace allocations, per-process memory usage, slab cache usage, process memory usage allocations per NUMA node, etc. If there are any abnormalities detected – for instance, heavy fragmentation, large page tables, NUMA imbalance, etc. – the tool will call those out as warnings.
This tool can be run in background mode with an interval argument, so that it captures data periodically over a long period of time.
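As a rough illustration of the kind of raw data such a tool starts from, here is a minimal Python sketch that parses /proc/meminfo-style text and derives a few coarse totals. It is not memstate’s actual implementation; the function names parse_meminfo and summarize are hypothetical, and the "used = total - free" rule is an assumption (it does agree with the arithmetic in the sample output shown later in this post, where 1007.0 - 23.9 = 983.1).

```python
"""Sketch: summarize /proc/meminfo-style text (not memstate's actual code)."""


def parse_meminfo(text):
    """Return {field: integer value} for lines such as 'MemTotal:  1024 kB'.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Compute a few coarse totals (in KB) from the parsed fields."""
    total = fields.get("MemTotal", 0)
    free = fields.get("MemFree", 0)
    return {
        "total_kb": total,
        "free_kb": free,
        "used_kb": total - free,  # assumption: used = total - free
        "slab_kb": fields.get("Slab", 0),
        "page_tables_kb": fields.get("PageTables", 0),
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            summary = summarize(parse_meminfo(f.read()))
    except OSError as err:
        print(f"could not read /proc/meminfo: {err}")
    else:
        for key, value in summary.items():
            print(f"{key:>15}: {value}")
```

Sampling a summary like this at a fixed interval and comparing successive values is one simple way to see which category of memory is growing.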
kstack
kstack is used to capture the kernel stack trace of a selected process or group of processes. It captures all needed data from the /proc filesystem, and parses that data based on arguments given to the tool. A common use case is to debug too many uninterruptible (D state) processes on a system – kstack can capture the stack traces of those processes to shed light on what they’re blocked on. We can also collect stack traces of specific PID(s), if they’re behaving abnormally.
Like memstate, this tool can also be run in the background to capture data periodically, over many samples.
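The general idea of finding D state processes through /proc and reading their kernel stacks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not kstack’s code: the helper names are hypothetical, and it assumes the kernel exposes /proc/<pid>/stack (which normally requires root to read).

```python
"""Sketch: find D-state processes via /proc (not kstack's actual code)."""
import os


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split on the last ')' in the line.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def pids_in_state(wanted="D", proc_root="/proc"):
    """Yield PIDs under proc_root whose current state equals `wanted`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if parse_state(f.read()) == wanted:
                    yield int(entry)
        except (OSError, IndexError):
            continue  # process exited or the line was unreadable


def kernel_stack(pid, proc_root="/proc"):
    """Return the text of /proc/<pid>/stack (reading it needs root)."""
    with open(os.path.join(proc_root, str(pid), "stack")) as f:
        return f.read()


if __name__ == "__main__":
    try:
        for pid in pids_in_state("D"):
            try:
                print(f"PID {pid}:\n{kernel_stack(pid)}")
            except OSError as err:
                print(f"PID {pid}: stack unavailable ({err})")
    except OSError as err:
        print(f"could not scan /proc: {err}")
```

A single snapshot like this can miss short-lived stalls, which is why periodic sampling over many iterations is useful in practice.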
syswatch
syswatch can be used to execute user-specified commands or scripts when a user-specified CPU utilization threshold is reached. This can be used to debug high %sys CPU spikes by kernel threads, which might happen if there’s lock contention. In such cases, we might want to run some commands (like perf) at the time of the CPU usage spike, which typically lasts only seconds.
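The threshold-trigger pattern described above can be sketched as follows. This is a simplified illustration, not syswatch’s implementation: the function names are hypothetical, and it assumes %sys is measured from the aggregate "cpu" line of /proc/stat between two samples.

```python
"""Sketch: run a command when %sys crosses a threshold (not syswatch's code)."""
import subprocess
import time


def read_cpu_times(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def sys_percent(before, after):
    """Percentage of elapsed CPU time spent in system mode between samples.

    Field order in /proc/stat is: user nice system idle iowait irq softirq
    steal guest guest_nice. Guest time is already counted in user/nice, so
    only the first eight fields are summed; index 2 is 'system'.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    return 100.0 * deltas[2] / total if total > 0 else 0.0


def watch(threshold, command, interval=1.0):
    """Poll /proc/stat; run `command` once when %sys exceeds `threshold`."""
    with open("/proc/stat") as f:
        prev = read_cpu_times(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/stat") as f:
            cur = read_cpu_times(f.read())
        if sys_percent(prev, cur) > threshold:
            return subprocess.run(command, check=False)
        prev = cur
```

For example, calling watch(30.0, ["perf", "record", "-a", "-g", "--", "sleep", "5"]) would start a five-second system-wide perf recording the first time %sys exceeds 30% over a one-second window; the threshold and the command here are purely illustrative.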
scanfs
scanfs scans KVM image files for corruption without requiring VM downtime. It supports XFS and EXT4 filesystems. It takes as an argument the path to the directory containing the image files, and makes a reflink copy of the image files. It then sets up the volume groups and scans the XFS and EXT4 volumes using xfs_repair and e2fsck respectively.
vmcore_sz
This script can estimate the approximate size of a vmcore file before actually capturing it. A vmcore file can be pretty big (a few GBs, sometimes more) and sometimes, it can be useful to get an estimate of how big it can be (at that moment in time) so that the user can decide if downtime needs to be scheduled (for particularly large cores), or verify if the target filesystem has enough space left, etc. The size of a vmcore file depends on the dump level – which decides what pages are included and what are excluded from the dump, hence vmcore_sz takes the desired dump level as an argument.
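To make the effect of the dump level concrete, the sketch below decodes a level into the page types it excludes. It assumes the dump level follows the bitmask convention of makedumpfile’s -d option (values 0 to 31); the post does not spell out which convention vmcore_sz uses, so treat that as an assumption. The function name excluded_pages is hypothetical.

```python
"""Sketch: decode a makedumpfile-style dump level into excluded page types."""

# Bit values as documented for makedumpfile's -d option (assumed to be the
# convention the dump level follows here).
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "non-private cache pages",
    4: "private cache pages",
    8: "user process data pages",
    16: "free pages",
}


def excluded_pages(dump_level):
    """Return the page types excluded from the dump at `dump_level`."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if dump_level & bit]


if __name__ == "__main__":
    for level in (0, 1, 17, 31):
        print(level, excluded_pages(level))
```

Under this convention, level 0 keeps every page (the largest vmcore) and level 31 excludes all five categories (the smallest), which is why the estimate varies so much with the chosen level.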
scripts
The python tools mentioned in the previous section are useful for system-wide debugging – for instance, when we need memory usage stats for the entire system, or to collect stack traces of all processes that are stuck in the uninterruptible (D) state for a while. Sometimes, we need finer targets – a scalpel vs. a knife. For those cases – where data collection is brief, the amount of data collected is small, and collection happens only when specific conditions are met – we’ve developed dtrace and bash scripts.
As mentioned earlier, the need for all these scripts (and tools) arose organically from various real customer issues that the Linux Sustaining team at Oracle has debugged and resolved. Although each of these scripts was developed to debug a specific customer issue, it might be useful to someone else diagnosing similar behavior. Or it might spark an interesting discussion in the community about what other features would be useful in a similar debug script.
For a complete list of all the scripts and how to use them, see Oracle Linux Enhanced Diagnostics (OLED) – scripts.
Using oled-tools
RPM install
You can download and install this rpm from the same public yum/ULN channel as other add-on Oracle Linux software, using yum (OL7) or dnf (OL8, OL9):
$ sudo dnf install oled-tools
On OL8, the ol8_addons repo needs to be enabled to install oled-tools.
$ dnf whatprovides oled
oled-tools-0.7-1.el8.x86_64 : Diagnostic tools for more efficient and faster debugging on Oracle Linux
Repo        : ol8_addons
Matched from:
Filename    : /usr/sbin/oled
oled-tools is supported on x86_64 and aarch64 platforms.
Local build and install
$ git clone https://github.com/oracle/oled-tools.git
$ cd oled-tools
$ sudo make install
Note that LKCE requires drgn-tools too, and you’ll need to manually clone and build drgn-tools or install the rpm if you’re doing a local build of oled-tools.
Usage examples
Both the python/C tools and the dtrace/bash scripts are invoked using the oled master command, like so:
# oled memstate
KERNEL: 5.4.17-2136.330.7.5.el8uek.x86_64
HOSTNAME: nshqap04adm09.us.oracle.com
TIME: 10/18/2024 01:42:52

MEMORY USAGE SUMMARY (in GB):
Total memory                      1007.0
Free memory                         23.9
Used memory                        983.1
Userspace                           53.7
Processes                            2.6
Page cache                          51.2
Shared mem                           0.0
Kernel                               6.8
Slabs                                1.8
RDS                                  0.1
Unknown                              2.8
Total Hugepages (2048 KB)          922.6
Free Hugepages (2048 KB)           586.6
Swap used                            0.0

NUMA STATISTICS:
NUMA is enabled on this system; number of NUMA nodes is 2.
Per-node memory usage summary (in KB):
                                  NODE 0       NODE 1
MemTotal                       527501284    528464316
MemFree                         10305468     14724044
FilePages                       29379312     24341524
AnonPages                         704208      1963068
Slab                              968728       873960
Shmem                              11628         8404
Total Hugepages (2048 KB)      483721216    483721216
Free Hugepages (2048 KB)       307560448    307560448
[WARN] AnonPages is imbalanced across NUMA nodes.

TOP 10 SLAB CACHES (in KB):
SLAB CACHE             SIZE (KB)    ALIASES
proc_inode_cache          180416    (null)
radix_tree_node           154560    (null)
dentry                    149016    (null)
inode_cache               140736    (null)
kmalloc-8k                118976    (null)
xfs_inode                 118112    (null)
kmalloc-512                67296    (null)
kmalloc-2k                 57184    (null)
kmalloc-1k                 55296    (null)
kmalloc-96                 35120    (null)
>> Total memory used by all slab caches: 1.5 GB

TOP 10 MEMORY CONSUMERS (in KB):
PROCESS(PID)                     RSS
qemu-kvm(72182)              1558904
java(109988)                  867368
perl(137318)                   89736
libvirtd(227360)               59136
systemd-journal(2263)          42752
dbrsMain(109719)               37552
bash(381704)                   30104
bash(11006)                    28936
bash(381705)                   27868
dbrsBackup(109752)             27264

TOP 10 SWAP SPACE CONSUMERS:
No swap usage found.

HEALTH CHECKS:
[OK] The value of vm.min_free_kbytes is: 5279836 KB.
[OK] The value of vm.watermark_scale_factor is: 10.
[OK] Page tables size is: 0.0 GB.
[OK] RDS receive cache size is: 0.1 GB.
[OK] Unaccounted kernel memory is: 2.8 GB.

Buddyinfo: (Low orders are 0-3, high orders are 4-10).
Node 0, zone Normal 1100 1666 1336 705 412 187 110 72 49 73 2138
Total: 9133920 KB; Low: 61664 KB (0.68%); High: 9072256 KB (99.32%)
Node 1, zone Normal 961 836 782 481 248 110 47 32 31 27 3550
Total: 14724644 KB; Low: 38436 KB (0.26%); High: 14686208 KB (99.74%)

Vmstat:
allocstall_normal 0
zone_reclaim_failed 0
kswapd_low_wmark_hit_quickly 0
kswapd_high_wmark_hit_quickly 0
drop_pagecache 0
drop_slab 0
oom_kill 0
compact_migrate_scanned 0
compact_free_scanned 0
compact_isolated 0
compact_stall 0
compact_fail 0
compact_success 0
compact_daemon_wake 0
compact_daemon_migrate_scanned 0
compact_daemon_free_scanned 0

# oled scripts run rds_tx_funccount.d
2024-10-18 01:49:54.697 INFO - Running script '/usr/libexec/oled-tools/scripts.d/rds_tx_funccount.d '...
2024 Oct 18 01:50:00: rate of calls for sendmsg, send_xmit, ib_xmit and send_cqe_handler. ctrl+c to stop
--- 2024 Oct 18 01:50:10 ---
[<192.168.8.7,192.168.8.8,0> 01send_msg] 9
[<192.168.8.8,192.168.8.7,0> 01send_msg] 9
[<192.168.8.8,192.168.8.7,0> 04send_cqe_handler] 17
[<192.168.8.7,192.168.8.8,0> 03ib_xmits] 18
[<192.168.8.7,192.168.8.8,0> 04send_cqe_handler] 18
[<192.168.8.8,192.168.8.7,0> 03ib_xmits] 18
[<192.168.8.8,192.168.8.7,0> 02send_xmits] 34
[<192.168.8.7,192.168.8.8,0> 02send_xmits] 36
^C
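The Low/High figures in the Buddyinfo part of the memstate output can be reproduced by hand, which helps when reading them as a fragmentation indicator. The Python sketch below does the arithmetic for one zone; it assumes 4 KB base pages and is not taken from memstate’s source (the function name buddy_summary is hypothetical). Each count is the number of free blocks of a given order, and a block of order i spans 2^i contiguous pages.

```python
"""Sketch: reproduce the Low/High split shown in the Buddyinfo section."""

PAGE_KB = 4  # assumption: 4 KB base pages


def buddy_summary(counts, low_max_order=3, page_kb=PAGE_KB):
    """Return (total_kb, low_kb, high_kb) for free-block counts per order.

    counts[i] is the number of free blocks of order i; each such block
    spans 2**i contiguous pages.
    """
    sizes = [n * (2 ** order) * page_kb for order, n in enumerate(counts)]
    low = sum(sizes[: low_max_order + 1])
    high = sum(sizes[low_max_order + 1 :])
    return low + high, low, high


if __name__ == "__main__":
    # Free-block counts for "Node 0, zone Normal" from the output above.
    node0 = [1100, 1666, 1336, 705, 412, 187, 110, 72, 49, 73, 2138]
    total, low, high = buddy_summary(node0)
    print(f"Total: {total} KB; Low: {low} KB ({100 * low / total:.2f}%); "
          f"High: {high} KB ({100 * high / total:.2f}%)")
```

Run against the node 0 counts, this yields 9133920 KB total, 61664 KB in low orders and 9072256 KB in high orders, matching the sample output. A zone where most free memory sits in low orders would indicate fragmentation; here nearly all of it is in high-order blocks.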
To learn more about each tool – including what data it collects, what options it supports, the default options, if any, etc. – simply run the command with --help:
# oled lkce --help
or check the man page:
# man oled-lkce
To learn more about each script, please refer to the corresponding <scriptname>_example.txt file that resides under the doc directory where the script lives. It contains a short description of the script, how to run it, as well as some sample output.
Future Work
We’re actively working on adding more tools and scripts into oled-tools. We’re also continuing to add helpers to the drgn-tools repo. If you’re running Oracle Linux and have a use case for a specific debug script/tool/feature that’s currently missing in Linux, broadly speaking, or – more specifically – pcp, drgn-tools and oled-tools, we’d like to hear about it. Please open an SR to Linux Support if you run into any issues with these tools.
References
- https://blogs.oracle.com/linux/post/performance-analysis-using-pcp
- https://oracle-samples.github.io/drgn-tools/
- https://pcp.readthedocs.io/en/latest/
- https://drgn.readthedocs.io/en/latest/
- https://blogs.oracle.com/linux/post/better-diagnostics-with-performance-co-pilot
- https://blogs.oracle.com/linux/post/new-tools-for-performance-copilot
- https://github.com/oracle-samples/drgn-tools/