Performance Co-Pilot (PCP) is a framework for system performance analysis. It enables users to collect system performance metrics on an ongoing basis and to archive the collected data using its logging infrastructure (pmlogger). By default, PCP on Oracle Linux logs and retains a week's worth of data, allowing users to review changes in system behavior over time. PCP collects metrics in raw form and provides tools to review and analyze those metrics in a human-readable format. This blog introduces some of the PCP commands and shows how to extract information similar to what other standard Linux tools provide. It also introduces PMIE (the Performance Metric Inference Engine) and gives some examples of how to use it for performance analysis.
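If PCP is already running, the pmlogger archives are written under /var/log/pcp/pmlogger/<hostname> by default. A quick way to confirm the logger is active and to see which archives are available (a minimal sketch, assuming the standard systemd service name and log location):

$ systemctl status pmlogger
$ ls /var/log/pcp/pmlogger/$(hostname)/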
Performance Reporting Utilities in PCP
PCP provides various tools and commands similar to standard Linux tools such as mpstat, vmstat, iostat, etc., which are used to monitor and review system performance. These PCP commands operate on live systems as well as on data archived by pmlogger, which allows users to review historical system behavior. Some of the important PCP tools for analyzing system performance are described below.
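The invocation pattern is the same for every tool: run it directly for live data, or point it at an archive for historical data. For example (the archive path below is illustrative):

$ pcp mpstat -P ALL
$ pcp --archive /var/log/pcp/pmlogger/<hostname>/<archivename> mpstat -P ALL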
pcp mpstat
pcp mpstat reports CPU and interrupt-related statistics for each available processor, as well as the average across all processors.
$ pcp mpstat -P ALL
Timestamp  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
07:56:24   all    0.0    0.0   0.0      0.0   0.0    0.0     0.0     0.0     0.0  99.72
07:56:24     0    0.0    0.0   0.0      0.0   0.0    0.0     0.0     0.0     0.0  99.72
07:56:24     1    0.0    0.0   0.0      0.0   0.0    0.0     0.0     0.0     0.0  99.72
Timestamp  CPU   %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
07:56:25   all    0.5    0.0   0.0      0.0   0.0    0.0     0.0     0.0     0.0  99.44
07:56:25     0    0.0    0.0   0.0      0.0   0.0    0.0     0.0     0.0     0.0  99.44
07:56:25     1   0.99    0.0   0.0      0.0   0.0    0.0     0.0     0.0     0.0  98.44
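The per-CPU counters behind this report are ordinary PCP metrics, so the raw values can also be inspected directly with pminfo (a small sketch using the kernel.percpu.cpu.* metrics that also appear in the PMIE rules later in this post):

$ pminfo -f kernel.percpu.cpu.user kernel.percpu.cpu.sys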
pcp iostat
pcp iostat reports I/O statistics for SCSI devices (by default) or for device-mapper devices (if the -x dm option is specified). As with other PCP tools, pcp iostat reports live data for the local host by default, but it can also report from a previously captured PCP archive (-a option).
$ pcp iostat -x t
# Timestamp                Device  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  %util
Mon Sep 5 13:50:39 2022    sda       0.00    0.00  0.00   1.00   0.00   23.96    24.000     0.002   2.00     0.00     2.00   0.00
Mon Sep 5 13:50:40 2022    sda       0.00    1.99  0.00  24.93   0.00  199.47     8.000     0.024   0.96     0.00     0.96   0.20
Mon Sep 5 13:50:41 2022    sda       0.00    0.00  0.00  10.97   0.00  171.60    15.636     0.011   1.00     0.00     1.00   0.20
Mon Sep 5 13:50:42 2022    sda       0.00    0.00  0.00   0.00   0.00    0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 13:50:43 2022    sda       0.00    0.00  0.00   0.00   0.00    0.00     0.000     0.000   0.00     0.00     0.00   0.00
....

$ pcp iostat -x dm -x t
# Timestamp                Device         rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  %util
Mon Sep 5 06:52:20 2022    vg0-lvol_root    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 06:52:20 2022    vg0-lvol_swap    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 06:52:21 2022    vg0-lvol_root    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 06:52:21 2022    vg0-lvol_swap    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 06:52:22 2022    vg0-lvol_root    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 06:52:22 2022    vg0-lvol_swap    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
Mon Sep 5 06:52:23 2022    vg0-lvol_root    0.00    0.00  0.00  0.00   0.00   0.00     0.000     0.000   0.00     0.00     0.00   0.00
....
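The per-device counters are likewise available as plain PCP metrics; if the raw numbers are preferred over the iostat-style report, something like the following should work (metric names as used in the PMIE I/O examples later):

$ pminfo -f disk.dev.read disk.dev.write disk.dev.total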
pcp vmstat
pcp vmstat reports virtual memory statistics. It gives high-level information about processes, memory, paging, block I/O, and CPU activity.
$ pcp vmstat
@ Mon Aug 29 10:35:36 2022
 loadavg                       memory             swap        io      system        cpu
  1 min    swpd   free   buff  cache   pi   po   bi   bo    in   cs   us  sy   id  wa  st
   0.01  582104  5372m   1292  1557m    0    0    0    0    63   98    0   0  100   0   0
   0.01  582104  5372m   1292  1557m    0    0    0    4   324  364    4   3   93   0   0
   0.01  582104  5372m   1292  1557m    0    0    0    0    59   96    0   0  100   0   0
   0.01  582104  5372m   1292  1557m    0    0    0   20    65   98    0   0  100   0   0
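The same memory and load figures can be pulled as raw metrics as well; for example (a small sketch using standard Linux PMDA metric names):

$ pminfo -f mem.util.free mem.util.cached kernel.all.load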
pcp numastat
pcp numastat shows per-NUMA-node memory statistics for the system.
$ pcp numastat
                     node0
numa_hit        5851342506
numa_miss                0
numa_foreign             0
interleave_hit       28233
local_node      5851342506
other_node               0
pcp pidstat
pcp pidstat shows information about the current tasks being managed by the kernel.
$ pcp pidstat -rl | more
Timestamp   UID    PID  MinFlt/s  MajFlt/s   VSize    RSS  %Mem  Command
05:23:15      0      1       0.0       0.0  217564   9188  0.12  /usr/lib/systemd/systemd --switched-root --system --deserialize 21
05:23:15      0   1120       0.0       0.0   72420  34132  0.43  /usr/lib/systemd/systemd-journald
05:23:15      0   1141       0.0       0.0  119220   2964  0.04  /usr/sbin/lvmetad -f
05:23:15      0   1169       0.0       0.0   72184   3780  0.05  /usr/lib/systemd/systemd-udevd
05:23:15      0   1180       0.0       0.0  324928   7696  0.1   /sbin/multipathd
05:23:15      0   1528       0.0       0.0   55544   1848  0.02  /sbin/auditd
05:23:15      0   1532       0.0       0.0   84568   1408  0.02  /sbin/audispd
05:23:15      0   1536       0.0       0.0   55636   3172  0.04  /usr/sbin/sedispatch
05:23:15      0   1585       0.0       0.0   27212   3400  0.04  /usr/lib/systemd/systemd-logind
05:23:15      0   1587       0.0       0.0   52804   3540  0.04  /usr/sbin/smartd -n -q never
05:23:15    999   1589       0.0       0.0  725344  14264  0.18  /usr/lib/polkit-1/polkitd --no-debug
05:23:15    172   1590       0.0       0.0  198796   2760  0.03  /usr/libexec/rtkit-daemon
05:23:15      0   1592       0.0       0.0  537668   7200  0.09  /usr/libexec/udisks2/udisksd
05:23:15      0   1593       0.0       0.0  400740   8420  0.11  /usr/libexec/accounts-daemon
05:23:15     32   1601       0.0       0.0   69268   2436  0.03  /sbin/rpcbind -w
05:23:15     70   1610       0.0       0.0   62148      4  0.0   avahi-daemon: chroot helper
05:23:15      0   1611       0.0       0.0   21548   2376  0.03  /usr/sbin/irqbalance --foreground
pcp atop
pcp atop is an advanced system usage and process monitor.
ATOP - host-xxxx      2022/09/01 10:26:57      -----------------      10s elapsed
PRC | sys 0.03s | user 0.02s | #proc 165 | #trun 1 | #tslpi 83 | #tslpu 0 | #zombie 0 | clones 104 | | #exit 0 |
CPU | sys 1% | user 1% | irq 0% | idle 197% | wait 0% | guest 0% | ipc notavail | cycl unknown | curf 2.54GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | idle 99% | cpu000 w 0% | guest 0% | ipc notavail | cycl unknown | curf 2.54GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | idle 99% | cpu001 w 0% | guest 0% | ipc notavail | cycl unknown | curf 2.54GHz | curscal ?% |
CPL | avg1 0.00 | avg5 0.00 | avg15 0.00 | | | csw 2353 | intr 1325 | | | numcpu 2 |
MEM | tot 15.1G | free 8.7G | cache 5.2G | dirty 0.1M | buff 2.2M | slab 544.7M | shrss 0.0M | vmbal 0.0M | zfarc 0.0M | hptot 0.0M |
SWP | tot 4.0G | free 4.0G | swcac 0.0M | zpool 0.0M | zstor 0.0M | ksuse 0.0M | kssav 0.0M | | vmcom 1.3G | vmlim 11.5G |
LVM | ivolume-root | busy 0% | read 0 | write 8 | KiB/r 0 | KiB/w 11 | MBr/s 0.0 | MBw/s 0.0 | avq 0.00 | avio 0.0 ns |
DSK | sda | busy 0% | read 0 | write 6 | KiB/r 0 | KiB/w 15 | MBr/s 0.0 | MBw/s 0.0 | avq 1.25 | avio 0.67 ms |
NET | transport | tcpi 5 | tcpo 5 | udpi 0 | udpo 0 | tcpao 2 | tcppo 0 | tcprs 0 | tcpie 0 | udpie 0 |
NET | network | ipi 5 | ipo 5 | ipfrw 0 | deliv 5 | | | | icmpi 0 | icmpo 0 |
NET | ens3 0% | pcki 2 | pcko 2 | sp 4294967 Gbps | si 0 Kbps | so 1 Kbps | erri 0 | erro 0 | drpi 0 | drpo
NET | lo ---- | pcki 4 | pcko 4 | sp 0 Mbps | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpi 0 | drpo 0 |

    PID  SYSCPU  USRCPU  RDELAY  VGROW  RGROW  RDDSK  WRDSK  RUID      EUID      ST  EXC  THR  S  CPUNR  CPU  CMD        1/1
   1476   0.00s   0.01s   0.00s     0K     0K     0K     0K  root      root      --    -    1  S      0   0%  sssd_nss
   8520   0.01s   0.00s   0.00s     0K     0K     0K     0K  opc       opc       --    -    1  R      1   0%  pmdaproc
   3148   0.01s   0.00s   0.00s     0K     0K     0K     0K  pcp       pcp       --    -    1  S      0   0%  pmie
   1991   0.00s   0.01s   0.00s     0K     0K     0K     0K  root      root      --    -    1  S      0   0%  OSWatcher
2904276   0.01s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  I      0   0%  kworker/0:2-ev
1957798   0.00s   0.00s   0.00s     0K     0K     0K     0K  oracle-c  oracle-c  --    -   10  S      1   0%  gomon
   8524   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  S      0   0%  python3
2909287   0.00s   0.00s   0.00s     0K     0K     0K     0K  opc       opc       --    -    1  S      0   0%  pcp-atop
   8523   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  S      0   0%  pmdalinux
   8515   0.00s   0.00s   0.00s     0K     0K     0K     0K  pcp       pcp       --    -    1  S      0   0%  pmcd
   1402   0.00s   0.00s   0.00s     0K     0K     0K     0K  rpc       rpc       --    -    1  S      1   0%  rpcbind
   1561   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    6  S      1   0%  gssproxy
2910154   0.00s   0.00s   0.00s  9400K   936K     0K     0K  root      root      N-    -    1  S      0   0%  sleep
    156   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  I      0   0%  kworker/0:1H-k
    925   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  S      1   0%  xfsaild/dm-0
2908817   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  I      1   0%  kworker/1:2-mm
2910098   0.00s   0.00s   0.00s     0K     0K     0K     0K  root      root      --    -    1  I      0   0%  kworker/u4:0-e
Example: CPU utilization analysis using pcp mpstat
As an example, on a Linux VM running on Oracle Cloud Infrastructure (OCI), it was noticed one day that remote SSH logins seemed slow. As PCP was already running on this VM, metric archives were available for analysis. We can check the time period captured in an archive using the pmdumplog -L command:
$ pmdumplog -L 20220223.00.14.0.xz
Log Label (Log Format Version 2)
Performance metrics from host sushmbha-dev
    commencing Wed Feb 23 00:14:26.918665 2022
    ending     Wed Feb 23 06:22:36.945329 2022
Archive timezone: GMT
PID for pmlogger: 15939
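If only a quick overview of a metric across the whole archive is needed, pmlogsummary can report summary values for the archive period; for example (a sketch, reusing the archive name above):

$ pmlogsummary 20220223.00.14.0.xz kernel.all.cpu.sys kernel.all.cpu.user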
The pmdumplog output tells us whether the archive covers the issue timeframe; in this case, it did. Next, we want to look at the CPU utilization data to find out whether any CPU was extraordinarily busy. pcp mpstat should help us with that:
$ pcp --archive 20220223.00.14.0.xz --start '@Wed Feb 23 00:14:26' --finish '@Wed Feb 23 00:45' mpstat -P ALL > mpstat.out

Sample excerpt from mpstat.out:

Timestamp  CPU   %usr  %nice   %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
00:44:56   all   0.45    0.0  50.15      0.0   0.0    0.0     0.2     0.0     0.0  49.05
00:44:56     0    0.9    0.0    0.3      0.0   0.0    0.1     0.4     0.0     0.0   98.1
00:44:56     1    0.0    0.0  100.0      0.0   0.0    0.0     0.0     0.0     0.0    0.0
Timestamp  CPU   %usr  %nice   %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
00:44:57   all    2.9    0.0   51.8      0.0   0.0   0.45     0.2     0.0     0.0  43.85
00:44:57     0    4.9    0.0   10.4      0.0   0.0    0.0     0.4     0.0     0.0   83.9
00:44:57     1    0.2    0.0   93.4      0.0   0.0    0.9     0.0     0.0     0.0    6.5
Timestamp  CPU   %usr  %nice   %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
00:44:58   all   2.55    0.0  51.95      0.0   0.0   0.45     0.2     0.0     0.0   45.2
00:44:58     0    5.0    0.0   10.7      0.0   0.0    0.0     0.4     0.0     0.0   83.5
00:44:58     1    0.2    0.0   93.2      0.0   0.0    0.9     0.0     0.0     0.0    6.7
From the mpstat data, we can see that CPU 1 on the VM was unusually busy during this period, at up to 100% utilization, which also pushed the overall CPU utilization quite high.
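To zoom in on just that CPU, the per-CPU kernel-time counter can be replayed from the archive with pmval; something along these lines should work (a sketch; the instance name cpu1 follows the usual Linux PMDA naming):

$ pmval -a 20220223.00.14.0.xz -S '@Wed Feb 23 00:14:26' -T '@Wed Feb 23 00:45' -i cpu1 kernel.percpu.cpu.sys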
We can run the pcp atop command to see the top processes running on the system at that time:
$ pcp --archive 20220223.00.14.0.xz --start '@Wed Feb 23 00:14:26' --finish '@Wed Feb 23 00:45' atop > atop.out
Excerpt from atop.out:
ATOP - host1      2022/02/23 00:15:16      -----------------      10s elapsed
PRC | sys 10.06s | user 0.19s | #proc 223 | #zombie 0 | no procacct |
CPU | sys 102% | user 5% | irq 1% | idle 91% | wait 0% |
cpu | sys 91% | user 0% | irq 1% | idle 7% | cpu001 w 0% |
cpu | sys 10% | user 4% | irq 0% | idle 85% | cpu000 w 0% |
CPL | avg1 1.92 | avg5 1.90 | avg15 1.52 | csw 6974 | intr 18725 |
MEM | tot 7.5G | free 829.6M | cache 5.7G | buff 0.0M | slab 576.3M |
SWP | tot 8.0G | free 7.4G | swcac 33.8M | vmcom 3.8G | vmlim 11.8G |
DSK | sda | busy 0% | read 0 | write 80 | avio 0.12 ms |
NET | transport | tcpi 5 | tcpo 4 | udpi 0 | udpo 0 |
NET | network | ipi 5 | ipo 5 | ipfrw 0 | deliv 5 |
NET | ens3 ---- | pcki 8 | pcko 8 | si 0 Kbps | so 0 Kbps |

   PID  SYSCPU  USRCPU  RDELAY  VGROW  RGROW  RDDSK  WRDSK  CPUNR  CPU  CMD
   826   9.98s   0.00s   0.00s     0K     0K     0K     0K      1  99%  cat
  4444   0.00s   0.08s   0.00s     0K     0K     0K     0K      0   1%  gsd-color
  2586   0.01s   0.05s   0.00s     0K     0K     0K    37K      0   1%  OSWatcher
  3315   0.00s   0.01s   0.00s     0K     0K     0K     0K      0   0%  gnome-shell
<snipped>
This highlights that a cat command was running on CPU 1 and consuming 99% of that CPU at the time. The PID of the cat command was 826, so let's find out what the full command line for PID 826 was:
$ pminfo -f -a 20220223.00.14.0.xz proc.psinfo.psargs | grep 826
    inst [826 or "000826 cat"] value "cat /proc/highsys"
We can now see that the command cat /proc/highsys was most likely keeping one CPU fully busy. Its code should be reviewed separately to find out why it consumed so many CPU cycles.
Example: Disk utilization analysis using pcp iostat
We can generate an artificial I/O workload for a device using the dd command:
$ date
Thu Sep 1 13:08:12 GMT 2022
$ time for i in {1..3}; do sudo dd of=/tmp/outfile bs=1G count=1 if=/dev/zero 2>&1; done
Examining the PCP archives for that time period, we see an I/O (write load) spike for sda, as expected, because / is on sda:
$ pcp --start "@Thu Sep 1 13:08:03 2022" --finish "@Thu Sep 1 13:50:00 2022" iostat -a 20220901.00.10.2 -x t -z
# Timestamp               Device  rrqm/s  wrqm/s   r/s     w/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  %util
Thu Sep 1 13:08:04 2022   sda       0.00    0.00  0.00    1.00   0.00      7.00     7.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:05 2022   sda       0.00    0.00  0.00    0.00   0.00      7.00     0.000     0.000    0.00     0.00     0.00   0.10
Thu Sep 1 13:08:06 2022   sda       0.00    1.00  0.00    1.00   0.00      8.00     8.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:07 2022   sda       0.00    0.00  0.00    1.00   0.00      7.00     7.000     0.000    0.00     0.00     0.00   0.10
Thu Sep 1 13:08:08 2022   sda       0.00    0.00  0.00    0.00   0.00      7.00     0.000     0.001    0.00     0.00     0.00   0.10
Thu Sep 1 13:08:09 2022   sda       0.00    0.00  0.00    1.00   0.00      7.00     7.000     0.000    0.00     0.00     0.00   0.00
Thu Sep 1 13:08:10 2022   sda       0.00    0.00  0.00    1.00   0.00      7.00     7.000     0.001    1.00     0.00     1.00   0.10
Thu Sep 1 13:08:11 2022   sda       0.00    0.00  0.00    1.00   0.00     27.00    27.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:12 2022   sda       0.00    1.00  0.00    2.00   0.00     31.00    15.500     0.001    0.50     0.00     0.50   0.10
Thu Sep 1 13:08:13 2022   sda       0.00    1.00  0.00    1.00   0.00     32.00    32.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:14 2022   sda       0.00    0.00  0.00    2.00   0.00     31.00    15.500     0.001    0.50     0.00     0.50   0.10
Thu Sep 1 13:08:15 2022   sda       0.00    1.00  0.00    1.00   0.00     32.00    32.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:16 2022   sda       0.00    0.00  0.00    2.00   0.00     32.00    16.000     0.001    0.50     0.00     0.50   0.10
Thu Sep 1 13:08:17 2022   sda       0.00    1.00  0.00    1.00   0.00     31.00    31.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:18 2022   sda       0.00    1.00  0.00    2.00   0.00     32.00    16.000     0.002    1.00     0.00     1.00   0.10
Thu Sep 1 13:08:19 2022   sda       0.00    0.00  0.00    1.00   0.00     31.00    31.000     0.001    1.00     0.00     1.00   0.00
Thu Sep 1 13:08:20 2022   sda       0.00    1.00  0.00    2.00   0.00     32.00    16.000     0.001    0.50     0.00     0.50   0.10
Thu Sep 1 13:08:21 2022   sda       0.00    0.00  0.00   28.00   0.00   8884.00   317.286     2.556   91.29     0.00    91.29   7.10
Thu Sep 1 13:08:22 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.262   90.61     0.00    90.61   9.00
Thu Sep 1 13:08:23 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.262   90.61     0.00    90.61   9.10
Thu Sep 1 13:08:24 2022   sda       0.00    0.00  0.00   36.00   0.00  11390.00   316.389     3.261   90.58     0.00    90.58   9.00
Thu Sep 1 13:08:25 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.262   90.61     0.00    90.61   9.10
Thu Sep 1 13:08:26 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.262   90.61     0.00    90.61   9.10
# Timestamp               Device  rrqm/s  wrqm/s   r/s     w/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  %util
Thu Sep 1 13:08:27 2022   sda       0.00    0.00  0.00   36.00   0.00  11390.00   316.389     3.261   90.58     0.00    90.58   9.00
Thu Sep 1 13:08:28 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.262   90.61     0.00    90.61   9.10
Thu Sep 1 13:08:29 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.262   90.61     0.00    90.61   9.00
Thu Sep 1 13:08:30 2022   sda       0.00    0.00  0.00   36.00   0.00  11391.00   316.417     3.261   90.58     0.00    90.58   9.10
Thu Sep 1 13:08:31 2022   sda       0.00    0.00  0.00   93.00   0.00  43211.00   464.634    32.867  353.41     0.00   353.41  79.70
Thu Sep 1 13:08:32 2022   sda       0.00    0.00  0.00  109.00   0.00  52245.00   479.312    41.067  376.76     0.00   376.76  99.70
Thu Sep 1 13:08:33 2022   sda       0.00    0.00  0.00  109.00   0.00  52245.00   479.312    41.067  376.76     0.00   376.76  99.70
Thu Sep 1 13:08:34 2022   sda       0.00    0.00  0.00  109.00   0.00  52244.00   479.303    41.068  376.77     0.00   376.77  99.80
Thu Sep 1 13:08:35 2022   sda       0.00    0.00  0.00  109.00   0.00  52245.00   479.312    41.067  376.76     0.00   376.76  99.70
Thu Sep 1 13:08:36 2022   sda       0.00    1.00  0.00  110.00   0.00  52245.00   474.955    41.067  373.34     0.00   373.34  99.70
Thu Sep 1 13:08:37 2022   sda       0.00    0.00  0.00  109.00   0.00  52244.00   479.303    41.067  376.76     0.00   376.76  99.70
Thu Sep 1 13:08:38 2022   sda       0.00    0.00  0.00  109.00   0.00  52245.00   479.312    41.067  376.76     0.00   376.76  99.80
Thu Sep 1 13:08:39 2022   sda       0.00    0.00  0.00  109.00   0.00  52244.00   479.303    41.068  376.77     0.00   376.77  99.70
Thu Sep 1 13:08:40 2022   sda       0.00    0.00  0.00  109.00   0.00  52245.00   479.312    41.067  376.76     0.00   376.76  99.70
<snipped>
A note on the -z flag
When PCP is used to analyze archived data, it generally makes sense to add the -z flag on the command line. The -z flag ensures that the timestamps reported by the monitoring tool are in the timezone of the host where the archive was created.
Automated Performance Analysis using PMIE
We can also use the Performance Metric Inference Engine (PMIE) provided by the PCP framework to proactively detect certain situations in real time and take appropriate action as and when they occur. For example, in the case above pcp mpstat showed that CPU 1 was busy executing at the system level (in the kernel), so it would have been helpful to have collected perf data for the period when the problem happened. Using PCP, we can trigger perf (or any other script we wish) when certain predefined conditions are met. We can also log a message to alert the administrator.
pmie.service
PCP comes with a bundled pmie service which, if enabled, can monitor a system for abnormal situations such as very high CPU utilization or a high volume of disk activity. A few common conditions are already built into the default configuration file for the pmie service (/etc/pcp/pmie/config.default). Using those as examples, we can build our own rules as well.
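If the service is not already running, it can be enabled alongside pmcd and pmlogger; a minimal sketch, assuming the standard systemd unit name:

$ sudo systemctl enable --now pmie
$ systemctl status pmie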
One of the built-in rules in pmie.service is:
// 1 per_cpu.system
delta = 2 min;
per_cpu.system =
       some_host (
           some_inst (
               // first term is always true, but provides %v for actions ...
               ( 100 * kernel.percpu.cpu.sys ) > 0
               && 100 * ( kernel.percpu.cpu.user + kernel.percpu.cpu.sys ) > 75
               && 100 * kernel.percpu.cpu.sys / ( kernel.percpu.cpu.user + kernel.percpu.cpu.sys ) > 80
           )
           && hinv.ncpu > 1
       )
   -> syslog 2 min "Some CPU busy executing in system mode" " %v%sys[%i]@%h";
This rule checks two conditions involving the time each CPU spent in user mode and system mode over the last measurement interval: it matches when a CPU spent more than 75% of the interval executing in user plus kernel mode combined, and more than 80% of that busy time was spent in kernel mode.
If both of the above conditions are true for at least one CPU on at least one monitored host (the some_inst and some_host qualifiers), the whole expression evaluates to true and the action specified after the arrow (->) is executed. In this case, that means the message "Some CPU busy executing in system mode" is logged by the system logging daemon. On a system running systemd, journalctl should show these entries in the system journal; alternatively, journalctl -u pmie.service shows the logs from PMIE, which will include these messages. Note the variables in the log message: %v is the value calculated for the left-most, top-most expression for the metric instance that satisfies the condition, %i is the instance for which the condition holds, and %h is the host on which the condition matches. So an actual log message would look like this:
Some CPU busy executing in system mode 96%sys[cpu1]@host1
This message shows that 96% of the time on CPU 1 was spent executing in kernel mode on host 'host1'.
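To find such events after the fact, the journal can be filtered for the PMIE unit; for example (a sketch using standard journalctl options):

$ journalctl -u pmie.service --since "24 hours ago" | grep "busy executing in system mode"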
Using PMIE on archives
PMIE can also be used on archive logs. It can examine large amounts of data and alert on potentially exceptional conditions, based on whatever checks we define, which makes it very useful for retrospective analysis.
For example, we can perform an analysis similar to the one described above on any PCP archive instead of on live data. For this purpose, we may want to modify the rule to print a statement to the terminal rather than log it to syslog.
$ cat pmie.txt
// 1 cpu.system
delta = 2 min;
cpu.system =
       some_host (
           // first term is always true, but provides %v for actions ...
           ( 100 * kernel.all.cpu.sys / hinv.ncpu ) > 0
           && 100 * ( kernel.all.cpu.user + kernel.all.cpu.sys ) > 70 * hinv.ncpu
           && 100 * kernel.all.cpu.sys / ( kernel.all.cpu.user + kernel.all.cpu.sys ) > 75
       )
   -> print 2 min "Busy executing in system mode" " %v%sys@%h";

// 1 cpu.util
delta = 2 min;
cpu.util =
       some_host (
           100 * ( kernel.all.cpu.user + kernel.all.cpu.sys + kernel.all.cpu.intr ) / hinv.ncpu > 90
       )
   -> print "High average processor utilization" " %v%util@%h";
Using the above pmie.txt file, we can analyze an archive and see whether the above conditions occurred at any time during the period it covers. This helps us answer questions such as "was any CPU executing instructions in kernel mode more than 90% of the time?", "was any disk filling up very fast?", or "was the load average very high?"
Example command for using PMIE on PCP archives:
$ pmie -a /var/log/pcp/pmlogger/<hostname>/<archivename> -c pmie.txt
The evaluation frequency for the PMIE rules can be provided via the -t option on the command line. The default evaluation frequency is 10 seconds; normally, though, it is specified along with every rule in the PMIE config file via the delta parameter.
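For instance, to force a 30-second evaluation interval while replaying an archive with the rules file above, something like this should work:

$ pmie -t 30sec -a /var/log/pcp/pmlogger/<hostname>/<archivename> -c pmie.txt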
Sample PMIE rule to flag high memory usage while examining archive(s)
We can also keep a watch on the memory situation on the system, either monitoring it in real time or analyzing the PCP archives to check for suspicious processes. For example, let's say we want to check whether any process was occupying more than 3 GB of physical memory. We can use a rule like this:
$ cat pmie_rules.txt
// proc.memory
delta = 10 sec;
proc.memory =
       some_host (
           some_inst (
               ( proc.psinfo.rss ) > 3 Gbyte
           )
       )
   -> print "rss more than 3 GB" " %v [%i]@%h";
We can run PMIE with the above rules config file to examine our archive for this condition:
$ pmie -a <archive> -c pmie_rules.txt
Tue Apr 26 11:57:55 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Tue Apr 26 11:58:05 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Tue Apr 26 11:58:15 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Tue Apr 26 11:58:25 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Sample PMIE rule to flag high I/O load
Some example rules to detect I/O load on the system would be:
$ cat pmie_io_examples.txt
// show any disk doing more than 50 I/Os per second
some_inst ( disk.dev.total > 50 )
   -> print "high IOPs:" " %i: %v";

// some disk is doing more than 30 reads per second
some_inst ( disk.dev.read > 30 )
   -> print "busy reads:" " %i: %v";

// some disk is doing more than 30 writes per second
some_inst ( disk.dev.write > 30 )
   -> print "busy writes:" " %i: %v";

// some disk has a high I/O rate and more than 95% reads
some_inst ( disk.dev.total > 40 && disk.dev.read / disk.dev.total > 0.95 )
   -> print "busy disk and > 95% reads";

// some disk has a high I/O rate and 1 minute load average is
// greater than 5
kernel.all.load #'1 minute' > 5 &&
some_inst ( disk.dev.total > 40 )
   -> print "busy disk and high load avg";
The PCP installation contains some sample archives bundled with the tutorials, which we can use to experiment with PMIE. To access the example scripts, do the following:
$ source /etc/pcp.env
$ tar xzf $PCP_DEMOS_DIR/tutorials/pmie.tar.gz
PMIE rule to detect swapping
The default pmie config file also provides a rule to detect scenarios where the system is facing a memory crunch that leads to swapping:
// 1 memory.exhausted
delta = 2 min;
memory.exhausted =
       some_host (
           ( avg_sample ( swap.pagesout @0..9 ) ) > 0
           && 30 %_sample swap.pagesout @0..9 >= 5
       )
   -> syslog 2 min "Severe demand for real memory" " %vpgsout/s@%h";
This rule can alert the administrator about a memory pressure situation on the system, enabling them to take remedial action. Memory pressure is detected from the number of pages swapped out over the 10 most recent samples.
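The metric behind this rule can also be sampled directly; for a quick live view of page-out activity, pmval can be used (a small sketch; swap.pagesout is the metric referenced in the rule above):

$ pmval -t 10sec -s 6 swap.pagesout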
As we can see, PCP provides many interesting capabilities that help a system administrator monitor system performance issues in real time more effectively. It also helps Oracle Support analyze system performance issues retrospectively, when a customer reports an issue on their Linux system and provides the PCP archives. We can use the rules already available in /etc/pcp/pmie/config.default as templates, and then modify them or add our own rules, for easy analysis of archived data.
Proactive usage of PMIE - using PMIE to collect perf data
One envisaged use is to collect perf data while the system is under high load, so that performance bottlenecks can be analyzed after the event. For that purpose, we can modify the rule that checks for a CPU being busy executing instructions in kernel mode:
// 1 per_cpu.system
delta = 10 min;
per_cpu.system =
       some_host (
           some_inst (
               // first term is always true, but provides %v for actions ...
               ( 100 * kernel.percpu.cpu.sys ) > 0
               && 100 * ( kernel.percpu.cpu.user + kernel.percpu.cpu.sys ) > 75
               && 100 * kernel.percpu.cpu.sys / ( kernel.percpu.cpu.user + kernel.percpu.cpu.sys ) > 80
           )
           && hinv.ncpu > 1
       )
   -> shell 1 hour "timestamp=$(date +"%m-%d-%Y-%T"); perf record -a -g -F99 -o /tmp/perf_record_$timestamp.data -- sleep 10";
Instead of syslog, we have used the shell directive as the action here, which allows us to launch an arbitrary program when the configured condition is satisfied. The "1 hour" after the shell directive is the holdoff time: even though the condition is evaluated every 10 minutes (based on the delta parameter), the perf command will be invoked at most once per hour. So if a performance episode is reported on such a system, check whether perf_record_<timestamp>.data files were created; if so, they can be examined to understand what operations the kernel was busy with. The action above uses a different file name each time so that earlier perf data is not overwritten.
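Once such a rule has fired, the captured profiles can be inspected with perf itself; for example (a sketch, where the file name follows the pattern used in the rule above):

$ ls /tmp/perf_record_*.data
$ perf report -i /tmp/perf_record_<timestamp>.data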
Conclusion
There are many more cases where PCP tools offer great insight into system performance issues. PCP also offers a host of utilities such as pmrep, pmdumplog, pmdumptext, pminfo and others for examining archive metadata (e.g., the time period covered, the time zone, and metric semantics and units) as well as metric values and trends. Installing pcp-oracle-conf on Oracle Linux 7, 8 or 9 enables and starts the pmcd and pmlogger services, which should get you started with PCP on Oracle Linux and give you a rich set of metrics and capabilities to explore. Refer to the blog Better diagnostics with Performance Co-Pilot for more details and background on PCP.
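For reference, a minimal set of commands to get started on a fresh Oracle Linux 8 or 9 system might look like this (a sketch; use yum instead of dnf on Oracle Linux 7):

$ sudo dnf install -y pcp-oracle-conf
$ systemctl status pmcd pmlogger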
References