Performance Analysis Using PCP

November 10, 2022 | 47 minute read

Performance Co-Pilot (PCP) is a framework for system performance analysis. It enables users to collect system performance metrics on an ongoing basis and use the logging infrastructure (pmlogger) to archive the collected data. By default, PCP on Oracle Linux logs and retains a week's worth of data, allowing users to review changes in system behavior. PCP collects metrics in raw format and provides tools to review and analyze them in human-readable form. This blog introduces some PCP commands that extract information similar to other standard Linux tools. It also introduces PMIE (Performance Metric Inference Engine) and provides some examples of how to do performance analysis.

Performance Reporting Utilities in PCP

PCP provides various tools and commands similar to standard Linux tools such as mpstat, vmstat, and iostat, which are used to monitor and review system performance. These PCP commands operate on live systems as well as on data archived by pmlogger, which allows users to review historical system behavior. Some of the important PCP tools for analyzing system performance are described below.

pcp mpstat

pcp mpstat reports CPU and interrupt-related statistics for each available processor, as well as the average across all processors.

$ pcp mpstat -P ALL

Timestamp       CPU     %usr    %nice   %sys    %iowait         %irq    %soft   %steal  %guest  %gnice  %idle
07:56:24        all     0.0     0.0     0.0     0.0             0.0     0.0     0.0     0.0     0.0     99.72
07:56:24        0       0.0     0.0     0.0     0.0             0.0     0.0     0.0     0.0     0.0     99.72
07:56:24        1       0.0     0.0     0.0     0.0             0.0     0.0     0.0     0.0     0.0     99.72

Timestamp       CPU     %usr    %nice   %sys    %iowait         %irq    %soft   %steal  %guest  %gnice  %idle
07:56:25        all     0.5     0.0     0.0     0.0             0.0     0.0     0.0     0.0     0.0     99.44
07:56:25        0       0.0     0.0     0.0     0.0             0.0     0.0     0.0     0.0     0.0     99.44
07:56:25        1       0.99    0.0     0.0     0.0             0.0     0.0     0.0     0.0     0.0     98.44
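Because pcp mpstat writes plain columnar text, its output can be post-processed with standard tools. A minimal sketch, using awk on illustrative sample rows (the file name mpstat.out and the 90% threshold are our own choices):

```shell
# Flag CPUs whose %idle fell below a threshold in mpstat-style output.
# The sample rows below are illustrative, written to a file the way
# "pcp mpstat -P ALL > mpstat.out" would produce.
cat > mpstat.out <<'EOF'
07:56:24  all  0.0   0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  99.72
07:56:25  0    0.0   0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  99.44
07:56:25  1    0.99  0.0  50.0  0.0  0.0  0.0  0.0  0.0  0.0  48.44
EOF
# Field 2 is the CPU id and the last field is %idle; skip the "all" summary.
awk '$2 != "all" && $NF < 90 { print "busy CPU " $2 " (idle " $NF "%) at " $1 }' mpstat.out
```

The same one-liner works on archive-derived output, which is often more convenient than scanning thousands of samples by eye.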

pcp iostat

pcp iostat reports I/O statistics for SCSI devices (by default) or device-mapper devices (if the -x dm option is specified). As with other PCP tools, pcp iostat reports live data for the local host by default, but it can also report from a previously captured PCP archive (-a option).

$ pcp iostat -x t
# Timestamp              Device        rrqm/s  wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await   %util
Mon Sep  5 13:50:39 2022 sda             0.00    0.00    0.00   1.00     0.00    23.96   24.000    0.002    2.00    0.00    2.00    0.00
Mon Sep  5 13:50:40 2022 sda             0.00    1.99    0.00  24.93     0.00   199.47    8.000    0.024    0.96    0.00    0.96    0.20
Mon Sep  5 13:50:41 2022 sda             0.00    0.00    0.00  10.97     0.00   171.60   15.636    0.011    1.00    0.00    1.00    0.20
Mon Sep  5 13:50:42 2022 sda             0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 13:50:43 2022 sda             0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
....

$ pcp iostat -x dm -x t
# Timestamp              Device        rrqm/s  wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await   %util
Mon Sep  5 06:52:20 2022 vg0-lvol_root    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 06:52:20 2022 vg0-lvol_swap    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 06:52:21 2022 vg0-lvol_root    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 06:52:21 2022 vg0-lvol_swap    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 06:52:22 2022 vg0-lvol_root    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 06:52:22 2022 vg0-lvol_swap    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
Mon Sep  5 06:52:23 2022 vg0-lvol_root    0.00    0.00    0.00   0.00     0.00     0.00    0.000    0.000    0.00    0.00    0.00    0.00
....
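The timestamped output lends itself to quick filtering. A sketch that surfaces only the intervals where a device crossed a write-throughput threshold, using illustrative sample rows (the file name and threshold are our own choices; field positions assume the "-x t" layout shown above, where the five-field date prefix makes wkB/s field 12):

```shell
# Filter iostat-style output down to intervals with heavy writes.
cat > iostat.out <<'EOF'
Thu Sep  1 13:08:20 2022 sda 0.00 1.00 0.00  2.00 0.00    32.00  16.000 0.001  0.50 0.00  0.50 0.10
Thu Sep  1 13:08:21 2022 sda 0.00 0.00 0.00 28.00 0.00  8884.00 317.286 2.556 91.29 0.00 91.29 7.10
EOF
# $6 is the device, $12 is wkB/s, $4 is the time of day.
awk '$12 > 1000 { print $6, "wrote", $12, "kB/s at", $4 }' iostat.out
```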

pcp vmstat

pcp vmstat reports virtual memory statistics. It gives high-level information about processes, memory, paging, block I/O, and CPU activity.

$ pcp vmstat
@ Mon Aug 29 10:35:36 2022
 loadavg                      memory      swap        io    system                 cpu
   1 min   swpd   free   buff  cache   pi   po   bi   bo   in   cs  us  sy  id  wa  st
    0.01 582104  5372m   1292  1557m    0    0    0    0   63   98   0   0 100   0   0
    0.01 582104  5372m   1292  1557m    0    0    0    4  324  364   4   3  93   0   0
    0.01 582104  5372m   1292  1557m    0    0    0    0   59   96   0   0 100   0   0
    0.01 582104  5372m   1292  1557m    0    0    0   20   65   98   0   0 100   0   0

pcp numastat

pcp numastat shows per-NUMA-node memory statistics for the system.

$ pcp numastat
                           node0
numa_hit              5851342506
numa_miss                      0
numa_foreign                   0
interleave_hit             28233
local_node            5851342506
other_node                     0
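The counters are cumulative hit/miss totals, so a miss ratio has to be derived. A small sketch using the sample values above (the variable names are our own):

```shell
# Derive the percentage of allocations that missed the preferred NUMA node.
numa_hit=5851342506
numa_miss=0
miss_pct=$(awk -v h="$numa_hit" -v m="$numa_miss" \
    'BEGIN { printf "%.4f", 100 * m / (h + m) }')
echo "numa_miss ratio: ${miss_pct}%"
```

A persistently non-zero miss ratio would suggest processes allocating memory off their local node.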

pcp pidstat

pcp pidstat shows information about current tasks being managed by the kernel.

$ pcp pidstat -rl | more

Timestamp        UID    PID     MinFlt/s        MajFlt/s        VSize   RSS     %Mem    Command
05:23:15         0      1       0.0             0.0             217564  9188    0.12    /usr/lib/systemd/systemd --switched-root --system --deserialize 21
05:23:15         0      1120    0.0             0.0             72420   34132   0.43    /usr/lib/systemd/systemd-journald
05:23:15         0      1141    0.0             0.0             119220  2964    0.04    /usr/sbin/lvmetad -f
05:23:15         0      1169    0.0             0.0             72184   3780    0.05    /usr/lib/systemd/systemd-udevd
05:23:15         0      1180    0.0             0.0             324928  7696    0.1     /sbin/multipathd
05:23:15         0      1528    0.0             0.0             55544   1848    0.02    /sbin/auditd
05:23:15         0      1532    0.0             0.0             84568   1408    0.02    /sbin/audispd
05:23:15         0      1536    0.0             0.0             55636   3172    0.04    /usr/sbin/sedispatch
05:23:15         0      1585    0.0             0.0             27212   3400    0.04    /usr/lib/systemd/systemd-logind
05:23:15         0      1587    0.0             0.0             52804   3540    0.04    /usr/sbin/smartd -n -q never
05:23:15         999    1589    0.0             0.0             725344  14264   0.18    /usr/lib/polkit-1/polkitd --no-debug
05:23:15         172    1590    0.0             0.0             198796  2760    0.03    /usr/libexec/rtkit-daemon
05:23:15         0      1592    0.0             0.0             537668  7200    0.09    /usr/libexec/udisks2/udisksd
05:23:15         0      1593    0.0             0.0             400740  8420    0.11    /usr/libexec/accounts-daemon
05:23:15         32     1601    0.0             0.0             69268   2436    0.03    /sbin/rpcbind -w
05:23:15         70     1610    0.0             0.0             62148   4       0.0     avahi-daemon: chroot helper
05:23:15         0      1611    0.0             0.0             21548   2376    0.03    /usr/sbin/irqbalance --foreground
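Since the output is columnar, it is easy to rank processes by a field of interest, for example RSS. A sketch over illustrative rows (field positions follow the header shown above; command lines are trimmed to one token):

```shell
# Rank processes by resident set size (RSS, field 7) from pidstat-style output.
cat > pidstat.out <<'EOF'
05:23:15 0   1    0.0 0.0 217564  9188 0.12 systemd
05:23:15 0   1120 0.0 0.0  72420 34132 0.43 systemd-journald
05:23:15 999 1589 0.0 0.0 725344 14264 0.18 polkitd
EOF
sort -k7,7nr pidstat.out | awk '{ print $9, "rss=" $7 "K" }'
```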

pcp atop

pcp atop is an advanced system usage and process monitor.

ATOP - host-xxxx                                2022/09/01  10:26:57                                -----------------                                10s elapsed
PRC | sys    0.03s  | user   0.02s  | #proc    165 |  #trun      1 |  #tslpi    83 | #tslpu     0  | #zombie    0  | clones   104 |               |  #exit      0 |
CPU | sys       1%  | user      1%  | irq       0% |  idle    197% |  wait      0% | guest     0%  | ipc notavail  | cycl unknown |  curf 2.54GHz |  curscal   ?% |
cpu | sys       1%  | user      1%  | irq       0% |  idle     99% |  cpu000 w  0% | guest     0%  | ipc notavail  | cycl unknown |  curf 2.54GHz |  curscal   ?% |
cpu | sys       1%  | user      1%  | irq       0% |  idle     99% |  cpu001 w  0% | guest     0%  | ipc notavail  | cycl unknown |  curf 2.54GHz |  curscal   ?% |
CPL | avg1    0.00  | avg5    0.00  | avg15   0.00 |               |               | csw     2353  | intr    1325  |              |               |  numcpu     2 |
MEM | tot    15.1G  | free    8.7G  | cache   5.2G |  dirty   0.1M |  buff    2.2M | slab  544.7M  | shrss   0.0M  | vmbal   0.0M |  zfarc   0.0M |  hptot   0.0M |
SWP | tot     4.0G  | free    4.0G  | swcac   0.0M |  zpool   0.0M |  zstor   0.0M | ksuse   0.0M  | kssav   0.0M  |              |  vmcom   1.3G |  vmlim  11.5G |
LVM | ivolume-root  | busy      0%  | read       0 |  write      8 |  KiB/r      0 | KiB/w     11  | MBr/s    0.0  | MBw/s    0.0 |  avq     0.00 |  avio  0.0 ns |
DSK |          sda  | busy      0%  | read       0 |  write      6 |  KiB/r      0 | KiB/w     15  | MBr/s    0.0  | MBw/s    0.0 |  avq     1.25 |  avio 0.67 ms |
NET | transport     | tcpi       5  | tcpo       5 |  udpi       0 |  udpo       0 | tcpao      2  | tcppo      0  | tcprs      0 |  tcpie      0 |  udpie      0 |
NET | network       | ipi        5  | ipo        5 |  ipfrw      0 |  deliv      5 |               |               |              |  icmpi      0 |  icmpo      0 |
NET | ens3      0%  | pcki       2  | pcko       2 |  sp 4294967 Gbps |  si    0 Kbps | so    1 Kbps  | erri       0  | erro       0 |  drpi       0 |  drpo
NET | lo      ----  | pcki       4  | pcko       4 |  sp    0 Mbps |  si    0 Kbps | so    0 Kbps  | erri       0  | erro       0 |  drpi       0 |  drpo       0 |

    PID    SYSCPU    USRCPU    RDELAY     VGROW     RGROW     RDDSK     WRDSK    RUID        EUID        ST    EXC     THR    S    CPUNR     CPU     CMD        1/1
   1476     0.00s     0.01s     0.00s        0K        0K        0K        0K    root        root        --      -       1    S        0      0%     sssd_nss
   8520     0.01s     0.00s     0.00s        0K        0K        0K        0K    opc         opc         --      -       1    R        1      0%     pmdaproc
   3148     0.01s     0.00s     0.00s        0K        0K        0K        0K    pcp         pcp         --      -       1    S        0      0%     pmie
   1991     0.00s     0.01s     0.00s        0K        0K        0K        0K    root        root        --      -       1    S        0      0%     OSWatcher
2904276     0.01s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    I        0      0%     kworker/0:2-ev
1957798     0.00s     0.00s     0.00s        0K        0K        0K        0K    oracle-c    oracle-c    --      -      10    S        1      0%     gomon
   8524     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    S        0      0%     python3
2909287     0.00s     0.00s     0.00s        0K        0K        0K        0K    opc         opc         --      -       1    S        0      0%     pcp-atop
   8523     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    S        0      0%     pmdalinux
   8515     0.00s     0.00s     0.00s        0K        0K        0K        0K    pcp         pcp         --      -       1    S        0      0%     pmcd
   1402     0.00s     0.00s     0.00s        0K        0K        0K        0K    rpc         rpc         --      -       1    S        1      0%     rpcbind
   1561     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       6    S        1      0%     gssproxy
2910154     0.00s     0.00s     0.00s     9400K      936K        0K        0K    root        root        N-      -       1    S        0      0%     sleep
    156     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    I        0      0%     kworker/0:1H-k
    925     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    S        1      0%     xfsaild/dm-0
2908817     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    I        1      0%     kworker/1:2-mm
2910098     0.00s     0.00s     0.00s        0K        0K        0K        0K    root        root        --      -       1    I        0      0%     kworker/u4:0-e

Example: CPU utilization analysis using pcp mpstat

As an example, on a Linux VM running on Oracle Cloud Infrastructure (OCI), it was noticed that remote SSH logins seemed to be slow one day. As PCP was already running on this VM, metric archives were available for analysis. We can check the time period captured in an archive using the pmdumplog -L command:

$ pmdumplog -L 20220223.00.14.0.xz
Log Label (Log Format Version 2)
Performance metrics from host sushmbha-dev
    commencing Wed Feb 23 00:14:26.918665 2022
    ending     Wed Feb 23 06:22:36.945329 2022
Archive timezone: GMT
PID for pmlogger: 15939

This helps determine whether the archive covers the issue timeframe. In this case, it did. Next we want to look at the CPU utilization data, to find out if any CPU was extraordinarily busy. pcp mpstat should help us with that:

$ pcp --archive 20220223.00.14.0.xz --start '@Wed Feb 23 00:14:26' --finish '@Wed Feb 23 00:45' mpstat -P ALL > mpstat.out

Sample excerpt from mpstat.out:

Timestamp       CPU     %usr    %nice   %sys    %iowait         %irq    %soft   %steal  %guest  %gnice  %idle
00:44:56        all     0.45    0.0     50.15   0.0             0.0     0.0     0.2     0.0     0.0     49.05
00:44:56        0       0.9     0.0     0.3     0.0             0.0     0.1     0.4     0.0     0.0     98.1
00:44:56        1       0.0     0.0     100.0   0.0             0.0     0.0     0.0     0.0     0.0     0.0

Timestamp       CPU     %usr    %nice   %sys    %iowait         %irq    %soft   %steal  %guest  %gnice  %idle
00:44:57        all     2.9     0.0     51.8    0.0             0.0     0.45    0.2     0.0     0.0     43.85
00:44:57        0       4.9     0.0     10.4    0.0             0.0     0.0     0.4     0.0     0.0     83.9
00:44:57        1       0.2     0.0     93.4    0.0             0.0     0.9     0.0     0.0     0.0     6.5

Timestamp       CPU     %usr    %nice   %sys    %iowait         %irq    %soft   %steal  %guest  %gnice  %idle
00:44:58        all     2.55    0.0     51.95   0.0             0.0     0.45    0.2     0.0     0.0     45.2
00:44:58        0       5.0     0.0     10.7    0.0             0.0     0.0     0.4     0.0     0.0     83.5
00:44:58        1       0.2     0.0     93.2    0.0             0.0     0.9     0.0     0.0     0.0     6.7

From the mpstat data, we can see that CPU 1 on the VM had been unusually busy in this time period, at up to 100% utilization, which drove the overall CPU utilization quite high as well.

We can run the pcp atop command to see the top processes running on the system at that time:

$ pcp --archive 20220223.00.14.0.xz --start '@Wed Feb 23 00:14:26' --finish '@Wed Feb 23 00:45' atop > atop.out

Excerpt from atop.out:

ATOP - host1    2022/02/23  00:15:16    -----------------     10s elapsed
PRC | sys   10.06s | user   0.19s | #proc    223 | #zombie    0 | no  procacct |
CPU | sys     102% | user      5% | irq       1% | idle     91% | wait      0% |
cpu | sys      91% | user      0% | irq       1% | idle      7% | cpu001 w  0% |
cpu | sys      10% | user      4% | irq       0% | idle     85% | cpu000 w  0% |
CPL | avg1    1.92 | avg5    1.90 | avg15   1.52 | csw     6974 | intr   18725 |
MEM | tot     7.5G | free  829.6M | cache   5.7G | buff    0.0M | slab  576.3M |
SWP | tot     8.0G | free    7.4G | swcac  33.8M | vmcom   3.8G | vmlim  11.8G |
DSK |          sda | busy      0% | read       0 | write     80 | avio 0.12 ms |
NET | transport    | tcpi       5 | tcpo       4 | udpi       0 | udpo       0 |
NET | network      | ipi        5 | ipo        5 | ipfrw      0 | deliv      5 |
NET | ens3    ---- | pcki       8 | pcko       8 | si    0 Kbps | so    0 Kbps |

  PID SYSCPU USRCPU RDELAY  VGROW  RGROW  RDDSK  WRDSK CPUNR  CPU CMD
  826  9.98s  0.00s  0.00s     0K     0K     0K     0K     1  99% cat
 4444  0.00s  0.08s  0.00s     0K     0K     0K     0K     0   1% gsd-color
 2586  0.01s  0.05s  0.00s     0K     0K     0K    37K     0   1% OSWatcher
 3315  0.00s  0.01s  0.00s     0K     0K     0K     0K     0   0% gnome-shell
<snipped>

This highlights that the cat command was running on CPU 1 and consuming 99% of that CPU at the time. The PID of the cat command was 826, so let’s find out what the full command line for PID 826 was:

$ pminfo -f -a 20220223.00.14.0.xz proc.psinfo.psargs  | grep 826
    inst [826 or "000826 cat"] value "cat /proc/highsys"

We can now see that the command cat /proc/highsys was most likely keeping one CPU fully busy. Its code should be reviewed separately to find out why it was consuming so many CPU cycles.

Example: Disk utilization analysis using pcp iostat

We can generate an artificial I/O workload for a device using the dd command:

$ date
Thu Sep  1 13:08:12 GMT 2022

$ time for i in {1..3}; do sudo dd of=/tmp/outfile bs=1G count=1 if=/dev/zero 2>&1; done

Examining the PCP archives for that time period, we see an I/O (write) spike for sda, as expected, because / is on sda:

$ pcp --start "@Thu Sep  1 13:08:03 2022" --finish "@Thu Sep  1 13:50:00 2022" iostat -a 20220901.00.10.2 -x t -z
# Timestamp              Device        rrqm/s  wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await   %util
Thu Sep  1 13:08:04 2022 sda             0.00    0.00    0.00   1.00     0.00     7.00    7.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:05 2022 sda             0.00    0.00    0.00   0.00     0.00     7.00    0.000    0.000    0.00    0.00    0.00    0.10
Thu Sep  1 13:08:06 2022 sda             0.00    1.00    0.00   1.00     0.00     8.00    8.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:07 2022 sda             0.00    0.00    0.00   1.00     0.00     7.00    7.000    0.000    0.00    0.00    0.00    0.10
Thu Sep  1 13:08:08 2022 sda             0.00    0.00    0.00   0.00     0.00     7.00    0.000    0.001    0.00    0.00    0.00    0.10
Thu Sep  1 13:08:09 2022 sda             0.00    0.00    0.00   1.00     0.00     7.00    7.000    0.000    0.00    0.00    0.00    0.00
Thu Sep  1 13:08:10 2022 sda             0.00    0.00    0.00   1.00     0.00     7.00    7.000    0.001    1.00    0.00    1.00    0.10
Thu Sep  1 13:08:11 2022 sda             0.00    0.00    0.00   1.00     0.00    27.00   27.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:12 2022 sda             0.00    1.00    0.00   2.00     0.00    31.00   15.500    0.001    0.50    0.00    0.50    0.10
Thu Sep  1 13:08:13 2022 sda             0.00    1.00    0.00   1.00     0.00    32.00   32.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:14 2022 sda             0.00    0.00    0.00   2.00     0.00    31.00   15.500    0.001    0.50    0.00    0.50    0.10
Thu Sep  1 13:08:15 2022 sda             0.00    1.00    0.00   1.00     0.00    32.00   32.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:16 2022 sda             0.00    0.00    0.00   2.00     0.00    32.00   16.000    0.001    0.50    0.00    0.50    0.10
Thu Sep  1 13:08:17 2022 sda             0.00    1.00    0.00   1.00     0.00    31.00   31.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:18 2022 sda             0.00    1.00    0.00   2.00     0.00    32.00   16.000    0.002    1.00    0.00    1.00    0.10
Thu Sep  1 13:08:19 2022 sda             0.00    0.00    0.00   1.00     0.00    31.00   31.000    0.001    1.00    0.00    1.00    0.00
Thu Sep  1 13:08:20 2022 sda             0.00    1.00    0.00   2.00     0.00    32.00   16.000    0.001    0.50    0.00    0.50    0.10
Thu Sep  1 13:08:21 2022 sda             0.00    0.00    0.00  28.00     0.00  8884.00  317.286    2.556   91.29    0.00   91.29    7.10
Thu Sep  1 13:08:22 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.262   90.61    0.00   90.61    9.00
Thu Sep  1 13:08:23 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.262   90.61    0.00   90.61    9.10
Thu Sep  1 13:08:24 2022 sda             0.00    0.00    0.00  36.00     0.00 11390.00  316.389    3.261   90.58    0.00   90.58    9.00
Thu Sep  1 13:08:25 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.262   90.61    0.00   90.61    9.10
Thu Sep  1 13:08:26 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.262   90.61    0.00   90.61    9.10
# Timestamp              Device        rrqm/s  wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await   %util
Thu Sep  1 13:08:27 2022 sda             0.00    0.00    0.00  36.00     0.00 11390.00  316.389    3.261   90.58    0.00   90.58    9.00
Thu Sep  1 13:08:28 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.262   90.61    0.00   90.61    9.10
Thu Sep  1 13:08:29 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.262   90.61    0.00   90.61    9.00
Thu Sep  1 13:08:30 2022 sda             0.00    0.00    0.00  36.00     0.00 11391.00  316.417    3.261   90.58    0.00   90.58    9.10
Thu Sep  1 13:08:31 2022 sda             0.00    0.00    0.00  93.00     0.00 43211.00  464.634   32.867  353.41    0.00  353.41   79.70
Thu Sep  1 13:08:32 2022 sda             0.00    0.00    0.00 109.00     0.00 52245.00  479.312   41.067  376.76    0.00  376.76   99.70
Thu Sep  1 13:08:33 2022 sda             0.00    0.00    0.00 109.00     0.00 52245.00  479.312   41.067  376.76    0.00  376.76   99.70
Thu Sep  1 13:08:34 2022 sda             0.00    0.00    0.00 109.00     0.00 52244.00  479.303   41.068  376.77    0.00  376.77   99.80
Thu Sep  1 13:08:35 2022 sda             0.00    0.00    0.00 109.00     0.00 52245.00  479.312   41.067  376.76    0.00  376.76   99.70
Thu Sep  1 13:08:36 2022 sda             0.00    1.00    0.00 110.00     0.00 52245.00  474.955   41.067  373.34    0.00  373.34   99.70
Thu Sep  1 13:08:37 2022 sda             0.00    0.00    0.00 109.00     0.00 52244.00  479.303   41.067  376.76    0.00  376.76   99.70
Thu Sep  1 13:08:38 2022 sda             0.00    0.00    0.00 109.00     0.00 52245.00  479.312   41.067  376.76    0.00  376.76   99.80
Thu Sep  1 13:08:39 2022 sda             0.00    0.00    0.00 109.00     0.00 52244.00  479.303   41.068  376.77    0.00  376.77   99.70
Thu Sep  1 13:08:40 2022 sda             0.00    0.00    0.00 109.00     0.00 52245.00  479.312   41.067  376.76    0.00  376.76   99.70
<snipped>

A note on the -z flag

When PCP is used to analyze archived data, it usually makes sense to pass the -z flag on the command line. The -z flag ensures that the timestamps reported by the monitoring tool match those of the host where the archive file was created.
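The effect is easy to see outside PCP: the same instant renders differently depending on the viewer's timezone. A standalone illustration (not a PCP command; the epoch value corresponds to one of the archive timestamps above):

```shell
# One epoch second rendered in two timezones. Without -z, a PCP tool run in
# New York would show the second line for an archive captured in GMT; with
# -z it shows the first, matching the archive host.
epoch=1662040239   # Thu Sep  1 13:50:39 2022 GMT
TZ=UTC date -u -d "@$epoch" +"%H:%M:%S"
TZ=America/New_York date -d "@$epoch" +"%H:%M:%S"
```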

Automated Performance Analysis using PMIE

We can also use the Performance Metric Inference Engine (PMIE) provided by the PCP framework to detect certain situations in real time and take appropriate action as they occur. For example, in the earlier case pcp mpstat showed that CPU 1 was busy executing at the system level (in the kernel), so it would have been helpful to have collected perf data for the period when the problem happened. Using PCP we can trigger perf (or any other script we wish) when certain predefined conditions are met. We can also log a message to alert the administrator.

pmie.service

PCP comes with a bundled pmie service which, if enabled, can monitor a system for abnormal situations like very high CPU utilization or a high volume of disk activity. A few common conditions are already built into the default configuration file for the pmie service (/etc/pcp/pmie/config.default). Using those as examples, we can build our own rules as well.

One of the built-in rules in the pmie service is:

// 1 per_cpu.system
delta = 2 min;
per_cpu.system =
some_host (
    some_inst (
        // first term is always true, but provides %v for actions ...
        ( 100 * kernel.percpu.cpu.sys  ) > 0
        && 100 * ( kernel.percpu.cpu.user  +
                   kernel.percpu.cpu.sys  ) > 75
        && 100 * kernel.percpu.cpu.sys  /
            ( kernel.percpu.cpu.user  + kernel.percpu.cpu.sys  )
                > 80
    )
    && hinv.ncpu  > 1
) -> syslog 2 min "Some CPU busy executing in system mode" " %v%sys[%i]@%h";

This rule checks two conditions involving the time spent in user mode and system mode for each CPU over the last measurement interval: it fires when a CPU spent more than 75% of the interval executing in user plus kernel mode, and more than 80% of that busy time was spent in kernel mode.

If both of the above conditions are true for at least one host system (as specified by the ‘some_host’ qualifier), then the rule evaluates to true, and the action specified after the arrow (->) is executed. In this case, the message “Some CPU busy executing in system mode” is logged by the system logging daemon. On a system running systemd, these logs appear in the system journal; journalctl -u pmie.service shows the logs from PMIE, which will include them.

Note the variables in the log message: %v is the value calculated for the left-most, top-most expression, for the instance of the metric that satisfies the given condition; %i is the instance for which the metric satisfies the condition; and %h is the host on which the condition matches. So an actual log message would look like this:

  Some CPU busy executing in system mode 96%sys[cpu1]@host1

This message shows that 96% of the time on CPU 1 was spent executing in kernel mode on host ‘host1’.

Using PMIE on archives

PMIE can also be used on archive logs. It can sift through large amounts of data and alert on potentially exceptional conditions, based on arbitrary checks that we define. For retrospective analysis, this is very useful.

For example, we can perform an analysis similar to the one described above on any PCP archive instead of live data. For this purpose, we may want to modify the rule to print a statement to the terminal, rather than log it to syslog.

$ cat pmie.txt
// 1 cpu.system
delta = 2 min;
cpu.system =
some_host (
    // first term is always true, but provides %v for actions ...
    ( 100 * kernel.all.cpu.sys  / hinv.ncpu  ) > 0
    && 100 * ( kernel.all.cpu.user  + kernel.all.cpu.sys  )
            > 70 * hinv.ncpu
    && 100 * kernel.all.cpu.sys  /
            ( kernel.all.cpu.user  + kernel.all.cpu.sys  )
                > 75
) -> print 2 min "Busy executing in system mode" " %v%sys@%h";

// 1 cpu.util
delta = 2 min;
cpu.util =
some_host (
    100 * ( kernel.all.cpu.user  +
            kernel.all.cpu.sys  +
            kernel.all.cpu.intr  ) / hinv.ncpu
        > 90
) -> print "High average processor utilization" " %v%util@%h";

Using the above pmie.txt file, we can analyze an archive and see whether the above conditions occurred at any time during the period covered by the archive. It can help us answer questions like “was any CPU executing instructions in kernel mode more than 90% of the time?”, “was any disk filling up very fast?”, or “was the load average very high?”

Example command for using PMIE on PCP archives:

$ pmie -a /var/log/pcp/pmlogger/<hostname>/<archivename> -c pmie.txt

The evaluation frequency for pmie rules can be set via the -t option on the command line. The default evaluation frequency is 10 seconds; normally it is specified along with every rule in the pmie config file (the delta parameter).
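For instance, the “was the load average very high?” question above can be written as a rule of the same shape. A sketch modeled on the bundled examples (the rule name high.load and the factor of 2 are our own choices, not part of the default configuration):

```
// 1-minute load average more than twice the CPU count
delta = 1 min;
high.load =
some_host (
    kernel.all.load #'1 minute' > 2 * hinv.ncpu
) -> print "high 1-minute load average" " %v@%h";
```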

Sample PMIE rule to flag high memory usage while examining archive(s)

We can also keep watch on the memory situation on the system, either monitoring in real time or analyzing PCP archives to examine whether there were any suspicious processes. For example, let’s say we want to check if any process was occupying more than 3 GB of physical memory. We can use a rule like this:

$ cat pmie_rules.txt
// proc.memory
delta = 10 sec;
proc.memory =
some_host (
    some_inst (
        ( proc.psinfo.rss )
        > 3 Gbyte
    )
) -> print "rss more than 3 GB" " %v [%i]@%h";

We can run PMIE with the above rules config file to examine our archive for this condition:

$ pmie -a <archive> -c pmie_rules.txt
Tue Apr 26 11:57:55 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Tue Apr 26 11:58:05 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Tue Apr 26 11:58:15 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
Tue Apr 26 11:58:25 2022: rss more than 3 GB 4295680000 [002352 ./a.out]@host1
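The %v value printed by pmie is in bytes. Converting the sample value confirms the process was holding roughly 4 GiB resident:

```shell
# Convert the raw byte count from the pmie output above into GiB.
rss_bytes=4295680000
awk -v b="$rss_bytes" 'BEGIN { printf "%.2f GiB\n", b / (1024 ^ 3) }'
```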

Sample pmie rule to flag high IO load

Some example rules to detect I/O load on the system:

$ cat pmie_io_examples.txt
// show any disk doing more than 50 I/Os per second
some_inst ( disk.dev.total > 50 ) -> print "high IOPs:" " %i: %v";

// some disk is doing more than 30 reads per second
some_inst ( disk.dev.read > 30 ) -> print "busy reads:" " %i: %v";

// some disk is doing more than 30 writes per second
some_inst ( disk.dev.write > 30 ) -> print "busy writes:" " %i: %v";

// some disk has a high I/O rate and more than 95% reads
some_inst ( disk.dev.total > 40 &&
            disk.dev.read / disk.dev.total > 0.95 )
-> print "busy disk and > 95% reads";

// some disk has a high I/O rate and 1 minute load average is
// greater than 5
kernel.all.load #'1 minute' > 5 &&
some_inst ( disk.dev.total > 40 )
-> print "busy disk and high load avg";

The PCP installation bundles some sample archives with its tutorials; we can use them to experiment with PMIE. To access the example scripts:

$ source /etc/pcp.env
$ tar xzf $PCP_DEMOS_DIR/tutorials/pmie.tar.gz

PMIE rule to detect swapping

A rule is also provided in the default pmie config file to detect scenarios where the system is facing a memory crunch that leads to swapping:

// 1 memory.exhausted
delta = 2 min;
memory.exhausted =
some_host (
    ( avg_sample ( swap.pagesout  @0..9 ) ) > 0 &&
    30 %_sample swap.pagesout  @0..9 >= 5
) -> syslog 2 min "Severe demand for real memory" " %vpgsout/s@%h";

This rule can alert the administrator to a memory-pressure situation on the system, enabling them to take remedial action. Memory pressure is detected based on the number of pages swapped out across the 10 most recent samples.

As we can see, PCP provides many interesting capabilities that can help a system administrator monitor system performance issues in real time more effectively. It also helps Oracle Support analyze system performance issues retrospectively, when a customer reports an issue on their Linux system and provides the PCP archives. We can use the rules already available in /etc/pcp/pmie/config.default as templates, then modify them or add our own rules, for easy analysis of archived data.

Proactive usage of PMIE: collecting perf data

One envisaged use is to collect perf data when the system is under high load, enabling analysis of performance bottlenecks after the event. For that purpose, we can modify the rule that checks for a CPU being busy executing instructions in kernel mode:

// 1 per_cpu.system
delta = 10 min;
per_cpu.system =
some_host (
    some_inst (
        // first term is always true, but provides %v for actions ...
        ( 100 * kernel.percpu.cpu.sys  ) > 0
        && 100 * ( kernel.percpu.cpu.user  +
                   kernel.percpu.cpu.sys  ) > 75
        && 100 * kernel.percpu.cpu.sys  /
            ( kernel.percpu.cpu.user  + kernel.percpu.cpu.sys  )
                > 80
    )
    && hinv.ncpu  > 1
) -> shell 1 hour "timestamp=$(date +"%m-%d-%Y-%T"); perf record -a -g -F99 -o /tmp/perf_record_$timestamp.data -- sleep 10"

Instead of syslog, we have used the shell directive as the action here, which lets us launch an arbitrary program when our configured condition is satisfied. The “1 hour” after the shell directive is the holdoff time; even though the condition is evaluated every 10 minutes (based on the ‘delta’ parameter), the perf command will be invoked at most once every hour. So if a performance episode is reported on a system, check whether the perf_record_*.data files were created; if so, they can be examined to understand what the kernel was busy doing. The action above uses a different file name every time so that perf data is not overwritten.
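The timestamped naming scheme in the shell action can be exercised on its own, without invoking perf (the grep pattern below simply checks the date format used above):

```shell
# Build a capture filename the same way the pmie shell action does, and
# verify it matches the expected MM-DD-YYYY-HH:MM:SS pattern.
timestamp=$(date +"%m-%d-%Y-%T")
outfile="/tmp/perf_record_${timestamp}.data"
echo "$outfile" | grep -Eq \
  '^/tmp/perf_record_[0-9]{2}-[0-9]{2}-[0-9]{4}-[0-9]{2}:[0-9]{2}:[0-9]{2}\.data$' \
  && echo "filename pattern OK"
```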

Conclusion

There are many more use cases where PCP tools offer great insights into system performance issues. PCP also offers a host of utilities such as pmrep, pmdumplog, pmdumptext, and pminfo for examining log metadata (e.g. time duration covered, time zone, metric semantics and units) as well as the metric values and trends. Installing pcp-oracle-conf on Oracle Linux 7, 8, or 9 enables and starts the pmcd and pmlogger services; that should get you started with PCP on Oracle Linux and give you a rich set of metrics and capabilities to explore. Refer to the blog Better diagnostics with Performance Co-Pilot for more details and background on PCP.

Sushmita Bhattacharya
