New Tools for Performance Co-Pilot

October 31, 2023 | 34 minute read

Introduction

Performance Co-Pilot (PCP) is a framework for system performance analysis. It continuously collects system performance metrics and archives them through its logging infrastructure. It also provides a suite of utilities for viewing that data, either in real time or from an archive, in an easily readable format. Oracle has introduced a set of new utilities that help system administrators view system performance metrics in formats that mirror existing Linux utilities. This article introduces these tools: pcp-ps, pcp-buddyinfo, pcp-zoneinfo, pcp-slabinfo, pcp-meminfo, and pcp-netstat.

Why are these tools required?

On Linux, /proc is a pseudo file system that provides information about the running kernel and the processes running on the system. PCP metrics are primarily derived from /proc. Traditionally, Linux administrators and users who want to analyze a system performance issue after it occurs refer to the pseudo files under /proc, for example /proc/meminfo (memory usage), /proc/buddyinfo (memory fragmentation), /proc/zoneinfo (memory zones), and /proc/slabinfo (slab memory). PCP collects the contents of each of these files as multiple separate metrics, so viewing them together is not trivial: it takes time and effort to collate the relevant PCP metrics into a comparable view. The tools introduced in this blog present the PCP metrics in a format similar to these /proc pseudo files.

Similarly, the netstat utility prints information about the Linux networking subsystem, and pcp-netstat presents the relevant PCP metrics in a format similar to netstat. These tools aim to greatly shorten the learning curve for administrators and support teams who examine system performance information using PCP.

Tools

pcp-ps

The pcp-ps tool provides users with crucial insights into the behaviour and performance of processes. This includes essential details such as Process ID (PID), associated terminal (TTY), accumulated CPU time, and the command name of the task. Users can narrow down their analysis by specifying options like -e to display all processes, or by filtering based on criteria such as command name (-c [command name]) or username (-U [username]). The tool offers the flexibility to define user-specific output formats using the -o option. This empowers users to selectively display columns like CPU utilization, memory usage, process states, and more. pcp-ps can be used to extract real-time data for the local host, providing instant insights. Additionally, when combined with PCP’s archive replay capabilities, it can analyze historical performance data. Users have the option to specify a custom timezone using the -Z option, ensuring that timestamps align with their preferred timezone.

Examples

Snapshot of the current processes metrics using pcp-ps on a live system

pcp ps on live system

$ pcp ps | head -10
Linux  5.4.17-2136.323.6.el8uek.x86_64  (localhost.localdomain)  09/27/23  x86_64    (4 CPU)
Timestamp        PID        TIME        CMD
16:48:33         1          00:00:08    systemd
16:48:33         2          00:00:00    kthreadd
16:48:33         3          00:00:00    rcu_gp
16:48:33         4          00:00:00    rcu_par_gp
16:48:33         6          00:00:00    kworker/0:0H-events_
16:48:33         8          00:00:03    kworker/0:1H-events_
16:48:33         9          00:00:00    mm_percpu_wq
16:48:33         10         00:00:00    ksoftirqd/0    

pcp-ps also supports column selection through the -o option, which works like the -o (user-defined format) option of the ps command and lets the user define the output format.

The example below selects the process ID (pid), parent process ID (ppid), memory usage (%mem), user name (uname, shown as USER), and the name of the kernel function in which the process is sleeping (wchan).

pcp ps with customized header

$ pcp ps -o pid,ppid,%mem,uname,wchan | head -10
Linux  5.4.17-2136.323.6.el8uek.x86_64  (localhost.localdomain)  09/27/23  x86_64    (4 CPU)
Timestamp   PID         PPID            %MEM        USER            WCHAN
16:50:27    1               0               0.07        root            ep_poll
16:50:27    2               0               0.0     root            kthreadd
16:50:27    3               2               0.0     root            rescuer_thread
16:50:27    4               2               0.0     root            rescuer_thread
16:50:27    6               2               0.0     root            worker_thread
16:50:27    8               2               0.0     root            worker_thread
16:50:27    9               2               0.0     root            rescuer_thread
16:50:27    10              2               0.0     root            smpboot_thread_fn   

With the pre-defined user format option (-u), pcp-ps displays the most important per-process metrics:

  • USERNAME: Indicates the username associated with the process.
  • PID (Process ID): A unique numerical identifier assigned to each running process in the system.
  • %CPU (CPU Usage): This shows the percentage of the CPU’s processing capacity that the process is currently using.
  • %MEM (Memory Usage): Represents the percentage of the system’s physical memory (RAM) that the process is using.
  • VSZ (Virtual Memory Size): This is the total virtual memory used by the process.
  • RSS (Resident Set Size): Represents the non-swapped physical memory that a process is using.
  • TTY (Terminal): Displays the controlling terminal for the process.
  • STAT (Process State): Shows the current state of the process. Common states include:
    R: Running
    S: Sleeping
    D: Waiting for I/O
    Z: Zombie (terminated but parent process has not yet acknowledged its termination)
    T: Stopped (by a signal).
  • TIME (CPU Time): This indicates the total CPU time that the process has used since it started.
  • START (Start Time): Shows the time when the process was started.
  • COMMAND (Command): This field displays the command that was used to initiate the process.

pcp-ps output with pre-defined user format option

$ pcp ps -u | head -10
Linux  5.4.17-2136.323.6.el8uek.x86_64  (localhost.localdomain)  09/27/23  x86_64    (4 CPU)
Timestamp        USERNAME   PID     %CPU    %MEM    VSZ RSS TTY STAT    TIME        START       COMMAND
16:52:43         root       1           0.0 0.0 175960  14700   ?   S   00:00:08    13:12:06    systemd
16:52:43         root       2           0.0 0.0 0   0   ?   S   00:00:00    13:12:06    kthreadd
16:52:43         root       3           0.0 0.0 0   0   ?   I   00:00:00    13:12:06    rcu_gp
16:52:43         root       4           0.0 0.0 0   0   ?   I   00:00:00    13:12:06    rcu_par_gp
16:52:43         root       6           0.0 0.0 0   0   ?   I   00:00:00    13:12:06    kworker/0:0H-events_
16:52:43         root       8           0.0 0.0 0   0   ?   I   00:00:03    13:12:06    kworker/0:1H-events_
16:52:43         root       9           0.0 0.0 0   0   ?   I   00:00:00    13:12:06    mm_percpu_wq
16:52:43         root       10          0.0 0.0 0   0   ?   S   00:00:00    13:12:06    ksoftirqd/0
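
The STAT column in the output above can be summarized with a few lines of Python. The following is a minimal sketch, not part of PCP: it assumes the whitespace-separated layout shown in the sample, a command name without spaces, and a hypothetical helper name (count_states). The "I" state (idle kernel thread) appears in the sample output but not in the list above; its description here is taken from the Linux ps documentation.

```python
from collections import Counter

# State codes from the list above. "I" is an addition taken from the
# Linux ps documentation because it appears in the sample output.
STATE_NAMES = {
    "R": "running",
    "S": "sleeping",
    "D": "waiting for I/O",
    "Z": "zombie",
    "T": "stopped",
    "I": "idle kernel thread",
}

# A few data rows copied from the `pcp ps -u` output above.
SAMPLE = """\
16:52:43  root  1   0.0 0.0 175960 14700 ? S 00:00:08 13:12:06 systemd
16:52:43  root  2   0.0 0.0 0 0 ? S 00:00:00 13:12:06 kthreadd
16:52:43  root  3   0.0 0.0 0 0 ? I 00:00:00 13:12:06 rcu_gp
16:52:43  root  4   0.0 0.0 0 0 ? I 00:00:00 13:12:06 rcu_par_gp
16:52:43  root  10  0.0 0.0 0 0 ? S 00:00:00 13:12:06 ksoftirqd/0
"""

STAT_COLUMN = 8  # zero-based position of STAT in a whitespace-split row


def count_states(text):
    """Count processes per state name in `pcp ps -u` style data rows."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= STAT_COLUMN:
            continue  # blank line or the short "Linux ..." banner
        if not fields[0][0].isdigit():
            continue  # column header row, which starts with "Timestamp"
        code = fields[STAT_COLUMN][0]  # first letter is the main state
        counts[STATE_NAMES.get(code, "unknown")] += 1
    return dict(counts)


print(count_states(SAMPLE))
```

The same function could be fed the captured output of pcp ps -u, since banner and header rows are skipped by the two checks at the top of the loop.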

Filter processes

# Select by process ID
pcp ps -p pid_of_process

# Select by parent process ID
pcp ps -P ppid_of_process

pcp-buddyinfo

The pcp-buddyinfo tool presents, for each memory zone and node, the number of free blocks of each order from 0 to 10, mirroring /proc/buddyinfo. An order-n block is 2^n physically contiguous pages, so the columns show how much free memory is available at each block size and how fragmented it is. The data is presented in a structured, timestamped format, which makes patterns and trends in free memory easier to spot than in the raw text of /proc/buddyinfo.

Examples

Buddyinfo related data using pcp-buddyinfo on a live system:

pcp-buddyinfo on live system

$ pcp buddyinfo
Linux  5.4.17-2136.323.6.el8uek.x86_64  (localhost.localdomain)  09/27/23  x86_64    (4 CPU)
TimeStamp         Normal             Nodes        Order0    Order1    Order2    Order3    Order4    Order5    Order6    Order7    Order8    Order9    Order10
16:47:44          DMA                node0          0         0         0         0         0         0         0         0         1         1         3
16:47:44          DMA32              node0          2         7         5         4         7         7         9         4         6         5         787
16:47:44          Normal             node0          56        45        31        17        3         316       226       110       54        1         2914     
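
To relate the per-order counts above to actual memory, each count can be weighted by its block size. The sketch below is illustrative only: the function names are hypothetical, and it assumes 4 KiB base pages (the common x86_64 default), which the article does not state.

```python
PAGE_KIB = 4  # assumption: 4 KiB base pages, the usual x86_64 default


def free_kib(order_counts):
    """Free memory in KiB, given free-block counts for order 0, 1, 2, ..."""
    # A block of order n spans 2**n pages, i.e. count << n pages in total.
    pages = sum(count << order for order, count in enumerate(order_counts))
    return pages * PAGE_KIB


def parse_row(line):
    """Split a pcp-buddyinfo data row into (zone, node, counts)."""
    fields = line.split()
    # fields: timestamp, zone, node, then one count per order
    return fields[1], fields[2], [int(f) for f in fields[3:]]


# DMA row copied from the live-system output above.
row = "16:47:44  DMA  node0  0 0 0 0 0 0 0 0 1 1 3"
zone, node, counts = parse_row(row)
print(zone, node, free_kib(counts), "KiB free")
```

A zone where most free pages sit in low orders is fragmented: plenty of memory may be free in total while large contiguous allocations still fail.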

pcp buddyinfo on archive

Analyzing pcp-buddyinfo.0.xz archive with pcp-buddyinfo:

$ pcp -a pcp-buddyinfo.0.xz buddyinfo | head -10
Linux  5.4.17-2136.317.5.3.el8uek.x86_64  (localhost.localdomain)  08/02/23  x86_64    (4 CPU)
TimeStamp         Normal             Nodes        Order0    Order1    Order2    Order3    Order4    Order5    Order6    Order7    Order8    Order9    Order10
15:53:21          DMA                node0          0         0         0         0         0         0         0         0         1         1         3
15:53:21          DMA32              node0          3         1         1         1         2         3         2         3         3         2         803
15:53:21          Normal             node0          1720      4118      1862      671       220       120       66        29        21        26        3040
Linux  5.4.17-2136.317.5.3.el8uek.x86_64  (localhost.localdomain)  08/02/23  x86_64    (4 CPU)
TimeStamp         Normal             Nodes        Order0    Order1    Order2    Order3    Order4    Order5    Order6    Order7    Order8    Order9    Order10
15:53:24          DMA                node0          0         0         0         0         0         0         0         0         1         1         3
15:53:24          DMA32              node0          3         1         1         1         2         3         2         3         3         2         803
15:53:24          Normal             node0          1720      4118      1862      671       220       120       66        29        21        26        3040     

pcp-zoneinfo

The pcp-zoneinfo tool offers a detailed view of NUMA (Non-Uniform Memory Access) nodes and their associated statistics, extracted from the /proc/zoneinfo file. It enables users to analyze memory zone availability across nodes, which is vital for optimizing performance in modern server setups with NUMA architectures. By using pcp-zoneinfo, users can gain insights into memory allocation patterns, ensuring efficient utilization of available resources. This tool empowers users with the ability to filter samples from the archive and provides archive replay capabilities.

Examples

Live system metrics for zoneinfo using pcp-zoneinfo:

$ pcp zoneinfo | head -10
Linux  5.4.17-2136.323.6.el8uek.x86_64  (localhost.localdomain)  09/27/23  x86_64    (4 CPU)
TimeStamp =  16:37:22
NODE  0, per-node status
     nr_inactive_anon 5614
     nr_active_anon 469809
     nr_inactive_file 279923
     nr_active_file 246316
     nr_unevictable 3097
     nr_slab_reclaimable 107132
     nr_slab_unreclaimable 43908
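
Output in this shape is easy to post-process. The following sketch (hypothetical helper name, not a PCP API) turns the name/value lines into a dictionary and adds up the two slab counters. It assumes the nr_* counters are page counts and that pages are 4 KiB; neither is stated in the article.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages

# Lines copied from the live-system output above.
SAMPLE = """\
NODE  0, per-node status
     nr_inactive_anon 5614
     nr_active_anon 469809
     nr_inactive_file 279923
     nr_active_file 246316
     nr_unevictable 3097
     nr_slab_reclaimable 107132
     nr_slab_unreclaimable 43908
"""


def parse_counters(text):
    """Collect `name value` pairs; lines of any other shape are skipped."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


stats = parse_counters(SAMPLE)
slab_pages = stats["nr_slab_reclaimable"] + stats["nr_slab_unreclaimable"]
print("slab memory:", slab_pages * PAGE_KIB, "KiB")
```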

pcp zoneinfo on archive

Analyzing pcp-zoneinfo.0.xz archive with pcp-zoneinfo:

$ pcp -a pcp-zoneinfo.0.xz zoneinfo
Linux  5.4.17-2136.320.7.1.el7uek.x86_64  (sagar-vminstance-1)  10/03/23  x86_64    (4 CPU)
TimeStamp = 11:49:31
Node 0, zone    DMA
        per-node status
        nr_inactive_anon 33918
        nr_active_anon 224024
        nr_inactive_file 786762
        nr_active_file 404375
        nr_unevictable 5262
        nr_slab_reclaimable 190078
        nr_slab_unreclaimable 64022
        nr_isolated_anon 0
        nr_isolated_file 0
        nr_anon_pages 166619
        nr_mapped 49208
        nr_file_pages 1280072
        nr_dirty 18

pcp-slabinfo

The pcp-slabinfo tool offers an in-depth view of the kernel slab allocator’s statistics. It collates existing PCP metrics related to slab memory. This information is presented in a format reminiscent of the proc filesystem. Users can efficiently analyze memory object allocation in the kernel, providing a real-time perspective in live systems or recorded archive data. The tool displays the current count of active objects, allowing users to understand allocation status. Additionally, it provides the total count of allocated objects, whether in use or not.

Examples

Live system metrics for slabinfo using pcp-slabinfo tool:

$ pcp slabinfo | head -10
Linux  5.4.17-2136.323.6.el8uek.x86_64  (localhost.localdomain)  09/27/23  x86_64    (4 CPU)
TimeStamp         Name                               active_objs         num_objs         objsize byte         objperslab         pagesperslab         active_slabs         num_slabs
16:55:13          Acpi-Operand                            4368               4368                 72                56                   1                 78                 78
16:55:13          Acpi-Parse                            329400             329522                 56                73                   1               4514               4514
16:55:13          Acpi-State                               765                765                 80                51                   1                 15                 15
16:55:13          anon_vma                               11434              12714                104                39                   1                326                326
16:55:13          anon_vma_chain                         17182              20864                 64                64                   1                326                326
16:55:13          avc_xperms_data                        19328              19328                 32               128                   1                151                151
16:55:13          avtab_extended_perms                  290190             290190                 40               102                   1               2845               2845
16:55:13          avtab_node                            401880             401880                 24               170                   1               2364               2364
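
Two figures are often derived from these columns: how full a cache is (active_objs / num_objs) and how much memory its slabs occupy (num_slabs × pagesperslab × page size). The sketch below shows the arithmetic on one row from the output above. The helper name is hypothetical, the 4 KiB page size is an assumption, and cache names are assumed to contain no spaces.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages


def parse_slab_row(line):
    """Derive utilization and footprint from one pcp-slabinfo data row."""
    fields = line.split()
    # fields: timestamp, cache name, then the seven numeric columns
    name = fields[1]
    (active_objs, num_objs, objsize, objperslab,
     pagesperslab, active_slabs, num_slabs) = map(int, fields[2:9])
    return {
        "name": name,
        # share of allocated objects that are actually in use
        "utilization": active_objs / num_objs if num_objs else 0.0,
        # memory held by all slabs of this cache, used or not
        "footprint_kib": num_slabs * pagesperslab * PAGE_KIB,
    }


# anon_vma row copied from the live-system output above.
row = "16:55:13  anon_vma  11434  12714  104  39  1  326  326"
info = parse_slab_row(row)
print(f'{info["name"]}: {info["utilization"]:.1%} of objects in use, '
      f'{info["footprint_kib"]} KiB in slabs')
```

A cache with a large footprint but low utilization holds memory that is allocated to slabs yet not backing live objects.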

pcp slabinfo on archive

Analyzing pcp-slabinfo.0.xz archive with pcp-slabinfo:

$ pcp -a pcp-slabinfo.0.xz slabinfo | head -10
Linux  5.4.17-2136.317.5.3.el8uek.x86_64  (localhost.localdomain)  08/02/23  x86_64    (4 CPU)
TimeStamp         Name                               active_objs         num_objs         objsize byte         objperslab         pagesperslab         active_slabs         num_slabs
10:25:20          Acpi-Operand                            4592               4592                 72                56                   1                 82                 82
10:25:20          Acpi-Parse                            314423             315652                 56                73                   1               4324               4324
10:25:20          Acpi-State                               765                765                 80                51                   1                 15                 15
10:25:20          anon_vma                               11481              11895                104                39                   1                305                305
10:25:20          anon_vma_chain                         18269              19264                 64                64                   1                301                301
10:25:20          avc_xperms_data                         8192               8192                 32               128                   1                 64                 64
10:25:20          avtab_extended_perms                  276216             276216                 40               102                   1               2708               2708
10:25:20          avtab_node                            401880             401880                 24               170                   1               2364               2364

pcp-meminfo

The pcp-meminfo tool reports memory usage for the system in the format of /proc/meminfo. It covers memory statistics such as used and available memory, swap space, cache, and buffers. While not indispensable, it offers an additional way to examine memory data, whether on a live machine or from recorded archive data, which helps with troubleshooting.

Examples

pcp meminfo on live data

To examine the memory usage statistics on a live machine, run the following command:

$ pcp meminfo | head -10
Linux  5.4.17-2136.300.7.el8uek.x86_64  (Mohit-OL8u5-vm1)  09/19/23  x86_64    (2 CPU)
05:50:16
MemTotal          : 1734892 kB
MemFree           : 261832 kB
MemAvailable      : 908972 kB
Buffers           : 3164 kB
Cached            : 737992 kB
SwapCached        : 28 kB
Active            : 572576 kB
Inactive          : 551840 kB
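
Because the output keeps the key/value layout of /proc/meminfo, it can be parsed with a few lines of Python. The sketch below uses a hypothetical helper (parse_meminfo) and estimates memory in use as MemTotal minus MemAvailable, which is one common convention and not something pcp-meminfo itself reports.

```python
# Lines copied from the live-system output above.
SAMPLE = """\
05:50:16
MemTotal          : 1734892 kB
MemFree           : 261832 kB
MemAvailable      : 908972 kB
Buffers           : 3164 kB
Cached            : 737992 kB
"""


def parse_meminfo(text):
    """Map each `Key : value kB` line to an integer number of kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # The timestamp line also contains ":", but what follows the
        # first colon is not a plain integer, so it is skipped here.
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


mem = parse_meminfo(SAMPLE)
used_kb = mem["MemTotal"] - mem["MemAvailable"]
print(f"in use: {used_kb} kB ({used_kb / mem['MemTotal']:.1%} of MemTotal)")
```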

pcp meminfo on archive

To examine the memory usage statistics in archive data, run the following command:

$ pcp meminfo -a pcp-meminfo.0.xz -s 2
Linux  5.4.17-2136.300.7.el8uek.x86_64  (Mohit-OL8u5-vm1)  09/08/23  x86_64    (2 CPU)
07:06:55
MemTotal          : 1734892 kB
MemFree           : 244256 kB
MemAvailable      : 860364 kB
Buffers           : 172 kB
Cached            : 728404 kB
SwapCached        : 12876 kB
Active            : 402000 kB
Inactive          : 608084 kB
Active(anon)      : 143436 kB
Inactive(anon)    : 161768 kB
Active(file)      : 258564 kB
Inactive(file)    : 446316 kB
Unevictable       : 0 kB
Mlocked           : 0 kB
SwapTotal         : 1572860 kB
SwapFree          : 1403752 kB
Dirty             : 3604 kB
Writeback         : 0 kB
AnonPages         : 275972 kB
Mapped            : 71720 kB
<<cropped>>

pcp-netstat

In general, netstat is a Linux tool that provides statistics about all active connections on a computer, including incoming and outgoing connections, routing tables, and network protocol statistics. For instance, you can use netstat to display all active TCP connections to the computer, display all active UDP connections to the computer, display the routing table of the computer, and display statistics for each protocol. For more info please refer to the netstat man page.

pcp-netstat is developed along the same lines and shows statistics related to network protocols and network interfaces. In particular, this tool reproduces the output of netstat -s and netstat -i -a -n. It is useful for checking the status of network interfaces and network connections, and for troubleshooting network issues. It can also be used to analyze network statistics for all available protocols, including TCP, UDP, ICMP, and IP.

Example

By default, when no flags are provided, this tool displays both the network protocol statistics and the network interface statistics.

pcp netstat on live data

To examine the network statistics on a live machine, run the following command:

$ pcp netstat -s 1
Linux  5.4.17-2136.300.7.el8uek.x86_64  (Mohit-OL8u5-vm1)  09/19/23  x86_64    (2 CPU)
05:59:25

Ip:
        Forwarding: 2
        161401 total packets received
        3 with invalid addresses
        0 forwarded
        0 incoming packets discarded
        125253 incoming packets delivered
        122240 requests sent out
        12 dropped because of missing route


Icmp:
        61 ICMP messages received
        0 Input ICMP message failed
        ICMP input histogram:
                destination unreachable: 61
        0 ICMP messages sent
        0 ICMP messages failed
        ICMP input histogram:
                Output destination unreachable: 0


IcmpMsg:
                InType3: 61
                OutType0: NA


Tcp:
        66 active connections openings
        22 passive connection openings
<<cropped>>

Kernel Interface table
     Iface        MTU      RX-OK     RX-ERR     RX-DRP      TX-OK     TX-ERR     TX-DRP
      ens2       1500     199173          0          0      44991          0          0
      ens3       1500         23          0          0       5813          0          0
      ens4       1500       5835          0          0          2          0          0
        lo      65536      78641          0          0      78641          0          0 

pcp netstat on archives

To examine the network statistics in recorded archive data, run the following command:

$ pcp netstat -a pcp-netstat.0.xz
Linux  5.4.17-2136.300.7.el8uek.x86_64  (Mohit-OL8u5-vm1)  09/08/23  x86_64    (2 CPU)
06:35:53

Ip:
        Forwarding: 2
        8765322 total packets received
        8 with invalid addresses
        0 forwarded
        0 incoming packets discarded
        5386076 incoming packets delivered
        5368638 requests sent out
        12 dropped because of missing route


Icmp:
        3817 ICMP messages received
        37 Input ICMP message failed
        ICMP input histogram:
                destination unreachable: 3817
        17 ICMP messages sent
        0 ICMP messages failed
        ICMP input histogram:
                Output destination unreachable: 17


IcmpMsg:
                InType3: 3817
                OutType0: 17


Tcp:
        2316 active connections openings
        1017 passive connection openings
        153 failed connection attempts
<<cropped>>

Kernel Interface table
     Iface        MTU      RX-OK     RX-ERR     RX-DRP      TX-OK     TX-ERR     TX-DRP
      ens2       1500   14340172          0        165     565382          0          0
      ens3       1500    4415935          0          0    5941623          0          0
      ens4       1500    1669793          0          0          1          0          0
        lo      65536    1034960          0          0    1034960          0          0 
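
The Kernel Interface table above is a natural place to look for error and drop counters that are not zero. The following sketch, with hypothetical helper names, parses the table into per-interface dictionaries and lists the interfaces that need a closer look.

```python
# Table copied from the archive output above.
SAMPLE = """\
     Iface        MTU      RX-OK     RX-ERR     RX-DRP      TX-OK     TX-ERR     TX-DRP
      ens2       1500   14340172          0        165     565382          0          0
      ens3       1500    4415935          0          0    5941623          0          0
      ens4       1500    1669793          0          0          1          0          0
        lo      65536    1034960          0          0    1034960          0          0
"""

PROBLEM_COLUMNS = ("RX-ERR", "RX-DRP", "TX-ERR", "TX-DRP")


def parse_interfaces(text):
    """Return {interface: {column: value}} from the interface table."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()  # Iface, MTU, RX-OK, ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        # Pair every column after Iface with its integer value.
        table[fields[0]] = dict(zip(header[1:], map(int, fields[1:])))
    return table


def interfaces_with_problems(table):
    """Names of interfaces with any non-zero error or drop counter."""
    return [name for name, counters in table.items()
            if any(counters[col] for col in PROBLEM_COLUMNS)]


table = parse_interfaces(SAMPLE)
print(interfaces_with_problems(table))
```

These counters are cumulative, so comparing two samples taken some time apart shows whether drops are still occurring or are historical.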

Network protocol statistics

For protocol-specific statistics, use the -p option followed by the protocol name [TCP|IP|UDP|ICMP]:

$ pcp netstat -s 1 -p IP
Linux  5.4.17-2136.300.7.el8uek.x86_64  (Mohit-OL8u5-vm1)  09/27/23  x86_64    (2 CPU)
05:36:53

Ip:
        Forwarding: 2
        1435031 total packets received
        2 with invalid addresses
        0 forwarded
        0 incoming packets discarded
        390677 incoming packets delivered
        386767 requests sent out
        12 dropped because of missing route


IpExt:
        InMcastPkts: 6916
        InBcastPkts: 24936
        InOctets: 889985068
        OutOctets: 83609066
        InMcastOctets: 221312
        InBcastOctets: 5704410
        InNoECTpkts: 1451089

Network interface statistics

To obtain statistics pertaining to network interfaces, utilize the -i option.

$ pcp netstat -s 1 -i
Linux  5.4.17-2136.300.7.el8uek.x86_64  (Mohit-OL8u5-vm1)  09/27/23  x86_64    (2 CPU)
05:39:06
Kernel Interface table
     Iface        MTU      RX-OK     RX-ERR     RX-DRP      TX-OK     TX-ERR     TX-DRP
      ens2       1500    4715661          0          0     197130          0          0
      ens3       1500     253452          0          0     244360          0          0
      ens4       1500     497809          0          0          1          0          0
        lo      65536     431528          0          0     431528          0          0 

Particular protocol or interface statistics

# shows output similar to netstat -s, which displays network statistics for each protocol
pcp netstat -s 1 --statistics


# protocol-specific stats, selected with the -p option
pcp netstat -s 1 -p TCP
pcp netstat -s 1 -p UDP
pcp netstat -s 1 -p ICMP

Conclusion

This blog introduced the new PCP tools pcp-ps, pcp-buddyinfo, pcp-zoneinfo, pcp-slabinfo, pcp-meminfo, and pcp-netstat, which present PCP metrics in an already familiar format. For more details and background on PCP, you can refer to these blogs:

sagar sagar

Mohith Thummaluru

