Introduction
In this blog post, we describe at a high level how to write a tool that presents PCP (Performance Co-Pilot) data in a user-friendly way. We take inspiration from the pcp-ps tool, which presents process-related metrics in the familiar ps command output format.
Understanding PCP
Before diving into writing PCP client tools, let’s first understand a few basics of PCP.
PCP, short for Performance Co-Pilot, is an open-source framework and toolkit designed for monitoring, analyzing, and responding to system performance data. PCP features a fully distributed, plug-in-based architecture, making it ideal for centralized analysis of complex environments and systems.
PCP stores historical performance data in archives. Each archive is a set of files that share a common base name: a temporal index file (.index), a metadata file (.meta), and one or more data volume files (.0, .1, etc.).
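To make this layout concrete, the following sketch groups a list of file names by archive base name. It is a plain-Python illustration that only inspects names; the helper group_archives and its regular expression are assumptions for this example, not part of any PCP API.

```python
import re
from collections import defaultdict

# "<base>.index", "<base>.meta" or "<base>.<N>", where N is a data volume number.
# Compressed volumes (for example "<base>.0.xz") are not handled in this sketch.
ARCHIVE_FILE = re.compile(r"^(?P<base>.+)\.(?P<suffix>index|meta|\d+)$")


def group_archives(filenames):
    """Group archive component files by their shared base name.

    Returns {base: {"index": bool, "meta": bool, "volumes": [int, ...]}}.
    """
    archives = defaultdict(lambda: {"index": False, "meta": False, "volumes": []})
    for name in filenames:
        match = ARCHIVE_FILE.match(name)
        if match is None:
            continue  # not an archive component, e.g. a pmlogger log file
        entry = archives[match["base"]]
        suffix = match["suffix"]
        if suffix in ("index", "meta"):
            entry[suffix] = True
        else:
            entry["volumes"].append(int(suffix))
    for entry in archives.values():
        entry["volumes"].sort()
    return dict(archives)


files = ["20240521.index", "20240521.meta", "20240521.1", "20240521.0", "pmlogger.log"]
print(group_archives(files))
```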
To access PCP data, you can use various client tools, such as pmrep, pmval, pminfo, pmprobe, and pmchart. These tools enable you to retrieve, display, archive, and process performance data on the same host or over the network.
For more info, please refer to: Better Diagnostics with Performance Co-Pilot.
Requirements for PCP System Tools
Creating an effective PCP system tool involves several key requirements aimed at enhancing user experience and simplifying data visualization. The tool should:
- Facilitate Seamless Data Viewing: The tool must let users view PCP data in a familiar format. Wherever possible, it should mimic existing Linux utilities to minimize the learning curve for its users.
- Provide Effortless Access to Archived Data: Users should be able to explore historical performance metrics as easily as live system data.
- Support Intuitive Time Zone Configuration: The tool should offer straightforward time zone options, enabling users to analyze and interpret data in their preferred time zone.
Setting Up the Environment
PCP is primarily developed and supported on Unix-like systems such as Linux and FreeBSD. For this guide, we’ll focus on setting up PCP on Oracle Linux. Follow these steps to install PCP:
Install PCP Package
Install the PCP package using the Oracle Linux package manager:
$ sudo yum-config-manager --enable ol8_appstream   # enable the application stream (AppStream) repository on Oracle Linux 8
$ sudo yum install pcp-zeroconf
Alternative Installation (Source Code)
Alternatively, you can download the source code from the PCP website and compile it yourself. Make sure the necessary development tools are installed on your system, including a C compiler (e.g., gcc), Python, and a build system (e.g., make).
$ sudo yum groupinstall "Development Tools"
$ wget https://pcp.io/downloads/pcp-latest.tar.gz
$ tar -xzvf pcp-latest.tar.gz
$ cd pcp-*/    # trailing slash so the glob matches the extracted directory, not the tarball
$ ./configure
$ make
$ sudo make install
By following these steps, you can set up a PCP environment on Oracle Linux, ensuring you have all the necessary tools for development and debugging.
Creating a PCP Client
Creating a PCP client involves several steps to ensure it effectively displays the relevant metrics. Here’s a guide to building a new PCP client, using pcp-ps as an example.
Step 1: Define Relevant Metrics
Identify and define the metrics your tool needs. The pcp-ps client uses the following group of metrics:
PSSTAT_METRICS = [
    'kernel.uname.nodename', 'kernel.uname.release', 'kernel.uname.sysname',
    'kernel.uname.machine', 'hinv.ncpu', 'proc.psinfo.pid',
    'proc.psinfo.guest_time', 'proc.psinfo.utime', 'proc.psinfo.ppid',
    'proc.psinfo.rt_priority', 'proc.psinfo.rss', 'proc.id.uid_nm',
    'proc.psinfo.stime', 'kernel.all.boottime', 'proc.psinfo.sname',
    'proc.psinfo.start_time', 'proc.psinfo.vsize', 'proc.psinfo.priority',
    'proc.psinfo.nice', 'proc.psinfo.wchan_s', 'proc.psinfo.psargs',
    'proc.psinfo.cmd', 'proc.psinfo.ttyname', 'mem.physmem',
    'proc.psinfo.policy'
]
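The list mixes two kinds of metrics: the proc.* metrics have one value per process, while metrics such as kernel.uname.release, hinv.ncpu, mem.physmem and kernel.all.boottime have a single value for the whole host. The hypothetical helper split_metrics below separates the two using only the proc. prefix, an assumption that holds for this particular list but not for PCP metric names in general.

```python
# A subset of PSSTAT_METRICS, enough to show the split.
METRICS = [
    'kernel.uname.nodename', 'kernel.uname.release', 'hinv.ncpu',
    'proc.psinfo.pid', 'proc.psinfo.utime', 'proc.id.uid_nm',
    'kernel.all.boottime', 'mem.physmem', 'proc.psinfo.cmd',
]


def split_metrics(metrics):
    """Split metric names into per-process (proc.*) and host-wide metrics.

    Assumption: for this list, the "proc." prefix identifies exactly the
    metrics that carry one value per process instance.
    """
    per_process = [m for m in metrics if m.startswith('proc.')]
    host_wide = [m for m in metrics if not m.startswith('proc.')]
    return per_process, host_wide


per_process, host_wide = split_metrics(METRICS)
print(len(per_process), "per-process:", per_process)
print(len(host_wide), "host-wide:", host_wide)
```

The distinction matters when printing: print_machine_info reads host-wide metrics as a single value (netValues[0][2]), whereas per-process metrics produce one row per process.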
Step 2: Finalize the Options Supported by Your Utility
Define the command-line options your tool supports. The pcp-ps client sets them up as follows:
pmapi.pmOptions.__init__(self, "t:c:e::p:ukVZ:z?:o:P:l:U:k")
self.pmSetOptionCallback(self.extraOptions)
self.pmSetOverrideCallback(self.override)
self.options()
The __init__ method is defined in the pmOptions class; here we pass the short options our utility supports. In the option string, a letter followed by a colon takes an argument. For example, pcp-ps supports U for username filtering and o for user-defined column formatting.
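The option string uses getopt conventions: a bare letter is a flag, a letter followed by ':' requires an argument, and a letter followed by '::' (a GNU extension) takes an optional argument. The stand-alone decoder below, parse_optspec, is a hypothetical helper written only to make those rules visible; pmapi parses the string itself, and whether it treats '::' exactly as modelled here is an assumption.

```python
def parse_optspec(spec):
    """Map each option letter in a getopt-style spec to its argument mode.

    "x"   -> "none"      (flag, no argument)
    "x:"  -> "required"  (argument required)
    "x::" -> "optional"  (argument optional, GNU extension)
    """
    modes = {}
    i = 0
    while i < len(spec):
        letter = spec[i]
        i += 1
        if spec.startswith("::", i):
            modes[letter] = "optional"
            i += 2
        elif spec.startswith(":", i):
            modes[letter] = "required"
            i += 1
        else:
            modes[letter] = "none"
    return modes


print(parse_optspec("t:c:e::p:ukVZ:z?:o:P:l:U:k"))
```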
The callback registered with pmSetOptionCallback handles the extra options specific to our tool, while the callback registered with pmSetOverrideCallback lets us override the default handling of standard options defined in pmapi.
def override(self, opts):
    """Return True for standard options that pcp-ps handles itself."""
    ProcessStatOptions.print_count = self.pmGetOptionSamples()
    return bool(opts in ['p', 'c', 'o', 'P', 'U'])
For more information, see the pmapi source: https://github.com/performancecopilot/pcp/blob/main/src/python/pcp/pmapi.py.in
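One way to picture how the two callbacks interact is the simplified model below. It is not pmapi's implementation: the dispatch function and the set of letters treated as standard options (including -p) are assumptions for illustration. The override lambda uses the same letter list as the override method above.

```python
def dispatch(parsed, standard_opts, override, on_extra, on_standard):
    """Route (option, argument) pairs to a handler.

    Simplified model: a standard option is handled by the framework unless
    the override callback claims it; everything else goes to the tool's
    own option callback.
    """
    for opt, arg in parsed:
        if opt in standard_opts and not override(opt):
            on_standard(opt, arg)
        else:
            on_extra(opt, arg)


# Illustrative assumption: treat these letters as "standard" options.
STANDARD_OPTS = {"a", "h", "s", "t", "Z", "z", "p"}

handled_by_tool, handled_by_framework = [], []
dispatch(
    parsed=[("s", "3"), ("U", "oracle"), ("Z", "GMT"), ("p", "1234")],
    standard_opts=STANDARD_OPTS,
    override=lambda opt: opt in ["p", "c", "o", "P", "U"],  # same list as override()
    on_extra=lambda opt, arg: handled_by_tool.append(opt),
    on_standard=lambda opt, arg: handled_by_framework.append(opt),
)
print("tool:", handled_by_tool, "framework:", handled_by_framework)
```

In this model -p is claimed by the override and reaches the tool, while -s and -Z keep their standard meaning.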
Step 3: Initialize the Client
Create an instance of the ProcessStatOptions class and pass it to MetricGroupManager.builder() to obtain a MetricGroupManager:
opts = ProcessStatOptions()
manager = pmcc.MetricGroupManager.builder(opts, sys.argv)
The builder class method of MetricGroupManager performs simple PCP monitor argument parsing using the supplied options object, sets up the context, sampling interval, time zone, and so on, and returns a MetricGroupManager object.
Step 4: Set Context and Validate Options
Store the context type so the tool knows whether the data comes from a live system or an archive, then validate the options using the checkOptions() method:
ProcessStatOptions.context = manager.type
if not opts.checkOptions():
    raise pmapi.pmUsageErr
Step 5: Check Metrics Availability
Ensure that all required metrics are available:
missing = manager.checkMissingMetrics(PSSTAT_METRICS)
if missing is not None:
    sys.stderr.write('Error: not all required metrics are available\nMissing %s\n' % missing)
    sys.exit(1)
manager['psstat'] = PSSTAT_METRICS
The manager’s checkMissingMetrics() method checks whether all required metrics are available in the current context; it returns None when nothing is missing. Once the check passes, assigning the list to manager['psstat'] registers the metrics as a metric group named psstat.
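The contract relied on here, a list of missing names or None when nothing is missing, can be illustrated with a plain-Python stand-in. check_missing_metrics and the available set are hypothetical; the real method queries the PCP context.

```python
def check_missing_metrics(required, available):
    """Hypothetical stand-in for checkMissingMetrics().

    Returns the list of required names that are not available, or None when
    every metric is present.
    """
    missing = [name for name in required if name not in available]
    return missing or None


available = {"kernel.uname.release", "hinv.ncpu", "proc.psinfo.pid"}
print(check_missing_metrics(["hinv.ncpu", "proc.psinfo.pid"], available))  # None
print(check_missing_metrics(["hinv.ncpu", "mem.physmem"], available))      # ['mem.physmem']
```

The None convention matters: if an empty list were returned instead, the caller's `missing is not None` test would report an error even though nothing is missing.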
Step 6: Assign Printer and Run
Assign a printer method to display the data and run the manager:
manager.printer = ProcessStatReport()
sts = manager.run()
The run method, defined in the MetricGroupManager class, uses the options specification to loop over fetching and reporting, pausing for the requested time interval between updates. On each iteration it calls the printer’s report method.
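The interaction between the manager and its printer can be modelled in a few lines. MiniManager and CountingPrinter are hypothetical classes for illustration only; the real run method does more than this sketch, which ignores live/archive differences and error handling.

```python
import time


class MiniManager:
    """Hypothetical, simplified model of a fetch/report loop like MetricGroupManager.run()."""

    def __init__(self, fetch, samples, interval):
        self.fetch = fetch        # callable that collects one set of values
        self.samples = samples    # how many reports to produce
        self.interval = interval  # seconds to pause between reports
        self.printer = None       # object providing report(manager)

    def run(self):
        for n in range(self.samples):
            self.fetch()
            self.printer.report(self)  # the printer is handed the manager itself
            if n + 1 < self.samples:
                time.sleep(self.interval)
        return 0  # stand-in for the status value assigned to sts


class CountingPrinter:
    """Toy printer that only counts how often it was called."""

    def __init__(self):
        self.calls = 0

    def report(self, manager):
        self.calls += 1


manager = MiniManager(fetch=lambda: None, samples=3, interval=0)
manager.printer = CountingPrinter()
sts = manager.run()
print(sts, manager.printer.calls)  # 0 3
```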
Step 7: Configure ProcessStatOptions
Define the flags and variables within the ProcessStatOptions class to manage data visualization:
class ProcessStatOptions(pmapi.pmOptions):
    show_all_process = False
    command_filter_flag = False
    ppid_filter_flag = False
    pid_filter_flag = False
    username_filter_flag = False
    selective_colum_flag = False
    filter_flag = False
    user_oriented_format = False
    empty_arg_flag = False
    filterstate = None
    timefmt = "%H:%M:%S"
    print_count = None
    colum_list = []
    command_list = []
    pid_list = []
    ppid_list = []
    filtered_process_user = None
    context = None
These class attributes hold the display and filtering state. They are set from the user’s command-line input in the extraOptions method of the ProcessStatOptions class.
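As a sketch of how extraOptions might set these attributes, the hypothetical DemoOptions class below handles only the two options described earlier, -U and -o. The callback signature (opt, optarg, index) is assumed to match the option callback used by pmapi, and the comma-separated format for -o is also an assumption.

```python
class DemoOptions:
    """Hypothetical stand-in for the flag handling in ProcessStatOptions."""

    def __init__(self):
        self.username_filter_flag = False
        self.filtered_process_user = None
        self.selective_colum_flag = False
        self.colum_list = []

    def extraOptions(self, opt, optarg, index):
        """Set display/filter state from one option (sketch covers -U and -o only)."""
        if opt == 'U':
            self.username_filter_flag = True
            self.filtered_process_user = optarg
        elif opt == 'o':
            # Assumption: -o takes a comma-separated list of column names.
            self.selective_colum_flag = True
            self.colum_list = [c.strip() for c in optarg.split(',') if c.strip()]


opts = DemoOptions()
opts.extraOptions('U', 'oracle', 0)
opts.extraOptions('o', 'pid, user,cmd', 0)
print(opts.filtered_process_user, opts.colum_list)  # oracle ['pid', 'user', 'cmd']
```

The sketch uses instance attributes. The class-level lists in ProcessStatOptions (for example colum_list = []) are shared by every instance, which is harmless with a single options object but worth keeping in mind if the class is ever instantiated more than once.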
Step 8: Define Report Methods
Implement the following methods within the ProcessStatReport class to display system information and process data:
- timeStampDelta(): Calculates the time elapsed, in seconds, between the current and previous samples.
- print_machine_info(): Prints machine-related information.
- report(): Prints the requested data.
def timeStampDelta(self, group):
    """Return the elapsed time in seconds between the last two fetches."""
    s = group.timestamp.tv_sec - group.prevTimestamp.tv_sec
    u = group.timestamp.tv_usec - group.prevTimestamp.tv_usec
    return s + u / 1000000.0

def print_machine_info(self, group, context):
    """Print a header: OS, release, host name, date, architecture and CPU count."""
    timestamp = context.pmLocaltime(group.timestamp.tv_sec)
    time_string = time.strftime("%x", timestamp.struct_time())
    header_string = (f"{group['kernel.uname.sysname'].netValues[0][2]} "
                     f"{group['kernel.uname.release'].netValues[0][2]} "
                     f"({group['kernel.uname.nodename'].netValues[0][2]}) "
                     f"{time_string} "
                     f"{group['kernel.uname.machine'].netValues[0][2]}")
    # get_ncpu() is another helper of the report class (not shown here)
    print(f"{header_string} ({self.get_ncpu(group)} CPU)")

def report(self, manager):
    """Called by the MetricGroupManager once per sample."""
    # metric_repository, interval_in_seconds, timestamp and the indentation
    # values are computed earlier in the full report() method (omitted here).
    if ProcessStatOptions.show_all_process:
        process_report = ProcessStatus(manager, metric_repository)
        process_filter = ProcessFilter(ProcessStatOptions)
        stdout = StdoutPrinter()
        printdecorator = NoneHandlingPrinterDecorator(stdout)
        report = ProcessStatusReporter(process_report, process_filter,
                                       interval_in_seconds,
                                       printdecorator.Print, ProcessStatOptions)
        report.print_report(timestamp, header_indentation, value_indentation)
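The arithmetic in timeStampDelta() and the header layout in print_machine_info() can be checked without a PCP context. The functions below are hypothetical free-function versions that use simple stand-in objects; the (instance, name, value) layout of netValues entries is inferred from the [0][2] indexing above, and the sample values are made up.

```python
from types import SimpleNamespace


def time_stamp_delta(timestamp, prev_timestamp):
    """Seconds between two timeval-like objects, same arithmetic as timeStampDelta()."""
    s = timestamp.tv_sec - prev_timestamp.tv_sec
    u = timestamp.tv_usec - prev_timestamp.tv_usec  # may be negative; the sum is still correct
    return s + u / 1000000.0


def first_value(net_values):
    """Value of a single-valued metric, assuming (instance, name, value) tuples."""
    return net_values[0][2]


def format_header(sysname, release, nodename, date_string, machine, ncpu):
    """Build the header line in the same order as print_machine_info()."""
    return f"{sysname} {release} ({nodename}) {date_string} {machine} ({ncpu} CPU)"


prev = SimpleNamespace(tv_sec=100, tv_usec=750000)
now = SimpleNamespace(tv_sec=102, tv_usec=500000)
print(time_stamp_delta(now, prev))  # 1.75

# Made-up netValues-shaped input; the instance and name fields are placeholders.
sysname = first_value([(-1, "", "Linux")])
print(format_header(sysname, "5.15.0", "host1", "05/21/24", "x86_64", 4))
```

The worked example shows why no explicit borrow is needed: 2 seconds plus -250000 microseconds gives 1.75 seconds.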
Your PCP client is now ready to print data, tailored to the user’s specified format. By following these steps, you ensure your client tool displays the necessary metrics, using the command line options as specified by the user to filter and format the data output.
Archive Replay and Time Zone Adjustment
To replay an archive, specify the number of samples to print with the -s option on the command line. For example:
$ pcp -s4 -a /var/log/pcp/pmlogger/localhost.localdomain/20240521.0 ps
If the context is not PM_CONTEXT_ARCHIVE, the print count defaults to 1 and data is retrieved from the live system. The sample count is parsed by the PCP Python infrastructure (pmapi.pmOptions), and the client can read it as follows:
ProcessStatOptions.print_count = self.pmGetOptionSamples()
To ensure the program exits gracefully when the print count is exhausted, you can implement the following logic:
# Module-level imports needed by this check
import sys
from cpmapi import PM_CONTEXT_ARCHIVE

# When the print count is exhausted, exit the program gracefully.
# We can't use break here because this code is called by the run manager.
if self.processStatOptions.context != PM_CONTEXT_ARCHIVE:
    if self.processStatOptions.print_count == 0:
        sys.exit(0)
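The counting logic can be separated from PCP for clarity. ReportCounter below is a hypothetical helper, not part of pcp-ps: it assumes print_count is decremented once per printed report, which the snippet above implies but does not show, and it leaves the end of archive replay to the run manager.

```python
class ReportCounter:
    """Hypothetical helper modelling the print_count logic described above."""

    def __init__(self, is_archive, print_count=None):
        self.is_archive = is_archive
        # Live mode defaults to a single report.
        self.print_count = print_count if print_count is not None else 1

    def after_report(self):
        """Call once per printed report; returns True when the tool should exit."""
        if self.is_archive:
            return False  # archive replay is ended by the run manager, not here
        self.print_count -= 1
        return self.print_count == 0


live = ReportCounter(is_archive=False)
print(live.after_report())  # True: one live report, then exit
```

A printer would call after_report() at the end of report() and call sys.exit(0) when it returns True, since a break is not possible inside the printer callback.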
Changing the Time Zone for pcp-ps Reports
When using pcp-ps to report performance metrics, the default behavior is to report the time of day according to the local time zone of the system where pcp-ps is run. However, you can change the time zone for reporting using the -Z and -z options.
-Z timezone, --timezone=timezone
The -Z option allows you to specify a different time zone for the reports. The timezone argument should be in the format of the environment variable TZ, as described in environ(7). This can be useful when you want to view the performance data in a specific time zone.
Example:
To report the time in GMT, you can use the following command:
$ pcp -s3 -a /var/log/pcp/pmlogger/localhost.localdomain/20240521.0 ps -Z GMT
In this example:
-s3 specifies the sample count.
-a /var/log/pcp/pmlogger/localhost.localdomain/20240521.0 specifies the archive to replay.
-Z GMT changes the reporting time zone to GMT.
-z, --hostzone
The -z option changes the reporting time zone to the local time zone of the host that is the source of the performance metrics. This is particularly useful when replaying a PCP archive that was captured in a different time zone. By default, the reporting time zone is the local time zone, which may not match the time zone of the PCP archive.
When replaying an archive captured in a foreign time zone, using the -z option ensures that the reported times align with the time zone of the original data collection.
Usage:
$ pcp -s3 -a /var/log/pcp/pmlogger/localhost.localdomain/20240521.0 ps -z
In this example:
-s3 specifies the sample count.
-a /var/log/pcp/pmlogger/localhost.localdomain/20240521.0 specifies the archive to replay.
-z sets the reporting time zone to the local time zone of the host from which the performance metrics originated.
Using these options ensures that the time stamps in your performance reports are meaningful and aligned with your needs, whether you’re comparing data across different time zones or reviewing historical data.
In the client source code, you do not need to explicitly handle the time zone. Instead, you can leverage the following APIs provided by the PCP infrastructure to set or retrieve the time zone:
self.pmSetOptionTimezone(tz)    # tz: a TZ-style string, e.g. "GMT"
self.pmGetOptionTimezone()
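These options change how sample times are reported, not the times stored in the archive. The sketch below shows the idea with Python's datetime and fixed UTC offsets; the UTC-07:00 host offset and the epoch value are arbitrary examples, and PCP itself does this through its own time zone support (for example pmLocaltime in print_machine_info), not through datetime.

```python
from datetime import datetime, timedelta, timezone


def render_time(epoch_seconds, tz):
    """Format one sample timestamp as time of day in the given zone."""
    return datetime.fromtimestamp(epoch_seconds, tz).strftime("%H:%M:%S")


sample = 1716292800                        # 2024-05-21 12:00:00 UTC
viewer_zone = timezone.utc                 # comparable to -Z GMT
host_zone = timezone(timedelta(hours=-7))  # assumed zone of the archive's source host, comparable to -z

print(render_time(sample, viewer_zone))  # 12:00:00
print(render_time(sample, host_zone))    # 05:00:00
```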
Testing Your Program
Now it’s time to put your program to the test. Within the PCP infrastructure, there are two kinds of tests to write: unit tests and integration tests.
Unit Testing
Unit testing focuses on checking individual functions or methods to ensure they work correctly in isolation. In the src/pcp/ps directory of the upstream PCP repository, there’s a unit test directory named test. It contains process_state_util_reporter_test.py and process_statusutil_test.py, which are designed to test specific classes. Using the unit test framework, these tests provide inputs to methods and verify that the outputs meet expectations, ensuring the robustness and correctness of the functionality.
Integration Testing
Integration testing involves creating a specialized archive containing the metrics defined in the PSSTAT_METRICS group. This archive is added to the upstream codebase and listed in qa/archives/GNUmakefile so that it is included during the build phase. Following this, a specific test, such as “1987” for pcp-ps, is run in the qa directory. This test checks all supported options of the pcp-ps program and generates an output file named 1987.out, which is treated as the expected output for the pcp-ps program. Whenever a package build occurs, test 1987 invokes the latest pcp-ps program available at src/pcp/ps/pcp-ps.py, generates fresh output, and compares it with the pre-existing expected output in the 1987.out file.
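The comparison step amounts to diffing fresh output against the stored expected output. The helper below is a hypothetical sketch using Python's difflib with illustrative strings; the actual PCP QA harness has its own scripts.

```python
import difflib


def compare_with_expected(actual_text, expected_text, test_name="1987"):
    """Return unified-diff lines from expected to actual output; an empty list means a match."""
    return list(difflib.unified_diff(
        expected_text.splitlines(keepends=True),
        actual_text.splitlines(keepends=True),
        fromfile=f"{test_name}.out",  # the stored expected output
        tofile="actual output",       # output from the freshly built tool
    ))


expected = "PID USER CMD\n1 root systemd\n"
actual = "PID USER CMD\n1 root init\n"
for line in compare_with_expected(actual, expected):
    print(line, end="")
```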
By using both unit and integration testing, you can thoroughly validate your program, ensuring it functions correctly and robustly within the PCP infrastructure.
Conclusion
We hope the steps described in this blog post make writing a PCP client tool simple and straightforward. The PCP framework already provides the infrastructure; we just need to define our requirements and integrate with it to achieve our system monitoring goals.