Introduction
Performance Co-Pilot (PCP) is a versatile and extensible tool for gathering performance diagnostics of a system. Due to its many features and strong upstream support, it is widely used to monitor system health in production environments. However, there are scenarios where PCP does not collect the metrics you need, requiring system administrators to extend PCP by writing a Performance Metric Domain Agent (PMDA).
PMDAs are responsible for gathering performance data and keeping it updated for requests from the Performance Metric Collection Daemon (PMCD).
When Do You Need a New PMDA?
A new PMDA is required when the metric you need isn’t available through existing PMDAs. This may happen in the following cases:
- PCP is still evolving, and the performance metric you need may not yet exist.
- The metric is specific to your requirements, such as monitoring a custom database or application.
- You have developed an application or module that you want to monitor through PCP.
Where to Begin?
Comprehensive documentation about PCP and its development process is available in the Programming Performance Co-Pilot guide.
This blog simplifies the process and provides a concise guide to writing a PMDA, focusing on Python-based PMDAs. By the end of this blog, you’ll have a working PMDA that monitors active users on a node.
Types of PMDAs
Depending on their architecture and use case, PMDAs can be categorized into the following types:
DSO PMDA
DSO (Dynamic Shared Object) PMDAs are dynamically loadable shared libraries integrated into the PMCD process. These PMDAs are lightweight and efficient but require robust error handling to avoid compromising PMCD’s stability. They are commonly used for monitoring core system resources like CPU usage, memory utilization, and I/O activity.
Daemon PMDA
Daemon PMDAs run as standalone processes and communicate with PMCD via IPC mechanisms such as sockets or pipes. This separation ensures that issues in the PMDA do not affect PMCD’s stability. Daemon PMDAs are ideal for collecting metrics from external or third-party sources, such as databases or cloud services, though they introduce slight performance overhead.
Caching PMDA
Caching PMDAs use a caching mechanism to store previously fetched metric values, reducing the load on the monitored system. These PMDAs are suitable for metrics that change infrequently but may return slightly stale data due to the reliance on cached values.
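The caching idea can be sketched in plain Python. This is only an illustration of the technique, not PCP code: the `CachedValue` class, its `ttl` parameter, and the injectable `clock` are hypothetical names chosen for this example.

```python
import time


class CachedValue:
    """Cache a fetched value and refresh it only after `ttl` seconds."""

    def __init__(self, fetch, ttl=5.0, clock=time.monotonic):
        self._fetch = fetch    # callable that queries the monitored system
        self._ttl = ttl        # maximum age of a cached value, in seconds
        self._clock = clock    # injectable clock, handy for testing
        self._value = None
        self._stamp = None     # time of the last refresh, None if never fetched

    def get(self):
        """Return the cached value, refreshing it first if it is too old."""
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._ttl:
            self._value = self._fetch()
            self._stamp = now
        return self._value
```

A larger `ttl` reduces load on the monitored system but increases how stale a returned value can be, which is the trade-off described above.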
Language-Based PMDAs
PMDAs can be developed in C, Python, or Perl. C-based PMDAs offer high performance and are suitable for low-level metrics. Python-based PMDAs, which are the focus of this blog, prioritize simplicity and rapid development. Perl-based PMDAs are often used in legacy systems or environments where Perl is already integrated.
PMNS: Performance Metrics Namespace
The Performance Metrics Namespace (PMNS) provides a hierarchical structure for organizing metrics, making it easier to collect, query, and manage performance data.
Defining a New PMNS
- Create the PMNS File: A plain text file specifying the namespace structure and linking it to the metrics your PMDA provides.
- Define Metric Paths: Each metric is identified by a path composed of elements separated by periods (.).
Example PMNS Definition:

    sample {
        cluster1
        cluster2
    }

    // Metric definition format:
    // metric_name DOMAIN_NUMBER:CLUSTER_NUMBER:METRIC_NUMBER
    sample.cluster1 {
        metric1    SAMPLE:1:1
        metric2    SAMPLE:1:2
    }

    sample.cluster2 {
        metric3    SAMPLE:2:1
    }
Explanation:
1. sample is the root namespace, with cluster1 and cluster2 as subclusters.
2. Metrics like metric1, metric2, and metric3 are defined under the clusters.
3. Each metric is defined as metric_name DOMAIN_NUMBER:CLUSTER_NUMBER:METRIC_NUMBER.
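To make the DOMAIN_NUMBER:CLUSTER_NUMBER:METRIC_NUMBER triple more concrete, here is a small sketch of how the three numbers can be packed into a single integer identifier. It assumes the commonly documented PMID layout of 9 domain bits, 12 cluster bits, and 10 item bits; `build_pmid` and `split_pmid` are hypothetical helper names, and the domain value 29 in the usage note is only a stand-in for SAMPLE.

```python
# Assumed field widths of a PMID (domain, cluster, item).
DOMAIN_BITS, CLUSTER_BITS, ITEM_BITS = 9, 12, 10


def build_pmid(domain, cluster, item):
    """Pack DOMAIN:CLUSTER:ITEM into one integer."""
    for value, bits, label in (
        (domain, DOMAIN_BITS, "domain"),
        (cluster, CLUSTER_BITS, "cluster"),
        (item, ITEM_BITS, "item"),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{label} {value} does not fit in {bits} bits")
    return (domain << (CLUSTER_BITS + ITEM_BITS)) | (cluster << ITEM_BITS) | item


def split_pmid(pmid):
    """Unpack an integer built by build_pmid into (domain, cluster, item)."""
    item = pmid & ((1 << ITEM_BITS) - 1)
    cluster = (pmid >> ITEM_BITS) & ((1 << CLUSTER_BITS) - 1)
    domain = (pmid >> (CLUSTER_BITS + ITEM_BITS)) & ((1 << DOMAIN_BITS) - 1)
    return domain, cluster, item
```

For example, `split_pmid(build_pmid(29, 1, 2))` round-trips to the triple for metric2 if SAMPLE were 29. In a real Python PMDA you would not do this by hand; the pmid method of the PMDA class builds the identifier for you.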
Instance Domains and Metric Types
In PCP, metrics can be instance-based or non-instance-based:
- Non-instance-based metrics (also called "singular" metrics) have a single value for the whole system or context. For example, a metric tracking uptime is non-instance-based because it reflects a system-wide value.
- Instance-based metrics, on the other hand, report separate values for each instance within a group. For example, a metric measuring connection latency would be instance-based, as it varies per connection.
We will explore instance-based metrics and the corresponding PMDA implementations in the upcoming parts of this series.
What is an Instance Domain?
An instance domain defines the set of possible instances for a metric. For instance-based metrics, the instance domain describes all the entities (like disks, CPUs, or network interfaces) for which the metric is reported. Each instance is identified by a unique instance identifier and a name. For non-instance-based metrics, the instance domain is set to PM_INDOM_NULL, indicating there are no instances, just a single value.
Understanding instance domains is important when designing your PMDA, as it determines how your metrics are structured and queried.
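The difference between the two kinds of metric can be sketched with plain Python data structures. Nothing here uses the PCP API: the metric names, the `METRICS` table, and the `NO_INSTANCE` marker (standing in for PM_INDOM_NULL) are all made up for illustration.

```python
NO_INSTANCE = None  # stand-in for PM_INDOM_NULL in this sketch

METRICS = {
    # Singular metric: no instance domain, one value for the whole system.
    "demo.uptime": {"indom": NO_INSTANCE, "values": 86400},
    # Instance-based metric: the instance domain maps identifier -> name,
    # and there is one value per instance identifier.
    "demo.conn.latency_ms": {
        "indom": {0: "conn-a", 1: "conn-b"},
        "values": {0: 12.5, 1: 48.0},
    },
}


def instances(name):
    """List (identifier, name) pairs for a metric; empty for singular metrics."""
    indom = METRICS[name]["indom"]
    return [] if indom is NO_INSTANCE else sorted(indom.items())


def fetch(name, inst=None):
    """Return the value of a singular metric, or of one instance of a metric."""
    metric = METRICS[name]
    if metric["indom"] is NO_INSTANCE:
        return metric["values"]
    if inst not in metric["indom"]:
        raise KeyError(f"unknown instance {inst!r} for {name}")
    return metric["values"][inst]
```

A singular metric is fetched with just its name, while an instance-based metric also needs an instance identifier taken from its instance domain.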
Prerequisites
Before you begin, ensure you have the following:
- Performance Co-Pilot (PCP) installed and running on your system. You can install the basic PCP packages with: sudo yum install pcp-zeroconf
- Python 3.x installed. Check your version with: python3 --version
- PCP Python bindings installed. Install them using: pip install pcp (on yum-based distributions the bindings are typically also packaged as python3-pcp).
- Root or sudo access to install and manage PMDAs.
Tip: For more information on installing or configuring PCP, visit the PCP documentation.
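A quick way to confirm the Python side of these prerequisites is to check whether the modules used later in this post can be found by your interpreter. This is a minimal sketch; `have_module` and `check_prerequisites` are hypothetical helper names, and the module names are simply the ones imported by the example PMDA.

```python
import importlib.util
import sys


def have_module(name):
    """Return True if `name` can be located by this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


def check_prerequisites():
    """Report which Python-side prerequisites appear to be satisfied."""
    return {
        "python3": sys.version_info >= (3, 0),
        "pcp.pmda": have_module("pcp.pmda"),
        "cpmapi": have_module("cpmapi"),
    }


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

If either module is reported as missing, install the PCP Python bindings before continuing.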
Example PMDA: Checking active users information
Step 1: Directory Structure
The PMDA files need to be placed in a specific directory under /var/lib/pcp/pmdas/. For this example, the directory structure should look like this:

    /var/lib/pcp/pmdas/who/
    ├── pmdawho.python
    ├── Install
    ├── Remove
    ├── domain.h
    └── pmns
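Before installing, it can help to verify that the directory really contains these files. The following is a small sketch, not part of PCP: `check_pmda_dir` is a hypothetical helper, and the file list is taken from the layout above.

```python
import os

REQUIRED_FILES = ("pmdawho.python", "Install", "Remove", "domain.h", "pmns")
EXECUTABLES = ("Install", "Remove")  # scripts that are run directly


def check_pmda_dir(path):
    """Return a list of problems found in the PMDA directory (empty if none)."""
    problems = []
    for name in REQUIRED_FILES:
        if not os.path.isfile(os.path.join(path, name)):
            problems.append(f"missing file: {name}")
    for name in EXECUTABLES:
        full = os.path.join(path, name)
        if os.path.isfile(full) and not os.access(full, os.X_OK):
            problems.append(f"not executable: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_pmda_dir("/var/lib/pcp/pmdas/who"):
        print(problem)
```

An empty result means the layout matches; anything else lists what still needs to be created or made executable.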
Step 2: Define the domain number
The domain.h file defines a unique domain number for the PMDA. This domain number is used to identify the PMDA within PCP.

    // filepath: /var/lib/pcp/pmdas/who/domain.h
    #define WHO 260

Here:
- WHO is assigned the domain number 260.
- Ensure this domain number is unique and does not conflict with other PMDAs.
How to Decide the Domain Number
The domain number must be unique across all PMDAs in the system. To ensure this, you can refer to the following registry files:
- /var/lib/pcp/pmns/stdpmid.pcp: This file contains the standard domain numbers assigned to existing PMDAs. Check this file to avoid conflicts with already registered PMDAs.
- /var/lib/pcp/pmns/stdpmid.local: This file is used for locally assigned domain numbers. If you are assigning a custom domain number, you can add it to this file to keep track of your local PMDAs.
By consulting these files, you can ensure that the domain number you choose for your PMDA does not conflict with existing ones. For example, in this blog, we have chosen 260 for the Who PMDA, which should be verified against these files before use.
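Checking the registry files by eye works, but the lookup is easy to script. The sketch below assumes the registry files use the same `#define NAME NUMBER` line format as domain.h; `parse_domains` and `domain_in_use` are hypothetical helpers written for this post.

```python
import re

# Matches lines such as "#define WHO 260" (assumed registry format).
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w+)\s+(\d+)\b")


def parse_domains(text):
    """Map domain number -> name for every '#define NAME NUMBER' line."""
    domains = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            domains[int(match.group(2))] = match.group(1)
    return domains


def domain_in_use(number, *registry_texts):
    """Return the name that already owns `number`, or None if it is free."""
    for text in registry_texts:
        owner = parse_domains(text).get(number)
        if owner is not None:
            return owner
    return None


if __name__ == "__main__":
    texts = []
    for path in ("/var/lib/pcp/pmns/stdpmid.pcp", "/var/lib/pcp/pmns/stdpmid.local"):
        try:
            with open(path) as handle:
                texts.append(handle.read())
        except OSError:
            pass  # a missing registry file simply contributes no entries
    print(domain_in_use(260, *texts))
```

If the script prints a name instead of None, pick a different domain number for your PMDA.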
Step 3: Define the Metric
We will define a metric named who.login_info in our pmns file. This metric will belong to the who namespace.

    // filepath: /var/lib/pcp/pmdas/who/pmns
    who {
        login_info    WHO:1:1
    }

Here:
- WHO is the domain number macro defined in domain.h.
- 1:1 indicates that this is the first cluster and the first metric in the cluster.
Step 4: Implement the PMDA in Python
Below is the Python implementation of the Who PMDA, pmdawho.python. This script collects the login information of the users on the node and exposes it as a metric.
    # filepath: /var/lib/pcp/pmdas/who/pmdawho.python
    """
    'Who' Performance Metrics Domain Agent (PMDA) for PCP.
    """
    import subprocess

    from pcp.pmda import PMDA, pmdaMetric
    from pcp.pmapi import pmUnits
    import cpmapi as c_api


    class WhoPMDA(PMDA):
        """Performance Metrics Domain Agent for collecting users and their stats."""

        CLUSTER_WHO = 1             # Cluster number for 'who' metrics
        METRIC_ITEM_LOGIN_INFO = 1  # Metric number for login info

        def __init__(self, name="who", domain=260, user="root", logfile="who.log"):
            super().__init__(name, domain, logfile)
            if user:
                self.set_user(user)
            self.connect_pmcd()
            self.register_metrics()
            self.login_info = ""  # Placeholder for login info
            self.set_fetch(self.update_metrics)           # refreshes state before each fetch
            self.set_fetch_callback(self.fetch_callback)  # responds to PMCD fetches

        def register_metrics(self):
            """Registers the metrics."""
            self.add_metric(
                "who.login_info",
                pmdaMetric(
                    self.pmid(self.CLUSTER_WHO, self.METRIC_ITEM_LOGIN_INFO),
                    c_api.PM_TYPE_STRING,
                    c_api.PM_INDOM_NULL,
                    c_api.PM_SEM_INSTANT,
                    pmUnits(0, 0, 0, 0, 0, 0)
                )
            )

        def update_metrics(self):
            """Refreshes the internal state from the output of 'who'."""
            try:
                result = subprocess.run(
                    ["who"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
                )
                self.login_info = result.stdout.decode()
            except Exception as e:
                self.log(f"Error fetching users: {e}")

        def fetch_callback(self, cluster, item, inst):
            """Handles fetch requests for metrics."""
            if cluster == self.CLUSTER_WHO and item == self.METRIC_ITEM_LOGIN_INFO:
                return [self.login_info, 1]
            # Unknown metric: report an invalid PMID rather than an instance error.
            return [c_api.PM_ERR_PMID, 0]


    if __name__ == "__main__":
        WhoPMDA().run()
PMDA APIs Used
- PMDA class: The base class for implementing a custom PMDA. It provides methods to register metrics, handle fetch requests, and communicate with PMCD.
- pmdaMetric: This class is used to define a metric. It includes details such as the metric's type, instance domain, semantics, and units.
- pmUnits: This API is used to define the units of a metric. For example, pmUnits(0, 0, 0, 0, 0, 0) specifies that the metric has no units.
- set_fetch: This method registers a function (update_metrics) that is called at the start of each fetch request, so the PMDA can refresh its internal state before values are returned.
- set_fetch_callback: This method sets a callback function (fetch_callback) that responds to fetch requests from PMCD, once per requested metric.
- add_metric: This method registers a metric with the PMDA. It associates the metric name with its definition (pmdaMetric).
- pmid: This method generates a unique Performance Metric Identifier (PMID) for a metric based on its cluster and item numbers.
- connect_pmcd: This method establishes a connection between the PMDA and PMCD, enabling the PMDA to register itself and communicate with PMCD.
- log: This method is used to log messages from the PMDA, which can be helpful for debugging and monitoring.
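The way the two fetch hooks cooperate can be tried out without PCP installed. The sketch below is a stand-in, not the real API: `FakeWhoAgent` and `simulate_fetch` are hypothetical, the error code is a placeholder for a cpmapi PM_ERR_* value, and it assumes that the function passed to set_fetch runs once per fetch request before the per-metric callback is invoked.

```python
CLUSTER_WHO = 1
ITEM_LOGIN_INFO = 1
ERR_UNKNOWN_METRIC = -1  # placeholder; a real PMDA returns a cpmapi PM_ERR_* code


class FakeWhoAgent:
    """Mimics the two hooks of the Who PMDA without talking to PMCD."""

    def __init__(self):
        self.login_info = ""
        self.refreshes = 0

    def update_metrics(self):
        # In the real PMDA this runs the 'who' command.
        self.refreshes += 1
        self.login_info = f"snapshot {self.refreshes}"

    def fetch_callback(self, cluster, item, inst):
        # Contract: [value, 1] on success, [error_code, 0] on failure.
        if cluster == CLUSTER_WHO and item == ITEM_LOGIN_INFO:
            return [self.login_info, 1]
        return [ERR_UNKNOWN_METRIC, 0]


def simulate_fetch(agent, requests):
    """Refresh once, then call the callback for each (cluster, item) requested."""
    agent.update_metrics()
    return [agent.fetch_callback(cluster, item, None) for cluster, item in requests]
```

Calling `simulate_fetch(FakeWhoAgent(), [(1, 1), (1, 2)])` returns a value for the known metric and an error pair for the unknown one, which mirrors the shape of the return values in fetch_callback above.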
This script:
- Defines a metric who.login_info to monitor users on the node.
- Uses the who command to determine the users currently logged in.
- Refreshes the metric value when a fetch arrives and responds to fetch requests from PMCD.
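The data collection step can also be exercised on its own, outside the PMDA. This is a slightly more defensive variant of the same idea, offered as a sketch: `collect_login_info` is a hypothetical helper, and the timeout and the choice to return an empty string on failure are assumptions made for this example rather than behavior of the PMDA above.

```python
import subprocess


def collect_login_info(command=("who",), timeout=5):
    """Run `command` and return its stdout as text, or "" on any failure."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # avoid blocking a fetch on a hung command
            check=True,       # treat a non-zero exit status as a failure
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing command, a timeout, and a non-zero exit status.
        return ""
    return result.stdout.decode(errors="replace")


if __name__ == "__main__":
    print(collect_login_info())
```

Running this directly prints the same text that the PMDA would expose through who.login_info, which is a convenient way to debug the collection logic before involving PMCD.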
Step 5: Supporting Scripts
Install Script
The Install script is used to install and register the PMDA with PMCD. It sets up the necessary environment and ensures the PMDA is ready to process requests.

    #!/bin/sh
    # filepath: /var/lib/pcp/pmdas/who/Install

    . $PCP_DIR/etc/pcp.env
    . $PCP_SHARE_DIR/lib/pmdaproc.sh

    iam=who
    python_opt=true
    daemon_opt=false

    pmdaSetup
    pmdaInstall
    exit
This script:
- Sources the PCP environment variables.
- Sets up the PMDA using pmdaSetup.
- Registers the PMDA with PMCD using pmdaInstall.
Remove Script
The Remove script is used to unregister and remove the PMDA from PMCD.

    #! /bin/sh
    # filepath: /var/lib/pcp/pmdas/who/Remove
    #
    # Remove the Who PMDA
    #

    . $PCP_DIR/etc/pcp.env
    . $PCP_SHARE_DIR/lib/pmdaproc.sh

    iam=who

    pmdaSetup
    pmdaRemove
    exit
This script:
- Sources the PCP environment variables.
- Unregisters the PMDA using pmdaRemove.
Step 6: Install and Test the PMDA
- Install the PMDA:
  - Place the pmdawho.python, Install, Remove, domain.h, and pmns files in the /var/lib/pcp/pmdas/who/ directory.
  - Make the Install and Remove scripts executable:

        $ chmod +x /var/lib/pcp/pmdas/who/Install
        $ chmod +x /var/lib/pcp/pmdas/who/Remove

  - Run the Install script to register the PMDA:

        $ cd /var/lib/pcp/pmdas/who
        $ ./Install

- Test the PMDA:
  - Use the pmval command to query the who.login_info metric:

        $ pmval who.login_info
        metric:    who.login_info
        host:      pcp_dev
        semantics: instantaneous value
        units:     none
        samples:   all
        "root ttyS0 2025-06-18 00:02
        mohith pts/4 2025-06-20 05:47 (10.211.9.162)
        root pts/3 2025-06-20 05:42 (100.107.197.128)"

  - The output should display the login information of all logged-in users.
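The same check can be scripted from Python, for example as part of a smoke test after installation. The sketch below shells out to pminfo with its -f option, which fetches and prints current values; `build_query_command` and `query_metric` are hypothetical helper names, and returning None on failure is a choice made for this example.

```python
import subprocess


def build_query_command(metric, tool="pminfo"):
    """Command line that fetches the current value of `metric`."""
    return [tool, "-f", metric]


def query_metric(metric, tool="pminfo", timeout=10):
    """Return the tool's stdout as text, or None if it is missing or fails."""
    try:
        result = subprocess.run(
            build_query_command(metric, tool),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.decode(errors="replace")


if __name__ == "__main__":
    output = query_metric("who.login_info")
    print(output if output is not None else "metric not available")
```

A None result usually means that PCP is not installed, PMCD is not running, or the PMDA has not been registered yet, so rerunning the Install script is a reasonable first step.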
Conclusion
This blog demonstrated how to create, install, and test a simple PMDA to monitor active users on the node. By following these steps, you can extend PCP to monitor other system-specific metrics tailored to your needs.