Introduction

Performance Co-Pilot (PCP) is a versatile and extensible tool for gathering performance diagnostics of a system. Due to its many features and strong upstream support, it is widely used to monitor system health in production environments. However, there are scenarios where PCP does not collect the metrics you need, requiring system administrators to extend PCP by writing a Performance Metric Domain Agent (PMDA).

PMDAs are responsible for gathering performance data and keeping it updated for requests from the Performance Metric Collection Daemon (PMCD).

When Do You Need a New PMDA?

A new PMDA is required when the metric you need isn’t available through existing PMDAs. This may happen in the following cases:

  • PCP is still evolving, and the performance metric you need may not yet exist.
  • The metric is specific to your requirements, such as monitoring a custom database or application.
  • You have developed an application or module that you want to monitor through PCP.

Where to Begin?

Comprehensive documentation about PCP and its development process is available in the Programming Performance Co-Pilot guide.

This blog simplifies the process and provides a concise guide to writing a PMDA, focusing on Python-based PMDAs. By the end of this blog, you’ll have a working PMDA that monitors active users on a node.

Types of PMDAs

Depending on their architecture and use case, PMDAs can be categorized into the following types:

DSO PMDA

DSO (Dynamic Shared Object) PMDAs are dynamically loadable shared libraries integrated into the PMCD process. These PMDAs are lightweight and efficient but require robust error handling to avoid compromising PMCD’s stability. They are commonly used for monitoring core system resources like CPU usage, memory utilization, and I/O activity.

Daemon PMDA

Daemon PMDAs run as standalone processes and communicate with PMCD via IPC mechanisms such as sockets or pipes. This separation ensures that issues in the PMDA do not affect PMCD’s stability. Daemon PMDAs are ideal for collecting metrics from external or third-party sources, such as databases or cloud services, though they introduce slight performance overhead.

Caching PMDA

Caching PMDAs use a caching mechanism to store previously fetched metric values, reducing the load on the monitored system. These PMDAs are suitable for metrics that change infrequently but may return slightly stale data due to the reliance on cached values.
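The caching idea can be illustrated with a small time-to-live (TTL) cache. This is only a sketch of the general technique, not code from PCP: the CachedValue class, its parameters, and the injectable clock are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional


class CachedValue:
    """Caches one metric value and refreshes it after ttl_seconds (hypothetical helper)."""

    def __init__(self, collect: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._collect = collect      # expensive function that reads the real value
        self._ttl = ttl_seconds      # how long a cached value stays valid
        self._clock = clock          # injectable so the cache can be tested
        self._value: Optional[str] = None
        self._stamp: Optional[float] = None

    def get(self) -> str:
        """Return the cached value, collecting a fresh one only when it has expired."""
        now = self._clock()
        expired = self._stamp is None or (now - self._stamp) >= self._ttl
        if expired:
            self._value = self._collect()
            self._stamp = now
        return self._value
```

A larger TTL reduces load on the monitored system but increases how stale a returned value can be, which is the trade-off described above.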

Language-Based PMDAs

PMDAs can be developed in C, Python, or Perl. C-based PMDAs offer high performance and are suitable for low-level metrics. Python-based PMDAs, which are the focus of this blog, prioritize simplicity and rapid development. Perl-based PMDAs are often used in legacy systems or environments where Perl is already integrated.

PMNS: Performance Metrics Namespace

The Performance Metrics Namespace (PMNS) provides a hierarchical structure for organizing metrics, making it easier to collect, query, and manage performance data.

Defining a New PMNS

  1. Create the PMNS File: A plain text file specifying the namespace structure and linking it to the metrics your PMDA provides.
  2. Define Metric Paths: Each metric is identified by a path composed of elements separated by periods (.).

Example PMNS Definition:

sample {
    cluster1 
    cluster2
}

/* Metric definition format: */
/* metric_name    DOMAIN_NUMBER:CLUSTER_NUMBER:METRIC_NUMBER */

sample.cluster1 {
  metric1     SAMPLE:1:1
  metric2     SAMPLE:1:2
}

sample.cluster2 {
  metric3     SAMPLE:2:1
}

Explanation:

  1. sample is the root namespace, with cluster1 and cluster2 as subclusters.
  2. Metrics like metric1, metric2, and metric3 are defined under the clusters.
  3. Each metric is defined as metric_name DOMAIN_NUMBER:CLUSTER_NUMBER:METRIC_NUMBER.
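To see how the dotted metric paths follow from the namespace blocks, here is a minimal Python sketch that walks the example definition and lists every leaf metric with its identifier. The leaf_metrics function is a hypothetical helper written for this post; it handles only the simple layout shown here and is not how PCP itself parses a PMNS.

```python
import re

PMNS_TEXT = """
sample {
    cluster1
    cluster2
}

sample.cluster1 {
    metric1     SAMPLE:1:1
    metric2     SAMPLE:1:2
}

sample.cluster2 {
    metric3     SAMPLE:2:1
}
"""


def leaf_metrics(pmns_text):
    """Return {full_metric_name: (domain, cluster, item)} for every leaf entry."""
    metrics = {}
    prefix = None
    for raw in pmns_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        opened = re.fullmatch(r"([\w.]+)\s*\{", line)
        if opened:
            prefix = opened.group(1)      # e.g. "sample.cluster1"
            continue
        if line == "}":
            prefix = None
            continue
        # Leaf entries carry a DOMAIN:CLUSTER:ITEM triple; plain names are subtrees.
        leaf = re.fullmatch(r"(\w+)\s+(\w+):(\d+):(\d+)", line)
        if leaf and prefix is not None:
            name, domain, cluster, item = leaf.groups()
            metrics[f"{prefix}.{name}"] = (domain, int(cluster), int(item))
    return metrics


for name, pmid in leaf_metrics(PMNS_TEXT).items():
    print(name, pmid)
```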

Instance Domains and Metric Types

In PCP, metrics can be instance-based or non-instance-based:

  • Non-instance-based metrics (also called “singular” metrics) have a single value for the whole system or context. For example, a metric tracking uptime is non-instance-based because it reflects a system-wide value.

  • Instance-based metrics, on the other hand, report separate values for each instance within a group. For example, a metric measuring connection latency would be instance-based, as it varies per connection.

We will explore instance-based metrics and the corresponding PMDA implementations in the upcoming parts of this series.

What is an Instance Domain?

An instance domain defines the set of possible instances for a metric. For instance-based metrics, the instance domain describes all the entities (like disks, CPUs, or network interfaces) for which the metric is reported. Each instance is identified by a unique instance identifier and a name. For non-instance-based metrics, the instance domain is set to PM_INDOM_NULL, indicating there are no instances, just a single value.

Understanding instance domains is important when designing your PMDA, as it determines how your metrics are structured and queried.
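The difference between the two kinds of metrics can be modelled in a few lines of plain Python. This is a conceptual sketch only: MetricSketch and the metric names are hypothetical, and None stands in for PM_INDOM_NULL rather than using the real PCP constants.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MetricSketch:
    """Toy model of a metric; instances=None plays the role of PM_INDOM_NULL."""
    name: str
    instances: Optional[Dict[int, str]] = None   # instance id -> instance name
    values: Dict[Optional[int], float] = field(default_factory=dict)

    def fetch(self) -> List[Tuple[Optional[str], float]]:
        """Return (instance_name, value) pairs; a singular metric yields one pair."""
        if self.instances is None:
            return [(None, self.values[None])]
        return [(self.instances[i], self.values[i]) for i in sorted(self.instances)]


# Singular metric: one value for the whole system.
uptime = MetricSketch("example.uptime", values={None: 86400.0})

# Instance-based metric: one value per connection in the instance domain.
latency = MetricSketch(
    "example.connection.latency",
    instances={0: "conn-a", 1: "conn-b"},
    values={0: 1.5, 1: 7.25},
)
```

Calling uptime.fetch() gives a single unnamed value, while latency.fetch() gives one named value per instance.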

Prerequisites

Before you begin, ensure you have the following:

  • Performance Co-Pilot (PCP) installed and running on your system.
    You can install the basic PCP packages with:

    sudo yum install pcp-zeroconf
  • Python 3.x installed.
    Check your version with:

    python3 --version
  • PCP Python bindings installed.
    Install them using:

    sudo yum install python3-pcp
  • Root or sudo access to install and manage PMDAs.

Tip: For more information on installing or configuring PCP, visit the PCP documentation.

Example PMDA: Checking active users information

Step 1: Directory Structure

The PMDA files need to be placed in a specific directory under /var/lib/pcp/pmdas/. For this example, the directory structure should look like this:

/var/lib/pcp/pmdas/who/
    ├── pmdawho.python
    ├── Install
    ├── Remove
    ├── domain.h
    └── pmns
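Before running the Install script, it can help to confirm that every file in this layout is present. The following is a minimal sketch using only the standard library; missing_pmda_files and EXPECTED_FILES are hypothetical names, and the file list simply mirrors the tree above.

```python
from pathlib import Path

# File names taken from the directory layout in this post.
EXPECTED_FILES = ("pmdawho.python", "Install", "Remove", "domain.h", "pmns")


def missing_pmda_files(pmda_dir, expected=EXPECTED_FILES):
    """Return the expected file names that do not exist as files in pmda_dir."""
    root = Path(pmda_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_pmda_files("/var/lib/pcp/pmdas/who")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All PMDA files are in place.")
```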

Step 2: Define the domain number

The domain.h file defines a unique domain number for the PMDA. This domain number is used to identify the PMDA within PCP.

// filepath: /var/lib/pcp/pmdas/who/domain.h
#define WHO 260

Here:

  • WHO is assigned the domain number 260.
  • Ensure this domain number is unique and does not conflict with other PMDAs.

How to Decide the Domain Number

The domain number must be unique across all PMDAs in the system. To ensure this, you can refer to the following registry files:

  1. /var/lib/pcp/pmns/stdpmid.pcp: This file contains the standard domain numbers assigned to existing PMDAs. Check this file to avoid conflicts with already registered PMDAs.

  2. /var/lib/pcp/pmns/stdpmid.local: This file is used for locally assigned domain numbers. If you are assigning a custom domain number, you can add it to this file to keep track of your local PMDAs.

By consulting these files, you can ensure that the domain number you choose for your PMDA does not conflict with existing ones. For example, in this blog, we have chosen 260 for the Who PMDA, which should be verified against these files before use.
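The uniqueness check can also be scripted. The sketch below assumes that registry entries look like either "NAME number" or "#define NAME number"; verify that assumption against the files on your system. The function names and the sample entries are hypothetical and are not copied from a real registry.

```python
import re

# Illustrative registry text; the entries are made up for this example.
SAMPLE_REGISTRY = """\
# illustrative entries only
EXAMPLE_ONE     400
#define EXAMPLE_TWO 401
"""

_ENTRY = re.compile(r"^(?:#define\s+)?([A-Za-z_]\w*)\s+(\d+)\s*$")


def used_domains(text):
    """Map domain number -> PMDA name for every line that looks like an entry."""
    found = {}
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            found[int(match.group(2))] = match.group(1)
    return found


def domain_conflict(number, *registry_texts):
    """Return the name already using this number, or None if it appears free."""
    for text in registry_texts:
        name = used_domains(text).get(number)
        if name is not None:
            return name
    return None


print(domain_conflict(260, SAMPLE_REGISTRY))   # None: 260 is unused in the sample
print(domain_conflict(401, SAMPLE_REGISTRY))   # EXAMPLE_TWO
```

On a real system, you would pass the contents of stdpmid.pcp and stdpmid.local (for example via pathlib.Path.read_text) instead of the sample string.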

Step 3: Define the Metric

We will define a metric named who.login_info in our pmns file. This metric will belong to the who namespace.

/* filepath: /var/lib/pcp/pmdas/who/pmns */
who {
    login_info    WHO:1:1
}

Here:

  • WHO is the PMDA domain number, as defined in domain.h.
  • 1:1 indicates cluster 1 and item 1, that is, the first metric in the first cluster.
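The three numbers in WHO:1:1 are combined into a single integer, the Performance Metric Identifier (PMID). The sketch below assumes the common PCP layout of 9 bits for the domain, 12 bits for the cluster, and 10 bits for the item; build_pmid and split_pmid are hypothetical helpers for illustration, not PCP API calls.

```python
# Assumed field widths: domain 9 bits, cluster 12 bits, item 10 bits.
DOMAIN_BITS, CLUSTER_BITS, ITEM_BITS = 9, 12, 10


def build_pmid(domain, cluster, item):
    """Pack domain, cluster and item into one integer identifier."""
    for value, bits, label in (
        (domain, DOMAIN_BITS, "domain"),
        (cluster, CLUSTER_BITS, "cluster"),
        (item, ITEM_BITS, "item"),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{label} {value} does not fit in {bits} bits")
    return (domain << (CLUSTER_BITS + ITEM_BITS)) | (cluster << ITEM_BITS) | item


def split_pmid(pmid):
    """Unpack an identifier into (domain, cluster, item)."""
    domain = (pmid >> (CLUSTER_BITS + ITEM_BITS)) & ((1 << DOMAIN_BITS) - 1)
    cluster = (pmid >> ITEM_BITS) & ((1 << CLUSTER_BITS) - 1)
    item = pmid & ((1 << ITEM_BITS) - 1)
    return domain, cluster, item


pmid = build_pmid(260, 1, 1)
print(pmid, split_pmid(pmid))
```

Under this layout the domain number must be below 512, which is one more reason to check the chosen value before use.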

Step 4: Implement the PMDA in Python

Below is the Python implementation of the Who PMDA, pmdawho.python. This script collects user login information and exposes it as a metric.

# filepath: /var/lib/pcp/pmdas/who/pmdawho.python
"""
'Who' Performance Metrics Domain Agent (PMDA) for PCP.
"""

from collections import defaultdict
import subprocess
from pcp.pmda import PMDA, pmdaMetric
from pcp.pmapi import pmUnits
import cpmapi as c_api

class WhoPMDA(PMDA):
    """Performance Metrics Domain Agent for collecting users and their stats."""

    CLUSTER_WHO = 1 # Cluster number for 'who' metrics
    METRIC_ITEM_LOGIN_INFO = 1      # Metric number for login info

    def __init__(self, name="who", domain=260, user="root", logfile="who.log"):
        super().__init__(name, domain, logfile)

        if user:
            self.set_user(user)

        self.connect_pmcd()
        self.register_metrics()

        self.login_info = "" # Placeholder for login info
        self.set_fetch(self.update_metrics)   # refreshes state once per fetch request
        self.set_fetch_callback(self.fetch_callback)    # returns the value for each requested metric

    def register_metrics(self):
        """Registers the metrics."""

        self.add_metric(
            "who.login_info",
            pmdaMetric(
                self.pmid(self.CLUSTER_WHO, self.METRIC_ITEM_LOGIN_INFO),
                c_api.PM_TYPE_STRING,
                c_api.PM_INDOM_NULL,
                c_api.PM_SEM_INSTANT,
                pmUnits(0, 0, 0, 0, 0, 0)
            )
        )

    def update_metrics(self):
        """Refreshes the cached login info; called once at the start of each fetch request."""
        try:
            result = subprocess.run(
                ["who"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
            )
            self.login_info = result.stdout.decode()
        except Exception as e:
            self.log(f"Error fetching users: {e}")

    def fetch_callback(self, cluster, item, inst):
        """Handles fetch requests for metrics."""
        if cluster == self.CLUSTER_WHO and item == self.METRIC_ITEM_LOGIN_INFO:
            return [self.login_info, 1]
        return [c_api.PM_ERR_PMID, 0]


if __name__ == "__main__":
    WhoPMDA().run()
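One hardening step worth considering is a timeout on the external command, so that a stalled who cannot block the PMDA while it answers a fetch. The helper below is a sketch under that assumption; run_command and its default timeout are hypothetical and not part of the PMDA above.

```python
import subprocess
import sys


def run_command(argv, timeout=5.0):
    """Return the command's stdout as text, or None if it fails or times out."""
    try:
        result = subprocess.run(
            argv,
            capture_output=True,   # collect stdout and stderr
            text=True,             # decode output to str
            timeout=timeout,       # give up if the command hangs
            check=True,            # treat a non-zero exit status as a failure
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a non-zero exit status, and a timeout.
        return None
    return result.stdout


# Demonstrated with the Python interpreter so the example runs anywhere.
print(run_command([sys.executable, "-c", "print('ok')"]))
```

Inside update_metrics, the same helper could be called as run_command(["who"]), keeping the previous value whenever it returns None.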
 

PMDA APIs Used

  1. PMDA Class:
    • The PMDA class is the base class for implementing a custom PMDA. It provides methods to register metrics, handle fetch requests, and communicate with PMCD.
  2. pmdaMetric:
    • This class is used to define a metric. It includes details such as the metric’s type, instance domain, semantics, and units.
  3. pmUnits:
    • This API is used to define the units of a metric. For example, pmUnits(0, 0, 0, 0, 0, 0) specifies that the metric has no units.
  4. set_fetch:
    • This method registers a function (update_metrics) that is called once at the start of each fetch request, before any per-metric callbacks. The PMDA uses it to refresh its internal state.
  5. set_fetch_callback:
    • This method sets a callback function (fetch_callback) that responds to fetch requests from PMCD.
  6. add_metric:
    • This method registers a metric with the PMDA. It associates the metric name with its definition (pmdaMetric).
  7. pmid:
    • This method generates a unique Performance Metric Identifier (PMID) for a metric based on its cluster and item numbers.
  8. connect_pmcd:
    • This method establishes a connection between the PMDA and PMCD, enabling the PMDA to register itself and communicate with PMCD.
  9. log:
    • This method is used to log messages from the PMDA, which can be helpful for debugging and monitoring.
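The relationship between set_fetch and set_fetch_callback is easier to see in a toy model. The class below is a simplified stand-in written for this post, not the real pcp.pmda.PMDA class: it assumes the refresh function runs once per fetch request and the callback runs once per requested metric. All names and the UNKNOWN_METRIC placeholder are hypothetical.

```python
UNKNOWN_METRIC = -1   # placeholder only, not the real PM_ERR_PMID value


class FetchFlowSketch:
    """Toy model of the fetch path: one refresh, then one callback per metric."""

    def __init__(self):
        self._prefetch = None
        self._callback = None

    def set_fetch(self, func):
        self._prefetch = func

    def set_fetch_callback(self, func):
        self._callback = func

    def handle_fetch(self, requested):
        """requested is a list of (cluster, item) pairs from one fetch request."""
        if self._prefetch is not None:
            self._prefetch()                     # refresh state once
        return [self._callback(cluster, item, None) for cluster, item in requested]


state = {"refreshes": 0, "login_info": ""}


def refresh():
    state["refreshes"] += 1
    state["login_info"] = f"snapshot {state['refreshes']}"


def callback(cluster, item, inst):
    if (cluster, item) == (1, 1):
        return [state["login_info"], 1]
    return [UNKNOWN_METRIC, 0]


agent = FetchFlowSketch()
agent.set_fetch(refresh)
agent.set_fetch_callback(callback)
first = agent.handle_fetch([(1, 1), (1, 1)])     # two metrics, one refresh
```

Both values in first come from the same snapshot, which is why expensive collection work belongs in the refresh function rather than in the callback.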

The pmdawho.python script:

  • Defines a metric who.login_info to monitor users on the node.
  • Uses the who command to list the users currently logged in.
  • Refreshes the metric value at the start of each fetch request and returns it to PMCD through the fetch callback.
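The metric returns the raw text printed by who. If you later want structured values, such as a count of distinct users, the text has to be parsed. The sketch below assumes the column layout shown in the sample output in this post (user, terminal, date, time, optional host); who output varies by platform and locale, and parse_who is a hypothetical helper.

```python
WHO_OUTPUT = """\
root     ttyS0        2025-06-18 00:02
mohith   pts/4        2025-06-20 05:47 (10.211.9.162)
root     pts/3        2025-06-20 05:42 (100.107.197.128)
"""


def parse_who(text):
    """Turn who-style output into a list of session dictionaries."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                                  # skip blank or short lines
        host = fields[4].strip("()") if len(fields) > 4 else ""
        sessions.append({
            "user": fields[0],
            "tty": fields[1],
            "login": f"{fields[2]} {fields[3]}",
            "host": host,
        })
    return sessions


sessions = parse_who(WHO_OUTPUT)
distinct_users = sorted({s["user"] for s in sessions})
print(len(sessions), "sessions for users:", ", ".join(distinct_users))
```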

Step 5: Supporting Scripts

Install Script

The Install script is used to install and register the PMDA with PMCD. It sets up the necessary environment and ensures the PMDA is ready to process requests.

#!/bin/sh
# filepath: /var/lib/pcp/pmdas/who/Install

. $PCP_DIR/etc/pcp.env
. $PCP_SHARE_DIR/lib/pmdaproc.sh

iam=who
python_opt=true
daemon_opt=false

pmdaSetup
pmdaInstall
exit

This script:

  • Sources the PCP environment variables.
  • Sets up the PMDA using pmdaSetup.
  • Registers the PMDA with PMCD using pmdaInstall.

Remove Script

The Remove script is used to unregister and remove the PMDA from PMCD.

#! /bin/sh
# filepath: /var/lib/pcp/pmdas/who/Remove
#
# Remove the Who PMDA
#

. $PCP_DIR/etc/pcp.env
. $PCP_SHARE_DIR/lib/pmdaproc.sh

iam=who

pmdaSetup
pmdaRemove

exit

This script:

  • Sources the PCP environment variables.
  • Unregisters the PMDA using pmdaRemove.

Step 6: Install and Test the PMDA

  1. Install the PMDA:
    • Place the pmdawho.python, Install, Remove, domain.h, and pmns files in the /var/lib/pcp/pmdas/who/ directory.

    • Make the Install and Remove scripts executable:

      $ chmod +x /var/lib/pcp/pmdas/who/Install
      $ chmod +x /var/lib/pcp/pmdas/who/Remove
    • Run the Install script to register the PMDA:

      $ cd /var/lib/pcp/pmdas/who
      $ ./Install
  2. Test the PMDA:
    • Use the pmval command to query the who.login_info metric:

      $ pmval who.login_info
      
      metric:    who.login_info
      host:      pcp_dev
      semantics: instantaneous value
      units:     none
      samples:   all
      "root     ttyS0        2025-06-18 00:02
      mohith   pts/4        2025-06-20 05:47 (10.211.9.162)
      root     pts/3        2025-06-20 05:42 (100.107.197.128)"
    • The output should display login information of all the users.
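For a scripted check, pminfo -f fetches the current value once and exits, which makes it easy to call from Python. The sketch below is an assumption about how you might wrap it; pminfo_command and fetch_with_pminfo are hypothetical helpers, and the function returns None when pminfo is not installed or the query fails.

```python
import shutil
import subprocess


def pminfo_command(metric):
    """Build the argument list for a one-shot value fetch."""
    return ["pminfo", "-f", metric]


def fetch_with_pminfo(metric, timeout=10.0):
    """Return pminfo's output for the metric, or None if it cannot be fetched."""
    if shutil.which("pminfo") is None:
        return None                       # PCP client tools are not on PATH
    try:
        result = subprocess.run(
            pminfo_command(metric),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


print(pminfo_command("who.login_info"))
```

A monitoring script could call fetch_with_pminfo("who.login_info") after installation and treat None as a sign that the PMDA is not registered or PMCD is not running.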

Conclusion

This blog demonstrated how to create, install, and test a simple PMDA to monitor active users on the node. By following these steps, you can extend PCP to monitor other system-specific metrics tailored to your needs.