Oracle Solaris 11.4 introduced Solaris Analytics, a feature that helps you analyze issues on Oracle Solaris systems and monitor various system resources. Its two components are StatsStore and the Analytics WebUI. You can read more about these components in the following blogs:

Solaris Analytics - An Overview

what-is-this-bui-thing-anyway

With the release of Oracle Solaris 11.4 SRU 19, you can configure StatsStore to generate an FMA alert when the value of a stat identified by an SSID crosses a threshold. When the stat value falls below the threshold, StatsStore clears the FMA alert automatically. For more information on FMA, refer to the Oracle Solaris 11.4 FMA documentation.

As an example, consider monitoring the percentage capacity utilization of every zpool on a server. For StatsStore to monitor all zpool resources, the following JSON file must be present in /usr/lib/sstore/metadata/json/thresholds:

root@test# cat zpool-usage.json
{
    "$schema": "//:threshold",
    "copyright": "Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.",
    "description": "Default threshold mapping for zpool usage",
    "id": "zpool-usage",
    "query-interval": 10,
    "ssid-threshold-map-list": [
        {
            "ssid": "//:class.zpool//:*//:stat.capacity",
            "ssid-query-interval": 300,
            "ssid-threshold-list": [
                "90.0"
            ]
        }
    ]
}

Refer to ssid-metadata(7) for details on the JSON file format.

What does this actually do?

StatsStore monitors the stat //:stat.capacity for every zpool resource matching the SSID //:class.zpool//:* and generates an FMA alert when a zpool's usage crosses 90% of its capacity. To demonstrate, I created a zpool named 'tmp_pool' and filled it to 95% capacity. The generated alert looks like this:

SUNW-MSG-ID: SUNOS-8000-N9, TYPE: Alert, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Apr  1 01:10:18 PDT 2020
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: test
SOURCE: software-diagnosis, REV: 0.2
EVENT-ID: 0f09e785-be6e-45df-960e-d37ba8912ba6
DESC: //:class.zpool//:res.name/tmp_pool//:stat.capacity has crossed threshold 90.000000.
AUTO-RESPONSE: Current value is 95.000000.
IMPACT: Impact may depend on workload or other conditions, please review based on your policies.
REC-ACTION: Please address open issues. Please refer to the associated reference document at http://support.oracle.com/msg/SUNOS-8000-N9 for the latest service procedures and policies regarding this diagnosis.
FRU-LOCATION:
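
To make the raise-and-clear behavior concrete, here is a minimal shell sketch of the decision made for each new sample. It is not StatsStore's implementation. The function name next_action is made up for illustration, a POSIX awk is assumed, and I am assuming that "crosses over" means strictly greater than the threshold.

```shell
#!/bin/sh
# Sketch of the raise/clear decision; NOT StatsStore's actual code.
# Assumption: an alert is raised only when the value is strictly
# greater than the threshold.
#
# next_action PREVIOUS_STATE VALUE THRESHOLD
#   PREVIOUS_STATE is "ok" or "alert".
#   Prints "raise", "clear", or "none".
next_action() {
    awk -v prev="$1" -v val="$2" -v thr="$3" 'BEGIN {
        over = (val + 0 > thr + 0)      # force numeric comparison
        if (over && prev == "ok")
            print "raise"               # value just crossed the threshold
        else if (!over && prev == "alert")
            print "clear"               # value fell back below the threshold
        else
            print "none"                # no state change
    }'
}

next_action ok 95.0 90.0       # pool filled to 95%: prints "raise"
next_action alert 95.0 90.0    # still above 90%: prints "none"
next_action alert 85.0 90.0    # data removed: prints "clear"
```

The three calls mirror the tmp_pool example: the alert is raised once at 95%, stays open while usage remains above 90%, and is cleared when usage drops to 85%.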

You will also see this alert in the WebUI, under the "Faults, Alerts & Activity" tab.

If I remove some data from tmp_pool so that utilization falls below the 90% threshold, StatsStore resets the FMA alert.

Zpool usage threshold monitoring is enabled by default with this feature. You can enable threshold monitoring for other SSIDs by creating JSON files similar to the zpool-usage.json file shown above. Refer to ssid-metadata(7) for another example thresholds JSON file. Note, however, that SSIDs containing //:op are not yet supported by this feature. Restart the sstore:default SMF service to start monitoring the SSIDs specified in the JSON files.
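
As a rough sketch of what creating such a file could look like, the helper below writes a thresholds file with the same fields as zpool-usage.json. The function name write_threshold_file, the copyright text, and the SSID //:class.example//:*//:stat.example are placeholders of mine, not real StatsStore names. I keep every field from the original file because I have not checked which ones the schema requires. The file is written to a scratch directory, so nothing on the system is changed.

```shell
#!/bin/sh
# Sketch: generate a thresholds file modeled on zpool-usage.json.
# write_threshold_file is a hypothetical helper, not an Oracle tool.
#
# write_threshold_file ID SSID THRESHOLD OUTDIR
#   Writes OUTDIR/ID.json and prints its path.
write_threshold_file() {
    id=$1 ssid=$2 threshold=$3 outdir=$4
    cat > "$outdir/$id.json" <<EOF
{
    "\$schema": "//:threshold",
    "copyright": "Placeholder copyright notice",
    "description": "Threshold mapping for $id",
    "id": "$id",
    "query-interval": 10,
    "ssid-threshold-map-list": [
        {
            "ssid": "$ssid",
            "ssid-query-interval": 300,
            "ssid-threshold-list": [
                "$threshold"
            ]
        }
    ]
}
EOF
    echo "$outdir/$id.json"
}

# Example with a placeholder SSID; substitute a real one from your system.
scratch=$(mktemp -d)
write_threshold_file example-usage "//:class.example//:*//:stat.example" 80.0 "$scratch"

# After reviewing the file, copy it into the thresholds directory and run:
#   svcadm restart sstore:default
```

The interval values are copied unchanged from zpool-usage.json; check ssid-metadata(7) before adjusting them.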

If you don't wish to use this feature, you can turn it off by setting the SMF property config/threshold-alerts to false. Then refresh the SMF service configuration and restart the sstore service (sstore:default); the service starts up with the thresholds feature disabled.
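
The steps above might look like the following with the standard SMF tools. This is a sketch, not a verified procedure: I am assuming config/threshold-alerts is a boolean property, and the run and disable_threshold_alerts helpers are my own. By default the script only prints the commands, so you can review them first.

```shell
#!/bin/sh
# Sketch: turn off StatsStore threshold alerts.
# Assumption: config/threshold-alerts is a boolean property of sstore:default.
# With DRY_RUN=1 (the default here) commands are printed, not executed.

FMRI="sstore:default"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_threshold_alerts() {
    # 1. Set the property, 2. refresh the configuration, 3. restart the service.
    run svccfg -s "$FMRI" setprop config/threshold-alerts = boolean: false &&
    run svcadm refresh "$FMRI" &&
    run svcadm restart "$FMRI"
}

disable_threshold_alerts
```

Run it as root with DRY_RUN=0 to apply the change. Setting the property back to true and repeating the refresh and restart should turn the feature on again.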