
News, tips, partners, and perspectives for the Oracle Solaris operating system

StatsStore threshold alerts

Oracle Solaris 11.4 introduced Solaris Analytics, a feature that helps you analyze issues on Oracle Solaris systems and monitor various system resources. The two components of the Analytics feature are StatsStore and the Analytics WebUI. You can read more about these components in the following blogs:

Solaris Analytics - An Overview

what-is-this-bui-thing-anyway

Starting with Oracle Solaris 11.4 SRU 19, you can configure StatsStore to generate an FMA alert when the stat value corresponding to an SSID crosses a threshold value. When the stat value falls back below the threshold, StatsStore automatically clears the FMA alert. For more information on FMA, refer to the Oracle Solaris 11.4 FMA documentation.

As an example, consider monitoring the percentage capacity utilization of all the zpools on a server. To let StatsStore monitor all the zpool resources, the following JSON file must be present in /usr/lib/sstore/metadata/json/thresholds:

root@test# cat zpool-usage.json
{
    "$schema": "//:threshold",
    "copyright": "Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.",
    "description": "Default threshold mapping for zpool usage",
    "id": "zpool-usage",
    "query-interval": 10,
    "ssid-threshold-map-list": [
        {
            "ssid": "//:class.zpool//:*//:stat.capacity",
            "ssid-query-interval": 300,
            "ssid-threshold-list": [
                "90.0"
            ]
        }
    ]
}

Refer to ssid-metadata(7) for details regarding the JSON file format.

What does this actually do?

StatsStore monitors the stat //:stat.capacity for all the zpool resources that match the SSID //:class.zpool//:* and generates an FMA alert if a zpool's utilization crosses 90% of its allocated capacity. Here I created a zpool 'tmp_pool' and filled it to 95% capacity. The alert that is generated looks like this:
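The raise-and-clear behaviour described above can be sketched as a small state machine. This is only an illustration of the idea, not StatsStore's actual implementation: the class name ThresholdMonitor and the method observe are hypothetical, and the handling of a value exactly equal to the threshold (no change here) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMonitor:
    """Tracks whether an alert is currently raised for one stat (hypothetical helper)."""
    threshold: float
    alert_active: bool = False

    def observe(self, value: float) -> str:
        """Return 'raise', 'clear', or 'none' for a newly sampled stat value."""
        if value > self.threshold and not self.alert_active:
            # The stat crossed over the threshold: an alert would be generated.
            self.alert_active = True
            return "raise"
        if value < self.threshold and self.alert_active:
            # The stat fell back below the threshold: the alert would be cleared.
            self.alert_active = False
            return "clear"
        # No crossing since the previous sample (equality is treated as no change,
        # which is an assumption of this sketch).
        return "none"


monitor = ThresholdMonitor(threshold=90.0)
for sample in (85.0, 95.0, 96.0, 88.0):
    print(sample, monitor.observe(sample))
```

Keeping the alert state per stat means a pool that stays above the threshold produces one alert rather than one per sample.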

SUNW-MSG-ID: SUNOS-8000-N9, TYPE: Alert, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Apr  1 01:10:18 PDT 2020
PLATFORM: VirtualBox, CSN: 0, HOSTNAME: test
SOURCE: software-diagnosis, REV: 0.2
EVENT-ID: 0f09e785-be6e-45df-960e-d37ba8912ba6
DESC: //:class.zpool//:res.name/tmp_pool//:stat.capacity has crossed threshold 90.000000.
AUTO-RESPONSE: Current value is 95.000000.
IMPACT: Impact may depend on workload or other conditions, please review based on your policies.
REC-ACTION: Please address open issues. Please refer to the associated reference document at http://support.oracle.com/msg/SUNOS-8000-N9 for the latest service procedures and policies regarding this diagnosis.
FRU-LOCATION:

The alert also appears in the WebUI under the "Faults, Alerts & Activity" tab.

If I remove some data from tmp_pool so that utilization falls below the 90% threshold, StatsStore resets the FMA alert.

Zpool usage threshold monitoring is enabled by default with this feature. You can enable threshold monitoring for other SSIDs by creating JSON files similar to the zpool-usage.json file shown above. Refer to ssid-metadata(7) for another example thresholds JSON file. Note, however, that SSIDs containing //:op are not yet supported by this feature. Restart the sstore:default SMF service to start monitoring the SSIDs specified in the JSON files.
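As a rough sketch, a thresholds file for another SSID could be generated with a few lines of Python that mirror the layout of zpool-usage.json. The helper name build_threshold_file, the file id, and the SSID //:class.cpu//:stat.usage are illustrative assumptions, so check that the SSID exists on your system and consult ssid-metadata(7) for the fields that are actually required.

```python
import json


def build_threshold_file(file_id, description, ssid, thresholds,
                         ssid_query_interval=300, query_interval=10):
    """Build a dict laid out like zpool-usage.json (hypothetical helper)."""
    if "//:op" in ssid:
        # The post notes that SSIDs containing //:op are not yet supported.
        raise ValueError("SSIDs containing //:op are not supported: " + ssid)
    return {
        "$schema": "//:threshold",
        "description": description,
        "id": file_id,
        "query-interval": query_interval,
        "ssid-threshold-map-list": [
            {
                "ssid": ssid,
                "ssid-query-interval": ssid_query_interval,
                # zpool-usage.json stores thresholds as strings such as "90.0".
                "ssid-threshold-list": [str(float(t)) for t in thresholds],
            }
        ],
    }


# Illustrative values only; the SSID and id below are assumptions.
doc = build_threshold_file(
    file_id="cpu-usage",
    description="Example threshold mapping for CPU usage",
    ssid="//:class.cpu//:stat.usage",
    thresholds=[80],
)
print(json.dumps(doc, indent=4))
```

The printed JSON would then be saved in the thresholds directory, followed by a restart of the sstore:default service as described above.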

If you don't wish to use this feature, you can turn it off by setting the SMF property config/threshold-alerts to false. Then refresh the SMF service configuration and restart the sstore service (sstore:default); the service starts up with the thresholds feature disabled.
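The steps above might look like the following sketch, which builds the svccfg and svcadm command lines and, by default, only prints them. It assumes that the property is of type boolean and is set on the sstore:default instance; the function names are hypothetical, and the commands need to run with sufficient privileges on the Solaris host.

```python
import subprocess

SSTORE_FMRI = "sstore:default"  # service instance named in the post


def disable_threshold_alert_commands(fmri=SSTORE_FMRI):
    """Return the command lines that turn threshold alerts off (hypothetical helper)."""
    return [
        # Assumption: config/threshold-alerts is a boolean property on the instance.
        ["svccfg", "-s", fmri, "setprop", "config/threshold-alerts", "=", "boolean:", "false"],
        ["svcadm", "refresh", fmri],
        ["svcadm", "restart", fmri],
    ]


def apply_commands(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run: shows what would be executed without touching the system.
apply_commands(disable_threshold_alert_commands())
```

Setting the property back to true and repeating the refresh and restart should re-enable the feature.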

Comments ( 3 )
  • Fritz Kraus Wednesday, April 8, 2020
    very nice feature! I used to script the zpool utilization including alarming, but to see this as an alert in Analytics is a good idea.
  • DavidH Sunday, April 12, 2020
    How do we gateway/route the FMA alerts through SNMPv3 Traps?

    How do we gateway StatsStore data through NetSNMP?
  • Pramod Rao Thursday, April 16, 2020
    Yes, this should be possible. Please refer to the documentation - Managing Faults, Defects and Alerts in Oracle Solaris 11.4 @

    https://docs.oracle.com/cd/E37838_01/html/E61036/gliqr.html#scrolltoc

    Also refer to the manual snmpd.conf(5) - configuration file for the Net-SNMP SNMP agent which provides the details of SNMPv3 configuration.