The FMA Triad: Topology, Telemetry & Diagnosis Rules - Part 2
By user9148476 on Apr 29, 2008
As a reminder, the intention of this series is to illustrate how topology, telemetry, and diagnosis rules fit together, where they must agree, and - as a teaser for the last installment - what problems arise when they don't agree.
Part 2 - Telemetry
telemetry (tə-lĕm'ĭ-trē): the science and technology of automatic measurement and transmission of data by wire, radio, or other means from remote sources, as from space vehicles, to receiving stations for recording and analysis.
Ok, so FMA may not be receiving information from space (yet :). But error detectors in a system - whether they be hardware, sensors, software, or firmware - can provide FMA with information about a problem detected in the system. The telemetry given to FMA takes the form of error reports - or ereports.
All FMA ereports are defined in the FMA events registry (see related blog). The ereport class name and content convey details of the error to a diagnosis engine. The ereport also represents an agreement between a provider of telemetry (error detector) and the consumer of that telemetry (diagnosis engine). There are lots and lots of ereports, and each has its own specific content, as different subsystems require different information. But with respect to topology, the focus is on one of the common elements present in any ereport - the detector.
The detector identifies the thing that detected (but did not necessarily cause) an error in the system, and it takes the form of a fault managed resource identifier (FMRI). It's best to look at an example:
    # fmdump -eV
    TIME                           CLASS
    Mar 31 2008 12:08:36.084161600 ereport.io.fire.dmc.eq_not_en
    nvlist version: 0
            class = ereport.io.fire.dmc.eq_not_en
            ena = 0x317b96b9efe2c02
            detector = (embedded nvlist)
            nvlist version: 0
                    version = 0x0
                    scheme = hc
                    hc-root =
                    hc-list-sz = 3
                    hc-list = (array of embedded nvlists)
                    (start hc-list)
                    nvlist version: 0
                            hc-name = ioboard
                            hc-id = 0
                    (end hc-list)
                    (start hc-list)
                    nvlist version: 0
                            hc-name = hostbridge
                            hc-id = 0
                    (end hc-list)
                    (start hc-list)
                    nvlist version: 0
                            hc-name = pciexrc
                            hc-id = 0
                    (end hc-list)
            (end detector)
            ...
Note: Detectors are not always reported in the 'hc' scheme. For example, device driver detections are typically in the 'dev' scheme.

So this detector is in the 'hc' scheme. See the 'hc-list', and how each member of the list has a name and an id? Let's write this a little differently:
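One way to see the rewriting is to flatten the hc-list into a single slash-separated path of name=id pairs. Here is a minimal sketch in Python, assuming the detector nvlist is modelled as a plain dict; the dict layout and the hc_path helper are illustrative names, not part of any FMA interface.

```python
# Detector nvlist from the ereport above, modelled as a plain dict.
# The dict layout is an illustrative assumption, not an FMA API.
detector = {
    "scheme": "hc",
    "hc-root": "",
    "hc-list": [
        {"hc-name": "ioboard", "hc-id": "0"},
        {"hc-name": "hostbridge", "hc-id": "0"},
        {"hc-name": "pciexrc", "hc-id": "0"},
    ],
}


def hc_path(det):
    """Join each hc-list member as /name=id, in list order."""
    return "".join(f"/{m['hc-name']}={m['hc-id']}" for m in det["hc-list"])


print(hc_path(detector))  # /ioboard=0/hostbridge=0/pciexrc=0
```

Written this way, the detector reads as a path through a hierarchy: an ioboard, a hostbridge beneath it, and a PCI Express root complex beneath that.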
The need for agreement between the Solaris topology and the ereport detector may extend beyond Solaris itself. For example, in sun4v systems, much of the error telemetry is sourced outside of Solaris, in the Service Processor (SP). The SP must know what hierarchy to encode in the ereport detector, or the diagnosis engines will not react to the incoming ereport. In fact, FMD will complain - loudly. But examples of that come in the next installment.
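The agreement can be pictured as a simple membership check: the hierarchy encoded in the detector has to name a node that the topology actually contains. The Python sketch below is only an illustration under stated assumptions. TOPOLOGY is a hypothetical hand-written set of paths, and detector_in_topology is an invented helper; the real fault manager does its own lookup, which this does not attempt to reproduce.

```python
# Hypothetical set of topology paths, standing in for whatever the real
# topology snapshot enumerates. These entries are assumptions for illustration.
TOPOLOGY = {
    "/ioboard=0",
    "/ioboard=0/hostbridge=0",
    "/ioboard=0/hostbridge=0/pciexrc=0",
}


def hc_path(hc_list):
    """Flatten (name, instance) pairs into a /name=instance path."""
    return "".join(f"/{name}={inst}" for name, inst in hc_list)


def detector_in_topology(hc_list, topology=TOPOLOGY):
    """Return True only if the detector's path names a known topology node."""
    return hc_path(hc_list) in topology


# A detector that matches the hierarchy, and one whose producer encoded
# a different (made-up) top-level component name.
good = [("ioboard", 0), ("hostbridge", 0), ("pciexrc", 0)]
bad = [("motherboard", 0), ("hostbridge", 0), ("pciexrc", 0)]

print(detector_in_topology(good))  # True
print(detector_in_topology(bad))   # False
```

The second detector differs only in its first component name, yet it no longer resolves to anything in the topology. That is the kind of mismatch a telemetry producer outside Solaris has to avoid.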
In the next installment of this series, we'll examine some Eversholt diagnosis rules and how the rules relate to the topology.