The FMA Triad: Topology, Telemetry & Diagnosis Rules
By user9148476 on Mar 11, 2008
In recent weeks, I found myself explaining and talking about aspects of the triad to several different audiences within Sun. So why not write it down? Over the next several weeks, I plan to blog about the FMA triad, looking at each piece individually and finally tying them all together. I hope it will give a fuller picture of what each piece is and how they work together.
Part 1 - Topology
To paraphrase the FMD PRM, topology is a view-in-time of the resources in a system used in fault management activities. To expand on that, it's a hierarchy of the system representing:
- any resource that can detect an error
- any resource that can be deconfigured to avoid an error (an ASRU)
- any resource that can be replaced to repair a fault (a FRU)
When designing FMA support for a new product or subsystem, determining the topology is an important piece. You want a reasonable representation of the fault-managed resources in the topology, as well as sufficient data with each resource, so FMD modules can simply query the topology given an FMRI for all the information pertaining to a resource. There are several subcomponents that make up a topology. And as FMA is a complex technology, it's often difficult to describe one aspect of it without referring to another, as you'll see in the text below. I've done my best to alleviate that by describing the pieces from the point of view of how a topology is constructed.
The Big Picture
How does FMD construct a topology? The topology for any system starts with a topology map file. The map lays out the basic hierarchy of the topology of the system, either statically or by invoking enumerators. Enumerators traverse and/or discover resources of interest for a particular subsystem. The resources are described by one or more FMRI schemes to uniquely identify the resource within the system. And finally, FMD can maintain multiple topology snapshots, as the resources within the system can change while the system is running (e.g. hot plug, dynamic reconfiguration).
At FMD start-of-day, the daemon looks for a topology map for the platform. A topology map is an XML file that defines the hierarchy for a given system or family of systems. FMD searches for a map from most-specific to least-specific, and the first map found is used. The sequence is:
- a platform-specific map in /usr/platform/`uname -i`/lib/fm/topo/maps
- an architecture-specific map in /usr/platform/`uname -m`/lib/fm/topo/maps
- a generic map in /usr/lib/fm/topo/maps
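The search order above can be sketched in a few lines of Python. This is only an illustration of the most-specific-to-least-specific lookup, not FMD's actual code: the function names and the has_map callback are made up for the example, and the platform and machine strings stand in for the output of uname -i and uname -m.

```python
from typing import Callable, List, Optional


def candidate_map_dirs(platform: str, machine: str) -> List[str]:
    """Return the map directories in most-specific to least-specific order."""
    return [
        f"/usr/platform/{platform}/lib/fm/topo/maps",  # platform, `uname -i`
        f"/usr/platform/{machine}/lib/fm/topo/maps",   # architecture, `uname -m`
        "/usr/lib/fm/topo/maps",                       # generic fallback
    ]


def first_map_dir(platform: str, machine: str,
                  has_map: Callable[[str], bool]) -> Optional[str]:
    """Return the first directory for which has_map() is true, else None.

    has_map is a hypothetical predicate supplied by the caller; how FMD
    decides that a directory holds a usable map is not modeled here.
    """
    for directory in candidate_map_dirs(platform, machine):
        if has_map(directory):
            return directory
    return None
```

Passing the predicate in keeps the sketch independent of any real filesystem, so the ordering logic can be exercised on any machine.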
29 <topology name='SUNW,SPARC-Enterprise-T5220' scheme='hc'>
30   <range name='motherboard' min='0' max='0'>
31     <enum-method name='motherboard' version='1'/>
32
33     <dependents grouping='children'>
34       <range name='chip' min='0' max='0'>
35         <enum-method name='chip' version='1'/>
36
37         <dependents grouping='children'>
38           <range name='hostbridge' min='0' max='254'>
39             <enum-method name='hostbridge' version='1' />
40
41             <dependents grouping='children'>
42               <range name='niu' min='0' max='0'>
43                 <enum-method name='niu' version='1'/>
44               </range>
45             </dependents>
46
47           </range>
48         </dependents>
49       </range>
50
51     </dependents>
52
53   </range>
54 </topology>
Even without understanding the various tags, it's reasonably obvious that the hierarchy of a T5220 system has a motherboard (lines 30-31). The motherboard contains a chip (lines 33-35). Beneath the chip is a hostbridge (lines 37-39), and finally an niu (lines 41-45) beneath the hostbridge. The <topology> tag on line 29 shows that the scheme used for the enumeration is "hc", the hardware component FMRI scheme (more on that later).
The full explanation of the valid XML tags for topology maps is in the topology.dtd.1 file, but I'll explain a few of them here.
- topology: the root level. The scheme attribute specifies which FMRI scheme this map will enumerate. The only supported schemes via a map are 'hc' and 'dev'. In practice, maps use the 'hc' scheme.
- range: specifies a range of nodes. For 'hc' scheme maps, the name attribute must be present in the list of allowable hc names. The min and max attributes specify how many nodes to create. If a node has children, all children will be instantiated for each node in the range.
- enum-method: the name attribute specifies a topo enumerator to invoke. The min and max attributes from the range statement are passed to the enumerator's entry function.
- dependents: indicates that the following ranges are dependent upon the predecessor, with linkage specified by the grouping attribute. In practice, dependents are typically specified as children to construct the hierarchy.
Note: The indentation in the file is not required. But as with any source code, clean indentation is useful. For topo maps, it helps to convey hierarchy at a glance.
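To see how the range, enum-method, and dependents tags nest, here is a small Python sketch that walks the T5220 map with the standard xml.etree.ElementTree module and prints the hierarchy. It is just a reading aid, not how FMD or libtopo parses maps; the walk_ranges helper is a name invented for this example, and the map text is the example from above with the listing numbers removed.

```python
import xml.etree.ElementTree as ET

T5220_MAP = """
<topology name='SUNW,SPARC-Enterprise-T5220' scheme='hc'>
  <range name='motherboard' min='0' max='0'>
    <enum-method name='motherboard' version='1'/>
    <dependents grouping='children'>
      <range name='chip' min='0' max='0'>
        <enum-method name='chip' version='1'/>
        <dependents grouping='children'>
          <range name='hostbridge' min='0' max='254'>
            <enum-method name='hostbridge' version='1'/>
            <dependents grouping='children'>
              <range name='niu' min='0' max='0'>
                <enum-method name='niu' version='1'/>
              </range>
            </dependents>
          </range>
        </dependents>
      </range>
    </dependents>
  </range>
</topology>
"""


def walk_ranges(element, depth=0):
    """Yield (depth, name, min, max, enumerator) for each range, parents first."""
    for rng in element.findall("range"):
        enum = rng.find("enum-method")
        yield (depth, rng.get("name"), int(rng.get("min")), int(rng.get("max")),
               enum.get("name") if enum is not None else None)
        # Ranges nested under <dependents> sit one level deeper.
        for dep in rng.findall("dependents"):
            yield from walk_ranges(dep, depth + 1)


root = ET.fromstring(T5220_MAP)
ranges = list(walk_ranges(root))
for depth, name, lo, hi, enum in ranges:
    print(f"{'  ' * depth}{name} [{lo}..{hi}] via enumerator {enum!r}")
```

The printed indentation mirrors the motherboard, chip, hostbridge, niu nesting described earlier.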
The above map uses enumerators exclusively. Technically speaking, you don't need to use enumerators to create topology. All of the nodes in a topology can be statically defined in XML. Or a mix of XML and enumerator invocations can be, and is, used. However, direct XML means that portion of the topology is static. That's OK for things that are fixed about a system (e.g. the number of physical boards). But in general, enumerators are better as they can dynamically scale. Enumerators are discussed in the next section.
One other thing about topo maps - the ranges in the map don't necessarily indicate all of the nodes that will be in the resulting topology. Quite the contrary. This is because enumerators can create several types of nodes, as well as invoke other enumerators. Each range in the map is the entry point for a particular subsection of the overall topology. A prime example of that is the 'hostbridge' enumerator, which after creating hostbridge and PCI root complex nodes, invokes the 'pcibus' enumerator to walk the PCI fabric. 'pcibus' is not listed in the topology map, but contributes to the topology. And in my current project, I'm expecting to produce a topology map that invokes a single enumerator - so a given topo map may not be as revealing about a system's hierarchy as the example used above.
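A toy model may help show why the map undersells the final topology. The Python sketch below is not the libtopo enumerator interface. The registry, the Node class, and the node names 'rootcomplex' and 'pcidev' are all hypothetical, and the "discovered" devices are made-up data. The only point is that a map listing one range can end up with nodes from an enumerator it never names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    instance: int
    children: List["Node"] = field(default_factory=list)


# Registry of toy enumerators, keyed by the name a map (or enumerator) uses.
ENUMERATORS: Dict[str, Callable[[Node], None]] = {}


def enumerator(name):
    """Decorator that registers a toy enumerator under the given name."""
    def register(func):
        ENUMERATORS[name] = func
        return func
    return register


@enumerator("hostbridge")
def enum_hostbridge(parent: Node) -> None:
    # Pretend discovery found one hostbridge with one root complex.
    hb = Node("hostbridge", 0)
    rc = Node("rootcomplex", 0)            # hypothetical node name
    hb.children.append(rc)
    parent.children.append(hb)
    # Hand off to another enumerator that the map never mentions.
    ENUMERATORS["pcibus"](rc)


@enumerator("pcibus")
def enum_pcibus(parent: Node) -> None:
    # Pretend the fabric walk found two devices (made-up data).
    for i in range(2):
        parent.children.append(Node("pcidev", i))   # hypothetical node name


def node_names(node: Node) -> List[str]:
    """Flatten the tree into a parents-first list of node names."""
    names = [node.name]
    for child in node.children:
        names.extend(node_names(child))
    return names


map_ranges = ["hostbridge"]    # the only range this toy map lists
root = Node("chip", 0)
for name in map_ranges:
    ENUMERATORS[name](root)
print(node_names(root))
```

The resulting list contains nodes created by 'pcibus' even though map_ranges only ever mentioned 'hostbridge'.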
An enumerator, as the name suggests, catalogs the nodes of a particular subsystem. Depending on the enumerator, it may walk existing Solaris structures such as the device tree, use other data sources to determine node presence and properties, or in rare cases know a priori the extent of a given subsystem. Enumerators are invoked either via a topology map or via another enumerator. All topology nodes an enumerator creates are topological children of the invoking entity. It's beyond the scope of this blog to discuss enumerator implementation. But if you're curious, check out Section 9.7 of the aforementioned FMD PRM as well as the existing enumerators in the source gate.
In practice, we tend to use enumerators for most things. For larger subsystems, such as CPUs or PCIE, it makes sense to programmatically discover the topology. But even when we know a priori that there is only a fixed number of nodes in a subsystem, we may still employ an enumerator. At first, this seems silly. However, at least in the sun4v/SPARC world, some of the properties we want to include in FRU FMRIs are the part number and serial number of the FRU. (This speeds up ordering replacement parts, and enables automation of the process in the future.) At the very least, the serial number will be unique for all parts...and part numbers can change over time. So such values can't be encoded into an XML map. The solution is to use an enumerator that can electronically determine these values at runtime and encode them into the topology node. An example is the 'motherboard' enumerator.
One other interesting twist on enumerators - an enumerator can invoke a topology XML map. This is useful when there's a need to add/correct data in the nodes created. An example is in the x86 space where the BIOS is missing information. Or in some cases, it's just easier to represent a subsystem with straight XML (example).
- cpu: logical CPU
- dev: device tree
- hc: physical hardware component
- mod: Solaris module
- pkg: Solaris package
When it comes to enumeration, 'hc' is the default. A prime charter of FMA is to tell you what FRU needs to be replaced. 'hc' is what describes physical entities, so all resources in the system must be enumerated in the 'hc' scheme. Another scheme that is commonly enumerated is the 'cpu' scheme.
A resource can be represented in several schemes. For example, here's the 'hc' representation of a CPU strand from a T5220:
hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0721BBB013:server-id=wgs48-163:serial=fac006b0e653883/motherboard=0/chip=0/cpu=26
And the same CPU strand in the 'cpu' scheme:
cpu:///cpuid=26/serial=fac006b0e653883
One quickly notices that the 'hc' scheme can get very long, very fast. This is particularly true for IO devices deep within a PCI Express fabric. Furthermore, it contains physical attributes such as chassis & serial numbers.
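To make the shape of the two FMRIs concrete, here is a rough Python sketch that splits an FMRI string into its scheme, its colon-separated authority fields, and its slash-separated name=value path. It relies only on the layout visible in the two examples above and is not the parser libtopo uses; parse_fmri is a name made up for this illustration.

```python
from typing import Dict, List, Tuple


def parse_fmri(fmri: str) -> Tuple[str, Dict[str, str], List[Tuple[str, str]]]:
    """Split an FMRI string into (scheme, authority, path) by plain string ops."""
    scheme, rest = fmri.split("://", 1)
    # Everything before the first '/' is the authority; it may be empty.
    authority_part, _, path_part = rest.partition("/")
    authority = dict(
        item.split("=", 1) for item in authority_part.split(":") if item
    )
    path = [tuple(elem.split("=", 1)) for elem in path_part.split("/") if elem]
    return scheme, authority, path


HC_FMRI = ("hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0721BBB013"
           ":server-id=wgs48-163:serial=fac006b0e653883"
           "/motherboard=0/chip=0/cpu=26")
CPU_FMRI = "cpu:///cpuid=26/serial=fac006b0e653883"

hc_scheme, hc_authority, hc_path = parse_fmri(HC_FMRI)
cpu_scheme, cpu_authority, cpu_path = parse_fmri(CPU_FMRI)
print(hc_scheme, hc_authority, hc_path)
print(cpu_scheme, cpu_authority, cpu_path)
```

Seen this way, the 'hc' form carries the physical location as a path plus an authority full of physical identifiers, while the 'cpu' form has an empty authority and just two name=value pairs.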
An initial topology snapshot is taken when FMD starts up. The flow is described in the Topology Maps section above. Until the initial snapshot is taken, FMD will not dispatch any events to any modules. On systems that support dynamic reconfiguration (DR) or have hot-plug capabilities, the system topology can change over time. Resources can be added, changed, or removed. To accommodate this behavior, FMD supports the notion of a topology snapshot. So as the topology changes, FMD can keep an up to date view of the system resources.
When DR or hot-plug happens, Solaris issues an EC_DR sysevent. FMD subscribes to this sysevent, and when one is received, the topology is updated. The same flow of reading the map file, invoking enumerators, and so on is followed. Unlike the initial snapshot, event delivery is not suspended. The event generation, as well as the DR, are inherently asynchronous. So it is possible, but unlikely, to receive an event before the EC_DR sysevent is received and processed. However, any modules calling fmd_hdl_topo_hold() while a snapshot is in process will block.
Prior topology snapshots are not automatically overwritten. Older snapshots must be maintained - at least until there are no pre-existing faults or in-flight fault management exercises referencing a particular snapshot. libtopo provides the interfaces for taking and releasing topology snapshots, as well as the APIs for referencing nodes within a snapshot. Snapshots are described further in the FMD PRM.
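The retention rule can be pictured as simple reference counting. The Python sketch below is a toy model of that idea only: the SnapshotStore class and its methods are invented for illustration and do not correspond to the libtopo or FMD interfaces, and it ignores the blocking behavior mentioned earlier.

```python
class SnapshotStore:
    """Toy model: keep old snapshots alive while anything still holds them."""

    def __init__(self):
        self._next_id = 0
        self._holds = {}      # snapshot id -> number of outstanding holds
        self.current = None   # id of the most recent snapshot

    def take(self) -> int:
        """Record a new snapshot and drop older ones that nobody holds."""
        self._next_id += 1
        self.current = self._next_id
        self._holds[self.current] = 0
        self._reap()
        return self.current

    def hold(self) -> int:
        """Pin the current snapshot and return its id."""
        self._holds[self.current] += 1
        return self.current

    def release(self, snap_id: int) -> None:
        """Drop one hold; a stale snapshot with no holds is discarded."""
        self._holds[snap_id] -= 1
        self._reap()

    def _reap(self) -> None:
        # The current snapshot always survives; older ones need a holder.
        for snap_id in list(self._holds):
            if snap_id != self.current and self._holds[snap_id] == 0:
                del self._holds[snap_id]

    def live(self):
        """Return the ids of all snapshots still being kept."""
        return sorted(self._holds)


store = SnapshotStore()
store.take()                 # initial snapshot at startup
case = store.hold()          # an in-flight diagnosis pins that snapshot
store.take()                 # a DR event produces a newer snapshot
print(store.live())          # the pinned older snapshot is still kept
store.release(case)
print(store.live())          # only the current snapshot remains
```

The older snapshot goes away only once the last hold on it is released, which is the behavior the paragraph above describes.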
The /usr/lib/fm/fmd/fmtopo utility is provided to display the current topology of the system. fmtopo is not formally documented in a manual page, but is discussed in the FMD PRM. There's also brief online help via fmtopo -h.
Note: Be sure to run fmtopo as root, or you may get limited/incomplete output with no warning that anything is missing.
With no options, fmtopo outputs the entire topology in the 'hc' scheme. An example from a T5220 is here. fmtopo can also be used to dump the topology in another FMRI scheme - provided the enumerators for the system have enumerated the resources in that scheme. On sun4v systems and i86pc systems, the 'cpu' scheme is available:
# fmtopo -s cpu
TIME                 UUID
Feb 22 18:27:33 b885e2c2-0962-494f-a1bb-bddfc4ed4d2b
cpu:///cpuid=0/serial=fac006b0e653883
cpu:///cpuid=1/serial=fac006b0e653883
cpu:///cpuid=2/serial=fac006b0e653883
cpu:///cpuid=3/serial=fac006b0e653883
...
I personally find the -p option useful, as its output relates nicely to fmadm faulty output. Using the CPU strand example from above:
# fmtopo -p
TIME                 UUID
Feb 22 18:28:36 d17a8fb3-8e34-498c-baa0-a7c17b8c0e59
...
hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0721BBB013:server-id=wgs48-163:serial=fac006b0e653883/motherboard=0/chip=0/cpu=26
  ASRU: cpu:///cpuid=26/serial=fac006b0e653883
  FRU: hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0721BBB013:server-id=wgs48-163:serial=110523:part=501781301/motherboard=0
  Label: MB
...
The FRU and the Label output are included in the fmadm faulty output, provided you are using a Solaris version with the recent updates to fmadm faulty.
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 21 16:29:58 6f9f6a76-4d24-c1b3-a641-8374ec80b4e7 SUN4V-8001-L2  Critical
Fault class : fault.cpu.ultraSPARC-T2plus.l2data-u
...
FRU : "MB" (hc://:product-id=SUNW,T5240:chassis-id=0725BB4043:server-id=wgs48-113:serial=100014:part=501784702/motherboard=0)
...
Having all of this data on a single line is really nice. The label matches the part nomenclature for the system (which is typically silk screened on to the parts) and the full 'hc' FRU FMRI from the topology includes the part and serial number of the affected component. Everything you need to locate the busted part and order a new one.
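As a small illustration of how that one FRU line carries everything needed for a parts order, the Python sketch below pulls the label, part number, and serial number out of it. It assumes the FRU line looks exactly like the example above with the FMRI joined onto one line; the regular expression and the fru_summary helper are made up for this post and are not part of fmadm.

```python
import re

# Matches a quoted label followed by a parenthesized hc FMRI,
# as in the FRU line of the fmadm faulty example.
FRU_LINE = re.compile(r'"(?P<label>[^"]+)"\s+\((?P<fmri>hc://[^)]*)\)')


def fru_summary(text: str) -> dict:
    """Return the label, part number, and serial number from a FRU line."""
    match = FRU_LINE.search(text)
    if match is None:
        raise ValueError("no FRU label/FMRI pair found")
    fmri = match.group("fmri")
    # The authority is everything between 'hc://' and the first '/'.
    authority = fmri[len("hc://"):].split("/", 1)[0]
    fields = dict(kv.split("=", 1) for kv in authority.split(":") if kv)
    return {
        "label": match.group("label"),
        "part": fields.get("part"),
        "serial": fields.get("serial"),
    }


fru_line = ('FRU : "MB" (hc://:product-id=SUNW,T5240:chassis-id=0725BB4043:'
            'server-id=wgs48-113:serial=100014:part=501784702/motherboard=0)')
summary = fru_summary(fru_line)
print(summary)
```

A script along these lines is the kind of automation the part and serial numbers in the FRU FMRI are meant to enable.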
In the next installment of this series, we'll take a look at some FMA telemetry. In particular, the payload members of ereports that relate to topology.