Solaris platform integration - disk monitoring
By eschrock on May 26, 2007
Two weeks ago I putback PSARC 2007/202, the second step in generalizing the x4500 disk monitor. As explained in my previous blog post, one of the tasks of the original sfx4500-disk module was reading SMART data from disks and generating associated FMA faults. This platform-specific functionality needed to be generalized to effectively support future Sun platforms.
This putback did not add any new user-visible features to Solaris, but it did refactor the code in the following ways:
A new private library, libdiskstatus, was added. This generic library uses uSCSI to read data from SCSI (or SATA via emulation) devices. It is not a general-purpose SMART monitoring library; it focuses only on the three generally available disk faults: over temperature, predictive failure, and self-test failure. There is a single function, disk_status_get(), that returns an nvlist describing the current parameters reported by the drive and whether any faults are present.
This library is used by the SATA libtopo module to export a generic TOPO_METH_DISK_STATUS method. This method keeps all the implementation details within libtopo and exports a generic interface for consumers.
A new fmd module, disk-transport, periodically iterates over libtopo nodes and invokes the TOPO_METH_DISK_STATUS method on any supported nodes. The module generates FMA ereports for any detected errors.
These ereports are translated to faults by a simple eversholt DE. These are the same faults that were originally generated by the sfx4500-disk module, so the code that consumes them remains unchanged.
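To make the layering concrete, here is a rough Python sketch of the flow described above: a status lookup hidden behind a generic method, a periodic pass that turns detected problems into reports, and a trivial diagnosis step that maps reports to faults. It is only a model of the structure, not the Solaris code. Every name in it (DiskNode, disk_status_method, poll, diagnose, the dictionary keys and the fault strings) is hypothetical, and the real nvlist layout and ereport classes are not shown here.

```python
from dataclasses import dataclass
from typing import Optional

# The three fault kinds named in the post. The string spellings are
# illustrative only, not the real ereport or fault class names.
FAULT_KINDS = ("over-temperature", "predictive-failure", "self-test-failure")


@dataclass
class DiskNode:
    """Hypothetical stand-in for a libtopo node."""
    name: str
    # Stand-in for the nvlist returned by disk_status_get(). None means the
    # node does not support the disk status method at all.
    status: Optional[dict] = None


def disk_status_method(node):
    """Plays the role of TOPO_METH_DISK_STATUS: callers never see how the
    status was obtained, only the resulting dictionary (or None)."""
    return node.status


def poll(nodes):
    """One pass of a disk-transport-style monitor: one report per fault."""
    ereports = []
    for node in nodes:
        status = disk_status_method(node)
        if status is None:
            continue  # unsupported node, skip it
        for kind in FAULT_KINDS:
            if status.get("faults", {}).get(kind, False):
                ereports.append({"node": node.name, "kind": kind})
    return ereports


def diagnose(ereports):
    """Stand-in for the diagnosis engine: map each report to a fault,
    dropping duplicates while preserving order."""
    faults = []
    for report in ereports:
        fault = (report["node"], report["kind"])
        if fault not in faults:
            faults.append(fault)
    return faults


# Example topology: one hot disk, one healthy disk, one non-disk node.
nodes = [
    DiskNode("disk0", {"faults": {"over-temperature": True}}),
    DiskNode("disk1", {"faults": {}}),
    DiskNode("fan0"),
]
faults = diagnose(poll(nodes))
```

The point of the split is that poll() and diagnose() know nothing about how a status is read, so supporting a new kind of device only means providing another implementation of the status method.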
These changes form the foundation that will allow future Sun platforms to detect and react to disk failures, eliminating 5200 lines of platform-specific code in the process. The next major steps are currently in progress:
The FMA team, as part of the sensor framework, is expanding libtopo to include the ability to represent indicators (LEDs) in a generic fashion. This will replace the x4500-specific properties and associated machinery with generic code.
The SCSI FMA team is finalizing the libtopo enumeration work that will allow arbitrary SCSI devices (not just SATA) to be enumerated under libtopo and therefore be monitored by the disk-transport module. The first phase will simply replicate the existing sfx4500-disk functionality, but will enable us to model future non-SATA platforms as well as external storage devices.
Finally, I am finishing up my long-overdue ZFS FMA work, a necessary step towards connecting ZFS and disk diagnosis. Stay tuned for more info.