OpenSolaris | Thursday, August 7, 2008

SES Sensors and Indicators

Last week, Rob Johnston and I coordinated two putbacks to Solaris to further the cause of Solaris platform integration, this time focusing on sensors and indicators. Rob has a great blog post with an overview of the new sensor abstraction layer in libtopo. Rob did most of the hard work; my contribution was limited to extending the SES enumerator to support the new facility infrastructure.

You can find a detailed description of the changes in the original FMA portfolio here, but it's much easier to understand via demonstration. This is the fmtopo output for a fan node in a J4400 JBOD:

hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0
  group: protocol    version: 1   stability: Private/Private
    resource      fmri    hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0
    label         string  Cooling Fan 0
    FRU           fmri    hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0
  group: authority   version: 1   stability: Private/Private
    product-id    string  SUN-Storage-J4400
    chassis-id    string  2029QTF0000000005
    server-id     string
  group: ses         version: 1   stability: Private/Private
    node-id       uint64  0x1f
    target-path   string  /dev/es/ses3
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?indicator=ident
  group: protocol    version: 1   stability: Private/Private
    resource      fmri    hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?indicator=ident
  group: authority   version: 1   stability: Private/Private
    product-id    string  SUN-Storage-J4400
    chassis-id    string  2029QTF0000000005
    server-id     string
  group: facility    version: 1   stability: Private/Private
    type          uint32  0x1 (LOCATE)
    mode          uint32  0x0 (OFF)
  group: ses         version: 1   stability: Private/Private
    node-id       uint64  0x1f
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?indicator=fail
  group: protocol    version: 1   stability: Private/Private
    resource      fmri    hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?indicator=fail
  group: authority   version: 1   stability: Private/Private
    product-id    string  SUN-Storage-J4400
    chassis-id    string  2029QTF0000000005
    server-id     string
  group: facility    version: 1   stability: Private/Private
    type          uint32  0x0 (SERVICE)
    mode          uint32  0x0 (OFF)
  group: ses         version: 1   stability: Private/Private
    node-id       uint64  0x1f
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?sensor=speed
  group: protocol    version: 1   stability: Private/Private
    resource      fmri    hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?sensor=speed
  group: authority   version: 1   stability: Private/Private
    product-id    string  SUN-Storage-J4400
    chassis-id    string  2029QTF0000000005
    server-id     string
  group: facility    version: 1   stability: Private/Private
    sensor-class  string  threshold
    type          uint32  0x4 (FAN)
    units         uint32  0x12 (RPM)
    reading       double  3490.000000
    state         uint32  0x0 (0x00)
  group: ses         version: 1   stability: Private/Private
    node-id       uint64  0x1f
hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?sensor=fault
  group: protocol    version: 1   stability: Private/Private
    resource      fmri    hc://:product-id=SUN-Storage-J4400:chassis-id=2029QTF0000000005:server-id=/ses-enclosure=1/fan=0?sensor=fault
  group: authority   version: 1   stability: Private/Private
    product-id    string  SUN-Storage-J4400
    chassis-id    string  2029QTF0000000005
    server-id     string
  group: facility    version: 1   stability: Private/Private
    sensor-class  string  discrete
    type          uint32  0x103 (GENERIC_STATE)
    state         uint32  0x1 (DEASSERTED)
  group: ses         version: 1   stability: Private/Private
    node-id       uint64  0x1f

In the fmtopo output above, you can see the available indicators (locate and service), the fan speed (3490 RPM), and whether the fan is faulted. Right now this is just interesting data for savvy administrators to play with, since no software consumes it yet. But that will change shortly, as we work on the next phases:

  • Monitoring of sensors to detect failures in external components, such as power supplies and fans, that are not visible to Solaris outside of libtopo. This will allow us to generate an FMA fault when a power supply or fan fails, regardless of whether it's in the system chassis or an external enclosure.
  • Generalization of the disk-monitor fmd plugin to support arbitrary disks, so that it can control the failure indicator in response to FMA-diagnosed faults.
  • Correlation of ZFS faults with the associated physical disk. Currently, ZFS faults are diagnosed against a "vdev", a ZFS-specific construct. The user is forced to translate from this vdev to a device name, and then use the normal (i.e. painful) methods to figure out which physical disk was affected. With a little work, it's possible to include the physical disk in the FMA fault to avoid this step, and also to allow the fault LED to be controlled in response to ZFS-detected faults.
  • Expansion of the SCSI framework to support native diagnosis of faults, instead of a stream of syslog messages. This involves generating telemetry in a way that can be consumed by FMA, as well as a diagnosis engine to correlate these ereports with an associated fault.

Even after we finish all of these tasks and reach the nirvana of a unified storage management framework, there will still be lots of open questions about how to leverage the sensor framework in interesting ways, such as a prtdiag-like tool for assembling sensor information, or threshold alerts for non-critical warning states. But with these latest putbacks, it feels like our goals from two years ago are actually within reach, and that I will finally be able to turn on that elusive LED.

Comments ( 1 )
  • Tomasz Thursday, August 7, 2008

    Controlling LEDs? So I will finally be able to turn those blue LEDs in Thumper into a scrolling 4x12 display? :-D

