ZFS, FMA, hotplug, and Fishworks

In the past, I've discussed the evolution of disk FMA. Much has been accomplished in the past year, but there are still several gaps when it comes to ZFS and disk faults. In Solaris today, a fault diagnosed by ZFS (a device failing to open, too many I/O errors, etc) is reported as a pool name and 64-bit vdev GUID. This description leaves something to be desired, referring the user to run zpool status to determine exactly what went wrong. But the user is still has to know how to go from a cXtYdZ Solaris device name to a physical device, and when they do locate the physical device they need to manually issue a zpool replace command to initiate the replacement.

While this is annoying in the Solaris world, it's completely untenable in an appliance environment, where everything needs to "just work". With that in mind, I set about to plug the last few holes in the unified plan:

  • ZFS faults must be associated with a physical disk, including the human-readable label
  • A disk fault (ZFS or SMART failure) must turn on the associated fault LED
  • Removing a disk (faulted or otherwise) and replacing it with a new disk must automatically trigger a replacement

While these seem like straightforward tasks, as usual they are quite difficult to get right in a truly generic fashion. And for an appliance, there can be no Solaris commands or additional steps for the user. To start with, I needed to push the FRU information (expressed as a FMRI in the hc libtopo scheme) into the kernel (and onto the ZFS disk label) where it would be available with each vdev. While it is possible to do this correlation after the fact, it simplifies the diagnosis engine and is required for automatic device replacement. There are some edge conditions around moving and swapping disks, but was relatively straightforward. Once the FMRI was there, I could include the FRU in the fault suspect list, and using Mike's enhancements to libfmd_msg, dynamically insert the FRU label into the fault message. Traditional FMA libtopo labels do not include the chassis label, so in the Fishworks stack we go one step further and re-write the label on receipt of a fault event with the user-defined chassis name as well as the physical slot. This message is then used when posting alerts and on the problems page. We can also link to the physical device from the problems page, and highlight the faulty disk in the hardware view.

With the FMA plumbing now straightened out, I needed a way to light the fault LED for a disk, regardless of whether it was in the system chassis or an external enclosure. Thanks to Rob's sensor work, libtopo already presents a FMRI-centric view of indicators in a platform agnostic manner. So I rewrote the disk-monitor module (or really, deleted everything and created a new fru-monitor module) that would both poll for FRU hotplug events, as well as manage the fault LEDs for components. When a fault is generated, the FRU monitor looks through the suspect list, and turns on the fault LED for any component that has a supported indicator. This is then turned off when the corresponding repair event is generated. This also had the side benefit of generating hotplug events phrased in terms of physical devices, which the appliance kit can use to easily present informative messages to the user.

Finally, I needed to get disk replacement to work like everyone expects it to: remove a faulted disk, put in a new one, and walk away. The genesis of this functionality was putback to ON long ago as the autoreplace pool property. In Solaris, this functionality only works with disks that have static device paths (namely SATA). In the world of multipathed SAS devices, the device path is really a scshi_vhci node identified by the device WWN. If we remove a disk and insert a new one, it will appear as a new device with no way to correlate it to the previous instance, preventing us from replacing the correct vdev. What we need is physical slot information, which happens to be provided by the FMRI we are already storing with the vdev for FMA purposes. When we receive a sysevent for a new device addition, we look at the latest libtopo snapshot and take the FMRI of the newly inserted device. By looking at the current vdev FRU information, we can then associate this with the vdev that was previously in the slot, and automatically trigger the replacement.

This process took a lot longer than I would have hoped, and has many more subtleties too boring even for a technical blog entry, but it is nice to sit back and see a user experience that is intuitive, informative, and straightforward - the hallmarks of an integrated appliance solution.

Comments:

Truly impressive work, worthy of every praise. But there is something you wrote, that made me sad:

"While these seem like straightforward tasks, as usual they are quite difficult to get right in a truly generic fashion. And for an appliance, there can be no Solaris commands or additional steps for the user."

This is the ticket, this is the correct attitude to have! But, why didn't we have this kind of attitude before Fishworks, for the rest of the Solaris customers?

I'm interested to know, what were the priorities during Solaris 8 and 9 times? Is there a particular reason why "there can be no additional steps for the user" attitude wasn't considered a priority sooner?

Posted by UX-admin on November 21, 2008 at 02:23 AM PST #

UX-admin

I cannot speak to the intent of the Solaris organization from before I joined Sun, but I can say with confidence that what we have achieved today would not have been possible without FMA and ZFS, both of which included fresh looks at the administrative experience and started well before Solaris 10 or Fishworks.

Posted by Eric Schrock on November 21, 2008 at 04:44 AM PST #

Post a Comment:
Comments are closed for this entry.
About

Musings about Fishworks, Operating Systems, and the software that runs on them.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today