Proposed FMA Topics for LISA '08
By user9148476 on Jun 04, 2008
Hopefully one or both will be accepted. The submissions are below.
The "Guru Is In" Session
With OpenSolaris and Solaris 10, the way faulty components in the system are reported and managed changed significantly. Half of the Predictive Self-Healing (PSH) feature of Solaris is Fault Management: a framework for diagnosis engines and response agents to handle and report system faults, plus a set of command-line utilities to examine and manage those faults. Our "guru" proposal is to give a short demo of Solaris Fault Management using the freely available FMA Demo Kit and to answer questions about how administrators and users work with Fault Management. Ref: http://opensolaris.org/os/community/fm/
The "Hit The Ground Running" Session
Hello, I am proposing a Hit the Ground Running (HTGR) session to introduce and cover the essentials of Solaris' Fault Management subsystem. The abstract and outline for the material are below. I'd expect the material to take about 15 minutes to present.

Thank you for your consideration,
Scott Davenport
Sun Microsystems Fault Management Development Team

Self-healing functionality for users and administrators of a modern operating system provides fine-grained fault isolation and, where possible, restart of any component (hardware or software) that experiences a problem. To do so, the system must include intelligent, automated, proactive diagnosis of the errors observed on the system. The diagnosis system is used to trigger targeted automated responses, or guided human intervention, that mitigate a specific problem or at least prevent it from getting worse. Finally, these new system capabilities are connected to a new model for system administrators oriented around simpler, higher-level abstractions.

Sun's first Predictive Self-Healing features are part of Solaris 10 and OpenSolaris and include the Fault Manager and the Service Manager. This HTGR session focuses on the Fault Manager portion of Solaris' Predictive Self-Healing. The Solaris Fault Management effort (originally code-named FMA inside Sun) provides a new architecture for building resilient error handlers, error telemetry, automated diagnosis software, response agents, and a consistent model of system failures for a management stack.

Outline of material:
- Brief History Lesson
  - legacy UNIX fault model vs. FMA
- What is FMA?
  - FM Daemon, Diagnosis Engines, Response Agents (block diagram)
- Responding to Faults
  - expected console messages, knowledge article
- Displaying Faults
  - fmadm faulty
- Repairing Faults
  - fmadm repair
- Managing FMD Logs & Modules
  - fmadm config; fmadm rotate; fmadm load/unload
- For More Information
  - http://opensolaris.org/os/community/fm/
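To make the "FM Daemon, Diagnosis Engines, Response Agents" block diagram a little more concrete, here is a toy Python model of the flow the abstract describes. Error telemetry goes to a fault manager, which dispatches it to diagnosis engines. A diagnosis produces a fault, and the fault manager hands that fault to response agents. This is a minimal sketch, not fmd's real module interface: the class names, the event fields (`cls`, `resource`), the `ereport.hypothetical.ce` class string and the simple count-threshold rule are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ErrorReport:
    """One piece of error telemetry (hypothetical fields)."""
    cls: str       # event class, e.g. "ereport.hypothetical.ce"
    resource: str  # component that reported the error


@dataclass
class Fault:
    """A diagnosis: which resource is believed faulty, and why."""
    resource: str
    reason: str


class ThresholdDiagnosisEngine:
    """Toy diagnosis engine: blames a resource once it has produced
    `threshold` errors. Real FMA diagnosis engines are far more involved."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts: Dict[str, int] = {}

    def receive(self, ereport: ErrorReport) -> Optional[Fault]:
        n = self.counts.get(ereport.resource, 0) + 1
        self.counts[ereport.resource] = n
        if n == self.threshold:  # diagnose exactly once per resource
            return Fault(ereport.resource, f"{n} x {ereport.cls}")
        return None


class FaultManager:
    """Toy stand-in for the fault manager daemon: routes telemetry to
    diagnosis engines and diagnosed faults to response agents."""

    def __init__(self) -> None:
        self.engines: List[ThresholdDiagnosisEngine] = []
        self.agents: List[Callable[[Fault], None]] = []
        self.faulty: List[Fault] = []  # roughly what an admin would list

    def dispatch(self, ereport: ErrorReport) -> None:
        for engine in self.engines:
            fault = engine.receive(ereport)
            if fault is not None:
                self.faulty.append(fault)
                for agent in self.agents:
                    agent(fault)


# Usage: a response agent that "retires" the suspect resource.
retired: List[str] = []
fm = FaultManager()
fm.engines.append(ThresholdDiagnosisEngine(threshold=3))
fm.agents.append(lambda fault: retired.append(fault.resource))

for _ in range(3):
    fm.dispatch(ErrorReport("ereport.hypothetical.ce", "cpu0"))
fm.dispatch(ErrorReport("ereport.hypothetical.ce", "cpu1"))

print(retired)  # ['cpu0']
```

The split mirrors the point made in the abstract: deciding *what* is broken (diagnosis) is kept separate from deciding *what to do about it* (response), so a new response agent can be added without touching any diagnosis engine.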
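The administrative steps in the outline all go through fmadm. The small Python sketch below builds those command lines and runs them with `subprocess` only when asked. The helper names `fmadm_argv` and `run_fmadm` are hypothetical. The argument forms (`repair` taking a FMRI or fault UUID, `rotate` taking `errlog` or `fltlog`, `load` taking a module path, `unload` taking a module name) are assumed from the fmadm(1M) manual page and should be checked against that page on your release. fmadm normally needs to be run as root.

```python
import subprocess
from typing import List


def fmadm_argv(subcommand: str, *args: str) -> List[str]:
    """Build an fmadm command line for one of the subcommands in the outline."""
    allowed = {"faulty", "repair", "config", "rotate", "load", "unload"}
    if subcommand not in allowed:
        raise ValueError(f"unsupported fmadm subcommand: {subcommand}")
    if subcommand == "rotate" and args not in (("errlog",), ("fltlog",)):
        raise ValueError("rotate expects exactly one of: errlog, fltlog")
    if subcommand in ("repair", "load", "unload") and len(args) != 1:
        raise ValueError(f"{subcommand} expects exactly one argument")
    return ["fmadm", subcommand, *args]


def run_fmadm(subcommand: str, *args: str, dry_run: bool = True) -> str:
    """Return the command as text (dry run) or execute it and return stdout."""
    argv = fmadm_argv(subcommand, *args)
    if dry_run:
        return " ".join(argv)
    # check=True raises CalledProcessError if fmadm exits non-zero.
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# The outline's workflow as a dry run, so this also works off Solaris.
# "<fmri-or-uuid>" is a placeholder for a value reported by `fmadm faulty`.
for step in [("faulty",), ("repair", "<fmri-or-uuid>"), ("config",),
             ("rotate", "errlog")]:
    print(run_fmadm(*step))
```

On a Solaris host, passing `dry_run=False` (for example `run_fmadm("faulty", dry_run=False)`) would run the real command and return its output.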