Tuesday Nov 30, 2010

FMA and Email Notifications

In November, Oracle released a sneak peek at the next major release of Solaris in the form of Oracle Solaris 11 Express 2010.11.  There are tons of great features and innovations in this release.  One of the features I worked on is a new service, smtp-notify, that can be configured to send email notifications in response to various Fault Management events, such as when a hardware component has been diagnosed as faulty.  Notifications can be configured for the following FMA event types (the descriptions below are excerpted from the smf(5) man page):

     problem-diagnosed

         A new problem has been diagnosed by the FMA   subsystem.
         The  diagnosis  includes a list of one or more suspects,
         which (where appropriate) may  have  been  automatically
         isolated  to prevent further errors occurring. The prob-
         lem is identified by a UUID  in the event  payload,  and
         further  events  describing  the resolution lifecycle of
         this problem quote a matching UUID.

     problem-updated

         One or more of the suspect resources in a problem  diag-
         nosis  has  been repaired, replaced or acquitted (or has
         been faulted again), but  there  remains  at  least  one
         faulted  resource  in  the  list.  A repair could be the
         result of an fmadm command line  (fmadm repaired,  fmadm
         acquit,  fmadm  replaced)  or  might  have been detected
         automatically such as through detection of a part serial
         number change.

     problem-repaired

         All of the suspect resources in a problem diagnosis have
         been repaired, resolved or acquitted. Some or all of the
         resources might still be isolated at this stage.

     problem-resolved

         All of the suspect resources in a problem diagnosis have
         been  repaired, resolved or acquitted and are  no longer
         isolated (for example, a cpu that was a suspect and off-
         lined  is  now back online again; this un-isolate action
         is usually automatic).


The smtp-notify service is enabled out-of-the-box.

# svcs smtp-notify
STATE          STIME    FMRI
online         Oct_28   svc:/system/fm/smtp-notify:default

You can list the default notification preferences with svcs(1):

# svcs -n
Notification parameters for FMA Events
    Event: problem-diagnosed
        Notification Type: smtp
            Active: true
            reply-to: root@localhost
            to: root@localhost

        Notification Type: snmp
            Active: true

        Notification Type: syslog
            Active: true

    Event: problem-repaired
        Notification Type: snmp
            Active: true

    Event: problem-resolved
        Notification Type: snmp
            Active: true

What does the above output tell us?  It tells us that a problem-diagnosed event will result in an email notification being sent to root@localhost, a message being sent to syslog, and an SNMP trap being generated.  Additionally, SNMP traps will be generated for problem-repaired and problem-resolved events.

What does an example email notification look like?  See below:

From noaccess@diffuser.sfbay.sun.com Wed Jul 21 19:58:29 2010
Date: Wed, 21 Jul 2010 19:58:29 -0700 (PDT)
From: No Access User <noaccess@diffuser.sfbay.sun.com>
To: root@localhost
X-FMEV-HOSTNAME: diffuser
X-FMEV-CLASS: list.suspect
X-FMEV-UUID: e82aa706-ce6a-cbbb-a529-ceef1c9b57b0
X-FMEV-CODE: AMD-8000-AV
X-FMEV-SEVERITY: Major
Subject: Fault Management Event: diffuser:AMD-8000-AV

SUNW-MSG-ID: AMD-8000-AV, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Wed Jul 21 19:58:29 PDT 2010
PLATFORM: Sun-Fire-X4200-Server, CSN: 0000000000, HOSTNAME: diffuser
SOURCE: eft, REV: 1.16
EVENT-ID: e82aa706-ce6a-cbbb-a529-ceef1c9b57b0
DESC: The number of errors associated with this CPU has exceeded acceptable levels.  Refer to http://sun.com/msg/AMD-8000-AV for more information.
AUTO-RESPONSE: An attempt will be made to remove this CPU from service.
IMPACT: Performance of this system may be affected.
REC-ACTION: Schedule a repair procedure to replace the affected CPU.  Use 'fmadm faulty' to identify the module.


Those who've seen the messages that are logged to the console when FMA diagnoses a fault will see that the format is similar.  One additional thing to note is that each FMA email notification message also includes the following X-headers, which are there to aid admins who write mail filters:

Header Name  Description
 X-FMEV-HOSTNAME  the name of the host on which the event occurred
 X-FMEV-CLASS  the event class
 X-FMEV-CODE  the Knowledge Article message ID
 X-FMEV-SEVERITY  the severity of the event
 X-FMEV-UUID  the UUID associated with the event
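
To illustrate the kind of filtering these headers make possible, here is a minimal sketch of a shell function that pulls a single header value out of a saved notification.  The function name fmev_header is hypothetical, and the sketch assumes the message is plain text with the headers first, followed by a blank line, and that each header fits on one line.

```shell
# Hypothetical helper: print the value of one header from a mail
# message read on standard input.  Scanning stops at the first blank
# line, so text in the message body is never matched.
fmev_header() {
    sed -n -e '/^$/q' -e "s/^$1: //p"
}
```

Run against the example notification above, fmev_header X-FMEV-SEVERITY would print Major.  A filter script could branch on that value, for instance to page someone only for the more severe events.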

 

Email notifications for FMA events are highly configurable via svccfg(1m).  For example, you can enable or disable them per event type:


# svccfg setnotify problem-diagnosed mailto:active

or

# svccfg setnotify problem-diagnosed mailto:inactive


You can configure separate lists of one or more email recipients per event type.  For example:

# svccfg setnotify problem-repaired mailto:joe@somehost.com,admin@central.com
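
If you want the same recipients for every FMA event type, the command above can be repeated once per type.  The sketch below only prints the commands so they can be reviewed before being run as root; the function name fma_notify_cmds is hypothetical, and it assumes the recipient list is passed as one comma-separated argument.

```shell
# Print (do not run) one svccfg command per FMA event type, so that
# the same recipient list is applied to all four of them.
fma_notify_cmds() {
    for event in problem-diagnosed problem-updated \
                 problem-repaired problem-resolved
    do
        printf 'svccfg setnotify %s mailto:%s\n' "$event" "$1"
    done
}
```

For example, fma_notify_cmds joe@somehost.com,admin@central.com prints four svccfg lines, one for each event type described earlier.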

You can even define your own message body template.

# svccfg setnotify problem-diagnosed \
mailto:root@localhost?msg_template=/path/to/template

Of course defining your own message template is nice, but it's only really useful if you have a way of referencing information about the actual FMA event in your message.  To facilitate this, we support the following expansion macros that can be embedded in message templates:

Macro Description
 %%  expands to a literal % character
 %<HOSTNAME>  expands to the hostname on which the event occurred
 %<URL>  expands to the URL of the knowledge article associated with this event
 %<CLASS>  expands to the event class
 %<UUID>  expands to the UUID of the event
 %<CODE>  expands to the knowledge article message ID
 %<SEVERITY>  expands to the severity of the event
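
As a rough illustration of how the macros above might be combined, the sketch below writes a small template file.  The function name write_fma_template and the layout of the message are my own invention for this example, and I am assuming the template is free-form text; check the smtp-notify(1m) man page for any formatting rules.

```shell
# Hypothetical helper: write a sample message template to the file
# named by the first argument.  The quoted 'EOF' keeps the shell from
# touching the %<...> macros, which are left for smtp-notify to expand.
write_fma_template() {
    cat > "$1" <<'EOF'
FMA event on %<HOSTNAME>

Class:    %<CLASS>
Code:     %<CODE>
Severity: %<SEVERITY>
UUID:     %<UUID>

Knowledge article: %<URL>
EOF
}
```

After running write_fma_template with a path of your choosing, that path is what you would supply as the msg_template value in the svccfg setnotify command shown earlier.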

 


But wait…there's more!

The smtp-notify service can also be configured to generate notifications for SMF service state transitions.  I won't go into the details of that here, but it's all documented in the smf(5), svccfg(1m) and smtp-notify(1m) man pages.
