Email notification of FMA events

One of the projects I worked on for Solaris 11 was to record some information on System Panics in FMA events.Now I want to start making it easier to gather this information and map it to known problems. So starting internally I plan to utilise another feature which we developed as part of  the same effort. This is the email notifications framework. Rob Johnston described this feature in his blog here.

So the nice feature I want to utilise it custom message templates. So I thought I'd share how to do this It's pretty simple, but I got burnt by a couple of slight oddities - which we can probably fix.

First off I needed to create a template. There are a number of committed expansion tokens - these will work to expand information from the FMA event in to meaninful info in the email. The ones I care about this time are

%<HOSTNAME> : Hostname of the system which had the event
%<UUID> : UUID of the event - so you can mine more information
%<URL> : URL of the knowledge doc describing the problem

In addition I want to get some data that is panic specific. As yet these are uncommitted interfaces and shouldn't be relied upon, but for my reference these can be accessed

Panic String of the dump is %<fault-list[0].panicstr>
Stack trace to put in to MOS is  %<fault-list[0].panicstack>

These are visible in the panic event - so I don't feel bad about revealing the names, but I stress they shouldn't be relied upon.

So create a template which contains the text you want. Make sure it's readable by the noaccess user (ie. not /root)

The one I created for now looks like this

# cat /usr/lib/fm/notify/panic_template
%<HOSTNAME> Panicked

For more information log in to %<HOSTNAME> and run the command

fmdump -Vu %<UUID>

Please look at %<URL> for more information

Crash dump is available on %<HOSTNAME> in %<fault-list[0].dump-dir>

Panic String of the dump is %<fault-list[0].panicstr>

Stack trace to put in to MOS is  %<fault-list[0].panicstack>
 

I then need to add this to the notification for the "problem-diagnosed" event class. This is done with the svccfg command

 

# svccfg setnotify problem-diagnosed \"mailto:someone@somehost?msg_template=/usr/lib/fm/notify/panic_template\"

(Note the backslashes and quotes - they're important to get the parser to recognise the "=" correctly.)

It would be nice to tie it specifically to a panic event, but that needs a bit of plumbing to make it happen.

You can  verify it is configured correctly with the command

 

# svccfg listnotify problem-diagnosed
    Event: problem-diagnosed (source: svc:/system/fm/notify-params:default)
        Notification Type: smtp
            Active: true
            reply-to: root@localhost
            msg_template: /usr/lib/fm/notify/panic_template
            to: someone@somehost

Now when I get a panic, I get an email with some useful information I can use to start diagnosing the problem.

So what next? I think I'll try to firm up the stability of the useful members of the event, and may be create a new event we can subscribe to for panics only, then make this template an "extended support" option for panic events, and make it easily configurable.

Please do leave comments if you have any opinions on this and where to take it next.



 





Comments:

Post a Comment:
Comments are closed for this entry.
About

Chris W Beal

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today