How To Clear An Alert

Signs In The Console
One of the most common requests in Grid Control debugging is a way to 'clear' an old alert. The problem is that what people ask for is typically either not possible or not advisable.
To get into more detail about this problem, let's first discuss what an alert is, what it means, and what it shows in the console.

Let's start with what an alert is not: an alert is NOT an error. An error condition is a problem with the execution of the metric. An error cannot be cleared by changing the monitoring setup of the metric: it has to be cleared by fixing the underlying problem with the execution of the metric, as indicated in the emagent.trc file. Since the metric itself has encountered a problem, changing the thresholds will not make any difference to the error condition.

An alert is a threshold violation of the data-point collected by the metric. The metric did evaluate, and a valid data-point was collected and uploaded by the Agent. This data-point was then compared against the specified threshold, and a violation was detected.
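The difference between an error and an alert can be sketched in a few lines of Python. This is only a toy model of the idea described above, not Agent code: the names `evaluate_metric`, `Evaluation` and `broken_collection` are made up for illustration, and the "greater than or equal" comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Evaluation:
    status: str             # "error", "alert" or "clear"
    value: Optional[float]  # collected data-point, None if collection failed
    detail: str


def evaluate_metric(collect: Callable[[], float], critical_threshold: float) -> Evaluation:
    """Toy model: run the collection, then compare the data-point to a threshold."""
    try:
        value = collect()
    except Exception as exc:
        # The metric itself failed to execute: this is an error, not an alert.
        # No data-point exists, so no threshold change can affect this outcome.
        return Evaluation("error", None, f"collection failed: {exc}")
    if value >= critical_threshold:
        return Evaluation("alert", value, f"{value} violates threshold {critical_threshold}")
    return Evaluation("clear", value, f"{value} is within threshold {critical_threshold}")


def broken_collection() -> float:
    # Hypothetical failing metric, standing in for a real execution problem.
    raise RuntimeError("cannot connect to target")


print(evaluate_metric(lambda: 97.0, 90.0).status)       # alert
print(evaluate_metric(lambda: 97.0, 99.0).status)       # clear
print(evaluate_metric(broken_collection, 99.0).status)  # error
```

Raising the threshold from 90 to 99 turns the alert into a clear, but the failing collection stays an error whatever threshold is used.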

This brings up the main point about alerts: 'state' is kept on both tiers. Both the Agent and the OMS/repository know about an alert. This means you simply cannot sweep something under the rug by cleaning up just the OMS or just the Agent.
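To see why state on both tiers matters, here is a deliberately simplified Python sketch. The `Agent` and `Repository` classes are hypothetical, and the sketch assumes the Agent uploads only a change in severity; the real implementation is more involved.

```python
class Agent:
    """Toy Agent: remembers the last severity it reported per metric."""

    def __init__(self):
        self.last_severity = {}

    def evaluate(self, metric, value, threshold, repository):
        severity = "critical" if value >= threshold else "clear"
        # Assumption for this sketch: only a *change* in severity is uploaded.
        if self.last_severity.get(metric, "clear") != severity:
            self.last_severity[metric] = severity
            repository.receive(metric, severity)


class Repository:
    """Toy OMS/repository: keeps the set of open alerts."""

    def __init__(self):
        self.open_alerts = set()

    def receive(self, metric, severity):
        if severity == "clear":
            self.open_alerts.discard(metric)
        else:
            self.open_alerts.add(metric)


agent, repo = Agent(), Repository()
agent.evaluate("disk_used_pct", 100, 95, repo)  # violation: both tiers record it
repo.open_alerts.clear()                        # clean up one tier only
agent.evaluate("disk_used_pct", 100, 95, repo)  # still violating, but no state change

agent_says_open = agent.last_severity["disk_used_pct"] == "critical"
repo_says_open = "disk_used_pct" in repo.open_alerts
in_sync = agent_says_open == repo_says_open
print(in_sync)  # False
```

In this model the disk is still full, the Agent still considers the alert open, and the repository shows nothing. The two tiers only agree again once the Agent reports a real state change.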

Clearing alerts as told by the cowboys in the wild-wild internet
A lot of people have creative ideas on how to 'get rid' of an alert in Grid Control. Several of these urban legends get posted on blogs and forums, and most of them involve removing something directly from the repository tables. Although this might appear to do what people want, it is in fact CORRUPTING the data in the repository, and breaking the information flow between the Agent and the OMS.

The recommendation here is pretty simple:

  • NEVER remove data directly from the repository. No matter how appealing it may sound (or look in the console), it will ALWAYS get you into trouble.

Clearing the alert
The only way to clear an outstanding alert is to make sure the values collected by the metric no longer violate the thresholds:


  • The most obvious way of clearing the alert is to fix the underlying problem.
    If a disk is 100% full, making some free space on the disk will clear the condition and remove the outstanding alert.

  • If a metric triggered an alert for a non-fatal or a non-problematic condition, the thresholds of the metric are not configured correctly. By updating the thresholds and/or changing the number of occurrences to trigger the alert, the next iteration of the metric will evaluate the data-point with the changed thresholds, and update (and preferably clear) the alert.

  • If the metric is not important, the big question to ask is whether the metric is even relevant and needs to be collected in the first place. If the data-points are not important, the metric itself can be disabled.
    By disabling the collection, all outstanding alerts will be cleared, and the Agent will be instructed to not collect the metric anymore.
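The first two options above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the Agent's actual evaluation logic: the function name `alert_state` is made up, and "number of occurrences" is assumed to mean consecutive violating data-points.

```python
def alert_state(values, threshold, occurrences):
    """Return the alert state after each collected value.

    Assumption: an alert is raised once `occurrences` consecutive values
    violate the threshold, and is cleared by the first value that does not.
    """
    states, streak, alerting = [], 0, False
    for value in values:
        if value >= threshold:
            streak += 1
            if streak >= occurrences:
                alerting = True
        else:
            streak = 0
            alerting = False
        states.append("alert" if alerting else "clear")
    return states


# Fixing the underlying problem: the disk drops from 100% to 60% used.
print(alert_state([100, 100, 60], threshold=95, occurrences=1))
# ['alert', 'alert', 'clear']

# Same data-points, different monitoring setup.
samples = [96, 97, 98, 97]
print(alert_state(samples, threshold=95, occurrences=1))
# ['alert', 'alert', 'alert', 'alert']
print(alert_state(samples, threshold=95, occurrences=3))
# ['clear', 'clear', 'alert', 'alert']
print(alert_state(samples, threshold=99, occurrences=1))
# ['clear', 'clear', 'clear', 'clear']
```

In every case the state changes only because a new data-point was evaluated against the current settings, which is the supported path for clearing an alert.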

Acknowledging the alert
Sometimes, the thresholds are set OK, the metric is just doing fine, but the administrator working on the problem would just like to signal that the problem is 'under control' and people are working on it.
To signal that the alert has been 'received', an acknowledgement can be added to the alert in the console.

alert_20100204.PNG

This tells the notification system not to send any further notifications for this alert. It also indicates to anyone working with that target that somebody is paying attention and is doing something about the reported issue.
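As a rough illustration of what an acknowledgement changes, consider this Python toy model. The `Alert` class, its fields and the user name are hypothetical and only mirror the behavior described above: the alert stays open, but repeat notifications stop.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    metric: str
    acknowledged_by: str = ""
    notifications: list = field(default_factory=list)

    def acknowledge(self, user):
        # Acknowledging records who picked up the alert; it does not clear it.
        self.acknowledged_by = user

    def notify(self, message):
        # Repeat notifications stop once someone has acknowledged the alert.
        if self.acknowledged_by:
            return False
        self.notifications.append(message)
        return True


alert = Alert("tablespace_used_pct")
alert.notify("tablespace 92% full")   # sent
alert.acknowledge("dba_on_call")
alert.notify("tablespace 92% full")   # suppressed
print(len(alert.notifications), alert.acknowledged_by)  # 1 dba_on_call
```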

Comments:

Useful and informative. The five-point formula to keep the TARGET MONITORING clean:
1. Choose and control what "metrics" are collected for the targets being monitored (use the default template for the target type).
2. Use proper "threshold" values for warning and critical states for alert generation (use the group default template for each group of targets).
3. When an alert is triggered, investigate and act on the cause. It will automatically "clear" the alert.
4. If any automatic clean-up is possible, set the corrective action.
5. For the "collections" no longer needed, disable them. For the targets no longer available, delete them from EM.
-- Prasad

Posted by Prasad Chitta on February 22, 2010 at 05:37 PM PST #

Hi there, this is a very good comment and I do agree, you should not delete alerts directly from the repository. The question is, what about the 'errors'? The ones from the alert tab. There is no way to delete them manually or to acknowledge them, so how can we get rid of them? I have resolved some by fixing the underlying issue, but I also have a lot of very old ones where I cannot fix the issue that triggered them. Any ideas would be much appreciated. Thanks,

Posted by Orlando Reyes on March 06, 2010 at 11:10 AM PST #

I agree that these are sound management practices. However, sometimes an alert doesn't clear after the condition is addressed. For example, many times we'll have a tablespace hit the warning threshold and more space is added, but the alert never clears. Another condition where this happens is with invalid objects: the objects are made valid, but the alert stays. In these cases the only way to remove the outdated alerts is to null out the metric threshold and apply it, so that the alert disappears, and then reset it. I wish it was better.

Posted by Sherrie Kubis on March 09, 2010 at 08:43 PM PST #

See the new post - Clearing an Alert - Part 2

Posted by werner.de.gruyter on April 05, 2010 at 05:41 AM PDT #

