Alert Monitoring and Problem Notification in Oracle Enterprise Manager Ops Center

Oracle Enterprise Manager Ops Center provides full lifecycle management of your Oracle hardware and operating systems, including your virtual environments. A significant portion of any given asset's lifecycle is spent in daily operations, and when things are running smoothly there isn't much for an administrator to do. When things go awry, it's critical to know what happened, and why, as quickly as possible. Oracle Enterprise Manager Ops Center provides alert monitoring, problem notification, and problem management capabilities to enable you to do just that. I'll walk you through a quick and simple example of how you can use these features. Hopefully it will spark ideas for even more interesting solutions built on the same basic steps.

The first step is to tune your monitoring rules. Each type of asset has a default set of monitoring rules that are applied when the asset is first managed. Rules can be managed on individual assets via their Monitoring tab, or by applying Monitoring Profiles to individual assets or groups of assets. Monitoring rules can be configured to raise alerts when, for example, a monitored attribute exceeds a threshold value for a selectable period of time. For more details on how to configure your monitoring rules, please see section 9 of the Advanced User's Guide, available by clicking on the Help link from within the browser user interface. If you update monitoring rules in a Monitoring Profile, be sure to apply that profile to your desired assets so that the changes take effect on them. For this example I have set a very short window for the CPU Usage attribute, so that an alert is generated after only 1 minute of high CPU utilization, as shown in the screenshot below.
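
To make the "threshold for a period of time" idea concrete, here is a minimal Python sketch of such a rule. It is only an illustration of the concept, not how Ops Center implements its monitoring rules: the ThresholdRule class, the 80% threshold, and the sample data are all assumptions made up for this example. Only the 1-minute window comes from the setup described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire once a value stays above `threshold` for `window_s` seconds."""
    threshold: float
    window_s: float
    _breach_start: Optional[float] = None  # time of the first sample in the current breach

    def observe(self, timestamp: float, value: float) -> bool:
        """Feed one sample; return True while the breach has lasted the full window."""
        if value <= self.threshold:
            self._breach_start = None  # back under the threshold: reset the timer
            return False
        if self._breach_start is None:
            self._breach_start = timestamp  # first sample above the threshold
        return timestamp - self._breach_start >= self.window_s


# Assumed values: 80% CPU threshold, 60-second window (the 1 minute used in this example).
rule = ThresholdRule(threshold=80.0, window_s=60.0)

# (seconds, CPU %) samples; the dip at t=45 resets the window.
samples = [(0, 95.0), (30, 97.0), (45, 50.0), (60, 96.0), (90, 98.0), (120, 99.0)]
fired = [t for t, cpu in samples if rule.observe(t, cpu)]
print(fired)  # [120]
```

Note how the brief dip below the threshold restarts the clock, which is why a short window such as 1 minute makes the rule much more sensitive to bursts of load.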


When an alert is generated, a new problem is created if none is already open for the issue. Otherwise, the alert is added to the existing problem. Problems aggregate alerts and annotations and provide the opportunity to assign and track resolution. Any users whose Notification Profile is defined to receive notification of the problem will get an email or page with the pertinent details. The image below shows how you might subscribe the root user to email notification of all problems at WARNING level or higher.
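
The aggregation and notification behavior described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the product's actual logic: the ProblemTracker class, the three severity names, the (asset, issue) key, and the choice to notify only when a problem is first created are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Tuple


class Severity(IntEnum):
    # Assumed severity scale; only WARNING is named in the text above.
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Problem:
    asset: str
    issue: str
    severity: Severity
    alerts: List[str] = field(default_factory=list)


class ProblemTracker:
    """Hypothetical model: one open problem per (asset, issue), alerts aggregated into it."""

    def __init__(self, subscriptions: Dict[str, Severity]):
        # Maps a user name to the minimum severity that user wants to hear about.
        self.subscriptions = subscriptions
        self.open_problems: Dict[Tuple[str, str], Problem] = {}

    def handle_alert(self, asset: str, issue: str, severity: Severity, message: str) -> List[str]:
        """Record an alert; return the users to notify (only when a new problem opens)."""
        key = (asset, issue)
        problem = self.open_problems.get(key)
        if problem is not None:
            problem.alerts.append(message)  # existing problem: just aggregate the alert
            return []
        self.open_problems[key] = Problem(asset, issue, severity, [message])
        return sorted(user for user, minimum in self.subscriptions.items() if severity >= minimum)


tracker = ProblemTracker({"root": Severity.WARNING})
first = tracker.handle_alert("host1", "cpu-usage", Severity.WARNING, "CPU above threshold")
second = tracker.handle_alert("host1", "cpu-usage", Severity.WARNING, "CPU still above threshold")
print(first, second)  # ['root'] []
```

In this model the second alert lands in the already open problem, so root is notified once rather than for every alert.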


Problems can be managed holistically from the Message Center at the top of the left-hand navigation panel, or they can be viewed for individual assets by selecting the Problems tab. When looking at an open problem, icons along the top allow you to see existing alerts and annotations, add an annotation, assign the problem to a user, or take action on the problem, as shown in the screenshot below.


Annotations can be simple textual comments or suggested actions, which can include the execution of an existing Operational Plan. For more detail on how to use Operational Plans, see section 11 of the Advanced User's Guide. For this example, I created a simple Operational Plan to run prstat. Be sure to select the appropriate Subtype, in this case a Global Zone.
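
For illustration, here is one way a script behind such a plan could capture a single prstat snapshot. This is a sketch, not the plan used in this walk-through: it assumes a Python interpreter is available on the managed host, and the function names and the particular prstat options (sort by CPU, top 10 processes, one 1-second sample) are my own choices for the example.

```python
import shutil
import subprocess
from typing import List


def build_prstat_command(top_n: int = 10) -> List[str]:
    """Build a prstat invocation that prints one report and exits."""
    # -s cpu : sort processes by CPU usage
    # -n N   : limit the report to the top N lines
    # 1 1    : sampling interval of 1 second, 1 report in total
    return ["prstat", "-s", "cpu", "-n", str(top_n), "1", "1"]


def capture_snapshot(top_n: int = 10) -> str:
    """Run prstat once and return its output, or a note if prstat is not installed."""
    cmd = build_prstat_command(top_n)
    if shutil.which(cmd[0]) is None:
        return "prstat not found on this host"
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return result.stdout


if __name__ == "__main__":
    print(capture_snapshot())
```

Limiting prstat to a single report matters here, because without an interval and count it keeps refreshing until interrupted and the script would never finish.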


When adding an annotation to a problem, you can optionally select the checkbox at the bottom of the window to save that annotation to the Problems Knowledge Base and associate it with future problems of the same type and severity, as shown below.


When an annotation has been saved to the Problems Knowledge Base, it can be edited to include additional severities and can also be changed to execute automatically when a future problem is initially created, as shown below. For more detail on the Problems Knowledge Base, please refer to section 10 of the Advanced User's Guide.
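
Conceptually, a Knowledge Base entry pairs a problem type and a set of severities with an action, plus a flag saying whether the action runs automatically. The Python sketch below models that matching. The KnowledgeBaseEntry fields, the type and plan names, and the plans_to_run helper are hypothetical names invented for this example, not Ops Center objects.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class KnowledgeBaseEntry:
    """Hypothetical saved annotation: which problems it applies to and what it runs."""
    problem_type: str
    severities: FrozenSet[str]
    plan_name: str
    automated: bool  # True: run when the problem is created; False: suggested action only


def plans_to_run(entries: Iterable[KnowledgeBaseEntry], problem_type: str, severity: str) -> List[str]:
    """Return the plans that should run automatically for a newly created problem."""
    return [
        entry.plan_name
        for entry in entries
        if entry.automated
        and entry.problem_type == problem_type
        and severity in entry.severities
    ]


knowledge_base = [
    # Edited after saving to cover an additional severity and to run automatically.
    KnowledgeBaseEntry("cpu-usage", frozenset({"WARNING", "CRITICAL"}), "run-prstat", True),
    # Still only a suggested action, so it is never run automatically.
    KnowledgeBaseEntry("cpu-usage", frozenset({"WARNING"}), "restart-service", False),
]

print(plans_to_run(knowledge_base, "cpu-usage", "CRITICAL"))  # ['run-prstat']
print(plans_to_run(knowledge_base, "cpu-usage", "INFO"))      # []
```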


When a new problem is detected, the newly added Automated Action will execute the associated Operational Plan and attach the output as an annotation to the problem. To demonstrate this in action, I executed several 'dd' commands on the host to force excessive CPU usage. In this case, the prstat output shows the high CPU usage of the processes that were running at the time that the alert was generated, even though they lasted only a few minutes.

This is clearly a simple example and would not suffice to capture very short-lived processes, but it illustrates the possibilities available. The automatic action could have been a more in-depth data gathering script using DTrace, or it could even have made system changes, depending on the real scenario it was built to address. I hope this quick walk-through has provoked thoughts of how you might implement Alert Monitoring and Problem Notification and Management in your enterprise using Oracle Enterprise Manager Ops Center.

