Saturday Oct 12, 2013

Exadata Health and Resource Usage Monitoring

 
   
  

 MAA has recently published a new whitepaper documenting an end-to-end approach to health and resource utilization monitoring for Oracle Exadata environments. In addition to the technical details, it explores a troubleshooting methodology that allows administrators to identify and correct issues quickly.

The document takes a “rule out” approach: each component of the system is verified as performing correctly in order to eliminate its role in the incident. The overall system diagnosis concentrates on five areas:

1. Steps to take before problems occur that can assist in troubleshooting 

2. Changes made to the system 

3. Quick analysis 

4. Baseline comparison 

5. Advanced diagnostics

http://www.oracle.com/technetwork/database/availability/exadata-health-resource-usage-2021227.pdf


Friday Aug 23, 2013

Creating a disk monitoring metric extension for Exadata Compute Nodes


 It is highly desirable to monitor the Exadata Compute node disks for failures or degraded performance. Using the Enterprise Manager metric extension functionality, Compute nodes can be monitored for these conditions and an alert raised when an issue occurs. The following steps guide you through this process.


1. First, a root monitoring credential set must be created. Log in to the OMS using emcli:
$ ./emcli login -username=sysman
Enter password :
Login successful

2. Create the credential set:
$ ./emcli create_credential_set -set_name=root_monitoring -target_type=cluster -supported_cred_types=HostCreds -descript=root_monitoring monitoring
Create credential set for target type host

3. Next, log in to EM and go to the monitoring credentials page to set up credentials for a test target

Setup-->Security-->Monitoring Credentials
Select Cluster and then push the "Manage Monitoring Credentials" button
Find the target you want to test with and the credential set defined in step 2 (in this case root_monitoring)
Highlight the credential set and push the "Set Credentials" button. Enter the credentials and use the "Test and Save" button to ensure they are correctly defined


4. Next, create the metric extension

Enterprise-->Monitoring-->Metric Extensions

Select the create button

5. On the General Properties Screen set the following

Target type select "Host"
Name "Compute_Node_Disk_Monitoring"
Display Name "Compute Node Disk Monitoring"
Adapter "OS Command - Multiple Columns"
Data Collection "Enabled"
Repeat Every "5 Minutes"
Use of Metric Data "Alerting and Historical Trending"
Upload Interval "1 Collections"
Select the Next Button

6. Now create the script to run on the agent

On your local machine create a file called megaclicommand.sh that contains the following
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Virtual Drives" -A 6 | grep -w 'Degraded\|Critical\|Offline\|Failed' | sed 's/Degraded/Virtual Drives Degrades/g' | sed 's/Offline/Virtual Drives Offline/g' | sed 's/Critical Disks/Critical Physical Disks/g' | sed 's/Failed Disks/Failed Physical Disks/g'
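
The filter chain above can be tried on a machine without MegaCli by feeding it saved output. The following is a minimal sketch under stated assumptions: the function names format_disk_status and sample_adapter_info are hypothetical, the sample text only approximates the "Device Present" section of the AdpAllInfo output, and grep -E is used in place of the escaped alternation purely for portability.

```shell
#!/bin/sh
# Equivalent filter chain to megaclicommand.sh, but reading from stdin so it
# can be exercised without the MegaCli64 binary.
format_disk_status() {
    grep -A 6 "Virtual Drives" |
        grep -E -w 'Degraded|Critical|Offline|Failed' |
        sed 's/Degraded/Virtual Drives Degrades/g' |
        sed 's/Offline/Virtual Drives Offline/g' |
        sed 's/Critical Disks/Critical Physical Disks/g' |
        sed 's/Failed Disks/Failed Physical Disks/g'
}

# Illustrative sample only (assumed layout and values, not captured output).
sample_adapter_info() {
    cat <<'EOF'
                Device Present
                ================
Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 5
  Disks           : 4
  Critical Disks  : 0
  Failed Disks    : 0
EOF
}

# Print the four "name : count" rows the metric extension would collect.
sample_adapter_info | format_disk_status
```

Each surviving row has the form "name : count", which is why the adapter in the next step is given ":" as its delimiter.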

7. On the "Create New : Adapter" page enter the following

Command "/bin/sh"
Script "%scriptsDir%/megaclicommand.sh"
Delimiter ":"

8. In the "Upload Custom Files" section

Select the upload button and select the file created in step 6
Click OK and, once back on the "Create New : Adapter" page, select the "Next" button

9. On the "Create New : Columns" page create two columns

Column one should be setup as:
Name "Type"
Display Name "Type"
Column Type "Key Column"
Value Type "String"
Metric Category "Fault"
Column two should be setup as:
Name "Value"
Display Name "Value"
Column Type "Data Column"
Value Type "Number"
Metric Category "Fault"
Comparison Operator ">"
Critical "0"
After setting up the two columns, select the Next button

10. On the Credentials screen

Select the “Specify Credential Set” radio button
In the drop down box select the credential set created in step 2
Click the Next button

11. On the “Create New : Test” page

Add a target to test with in the “Test Targets” section
Click the “Run Test” button and ensure that results are displayed properly in the “Test Results” box.
The results should be similar to below
Type Value
Virtual Drives Degrades 0
Virtual Drives Offline 0
Critical Physical Disks 0
Failed Physical Disks 0
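
The Critical threshold from step 9 (Value > 0) can be approximated locally to sanity-check the script output before relying on the metric. This is only a rough sketch of the comparison, not how Enterprise Manager evaluates thresholds internally; the function name check_critical and its optional threshold argument are hypothetical.

```shell
#!/bin/sh
# Reads "Type : Value" rows on stdin and prints every row whose Value is
# greater than the threshold (default 0). Returns 1 if any row breached it.
check_critical() {
    awk -F':' -v threshold="${1:-0}" '
        NF >= 2 {
            key = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim the key column
            gsub(/^[ \t]+|[ \t]+$/, "", value)  # trim the data column
            if (value + 0 > threshold + 0) {
                printf "CRITICAL: %s = %s\n", key, value
                breached = 1
            }
        }
        END { exit (breached ? 1 : 0) }
    '
}

# Example rows with one failed disk (made-up values for illustration).
printf '%s\n' 'Virtual Drives Degrades : 0' 'Failed Physical Disks : 2' |
    check_critical 0 || echo "at least one disk metric breached the threshold"
```

With all four values at 0, as in the test results above, the function prints nothing and returns 0.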

12. Next, the metric extension must be saved as a deployable draft. This is done on the main Metric Extensions page and allows the metric to be deployed to targets for testing. At this stage only the developer has access to the metric. After satisfactory testing is completed the metric is published, once again from the main Metric Extensions page.


To ensure that administrators are notified if the newly created metric fails, an incident rule should be created.

1. To begin, navigate to the Incident Rules home page
         Use the Setup button in the upper right hand corner of the Enterprise Overview page
         Setup->Incidents->Incident Rules

         Now click the “Create Rule Set..” button

 2. On the Create Rule Set screen enter the following information

        Name: Whatever the rule set should be called, e.g. Metric Collection Error Rule
        Enabled Box: Should be checked
        Type: Enterprise
        Applies To: Targets
        Select the “All Targets of Type” radio button on the bottom of the screen followed by Host in the drop down box

3. Now select the “Rules” tab at the bottom of the screen

4. Choose the "Create.." button in the middle of the screen

5. On the “Select Type of Rule to Create” popup box select the “Incoming events and updates to events” radio button. Click the Continue button.

6. On the “Create New Rule: Select Events” screen check the Type check box. In the drop down select “Metric Extension Update”. Click the Next button.

7. On the “Add Conditional Actions” page you can specify the conditions under which the action applies, whether an incident should be created, and who receives email notifications. Specify the desired properties and select the Continue button.

8. If no additional rules are required, select the Next button on the “Create New Rule: Add Actions” page.

9. On the next screen either accept the default rule name or specify the desired name

10. On the “Create New Rule : Review” page, ensure everything looks correct and select the “Continue” button

11. Lastly, click the “Save” button to complete the setup

12. The metric can now be deployed to the desired target by selecting the “Deploy to Targets” option from the “Actions” drop down button on the Metric Extensions page

Friday Jul 26, 2013

Operational Considerations and Troubleshooting Oracle Enterprise Manager 12c

Oracle Enterprise Manager (EM) 12c has become a valuable component in monitoring and administering an enterprise environment. The more critical the applications, servers and services that are monitored and maintained via EM, the more critical the EM environment becomes. Therefore, EM must be as available as the most critical target it manages.


There are many areas that need to be discussed when talking about managing Enterprise Manager in a data center. Some of these are as follows:

• Recommendations for staffing roles and responsibilities for EM administration

• Understanding the components that make up an EM environment

• Backing up and monitoring EM itself

• Maintaining a healthy EM system

• Patching the EM components

• Troubleshooting and diagnosing guidelines

The Operational Considerations and Troubleshooting Oracle Enterprise Manager 12c whitepaper, available on the Enterprise Manager Maximum Availability Architecture (MAA) site, helps define administrator requirements and responsibilities. It provides guidance on setting up the monitoring and maintenance activities needed to keep Oracle Enterprise Manager 12c healthy and highly available.
