Friday Aug 23, 2013

Creating a disk monitoring metric extension for Exadata Compute Nodes

It is highly desirable to monitor the Exadata compute node disks for current failures or degraded performance. By using the Enterprise Manager metric extension functionality, compute nodes can be monitored for these conditions and an alert created in the event of an issue. The following steps will guide you through this process.

1. First, a root monitoring credential set must be created. Log in to the OMS using emcli:
$ ./emcli login -username=sysman
Enter password :
Login successful

2. Create the credential set:
$ ./emcli create_credential_set -set_name=root_monitoring -target_type=cluster -supported_cred_types=HostCreds -description=root_monitoring -monitoring
Create credential set for target type host

3. Next, log in to EM and go to the Monitoring Credentials page to set up credentials for a test target

Setup--> Security-->Monitoring Credentials
Select Cluster and then push the "Manage Monitoring Credentials" button
Find the target you want to test on with the credential set defined in step 2 (in this case root_monitoring)
Highlight the credential set and push the "Set Credentials" button. Enter the credentials and use the "Test and Save" button to ensure they are correctly defined.

4. Next, create the metric extension

Enterprise-->Monitoring-->Metric Extensions

Select the create button

5. On the General Properties Screen set the following

Target type select "Host"
Name "Compute_Node_Disk_Monitoring"
Display Name "Compute Node Disk Monitoring"
Adapter "OS Command - Multiple Columns"
Data Collection "Enabled"
Repeat Every "5 Minutes"
Use of Metric Data "Alerting and Historical Trending"
Upload Interval "1 Collections"
Select the Next Button

6. Now create the script to run on the agent

On your local machine create a file called that contains the following
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Virtual Drives" -A 6 | grep -w 'Degraded\|Critical\|Offline\|Failed' | sed 's/Degraded/Virtual Drives Degrades/g' | sed 's/Offline/Virtual Drives Offline/g' | sed 's/Critical Disks/Critical Physical Disks/g' | sed 's/Failed Disks/Failed Physical Disks/g'
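
To see what this pipeline produces without access to a compute node, the filter stages can be run against canned text. The sketch below is only an illustration: the text printed by sample_adapter_info approximates the layout of the AdpAllInfo section that the pipeline targets and was not captured from real hardware, and both function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical sample approximating the part of the MegaCli64 AdpAllInfo
# output that the script filters. NOT captured from a real controller.
sample_adapter_info() {
cat <<'EOF'
                Device Present
                ================
Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 5
  Disks           : 4
  Critical Disks  : 0
  Failed Disks    : 0
EOF
}

# Same filter stages as the script above, reading stdin instead of MegaCli64.
filter_disk_status() {
    grep -A 6 "Virtual Drives" \
        | grep -w 'Degraded\|Critical\|Offline\|Failed' \
        | sed 's/Degraded/Virtual Drives Degrades/g' \
        | sed 's/Offline/Virtual Drives Offline/g' \
        | sed 's/Critical Disks/Critical Physical Disks/g' \
        | sed 's/Failed Disks/Failed Physical Disks/g'
}

# Prints one "name : count" line per monitored condition.
sample_adapter_info | filter_disk_status
```

Each surviving line has the form "name : count", which is why the adapter is later configured with ":" as its delimiter.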

7. On the "Create New : Adapter" page enter the following

Command "/bin/sh"
Script "%scriptsDir%/"
Delimiter ":"
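
The ":" delimiter tells the adapter where to split each line of script output into columns. A rough shell equivalent of that split is sketched below, using awk purely for illustration. Enterprise Manager does its own parsing; the whitespace trimming and the function name split_columns are assumptions of this sketch, not documented adapter behavior.

```shell
#!/bin/sh
# Split "name : count" lines on ":" into two fields and trim blanks,
# mimicking how one output line becomes a two-column metric row.
split_columns() {
    awk -F':' '{
        type = $1; value = $2
        gsub(/^[ \t]+|[ \t]+$/, "", type)
        gsub(/^[ \t]+|[ \t]+$/, "", value)
        printf "Type=[%s] Value=[%s]\n", type, value
    }'
}

printf '  Virtual Drives Degrades        : 0\n' | split_columns
# prints: Type=[Virtual Drives Degrades] Value=[0]
```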

8. On the Upload Custom Files Section

Select the upload button and select the file created in step 6
Click OK and, once back on the "Create New : Adapter" page, select the "Next" button

9. On the "Create New : Columns" page create two columns

Column one should be setup as:
Name "Type"
Display Name "Type"
Column Type "Key Column"
Value Type "String"
Metric Category "Fault"
Column two should be setup as:
Name "Value"
Display Name "Value"
Column Type "Data Column"
Value Type "Number"
Metric Category "Fault"
Comparison Operator ">"
Critical "0"
After setting up the two columns, select the next button
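
With the ">" operator and a critical threshold of 0, any nonzero count in the Value column raises a critical alert for that Type. Enterprise Manager evaluates the comparison, not the script; the sketch below only imitates the rule in shell so it can be reasoned about offline. The function name evaluate_rows and the CRITICAL/CLEAR wording are hypothetical.

```shell
#!/bin/sh
# Imitate the "Value > 0 is Critical" rule on "Type : Value" rows from stdin.
evaluate_rows() {
    awk -F':' '{
        type = $1
        gsub(/^[ \t]+|[ \t]+$/, "", type)
        value = $2 + 0                      # force a numeric comparison
        status = (value > 0) ? "CRITICAL" : "CLEAR"
        printf "%s %s=%d\n", status, type, value
    }'
}

printf 'Virtual Drives Degrades : 0\nFailed Physical Disks : 1\n' | evaluate_rows
# prints:
#   CLEAR Virtual Drives Degrades=0
#   CRITICAL Failed Physical Disks=1
```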

10. On The Credentials Screen

Select the “Specify Credential Set” radio button
In the drop down box select the credential set created in step 2
Click the next button

11. On the “Create New : Test” page

Add a target to test with in the “Test Targets” section
Click the “Run Test” button and ensure that results are displayed properly in the “Test Results” box.
The results should be similar to below
Type Value
Virtual Drives Degrades 0
Virtual Drives Offline 0
Critical Physical Disks 0
Failed Physical Disks 0
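
If the test returns no rows, or fewer than the four shown, the script probably produced no output for some of the lines. A small check along the following lines can be run by hand against the script's output to confirm that all four rows are present with numeric values. It is a hypothetical helper and not part of the metric extension; check_rows is a name made up here.

```shell
#!/bin/sh
# Verify that stdin contains each expected Type with a non-negative integer Value.
check_rows() {
    rows=$(cat)
    for type in "Virtual Drives Degrades" "Virtual Drives Offline" \
                "Critical Physical Disks" "Failed Physical Disks"; do
        if ! printf '%s\n' "$rows" \
            | grep -Eq "^[[:space:]]*${type}[[:space:]]*:[[:space:]]*[0-9]+[[:space:]]*$"; then
            echo "missing or malformed row: ${type}" >&2
            return 1
        fi
    done
    echo "all rows present"
}

printf '%s\n' \
    "Virtual Drives Degrades : 0" \
    "Virtual Drives Offline : 0" \
    "Critical Physical Disks : 0" \
    "Failed Physical Disks : 0" | check_rows
```

The function returns a nonzero status and names the first missing row, so it can also be used inside a larger shell check.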

12. Next, the metric extension must be saved as a deployable draft. This is accomplished on the main metric extension page. It allows the metric to be deployed to targets for testing. However, at this stage only the developer has access to publish the metric. After satisfactory testing is completed, the metric is published. This is once again accomplished from the main metric extension page.

To ensure that administrators are notified in the event that the metric created fails, an incident rule should be created.

1. To begin, navigate to the Incident Rules Home Page
         From the Setup button on the upper right hand corner of the Enterprise Overview Page
         Setup->Security->Incident Rules

         Now click the “Create Rule Set..” button

2. On the Create Rule Set screen enter the following information

        Name: Whatever the rule should be called, e.g. Metric Collection Error Rule
        Enabled Box: Should be checked
        Type: Enterprise
        Applies To: Targets
        Select the “All Targets of Type” radio button on the bottom of the screen followed by Host in the drop down box

3. Now select the “Rules” tab at the bottom of the screen

4. Choose the "Create.." button in the middle of the screen

5. On the “Select Type of Rule to Create” popup box select the “Incoming events and updates to events” radio button. Click the continue button.

6. On the “Create New Rule: Select Events” screen check the type check box. In the drop down select “Metric Extension Update”. Click the next button.

7. On the “Add Conditional Actions” page you can specify the conditions that must be met for the action to apply, whether an incident should be created, and email notifications. Specify the desired properties and select the continue button.

8. If no additional rules are required, select the next button on the “Create New Rule: Add Actions” page.

9. On the next screen either accept the default rule name or specify the desired name

10. On the “Create New Rule : Review” page, ensure everything looks correct and select the “continue” button

11. Lastly, click the “Save” button to complete the setup

12. The metric can now be deployed to the desired target by selecting the “Deploy to Targets” option from the “Actions” drop down button on the Metric Extensions page



