Friday Aug 23, 2013

Creating a disk monitoring metric extension for Exadata Compute Nodes

It is highly desirable to monitor the Exadata Compute Node disks for failures or degraded performance. Using the Enterprise Manager metric extension functionality, compute nodes can be monitored for these conditions and an alert raised when an issue occurs. The following steps guide you through this process.

1. First, a root monitoring credential set must be created. Log in to the OMS using emcli:
$ ./emcli login -username=sysman
Enter password :
Login successful

2. Create the credential set:
$ ./emcli create_credential_set -set_name=root_monitoring -target_type=cluster -supported_cred_types=HostCreds -monitoring -description=root_monitoring
Create credential set for target type host
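Steps 1 and 2 can also be scripted so they are repeatable. The following is a minimal sketch, not a definitive procedure: it assumes emcli is in the current directory, that the -monitoring flag and the show_credential_set_info verb behave as described in the EM CLI reference for your release, and it uses a hypothetical run helper that only prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch: create the monitoring credential set and confirm that it exists.
# EMCLI location is an assumption; adjust it for your OMS home.
EMCLI="${EMCLI:-./emcli}"
SET_NAME="root_monitoring"
TARGET_TYPE="cluster"

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Step 1: log in to the OMS (emcli prompts for the password when executed).
run "$EMCLI" login -username=sysman

# Step 2: create the credential set as a monitoring credential set.
run "$EMCLI" create_credential_set -set_name="$SET_NAME" \
    -target_type="$TARGET_TYPE" -supported_cred_types=HostCreds -monitoring

# Confirm that the credential set is now defined for the target type.
run "$EMCLI" show_credential_set_info -target_type="$TARGET_TYPE" -set_name="$SET_NAME"
```

Run it once without RUN=1 to review the commands, then again with RUN=1 on the OMS host to execute them.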

3. Next, log in to EM and go to the Monitoring Credentials page to set up credentials for a test target

Setup--> Security-->Monitoring Credentials
Select Cluster and then push the "Manage Monitoring Credentials" button
Find the target you want to test on with the credential set defined in step 2 (in this case root_monitoring)
Highlight the credential set and push the "Set Credentials" button. Enter the credentials and use the "Test and Save" button to ensure they are correctly defined

4. Next, create the metric extension

Enterprise-->Monitoring-->Metric Extensions

Select the create button

5. On the General Properties Screen set the following

Target type select "Host"
Name "Compute_Node_Disk_Monitoring"
Display Name "Compute Node Disk Monitoring"
Adapter "OS Command - Multiple Columns"
Data Collection "Enabled"
Repeat Every "5 Minutes"
Use of Metric Data "Alerting and Historical Trending"
Upload Interval "1 Collections"
Select the Next Button

6. Now create the script to run on the agent

On your local machine create a file called that contains the following
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Virtual Drives" -A 6 | grep -w 'Degraded\|Critical\|Offline\|Failed' | sed 's/Degraded/Virtual Drives Degrades/g' | sed 's/Offline/Virtual Drives Offline/g' | sed 's/Critical Disks/Critical Physical Disks/g' | sed 's/Failed Disks/Failed Physical Disks/g'
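Before uploading the script, it can help to check the filter stage without touching the RAID controller. The sketch below wraps the same grep and sed pipeline in a function and feeds it a hand-written sample of the "Device Present" section. The sample text and the function names filter_disk_status and sample_adp_info are assumptions for illustration; real MegaCli64 output will differ in counts and may differ in spacing. It relies on GNU grep, as found on the compute nodes.

```shell
#!/bin/sh
# Filter stage of the step 6 script, reading controller output from stdin
# instead of calling MegaCli64 directly.
filter_disk_status() {
    grep "Virtual Drives" -A 6 \
        | grep -w 'Degraded\|Critical\|Offline\|Failed' \
        | sed 's/Degraded/Virtual Drives Degrades/g' \
        | sed 's/Offline/Virtual Drives Offline/g' \
        | sed 's/Critical Disks/Critical Physical Disks/g' \
        | sed 's/Failed Disks/Failed Physical Disks/g'
}

# Illustrative sample only (assumed layout of the "Device Present" section).
sample_adp_info() {
    cat <<'EOF'
                Device Present
                ================
Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 5
  Disks           : 4
  Critical Disks  : 0
  Failed Disks    : 0
EOF
}

# Print the lines that the metric extension would receive.
sample_adp_info | filter_disk_status
```

With this sample the function prints four lines, one per status, each in the form "name : count". Changing a count in the sample to a non-zero value is a quick way to see what a failure would look like.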

7. On the "Create New : Adapter" page enter the following

Command "/bin/sh"
Script "%scriptsDir%/"
Delimiter ":"

8. On the Upload Custom Files Section

Select the upload button and select the file created in step 6
Click OK and, once back on the "Create New : Adapter" page, select the "Next" button

9. On the "Create New : Columns" page create two columns

Column one should be setup as:
Name "Type"
Display Name "Type"
Column Type "Key Column"
Value Type "String"
Metric Category "Fault"
Column two should be setup as:
Name "Value"
Display Name "Value"
Column Type "Data Column"
Value Type "Number"
Metric Category "Fault"
Comparison Operator ">"
Critical "0"
After setting up the two columns, select the Next button
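To make the column mapping concrete, the sketch below emulates it locally: each script line is split on the ":" delimiter, the first field is treated as the key column (Type), the second as the data column (Value), and a Value greater than 0 is flagged as critical. This is only an illustration of the mapping configured above, not how the agent is implemented. The function name evaluate_disk_status, the whitespace trimming, and the "|" separated output format are assumptions.

```shell
#!/bin/sh
# Emulate the step 9 column mapping on "Type : Value" lines read from stdin.
# Field 1 is the key column (Type); field 2 is the data column (Value).
# A Value greater than 0 corresponds to the Critical threshold.
evaluate_disk_status() {
    awk -F':' '
        NF >= 2 {
            key = $1
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)     # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim whitespace around the value
            status = (value + 0 > 0) ? "CRITICAL" : "OK"
            printf "%s|%s|%s\n", key, value, status
        }'
}

# Two hand-written example lines: one healthy, one with failed disks.
printf '%s\n' \
    "  Virtual Drives Degrades        : 0" \
    "  Failed Physical Disks  : 2" \
    | evaluate_disk_status
```

The first example line is reported as OK and the second as CRITICAL, which mirrors the ">" comparison operator with a critical threshold of 0.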

10. On The Credentials Screen

Select the “Specify Credential Set” radio button
In the drop down box select the credential set created in step 2
Click the next button

11. On the “Create New : Test” page

Add a target to test with in the “Test Targets” section
Click the “Run Test” button and ensure that results are displayed properly in the “Test Results” box.
The results should be similar to below
Type                       Value
Virtual Drives Degrades    0
Virtual Drives Offline     0
Critical Physical Disks    0
Failed Physical Disks      0

12. Next, the metric extension must be saved as a deployable draft. This is done from the main Metric Extensions page and allows the metric to be deployed to targets for testing. At this stage, only the developer has access to publish the metric. After satisfactory testing is completed, the metric is published, again from the main Metric Extensions page.

To ensure that administrators are notified if the metric extension fails, an incident rule should be created.

1. To begin, navigate to the Incident Rules home page
         From the Setup button on the upper right hand corner of the Enterprise Overview Page
         Setup->Incidents->Incident Rules

         Now click the “Create Rule Set..” button

2. On the Create Rule Set screen enter the following information

        Name: Whatever the rule should be called, e.g. Metric Collection Error Rule
        Enabled Box: Should be checked
        Type: Enterprise
        Applies To: Targets
        Select the “All Targets of Type” radio button on the bottom of the screen followed by Host in the drop down box

3. Now select the “Rules” tab at the bottom of the screen

4. Choose the "Create..." button in the middle of the screen. In the “Select Type of Rule to Create” popup box, select the “Incoming events and updates to events” radio button and click the Continue button.

5. On the “Create New Rule: Select Events” screen, check the Type check box and select “Metric Extension Update” in the drop down. Click the Next button.

6. On the “Add Conditional Actions” page you can specify the conditions that must be met for the action to apply, whether an incident should be created, and email notifications. Specify the desired properties and select the Continue button.

7. If no additional rules are required select the next button on the “Create New Rule: Add Actions” page.

8. On the next screen, either accept the default rule name or specify the desired name.

9. On the “Create New Rule : Review” page, ensure everything looks correct and select the “Continue” button.

10. Lastly, click the “Save” button to complete the setup.

11. The metric can now be deployed to the desired target by selecting the “Deploy to Targets” option from the “Actions” drop down button on the Metric Extensions page.

Monday Aug 19, 2013

Simplified Agent and Plug-in Deployment

At a site with hundreds or thousands of hosts, have you had to patch agents immediately after they were deployed?  For this reason I’ve always been a big fan of cloning an agent that has the required plug-ins and all the recommended core agent and plug-in patches, then using that clone for all new agent deployments. With Oracle Enterprise Manager 12c this got even easier as you can now clone the agent using the console “Add Host” method. You still have to rely on the EM users to use the clone. The one problem I have with cloning is that you have to have a reference target for each platform that you support. If you have a consolidated environment and only have Linux x64, this may not be a problem. If you are managing a typical data center with a mixture of platforms, it can become quite the maintenance nightmare just to maintain your golden images. You must update golden image agents whenever you get a new patch (generic or platform specific) for the agent or plug-in, and recreate the clone for each platform. Typically, I find people create a clone for their most common platforms, and forget about the rest. That means, maybe 80% of their agents meet their standard patch requirements and plug-ins upon deployment, but the other 20% have to be patched post-deploy, or worse – never get patched!

Deployed agents and plug-ins can be patched easily using EM Patches & Updates, but what about the agents still being deployed or upgraded? Wouldn’t it be nice if they got patched as part of the deployment or upgrade? This article will show you two new features in EM (EM 12cR3) that will help you deploy the most current agent and plug-in versions. Whether you have 100s or 1000s of agents to manage, reducing maintenance and keeping the agents up to date is an important task, and being able to deploy or upgrade to a fully patched agent will save you a lot of time and effort.
