Thursday Nov 06, 2014

Tools For Generating Consistent Loads

It's finally ready: the new database machine you've spent months planning and architecting, with all those shiny new components perfectly aligned for extreme performance. Will it give you the results you expect? How can you baseline and compare your new and existing systems? Pointing your application at your new machine may take some time to set up and, depending on the application's behavior, may not stress all the hardware components the way you'd like. You need a set of easy-to-configure scripts for load testing, and you need tools and procedures to compare old systems to new.

This is the first of three blog posts to help with that effort. In this post, I'll go over some tools you can use to generate various loads. In the next two posts in the series, I'll talk about using Enterprise Manager to evaluate and baseline loads, and about strategies to compare systems using consistent loads.

Test Suite

My current test suite includes two methods to generate loads: one leverages Swingbench, a well known and popular load generating tool, and the other is a solution I cooked up myself. Both sets of scripts can be altered to tailor their load characteristics. I've also included a variable load wrapper script for each, which you can use to adjust the load over time. For example, a load test could run for a total of 5 hours, and within that 5-hour window the load could fluctuate every 30 minutes from heavy to light. The custom scripts are also flexible enough to support altering their behavior if you have a specific set of SQL/PLSQL commands you would like to run.

For this article, my database is running on an Exadata X2-2 quarter rack.

Using Swingbench


Swingbench is a great tool for quickly generating load on an Oracle database. It's easy to set up and has many configurable options. Although Swingbench has a nice GUI for creating your test schemas and running your load, I really like the command line interface. With the CLI you can script your interactions with Swingbench and nohup loads on remote hosts, so a load can run for hours or days without you needing to stay logged in.
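
For example, a two hour load can be kicked off in the background and left to run unattended. This is just a sketch of how I do it; check charbench -h on your Swingbench version for the exact runtime flag, and adjust the log file name to taste.

nohup ./charbench -c swingconfig.xml -rt 02:00 > charbench.log 2>&1 &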

Setup

If you don't have it already, download a copy of Swingbench and unzip the files on your host machine. You can run Swingbench from your database host or from a remote client. If you co-locate it on your database host, remember that the load generator's own CPU and memory usage will show up in your measurements.

There are a few different types of schemas you can create with Swingbench, and each type has an associated XML wizard file in the bin directory to help with creating that schema. I tend to use the Order Entry (OE) schema the most, as its behavior is more representative of an OLTP system, so we will be using the oewizard.xml file for this example. Open the XML file in your favorite editor and update the connection information for the system user that will create the schema, then run oewizard against the file like this...

oewizard -c oewizard.xml -cl -cs //<your_host_or_vip>/<service> -u <test_user_name> -p <test_user_pw> -ts <tablespace_name> -create -df <asm_disk_group> -part -scale 4 -debug

You can use -scale to adjust the size of your schema, which will also increase the time it takes to build. A scale of 4 gives me about a 10G schema.
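
If you want to double-check the size the wizard built, a quick query against DBA_SEGMENTS does the trick. This is just a sketch; SOE is an illustrative schema owner, so substitute the test user you passed to oewizard.

sqlplus -s system@//<your_host_or_vip>/<service> <<EOF
select round(sum(bytes)/1024/1024/1024, 1) as size_gb
  from dba_segments
 where owner = 'SOE';
exit
EOF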

Execution

When your schema is ready, edit the supplied swingconfig.xml file with your connection info and use charbench to verify your schema. 

charbench -c swingconfig.xml

With the schema ready, we can now define our load, also in the swingconfig.xml file. There are a number of parameters you can adjust to shape your load. Here are the ones I find most effective.

  • <NumberOfUsers>40</NumberOfUsers>
  • <MinDelay>100</MinDelay>
  • <MaxDelay>1000</MaxDelay>
  • <LogonDelay>150</LogonDelay>
  • <WaitTillAllLogon>false</WaitTillAllLogon>

MinDelay and MaxDelay specify the wait time between transactions in milliseconds. A LogonDelay helps avoid connection storms (unless that's what you want to test), and I like setting WaitTillAllLogon to false so the load starts right away and ramps up nicely over time. If I want to push the system hard, I set Min/MaxDelay low and increase the number of users.

Further down the swingconfig.xml file you will find descriptions of the actual transactions that will be executed. Each transaction type can be turned on or off, and its weight relative to the other transactions can be adjusted. This section is where you will do most of your tweaking to get the load profile you want.

Example

Here's a Top Activity graph in Enterprise Manager showing two consecutive tests. The first test had 300 users with a Min/MaxDelay of 15/20. I decreased the Min/MaxDelay to 10/15 for an increased load which you can see below.  

Here's an example of a heavily overloaded system in which the application doesn't scale. I've set up Swingbench with 800 users connecting every 2 seconds for a slow build-up, a Min/MaxDelay of 5/15, and only the "Order Products" transactions. These transactions perform single-row inserts with no batching. Around 11:30am there are ~500 sessions and the system starts to overload. CPU has become saturated, and background processes like the database writers start to slow down, causing excessive Concurrency and Configuration waits in the buffer cache. The SGA for this test was 10G.

overload example

Variable Loads With Swingbench


To generate variable Swingbench loads over time, I've written a small Perl wrapper script, variable_load.pl, that lets you define how long your load should run and how it varies within that window. You adjust the load by defining how many users are connected during each phase. Here's a snippet of the script describing each parameter.

### how long you want the test to run in hours:minutes
$runtime = "00:30";

### your swingbench config file
$conf_file = 'swingconfig.xml';

### Adjust your variable loads here
###
### RunTime = how long this load will run in hours:minutes
### UserCount = how many user connections
### SleepTime = how many seconds to sleep before running the load, if needed
###
###              RunTime  UserCount  SleepTime
@swing_list = ( ["00:02", 400,       120],
                ["00:05", 200,         0],
                ["00:05", 100,         0] ); 

With these settings here's what our load profile looks like.

variable load swingbench

Custom Load Generation


There have been times during my own performance testing when I needed to generate a very specific type of load. Most recently, I needed a heavy large-block IO load, so I put these scripts together in response to that need. I tried to keep them easy to set up, run, and alter if necessary. The load uses a single schema and creates a test table for each session that will be connected, so the load needs to be initialized based on the maximum number of sessions expected for testing.

Setup and Execution

  1. Download the package to your host and unzip/tar in an empty directory.
  2. Edit the load_env.sh file to setup variables for your test. This is where you will define the maximum number of test tables you will need.
  3. Run the init_load.sh script to setup your user test schema and test tables. You will be prompted for the SYSTEM user password.
  4. Run the start_load.sh script to begin your test. This script requires two parameters, the low and high values for the test tables to use and thus the number of sessions. This was done to allow running a small load and then ramping up and running additional loads as needed. Examples...
    • start_load.sh 1 10 : Will run ten sessions, using test tables 1 through 10.
    • start_load.sh 16 20 : Will run 5 sessions, using test tables 16 through 20.
    • start_load.sh 1 1 : Will run 1 session.
  5. Running stop_load.sh will kill all sessions and thus stop the load (a combined end-to-end example follows below).
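
Putting it all together, a first run might look like this. The package file name and directory are hypothetical and the session counts are just examples; adapt them to your environment.

mkdir io_load && cd io_load
unzip ~/custom_load.zip      # or tar xf, depending on the package format
vi load_env.sh               # set variables, including the maximum number of test tables
./init_load.sh               # creates the test schema and tables; prompts for the SYSTEM password
./start_load.sh 1 10         # start ten sessions using test tables 1 through 10
./stop_load.sh               # kill all sessions when done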

Here's what a load of 20 users looks like. Lots of large block IO!

io load example

Custom variable loads can also be run using the variable_load.pl script found in the package. It has the same parameters to adjust as in the Swingbench script. Here's an example of a variable load that ramps up, overloads the system, then drops back down again.

variable load io

As the IO gets heavy we start seeing more contention in the log buffer. 

variable io waits

Customizations

It's possible to design your own custom loads with these scripts, as you may need to execute a particular PL/SQL package or perhaps test how well SQL will scale against a large partitioned table. This can be achieved by editing the init_object.sh and load_object.sh files.

init_object.sh : Edit this script to create or initialize any objects needed for your test. This script gets executed multiple times depending on how many sessions you plan to run concurrently. If you don't have a session specific setup, you can leave an empty code block.

load_object.sh : This is the code that gets executed for each session you define. If you had PL/SQL you wanted to test, this is where you would put it.
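
For illustration, here is a hypothetical load_object.sh body that generates load by repeatedly scanning the session's test table from a PL/SQL loop. The variable names (TEST_USER, TEST_PASS, DB_SERVICE) and the test_tab_<n> naming convention are assumptions, so line them up with whatever load_env.sh and init_object.sh actually define.

#!/bin/bash
# Hypothetical load_object.sh - names below are illustrative only.
# $1 is assumed to be the test table number assigned to this session.
TABLE_ID=$1

sqlplus -s ${TEST_USER}/${TEST_PASS}@${DB_SERVICE} <<EOF
declare
  l_cnt number;
begin
  -- repeatedly full-scan this session's test table to generate load
  for i in 1 .. 1000 loop
    select /*+ full(t) */ count(*) into l_cnt from test_tab_${TABLE_ID} t;
  end loop;
end;
/
exit
EOF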

As an example, for this test I created database links for each instance and altered the script to select from the test table over the database links, creating a heavy network load. I've included this example script, load_object_network.sh, in the package zip file as well.
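
For reference, the change amounts to running the same kind of query across a database link so the rows travel over the network. A hypothetical fragment of such a script (the link name inst2_link and the variables are illustrative, not the actual contents of load_object_network.sh):

# Hypothetical fragment - variable and database link names are illustrative.
sqlplus -s ${TEST_USER}/${TEST_PASS}@${DB_SERVICE} <<EOF
select /*+ full(t) */ count(*) from test_tab_${TABLE_ID}@inst2_link t;
exit
EOF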

network load

Ready!


With a set of tools to define consistent, predictable loads, we are now ready to baseline our systems. Next in the series, I'll go over the tools available in Enterprise Manager that will help in that effort.

Tuesday Aug 26, 2014

Exadata Health and Resource Usage Monitoring v2

A newly updated version of the Exadata Health and Resource Usage Monitoring white paper has been released! This white paper documents an end-to-end approach to health and resource utilization monitoring for Oracle Exadata environments. The document has been substantially revised to help Exadata administrators easily follow the troubleshooting methodology it defines. Other additions include:

  • New features in the Exadata 12.1.0.6 plug-in for Enterprise Manager
  • Enterprise Manager 12.1.0.4 updates
  • Updates to include X4 environments

Download the white paper at the link below:

http://www.oracle.com/technetwork/database/availability/exadata-health-resource-usage-2021227.pdf


Wednesday Apr 30, 2014

Latest Enterprise Manager Recommended Patch List

With the release of the latest quarterly patches, the list of recommended patches for managing Exadata via Enterprise Manager has been updated. The details for this patch recommendation can be found in the My Oracle Support note titled Patch Requirements for Setting up Monitoring and Administration for Exadata (Doc ID 1323298.1).

This note contains the recommended patches for the OMS servers, the repository database, and the agent. It also contains the recommended plug-in versions for the Database, Exadata, and Fusion Middleware plug-ins, as well as the latest bundle patches for those plug-ins.

A My Oracle Support note has been created providing the patching steps for this list of recommended patches.  This note can be found here:  Applying Enterprise Manager 12c Recommended Patches (Doc ID 1664074.1)

Also, please refer to the following whitepaper for guidance in the Enterprise Manager patch application process: Oracle Enterprise Manager Software Planned Maintenance

Tuesday Apr 29, 2014

April 2014 Quarterly Full Stack Patch Availability

The latest Quarterly Full Stack patch was released in April for Exadata.  This patch contains all of the latest recommended patches for the Exadata Database Machine. 

For more details, refer to My Oracle Support Note titled Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions [888828.1]

Note that the note contains details for both Exadata Storage Server Software 11.2 and the new 12.1 release. It is maintained on a regular basis, so bookmark it and use it as a reference for the latest Exadata Database Machine patching recommendations.

Thursday Feb 27, 2014

Fast Recovery Area for Archive Destination


If you are using the Fast Recovery Area (FRA) for the archive destination and the destination is set to USE_DB_RECOVERY_FILE_DEST, you may notice that the Archive Area % Used metric no longer triggers. Instead, the Recovery Area % Used metric triggers when it hits a Warning threshold of 85% full and a Critical threshold of 97% full. Unfortunately, this metric is controlled by the database and the thresholds cannot be modified (see MOS Note 428473.1 for more information). Thresholds of 85/97 are not sufficient for some of the larger, busier databases, and may not give you enough time to kick off a backup and clear enough logs before the archiver hangs. If you need different thresholds, you can easily accomplish this by creating a Metric Extension (ME) and setting the thresholds to your desired values. This blog walks through an example of creating an ME to monitor the archive destination on FRA; for more information on MEs and how they can be used, refer to the Oracle Enterprise Manager Cloud Control Administrator's Guide.
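
Before building the ME, it is worth eyeballing the value such a metric would return. A reclaimable-space-aware calculation against V$RECOVERY_AREA_USAGE looks something like this (a sketch run from the database host; the rounding and any thresholds are up to you):

sqlplus -s / as sysdba <<EOF
select round(sum(percent_space_used) - sum(percent_space_reclaimable), 2) as pct_used
  from v\$recovery_area_usage;
exit
EOF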

[Read More]

Monday Feb 24, 2014

Monitoring Archive Area % Used on Cluster Databases


One of the most critical events to monitor on an Oracle Database is your archive area. If the archive area fills up, your database will halt until it can continue to archive the redo logs. If your archive destination is set to a file system, then the Archive Area % Used metric is often the best way to go. This metric allows you to monitor a particular file system for the percentage space that has been used. However, there are a couple of things to be aware of for this critical metric.

Cluster Database vs. Database Instance

You will notice in EM 12c that the Archive Area metric exists on both the Cluster Database and the Database Instance targets. The majority of Cluster Databases (RAC) are built following database best practices, which indicate that the archive destination should be shared read/write between all instances. The reason is that, in case of recovery, any instance can perform the recovery because it has access to all the necessary archive logs. Monitoring this destination at the instance level caused duplicate alerts and notifications, as each instance would hit the Warning/Critical threshold for Archive Area % Used within minutes of the others. To eliminate duplicate notifications, the Archive Area % Used metric for Cluster Databases was introduced. This allows the archive destination to be monitored at the database level, much like tablespaces are monitored in a RAC database.

In the Database Instance (RAC Instance) target, you will notice the Archive Area % Used metric collection schedule is set to Disabled.

If you have a RAC database and you do not share archive destinations between instances, you will want to Disable the Cluster Database metric, and enable the Database Instance metric to ensure that each destination is monitored individually.


Tuesday Feb 11, 2014

Oracle Enterprise Manager Software Planned Maintenance

A critical part of managing an application is patching and maintaining the software. Applying patches is not only required for bug fixes, it is also a means of obtaining new and beneficial functionality, so it is an important task in maintaining a healthy and productive Enterprise Manager (EM) solution. The patching process itself can present challenges that potentially increase the work and time involved in each patching exercise: patch conflicts, unmet prerequisites, and even unnecessary downtime. Spending the proper time to set up a patching strategy can save time and effort, and reduce errors and risk, when patching a production EM environment.

The MAA team has recently published a new whitepaper which provides an overview of the recommended patching strategy for Oracle Enterprise Manager.  This information is intended to be a guideline for maintaining a patched and highly available EM environment and may need customization to accommodate requirements of an individual organization.

There are five main topics covered in this paper as outlined below:

  1. Enterprise Manager Components

  2. Defining Business Requirements

  3. Patching Strategy Overview

    1. Types of Patches

    2. Define Patch List and Steps

    3. Planning

    4. Preparation

    5. Testing

  4. Sample Patching Steps

  5. Optional Patching Strategy

http://www.oracle.com/technetwork/database/availability/em-patching-bp-2132261.pdf


Friday Oct 25, 2013

Latest Exadata Quarterly Full Stack Patch

The latest Quarterly Full Stack patch was released on October 17th for Exadata.  This patch contains all of the latest recommended patches for the Exadata Database Machine. 

For more details, refer to My Oracle Support Note titled Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions [888828.1]

Note that the information in the note applies only to Exadata Storage Server Software 11.2. The note is maintained on a regular basis, so bookmark it and use it as a reference for the latest Exadata Database Machine patching recommendations.

Saturday Oct 12, 2013

Exadata Health and Resource Usage Monitoring

MAA has recently published a new white paper documenting an end-to-end approach to health and resource utilization monitoring for Oracle Exadata environments. In addition to technical details, it presents a troubleshooting methodology that allows administrators to quickly identify and correct issues.

The document takes a "rule out" approach, in which components of the system are verified as performing correctly in order to eliminate their role in the incident. There are five areas of concentration in the overall system diagnosis:

1. Steps to take before problems occur that can assist in troubleshooting 

2. Changes made to the system 

3. Quick analysis 

4. Baseline comparison 

5. Advanced diagnostics

http://www.oracle.com/technetwork/database/availability/exadata-health-resource-usage-2021227.pdf


Friday Oct 04, 2013

Java Heap Size Settings For Enterprise Manager 12c

This post provides an update to a previous blog post (Oracle Enterprise Manager 12c Configuration Best Practices (Part 1 of 3)) on how to increase the Java heap size for an OMS running release 12cR3. The entire series can be found in the My Oracle Support note titled Oracle Enterprise Manager 12c Configuration Best Practices [1553342.1].

Increase JAVA Heap Size

For larger enterprises, there may be a need to increase the amount of memory used by the OMS. One symptom of this condition is "sluggish" performance on the OMS. If it is determined that the OMS needs more memory, this is done by increasing the Java heap size parameters. However, it is very important to increase this parameter incrementally and to be careful not to consume all of the memory on the server. Also, Java does not always perform better with more memory.

Verify:  The parameters for the java heap size are stored in the following file:

<MW_HOME>/user_projects/domains/GCDomain/bin/startEMServer.sh

Recommendation: If you have more than 250 agents, increase the -Xmx parameter, which specifies the maximum size of the Java heap, to 2 GB. As the number of agents grows, it can be increased incrementally. Note: Do not increase this beyond 4 GB without contacting Oracle. Change only the -Xmx value in the line containing USER_MEM_ARGS="-Xms256m -Xmx1740m ...options..." as seen in the example below. Do not change the Xms or MaxPermSize values. Note: change both lines as seen below; the second occurrence is used when running in debug mode.

Steps to modify the Java setting for versions prior to 12cR3 (12.1.0.3)

Before

if [ "${SERVER_NAME}" != "EMGC_ADMINSERVER" ] ; then
  USER_MEM_ARGS="-Xms256m -Xmx1740m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled"
  if [ "${JAVA_VENDOR}" = "Sun" ] ; then
    if [ "${PRODUCTION_MODE}" = "" ] ; then
      USER_MEM_ARGS="-Xms256m -Xmx1740m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:CompileThreshold=8000 -XX:PermSize=128m"
    fi
  fi
  export USER_MEM_ARGS
fi

After

if [ "${SERVER_NAME}" != "EMGC_ADMINSERVER" ] ; then
  USER_MEM_ARGS="-Xms256m -Xmx2560m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled"
  if [ "${JAVA_VENDOR}" = "Sun" ] ; then
    if [ "${PRODUCTION_MODE}" = "" ] ; then
      USER_MEM_ARGS="-Xms256m -Xmx2560m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:CompileThreshold=8000 -XX:PermSize=128m"
    fi
  fi
  export USER_MEM_ARGS
fi

Steps to modify the Java setting for version 12.1.0.3

emctl set property -name JAVA_EM_MEM_ARGS -value "<value>"
emctl stop oms -all
emctl start oms

Please note that this value gets seeded in emgc.properties and is used to start the OMS. Be careful when setting this property: it is the value the OMS uses at startup, and the OMS can fail to start if it is not specified correctly. Below is an example of the command:

emctl set property -name JAVA_EM_MEM_ARGS -value "-Xms256m -Xmx2048m -XX:MaxPermSize=768M -XX:-DoEscapeAnalysis -XX:+UseCodeCacheFlushing -XX:ReservedCodeCacheSize=100M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled"

Friday Aug 23, 2013

Creating a disk monitoring metric extension for Exadata Compute Nodes


It is highly desirable to monitor the Exadata compute node disks for failures or degraded performance. Using the Enterprise Manager metric extension functionality, compute nodes can be monitored for these conditions and an alert created in the event of an issue. The following steps will guide you through this process.


1. First, a root monitoring credential set must be created. Log in to the OMS using emcli:
$ ./emcli login -username=sysman
Enter password :
Login successful

2. Create the credential set:
$ ./emcli create_credential_set -set_name=root_monitoring -target_type=cluster -supported_cred_types=HostCreds -descript="root_monitoring monitoring"
Create credential set for target type host

3. Next, log in to EM and go to the monitoring credentials page to set up credentials for a test target:

Setup --> Security --> Monitoring Credentials
Select Cluster and then push the "Manage Monitoring Credentials" button.
Find the target you want to test on with the credential set defined in step 2 (in this case root_monitoring).
Highlight the credential set and push the "Set Credentials" button. Enter the credentials and use the Test and Save button to ensure they are correctly defined.


4. Next, create the metric extension:

Enterprise --> Monitoring --> Metric Extensions

Select the Create button.

5. On the General Properties Screen set the following

Target type select "Host"
Name "Compute_Node_Disk_Monitoring"
Display Name "Compute Node Disk Monitoring"
Adapter "OS Command - Multiple Columns"
Data Collection "Enabled"
Repeat Every "5 Minutes"
Use of Metric Data "Alerting and Historical Trending"
Upload Interval "1 Collections"
Select the Next Button

6. Now create the script to run on the agent

On your local machine, create a file called megaclicommand.sh that contains the following single pipeline, which pulls the virtual and physical drive status counters out of the MegaCli adapter summary and relabels them:
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Virtual Drives" -A 6 | grep -w 'Degraded\|Critical\|Offline\|Failed' | sed 's/Degraded/Virtual Drives Degrades/g' | sed 's/Offline/Virtual Drives Offline/g' | sed 's/Critical Disks/Critical Physical Disks/g' | sed 's/Failed Disks/Failed Physical Disks/g'

7. On the adapter page for the metric extension you just created ("Edit Compute Node Disk Monitoring (ME$Compute_Node_Disk_Monitoring) v1 : Adapter"), enter the following:

Command "/bin/sh"
Script "%scriptsDir%/megaclicommand.sh"
Delimiter ":"

8. On the Upload Custom Files Section

Select the Upload button and select the file created in step 6.
Click OK and, once back on the Create New : Adapter page, select the "Next" button.

9. On the "Create New : Columns" page create two columns

Column one should be setup as:
Name "Type"
Display Name "Type"
Column Type "Key Column"
Value Type "String"
Metric Category "Fault"
Column two should be setup as:
Name "Value"
Display Name "Value"
Column Type "Data Column"
Value Type "Number"
Metric Category "Fault"
Comparison Operator ">"
Critical "0"
After setting up the two columns, select the Next button.

10. On The Credentials Screen

Select the “Specify Credential Set” radio button
In the drop down box select the credential set created in step 1
Click the Next button.

11. On the “Create New : Test” page

Add a target to test with in the “Test Targets” section
Click the “Run Test” button and ensure that results are displayed properly in the “Test Results” box.
The results should be similar to below
Type Value
Virtual Drives Degrades 0
Virtual Drives Offline 0
Critical Physical Disks 0
Failed Physical Disks 0

12. Next, the Metric Extension must be saved as a deployable draft. This is done on the main Metric Extensions page and allows the metric to be deployed to targets for testing; at this stage only the developer has access to the metric. After satisfactory testing is completed, the metric can then be published, again from the main Metric Extensions page.


To ensure that administrators are notified in the event the newly created metric fails to collect, an incident rule should be created.

1. To begin, navigate to the Incident Rules home page from the Setup menu in the upper right hand corner of the Enterprise Overview page:
         Setup -> Incidents -> Incident Rules

         Now click the “Create Rule Set..” button

 2. On the Create Rule Set screen enter the following information

        Name: Whatever the rule set should be called, e.g. Metric Collection Error Rule
        Enabled Box: Should be checked
        Type: Enterprise
        Applies To: Targets
        Select the “All Targets of Type” radio button on the bottom of the screen followed by Host in the drop down box

3. Now select the “Rules” tab at the bottom of the screen

4. Choose the "Create.." button in the middle of the screen. On the “Select Type of Rule to Create” popup box, select the “Incoming events and updates to events” radio button and click the Continue button.

5. On the “Create New Rule: Select Events” screen, check the Type check box and select “Metric Extension Update” in the drop down. Click the Next button.

6. On the “Add Conditional Actions” page, you can specify the conditions that must occur for the action to apply, whether an incident should be created, and any email notifications. Specify the desired properties and select the Continue button.

7. If no additional rules are required select the next button on the “Create New Rule: Add Actions” page.

8. On the next screen either accept the default rule name or specify the desired name

9. On the “Create New Rule : Review” page, ensure everything looks correct and select the “Continue” button.

10. Lastly click the “Save” button to complete the setup

11. The metric can now be deployed to the desired target by selecting the “Deploy to Targets” option from the “Actions” drop down button on the Metric Extensions Page

Monday Aug 19, 2013

Simplified Agent and Plug-in Deployment

On a site with hundreds or thousands of hosts, have you had to patch agents immediately as they were deployed? For this reason I’ve always been a big fan of cloning an agent that has the required plug-ins and all the recommended core agent and plug-in patches, then using that clone for all new agent deployments. With Oracle Enterprise Manager 12c this got even easier, as you can now clone the agent using the console “Add Host” method. You still have to rely on the EM users to use the clone. The one problem I have with cloning is that you have to have a reference target for each platform that you support. If you have a consolidated environment and only have Linux x64, this may not be a problem. If you are managing a typical data center with a mixture of platforms, it can become quite the maintenance nightmare just to maintain your golden images. You must update the golden image agents whenever you get a new patch (generic or platform specific) for the agent or plug-in, and recreate the clone for each platform. Typically, I find people create a clone for their most common platforms and forget about the rest. That means maybe 80% of their agents meet their standard patch requirements and plug-ins upon deployment, but the other 20% have to be patched post-deploy, or worse, never get patched!

Deployed agents and plug-ins can be patched easily using EM Patches & Updates, but what about the agents still being deployed or upgraded? Wouldn’t it be nice if they were patched as part of the deployment or upgrade? This article will show you two new features in EM 12.1.0.3 (EM 12cR3) that will help you deploy the most current agent and plug-in versions. Whether you have hundreds or thousands of agents to manage, reducing maintenance and keeping the agents up to date is an important task, and being able to deploy or upgrade to a fully patched agent will save you a lot of time and effort.

[Read More]

Friday Jul 26, 2013

Operational Considerations and Troubleshooting Oracle Enterprise Manager 12c

Oracle Enterprise Manager (EM) 12c has become a valuable component in monitoring and administering an enterprise environment. The more critical the applications, servers, and services that are monitored and maintained via EM, the more critical the EM environment becomes. Therefore, EM must be as available as the most critical target it manages.


There are many areas that need to be discussed when talking about managing Enterprise Manager in a data center. Some of these are as follows:

• Recommendations for staffing roles and responsibilities for EM administration

• Understanding the components that make up an EM environment

• Backing up and monitoring EM itself

• Maintaining a healthy EM system

• Patching the EM components

• Troubleshooting and diagnosing guidelines

The Operational Considerations and Troubleshooting Oracle Enterprise Manager 12c whitepaper available on the  Enterprise Manager Maximum Availability Architecture (MAA) site will help define administrator requirements and responsibilities.  It provides guidance in setting up the proper monitoring and maintenance activities to keep Oracle Enterprise Manager 12c healthy and to ensure that EM stays highly available.

Thursday Jun 27, 2013

Oracle Enterprise Manager 12c Configuration Best Practices (Part 3 of 3)

This is part 3 of a three-part blog series that summarizes the most commonly implemented configuration changes to improve performance and operation of a large Enterprise Manager 12c environment. A “large” environment is categorized by the number of agents, targets and users. See the Oracle Enterprise Manager Cloud Control Advanced Installation and Configuration Guide chapter on Sizing for more details on sizing your environment properly.

  • Part 1 of this series covered recommended configuration changes for the OMS and Repository
  • Part 2 covered recommended changes for the Weblogic server
  • Part 3 covers general configuration recommendations and a few known issues

The entire series can be found in the My Oracle Support note titled Oracle Enterprise Manager 12c Configuration Best Practices [1553342.1].

Configuration Recommendations

Configure E-Mail Notifications for EM related Alerts

In some environments, the notifications for events for different target types may be sent to different support teams (e.g. notifications on host targets may be sent to a platform support team). However, the EM application administrators should be well informed of any alerts or problems seen on the EM infrastructure components.


Recommendation: Create a new incident rule for monitoring all EM components and set up the notifications to be sent to the EM administrator(s). The notification methods available can create or update an incident, send an email, or forward to an event connector. To set up the incident rule set, follow the steps below. Note that each individual rule in the rule set can have different actions configured.


1.  To create an incident rule for monitoring the EM components, click on Setup / Incidents / Incident Rules. On the All Enterprise Rules page, click on the out-of-box rule called “Incident management Ruleset for all targets” and then click on the Actions drop down list and select “Create Like Rule Set…”

2. For the rule set name, enter a name such as MTM Ruleset. Under the Targets tab, select “Specified targets” and select "Targets" from the Add drop down list.  Click on the green "+" sign.  Click on the drop down arrow for Target Type and deselect all target types except "EM Service" and “OMS and Repository".  Click "Search".  Select the targets returned and click "Select".


3. Click on the Rules tab. To edit a rule, click on the rule name and click on Edit as seen below

4. Modify the following rules (names for rules in 12.1.0.3 are in parentheses if they have changed):

a. Incident creation Rule for metric alerts (Create incident for critical metric alerts)

i. Leave the Type set as is but change the Severity to add Warning by clicking on the drop down list and selecting “Warning”. Click Next.

ii.  Add or modify the actions as required (i.e. add email notifications). Click Continue and then click Next.

iii. Leave the Name and description the same and click Next.

iv. Click Continue on the Review page.

b. Incident creation Rule for target unreachable.

i.   Leave the Type set as is but change the Target type to add EM Service and OMS and Repository by clicking on the drop down list selecting both "EM Service" and “OMS and Repository”. Click Next.

ii.  Add or modify the actions as required (i.e. add email notifications) Click Continue and then click Next.

iii. Leave the Name and description the same and click Next.

iv. Click Continue on the Review page.

5. Modify the actions for any other rules as required and be sure to click the “Save” push button to save the rule set, or all changes will be lost.

Configure Out-of-Band Notifications for EM Agent

Out-of-Band notifications act as a backup when there’s a complete EM outage or a repository database issue. This is configured on the agent of the OMS server and can be used to send emails or execute another script that would create a trouble ticket. It will send notifications about the following issues:

• Repository database down

• All OMSs are down

• Repository-side collection job that is broken or has an invalid schedule

• Notification job that is broken or has an invalid schedule

Recommendation: To set up Out-of-Band Notifications, refer to the MOS note “How To Setup Out Of Bound Email Notification In 12c” (Doc ID 1472854.1)

Modify the Performance Test for the EM Console Service

The EM Console Service has an out-of-box defined performance test that will be run to determine the status of this service. The test issues a request via an HTTP method to a specific URL. By default, the HTTP method used for this test is a GET but for performance reasons, should be changed to HEAD. The URL used for this request is set to point to a specific OMS server by default. If a multi-OMS system has been implemented and the OMS servers are behind a load balancer, then the URL used by EM as the URL in notifications and by this EM Service test must be modified to point to the load balancer name instead of a specific server name. If this is not done and a portion of the infrastructure is down then the EM Console Service will show down as this test will fail.

Recommendation: Modify the HTTP Method for the EM Console Service test and the URL if required following the detailed steps below.

Setting the Console URL if a multi-OMS system is implemented:

1.  Click on Setup / Manage Cloud Control / Health Overview

2.  Click on the "Add" push button next to Console URL as seen in the picture below.


3.  Type in the URL and click OK.


Modifying the HTTP Method for the EM Console Service test:

1.  To create an incident rule for monitoring the EM components, click on Targets / Services. From the list of services, click on the EM Console Service.

2. On the EM Console Service page, click on the Test Performance tab.

3.  At the bottom of the page, click on the Web Transaction test called EM Console Service Test

4.  Click on the Service Tests and Beacons breadcrumb near the top of the page.

5.  Under the Service Tests section, make sure the EM Console Service Test is selected and click on the Edit push button.

6.  Under the Transaction section, make sure the Access Logout page transaction is selected and click on the Edit push button

7. Under the Request section, change the HTTP Method from the default of GET to the recommended value of HEAD. The URL in this section should point to the load balancer name instead of a specific server name if multiple OMSes have been implemented and the Console URL was set according to the steps above.

Check for Known Issues

Job Purge Repository Job is Shown as Down

This issue occurs after upgrading EM from 12c to 12cR2. On the Repository page under Setup → Manage Cloud Control → Repository, the job called “Job Purge” is shown as down and the Next Scheduled Run is blank. Also, repvfy reports that this is a missing DBMS_SCHEDULER job. NOTE: this issue is fixed in version 12.1.0.3.

Recommendation: In EM 12cR2, the apply_purge_policies functionality was moved from the MGMT_JOB_ENGINE package to the EM_JOB_PURGE package. To remove this error, execute the command below:

$ repvfy verify core -test 2 -fix

To confirm that the issue resolved, execute

$ repvfy verify core -test 2

It can also be verified by refreshing the Job Service page in EM and checking the status of the job; it should now be Up.

Configure the Listener Targets in EM with the Listener Password (where required)

In a RAC environment, the grid home and RDBMS homes are typically owned by different OS users, and the listener always runs from the grid home. Only the listener process owner can query or change the listener properties; the listener uses a password to allow other OS users (e.g. the agent user) to query the listener process for parameters. EM has a default listener target metric that queries these properties. If the agent is not permitted to do so, a TNS incident (TNS-1190) is logged in the listener’s log file, and EM will report this error every time it is encountered there. This means that the listener targets in EM also need to have this password set; not doing so will cause many TNS-1190 incidents. Below is a sample of this error from the listener log file:

Recommendation: Set a listener password and include it in the configuration of the listener targets in EM

For steps on setting the listener passwords, see MOS notes: 260986.1 , 427422.1

Thursday Jun 20, 2013

Oracle Enterprise Manager 12c Configuration Best Practices (Part 2 of 3)

This is part 2 of a three-part blog series that summarizes the most commonly implemented configuration changes to improve performance and operation of a large Enterprise Manager 12c environment. A “large” environment is categorized by the number of agents, targets and users. See the Oracle Enterprise Manager Cloud Control Advanced Installation and Configuration Guide chapter on Sizing for more details on sizing your environment properly.

  • Part 1 of this series covered recommended configuration changes for the OMS and Repository
  • Part 2 covers recommended changes for the Weblogic server
  • Part 3 will cover general configuration recommendations and a few known issues

The entire series can be found in the My Oracle Support note titled Oracle Enterprise Manager 12c Configuration Best Practices [1553342.1].

WebLogic Server Recommendations

Stuck Thread Max Time

By design, WLS pings applications and waits for a response for up to the value of Stuck Thread Max Time, which is set to 600 seconds by default. This is a heartbeat to ensure that a particular thread is not stuck. EM, on the other hand, will keep threads running as long as there is work in the queue, and those threads will not respond to the heartbeat. This is expected behavior for both EM and WebLogic Server; however, it causes WLS to time out and raise an error, which creates an incident within EM. If this parameter is not increased, the number of incidents created by this WLS error can be significant. Below is an example of the incident that may be seen. Please note, an enhancement request has been filed asking that EM install out of the box with a higher value for this parameter.


Recommendation: To assist in reducing these errors, increase the stuck thread timeout in the Admin server as per the steps below. Note that this will reduce the number of above alerts but may not remove them completely.

1. Log onto the WLS Admin server.

2. Click on Environment in the top right side menu and expand Servers. Click on one of the OMS server names.

3. Click on the Tuning tab in the middle window and then on Lock and Edit under the Change Center (top left).

4. Change the value for Stuck Thread Max Time to 1800.


5. Save and Activate the change. This will require a restart of the OMS server for it to go into effect and will need to be repeated for all servers in the Admin Console (i.e. OMS servers and ADMINSERVER) but only needs to be done once per site/domain. If the environment contains standby OMS servers, repeat these steps for all standby OMS servers and the ADMINSERVER although a reboot is not required for the standby OMS servers as they are not running.

Modify Log Settings

The default severity setting for logging information in the WebLogic Server is set at a level that will create excessive logging data. These settings should be set to a higher severity level.

Recommendation: To modify these settings, follow the steps below:

1.  Log onto the WLS Admin server.

2.  Click on Environment in the top right side menu and expand Servers. Click on the first OMS server.


3. Click on the Logging tab in the middle window and then on the Lock and Edit under the Change Center (top left).




4. Expand the Advanced option at the bottom of the page.


5. Change the Minimum log severity from Info to Warning.


6. Change the Domain Log Broadcaster Severity Level from Notice to Error.


7. Save and Activate the change. This does not require a restart of the OMS server for it to go into effect, but it will need to be repeated for all servers in the Admin Console (i.e. OMS servers and ADMINSERVER). This change only needs to be done once per site/domain. If the environment contains standby OMS servers, repeat these steps for all standby OMS servers and the ADMINSERVER.
