Oracle Support Master Note for 10g Grid Control OMS Performance Issues (Doc ID 1161003.1)

 

 

For the most current information, refer to Master Note for 10g Grid Control OMS Performance Issues (Doc ID 1161003.1).

In this Document
  Purpose
  Scope and Application
  Master Note for 10g Grid Control OMS Performance Issues
     What Constitutes an OMS Performance Issue?
     What can Impact OMS Performance?
     Diagnostic Tools Available for Troubleshooting OMS Performance Issues
     Troubleshooting OMS Performance Issues
     Best Practices (Certification, Maintenance Activities, OCM, Healthcheck, CPU & PSU)
  References


Applies to:

Enterprise Manager Grid Control - Version: 10.1.0.2 to 10.2.0.5 - Release: 10.1 to 10.2
Information in this document applies to any platform.

Purpose

This Master Note explains OMS performance issues and shows how to use the available diagnostics effectively to troubleshoot and resolve them.

Scope and Application

This document is intended to help Enterprise Manager Grid Control administrators troubleshoot OMS performance issues effectively. It covers the following topics:

1. What Constitutes an OMS Performance Issue?
2. What can Impact OMS Performance?
3. Diagnostic Tools Available for Troubleshooting OMS Performance Issues
4. Troubleshooting OMS Performance Issues
5. Best Practices (Certification, Maintenance Activities, OCM, Healthcheck, CPU & PSU)

Master Note for 10g Grid Control OMS Performance Issues

What Constitutes an OMS Performance Issue?

An OMS performance issue can manifest in many ways:

- Certain or all pages in the Grid Console take a very long time to be displayed or cause the OMS to core dump.
- Certain operations, for example editing a group from the Grid Console, take a very long time and cause the OMS to crash.
- OMS-related processes consume a large amount of CPU / memory at the OS level.
- Core dumps are generated in the OMS home.
- The Grid Console is intermittently inaccessible and returns a 'Page not Found' error; the logs show that the OMS was crashing and restarting at the time of the problem.
- OMS components, such as the Loader, have a large pending backlog.
- The 'Monitor the Monitor' (MTM) -> Errors page shows many repeated errors for the OMS components, such as ORA-14400 for partitions.
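
To judge whether an error such as ORA-14400 really is "repeated", it can help to tally the ORA- codes in a set of error lines. The following is a minimal sketch, assuming the error text has been copied into a list of strings; the function name and the sample lines are hypothetical and do not reflect any particular Grid Control log format.

```python
import re
from collections import Counter

# Oracle error codes are written as ORA- followed by five digits
# (for example ORA-14400 or ORA-00600).
ORA_CODE = re.compile(r"ORA-\d{5}")


def count_ora_errors(lines):
    """Count how often each ORA- error code appears in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_CODE.findall(line))
    return counts


# Hypothetical sample lines, made up for illustration only.
sample = [
    "Loader: ORA-14400: inserted partition key does not map to any partition",
    "Loader: ORA-14400: inserted partition key does not map to any partition",
    "ORA-01089: immediate shutdown in progress - no operations are permitted",
]

for code, hits in count_ora_errors(sample).most_common():
    print(code, hits)
```

Codes that dominate the tally are the natural starting point for a My Oracle Support search.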


An understanding of OMS' Java Threads and Repository Components is fundamental for effective performance analysis:

OMS Subsystems

OMS is a multi-threaded Java application and the Java Threads can be classified as:

- Persistent Background EM Daemon Threads
- Worker Background Threads from EM Thread Pools 
- Request Handler Threads from OC4J_EM Thread Pools

  • Persistent Background EM Daemon Threads

    The persistent threads are associated with the OMS sub-systems and are responsible for performing important functions during the OMS operation. Most of these are critical threads which need to be working when the OMS is operational. 
    If any of these threads hang or become unresponsive, then the OMS is forcefully re-started. 
    Examples: HealthMonitor, OMSHeartbeatThread, Job Dispatcher, XMLLoaderN, NotificationMgr, PingHeartBeatRecorder
  • Worker Background Threads from EM Thread Pools 

    These threads are created as needed and are picked from the thread pools managed by the EM application itself. The idle threads are usually named [Thread-n]. When picked by a particular sub-system, the thread could be re-named to include the sub-system name, for example [JobWorker 690559:Thread-24].
    Examples: Job Worker Thread, Delivery Thread.
  • Request Handler Threads from OC4J_EM Thread Pools

    These threads are created as needed and are managed by the OC4J_EM Container. They are usually called [AJPRequestHandler-ApplicationServerThread-n] and serve http(s) requests. When picked by a particular sub-system, the thread could be re-named to include the sub-system name, for example [Job Receiver 0 0:AJPRequestHandler-ApplicationServerThread-16].
    Examples: Job Receiver, Metadata Loader Threads, Severity Loader Threads, UI Threads that Serve Console Pages, Data File Receiver Threads

For complete details on the above and the Repository database side connection pools used by OMS, refer to the Note 1097545.1: Description of Important Java Threads in a 10g Grid Control Oracle Management Service (OMS).
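
When reading an OMS thread dump, the naming conventions described above are usually enough to sort thread names into the three categories. The sketch below is illustrative only: the function name and category labels are hypothetical, the persistent thread list contains just the examples given in this note, and real thread dumps may contain names that fall outside these patterns.

```python
import re

# Persistent daemon thread names, taken from the examples in this note.
# "XMLLoaderN" is interpreted as XMLLoader followed by an optional number.
PERSISTENT_PATTERNS = [
    r"HealthMonitor",
    r"OMSHeartbeatThread",
    r"Job Dispatcher",
    r"XMLLoader\d*",
    r"NotificationMgr",
    r"PingHeartBeatRecorder",
]


def classify_thread(name):
    """Return 'request-handler', 'worker', 'persistent' or 'unknown'."""
    name = name.strip("[]")
    # OC4J_EM request handlers keep their suffix even when a sub-system
    # prefixes its own name to the thread.
    if re.search(r"AJPRequestHandler-ApplicationServerThread-\d+$", name):
        return "request-handler"
    # EM pool workers are named Thread-n, optionally prefixed by the
    # sub-system name and a colon.
    if re.search(r"(^|:)Thread-\d+$", name):
        return "worker"
    if any(re.fullmatch(pattern, name) for pattern in PERSISTENT_PATTERNS):
        return "persistent"
    return "unknown"
```

For example, `classify_thread("[JobWorker 690559:Thread-24]")` falls into the worker category, while a name ending in `AJPRequestHandler-ApplicationServerThread-16` is treated as a request handler.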

Repository Components

The Grid Control / Management Repository is a Database schema, called SYSMAN holding all the EM information. The schema includes objects such as tablespaces, tables, views, triggers, packages, procedures, dbms jobs, synonyms (private and public), etc. The repository stores the EM framework data and performs self maintenance like data rollups, data purging, and running other triggers based on the data stored in the tables. 
The Management Repository is configured in an Oracle Database, which can either be an existing database in the environment or a new one installed along with Grid Control. Additionally, there is also a schema called MGMT_VIEW, which is used by the Information Publisher / Reporting Framework in the Grid Control, for querying data out of the repository.

For a description of common setups, refer to Oracle Enterprise Manager Grid Control Installation and Configuration Guide 10g Release 5 (10.2.0.5.0), Chapter 17 - Grid Control Common Configurations

For additional details on the Repository Components and maintenance activities, refer to:
Note 1164855.1: Overview of the 10g Grid Control Management Repository

********************************************************************************

What can Impact OMS Performance?

OMS is a J2EE application running in an Oracle Application Server Containers for J2EE (OC4J) instance within the Application Server installation. It is responsible for rendering the user interface for Oracle Enterprise Manager Grid Control, interacting with Management Agents on monitored hosts and storing persistent data in the Oracle Management Repository. The Repository itself is a schema in an Oracle Database and depends on the Database features and components for its functionality. Because several components are involved, many factors can influence OMS performance:

  • Database and Listener level issues that affect database operations, such as a database hang, a full archive destination, ORA-600 / 7445 errors causing a database crash, inaccessible datafiles / redo logs, block corruption, dbms_jobs not running at the scheduled time, and tablespaces / datafiles running out of space.
  • Repository Database / Listener availability issues: if any of the OMS persistent threads are unable to connect to the Repository database, then the OMS is forcefully re-started.
  • EM Repository Schema (SYSMAN) level issues such as invalid objects, design flaws in the PL/SQL objects, deadlocks caused by EM sessions, EM-related dbms_jobs not submitted, and maintenance activities not performed, which can cause the OMS to run out of memory and crash.
  • OMS thread level issues: OMS threads consuming excessive CPU / memory or hanging. If any of the OMS threads hang or time out, the OMS is forcefully re-started by the HealthMonitor thread.
  • Application Server component level issues, such as problems with the HTTP Server or OPMN, which can affect OMS operations and accessibility.
  • Java problems in the OMS Home: an unsupported JDK version in the OMS home can cause problems for its operations.
  • Operations performed in certain Console pages, which can cause the OMS to consume a lot of memory and then crash.
  • Other external factors such as the OCM Harvester job or the firewall setup between the OMS and the Repository database.
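
As a first check for the listener availability and firewall factors listed above, one can test from the OMS host whether the listener port accepts TCP connections at all. This is only a rough sketch: the function name is hypothetical, 1521 is the usual default listener port and may differ in your setup, and a successful TCP connection does not prove that the repository database service itself is available.

```python
import socket


def listener_reachable(host, port=1521, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A call such as `listener_reachable("repo-db.example.com")` (a placeholder host name) returning False points towards a network, firewall or listener problem rather than an OMS-internal one.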


********************************************************************************

Diagnostic Tools Available for Troubleshooting OMS Performance Issues

  • 'Monitor the Monitor' (MTM) Pages in Grid Console

    The 'OMS and Repository' target, discovered automatically when the Grid Agent is installed on the OMS machine, is a special type of target which provides self-monitoring of the OMS and of the repository operations performed by the OMS. The internal target type for this target is 'oracle_emrep' and the Grid Console UI pages related to this target are called 'Monitor the Monitor' pages.
    • In a 10.1 Grid Console, the details can be accessed from Management System -> Management Services and Repository.
    • In a 10.2 Grid Console, the details are accessible from the Setup -> Management Services and Repository page.
    • For a description of the details shown in these pages, refer to
      Note 1178258.1: Overview of the 'Management Services and Repository' / Monitor-the-Monitor (MTM) Pages in Grid Console.
  • Monitoring Vital Signs of the Enterprise Manager Deployment

    Refer to the steps in Oracle Enterprise Manager Administration 10g Release 5 (10.2.0.5), 
    Chapter 11 - Sizing Your Enterprise Manager Deployment
    Topic: Eliminate Bottlenecks Through Tuning
  • EMDiagkit

    The EMDiagkit is a diagnostic tool developed to assist in the diagnosis and correction of Enterprise Manager 10g Framework issues. At present, the tool extracts the necessary troubleshooting data from the EM Repository Schema using the repvfy utility.

    The details for installation and usage of EMDiagkit are available in
    Note 421053.1: EMDiagkit Download and Master Index

  • RDA

    The Remote Diagnostic Agent (RDA) can be executed specifically with the Grid Control / OMS profile name: GridControl and the Database profile name: DB10g / DB11g in order to reduce the number of questions that need to be answered and also to collect all details of the OMS / Database Homes correctly.

    The steps to execute the RDA with the GridControl and Database profiles are explained in:

    Note 1057051.1: How to Run the RDA against a Grid Control Installation

    It is highly recommended that the latest EMDiagkit is installed and executed in the OMS home, before running the RDA. This will ensure that the RDA picks up the latest data collected by the EMDiagkit.
  • PL/SQL Tracing

    It is possible to trace the PL/SQL routines executed in the Grid Control Repository for certain OMS modules. This feature is available from Grid Control 10.2.0.1 onwards and is very useful when debugging a SQL exception or narrowing down a problem with one of the internal OMS server sub-systems.
    For more details regarding the steps, refer to 
    Note 435055.1: How to Enable Tracing for PL/SQL Routines in the 10g Enterprise Manager Grid Control Repository
  • Database-level Diagnostic Utilities

    There are many tools at the Database side, which are helpful in diagnosing performance issues in the Repository Database. A Database-level problem can manifest itself in several ways affecting the performance and operation of the Grid Control components:
    • The OMS crashes frequently because the repository database is inaccessible. This can be verified by attempting a connection to the database via sqlplus, outside the EM setup.
    • OMS components such as the Loader and Notifications perform slowly, resulting in a high backlog.
    • Certain Grid Console pages are slow to respond.

For more details on the various tools, refer to Note 1098262.1: Master Note for Diagnostic Tools for 10g Enterprise Manager Grid Control Components, 
Section: Diagnostic Tools for the Enterprise Manager Grid Control Repository
Topic: Database-level Diagnostic Utilities

  • Using OS level Utilities:

Unix/Linux

• OS Watcher:

OS Watcher is a tool developed by the Database Center Of Expertise team. It collects OS-related statistics that can be used when diagnosing a performance problem.

1. OS Watcher (OSW) is a series of shell scripts that collect specific kinds of data, using operating system diagnostic utilities.
2. Control is passed to individually spawned operating system data collector processes, which in turn collect specific data, timestamp the data output, and append the data to pre-generated and named files.
3. Each data collector has its own file, created and named by the File Manager process.
4. OSW invokes the distinct operating system utilities listed below as data collectors.
5. These utilities, or their equivalents, are supported as available for each supported target platform:

   - ps
   - top
   - mpstat
   - iostat
   - netstat
   - traceroute
   - vmstat

Note 301137.1: OS Watcher User Guide
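
The collector pattern described above (run a utility, timestamp its output, append it to that collector's own file) can be sketched in a few lines. This is not how OS Watcher itself is implemented; OSW is a set of shell scripts. The function name, the `.dat` file naming and the `***` timestamp marker below are all assumptions made for illustration.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def collect_once(name, command, archive_dir):
    """Run one collector command and append its timestamped output.

    Each collector writes to its own file, <archive_dir>/<name>.dat.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    out_file = archive / f"{name}.dat"

    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # check=False: a non-zero exit status is recorded, not raised.
    result = subprocess.run(command, capture_output=True, text=True, check=False)

    with out_file.open("a") as handle:
        handle.write(f"*** {stamp}\n")
        handle.write(result.stdout)
    return out_file
```

For example, `collect_once("vmstat", ["vmstat", "1", "3"], "archive")` would append three vmstat samples to `archive/vmstat.dat`, provided vmstat is installed; calling it on a schedule builds up a history for later analysis.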

Microsoft Windows Utilities

• OS Watcher (for Microsoft Windows):

Note 433472.1: OS Watcher For Windows (OSWFW) User Guide

• Microsoft Sysinternals Utilities

Process Explorer
A GUI tool that displays detailed information about each running process: the files, registry keys and other objects it has open, the DLLs it has loaded, and its resource usage (CPU, memory, handles, etc.).

Process Monitor
Monitors file system, Registry, process, thread and DLL activity in real time.

PsList
Shows information about processes and threads.

ProcDump
A command-line utility whose primary purpose is to monitor an application for CPU spikes and generate crash dumps during a spike, which an administrator or developer can use to determine the cause of the spike.

VMMap
Shows a breakdown of a process's committed virtual memory types as well as the amount of physical memory (working set) assigned by the operating system to those types. It helps identify the sources of process memory usage and the memory cost of application features.

Handle
A command-line utility that shows which files are open by which processes, and more.

Note: Microsoft Sysinternals Utilities are third-party tools, and Oracle Support cannot support any problems faced while using them. Also, the above-mentioned download links are not maintained by Oracle and hence are subject to change.

********************************************************************************

Troubleshooting OMS Performance Issues

  • OMS Crashing / Restarting Abnormally

    Some common causes for abnormal OMS restart / crash are:

    a) JVM in the OMS home crashing
    b) Unresponsive OC4J_EM process killed by OPMN
    c) HealthMonitor restarting the OMS due to hang / timeout in a critical OMS thread.

    To diagnose and collect more information for the above, refer to
    Note 964469.1: Grid Control Performance: How to Troubleshoot OMS Crash / Restart Issues?

    To find documents related to OMS crash / restart, log in to the My Oracle Support portal and search the 'Knowledge' base with the following keywords:

            Grid Control Performance: OMS Crashes <symptom seen>

    Some examples:

            Note 1114094.1: Grid Control Performance: OMS crashes with error "ORA-01089: immediate shutdown in progress - no operations are permitted"

            Note 1142053.1: Grid Control Performance: OMS Crashes With 'java.lang.OutOfMemoryError: Java heap space' When a Database's DBMS_Job History Page is Accessed

            Note 949168.1: Grid Control Performance: OMS Crashes With "java.lang.OutOfMemoryError: Java heap space" Exception Or CacheManager Thread Time out

            Note 781347.1: Grid Control Performance: OMS Crashes With 'OutOfMemory' Errors When Editing / Configuring a Group
  • OC4J_EM / OMS Threads Consuming High CPU

    Note 298991.1 lists the operating system level processes that are launched when the OMS is started. The OC4J_EM process has multiple Java threads associated with it. Any of these threads can run into a problem and consume more resources, affecting the overall performance of the OMS. The CPU usage of the OMS processes depends on many factors, such as:

    a) Repository Database performance, which affects the OMS threads connecting to it.
    b) A particular operation performed in the Grid Console which results in high memory consumption by the OMS connections to the repository database or the Target.
    c) One of the OMS threads, such as the Loader, hanging during its operations.
    d) Memory problems, which can sometimes show up as a CPU issue if the JVM is performing frequent garbage collection.
    e) A JDK in the OMS home that has been corrupted or is of a lower version than the OMS requires.

    To diagnose and collect more information for the above, refer to
    Note 1185563.1: Grid Control Performance: How to Collect More Information When OMS Threads Consume High CPU?

  • Loader Performance Issues
    • Refer to the details in Oracle Enterprise Manager Administration 10g Release 5 (10.2.0.5),
      Chapter 11 - Sizing Your Enterprise Manager Deployment
      to evaluate whether an additional OMS needs to be installed.
      If the problem is due to Database performance, consider increasing the number of database instances using RAC.
    • If the Loader component is found to be affecting the OMS performance, refer to the troubleshooting steps in Note 285384.1: How To Effectively Investigate & Diagnose 10g Oracle Management Service (OMS) Upload Problems into the Repository Database

 

  • Slow Console UI Pages
    • If a particular page in the Grid Console is slow, to gather more details refer to the steps in Note 1098262.1: Master Note for Diagnostic Tools for 10g Enterprise Manager Grid Control Components
      Section: Diagnostic Tools for the Enterprise Manager Console Operations
    • To improve the Login performance of the Console Home page, refer to the steps under
      Oracle Enterprise Manager Administration 10g Release 5 (10.2.0.5),
      Chapter 12 - Maintaining and Troubleshooting the Management Repository
      Topic: Improving the Login Performance of the Console Home Page

    Some examples:

    Note 836290.1: Grid Control Performance: OMS Crashes When Running Customized Report with 'CHART FROM SQL' Element in Console
    Note 781347.1: Grid Control Performance: OMS Crashes With 'OutOfMemory' Errors When Editing / Configuring a Group
    Note 734703.1: Grid Control Performance: OMS on AIX Core Dumps When Accessing the 'Management Pack Access' Link

  • Searching My Oracle Support Documents for OMS Performance

    As the search is specific to Enterprise Manager Grid Control issues, we recommend that the search be performed only under the Grid Control section, using the following navigation:

    Log in to My Oracle Support, then click Knowledge -> Enterprise Management -> Enterprise Manager Consoles - Packs - and Plugins -> Enterprise Manager Grid Control -> All of Enterprise Manager Grid Control.
  • Using RDA and EMDiagkit for troubleshooting OMS Performance
    • The RDA output generated with the GridControl profile is very useful in obtaining all the configuration files and log/trace files together. Specific logs which will be helpful for each component are described in Note 730308.1.
    • The EMDiagkit output is very useful in diagnosing problems / mis-configurations with Grid Control Repository objects, which can affect the OMS performance.


Note: It is highly recommended that the latest EMDiagkit is installed and executed in the OMS home, before running the RDA. This will ensure that the RDA picks up the latest data collected by the EMDiagkit.



********************************************************************************

Best Practices (Certification, Maintenance Activities, OCM, Healthcheck, CPU & PSU)

This section lists some of the best practices that help prevent problems with OMS performance.

EM Certification Checker

It is strongly recommended that you always use a certified combination of OMS, Agent and Repository Database for managing Targets which are certified with this combination.
The Enterprise Manager certification details are available in:

Note 412431.1: Oracle Enterprise Manager 10g Grid Control Certification Checker

Maintenance Activities

  • Enable Log Rotation for the access_log and error_log files created by the httpd_em.conf file:
    Note 339819.1: How to Enable Rotation for the HTTP_Server and OC4J_EM Logfiles in the 10g Grid Control OMS Home?
  • Execute EMDiagKit at regular intervals (once per week or more frequently, depending on your setup) and check for any new problems that are reported.
  • Take valid backups of the Agent, OMS and Repository Database Homes at regular intervals, so that any configuration files deleted by accident can be restored.
    For a 10.2.0.5 OMS, the 'emctl exportconfig oms' command can be used to back up the necessary OMS configuration details. Refer to the details in Oracle Enterprise Manager Administration 10g Release 5 (10.2.0.5), Chapter 9 - Backup, Recovery, and Disaster Recovery, Topic: OMS Backup and Recovery.
  • Regularly monitor the Loader backlog shown in the Grid Console Setup -> Management Services and Repository -> Overview page.
  • Plan to execute on a monthly basis the tasks described in the documentation:

 

  • If using a 10.2.0.5 OMS, refer to Note 853691.1: ALERT: Important Upgrade Steps Required for Enterprise Manager Grid Control 10gR5 (10.2.0.5) Upgrades, for the list of important patches that need to be applied to the OMS / Agent.


OCM

Oracle Configuration Manager (OCM) works with My Oracle Support to enable a proactive support capability that helps you organize, collect and manage your Oracle configurations. It provides proactive, configuration-specific notification of Security and General Alerts; HealthCheck recommendations based on Support best practices when using configuration auto-collection; simplified Service Request logging, tracking and reporting; and project cataloging of key milestones and contacts associated with your configurations.

  • Among the OCM collections, the following topics are related to the Enterprise Manager Components:
    • 2.52 Oracle Enterprise Manager 10g Grid Control Management Agent
    • 2.54 Oracle Enterprise Manager 10g Grid Control Management Service
    • 2.53 Oracle Enterprise Manager 10g Grid Control Management Repository
    • 2.72 Oracle Grid Control Repository (for oracle_emrep target)
    • 2.38 Oracle Agent Deployment Configuration (oracle_emd target)
    • 2.73 Oracle Home
    • 2.23 Host

Note: The above list is expected to expand as new collections are introduced in the future.

  • It is also advisable to review the collections available for the Database instance, so that the Database hosting the repository can be monitored as well:
    • 2.10 Database Instance
    • 2.78 Oracle Listener

Healthcheck

Healthchecks are executed dynamically against the Oracle Configuration Manager uploaded configurations in My Oracle Support. These checks, based on Oracle Best practices, will proactively notify you of potential problems in your environment, and provide recommendations that help you improve system performance and avoid problems in your Oracle environment. 

  • If you receive any Healthcheck alerts in My Oracle Support, refer to the following document for the alert details and for the corresponding document that resolves each alert:

    Note 868955.1: My Oracle Support Health Checks Catalog

  • For Healthchecks specific to the Enterprise Manager and Repository Database, refer to the sections titled:
    • Enterprise Manager (for the OMS)
    • Oracle Database (for the Database hosting the Repository)




CPU and PSU

  • CPU

    Critical Patch Updates (CPUs) are the primary means of releasing security fixes for Oracle products. They are released on the Tuesday closest to the 15th day of January, April, July and October. This page lists all the currently available Critical Patch Updates in chronological order and is updated whenever a new Critical Patch Update is released. You can also subscribe to the CPU Email Alerts using the steps listed here.

    To obtain the latest CPU patch details for the Enterprise Manager Grid Control and its dependent products - Oracle Application Server and Oracle Database:

    - In the Critical Patch Updates page referred to above, click on the link shown for the latest CPU in the table under 'Critical Patch Updates'.
    - The next page lists all the products which have security fixes in the chosen CPU release. Scroll down to the 'Patch Availability Table ..' topic and find the table with details for the Product Group and Patch Availability and Installation Information.
    - In the table, find the row for Product Group 'Oracle Enterprise Manager' and note the document number given in the Patch Availability and Installation Information column. In that document, navigate to:

                 "Critical Patch Update Availability for Oracle Products" and then to
                 "Oracle Enterprise Manager Grid Control"
  • PSU

    Patch Set Updates (PSU) are proactive cumulative patches containing recommended bug fixes that are released on a regular and predictable schedule. PSUs are on the same quarterly schedule as the Critical Patch Updates (CPU), specifically the Tuesday closest to the 15th of January, April, July, and October. The PSUs serve as a new baseline version for reporting issues to Oracle, hence it is always recommended to be on the latest PSU release.
    • For more details on PSU, refer to Note 854428.1: Patch Set Updates for Oracle Products
    • For Enterprise Manager specific PSU, refer to Note 822485.1: Oracle Recommended Patches -- Oracle Enterprise Manager
  • Choosing between CPU / PSU patches 

    The PSU and CPU released each quarter contain the same security content. However, the patches employ different patching mechanisms, so customers need to choose wisely which patch satisfies their needs better:
    • A PSU can be applied on the CPU released at the same time or on any earlier CPU for the base release version. A PSU can be applied on any earlier PSU or the base release version. CPUs are only created on the base release version.
    • Once a PSU has been installed, the recommended way to get future security content is to apply subsequent PSUs. Reverting from PSU back to CPU, while possible, would require significant effort, and so is not advised. 
  • Getting CPU / PSU patch recommendations via OCM 

    OCM also collects and recommends the latest CPU and PSU patch that can be applied to a particular Oracle Home. These details can be seen in the My Oracle Support -> Patches and Updates -> Patch Recommendations section:
    - 'Security' patch recommendations include the CPU patches.
    - 'Other Recommendations' include the PSU patches.
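
The quarterly release rule stated above for both CPUs and PSUs (the Tuesday closest to the 15th of January, April, July and October) can be worked out for any year, which is handy when planning patching windows. The sketch below simply applies that rule as written in this note; the function names are hypothetical, and the actual release dates should always be confirmed against Oracle's announcements.

```python
from datetime import date, timedelta

# Release months named in this note: January, April, July, October.
RELEASE_MONTHS = (1, 4, 7, 10)
TUESDAY = 1  # date.weekday(): Monday is 0, Tuesday is 1


def tuesday_closest_to_15th(year, month):
    """Return the Tuesday nearest to the 15th of the given month."""
    mid = date(year, month, 15)
    # Days forward to the next Tuesday (0 if the 15th is a Tuesday).
    days_ahead = (TUESDAY - mid.weekday()) % 7
    if days_ahead <= 3:
        return mid + timedelta(days=days_ahead)
    # Otherwise the previous Tuesday is nearer.
    return mid - timedelta(days=7 - days_ahead)


def quarterly_release_dates(year):
    """Return the four expected CPU / PSU release dates for a year."""
    return [tuesday_closest_to_15th(year, month) for month in RELEASE_MONTHS]


for release in quarterly_release_dates(2014):
    print(release.isoformat())
```

Because a week has seven days, the 15th is never equally far from two Tuesdays, so the rule always yields a single date.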

References

NOTE:1081865.1 - Master Note for 10g Grid Control OMS Process Control (Start, Stop and Status) & Configuration
NOTE:1082009.1 - Master Note for 10g Grid Control Agent Process Control (Start, Stop & Status) & Configuration
NOTE:1087997.1 - Master Note for 10g Enterprise Manager Grid Control Agent Performance & Core Dump issues
NOTE:1092513.1 - Master Note for 10g Enterprise Manager Grid Control Security Framework
NOTE:1098262.1 - Master Note for Diagnostic Tools for 10g Enterprise Manager Grid Control Components
NOTE:1086343.1 - Master Note for 10g Grid Control Enterprise Manager Communication and Upload issues
