By Leslie-Oracle on Mar 19, 2013
Best Practices: Proactive Data Collection for Performance Issues
A common challenge when analyzing performance issues is having the right diagnostic tool or trace in place the first time the problem occurs. This is not always possible, but you improve your chances of a quicker resolution and a potential Root Cause Analysis (RCA) if you adopt a proactive best practice around performance data collection.
The best practice described here includes a methodology around data collection:
- Top-Down approach to data collection
- Establishing multiple baselines
- Having the right tools already installed and running before the problem occurs
- Deploying specialized tools for unstable environments
The performance of the Oracle database depends upon the performance of the hardware and operating system that it runs on. This is why it is important to consider operating system metrics as part of the "big picture" when diagnosing performance problems. Oracle support has a utility named OSWatcher Black Box, which captures information about the operating system. We strongly recommend that all customers run OSWatcher Black Box because it provides a "look back" capability on the OS. If, for example, we had a node eviction at 2:00 in the morning, we can look back in time to see what happened on the OS at 2:00 or, more importantly, what happened at 1:59, just prior to the actual event.
Document 301137.1 OSWatcher Black Box User Guide (Includes: [Video])
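The "look back" idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample values are invented, and the `look_back` helper is hypothetical. It does not read real OSWatcher Black Box archives; it only shows how timestamped OS samples collected ahead of time let you inspect the minutes just before an event.

```python
from datetime import datetime, timedelta

# Hypothetical OS samples: (timestamp, run-queue length, free memory in MB).
# These values are invented; they merely stand in for data that a
# collector such as OSWatcher Black Box would already have on disk.
samples = [
    (datetime(2013, 3, 19, 1, 57), 2, 4096),
    (datetime(2013, 3, 19, 1, 58), 3, 3900),
    (datetime(2013, 3, 19, 1, 59), 41, 120),
    (datetime(2013, 3, 19, 2, 0), 55, 80),
    (datetime(2013, 3, 19, 2, 1), 4, 3800),
]

def look_back(samples, event_time, minutes_before):
    """Return the samples taken from minutes_before the event up to the event."""
    start = event_time - timedelta(minutes=minutes_before)
    return [s for s in samples if start <= s[0] <= event_time]

# Node eviction reported at 2:00; look at the two minutes leading up to it.
eviction = datetime(2013, 3, 19, 2, 0)
window = look_back(samples, eviction, minutes_before=2)
for ts, run_queue, free_mb in window:
    print(ts.strftime("%H:%M"), "run queue:", run_queue, "free MB:", free_mb)
```

In this invented data set the 1:59 sample already shows the run queue climbing and free memory collapsing, which is the kind of leading indicator that is lost if collection starts only after the event.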
If we drill down from the operating system into the different layers of Oracle, we can take advantage of utilities which have been available since version 10g. The Automatic Workload Repository (AWR) and Active Session History (ASH) are two very important utilities that allow us to see what is happening on the database, both from a holistic point of view and at the session level. These two utilities are part of the Oracle Diagnostic Pack and require a special license to use. To obtain the most effective diagnostics possible, we strongly recommend obtaining an Oracle Diagnostic Pack license.
Document 1363422.1 Automatic Workload Repository (AWR) Reports
Document 243132.1 Active Session History (ASH)
AWR reports can be taken at regular intervals and then used to determine what performance problems, if any, exist on the database. ASH allows us to see specifically what individual sessions are waiting on or what resources these sessions are consuming.
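As a rough illustration of the session-level view that ASH provides, the following Python sketch counts samples per session and activity. The rows are hand-written and only loosely modeled on ASH data (sample time, session id, session state, wait event); the `activity_by_session` helper is hypothetical and nothing here queries a real database.

```python
from collections import Counter

# Hypothetical rows: (sample_time, session_id, session_state, event).
# Invented data for illustration only.
ash_rows = [
    ("02:00:01", 101, "WAITING", "enq: TX - row lock contention"),
    ("02:00:01", 102, "ON CPU", None),
    ("02:00:02", 101, "WAITING", "enq: TX - row lock contention"),
    ("02:00:02", 103, "WAITING", "db file sequential read"),
    ("02:00:03", 101, "WAITING", "enq: TX - row lock contention"),
]

def activity_by_session(rows):
    """Count samples per (session_id, activity), where activity is the
    wait event for waiting sessions and 'ON CPU' otherwise."""
    counts = Counter()
    for _sample_time, session_id, state, event in rows:
        activity = event if state == "WAITING" else "ON CPU"
        counts[(session_id, activity)] += 1
    return counts

# The most frequently sampled (session, activity) pair is the first suspect.
(top_session, top_activity), top_samples = activity_by_session(ash_rows).most_common(1)[0]
print(top_session, top_activity, top_samples)
```

Because the samples are taken at regular intervals, the sample count is a rough proxy for time spent, so the pair with the most samples is where a session spent most of the sampled period.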
The good news is that if you proactively take advantage of these tools, it may be possible to diagnose a system or database hang, just by having these tools in place. Many times bug signatures are available within AWR and hangs can be matched to these bug signatures. This may allow for a diagnostic analysis without the need to take more intrusive diagnostics such as systemstate dumps, hanganalyze dumps, etc.
It is very important for you to establish baselines using OSWatcher Black Box and AWR. Baselines should be used as points of reference and are extremely valuable for differential analysis when problems arise. AWR has a compare utility which can be run on two different reports, such as your baseline report and the report from the period when the problem actually occurred. This facilitates quick identification of any differences in statistics between the two reports and allows us to focus on and identify specific problems. Baselines should be taken during different phases of your business cycle. Examples would be the busiest time of the day, batch reporting at night, monthly or quarterly end-of-business cycles, etc.
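The idea behind differential analysis against a baseline can be sketched in a few lines of Python. This is not the AWR compare utility and does not parse AWR reports; the metric names, values, and the `compare_to_baseline` helper are all hypothetical, chosen only to show how a baseline turns raw numbers into a short list of things that changed.

```python
def compare_to_baseline(baseline, problem, threshold=2.0):
    """Return (metric, baseline_value, problem_value, ratio) for every metric
    whose problem-period value is at least `threshold` times its baseline,
    sorted with the largest ratio first."""
    flagged = []
    for name, base_value in baseline.items():
        current = problem.get(name)
        if current is None or base_value <= 0:
            continue  # no meaningful ratio can be computed
        ratio = current / base_value
        if ratio >= threshold:
            flagged.append((name, base_value, current, ratio))
    return sorted(flagged, key=lambda row: row[3], reverse=True)

# Invented statistics for a baseline period and a problem period.
baseline = {"db_time_s": 1200.0, "physical_reads": 50000.0, "user_commits": 9000.0}
problem = {"db_time_s": 6000.0, "physical_reads": 400000.0, "user_commits": 9500.0}

flagged = compare_to_baseline(baseline, problem)
for name, base_value, current, ratio in flagged:
    print(f"{name}: {base_value:.0f} -> {current:.0f} ({ratio:.1f}x baseline)")
```

Here the workload (commits) is roughly unchanged while physical reads and database time have multiplied, which narrows the investigation. Without a baseline from a comparable phase of the business cycle, the problem-period numbers alone would not show what is abnormal.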
In addition to OSWatcher Black Box, AWR and ASH, there are other support tools worth considering when it comes to being proactive.
Oracle support has a tool which automates the collection of hang diagnostics. This tool, HANGFG, is a simple UNIX shell script that allows users to choose how intrusive they want hang diagnostic traces, such as systemstate dumps and hanganalyze dumps, to be on their production servers. The complex syntax and the kind of diagnostic trace are embedded in the utility. This frees you from having to know the kind and level of trace and the cryptic syntax needed to generate these traces. All you need to do when experiencing a hang is run the shell script, which will collect the appropriate diagnostic traces depending upon how intrusive you choose to make it.
There are 3 levels:
- Light impact on system. This option collects 2 hanganalyze level 3 traces and then determines whether it can also collect 1 hanganalyze level 4 trace with minimal impact to the system. If so, it collects the hanganalyze level 4 trace. If not, it does not collect an additional trace file.
- Medium impact on system (default value). This option collects 1 hanganalyze level 3 trace and then determines whether it can also collect 2 hanganalyze level 4 traces with minimal impact to the system. If so, it collects the 2 additional hanganalyze level 4 traces. If not, it collects an additional hanganalyze level 3 trace. This option also collects 1 systemstate level 266 trace.
- Heavy impact on system. This option collects 2 hanganalyze level 4 traces and 2 systemstate level 266 traces.
Document 362094.1 HANGFG User Guide
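The three levels described above can be summarized as a small decision table. The Python sketch below models only that description; it is not the HANGFG script, and the function name `plan_traces`, the `can_escalate` flag (standing in for the script's own check of whether an additional trace would have minimal impact), and the ordering of the traces in the result are assumptions.

```python
def plan_traces(impact, can_escalate):
    """Return the list of (trace_type, level) pairs collected for an impact
    setting of 'light', 'medium', or 'heavy', following the description of
    the three levels. `can_escalate` is a hypothetical flag that models the
    check for whether extra level 4 traces can be taken with minimal impact."""
    if impact == "light":
        plan = [("hanganalyze", 3)] * 2
        if can_escalate:
            plan.append(("hanganalyze", 4))
    elif impact == "medium":
        plan = [("hanganalyze", 3)]
        if can_escalate:
            plan += [("hanganalyze", 4)] * 2
        else:
            plan.append(("hanganalyze", 3))
        plan.append(("systemstate", 266))
    elif impact == "heavy":
        plan = [("hanganalyze", 4)] * 2 + [("systemstate", 266)] * 2
    else:
        raise ValueError(f"unknown impact setting: {impact!r}")
    return plan

for impact in ("light", "medium", "heavy"):
    for can_escalate in (False, True):
        print(impact, can_escalate, plan_traces(impact, can_escalate))
```

Laid out this way, the main difference between the settings is easy to see: only medium and heavy take systemstate dumps, and only heavy takes level 4 hanganalyze traces unconditionally.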
SQL Tuning Tools
Oracle support also has specialized tools for SQL tuning. SQLT and SQLHC are recommended best practice tools to have downloaded and installed on your systems if possible. SQLT is the best diagnostic tool support has for resolving complicated SQL tuning issues. Both tools extract statistical and environmental data from the database. SQLT is a far more extensive diagnostic tool but does require a separate schema to be installed on the database. SQLHC does not have a footprint on the database and provides quick health check information. Both tools can be explored in more detail by clicking on the links associated with them.
Hopefully these tips will encourage you to become more proactive with data collection. Our ultimate goal is to reduce your resolution time if you have a performance problem.
Related Resources: Advisor Webcast Recordings
- Resolve—Troubleshooting Operating System Performance Issues Using OS Watcher Black Box [ID 1456176.1, Archived 2012 tab; October 31, 2012]
- Resolve—Best Practices for Proactive Data Collection to Troubleshoot Performance Issues [ID 1456176.1, Archived 2012 tab; September 26, 2012]
- Resolve—Troubleshooting Performance Issues Using SQLT & SQLHealthCheck [ID 1456176.1; Archived 2012 tab; October 17, 2012]
- What is this SQLTXPLAIN tool and how do I use it? [ID 1456176.1; Archived 2012 tab; May 15, 2012]