Oracle Fusion Middleware includes a Diagnostic Framework (DFW) that aids in detecting, diagnosing, and resolving problems. It targets critical errors in particular, such as those caused by code bugs, deadlocked threads, out-of-memory errors, and uncaught exceptions. DFW is available with all FMW 11g installations that run on WebLogic Server.
The goals of DFW are:
- First-failure diagnostics
- Reducing problem diagnostic time
- Reducing problem resolution time
- Simplifying customer interaction with Oracle Support
- Speeding up the internal testing cycle
The important thing for me is that the relevant diagnostics are captured at the moment of failure, giving the customer and Oracle Support the best possible start in resolving the issue. For example, when a deadlocked thread is detected, a thread dump should be captured automatically, detailing all threads and pinpointing those that are deadlocked.
The framework came about as a result of a diagnostics project that started with the Oracle Database 11g release. In that release the database development group came up with a set of concepts and infrastructure for capturing, recording, and indexing diagnostics in a consistent way. Out of this project ADR (Automatic Diagnostic Repository) was born: a file-system repository for cataloguing occurrences of failures and storing the associated diagnostic data. ADR was designed with the intention that other Oracle products could integrate with it, providing consistency not only for Oracle Database diagnostics but for products across the Oracle stack. In FMW 11gR1, ADR was adopted along with its concepts, and a framework was built that extended it to support FMW environments.
For more information on ADR refer to the Oracle Database Administrator's Guide.
What are the concepts?
A Problem is a critical error. It could be due to an internal error, a server error (e.g. a thread deadlock), or a configuration error that results in a critical condition. Each Problem has a Problem Key, which is a text value used to associate incidents with problems. It is based on the error message ID and other context values. Problems are tracked in ADR.
An Incident is a single occurrence of a problem. An incident is created for each occurrence of a problem (critical error), subject to flood control. Each incident has a unique ID, so when DFW logs a message indicating that an incident has been created, an administrator can use the ID to look at the associated diagnostics in ADR.
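The relationship between problems, problem keys, incidents, and flood control can be illustrated with a small model. This is only a sketch and not how DFW is implemented: the class names, the problem key format, and the flood-control settings (`max_incidents`, `window_seconds`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Incident:
    incident_id: int      # unique ID an administrator would look up
    problem_key: str      # ties the incident back to its problem
    timestamp: float      # seconds, supplied by the caller


@dataclass
class ProblemTracker:
    # Hypothetical flood-control settings: at most `max_incidents`
    # incidents per problem key within any `window_seconds` period.
    max_incidents: int = 5
    window_seconds: float = 3600.0
    incidents_by_problem: Dict[str, List[Incident]] = field(default_factory=dict)
    next_id: int = 1

    def report(self, message_id: str, context: str, now: float) -> Optional[Incident]:
        """Record one occurrence of a critical error.

        Returns the new Incident, or None if flood control suppressed it.
        """
        # Assumed key format: message ID plus one context value.
        problem_key = f"{message_id} [{context}]"
        history = self.incidents_by_problem.setdefault(problem_key, [])
        recent = [i for i in history if now - i.timestamp < self.window_seconds]
        if len(recent) >= self.max_incidents:
            return None
        incident = Incident(self.next_id, problem_key, now)
        self.next_id += 1
        history.append(incident)
        return incident
```

Every call to `report` with the same message ID and context maps to the same problem, while each accepted call yields a new incident with its own ID.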
What are the components?
Automatic Diagnostic Repository
The Automatic Diagnostic Repository (ADR) is a file-based repository for storing diagnostics data associated with incidents. It consists of metadata that describes each Problem and Incident, along with the set of diagnostic dump output generated for each incident.
Each WebLogic Server has its own ADR. The ADR root directory is known as ADR base. By default, the ADR base is located in the following directory:
Within ADR base, there can be multiple ADR homes, where each ADR home is the root directory for all incident data for a particular instance of Oracle WebLogic Server. The following path shows the location of the ADR home:
The image below shows the ADR directory structure for Fusion Middleware.
The subdirectories in the ADR home contain the following information:
- alert - The XML-formatted alert log.
- incident - A directory that can contain multiple subdirectories, where each subdirectory is named for a particular incident. The subdirectories are named incdir_n, with n representing the number of the incident. Each subdirectory contains information and diagnostic dumps pertaining only to that incident.
- (others) - Other subdirectories of ADR home, which store incident packages and other information.
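Because ADR is just a directory tree, the incident layout described above can be browsed with a few lines of script. The sketch below assumes only what is stated here, namely an `incident` subdirectory containing `incdir_n` directories. The function name is made up for the example, and the ADR home path must be supplied by the caller.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches the incdir_n naming described above and captures n.
INCIDENT_DIR = re.compile(r"^incdir_(\d+)$")


def list_incidents(adr_home: Path) -> List[Tuple[int, List[str]]]:
    """Return (incident number, file names) pairs, ordered by incident number."""
    incident_root = adr_home / "incident"
    if not incident_root.is_dir():
        return []
    found = []
    for entry in incident_root.iterdir():
        match = INCIDENT_DIR.match(entry.name)
        if entry.is_dir() and match:
            files = sorted(p.name for p in entry.iterdir() if p.is_file())
            found.append((int(match.group(1)), files))
    # Sort numerically so that incdir_10 comes after incdir_2.
    return sorted(found)
```

Calling `list_incidents(Path(adr_home))` with the ADR home of a server gives a quick overview of which incidents exist and which dump files each one holds.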
The ADR Command Interpreter (ADRCI) is a utility that enables you to investigate problems, and package and upload first-failure diagnostic data to Oracle Support, all within a command-line environment. ADRCI also enables you to view the names of the dump files in the ADR, and to view the alert log with XML tags stripped, with and without content filtering.
ADRCI is installed in the following directory:
Diagnostic dumps perform targeted diagnostic data capture when an incident is created or on demand when requested by an administrator. They are generally implemented by FMW components/applications and will be configured to run with appropriate types of critical errors. Example diagnostic dumps include:
- Thread dump
- Execution Context (all active ones)
- Active HTTP requests
- Class histogram
- DMS Metrics
- Logs (by ECID)
- Logs (by timestamp, up to +/- 5 minutes)
- WLDF Diagnostic Image
These dumps will be looked at in more detail in a future post.
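Until then, the general idea of a diagnostic dump (a named piece of code that captures targeted data, run either when an incident is created or on demand) can be sketched as a simple registry. This is an analogy in Python rather than DFW's actual mechanism: the registry, the function names, and the dump name `example.threads` are all hypothetical, and the "thread dump" here only lists the threads of the Python process running the script.

```python
import threading
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry mapping a dump name to the code that produces it.
DUMPS: Dict[str, Callable[[], str]] = {}


def register_dump(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that registers a dump function under the given name."""
    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        DUMPS[name] = func
        return func
    return decorator


@register_dump("example.threads")
def thread_dump() -> str:
    # One line per live thread in this Python process.
    return "\n".join(
        f"{t.name} daemon={t.daemon} alive={t.is_alive()}"
        for t in threading.enumerate()
    )


def execute_dump(name: str, output_dir: Path) -> Path:
    """Run one dump on demand and write its output to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"{name}.txt"
    target.write_text(DUMPS[name]())
    return target


def on_incident(incident_dir: Path, dump_names: List[str]) -> List[Path]:
    """Run the dumps configured for an error type when an incident is created."""
    return [execute_dump(name, incident_dir) for name in dump_names]
```

The same `execute_dump` function serves both paths: `on_incident` calls it for each configured dump at incident creation, and an administrator-style caller can invoke it directly for a single dump.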
DFW provides MBeans that allow you to:
- Configure DFW
- Query Problems and Incidents
- Create manual incidents
- Upload files and associate them with an existing incident
- Query available diagnostic dumps
- Execute diagnostic dumps
- Download diagnostic dump data
All of the DFW MBeans are available under the "oracle.dfw" MBean domain.
DFW provides WLST commands that you can use to view information about problems and incidents, create incidents, execute specific dumps and query the set of diagnostic dump types. Refer to the "Diagnostic Framework Custom WLST Commands" documentation for more information.
Full documentation on DFW can be found in chapter 12 of the Oracle Fusion Middleware Administrator's Guide.
In further posts I will cover:
- The process of detecting and creating incidents
- Working with Problems and Incidents
- Integration with WLDF
- Diagnostic dumps in detail
- Configuring DFW
- Supportability flows