Monday May 16, 2011

Troubleshooting ORA-1555 (Doc ID 1307334.1)

This article is intended to assist in finding tips and techniques to assist with finding solutions to ORA-1555 errors. The document will cover the following topics:

Concepts/Definitions
Diagnosing
Common Causes/Solutions

Concepts/Definitions

The ORA-1555 errors can happen when a query is unable to access enough undo to build
a copy of the data at the time the query started. Committed “versions” of blocks are
maintained along with newer uncommitted “versions” of those blocks so that queries can
access data as it existed in the database at the time of the query. These are referred to as
“consistent read” blocks and are maintained using Oracle undo management.

See Note 40689.1 - ORA-1555 "Snapshot too old" - Detailed Explanation for more about
these errors.

Diagnosing

Due to space limitations, it is not always feasible to keep undo blocks on hand for the life of the instance. Oracle Automatic Undo Management (AUM) helps to manage the time frame that undo blocks are stored. The time frame is the “retention” time for those blocks.

There are several ways to investigate the ORA-1555 error. In most cases, the error is a legitimate problem with getting to an undo block that has been overwritten due to the undo “retention” period having passed.

AUM will automatically tune up and down the “retention” period, but often space limitations or configuration of the undo tablespace will throttle back continuous increases to the “retention” period.

The error message is reported in the user session and often is not captured in the alert log. The user could see a message like

Using rollback segment functionality:
ORA-1555: snapshot too old (rollback segment too small)

or

Using AUM:
ORA-01555: snapshot too old: rollback segment number 9 with name "_SYSSMU9$" too small

If the error is captured in the alert.log, you would see something like

Tue May 26 16:16:57 2009
ORA-01555 caused by SQL statement below (SQL ID: 54yn3n36w24ft, Query Duration=922 sec, SCN: 0x0007.8a55f4e3)

Initial Investigation

Rollback Segments:
With Oracle 10g and later versions of Oracle, you can still use a Rollback Segments configuration. ORA-1555 errors in that environment still follow older guidelines as described in

Note 10579.1 - How many Rollback Segments to Have
Note 107085.1 - Tuning Rollback Segments
Note 69464.1 - Rollback Segment Configuration & Tips
Automatic Undo Management:
The database will be self tuning for undo when using Automatic Undo Management. This does not eliminate ORA-1555 completely, but does minimize ORA-1555 as long as there is adequate space in the undo tablespace and workloads tend to follow repeatable patterns. In some cases with periodic changes to workload (large data updates particularly with LOB data) the self tuning of undo can become aggressive and lead to undo issues.

Note 461480.1 - FAQ Automatic Undo Management (AUM) / System Managed Undo (SMU)
Note 135053.1 -How to Create a Database with Automatic Undo Management
Note 268870.1 - How to Shrink the datafile of Undo Tablespace
Note 231776.1 - How to switch a Database from Automatic Undo Management (AUM) back to using Rollback Segments
Note 296863.1 - How to Keep All UNDO Segments from Being Offlined in Oracle 10g - Fast Ramp-Up
LOB Issues:
Out-of-row LOB undo is maintained in the LOB segment. So the UNDO tablespace and undo retention is not associated with most LOB ORA-1555 issues. Instead the LOB column is created using either PCT_VERSION or RETENTION to manage how much space within blocks or time transpires before the LOB undo is overwritten. In environments with high updates, deletes on rows including LOBs, the chances of ORA-1555 on LOB undo is very high.

PCT_VERSION and RETENTION are not auto-tuned. To “tune” those configuration settings, you must change the values for PCT_VERSION or RETENTION. Changes to UNDO_RETENTION does not change LOB retention time frames.

Note 162345.1 - LOBS - Storage, Read-consistency and Rollback
Note 386341.1 - How to determine the actual size of the LOB segments and how to free the deleted/unused space above/below the HWM
Note 563470.1 – Lob retention not changing when undo_retention is changed
Note 422826.1 – How to identify LOB Segment Use PCTVERSION or RETENTION from Data Dictionary

Error Tracing

Undo error tracing can be done for normal undo operations using the following events:

NOTE: Normal undo operations will be indicated in the error message in that the error message includes a segment name like

…. name "_SYSSMU1$" too small

If the error doesn’t show a segment name

… name "" too small

the problem is often related to LOB undo
If using pfile:

event="10442 trace name context forever, level 10"

If using spfile:

Alter system set events '10442 trace name context forever, level 10';

Reproduce the ORA-1555 error and upload the trace file to Oracle Support.

LOB undo error tracing is more difficult. Set additional tracing events as follows:

Start Session 1
Alter session set events '10046 trace name context forever, level 12';
Reproduce the error
Exit Session 1

Start Session 2
Alter session set events '10051 trace name context forever, level 1';
Reproduce the error
Exit Session 2

Start Session
Alter session set events '1555 trace name errorstack forever, level 3';
Reproduce the error
Exit Session 3

Additional resources to review:
Note 846079.1 – LOBs and ORA-1555 troubleshooting
Note 253131.1 –Concurrent Writes May Corrupt LOB Segment When Using Auto Segment Space Management
Note 467872.1 – TROUBLESHOOTING GUIDE (TSG) – ORA-1555

V$UNDOSTAT Analysis

The V$UNDOSTAT view holds undo statistics for 10 minute intervals. This view
represents statistics across instances, thus each begin time, end time, and
statistics value will be a unique interval per instance.

This does not track undo related to LOB
Note 262066.1 – How To Size UNDO Tablespace For Automatic Undo Management
Note 1112363.1 – When Does Undo Used Space Become Available?
Note 240746.1 – 10g NEW FEATURE on AUTOMATIC UNDO RETENTION

Diagnostics Scripts

Refer to Note 746173.1 : Common Diagnostic Scripts for AUM problems
and Note 877613.1 : AUM Common Analysis/Diagnostic Scripts

Common Causes/Solutions

Using Rollback Segments functionality:

* Problem happening on SYSTEM tablespace that still uses old Rollback Segment functionality even when configured for Automatic Undo Management (AUM).

* There are not enough rollback segments to manage the undo needed for long running queries.

* Rollback Segments are too small and undo is overwritten before long running queries complete.
Reference:
Note 69464.1 – Rollback Segment Configuration & Tips
Note 10630.1 – ORA-1555: “Snapshot too old” – Overview
Note 862469.1 – ORA-604 & ORA-1555 Rollback Segment 0 with Name “System” Too Small

Using Automatic Undo Management (AUM):

* TUNED_UNDORETENTION in V$UNDOSTAT around the time of the error is lower than the QUERY DURATION indicated in the error message. This is a legitimate ORA-1555 and if queries are going to run for very long time frames, UNDO_RETENTION may need to be larger. Auto-tuned retention may not be able to keep up with the undo workload and staying within space limitations on the UNDO tablespace.

* LOB updates and/or deletes are frequent and a higher PCT_VERSION is required to provide enough space in the LOB Segment to accommodate the LOB undo. RETENTION on LOBs that are updated or deleted frequently can run into problems holding UNDO long enough for queries.

* QUERY DURATION shown in the error message is 30+ years and therefore, no amount of undo will satisfy the consistent read blocks.

Note 750195.1 – ORA-1555 Shows Unrealistic Query Duration (billions of seconds)

* QUERY DURATION shown in the error message is 0. NOTE: This has been filed as a bug on many release levels and has been very difficult to narrow down to a specific problem.

Note 761128.1 – ORA-1555 Error when Query Duration as 0 Seconds

* QUERY DURATION is lower than TUNED_UNDRETENTION. Undo header information can sometimes get overwritten or you could be seeing a bug.

* TUNED_UNDORETENTION stays very high and UNDO tablepsace continues to grow continuously or getting space errors.

Note 1112431.1 – Undo Remains Unexpired When Using Non-autoextensible Datafiles for Undo Tablespace.

Additional Reference:

Database Administration Community

Wednesday Nov 17, 2010

Oracle Support Master Note for Troubleshooting Managed Distributed Transactions (Doc ID 100664.1)

[Read More]

Oracle Support Master Note for Troubleshooting Streams Apply Errors ORA-1403, ORA-26787 or ORA-26786 (Doc ID 265201.1)

[Read More]

Oracle Support Master Note for Troubleshooting Streams Capture 'WAITING For REDO' or INITIALIZING (Doc ID 313279.1)

[Read More]

Oracle Support Master Note for Streams Setup Scripts (Doc ID 789445.1)

[Read More]

Oracle Support Master Note for Streams Performance Recommendations (Doc ID 335516.1)

[Read More]

Friday Jun 11, 2010

ORA-4030 Troubleshooting

QUICKLINK:

Note 4030.1 OERR: ORA 4030 (Known Issues)

Note 399497.1 FAQ ORA-4030
Note 1088087.1 : ORA-4030 Diagnostic Tools [Video]

Have you observed an ORA-0430 error reported in your alert log?

ORA-4030 errors are raised when memory or resources are requested from the Operating System and the Operating System is unable to provide the memory or resources. The arguments included with the ORA-4030 are often important to narrowing down the problem.

For more specifics on the ORA-4030 error and scenarios that lead to this problem, see Note 399497.1 FAQ ORA-4030.

Looking for the best way to diagnose?

There are several available diagnostic tools (error tracing, 11g Diagnosibility, OCM, Process Memory Guides, RDA, OSW, diagnostic scripts) that collectively can prove powerful for identifying the cause of the ORA-4030.

Error Tracing

The ORA-4030 error usually occurs on the client workstation and for this reason, a trace file and alert log entry may not have been generated on the server side. It may be necessary to add additional tracing events to get initial diagnostics on the problem.


To setup tracing to trap the ORA-4030, on the server use the following in SQLPlus:

alter system set events '4030 trace name heapdump level 536870917;name errorstack level 3';

Once the error reoccurs with the event set, you can turn off tracing using the following command in SQLPlus:

alter system set events '4030 trace name context off; name context off';

NOTE: See more diagnostics information to collect in
Note 399497.1


11g Diagnosibility

Starting with Oracle Database 11g Release 1, the Diagnosability infrastructure was introduced which places traces and core files into a location controlled by the DIAGNOSTIC_DEST initialization parameter when an incident, such as an ORA-4030 occurs. For earlier versions, the trace file will be written to either USER_DUMP_DEST (if the error was caught in a user process) or BACKGROUND_DUMP_DEST (if the error was caught in a background process like PMON or SMON). The trace file may contain vital information about what led to the error condition.

 
Note 443529.1 11g Quick Steps to Package and Send Critical Error Diagnostic Information
to Support[Video] 
 
Oracle Configuration Manager (OCM)

Oracle Configuration Manager (OCM) works with My Oracle Support to enable proactive support capability that helps you organize, collect and manage your Oracle configurations.

Oracle Configuration Manager Quick Start Guide
Note 548815.1: My Oracle Support Configuration Management FAQ

Note 250434.1: BULLETIN: Learn More About My Oracle Support Configuration Manager 

General Process Memory Guides

An ORA-4030 indicates a limit has been reached with respect to the Oracle process private memory allocation. Each Operating System will handle memory allocations with Oracle slightly differently.



Solaris
Note 163763.1
Linux
Note 341782.1
IBM AIX
Notes 166491.1 and 123754.1
HP
Note 166490.1
Windows
Note 225349.1, Note 373602.1, Note 231159.1, Note 269495.1Note 762031.1
Generic
Note 169706.1

RDA

The RDA report will show more detailed information about the database and Server Configuration.

Note 414966.1 RDA Documentation Index

Download RDA -- refer to Note 314422.1 Remote Diagnostic Agent (RDA) 4 - Getting Started

OS Watcher (OSW)

This tool is designed to gather Operating System side statistics to compare with the findings from the database. This is a key tool in cases where memory usage is higher than expected on the server while not experiencing ORA-4030 errors currently.

Reference more details on setup and usage in Note 301137.1 OS Watcher User Guide

Diagnostic Scripts

Refer to Note 1088087.1 : ORA-4030 Diagnostic Tools [Video]

Common Causes/Solutions

The ORA-4030 can occur for a variety of reasons. Some common causes are:

* OS Memory limit reached such as physical memory and/or swap/virtual paging. For instance, IBM AIX can experience ORA-4030 issues related to swap scenarios. See Note 740603.1 10.2.0.4 not using large pages on AIX for more on that problem. Also reference Note 188149.1 for pointers on 10g and stack size issues.

* OS limits reached (kernel or user shell limits) that limit overall, user level or process level memory

* OS limit on PGA memory size due to SGA attach address
Reference:
Note 1028623.6 SOLARIS How to Relocate the SGA

* Oracle internal limit on functionality like PL/SQL varrays or bulk collections. ORA-4030 errors will include arguments like "pl/sql vc2" "pmucalm coll" "pmuccst: adt/re". See
Coding Pointers for pointers on application design to get around these issues

* Application design causing limits to be reached

* Bug - space leaks, heap leaks

* PL/SQL related issues
          Reference
Note 1325100.1 PLSQL Procedure Causing ORA-4030 Errors

***For reference to the content in this blog, refer to Note.1088267.1 Master Note for Diagnosing ORA-4030

ORA-4031 Troubleshooting

QUICKLINKS:

Note 4031.1 OERR: ORA 4031 (Known Issues)

Note 396940.1 Troubleshooting and Diagnosing ORA-4031 Error
Diagnostic Tool for ORA-4031

Note 1087773.1 ORA-4031 Diagnostics Tools [Video]

Have you observed an ORA-04031 error reported in your alert log?

An ORA-4031 error is raised when memory is unavailable for use or reuse in the System Global Area (SGA). The error message will indicate the memory pool getting errors and high level information about what kind of allocation failed and how much memory was unavailable. The challenge with ORA-4031 analysis is that the error and associated trace is for a "victim" of the problem. The failing code ran into the memory limitation, but in almost all cases it was not part of the root problem.

Looking for the best way to diagnose?

When an ORA-4031 error occurs, a trace file is raised and noted in the alert log if the process experiencing the error is a background process. User processes may experience errors without reports in the alert log or traces generated. The V$SHARED_POOL_RESERVED view will show reports of misses for memory over the life of the database.


Diagnostics scripts are available in
Note 430473.1 to help in analysis of the problem. There is also a training video on using and interpreting the script data Note 1087773.1.

 
11g Diagnosibility

Starting with Oracle Database 11g Release 1, the Diagnosability infrastructure was introduced which places traces and core files into a location controlled by the DIAGNOSTIC_DEST initialization parameter when an incident, such as an ORA-4031 occurs. For earlier versions, the trace file will be written to either USER_DUMP_DEST (if the error was caught in a user process) or BACKGROUND_DUMP_DEST (if the error was caught in a background process like PMON or SMON). The trace file contains vital information about what led to the error condition.

Note 443529.1 11g Quick Steps to Package and Send Critical Error Diagnostic 
Information to Support[Video]
 
Oracle Configuration Manager (OCM)
Oracle Configuration Manager (OCM) works with My Oracle Support to enable proactive 
support capability that helps you organize, collect and manage your Oracle configurations.

Oracle Configuration Manager Quick Start Guide Note 548815.1: My Oracle Support Configuration Management FAQ

Note 250434.1: BULLETIN: Learn More About My Oracle Support Configuration Manager 

Common Causes/Solutions

The ORA-4031 can occur for many different reasons. Some possible causes are:

  • SGA components too small for workload
  • Auto-tuning issues
  • Fragmentation due to application design
  • Bug/leaks in memory allocations
For more on the 4031 and how this affects the SGA, see 
Note 396940.1 Troubleshooting and Diagnosing ORA-4031 Error
 
Because of the multiple potential causes, it is important to gather enough diagnostics so that 
an appropriate solution can be identified.  However, most commonly the cause is associated 
with configuration tuning.   Ensuring that MEMORY_TARGET or SGA_TARGET are large 
enough to accommodate workload can get around many scenarios. 
 
The default trace associated with the error provides very high level information about the 
memory problem and the "victim" that ran into the issue.   The data in the default trace is 
not going to point to the root cause of the problem.
 
When migrating from 9i to 10g and higher, it is necessary to increase the size of the Shared 
Pool due to changes in the basic design of the shared memory area.
 
Note 270935.1 Shared pool sizing in 10g

NOTE: Diagnostics on the errors should be investigated as close to the time of the error(s) 
as possible.  If you must restart a database, it is not feasible to diagnose the problem 
until the database has matured and/or started seeing the problems again.
 
Note 801787.1 Common Cause for ORA-4031 in 10gR2, Excess "KGH: NO ACCESS" 
Memory Allocation
 
***For reference to the content in this blog, refer to Note.1088239.1 Master 
Note for Diagnosing ORA-4031 

ORA-7445 Troubleshooting

[Read More]
About

News and Troubleshooting tips for Oracle Database and Enterprise Manager

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today