Master Note for Troubleshooting Streams Capture 'WAITING For REDO' or INITIALIZING (Doc ID 313279.1)

Copyright (c) 2010, Oracle Corporation. All Rights Reserved. 

In this Document

Purpose
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details
  CAUSE
  1.Incorrect use of RMAN backup command or RMAN Issue 
  2.Logs Stored in FRA - Flash Recovery Area 
  3.Logs not registered in dba_registered_archived_log 
  4.Logs marked as corrupted in system.logmnr_log$ 
  5.required_checkpoint_scn issues 
  6.Capture Prevalidation of Logs in dba_registered_archived_log
  7.Logs deleted in error or damaged due to OS/Storage issues 
  SOLUTION 
  1.SCN reported in the Capture state related message - determine which logs the SCN relates to 
  2.Check Capture trace file to determine if a missing logfile is detailed
  3.Determine if there are logfiles unregistered
  4.Determine if there are logs marked as corrupt
  5.RMAN - Recovery Manager can be used to identify archivelog files which may be missing
  References 


Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.3 to 10.1.0.5   [Release: 10.1]
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.4   [Release: 10.2]
Oracle Server - Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7   [Release: 11.1]
Information in this document applies to any platform.

Purpose

A Streams capture process can exist in different states. The different states of the capture process are explained in the following note.

Note 471713.1 Different States of Capture & Apply Process

This article provides troubleshooting steps for when Capture is stuck in the INITIALIZING / DICTIONARY INITIALIZATION, WAITING FOR DICTIONARY REDO: FIRST SCN <SCN>, or WAITING FOR REDO: LAST SCN MINED <SCN> states.

Last Review Date

October 25, 2010  

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
 

Troubleshooting Details

The Capture process does not advance and appears stuck in one of the following states :

select * from v$streams_capture (see the focused query after this list) shows :

1. INITIALIZING / DICTIONARY INITIALIZATION (the state alternates between these two), or
2. WAITING FOR DICTIONARY REDO: FIRST SCN , or
3. WAITING FOR REDO: LAST SCN MINED , or
4. WAITING FOR DICTIONARY REDO: FILE
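
The state can be checked directly with a minimal query; this is a sketch using standard V$STREAMS_CAPTURE columns, adjust the column list as required :

connect / as sysdba
set pagesize 1000
col capture_name format a30
col state format a60
-- current state of each capture process
select capture_name, state, total_messages_captured
from v$streams_capture;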

The nature of the problem could be :

- a missing logfile; or
- a logfile is not registered; or
- a logfile is corrupted; or
- Capture is verifying/prechecking logfiles

The process will remain in this state until the missing log is located, the unregistered log is registered, or the corruption is resolved (in which case it will also be necessary to reregister the log), or until the Capture process has finished checking the logfiles on disk.

CAUSE

Common issues which can cause Capture to stop in this state are :


1.Incorrect use of RMAN backup command or RMAN Issue
 

Only certain RMAN commands are Streams aware.

There are also a number of RMAN related issues where logfiles can be deleted when Streams may still need to access these.

Refer to Note 421176.1 for more details.


2.Logs Stored in FRA - Flash Recovery Area
 

It is recommended that the FRA be avoided with Streams.

Archive log files can be deleted and ORA-1291 reported when archive logs are located in a Flash Recovery Area (FRA). More details with respect to the FRA can be found in Note 305648.1. This is recognised behaviour and, for this reason, it is recommended that archive logs not be located in a FRA when a Capture process will mine from that location.

This is documented in the Oracle Streams Concepts and Administration 10g Release 2 (10.2) guide under the section Troubleshooting Capture Problems, subsection : Are Required Redo Log Files Missing.

It should also be pointed out that archive logs may be deleted from the FRA when the instance is shut down and restarted. Archive logfiles deleted in this way should be reported in the alert.log file.
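
As a quick check (a minimal sketch, assuming the standard initialisation parameter names), the following shows whether an archive destination points at the FRA :

connect / as sysdba
show parameter db_recovery_file_dest
-- a DESTINATION of USE_DB_RECOVERY_FILE_DEST indicates the FRA is in use
select dest_name, destination, status from v$archive_dest where destination is not null;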


3.Logs not registered in dba_registered_archived_log
 

This issue is more likely at a downstream database and could be due to transport issues from the upstream to the downstream database.


4.Logs marked as corrupted in system.logmnr_log$
 

Manual intervention will likely be necessary to correct this.


5.required_checkpoint_scn issues
 

Logs may need to be deleted manually, which can then affect Capture restart if the required_checkpoint_scn (DBA_CAPTURE view) is not advancing. This is a key value for Capture, as discussed below.
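
To check whether the required_checkpoint_scn is advancing, the following minimal query (standard DBA_CAPTURE columns) can be run periodically and the values compared over time :

connect / as sysdba
col capture_name format a30
select capture_name, status, required_checkpoint_scn, captured_scn, applied_scn
from dba_capture;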


6.Capture Prevalidation of Logs in dba_registered_archived_log

The Capture process has to access all the logs in dba_registered_archived_log before it can start mining changes. Thus, in the case of the message WAITING FOR DICTIONARY REDO: FILE, it may be that logs are being accessed but the state of the Capture process has not been updated to indicate that it is prevalidating logs. A query to gauge the size of this backlog is shown below.
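
The following is a minimal sketch to gauge that backlog; a large count of registered logs, particularly on a downstream database, can make a Capture process appear stuck while it is actually checking files :

connect / as sysdba
col consumer_name format a30
-- number of registered logs per capture process and the sequence range covered
select consumer_name, count(*) registered_logs, min(sequence#) min_seq, max(sequence#) max_seq
from dba_registered_archived_log
group by consumer_name;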


7.Logs deleted in error or damaged due to OS/Storage issues
 

The following is useful background to help understand which logs are relevant.

In a non RAC environment, all logs from the logfile containing the required_checkpoint_scn have to be present on disk.

In a RAC environment, all logs from all threads have to be present with respect to the required_checkpoint_scn; this is also applicable for downstream Capture environments where the primary / upstream database is RAC related but the downstream database may be single instance.

These logfiles are also added to the dba_registered_archived_log view.

Before Capture starts mining changes, all the logs in dba_registered_archived_log have to be located. On downstream Capture environments this can cause confusion, as the Capture process may appear to be stuck / not moving (due to the large number of logs in the view) when in fact it is actually accessing log files. This is more of an issue for downstream Capture environments where a Capture process may be down for some time but logs are still being registered by the RFS processes associated with the primary database.

The logfiles that a Capture process requires on restart as specified by the required_checkpoint_scn can be located from the Streams Healthcheck report output -

Note 273674.1.

Refer to the section headed :
'++ Minimum Archive Log Necessary to Restart Capture ++'

Note: the same information can also be obtained from Note 275323.1 or Note 290143.1.


SOLUTION 

In order to determine the problem logfile(s) and understand what is going on, please provide to Oracle Support Services the Streams Healthcheck output (Note 273674.1) and the alert.log file(s) from the upstream (if relevant) and downstream databases.


Note: please provide details from all nodes. Additionally, please collect the following:


1.SCN reported in the Capture state related message - determine which logs the SCN relates to

connect / as sysdba
set pagesize 1000
col first_scn format 999999999999999999
col next_scn format 999999999999999999
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
-- substitute the SCN reported in the Capture state message for &scn
select source_database, thread#, sequence#, name, modified_time, first_scn, next_scn, dictionary_begin, dictionary_end
from dba_registered_archived_log
where &scn between first_scn and next_scn;

Note : If the reported SCN corresponds to the next_scn value for a thread, then it is likely that the problem is with the next log in that thread; it may be missing or not registered.
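
To see where each thread currently ends, the following minimal query (same view, &capture_name as used elsewhere in this note) reports the highest registered sequence per thread; a missing or unregistered log is typically the next sequence after this :

select thread#, max(sequence#) last_registered_seq
from dba_registered_archived_log
where consumer_name = '&capture_name'
group by thread#
order by thread#;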

 
2.Check Capture trace file to determine if a missing logfile is detailed

If no trace file is present, identify the Capture and Reader processes from the following :

connect / as sysdba
set pagesize 1000

-- Capture
define capture_name=''
select p.spid Spid, 'C00'||c.capture#||' '||upper(lp.role) "Capture Process ", c.capture_name "Capture Name", p.pga_used_mem "PGA Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem "PGA Maximum Memory"
from v$streams_capture c, v$logmnr_process lp, v$session s, v$process p
where c.logminer_id = lp.session_id
and lp.role in ('reader','preparer','builder')
and lp.sid = s.sid
and lp.serial# = s.serial#
and s.paddr = p.addr
and c.capture_name = '&capture_name'
union
select p.spid, 'C00'||c.capture#||' Coordinator', c.capture_name, p.pga_used_mem, p.pga_alloc_mem, p.pga_max_mem
from v$streams_capture c, v$session s, v$process p
where c.sid = s.sid
and c.serial# = s.serial#
and s.paddr = p.addr
and c.capture_name = '&capture_name'
order by 6,5;


Use the SPID (OS Process or thread id) for both the Capture process and Reader process as follows :

Note: it may be that a reader process does not exist at the point a particular state is being reported.

 

connect / as sysdba

-- Trace BOTH the Capture process and the Reader process.
-- Substitute the relevant OS process ids (spids) identified above :
oradebug setospid <capture spid>
-- event 1349 / level 32768 decimal
oradebug event 1349 trace name context forever, level 32768
oradebug setospid <reader spid>
-- event 1349 / level 32768 decimal
oradebug event 1349 trace name context forever, level 32768

-- wait 5 minutes, then disable the trace for both processes :

-- (the reader is still the attached process at this point)
oradebug event 1349 trace name context off
oradebug setospid <capture spid>
oradebug event 1349 trace name context off


Check the background_dump_dest location for trace files containing the process ids (spids) from the above.  The trace files should identify the thread / sequence number and the problematic logfile which Capture is looking for.

Note: ensure that you have the latest patchset installed otherwise relevant trace information may not be present.


If it is still not obvious which file is being accessed, specify a more complete event trace :

oradebug event 1349 trace name context forever, level 26148863
-- wait 5 minutes, disable trace :
oradebug event 1349 trace name context off


For example, in a Capture trace file you may see something like the following :

krvxalfs: Error 308 raised while opening log /bugmnt21/em/celclnx8/tar20093554.6/app/oracle/product/10.2.0.3/dbs/arch1_35_677241985.dbf.
i.e. ORA-00308 indicates a missing logfile.
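
Once the trace names the problem file, a minimal follow-up check on an upstream database (where registered logs have controlfile entries) is to look the file up in v$archived_log; the &log_name substitution variable below is simply a fragment of the filename taken from the trace :

connect / as sysdba
set pagesize 1000
col name format a80
select name, thread#, sequence#, status, deleted
from v$archived_log
where name like '%&log_name%';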

 
3.Determine if there are logfiles unregistered

If there is a gap in the registered logfiles, the Capture process will stop.

Specify the name of your capture process below, as well as the thread number, i.e. repeat for each thread used.

Note : The PL/SQL block will need to be run more than once for RAC configurations (once per thread).

connect / as sysdba
set serveroutput on
declare

  -- amend the following as required.
  user_thread  number := 1;          -- change as necessary
  user_capture varchar2(100) := '';  -- change as necessary

  rcscn  number;
  minseq number;
  maxseq number;
  rseq   number;
  cnt    number;

  -- sequence numbers in the expected range which have no entry in dba_registered_archived_log
  cursor mlsns is
    select user_thread thread, sequence# from (
      select (lvl + minseq) sequence#
      from (select * from (select level lvl from dual connect by level <= maxseq - minseq))
      minus
      select sequence# from dba_registered_archived_log where thread# = user_thread)
    order by 2;

begin
  select required_checkpoint_scn into rcscn
  from dba_capture where capture_name = user_capture;

  select min(sequence#) into rseq
  from dba_registered_archived_log
  where thread# = user_thread and consumer_name = user_capture
  and rcscn between first_scn and next_scn;

  select min(sequence#) into minseq
  from dba_registered_archived_log
  where thread# = user_thread and consumer_name = user_capture and sequence# >= rseq;

  select max(sequence#) into maxseq
  from dba_registered_archived_log
  where thread# = user_thread and consumer_name = user_capture;

  select count(*) into cnt from (
    select (lvl + minseq) sequence#
    from (select * from (select level lvl from dual connect by level <= maxseq - minseq))
    minus
    select sequence# from dba_registered_archived_log where thread# = user_thread);

  -- count(*) is never null, so test for a positive count
  if (cnt > 0) then
    for rec in mlsns loop
      dbms_output.put_line('Log not registered ' || rec.sequence#);
    end loop;
  else
    dbms_output.put_line('There are no gaps in registered logs.');
  end if;

end;
/


For any logfiles which are not registered, ensure that they are on disk and register them using :

alter database register or replace logical logfile '&fqfn' for '&capture_name';
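
As an illustrative follow-up (the &thread and &sequence substitution variables are placeholders for the values just registered), confirm that the log now appears in the view :

select thread#, sequence#, name, first_scn, next_scn
from dba_registered_archived_log
where consumer_name = '&capture_name'
and thread# = &thread
and sequence# = &sequence;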

4.Determine if there are logs marked as corrupt

A corrupt archive log file is indicated by the contents field of system.logmnr_log$ having value 16.

connect / as sysdba
set pagesize 1000
select * from system.logmnr_log$ where contents = 16 order by sequence#;

If any logs are marked as corrupted, proceed as follows :

- dump out the logfile; the dump of the logfile will appear in the user_dump_dest location.

connect / as sysdba
oradebug setmypid
oradebug unlimit
alter session set events '1354 trace name context forever, level 32768';
alter session set events '1348 trace name context forever, level 1032';
alter system dump logfile '&fqfn';

It may be that the logfile will need to be restored or copied once again from the primary.

Once the above has been done, the logfile can be reregistered as follows :

- take a backup/copy of system.logmnr_log$ :

connect / as sysdba
create table system.logmnr_log$_bak as select * from system.logmnr_log$;

- delete the entry in system.logmnr_log$ which is marked as corrupted :  

delete from system.logmnr_log$ where contents = 16 and thread# = &thread and sequence# = &sequence;
commit;

- reregister the logfile once again as detailed above.


5.RMAN - Recovery Manager can be used to identify archivelog files which may be missing
 

Note: this can only be used on upstream environments - not downstream databases. This is due to the fact that no controlfile entry is created for a registered logfile in the downstream database.

For Oracle 10.2, logfiles being mined by a Capture process are also logged to the alert.log. This information, together with the Capture restart information from the Streams Healthcheck and the logfile validation information reported by RMAN, can be useful in identifying the logfile(s) that are missing.

The RMAN 'crosscheck archivelog' command can check the physical presence of an archive log file.

Note: you do not have to be familiar with RMAN for backup and restore or have any experience of RMAN to use the commands detailed below.

The approach when using RMAN is as follows :

- First of all, determine where a Capture process would start from if it were restarted. This is a convenient point from which to check the existence of logs.

- Record the restart SCN, the Thread number and the Log Sequence number.

Note: The Thread number and the Log Sequence number should typically be visible in the logfile name. This may not be the case, however, and will depend on the setting of the parameter log_archive_format.


Use either the restart Capture SCN (YYYYYYYYY) or the Log Sequence number (ZZZZ) from which to start the crosscheck validation. The thread number can also be specified on some commands (X); in order to understand whether RAC or multiple threads are present, issue :

select * from gv$instance;

Having performed the above, start RMAN :  

Note :

1. If you specify a logfile you will receive no RMAN command output to the screen. Therefore, whilst getting to grips with RMAN, you would be advised not to specify a logfile. Please also use set echo on so that relevant commands can be seen in the log file.

2. For problems relating to the running of RMAN, refer to the Backup and Recovery Advanced User's Guide.

 

-- start rman without specifying a logfile.
rman target /

-- start rman and specify a logfile to record command output
rman target / log /tmp/crosscheck.log

Useful commands to allow logs on disk to be crosschecked are as follows :

-- ensure command used is echoed to log file.
set echo on

- crosscheck archivelog all;
(crosscheck all logs on disk)

- crosscheck archivelog from sequence ZZZZ thread X;
(* crosscheck all the logs on disk from the sequence number)

- crosscheck archivelog from scn = YYYYYYYYYY;
(* crosscheck all the logs on disk from the scn)

- crosscheck archivelog like '%<location>%';
(crosscheck all the logs on disk in a particular location)

- crosscheck archivelog like '%<logfile name>%';
(crosscheck that a specific logfile exists)

* The above allow the starting point of the crosscheck command to be specified. Therefore, these should be used together with the restart information detailed in the Streams Healthcheck.

Consider the following example where the Streams Healthcheck reports that the capture process will restart from :

Capture will restart from SCN 2381906 in the following file:
/la2/1_228_657120968.dbf (date/time)
/la1/1_227_657120968.dbf (date/time)

The following commands can be used in order to determine whether logs
are available on disk or otherwise.

rman target /

set echo on

- crosscheck archivelog all;

- crosscheck archivelog from sequence 227 thread 1;

- crosscheck archivelog from scn = 2381906;

- crosscheck archivelog like '%la1%';

- crosscheck archivelog like '%1_227_657120968.dbf%';

Checking RMAN output

If a logfile is not found, RMAN will report a 'validation failed for archived log' error. If a logfile is found, RMAN will report 'validation succeeded for archived log'.

Note: a failure is really only significant if the log is missing after the required_checkpoint_scn from which a Capture process may restart.

....
archive log filename=
/arch1_127_657120968.dbf recid=127 stamp=673799334
validation failed for archived log
archive log filename=
/arch1_128_657120968.dbf recid=128 stamp=673806193
validation succeeded for archived log
...

To exit RMAN, type :

exit

Search through the logfile for the 'failed' keyword. This will
hopefully identify the problematic logfile,
e.g. (on Unix) : grep -i failed /tmp/crosscheck.log

If all logfiles are validated and RMAN performed the crosscheck validation from the location that Capture would restart from, then the missing logs most likely come after the last log reported by RMAN. Further inspection may be required.
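
To relate any logs reported as failed back to what Capture still needs, the following is a minimal sketch (standard DBA_CAPTURE and DBA_REGISTERED_ARCHIVED_LOG columns) listing the registered logs at or beyond the required_checkpoint_scn :

connect / as sysdba
set pagesize 1000
col name format a80
select c.capture_name, c.required_checkpoint_scn, r.thread#, r.sequence#, r.name
from dba_capture c, dba_registered_archived_log r
where r.consumer_name = c.capture_name
and r.next_scn > c.required_checkpoint_scn
order by r.thread#, r.sequence#;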

References

BUG:5770059 - REQUIRED_CHECKPOINT_SCN NOT MOVING FORWARD
BUG:6154377 - NO LOGMNR CHECKPOINTS ARE BEING EXECUTED