RMAN puzzle: database incarnation is not in sync with catalog

A very puzzling situation..

Something very puzzling happened with our customer's production database. The client had taken a complete, successful backup of PROD in the morning, yet a second backup attempt in the afternoon failed with this error:

Starting backup at 04172009

released channel: ch00
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 04/17/2009 13:25:45
RMAN-03014: implicit resync of recovery catalog failed
RMAN-06004: ORACLE error from recovery catalog database:
RMAN-20011: target database incarnation is not current in recovery catalog

To work around this, the database was unregistered from the catalog and then registered again, so that the database incarnation and the catalog's record of it would be back in sync. This risked losing the catalog's records of the earlier PROD backups, but it appeared to solve the problem, and the subsequent full backups of PROD completed successfully.

However, the very next day, the problem reappeared! We asked the client whether any changes had been made on their side. The firm answer was “No”, which only deepened the mystery.

First impressions..

What could have caused the same error to reappear so suddenly? We tried running RMAN CROSSCHECK commands, but every one of them failed with the same error, and LIST INCARNATION OF DATABASE returned no output at all. It was quite puzzling.

After some research, we came across Note 412113.1 on http://metalink.oracle.com. It raised the possibility that a non-production instance with the same DBID as production's had been registered in the RMAN catalog.

Notes 1076158.6 and 1061206.6 also seemed *possibly* relevant, though the connection was not obvious.

On second thoughts..

It then came to light that, on the same day, the client had configured the NetBackup agent on another server. That server hosted a database that had previously been used as the Disaster Recovery (DR) database, which suggested a possible conflict. Oddly enough, the other database was also called PROD!
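Because both servers answered to the name PROD, the instance name alone could not tell them apart. What the catalog keys on is the DBID. The sketch below shows a minimal cross-check for this situation. It assumes you have already collected `name` and `dbid` from `SELECT name, dbid FROM v$database` on each server. The host names are hypothetical, and the DR DBID is assumed to equal production's, as Note 412113.1 suggests.

```python
from collections import defaultdict


def shared_dbids(inventory: dict[str, tuple[str, int]]) -> dict[int, list[str]]:
    """Return {dbid: [hosts]} for every DBID reported by more than one host.

    `inventory` maps a host name to the (name, dbid) pair that
    SELECT name, dbid FROM v$database returned on that host.
    """
    hosts_by_dbid: dict[int, list[str]] = defaultdict(list)
    for host, (_name, dbid) in sorted(inventory.items()):
        hosts_by_dbid[dbid].append(host)
    # Keep only the DBIDs that more than one host claims.
    return {dbid: hosts for dbid, hosts in hosts_by_dbid.items() if len(hosts) > 1}


# Hypothetical inventory: host names are made up, and the DR DBID is assumed.
inventory = {
    "prod-host": ("PROD", 4290406901),
    "dr-host": ("PROD", 4290406901),
    "rcat-host": ("RCAT", 356098101),
}
print(shared_dbids(inventory))  # {4290406901: ['dr-host', 'prod-host']}
```

Any DBID that appears under more than one host is a candidate for exactly the kind of catalog collision described here.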

Here are the DBIDs and incarnations recorded in the RMAN catalog. Only one DBID is registered for PROD, but which PROD instance did it refer to?

SQL> select name,DBID,RESETLOGS_TIME from rc_database order by 1;

NAME DBID       RESETLOG
---- ---------- --------
PROD 4290406901 03272008
RCAT 356098101  03012006

SQL> select db_key,DBID,name,current_incarnation from rc_database_incarnation order by 1;

    DB_KEY       DBID NAME     CUR
---------- ---------- -------- ---
     23531  356098101 RCAT     YES
    623638 4290406901 PROD     NO
    623638 4290406901 PROD     YES
    623638 4290406901 PROD     NO

Solution

So what had actually happened was that the incarnation of the DR PROD instance had been registered in the RMAN catalog and marked as current. To undo this, we made production's incarnation current again with the RMAN RESET DATABASE TO INCARNATION command, which takes the key of the incarnation to make current. We then resynchronized the catalog manually with the RMAN RESYNC CATALOG command.

SQL> select db_key,dbid,name,CURRENT_INCARNATION from rc_database_incarnation order by 1;

    DB_KEY       DBID NAME     CUR
---------- ---------- -------- ---
     23531  356098101 RCAT     YES
    623638 4290406901 PROD     NO
    623638 4290406901 PROD     NO
    623638 4290406901 PROD     YES

This time we did not re-register any instance in the catalog. After the reset, LIST INCARNATION OF DATABASE worked again, and we could list the existing backups and other details. A new full online backup of PROD was then started and completed successfully.

As an added precaution, the instance name on the other server was changed from "PROD" to "DRTEST".
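The sketch below is a deliberately simplified toy model of a catalog that tracks incarnations per DBID. It helps show why the error kept coming back, but it rests on our own assumptions and is not Oracle's implementation:

* Incarnations are identified by hypothetical labels (`inc-PROD`, `inc-DR`) standing in for the resetlogs SCN/time.
* Any newly seen incarnation simply becomes current.
* A resync from a known but non-current incarnation fails with the RMAN-20011 text.
* The hypothetical `reset_to` method plays the role of RESET DATABASE TO INCARNATION.

```python
class CatalogError(Exception):
    """Stands in for RMAN-20011 in this toy model."""


class ToyCatalog:
    """Hypothetical, simplified model of per-DBID incarnation tracking."""

    def __init__(self) -> None:
        self.incarnations: dict[int, list[str]] = {}  # dbid -> labels; list index = toy key
        self.current: dict[int, int] = {}             # dbid -> key of the current incarnation

    def resync(self, dbid: int, label: str) -> None:
        incs = self.incarnations.setdefault(dbid, [])
        if label not in incs:
            # Model assumption: an unseen incarnation is registered and made current.
            incs.append(label)
            self.current[dbid] = len(incs) - 1
        elif incs.index(label) != self.current[dbid]:
            # Known incarnation, but not the current one: the RMAN-20011 situation.
            raise CatalogError(
                "RMAN-20011: target database incarnation is not current in recovery catalog"
            )

    def reset_to(self, dbid: int, key: int) -> None:
        # Toy counterpart of RESET DATABASE TO INCARNATION <key>.
        if not 0 <= key < len(self.incarnations.get(dbid, [])):
            raise ValueError(f"unknown incarnation key {key} for DBID {dbid}")
        self.current[dbid] = key


def backup_attempt(cat: ToyCatalog, dbid: int, label: str) -> str:
    """Mimic a backup's implicit resync: return 'ok' or the error text."""
    try:
        cat.resync(dbid, label)
        return "ok"
    except CatalogError as err:
        return str(err)


PROD_DBID = 4290406901  # the DBID shown in the catalog output above
cat = ToyCatalog()
print(backup_attempt(cat, PROD_DBID, "inc-PROD"))  # ok: PROD registered and current
print(backup_attempt(cat, PROD_DBID, "inc-DR"))    # ok: the DR copy's incarnation becomes current
print(backup_attempt(cat, PROD_DBID, "inc-PROD"))  # RMAN-20011 ...
cat.reset_to(PROD_DBID, 0)                         # make PROD's incarnation current again
print(backup_attempt(cat, PROD_DBID, "inc-PROD"))  # ok
```

In the model, incarnations are tracked per DBID rather than per instance name. As a result, a subsequent resync from the DR host (`inc-DR`) would now hit the same error in turn.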

About

bocadmin_ww
