Unbreakable Oracle 10g Release 2: What if you have ORA-600 kcratr1_lastbwr?

This is an interesting story that happened yesterday at one of our customer sites. An engineer powered off the wrong rack of equipment, which contained a Sun Fire X4600 running Oracle 10g Release 2. Almost no transactions were being performed at the time, so when the system came back up the customer expected the database to be up and running very quickly.

In reality, this is what happened:

Completed: ALTER DATABASE   MOUNT
Tue Nov  7 11:19:42 2006
ALTER DATABASE OPEN
Tue Nov  7 11:19:42 2006
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Tue Nov  7 11:19:44 2006
Started redo scan
Tue Nov  7 11:19:44 2006
Errors in file /xxx/oracle/oracle/product/10.2.0/db_1/admin/xxx/udump/xxx_ora_947.trc:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []
Tue Nov  7 11:19:44 2006
Aborting crash recovery due to error 600
Tue Nov  7 11:19:44 2006
Errors in file /xxx/oracle/oracle/product/10.2.0/db_1/admin/xxxtest/udump/xxxtest_ora_947.trc:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

Not too pretty! Checking the ASM configuration and the I/O subsystem showed nothing wrong. So what can you do if you do not have a backup handy?

Well, here is the idea: what would we do if we had restored a backup that was inconsistent?
The RECOVER DATABASE command starts an Oracle process that rolls forward the changes recorded in the redo (the restored archived logs, plus the online redo logs) needed to make the database consistent again. In an incomplete recovery, the process runs up to a point just before the error occurred, after which the log sequence must be reset so that no later changes are applied to the database.

So we tried:

startup mount

ALTER DATABASE   MOUNT
Tue Nov  7 11:54:03 2006
Starting background process ASMB
ASMB started with pid=61, OS id=1070
Starting background process RBAL
RBAL started with pid=67, OS id=1074
Tue Nov  7 11:54:13 2006
SUCCESS: diskgroup xxxTESTDATA was mounted
Tue Nov  7 11:54:17 2006
Setting recovery target incarnation to 2
Tue Nov  7 11:54:17 2006
Successful mount of redo thread 1, with mount id 2364224219
Tue Nov  7 11:54:17 2006
Database mounted in Exclusive Mode
Completed: ALTER DATABASE   MOUNT
Tue Nov  7 11:54:32 2006

recover database


ALTER DATABASE RECOVER  database 
Tue Nov  7 11:54:32 2006
Media Recovery Start
 parallel recovery started with 16 processes
Tue Nov  7 11:54:33 2006
Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0
  Mem# 0 errs 0: +xxxTESTDATA/xxxtest/onlinelog/group_3.263.605819131
Tue Nov  7 11:59:25 2006
Media Recovery Complete (xxxtest)
Tue Nov  7 11:59:27 2006
Completed: ALTER DATABASE RECOVER  database 


alter database open

Tue Nov  7 12:03:01 2006
alter database open
Tue Nov  7 12:03:01 2006
Beginning crash recovery of 1 threads
 parallel recovery started with 16 processes
Tue Nov  7 12:03:01 2006
Started redo scan
Tue Nov  7 12:03:01 2006
Completed redo scan
 273 redo blocks read, 0 data blocks need recovery
Tue Nov  7 12:03:01 2006
Started redo application at
 Thread 1: logseq 4, block 12858574
Tue Nov  7 12:03:01 2006
Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0
  Mem# 0 errs 0: +xxxTESTDATA/xxxtest/onlinelog/group_3.263.605819131
Tue Nov  7 12:03:01 2006
Completed redo application
Tue Nov  7 12:03:01 2006
Completed crash recovery at
 Thread 1: logseq 4, block 12858847, scn 824040
 0 data blocks read, 0 data blocks written, 273 redo blocks read
Tue Nov  7 12:03:02 2006
Thread 1 advanced to log sequence 5
Thread 1 opened at log sequence 5
  Current log# 1 seq# 5 mem# 0: +xxxTESTDATA/xxxtest/onlinelog/group_1.261.605819081
Successful open of redo thread 1
Tue Nov  7 12:03:02 2006
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Nov  7 12:03:02 2006
SMON: enabling cache recovery
Tue Nov  7 12:03:03 2006
Successfully onlined Undo Tablespace 1.
Tue Nov  7 12:03:03 2006
SMON: enabling tx recovery
Tue Nov  7 12:03:03 2006
Database Characterset is UTF8
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=56, OS id=1128
Tue Nov  7 12:03:05 2006
Completed: alter database open
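
For reference, here are the three commands from the session above collected in one place. This is only a sketch of what worked for us on this one database, not a general cure for every ORA-600. The AS SYSDBA connection is an assumption, since the connection step is not shown in the log excerpts.

```sql
-- Assumed: run from SQL*Plus, connected AS SYSDBA (not shown in the excerpts above).

-- 1. Start the instance and mount the database without opening it.
STARTUP MOUNT

-- 2. Run media recovery; here it was applied from the online redo log (Thread 1, Seq 4).
RECOVER DATABASE

-- 3. Open the database; crash recovery ran again and this time completed.
ALTER DATABASE OPEN;
```

Note that a plain open was enough here: recovery ran to completion and the thread simply advanced to log sequence 5.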



And we are up and running! What Oracle should really work on is the quality and clarity of its error messages.
At this point they are quite poor...

Unbreakable database, maybe. Automatic (and simple), not yet.

Comments:

I was told that to have the highest availability, one needs Oracle RAC. But then I was told that when upgrading Oracle RAC, all nodes must be shut down. Is that really true? Thanks. e1, GSE Asia South, Sun.

Posted by iwan on November 08, 2006 at 09:36 AM PST #

Ah, the ora-600....

First off, any time you see an ORA-600, it's a bug and should be treated as such. Call support, open a TAR and pester them. You mentioned ASM... unless you are using RAC, all of the folks I've talked to agree: do not bother with ASM, since it is yet another thing that can go wrong.

As for RAC... yes, if you need the best availability, you need a two-node RAC cluster. Don't go past 3 or 4 nodes unless you tune your apps for RAC (in general, it does not scale nearly as well as the marketing folks would like you to believe). Some OLTP apps are OK, but others are not, so load testing is in order. There is a reason Oracle still uses 25Ks in a RAC cluster rather than dozens of little 1U boxes.

Of course, database maintenance still requires downtime, but they are working toward limiting the amount of time that the system is down due to an upgrade.

Posted by john on November 08, 2006 at 10:14 AM PST #

YES iwan! One need oracle RAC!

Posted by Egitim on December 09, 2010 at 11:00 PM PST #

About

mrbenchmark
