一、问题描述:
在只读备库adgstby上启动connection pool并通过easy naming method通过pool连接备库:
SQL> startup
ORACLE instance started.
Total System Global Area 1369579520 bytes
Fixed Size 2253104 bytes
Variable Size 419434192 bytes
Database Buffers 939524096 bytes
Redo Buffers 8368128 bytes
Database mounted.
Database opened.
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
SQL> exec dbms_connection_pool.start_pool();
PL/SQL procedure successfully completed.
SQL> exit
[oracle@adg ~]$ sqlplus hr/hr@adg.fsm.com:1521/adgstby:POOLED
SQL*Plus: Release 11.2.0.4.0 Production on Thu Aug 10 08:00:50 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
ERROR:
ORA-56600: DRCP: Illegal call [First call inconsistency]
Errors in file :
OCI-21500: internal error code, arguments: [kpplcSyncState:Error in sync], [56600], [], [], [], [], [], []
@@@33333333Errors in file :
OCI-21500: internal error code, arguments: [kgepop: no error frame to pop to], [], [], [], [], [], [], []
OCI-21500: internal error code, arguments: [kpplcSyncState:Error in sync], [56600], [], [], [], [], [], []
这个错误在11.2.0.4 ADG备库上可以重现。
在查了与此相关的bug发现都没有得到确认之后。进入了分析步骤。
二、分析过程:
1、起初只想到这是个连接问题,于是认为追踪的手段errorstack肯定不行,只能sqlnet client trace看看
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: packet dump
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 01 2A 00 00 06 00 00 00 |.*……|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 03 5E 06 61 80 00 |…^.a..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF 50 00 00 |…..P..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF 0D 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 01 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: 00 00 00 00 00 FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:440] nsbasic_bsd: FF FF FF FF FF C8 31 5C |……1\|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 02 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 00 00 00 00 00 FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: FF FF FF FF FF FE FF FF |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: FF FF FF FF FF 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 00 00 00 00 00 28 53 45 |…..(SE|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 4C 45 43 54 20 4E 55 4C |LECT.NUL|
(4198659808) [14-AUG-2017 01:20:31:441] nsbasic_bsd: 4C 20 46 52 4F 4D 20 44 |L.FROM.D|
(4198659808) [14-AUG-2017 01:20:31:442] nsbasic_bsd: 55 41 4C 20 46 4F 52 20 |UAL.FOR.|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 55 50 44 41 54 45 20 4E |UPDATE.N|<====这里显示drcp执行了这个sql:select null from dual for update nowait
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 4F 57 41 49 54 00 01 00 |OWAIT…|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 01 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:443] nsbasic_bsd: 00 00 |.. |
…
(4198659808) [14-AUG-2017 01:20:31:445] nioqrc: entry
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: entry: oln/tot=0
(4198659808) [14-AUG-2017 01:20:31:445] nttfprd: entry
(4198659808) [14-AUG-2017 01:20:31:445] nttfprd: socket 8 had bytes read=244
(4198659808) [14-AUG-2017 01:20:31:445] nttfprd: exit
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: type=6, plen=244
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: what=1, tot =244
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: packet dump
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: 00 F4 00 00 06 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: 00 00 04 01 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:445] nsbasic_brc: 00 01 00 00 00 00 5C 02 |……\.|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 06 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 20 3B 12 0C 00 00 |…;….|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 00 00 00 00 00 00 |……..|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 00 00 61 4F 52 41 2D 30 |..aORA-0|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 30 36 30 34 3A 20 65 72 |0604:.er|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 72 6F 72 20 6F 63 63 75 |ror.occu|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 72 72 65 64 20 61 74 20 |rred.at.|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 72 65 63 75 72 73 69 76 |recursiv|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 65 20 53 51 4C 20 6C 65 |e.SQL.le|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 76 65 6C 20 31 0A 4F 52 |vel.1.OR|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 41 2D 31 36 30 30 30 3A |A-16000:|<====
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 20 64 61 74 61 62 61 73 |.databas|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 65 20 6F 70 65 6E 20 66 |e.open.f|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 6F 72 20 72 65 61 64 2D |or.read-|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 6F 6E 6C 79 20 61 63 63 |only.acc|
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: 65 73 73 0A |ess. |
(4198659808) [14-AUG-2017 01:20:31:446] nsbasic_brc: exit: oln=0, dln=234, tot=244, rc=0
通过如上sqlnet client trace可以看到是递归sql: select null from dual for update在只读备库上执行,报了ORA-16000, 连续报错五次之后,连接断开。这是个DML语句,在只读备库上遇到ORA-16000是意料之中的事情。
2、跟drcp方面沟通后,确认drcp是有防止在只读备库上执行dml操作的功能的,即上述select null from dual for update并不是导致异常断开的原因。而sqlnet client trace看到的也只是客户端跟数据库服务器之间的交互,看不到更多的东西。
3、仔细想了之后,发现其实我们还可以通过对ORA-16000做errorstack追踪下,因为sqlnet trace已经说明它还是连上过备库的,只是触发了ORA-16000之后异常断开而已。
通过对备库设置alter system set events=’16000 trace name errorstack forever,level 3′;
并重现连接问题之后,得到了如下trace:
Unix process pid: 3311, image: oracle@adg.fsm.com (L001)
*** 2017-08-16 06:54:49.441
*** SESSION ID:(30.65) 2017-08-16 06:54:49.441
*** CLIENT ID:() 2017-08-16 06:54:49.441
*** SERVICE NAME:() 2017-08-16 06:54:49.441
*** MODULE NAME:() 2017-08-16 06:54:49.441
*** ACTION NAME:() 2017-08-16 06:54:49.441
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=4, mask=0x0)
—– Error Stack Dump —–
ORA-16000: database open for read-only access
—– Current SQL Statement for this session (sql_id=4m7m0t6fjcs5x) —–
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:1
—– Call Stack Trace —–
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- dbkdaKsdActDriver <- dbgdaExecuteAction <- dbgdaRunAction <- dbgdRunActions <- dbgdProcessEventAct
<- ions <- dbgdChkEventKgErr <- dbkdChkEventRdbmsEr <- ksfpec <- dbgePostErrorKGE
<- 1137 <- dbkePostKGE_kgsf <- kgeselv <- ksesecl0 <- opiexe
<- opiall0 <- opikpr <- opiodr <- rpidrus <- skgmstack
<- rpiswu2 <- kprball <- kqdsnu <- kqrcmt <- ktcCommitTxn
<- kdnwor <- kdnAllocN <- kdnnxt <- kkdlses <- kzaini
<- kpplsCreateSession <- kpplsFixUpSession <- kpplsSessGet <- opiodr <- ttcpip
<- opitsk <- opiino <- opiodr <- opirip <- opidrv
<- sou2o <- opimai_real <- ssthrdmain <- main <- libc_start_main
<- start
从中发现引起ORA-16000报错的实际上是这个递归sql:
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:1
注意到其中的audit$=:9,这个sql像是db audit执行的,于是有了下面的解决方案。
三、解决方案
oracle@adg trace]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Wed Aug 16 07:05:24 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> alter system set audit_trail=none scope=spfile;
System altered.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1369579520 bytes
Fixed Size 2253104 bytes
Variable Size 419434192 bytes
Database Buffers 939524096 bytes
Redo Buffers 8368128 bytes
Database mounted.
Database opened.
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
SQL> exec dbms_connection_pool.start_pool();
PL/SQL procedure successfully completed.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@adg trace]$ sqlplus hr/hr@adg.fsm.com:1521/adgstby:POOLED
SQL*Plus: Release 11.2.0.4.0 Production on Wed Aug 16 07:08:21 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select * from tab;
no rows selected
SQL>