
The not-so-simple reason for the ORA-29771 on 12c Data Guard (and 11g too)

"My Customer" has a ADG db that is used for reporting. Upgraded from 11gR2 to 12c. Some queries fail with an ORA-29771 process USER (OSID 335621) blocks LGWR (OSID 314841) for more than 70 seconds The underlying problem has been there forever, it was a time bomb. here is why First what is the same in 11g and 12c A SQL when it parses grabs a shared lock in DBINSTANCE state object. When the SCN is advanced by the recovery in an ADG the LGWR locks the DBINSTANCE state object in Exclusive mode. As a consequence the LGWR can block the parsing of a SQL which is bad but the parsing of a SQL can block the LGWR which is really bad.  and here comes Bug 17018214 - ORA-600 [krdrsb_end_qscn_2] ORA-4021 in Active Dataguard Standby Database with fix for bug 16717701 present - Instance may crash (Doc ID 17018214.8) Before the fix for bug 17018214 (pre-11.2.0.4), when a query blocked the LGWR for so long an ora-600 use to be raised and the instance crashed. Now with the fix for bug 17018214 (11.2.0.4 & 12.1.0.2) then the query is aborted with the ORA-29771. So the longer parsing the longer the chances to raise the ora-29771. Then there are 2 strategies to prevent the error. A) Prevent the timeout  B) Reduce the parsing time of the SQL For A) The fix adds the parameter _adg_parselock_timeout The value of the event encodes two things:    Timeout - this is the amount of time that LGWR will wait for an X lock before signaling a timeout error and retrying.    Sleep duration - this is the amount of time that LGWR will sleep for after having timed out. For B) there are as many of reasons why the parsing is slow as SQLs. Now comes 12c For A) "My Customer" tried 550 ms but still the error is raised. There are other recommendations in note:17018214.8 like  LGWR will wait for up to 30 seconds to acquire the X-lock.   If it fails, then it will retry after sleeping for 100 milliseconds.          value = (S * 1048576) + T = (100 * 1048576) + 30000 = 104887600    where S = sleep duration in milliseconds        T = timeout in milliseconds but still it is possible a SQL would parse longer than any timeout. For B) The parsing delay was caused by dynamic sampling queries. The DS queries are doing single block reads. I have seen this many times so I had a hunch that many of these single block reads were over BitMap Blocks (BMB) and checked the segments for out-of-sync HWM using Note:726653.1 Dbms_Space_Admin.Assm_Segment_Synchwm Definition  and there it was. "My Customer" fixed the HWM following Note:726653.1 and the parsing was fast and no more error but still it was not enough as they keep doing loads and the error eventually came back. It is easier to identify if the blocks are being read are bitmap blocks if they are of class 8,9,10,11, or 12 listed (usually p3) in the waitevent that read it like "gc current block 2-way" or WAITEVENT: "read by other session" Reference Note (Doc ID 732891.1) So we are back to 11g It is expected behavior *in all post-9i RDBMS versions* that the Low and High HWMs to become separated due to direct-data loads,  used to improve performance.   This introduces unformatted blocks between the Low and High HWMs for a segment. A Table scan on ASSM segments can be slowed if there are a lot of unformatted blocks between the Low and High HWMs for a segment.  When Low and High HWMs are not in sync, then it becomes necessary to build a more detailed extent map at SELECT time,  which could increase the number of CR gets on L1/L2/L3 blocks by a factor of how many extents are between the HWMs. 
As unformatted blocks continue to grow and grow, due to the continued direct loads  this can cause the fetch for these blocks to start to run too long, which can result in LGWR blocks. And here comes Bug 4067168  - Excessive CR blocks / Slow table scan for ASSM segments with UNIFORM extent allocation (DocID: 4067168.8) The enhancement of bug 4067168 (covered in this document Note:4067168.8 for all post-9i RDBMS versions), introduces code to keep the high and low HWMs closer in sync for direct load operations:  The high and low HWMs should ALWAYS be in sync when the extent management is AUTOALLOCATE.  The high and low HWMs should be reasonably in sync when the extent management is UNIFORM,     and an approximate ratio of unformatted blocks/allocated blocks is always  < _HWM_SYNC_THRESHOLD % (default 10%) The problem with "My Customer" is this ratio.  The larger the table the larger the number of blocks that percentage represents and eventually reaches the threshold where  it is too much burden to read the BMBs and the data blocks individually.  That is why for "My Customer" the problem did not happened immediately that they upgrade to 12c.  It took time while the table partitions kept growing. it is not a common practice to set because for most customers that percentage is small enough that the cost of reading  the Bit Map Blocks and data blocks has not reached a level that they even notice or they do other DMLs that format the blocks later. This enhancement also adds a procedure DBMS_SPACE_ADMIN.ASSM_SEGMENT_SYNCHWM to help check for the HWMs that might be out of sync,  and fixes them.   When fixing, it is recommended to set _HWM_SYNC_THRESHOLD=100, at the session level, to guarantee complete sync. In 11g mostly the user queries where slow because the constructed the extent map during execution time but if a query used Dynamic Sampling they were sure to have the same behavior causing slow parsing and the most extreme cases raise the ORA-29771. "My Customer" did not came aware of the out-of-sync in 11g because the error was not raised. They only worried that the queries ran. So back again to 12c In 12c with the explosion of dynamic sampling queries increase the likelihood of encountering the HWM issue during parse time. For customers using data guard the Slower parse causes the ORA-29771. For most customers using non-dataguard DBs this issue only manifest as slow parsing. and the most common answer from support is disable dynamic sampling and transferring the problem to execution time as the customer most probably had in 11g but as long as the performance is restored to 11g level no ones digs deeper. Solutions to avoid the error ORA-29771 then are (in order of strategy and my preference)  On A) Prevent the timeout  Fiddle with _adg_parselock_timeout and cross your fingers Do not use ADG. Maybe GoldenGate or just STREAMS. On B) Reduce the parsing time of the SQL Move the segments to a tablespace with AUTOALLOCATE extent management Manually sync the HWM of the segments after load Set _HWM_SYNC_THRESHOLD=100 before each load Disable Dynamic Sampling in ADG and cross your fingers the plan is acceptable.
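To make option B concrete, here is a minimal sketch of the check-and-sync flow. The exact ASSM_SEGMENT_SYNCHWM signature (function vs procedure, argument names, CHECK_ONLY semantics) is my reading of Note:726653.1 and may differ between releases, so treat it as an assumption and verify against your version before using it; the owner and segment names are hypothetical.

-- Sketch only, based on Note:726653.1; the argument names and CHECK_ONLY
-- semantics are assumptions and may vary by version.
-- Guarantee a complete sync while fixing (as recommended above).
alter session set "_hwm_sync_threshold"=100;

set serveroutput on
declare
  l_ret number;
begin
  l_ret := dbms_space_admin.assm_segment_synchwm(
             segment_owner => 'SCOTT',       -- hypothetical owner
             segment_name  => 'BIG_TABLE',   -- hypothetical segment
             segment_type  => 'TABLE',
             check_only    => 1);            -- 1 = report only, 0 = sync the HWMs
  dbms_output.put_line('HWMs out of sync: '||l_ret);
end;
/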

"My Customer" has a ADG db that is used for reporting. Upgraded from 11gR2 to 12c. Some queries fail with an ORA-29771 process USER (OSID 335621) blocks LGWR (OSID 314841) for more than 70 seconds The...

"physical read total multi block requests"

During an escalation where WORKLOAD statistics ended up corrupted and produced aberrant execution plans, I was made aware of a blog post titled "Never gather WORKLOAD stats on Exadata…". The blog post goes on to identify that the statistic "physical read total multi block requests" does not appear to count the multi-block requests correctly. The discussion continued on the Oracle Community Forum. I wanted to know what this statistic really measures.

The documentation of the statistic says: Total number of Oracle instance read requests which read in two or more database blocks per request for all instance activity including application, backup and recovery, and other utilities.

But as I dug deeper, I realized the description is wrong and filed Bug 26136778 - desc of "physical read total multi block requests" is wrong. My interpretation of the statistic is: Total number of Oracle instance read requests which read 128 kBytes or more per request for all instance activity including application, backup and recovery, and other utilities.

This also changes the interpretation of "physical read total IO requests". Its current description: Number of read requests which read one or more database blocks for all instance activity including application, backup and recovery, and other utilities. The difference between this value and "physical read total multi block requests" gives the total number of single block read requests.

My interpretation of the statistic is: Number of read requests which read one or more database blocks for all instance activity including application, backup and recovery, and other utilities. The difference between this value and "physical read total multi block requests" gives the total number of "small" IO requests, which range from less than 128 kBytes down to single block read requests.

We'll have to wait for Oracle Dev to review the bug and give its ruling.
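In the meantime, if one accepts my interpretation, the count of "small" read requests can be derived directly from the two statistics; a quick sketch:

-- Under the interpretation above, requests smaller than 128KB (down to
-- single-block reads) are the difference between the two statistics.
select
  sum(case name when 'physical read total IO requests' then value end)
  - sum(case name when 'physical read total multi block requests' then value end)
    as small_read_requests
from v$sysstat
where name in ('physical read total IO requests',
               'physical read total multi block requests');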


Oracle Optimizer

Follow up on "Same Select..."

A follow up on the case of "Same SELECT used in an INSERT has different Execution Plan". The query has the following predicate:

CH.CONTACTDATETIME > SYSDATE-5

The testcase I have has table, column and index statistics from 27-jun-2013. If I run the insert with the SYSDATE clause *today* on my testcase with stats from 27-jun-2013, I get a very poor cardinality estimation on the access to CH, because the stats say that little or no data exists after 27-jun-2013. This makes the CBO choose a poor execution plan, the one with only one DTP. If in contrast I change the query to have:

CH.CONTACTDATETIME > to_date('22-JUN-13')

then the cardinality estimation is closer to reality and the CBO chooses the better execution plan, the one that has the DISTINCT PLACEMENT transformation in steps 7 and 15, where before it was only happening in step 7.

-------------------------------------------------------------------
Id|Operation                       |Name            |Rows |Cost
-------------------------------------------------------------------
 0|INSERT STATEMENT                |                |     | 313K
 1|LOAD TABLE CONVENTIONAL         |                |     |
 2| HASH JOIN OUTER                |                |   2 | 313K
 3|  VIEW                          |                |   1 | 2771
 4|   SORT UNIQUE                  |                |   1 | 2771
 5|    NESTED LOOPS                |                |     |
 6|     NESTED LOOPS               |                |   1 | 2770
 7|      VIEW                      |VW_DTP_BD4C95A1 | 524 | 2238
 8|       SORT UNIQUE              |                | 524 | 2238
 9|        TABLE ACCESS FULL       |DL              | 636K| 1776
10|      INDEX RANGE SCAN          |I_T             |   1 |    2
11|     TABLE ACCESS BY INDEX ROWID|T               |   1 |    2
12|  VIEW                          |                |1940 | 310K
13|   SORT UNIQUE                  |                |1940 | 310K
14|    HASH JOIN                   |                |1940 | 310K
15|     VIEW                       |VW_DTP_22D53B8E | 524 | 2238
16|      SORT UNIQUE               |                | 524 | 2238
17|       TABLE ACCESS FULL        |DL              | 636K| 1776
18|     PARTITION RANGE ITERATOR   |                |1728K| 307K
19|      TABLE ACCESS FULL         |CH              |1728K| 307K
-------------------------------------------------------------------

Since the query relies heavily on that SYSDATE predicate, there are a couple of possible options:

1) Use SQL Plan Management to fix the execution plan to have the DTP in steps 7 and 15.
2) Before executing the insert, make sure the table CH has fresh statistics that reflect the latest data inserted into the table (a sketch follows). For example, if the load is done daily then the statistics will have to be gathered daily.
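For option 2, a minimal sketch of the pre-insert statistics refresh; the owner and the use of defaults are assumptions, and on a partitioned table like CH you would likely want to tune the granularity as well:

-- Sketch: refresh CH statistics right before the daily insert, so the
-- SYSDATE-5 predicate falls inside the known low/high range of the column.
begin
  dbms_stats.gather_table_stats(
    ownname => user,     -- hypothetical: the schema owning CH
    tabname => 'CH',
    cascade => true);    -- include the indexes
end;
/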


SQL Execution Debugging

Migrated Rows and PDML

Recently I have seen a rising trend of issues about chained rows, so this post shows a not-so-obvious side effect. I see no reason to explain what a chained row is or to give examples of them, as there are very good postings on the internet that cover them, like http://www.akadia.com/services/ora_chained_rows.html. A migrated row is just a special case of a chained row. I am going to use a testcase similar to that article's to show my point: a similar table, just with bigger columns, as the standard block size nowadays is 8k.

EMB11203> cat setup.sql
drop table row_mig_chain_demo;
CREATE TABLE row_mig_chain_demo (
  x int PRIMARY KEY,
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  d CHAR(2000),
  e CHAR(2000));

EMB11203> cat data.sql
var pk number;
begin select nvl(max(x),0) into :pk from row_mig_chain_demo; end;
/
Prompt just in case I want to add more, I just re-execute the script.
INSERT INTO row_mig_chain_demo (x)
select rownum+:pk from dual connect by rownum <= 100;
COMMIT;
Prompt migrate the rows.
UPDATE row_mig_chain_demo SET a = rownum, b = rownum, c = rownum;
commit;
analyze table ROW_MIG_CHAIN_DEMO compute statistics;

The script creates 100 rows and migrates 99 of them.

SQL> select table_name,num_rows,blocks,CHAIN_CNT
  2  from user_tables where table_name='ROW_MIG_CHAIN_DEMO';

TABLE_NAME                       NUM_ROWS     BLOCKS  CHAIN_CNT
------------------------------ ---------- ---------- ----------
ROW_MIG_CHAIN_DEMO                    100        103         99

So, first interesting thing: why did I use ANALYZE instead of DBMS_STATS? Because DBMS_STATS does not collect CHAIN_CNT. In May 2008 an internal bug was opened to request that DBMS_STATS collect CHAIN_CNT, and it is still waiting to be implemented, so while I wait I'll use ANALYZE to collect it. The CBO was enhanced in 10.2.0.3 to take CHAIN_CNT into its cost calculations, so if you feel it would be good for DBMS_STATS to implement the collection of CHAIN_CNT, drop Optimizer Development a line: https://blogs.oracle.com/optimizer/

Second, why were only 99 migrated and not all 100? Because the update on the first row updated only 3 columns with a combined length of 6000 bytes, which still fits in the block. The insert placed all the rows in the same block and the update migrated the rest. The following query shows that the head pieces are still in the same block.

set pagesize 200 lines 180
select x, DBMS_ROWID.ROWID_RELATIVE_FNO(rowid), dbms_rowid.ROWID_BLOCK_NUMBER(rowid)
from row_mig_chain_demo
order by 2,3;

         X DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
---------- ------------------------------------ ------------------------------------
         1                                    4                               248213
         2                                    4                               248213
         3                                    4                               248213
...
       100                                    4                               248213

A block dump shows all but the first row to be migrated.
Start dump data blocks tsn: 4 file#:4 minblk 248213 maxblk 248215
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=17025429
BH (0x6a3884b8) file#: 4 rdba: 0x0103c995 (4/248213) class: 1 ba: 0x6bcc4000
...
===============
tsiz: 0x1f98
hsiz: 0xda
pbl: 0x2b017976ba64
     76543210
flag=--------
ntab=1
nrow=100                               <-- 100 rows
...
block_row_dump:
tab 0, row 0, @0x5c1
tl: 6015 fb: --H-FL-- lb: 0x1  cc: 4   <-- Head, First and Last piece
...
tab 0, row 1, @0x5b8
tl: 9 fb: --H----- lb: 0x1  cc: 0      <-- Only the head
nrid:  0x0103c996.0                    <-- Where the next piece is
tab 0, row 2, @0x5af
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0103c997.0
...
tab 0, row 99, @0x246
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0103ed84.0
end_of_block_dump
Dump of buffer cache at level 4 for tsn=4 rdba=17025430
BH (0x6a30e928) file#: 4 rdba: 0x0103c996 (4/248214) class: 1 ba: 0x6b048000
...
===============
tsiz: 0x1f80
hsiz: 0x14
pbl: 0x2b017976ba7c
     76543210
flag=--------
ntab=1
nrow=1                                <-- 1 row
...
block_row_dump:
tab 0, row 0, @0x7fb
tl: 6021 fb: ----FL-- lb: 0x2  cc: 4  <-- First and Last piece
hrid: 0x0103c995.1                    <-- Where the head is

Everything is set up, so now to experiment. Everywhere I looked on the internet, everyone talked about queries, so I thought this a good opportunity to show the effect of migrated rows on DML. I will use a serial UPDATE and a parallel UPDATE.

EMB11203> cat sdml.sql
update row_mig_chain_demo
set a=rowid||'-'||to_char(sysdate,'yyyymmddhh24miss')
/

EMB11203> cat pdml.sql
alter session enable parallel dml;
alter session force parallel query;
update /*+ parallel */ row_mig_chain_demo
set a=rowid||'-'||to_char(sysdate,'yyyymmddhh24miss')
/

EMB11203> cat d.sql
set pages 2000 lines 180
SELECT a.name, b.value value_QC
  FROM v$statname a, v$mystat b
 WHERE a.statistic# = b.statistic#
   AND lower(a.name) = 'table fetch continued row';
SELECT a.name, b.value value_system
  FROM v$statname a, v$sysstat b
 WHERE a.statistic# = b.statistic#
   AND lower(a.name) = 'table fetch continued row';

The update will modify only the first column of all 100 rows, but since the column is CHAR there is not going to be any change in their allocated space.
First test, the serial UPDATE.

SQL> @d

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 482

SQL> @sdml

100 rows updated.

SQL> @d

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 482

SQL> commit;

Commit complete.

SQL> @d

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 482

Plan hash value: 1111098730
-----------------------------------------------------------------------------------------
| Id | Operation          | Name               | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|  0 | UPDATE STATEMENT   |                    |      |       |    30 (100)|          |
|  1 |  UPDATE            | ROW_MIG_CHAIN_DEMO |      |       |            |          |
|  2 |   TABLE ACCESS FULL| ROW_MIG_CHAIN_DEMO |  100 |  196K |    30   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Why did I query the statistics before and after the commit? The answer will become evident in the PDML test. The execution plan shows a full scan, so the expectation is that multiblock reads will be done, and sure enough the 10046 trace shows it.
=====================
PARSING IN CURSOR #47530402140584 len=79 dep=0 uid=95 oct=6 lid=95 tim=1374113452337883 hv=2108613435 ad='6538a218' sqlid='c9m69ctyuxstv'
update row_mig_chain_demo
set a=rowid||'-'||to_char(sysdate,'yyyymmddhh24miss')
END OF STMT
PARSE #47530402140584:c=24995,e=24536,p=13,cr=58,cu=0,mis=1,r=0,dep=0,og=1,plh=1111098730,tim=1374113452337881
WAIT #47530402140584: nam='Disk file operations I/O' ela= 1001 FileOperation=2 fileno=4 filetype=2 obj#=78159 tim=1374113452339118
WAIT #47530402140584: nam='db file sequential read' ela= 22 file#=4 block#=248210 blocks=1 obj#=78159 tim=1374113452339165
WAIT #47530402140584: nam='db file scattered read' ela= 60 file#=4 block#=248211 blocks=5 obj#=78159 tim=1374113452339342
WAIT #47530402140584: nam='Disk file operations I/O' ela= 701
WAIT #47530402140584: nam='db file scattered read' ela= 62 file#=4 block#=248224 blocks=8 obj#=78159 tim=1374113452340796
WAIT #47530402140584: nam='db file scattered read' ela= 36 file#=4 block#=248233 blocks=7 obj#=78159 tim=1374113452341328
WAIT #47530402140584: nam='db file scattered read' ela= 40 file#=4 block#=248240 blocks=8 obj#=78159 tim=1374113452341708
WAIT #47530402140584: nam='db file scattered read' ela= 49 file#=4 block#=248249 blocks=7 obj#=78159 tim=1374113452342655
WAIT #47530402140584: nam='db file scattered read' ela= 82 file#=4 block#=248256 blocks=8 obj#=78159 tim=1374113452343192
WAIT #47530402140584: nam='db file scattered read' ela= 40 file#=4 block#=248265 blocks=7 obj#=78159 tim=1374113452343781
WAIT #47530402140584: nam='db file scattered read' ela= 42 file#=4 block#=248272 blocks=8 obj#=78159 tim=1374113452344284
WAIT #47530402140584: nam='db file scattered read' ela= 39 file#=4 block#=248281 blocks=7 obj#=78159 tim=1374113452344851
WAIT #47530402140584: nam='db file scattered read' ela= 49 file#=4 block#=248288 blocks=8 obj#=78159 tim=1374113452345351
WAIT #47530402140584: nam='db file scattered read' ela= 79 file#=4 block#=248297 blocks=7 obj#=78159 tim=1374113452345936
WAIT #47530402140584: nam='db file scattered read' ela= 46 file#=4 block#=248304 blocks=8 obj#=78159 tim=1374113452349178
WAIT #47530402140584: nam='db file scattered read' ela= 38 file#=4 block#=248313 blocks=7 obj#=78159 tim=1374113452351317
WAIT #47530402140584: nam='db file scattered read' ela= 63 file#=4 block#=257408 blocks=8 obj#=78159 tim=1374113452352544
EXEC #47530402140584:c=7999,e=15045,p=104,cr=106,cu=464,mis=0,r=100,dep=0,og=1,plh=1111098730,tim=1374113452352980

It is interesting that the first IO is a single block read but all the later ones are multiblock. So far, all that was promised is there: the full table scan is not affected by the migrated rows.
Now the parallel UPDATE.

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 602

SQL> @pdml

Session altered.
Session altered.

100 rows updated.

SQL> @d

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 700

SQL> commit;

Commit complete.

SQL> @d

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                98

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 700

To answer the earlier question of why I got the statistics before and after the commit: it appears the statistics of the slaves are not rolled up to the Query Coordinator (QC) until the transaction is over, but they are updated in the system-wide statistics, and as those show, 98 "table fetch continued row" (TFCR) operations were done. 98 is a strange number, as there are 99 migrated rows. The execution plan shows the full table scan too, so why did the parallel UPDATE do TFCR when the serial one did not?

Plan hash value: 262915969
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                 | Name               | Rows | Bytes | Cost (%CPU)| Time     |    TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|  0 | UPDATE STATEMENT          |                    |      |       |     4 (100)|          |       |      |            |
|  1 |  PX COORDINATOR           |                    |      |       |            |          |       |      |            |
|  2 |   PX SEND QC (RANDOM)     | :TQ10001           |  100 |  196K |     4   (0)| 00:00:01 | Q1,01 | P->S | QC (RAND)  |
|  3 |    INDEX MAINTENANCE      | ROW_MIG_CHAIN_DEMO |      |       |            |          | Q1,01 | PCWP |            |
|  4 |     PX RECEIVE            |                    |  100 |  196K |     4   (0)| 00:00:01 | Q1,01 | PCWP |            |
|  5 |      PX SEND RANGE        | :TQ10000           |  100 |  196K |     4   (0)| 00:00:01 | Q1,00 | P->P | RANGE      |
|  6 |       UPDATE              | ROW_MIG_CHAIN_DEMO |      |       |            |          | Q1,00 | PCWP |            |
|  7 |        PX BLOCK ITERATOR  |                    |  100 |  196K |     4   (0)| 00:00:01 | Q1,00 | PCWC |            |
|* 8 |         TABLE ACCESS FULL | ROW_MIG_CHAIN_DEMO |  100 |  196K |     4   (0)| 00:00:01 | Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access(:Z>=:Z AND :Z<=:Z)

The explanation given by development is that for some operations, in particular PDML, the slaves need to do a head piece scan in order to avoid certain deadlock scenarios, while a serial operation does a row piece scan. Why do I care whether it is a head piece scan or a row piece scan?
Because a head piece scan reads the entire row. This means accessing all the blocks where the row pieces reside to get them all, and the worst part is that it has to be resolved immediately; it cannot be deferred until the pieces are found by the multiblock read, so the head piece scan does single block reads to look for the pieces. The 10046 trace of the slaves shows those single block reads.

PARSING IN CURSOR #47041633656608 len=94 dep=1 uid=95 oct=6 lid=95 tim=1374112767371325 hv=2999224131 ad='666d9300' sqlid='98qsj4utc91u3'
update /*+ parallel */ row_mig_chain_demo
set a=rowid||'-'||to_char(sysdate,'yyyymmddhh24miss')
END OF STMT
PARSE #47041633656608:c=0,e=258,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=262915969,tim=1374112767371319
WAIT #47041633656608: nam='PX Deq: Execution Msg' ela= 21180 sleeptime/senderid=268566527 passes=1 p3=1687153256 obj#=-1 tim=1374112767398375
WAIT #47041633656608: nam='Disk file operations I/O' ela= 1189 FileOperation=2 fileno=4 filetype=2 obj#=78159 tim=1374112767400091
WAIT #47041633656608: nam='db file sequential read' ela= 46 file#=4 block#=248228 blocks=1 obj#=78159 tim=1374112767400195
WAIT #47041633656608: nam='PX Deq: Execution Msg' ela= 313 sleeptime/senderid=268566527 passes=1 p3=1687153256 obj#=78159 tim=1374112767401031
WAIT #47041633656608: nam='db file sequential read' ela= 21 file#=4 block#=248299 blocks=1 obj#=78159 tim=1374112767401145
WAIT #47041633656608: nam='PX Deq: Execution Msg' ela= 213 sleeptime/senderid=268566527 passes=1 p3=1687153256 obj#=78159 tim=1374112767401398
WAIT #47041633656608: nam='db file sequential read' ela= 24 file#=4 block#=248308 blocks=1 obj#=78159 tim=1374112767401520
WAIT #47041633656608: nam='PX Deq: Execution Msg' ela= 185 sleeptime/senderid=268566527 passes=1 p3=1687153256 obj#=78159 tim=1374112767402019
WAIT #47041633656608: nam='db file sequential read' ela= 18 file#=4 block#=248262 blocks=1 obj#=78159 tim=1374112767402102
WAIT #47041633656608: nam='PX Deq: Execution Msg' ela= 177 sleeptime/senderid=268566527 passes=1 p3=1687153256 obj#=78159 tim=1374112767402330
WAIT #47041633656608: nam='db file sequential read' ela= 19 file#=4 block#=248256 blocks=1 obj#=78159 tim=1374112767402429
.....

This is particularly bad on Exadata, as customers like to have very wide tables (more than 255 columns, whose rows are automatically chained) and like to do PDML. On an Exadata system the wait event will be 'cell single block physical read'. There is hope though: smart scan can cope with some cases of chained rows, as long as the head and the pieces are within the 1MB boundary of the IO. But do not tempt your luck; do your best not to produce chained or migrated rows. It is in everyone's best interest.
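To keep an eye on a segment before it becomes a PDML problem, the classic detection flow still applies; a minimal sketch using the standard CHAINED_ROWS table created by the utlchain.sql script shipped with the RDBMS:

-- Create the CHAINED_ROWS table, then list each chained/migrated row.
@?/rdbms/admin/utlchain.sql
analyze table row_mig_chain_demo list chained rows into chained_rows;

select count(*) as chained_or_migrated
from   chained_rows
where  table_name = 'ROW_MIG_CHAIN_DEMO';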


Database Performance

More parses in 12c ?

More parses in 12.1? Some customers upgrading from 11.2.0.3 to 12.1 might get alarmed because they see a spike in the total parse count. But not to worry: it is not a problem in 12.1. The problem is in 11.2.0.3. Let's take the following example:

select a.name, b.value
  from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
   and ( a.name like '%parse count%'
      or a.name like 'session cursor cache%')
   and value > 0;

begin
  for n in 1..100 loop
    begin
      execute immediate 'alter session set optimizer_goal=choose';
    exception
      when others then null;
    end;
  end loop;
end;
/

select a.name, b.value
  from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
   and ( a.name like '%parse count%'
      or a.name like 'session cursor cache%')
   and value > 0;

I took the output of the queries and put them side by side for each version.

NAME                            11.1 VALUE   11.2 VALUE   12.1 VALUE
-----------------------------   ----------   ----------   ----------
session cursor cache hits                2            3            1
session cursor cache count               3           10           10
parse count (total)                     12           11           12
parse count (hard)                                    1

PL/SQL procedure successfully completed.

NAME                            11.1 VALUE   11.2 VALUE   12.1 VALUE
-----------------------------   ----------   ----------   ----------
session cursor cache hits                3            3            1
session cursor cache count               4           11           11
parse count (total)                    114           13          114
parse count (hard)                     100          101          100
parse count (failures)                                           100

Notice that the final "parse count (total)" in 11.2 is less than "parse count (hard)", which makes no sense, since "parse count (total)" should include hard, soft and failed parses. This might sound like an extreme case, but there are other soft parsing scenarios that also reproduce this oddity. For customers moving from 11.1 to 11.2.0.3 it might have looked like an improvement that there were fewer parses in total, and then moving from 11.2.0.3 to 12.1 they would probably notice the spike and raise concerns. So do not be alarmed: 12.1 does not do more parses, nor does 11.2.0.3 do fewer parses. It is just a bug in the instrumentation of the stats, specifically: Bug 13837105 - statistics "parse count (total)" and "session cursor cache hits" miscounted [ID 13837105.8]
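As a side note, the 12.1 column above shows failed parses surfaced under their own statistic, which makes the accounting easy to verify; a quick check against v$sysstat (statistic names as they appear in v$statname):

-- The failed ALTER SESSION parses in the example show up under their own
-- statistic, so total = hard + soft + failed can be checked directly.
select name, value
  from v$sysstat
 where name in ('parse count (total)',
                'parse count (hard)',
                'parse count (failures)');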


Oracle Optimizer

Same SELECT used in an INSERT has different execution plan

A customer complained that a query and its INSERT counterpart had different execution plans, and of course, the INSERT was slower. First let's look at the SELECT:

SELECT ua_tr_rundatetime,
       ua_ch_treatmentcode,
       ua_tr_treatmentcode,
       ua_ch_cellid,
       ua_tr_cellid
FROM   (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE,
                        CH.cellid        AS UA_CH_CELLID
        FROM    CH,
                DL
        WHERE  CH.contactdatetime > SYSDATE - 5
               AND CH.treatmentcode = DL.treatmentcode) CH_CELLS,
       (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE,
                        T.cellid        AS UA_TR_CELLID,
                        T.rundatetime   AS UA_TR_RUNDATETIME
        FROM    T,
                DL
        WHERE  T.treatmentcode = DL.treatmentcode) TRT_CELLS
WHERE  CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode;

The query has 2 DISTINCT subqueries. The execution plan shows the DISTINCT Placement transformation applied to one of them and not the other. The view in step 5 has the prefix VW_DTP, which stands for DISTINCT Placement.

--------------------------------------------------------------------
| Id  | Operation                    | Name            | Cost (%CPU)
--------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |   272K(100)
|*  1 |  HASH JOIN OUTER             |                 |   272K  (1)
|   2 |   VIEW                       |                 |  4408   (1)
|   3 |    HASH UNIQUE               |                 |  4408   (1)
|*  4 |     HASH JOIN                |                 |  4407   (1)
|   5 |      VIEW                    | VW_DTP_48BAF62C |  1660   (2)
|   6 |       HASH UNIQUE            |                 |  1660   (2)
|   7 |        TABLE ACCESS FULL     | DL              |  1644   (1)
|   8 |      TABLE ACCESS FULL       | T               |  2744   (1)
|   9 |   VIEW                       |                 |   267K  (1)
|  10 |    HASH UNIQUE               |                 |   267K  (1)
|* 11 |     HASH JOIN                |                 |   267K  (1)
|  12 |      PARTITION RANGE ITERATOR|                 |   266K  (1)
|* 13 |       TABLE ACCESS FULL      | CH              |   266K  (1)
|  14 |      TABLE ACCESS FULL       | DL              |  1644   (1)
--------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$AF418D5F / TRT_CELLS@SEL$1
   3 - SEL$AF418D5F
   5 - SEL$F6AECEDE / VW_DTP_48BAF62C@SEL$48BAF62C
   6 - SEL$F6AECEDE
   7 - SEL$F6AECEDE / DL@SEL$3
   8 - SEL$AF418D5F / T@SEL$3
   9 - SEL$2        / CH_CELLS@SEL$1
  10 - SEL$2
  13 - SEL$2        / CH@SEL$2
  14 - SEL$2        / DL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE")
   4 - access("T"."TREATMENTCODE"="ITEM_1")
  11 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE")
  13 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5)

The outline shows PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3"), indicating that query block SEL$3 is the one that got the transformation.
Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$F6AECEDE")
      OUTLINE_LEAF(@"SEL$AF418D5F")
      PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$48BAF62C")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1")
      NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1")
      LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1")
      USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1")
      FULL(@"SEL$2" "CH"@"SEL$2")
      FULL(@"SEL$2" "DL"@"SEL$2")
      LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2")
      USE_HASH(@"SEL$2" "DL"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$2")
      NO_ACCESS(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C")
      FULL(@"SEL$AF418D5F" "T"@"SEL$3")
      LEADING(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C" "T"@"SEL$3")
      USE_HASH(@"SEL$AF418D5F" "T"@"SEL$3")
      USE_HASH_AGGREGATION(@"SEL$AF418D5F")
      FULL(@"SEL$F6AECEDE" "DL"@"SEL$3")
      USE_HASH_AGGREGATION(@"SEL$F6AECEDE")
      END_OUTLINE_DATA
  */

The 10053 trace shows a cost comparison with and without the transformation. This means the transformation belongs to the Cost-Based Query Transformations (CBQT). In SEL$3 the cost of the query block without the transformation is 6659.73 and with the transformation it is 4408.41, so the transformation is kept.

GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#3)
DP: Checking validity of distinct placement for query block SEL$3 (#3)
DP: Using search type: linear
DP: Considering distinct placement on query block SEL$3 (#3)
DP: Starting iteration 1, state space = (5) : (0)
DP: Original query
DP: Costing query block.
DP: Updated best state, Cost = 6659.73
DP: Starting iteration 2, state space = (5) : (1)
DP: Using DP transformation in this iteration.
DP: Transformed query
DP: Costing query block.
DP: Updated best state, Cost = 4408.41
DP: Doing DP on the original QB.
DP: Doing DP on the preserved QB.

In SEL$2 the cost without the transformation is less than with it, so it is not kept.

GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#2)
DP: Checking validity of distinct placement for query block SEL$2 (#2)
DP: Using search type: linear
DP: Considering distinct placement on query block SEL$2 (#2)
DP: Starting iteration 1, state space = (3) : (0)
DP: Original query
DP: Costing query block.
DP: Updated best state, Cost = 267936.93
DP: Starting iteration 2, state space = (3) : (1)
DP: Using DP transformation in this iteration.
DP: Transformed query
DP: Costing query block.
DP: Not update best state, Cost = 267951.66

When an INSERT INTO is added to the same query, the result is a very different execution plan.
INSERT INTO cc
            (ua_tr_rundatetime,
             ua_ch_treatmentcode,
             ua_tr_treatmentcode,
             ua_ch_cellid,
             ua_tr_cellid)
SELECT ua_tr_rundatetime,
       ua_ch_treatmentcode,
       ua_tr_treatmentcode,
       ua_ch_cellid,
       ua_tr_cellid
FROM   (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE,
                        CH.cellid        AS UA_CH_CELLID
        FROM    CH,
                DL
        WHERE  CH.contactdatetime > SYSDATE - 5
               AND CH.treatmentcode = DL.treatmentcode) CH_CELLS,
       (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE,
                        T.cellid        AS UA_TR_CELLID,
                        T.rundatetime   AS UA_TR_RUNDATETIME
        FROM    T,
                DL
        WHERE  T.treatmentcode = DL.treatmentcode) TRT_CELLS
WHERE  CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode;

----------------------------------------------------------
| Id  | Operation                     | Name | Cost (%CPU)
----------------------------------------------------------
|   0 | INSERT STATEMENT              |      |   274K(100)
|   1 |  LOAD TABLE CONVENTIONAL      |      |
|*  2 |   HASH JOIN OUTER             |      |   274K  (1)
|   3 |    VIEW                       |      |  6660   (1)
|   4 |     SORT UNIQUE               |      |  6660   (1)
|*  5 |      HASH JOIN                |      |  6659   (1)
|   6 |       TABLE ACCESS FULL       | DL   |  1644   (1)
|   7 |       TABLE ACCESS FULL       | T    |  2744   (1)
|   8 |    VIEW                       |      |   267K  (1)
|   9 |     SORT UNIQUE               |      |   267K  (1)
|* 10 |      HASH JOIN                |      |   267K  (1)
|  11 |       PARTITION RANGE ITERATOR|      |   266K  (1)
|* 12 |        TABLE ACCESS FULL      | CH   |   266K  (1)
|  13 |       TABLE ACCESS FULL       | DL   |  1644   (1)
----------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   3 - SEL$3 / TRT_CELLS@SEL$1
   4 - SEL$3
   6 - SEL$3 / DL@SEL$3
   7 - SEL$3 / T@SEL$3
   8 - SEL$2 / CH_CELLS@SEL$1
   9 - SEL$2
  12 - SEL$2 / CH@SEL$2
  13 - SEL$2 / DL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE")
   5 - access("T"."TREATMENTCODE"="DL"."TREATMENTCODE")
  10 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE")
  12 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5)

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"INS$1")
      FULL(@"INS$1" "CC"@"INS$1")
      NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1")
      NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1")
      LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1")
      USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1")
      FULL(@"SEL$2" "CH"@"SEL$2")
      FULL(@"SEL$2" "DL"@"SEL$2")
      LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2")
      USE_HASH(@"SEL$2" "DL"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$2")
      FULL(@"SEL$3" "DL"@"SEL$3")
      FULL(@"SEL$3" "T"@"SEL$3")
      LEADING(@"SEL$3" "DL"@"SEL$3" "T"@"SEL$3")
      USE_HASH(@"SEL$3" "T"@"SEL$3")
      USE_HASH_AGGREGATION(@"SEL$3")
      END_OUTLINE_DATA
  */

There is no DISTINCT Placement view
and no PLACE_DISTINCT hint. The 10053 trace shows a new legend, "DP: Bypassed: Not SELECT", implying that this is a transformation that is possible only for SELECTs.

GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#4)
DP: Checking validity of distinct placement for query block SEL$3 (#4)
DP: Bypassed: Not SELECT.
GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#3)
DP: Checking validity of distinct placement for query block SEL$2 (#3)
DP: Bypassed: Not SELECT.

In 12.1 (and hopefully in 11.2.0.4 when it is released) the restriction on applying CBQT to some DMLs and DDLs (like CTAS) is lifted. This is documented in BugTag Note:10013899.8 "Allow CBQT for some DML / DDL". And interestingly enough, it is possible to have a one-off patch in 11.2.0.3.

SQL> select DESCRIPTION,OPTIMIZER_FEATURE_ENABLE,IS_DEFAULT
  2  from v$system_fix_control where BUGNO='10013899';

DESCRIPTION
----------------------------------------------------------------
OPTIMIZER_FEATURE_ENABLE  IS_DEFAULT
------------------------- ----------
enable some transformations for DDL and DML statements
11.2.0.4                           1
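Once the fix is present (via the one-off patch or 11.2.0.4 and later), it follows the usual fix-control mechanics; a hedged sketch of toggling it, with the caveat that whether this particular fix honors a session-level toggle is an assumption worth testing on your release:

-- Sketch: standard _fix_control syntax to enable the CBQT-for-DML fix
-- for the current session ( '10013899:0' would disable it again ).
alter session set "_fix_control" = '10013899:1';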


Database Performance

Diagnosis of a High Version Count

During the 2013 Great Lakes Oracle Conference (GLOC), Craig Shallahamer presented "Cursors, Child Cursors, and More Child Cursors" and motivated me to make this posting. In the past I have participated in many situations where I had to debug the mechanisms that generate large numbers of child cursors, and I learned that, more often than not, a cursor is not shared for many reasons, and those reasons may change from version to version, so I wrote a tool to help debug it. It seems pertinent to first explain a little what child cursors are and what a high version count is.

A cursor is the instance of a query/DML statement in the library cache. A hash key to locate the cursor in the library cache is calculated based on the text of the statement. But if the text is the only criterion used to identify the cursor in the library cache, what happens if the cursor is not compatible with the conditions under which it is executed? For example, how can the same cursor be used when the same query is executed by 2 users with different privileges, or with a different language, or even different objects? The answer is that a version of the cursor is created with the unique conditions that make it possible to execute. A version of a cursor is also known as a child cursor. So if there is a child, is there a parent? Yes: the parent cursor is the root to which all those child cursors are linked. In order for a session to find the correct version to execute, it attempts to match the query text in the library cache using the hash key generated from the text to find the parent cursor, and once found, it scans the list of versions (or children) one by one to find the one with the correct set of conditions; if it does not find one, it creates a new version and adds it to the list.

For example, let's say I have 3 schemas A, B and C, a table T in schema A, and a public synonym for T. Let's say further that A grants access on T to B. What happens if I run the query "SELECT * FROM T" from each schema?

If the query is run by A, "T" points to the schema object "A"."T" and the query runs ok. A version is created that indicates that a session connected with schema "A" can access "A"."T".

If the query is run by B, "T" points to the public synonym "T", which points to "A"."T", and because A granted access on T to B, the query runs ok. A version is created that indicates that a session connected with schema "B" can access "A"."T".

If the query is run by C, "T" points to the public synonym "T", which points to "A"."T", but because A has not granted access on T to C, the query fails and no version is created. If "A" now grants access on "T" to "C" and the query runs again, "T" is resolved again to the public synonym "T", which points to "A"."T", and since it now has privileges, the query runs ok and a version is created that indicates that a session connected with schema "C" can access "A"."T".

If "A" now revokes access on "T" from "B", the version associated with schema "B" is invalidated, to let any session that has a pointer to that child cursor know that it is no longer valid.

If the query is run by "A" but on a client with German NLS settings, and all the previous ones were done in English, then a new version is created for the German NLS settings.

I could easily continue the examples, as there are over 50 different reasons why a child needs to be created. These are tracked in V$SQL_SHARED_CURSOR. There are reasons related to binds, privileges, languages, parallel execution, optimization, and more, and reasons are added, removed or change in meaning with each version.
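V$SQL_SHARED_CURSOR exposes one Y/N column per reason, with one row per child, so a first look at a problem cursor can be as simple as the sketch below. The column list varies by version; the few named here exist in 11g, and the SQL_ID is the cursor used later in this post:

-- One row per child; each 'Y' flags a reason why that child could not
-- share an existing one.
select sql_id, child_number,
       optimizer_mismatch, bind_mismatch,
       language_mismatch, optimizer_mode_mismatch, px_mismatch
from   v$sql_shared_cursor
where  sql_id = '8swypbbr0m372';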
Even if, in theory, there could be 50+ combinations of reasons, and each reason may have "n" variables that can make a cursor non-shareable to anyone who does not have those same conditions, the number of versions usually remains low. The norm is that most cursors will have fewer than 100 versions, but there are times when hundreds or thousands of versions are made, and their management causes significant performance degradation in the system.

As each reason is related to different features in the database, there is no single way to debug a high version count situation. For example, in order to diagnose when the reason for not sharing is language, it might be necessary to know what languages are involved. If the reason is binds, it might be necessary to know how many binds there are, and of what datatype and size. And so on. For a more in-depth explanation see: Troubleshooting: High Version Count Issues [ID 296377.1]

After working several of these high version count cases, I came to identify some common diagnostics, their presentation and their interpretation, so I made a diagnostic tool to collect them. Round trips between support and the customer have always been annoying and riddled with mistakes and omissions, so the tool has the objective of collecting as much relevant information as possible about only the reasons identified. The tool is VERSION_RPT and can be found in: High SQL Version Counts - Script to determine reason(s) [ID:438755.1]

The tool adapts to the number of reasons in each version, so it works from 9.2 to 11gR2, and I expect it to work on 12c too once it is released. The tool can be used to diagnose one or many cursors, including pseudocursors, using SQL_ID or HASH_VALUE. The output of the report shows a summary of the reasons and information about the factors that each reason may take into consideration when deciding to share or not, like optimizer parameters when the reason is OPTIMIZER_MISMATCH. These reasons are searchable in MOS.
Searching for the phrase "Cursor not shared due to USER_BIND_PEEK_MISMATCH" can find BugTag notes like Bug 8981059 - High Version Count (due to USER_BIND_PEEK_MISMATCH) with bind peeking [ID 8981059.8]

To collect data for a group of SQLs it can be done like this:

SELECT b.*
FROM v$sqlarea a,
     TABLE(version_rpt(a.sql_id)) b
WHERE loaded_versions >= 100;

Here is an example of its use for one cursor:

===============
SQL> set pages 2000 lines 180
SQL> SELECT * FROM TABLE(version_rpt('8swypbbr0m372'));

Version Count Report Version 3.2.1 -- Today's Date 02-may-12 11:50
RDBMS Version :11.2.0.3.0 Host: XXXXXXXXXXXXXXXXXXXX Instance 1 : XXXXXXXXXX
==================================================================
Addr: 00000003D8B6A570  Hash_Value: 3993603298  SQL_ID 8swypbbr0m372
Sharable_Mem: 208913 bytes   Parses: 477
Stmt: 0 select order#,columns,types from access$ where d_obj#=:1 1

Versions Summary
----------------
OPTIMIZER_MISMATCH :10
BIND_MISMATCH :1
LANGUAGE_MISMATCH :1
OPTIMIZER_MODE_MISMATCH :3
PX_MISMATCH :2
Total Versions:10
~
Plan Hash Value Summary
-----------------------
Plan Hash Value Count
=============== =====
      872636971    11
              0     1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for OPTIMIZER_MISMATCH :
1 versions with is_recur_flags = 167
2 versions with is_recur_flags = 161
9 versions with _pga_max_size = 614400 KB
7 versions with optimizer_mode = choose
3 versions with is_recur_flags = 35
7 versions with hash_area_size = 10485760
1 versions with optimizer_mode = rule
2 versions with advanced_queuing_internal_cursor = 1
2 versions with is_recur_flags = 163
1 versions with parallel_execution_enabled = false
1 versions with _pga_max_size = 204800 KB
10 versions with _optimizer_order_by_elimination_enabled = false
2 versions with is_recur_flags = 39
1 versions with hash_area_size = 2000000
1 versions with sort_area_size = 1000000
8 versions with _db_file_optimizer_read_count = 128
10 versions with _optimizer_cost_based_transformation = on
1 versions with parallel_threads_per_cpu = 4
1 versions with parallel_max_degree = 16
1 versions with _parallel_syspls_obey_force = false
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for BIND_MISMATCH :
Consolidated details for:
BIND_MISMATCH, USER_BIND_PEEK_MISMATCH, BIND_UACS_DIFF and BIND_EQUIV_FAILURE
(Mislabeled as ROW_LEVEL_SEC_MISMATCH by bug 6964441 in 11gR1)
from v$sql_bind_capture

COUNT(*) POSITION MIN(MAX_LENGTH) MAX(MAX_LENGTH) DATATYPE (PRECISION,SCALE)
======== ======== =============== =============== ======== ================
      11        1              22              22        2 (,)

SUM(DECODE(column,Y, 1, 0) FROM V$SQL
IS_OBSOLETE IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE
=========== ================= ============= ============
          0                 0             0            9
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for LANGUAGE_MISMATCH :
No details available
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for OPTIMIZER_MODE_MISMATCH :
1 versions with RULE
9 versions with CHOOSE
2 versions with ALL_ROWS
~~~~~~~~~~~~~~~~~~~~~~~~~
Details for PX_MISMATCH :
No details available

#### To further debug, ask Oracle Support for the appropriate level LLL.
alter session set events 'immediate trace name cursortrace address 3993603298, level LLL';
To turn it off, use address 1, level 2147483648
================================================================
84 rows selected.


Database Performance

Parse and Parse again.

Since the first release of JDBC, and even longer ago with the Pro* precompilers, to this day I still see the following issue. Notice below, in this abstract of a 30 min AWR, the number of Parse Calls vs Executions, the SQL Module, and in the case of the first SQL, the SQL Text too.

Parse Calls  Executions  % Total Parses  SQL Id         SQL Module        SQL Text
-----------  ----------  --------------  -------------  ----------------  ------------------------------
    339,567     339,566           15.93  0mr7azgm9psws  JDBC Thin Client  SELECT COUNT(*) FROM DUAL
    101,574     101,880            4.77  5uta56d6syjv3  JDBC Thin Client  select TRIM(emp_fname || ' ' |...
     46,800      46,799            2.20  6vw7kzhpb1dm3  JDBC Thin Client  select "STATUS_ID" as "status...
     35,509      35,509            1.67  648yjr5s1w7uc  JDBC Thin Client  SELECT "PROTOCOL_GROUP_ID" as...
     31,292      32,540            1.47  0jvyqkxaytndr  bvlab.exe         select crf.CRF_LABEL from COMP...

Since the introduction of the shared pool in Oracle7, clients have had the capability of a SQL statement cache, in order to follow the famous "parse once, execute many" best practice. This feature is controlled on the RDBMS side by the SESSION_CACHED_CURSORS parameter, which makes many people believe the feature is managed on the RDBMS side alone, but this is far from the truth. When this feature is not used, the client software parses the statement for each execution, no matter whether the parameter is set in the database. Many take this feature for granted because SQL*Plus uses it, so they assume it is true for other client software, but it is a feature that has to be explicitly managed by the client software, and it works in conjunction with the RDBMS when the parameter is set. In JDBC it is necessary to enable it explicitly, and there is an implicit (setImplicitCachingEnabled) and an explicit (setExplicitCachingEnabled) way to manage the cache. It is a little easier with the precompilers, by setting the HOLD_CURSOR and RELEASE_CURSOR parameters at precompile time. Take a moment to look at your own AWR or Statspack report and verify your application is using the client statement cache as intended (a sketch for spotting the offenders follows at the end of this post).

Now, the cherry on the sundae is the SQL "SELECT COUNT(*) FROM DUAL". Over the years I've seen variations of queries over DUAL coming from JDBC in large quantities. I have been given the explanation from customers and developers that these queries are used as a heartbeat, to confirm the connection to the database is still open. I can understand that was necessary when JDBC and connection pooling were young technologies, but isn't there anything better nowadays that will do the same thing without endlessly parsing the same useless statement? I would like to hear from people about what other techniques/features are used that can avoid the need to send a bogus SQL to the database.

swing out sister - windmills of your mind
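As promised above, a quick way to spot statements that are parsed on (almost) every execution, like the ones in the AWR extract; the thresholds in this sketch are arbitrary:

-- Candidates for client-side statement caching: parse calls ~ executions.
select sql_id, parse_calls, executions, module,
       substr(sql_text, 1, 40) as sql_text_head
from   v$sqlarea
where  executions > 10000
and    parse_calls >= 0.9 * executions
order  by parse_calls desc;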


Oracle Optimizer

Disabled Fixes ... Hidden gems of the RDBMS

When making changes to the RDBMS, the developers of Oracle are always wondering if a new change will introduce a regression. Every change is checked against a suite of regression tests to try to trap any regression the change introduced and fix it before it is released to our clients. Just in case a change does introduce a regression, the developers add parameters and events to disable it. These are the workarounds used when a defect is encountered. But sometimes the developers are overly cautious and decide that a change may be better introduced in a more controlled manner, so they leave it disabled and only very few people know how to enable it. Another reason is that some changes are introduced to help one specific customer and there is no evidence to suggest whether it might help or hurt someone else. These disabled changes are still tested for regressions anyway.

With the introduction of FIX CONTROL in 10.2.0.2 there is now discoverable information about these disabled fixes.

SQL> desc v$session_fix_control
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SESSION_ID                                         NUMBER
 BUGNO                                              NUMBER
 VALUE                                              NUMBER
 SQL_FEATURE                                        VARCHAR2(64)
 DESCRIPTION                                        VARCHAR2(64)
 OPTIMIZER_FEATURE_ENABLE                           VARCHAR2(25)
 EVENT                                              NUMBER
 IS_DEFAULT                                         NUMBER

SQL> select value,count(*) from v$system_fix_control group by value order by value;

     VALUE   COUNT(*)  -- 11.2.0.3
---------- ----------
         0         47  <-- Disabled
         1        600  <-- Enabled
         3          4  ]
         7          1  ]__ Configurable
        15          1  ]
        20          1  ]

So in 11.2.0.3 there are 47 fixes that are dormant, waiting for an application to take advantage of them. Let's see an example:

Bug 6708183 - CBO cannot get good cardinality estimate for TABLE() functions [ID 6708183.8]

Introduced in 11.1.0.7 and 11.2.0.1, this fix allows a limited form of dynamic sampling to look inside a collection table in an effort to get a better cardinality estimation. By default the CBO guesses that the collection has as many rows of 100 bytes as can fit in 100 blocks, which usually translates to 8168 rows. There are exceptions, even with this fix enabled, where the cardinality is still not better estimated; those are corrected in later fixes or remain to be corrected, but all of them depend on fix 6708183 being enabled first. The fix is enabled by "_fix_control" = '6708183:ON' or by adding a DYNAMIC_SAMPLING hint to the query.

If you wish to know more about one particular disabled (or enabled) fix, try its bugtag first. If it does not have a bugtag, let me know and I'll see if one can be made.
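Putting the pieces together, a short sketch of enabling a dormant fix for the current session and verifying it took effect through the view described above:

-- Enable the dormant fix for this session only, then verify.
alter session set "_fix_control" = '6708183:ON';

select bugno, value, description, optimizer_feature_enable
from   v$session_fix_control
where  bugno = 6708183
and    session_id = sys_context('userenv', 'sid');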


Oracle Optimizer

The Best Kept Secret in Support That is Not Supposed to be a Secret

BugTags

What are BugTags?
The bug database keeps dialogue and fix information for suspected product defects for various Oracle products. Some bugs are true defects and get fixed in some code line, sometimes with detailed information about the defect and sometimes with very little comment. Sometimes they are enhancements to features, or new features altogether, also with or without details. A bugtag is a separate entity which can be created for any fixed defect or enhancement and includes summary information and categorizations of the defect or enhancement. A bugtag is an optional extension to a fixed defect or added enhancement. See: Introduction to "Bug Description" Articles [ID 245840.1] for more details.

What products do BugTags cover?
Currently bugtags are only created for server products, and mostly only the RDBMS.

What information is in a BugTag?
A bugtag includes attributes such as:
- The bug number that the bugtag describes
- A summary title (intended to be a customer-friendly bug title)
- A short description of the bug issue (a few paragraphs at most)
- A range of versions believed to be affected by the bug
- A list of specific versions known to be affected by the bug
- A list of specific versions where the bug is fixed (eg: patch set versions / patch set update versions etc. where the fix is included)
- Some categorization of the bug (to give an indication of the product area / feature affected, type/s of symptom seen, etc.)
- Hooks to indicate any key errors, parameters, etc. which may be relevant
- An optional link to a more specific document about the bug
- Some indication of whether an issue is particularly notable or not
- For port-specific issues, which platforms are affected
- A simple testcase, if available, to reproduce the issue as an example of what to look for

Where can I find the BugTags?
Bugtag data is propagated to customers via MOS notes in several ways:
- Each bugtag is represented by a single MOS note (document id = bugno.8). Eg: Defect 16475397.8 / Enh 10248538.8
- Certain reference notes include automatically maintained lists of bugs derived from bugtags inlined into them. Eg: ORA-600/7445 reference notes include a "Known Bugs" section which is automatically populated using the bugtag data for bugtags marked with a HOOK for that ORA-600, etc. Eg: ORA-600 [kclchkblkdma_3] [ID 468202.1]
- Certain notes have sections created from reports against the bugtag data. Eg: Init.ora Parameter "CURSOR_SHARING" Reference Note [ID 94036.1]; 11.2.0.3 Patch Set Updates - List of Fixes in each PSU [ID 1449750.1]

Who writes the BugTags?
Bugtags are created and maintained internally by Oracle Support. Among them, me (Abel Macias). I especially like writing the ones for CBO and performance-related bugs and the fixes included in the Exadata Bundle Patches.

What is the lifecycle of a BugTag?
When a defect is fixed it becomes eligible for a bugtag. Support analysts pick up the task of writing the bugtag note and posting it in MOS. On the following events the note becomes eligible for an update (some updates are automatic, some are manual):
- Someone adds a comment to the note.
- A new base version includes the fix in its code. Let's assume the fix for bug 1 is scheduled to be included in a future version like 12.1.0.1. While the future version is not officially released the note will indicate 12g (future release); when 12.1.0.1 is released and the fix is confirmed to be included in it, the note is updated to indicate it is fixed in 12.1.0.1.
- A regression was introduced by the fix, or the fix has been superseded by another fix.
- A second defect is filed, closed as a duplicate of the original defect, and has new symptoms not observed before. For instance, a defect may address one particular ORA-600, but later a different ORA-600 is identified that is fixed by the same fix. Eg: Bug 10408668 - Assorted Dumps with Queries with distinct aggregates having subqueries as arguments [ID 10408668.8]


SQL Execution Debugging

how to identify "think time" in a SQL trace?

There are multiple blog posts, presentations and pieces of documentation about "think time" and "service time", and some tools that will translate that for you. But if you do not have those tools available, or you need to understand a pattern of behavior from the client app, how can you identify "think time" in a SQL trace?

This case is from a customer who reported a performance problem: its JDBC connections running reports were constantly timing out, and we needed to identify where in the stack the problem was. From the database point of view we got a 10046 level 12 trace (SQL trace with wait events and bind variables):

=====================
PARSING IN CURSOR #4... len=50 dep=0 uid=888 oct=3 lid=888 tim=1365524355043575
SELECT STATUS FROM <table> WHERE ID = :1
END OF STMT
PARSE #4...:c=1000,e=1126,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1365524355043574
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
  value=123456789
EXEC #4...:c=2999,e=3049,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=3860417426,tim=1365524355046681
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355046730
WAIT #4...: nam='SQL*Net message from client' ela= 274 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047023
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047055
WAIT #4...: nam='SQL*Net message from client' ela= 259 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047330
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047385
FETCH #4...:c=0,e=41,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524355047402
*** 2013-04-09 12:19:17.047
WAIT #4...: nam='SQL*Net message from client' ela= 2000161 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047634
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047707
WAIT #4...: nam='SQL*Net message from client' ela= 258 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047973
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
  value=123456789
EXEC #4...:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524357048051
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357048086
FETCH #4...:c=0,e=26,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524357048098
*** 2013-04-09 12:19:19.054
WAIT #4...: nam='SQL*Net message from client' ela= 2006775 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359054902
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055015
WAIT #4...: nam='SQL*Net message from client' ela= 294 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055324
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
  value=123456789
EXEC #4...:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524359055435
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055482
FETCH #4...:c=0,e=40,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524359055503
*** 2013-04-09 12:19:21.055
WAIT #4...: nam='SQL*Net message from client' ela= 2000065 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055603
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055688
WAIT #4...: nam='SQL*Net message from client' ela= 267 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055967
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
  value=123456789
.....
.....
Pattern repeats 30 times.
.....
.....
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415383807
FETCH #4...:c=0,e=45,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524415383827
WAIT #4...: nam='SQL*Net message from client' ela= 282 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384145
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384170
WAIT #4...: nam='SQL*Net message from client' ela= 317 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384497

The trace shows a query that is looking for the STATUS of a specific ID over and over again. In between each round trip there is a delay of 2 seconds (notice the ela= 2000000+ in 'SQL*Net message from client').

I asked the customer if he considered it normal for the session to execute the same query with the same binds over and over again. The answer was no, and there was no explanation for the 2-second delay between calls either. From the point of view of the database, the query was parsed, bound, executed and its rows fetched in microseconds. After that, the Oracle process sat idle waiting for the client software to issue the next statement, which came 2 seconds later. That is the application "think time", and it is reflected by the 'SQL*Net message from client' wait in the SQL trace.

After some digging, the customer recognized that this session came from a JDBC program that was in a loop keeping tabs on the execution of a report run by another session, and it was doing so by looking at the STATUS of the report over and over after sleeping for 2 seconds.

The objective of this post is to show what "think time" looks like, but just to complete the root cause: other traces showed the reports were finishing in a timely manner, but these "tracking" sessions were on the same connection pool as the reports. The connection pool did not recognize that the "tracking" sessions were idle during those 2 seconds, so each connection was considered active and was not yielded to other incoming reports; effectively, half the slots were used by useful work and half by tracking sessions. To make things worse, for some reason the "tracking" sessions were not going away after their reports finished, and new ones were being created as new reports came in, so eventually the connection pool was exhausted and no work was being done.
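To make the pattern concrete, here is a hedged reconstruction of what such a client loop can look like; the table name, column and status values are illustrative guesses, and only the query shape and the 2-second cadence come from the trace above. Each Thread.sleep(2000) shows up on the server as a roughly 2,000,000-microsecond 'SQL*Net message from client' wait.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReportStatusPoller {
    // Polls the status row until the report leaves the RUNNING state.
    static String waitForReport(Connection conn, long reportId)
            throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT status FROM report_queue WHERE id = ?")) {
            while (true) {
                ps.setLong(1, reportId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        String status = rs.getString(1);
                        if (!"RUNNING".equals(status)) {
                            return status; // report finished
                        }
                    }
                }
                // "Think time": the pooled connection stays checked out and
                // looks active to the pool while this thread sleeps.
                Thread.sleep(2000);
            }
        }
    }
}

Returning the connection to the pool between polls, or replacing the polling altogether with a server-side notification mechanism such as DBMS_ALERT or Advanced Queuing, would have kept those slots free during the think time.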
