Monday Jul 22, 2013

Follow up on "Same Select..."

A follow up on the case of "Same SELECT used in an INSERT has different Execution Plan"

The query has the following predicate :

CH.CONTACTDATETIME > SYSDATE-5

The testcase I have has table, column and index statistics from 27-jun-2013.

If I run *today* the insert with the SYSDATE clause on my testcase with stats from 27-jun-2013, then I get a very poor cardinality estimate on the access to CH, because the stats reflect that there is little or no data present after 27-jun-2013. This makes the CBO choose a poor execution plan, which is the one with only one DTP.
If, in contrast, I change the query to have :

CH.CONTACTDATETIME>to_date('22-JUN-13')

Then the cardinality estimate is closer to reality and the CBO chooses the better execution plan that has the DISTINCT Placement transformation in steps 7 and 15, where before it was happening only in step 7.

-----------------------------------------------------------------
Id|Operation                        |Name            |Rows |Cost
-----------------------------------------------------------------
 0|INSERT STATEMENT                |                 |     | 313K
 1|LOAD TABLE CONVENTIONAL         |                 |     |    
 2| HASH JOIN OUTER                |                 |   2 | 313K
 3|  VIEW                          |                 |   1 |2771
 4|   SORT UNIQUE                  |                 |   1 |2771
 5|    NESTED LOOPS                |                 |     |    
 6|     NESTED LOOPS               |                 |   1 |2770
 7|      VIEW                      |VW_DTP_BD4C95A1  | 524 |2238
 8|       SORT UNIQUE              |                 | 524 |2238
 9|        TABLE ACCESS FULL       |DL               | 636K|1776
10|      INDEX RANGE SCAN          |I_T              |   1 |   2
11|     TABLE ACCESS BY INDEX ROWID|T                |   1 |   2
12|  VIEW                          |                 |1940 | 310K
13|   SORT UNIQUE                  |                 |1940 | 310K
14|    HASH JOIN                   |                 |1940 | 310K
15|     VIEW                       |VW_DTP_22D53B8E  | 524 |2238
16|      SORT UNIQUE               |                 | 524 |2238
17|       TABLE ACCESS FULL        |DL               | 636K|1776
18|     PARTITION RANGE ITERATOR   |                 |1728K| 307K
19|      TABLE ACCESS FULL         |CH               |1728K| 307K
----------------------------------------------------------------- 


Since the query relies heavily on that SYSDATE predicate, there are a couple of possible options :

1) Use SQL Plan Management to fix the execution plan to have the DTP in steps 7 and 15.
2) Before executing the insert, make sure table CH has fresh statistics that reflect the latest data inserted into the table.
    For example, if the insert runs daily then the statistics will have to be gathered daily (a sketch follows below).
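
For option 2, a minimal sketch of what refreshing the statistics could look like (the parameter values are illustrative assumptions, not a recommendation) :

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname       => USER,     -- assumed: CH lives in the current schema
    tabname       => 'CH',
    cascade       => TRUE,     -- gather index statistics too
    no_invalidate => FALSE);   -- invalidate cached cursors so new plans use the new stats
END;
/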

Thursday Jul 18, 2013

Migrated Rows and PDML

Recently I have seen a rising trend of issues about chained rows, so this post is to show a not so obvious side effect.
I see no reason to explain what a chained row is or to give examples of them, as there are very good postings on the internet that talk about them.
A migrated row is just a special case of a chained row.
I am going to use a testcase similar to the article's to show my point.
Similar table, just bigger columns, as the standard block size nowadays is 8k.

EMB11203> cat setup.sql
drop table  row_mig_chain_demo;
CREATE TABLE row_mig_chain_demo (
  x int PRIMARY KEY,
  a CHAR(2000),
  b CHAR(2000),
  c CHAR(2000),
  d CHAR(2000),
  e CHAR(2000)
);
EMB11203> cat data.sql
var pk number;
begin
 select nvl(max(x),0) into :pk from row_mig_chain_demo;
end;
/
Prompt just in case I want to add more, I just re-execute the script.
INSERT INTO row_mig_chain_demo (x)
select rownum+:pk from dual connect by rownum <= 100;
COMMIT;
Prompt migrate the rows.
UPDATE row_mig_chain_demo SET a = rownum, b = rownum, c = rownum;
commit;
analyze table ROW_MIG_CHAIN_DEMO compute statistics;

The script creates 100 rows and migrates 99 of them.

SQL> select  table_name,num_rows,blocks,CHAIN_CNT
  2  from user_tables where table_name='ROW_MIG_CHAIN_DEMO';
TABLE_NAME                       NUM_ROWS     BLOCKS  CHAIN_CNT
------------------------------ ---------- ---------- ----------
ROW_MIG_CHAIN_DEMO                    100        103         99


So, first interesting thing. Why did I use ANALYZE instead of DBMS_STATS ?
Because DBMS_STATS does not collect CHAIN_CNT.
In May 2008 an internal bug was opened to request that DBMS_STATS collect CHAIN_CNT, and it is still waiting to be implemented, so while I wait I'll use ANALYZE to collect it.
The CBO was enhanced in 10.2.0.3 to take CHAIN_CNT into account in its cost calculations, so if you feel that it would be good for DBMS_STATS to implement the collection of CHAIN_CNT, drop Optimizer Development a line.
Second, why were 99 rows migrated and not all 100 ?
Because the update on the first row updated only 3 columns with a combined length of 6000 bytes, which still fits in the block.
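
As a side note, if you only want to identify which rows are chained or migrated, ANALYZE can also list them into a helper table. A minimal sketch (the CHAINED_ROWS table is created by the utlchain.sql script shipped with the RDBMS) :

-- create the helper table first; the script ships in $ORACLE_HOME/rdbms/admin
@?/rdbms/admin/utlchain.sql
-- list the chained/migrated rows without touching the optimizer statistics
ANALYZE TABLE row_mig_chain_demo LIST CHAINED ROWS INTO chained_rows;
SELECT head_rowid FROM chained_rows WHERE table_name = 'ROW_MIG_CHAIN_DEMO';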

The insert placed all the rows in the same block and the update migrated the rest.
The following query shows that the head piece is still in the same block.

set pagesize 200 lines 180
select x,DBMS_ROWID.ROWID_RELATIVE_FNO (rowid) , dbms_rowid.ROWID_BLOCK_NUMBER (rowid)
from row_mig_chain_demo
order by 2,3
         X DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
---------- ------------------------------------ ------------------------------------
         1                                    4                               248213
         2                                    4                               248213
         3                                    4                               248213
...         
       100                                    4                               248213         

A block dump shows all but the first row to be migrated.

Start dump data blocks tsn: 4 file#:4 minblk 248213 maxblk 248215
Block dump from cache:
Dump of buffer cache at level 4 for tsn=4 rdba=17025429
BH (0x6a3884b8) file#: 4 rdba: 0x0103c995 (4/248213) class: 1 ba: 0x6bcc4000
...
===============
tsiz: 0x1f98
hsiz: 0xda
pbl: 0x2b017976ba64
     76543210
flag=--------
ntab=1
nrow=100                               <-- 100 rows.
...
block_row_dump:
tab 0, row 0, @0x5c1
tl: 6015 fb: --H-FL-- lb: 0x1  cc: 4   <-- Head, First and Last Piece
...
tab 0, row 1, @0x5b8
tl: 9 fb: --H----- lb: 0x1  cc: 0      <-- Only the head
nrid:  0x0103c996.0                    <-- Where is the next piece.
tab 0, row 2, @0x5af
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0103c997.0
....
tab 0, row 99, @0x246
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0103ed84.0
end_of_block_dump
Dump of buffer cache at level 4 for tsn=4 rdba=17025430
BH (0x6a30e928) file#: 4 rdba: 0x0103c996 (4/248214) class: 1 ba: 0x6b048000
...
===============
tsiz: 0x1f80
hsiz: 0x14
pbl: 0x2b017976ba7c
     76543210
flag=--------
ntab=1
nrow=1                                <-- 1 row
...
block_row_dump:
tab 0, row 0, @0x7fb
tl: 6021 fb: ----FL-- lb: 0x2  cc: 4  <-- First and Last Piece
hrid: 0x0103c995.1                    <-- Where is the head 


Everything is set up, so now to experiment.
Everywhere I looked on the internet everyone talked about queries, so I thought this was a good opportunity to show the effect of migrated rows on DML.
So I will use a serial UPDATE and a parallel UPDATE.

EMB11203> cat sdml.sql
update row_mig_chain_demo
set a=rowid||'-'||to_char(sysdate,'yyyymmddhh24miss')
/
EMB11203> cat pdml.sql
alter session enable parallel dml;
alter session force parallel query;
update /*+ parallel */row_mig_chain_demo
set a=rowid||'-'||to_char(sysdate,'yyyymmddhh24miss')
/
EMB11203> cat d.sql
set pages 2000 lines 180
SELECT a.name, b.value value_QC
  FROM v$statname a, v$mystat b
 WHERE a.statistic# = b.statistic#
   AND lower(a.name) = 'table fetch continued row';
SELECT a.name, b.value value_system
  FROM v$statname a, v$sysstat b
 WHERE a.statistic# = b.statistic#
   AND lower(a.name) = 'table fetch continued row';

The update will modify only the first column of all 100 rows, but since the column is CHAR there is not going to be any change in their allocated space.

First Test, the serial UPDATE

SQL> @d
NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0
NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 482
SQL> @sdml
100 rows updated.
SQL> @d
NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0
NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 482
SQL> commit;
Commit complete.
SQL> @d
NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0
NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 482
Why did I query the statistics before and after the commit ? The answer will become evident in the PDML test. The execution plan shows a full scan, so the expectation is that multiblock reads will be done, and sure enough the 10046 trace shows it.

It is interesting that the first IO is a single block read but later all are multiblock.
So far, all that was promised is there: the full table scan is not affected by the migrated rows.

Now the Parallel Update

NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 602
SQL> @pdml
Session altered.

Session altered.

100 rows updated.
SQL> @d
NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                 0

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 700
SQL> commit;
Commit complete.
SQL> @d
NAME                                                               VALUE_QC
---------------------------------------------------------------- ----------
table fetch continued row                                                98

NAME                                                             VALUE_SYSTEM
---------------------------------------------------------------- ------------
table fetch continued row                                                 700


To answer the earlier question, why did I get the statistics before and after the commit ?
The reason is that it appears the statistics of the slaves are not rolled up to the Query Coordinator (QC) until the transaction is over,
but they are updated in the system-wide statistics, and as they show, 98 "table fetch continued row" (TFCR) operations were done.
98 is a strange number, as there are 99 migrated rows.

The execution plan shows the full table scan too, so why did the parallel UPDATE do TFCR when the serial one did not ?

The explanation given by Development is that in some operations, in particular PDML, the slaves need to do a head piece scan in order to avoid some deadlock scenarios, while serial execution does a row piece scan.

Why do I care whether a head piece scan or a row piece scan is done ?

Because a head piece scan reads the entire row. This means accessing all the blocks where the row pieces reside to get them all, and the worst part is that it has to be resolved immediately; it cannot be deferred until the pieces are found by the multiblock read, so the head piece scan does single block reads to look for the pieces.

The 10046 of the slaves shows those single block reads.

This is particularly bad on Exadata, as customers like to have very wide tables (more than 255 columns, which are automatically chained) and like to do PDML.

On an Exadata system, the wait event will be 'cell single block physical read'.

There is hope though: Smart Scan can cope with some cases of chained rows as long as the head and the pieces are within the 1MB boundary of the IO, but do not tempt your luck and do your best not to produce chained or migrated rows; it is in everyone's best interest.
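
And a final hedged sketch: the usual way to get rid of migrated rows in a heap table is to rebuild the segment. The index name below is a placeholder, as the inline primary key gets a system-generated name :

ALTER TABLE row_mig_chain_demo MOVE;
-- MOVE changes every rowid, so the indexes on the table become UNUSABLE
SELECT index_name, status FROM user_indexes
 WHERE table_name = 'ROW_MIG_CHAIN_DEMO';
ALTER INDEX sys_c0012345 REBUILD;   -- placeholder name; rebuild each index listed above
-- verify CHAIN_CNT is back to 0
ANALYZE TABLE row_mig_chain_demo COMPUTE STATISTICS;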

Thursday Jul 11, 2013

More parses in 12c ?


Some customers upgrading from 11.2.0.3 to 12.1 might get alarmed because they see a spike in the total parse count.
But not to worry, it is not a problem in 12.1. The problem is in 11.2.0.3.

Let's take the following example :

select a.name, b.value 
   from v$statname a, v$mystat b 
  where a.statistic# =  b.statistic# 
    and ( a.name like '%parse count%' 
       or a.name like 'session cursor cache%') 
    and value>0; 

begin
  for n in 1..100 loop
    begin
      execute immediate 'alter session set optimizer_goal=choose';
    exception
      when others then
        null;
    end;
  end loop;
end;
/

 select a.name, b.value 
   from v$statname a, v$mystat b 
  where a.statistic# =  b.statistic# 
    and ( a.name like '%parse count%' 
       or a.name like 'session cursor cache%') 
    and value>0; 

I got the output of the queries and put them side by side for each version.

NAME                            11.1 VALUE   11.2 VALUE   12.1 VALUE
-----------------------------   ----------   ----------   ----------
session cursor cache hits                2            3            1
session cursor cache count               3           10           10
parse count (total)                     12           11           12
parse count (hard)                                    1   

PL/SQL procedure successfully completed.

NAME                            11.1 VALUE   11.2 VALUE   12.1 VALUE
-----------------------------   ----------   ----------   ----------
session cursor cache hits                3            3            1
session cursor cache count               4           11           11
parse count (total)                    114           13          114
parse count (hard)                     100          101          100
parse count (failures)                                           100


Notice that the number of "parse count (total)" in 11.2 is less than "parse count (hard)", which makes no sense, since "parse count (total)" should include hard, soft and failed parses.

This might sound like an extreme case but there are other soft parsing scenarios that also reproduce this oddity.

For customers moving from 11.1 to 11.2.0.3 it might have looked like an improvement that there were fewer parses in total, and then moving from 11.2.0.3 to 12.1 they would probably notice the spike and raise concerns.

So do not be alarmed: 12.1 does not do more parses, nor does 11.2.0.3 do fewer parses; it is just a bug in the instrumentation of the stats, specifically :

Bug 13837105  statistics "parse count (total)" and "session cursor cache hits" miscounted [ID 13837105.8]

Tuesday Jul 02, 2013

Same SELECT used in an INSERT has different execution plan

A customer complained that a query and its INSERT counterpart had different execution plans, and of course, the INSERT was slower.

First let's look at the SELECT :

SELECT ua_tr_rundatetime, 
       ua_ch_treatmentcode, 
       ua_tr_treatmentcode, 
       ua_ch_cellid, 
       ua_tr_cellid 
FROM   (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE, 
                        CH.cellid        AS UA_CH_CELLID 
        FROM    CH, 
                DL 
        WHERE  CH.contactdatetime > SYSDATE - 5 
               AND CH.treatmentcode = DL.treatmentcode) CH_CELLS, 
       (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE, 
                        T.cellid        AS UA_TR_CELLID, 
                        T.rundatetime   AS UA_TR_RUNDATETIME 
        FROM    T, 
                DL 
        WHERE  T.treatmentcode = DL.treatmentcode) TRT_CELLS 
WHERE  CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode; 



The query has 2 DISTINCT subqueries.
The execution plan shows the DISTINCT Placement transformation applied to one of them and not the other.
The view in Step 5 has the prefix VW_DTP, which means DISTINCT Placement.

--------------------------------------------------------------------
| Id  | Operation                    | Name            | Cost (%CPU)
--------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |   272K(100)
|*  1 |  HASH JOIN OUTER             |                 |   272K  (1)
|   2 |   VIEW                       |                 |  4408   (1)
|   3 |    HASH UNIQUE               |                 |  4408   (1)
|*  4 |     HASH JOIN                |                 |  4407   (1)
|   5 |      VIEW                    | VW_DTP_48BAF62C |  1660   (2)
|   6 |       HASH UNIQUE            |                 |  1660   (2)
|   7 |        TABLE ACCESS FULL     | DL              |  1644   (1)
|   8 |      TABLE ACCESS FULL       | T               |  2744   (1)
|   9 |   VIEW                       |                 |   267K  (1)
|  10 |    HASH UNIQUE               |                 |   267K  (1)
|* 11 |     HASH JOIN                |                 |   267K  (1)
|  12 |      PARTITION RANGE ITERATOR|                 |   266K  (1)
|* 13 |       TABLE ACCESS FULL      | CH              |   266K  (1)
|  14 |      TABLE ACCESS FULL       | DL              |  1644   (1)
--------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$AF418D5F / TRT_CELLS@SEL$1
   3 - SEL$AF418D5F
   5 - SEL$F6AECEDE / VW_DTP_48BAF62C@SEL$48BAF62C
   6 - SEL$F6AECEDE
   7 - SEL$F6AECEDE / DL@SEL$3
   8 - SEL$AF418D5F / T@SEL$3
   9 - SEL$2        / CH_CELLS@SEL$1
  10 - SEL$2
  13 - SEL$2        / CH@SEL$2
  14 - SEL$2        / DL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE")
   4 - access("T"."TREATMENTCODE"="ITEM_1")
  11 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE")
  13 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5)


The outline shows PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3"),
indicating that query block SEL$3 is the one that got the transformation.

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$F6AECEDE")
      OUTLINE_LEAF(@"SEL$AF418D5F")
PLACE_DISTINCT(@"SEL$3" "DL"@"SEL$3")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$48BAF62C")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1")
      NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1")
      LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1")
      USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1")
      FULL(@"SEL$2" "CH"@"SEL$2")
      FULL(@"SEL$2" "DL"@"SEL$2")
      LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2")
      USE_HASH(@"SEL$2" "DL"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$2")
      NO_ACCESS(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C")
      FULL(@"SEL$AF418D5F" "T"@"SEL$3")
      LEADING(@"SEL$AF418D5F" "VW_DTP_48BAF62C"@"SEL$48BAF62C" "T"@"SEL$3")
      USE_HASH(@"SEL$AF418D5F" "T"@"SEL$3")
      USE_HASH_AGGREGATION(@"SEL$AF418D5F")
      FULL(@"SEL$F6AECEDE" "DL"@"SEL$3")
      USE_HASH_AGGREGATION(@"SEL$F6AECEDE")
      END_OUTLINE_DATA
  */


The 10053 trace shows a cost comparison with and without the transformation.
This means the transformation belongs to the Cost-Based Query Transformations (CBQT).
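
For reference, a sketch of the classic way to produce such a trace, run from the session that parses the statement :

ALTER SESSION SET events '10053 trace name context forever, level 1';
-- hard parse the statement of interest here, e.g. via EXPLAIN PLAN
ALTER SESSION SET events '10053 trace name context off';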

In SEL$3 the cost of the query block without the transformation is 6659.73
and with the transformation it is 4408.41, so the transformation is kept.

GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#3)
DP: Checking validity of distinct placement for query block SEL$3 (#3)
DP: Using search type: linear
DP: Considering distinct placement on query block SEL$3 (#3)
DP: Starting iteration 1, state space = (5) : (0)
DP: Original query
DP: Costing query block.
DP: Updated best state, Cost = 6659.73
DP: Starting iteration 2, state space = (5) : (1)
DP: Using DP transformation in this iteration.
DP: Transformed query
DP: Costing query block.
DP: Updated best state, Cost = 4408.41
DP: Doing DP on the original QB.
DP: Doing DP on the preserved QB.


In SEL$2 the cost without the transformation is less than with it, so it is not kept.

GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#2)
DP: Checking validity of distinct placement for query block SEL$2 (#2)
DP: Using search type: linear
DP: Considering distinct placement on query block SEL$2 (#2)
DP: Starting iteration 1, state space = (3) : (0)
DP: Original query
DP: Costing query block.
DP: Updated best state, Cost = 267936.93
DP: Starting iteration 2, state space = (3) : (1)
DP: Using DP transformation in this iteration.
DP: Transformed query
DP: Costing query block.
DP: Not update best state, Cost = 267951.66


When an INSERT INTO is added to the same query, the result is a very different execution plan.

INSERT  INTO cc
               (ua_tr_rundatetime,
                ua_ch_treatmentcode,
                ua_tr_treatmentcode,
                ua_ch_cellid,
                ua_tr_cellid)
SELECT ua_tr_rundatetime,
       ua_ch_treatmentcode,
       ua_tr_treatmentcode,
       ua_ch_cellid,
       ua_tr_cellid
FROM   (SELECT DISTINCT CH.treatmentcode AS UA_CH_TREATMENTCODE,
                        CH.cellid        AS UA_CH_CELLID
        FROM    CH,
                DL
        WHERE  CH.contactdatetime > SYSDATE - 5
               AND CH.treatmentcode = DL.treatmentcode) CH_CELLS,
       (SELECT DISTINCT T.treatmentcode AS UA_TR_TREATMENTCODE,
                        T.cellid        AS UA_TR_CELLID,
                        T.rundatetime   AS UA_TR_RUNDATETIME
        FROM    T,
                DL
        WHERE  T.treatmentcode = DL.treatmentcode) TRT_CELLS
WHERE  CH_CELLS.ua_ch_treatmentcode(+) = TRT_CELLS.ua_tr_treatmentcode;

----------------------------------------------------------
| Id  | Operation                     | Name | Cost (%CPU)
----------------------------------------------------------
|   0 | INSERT STATEMENT              |      |   274K(100)
|   1 |  LOAD TABLE CONVENTIONAL      |      |            
|*  2 |   HASH JOIN OUTER             |      |   274K  (1)
|   3 |    VIEW                       |      |  6660   (1)
|   4 |     SORT UNIQUE               |      |  6660   (1)
|*  5 |      HASH JOIN                |      |  6659   (1)
|   6 |       TABLE ACCESS FULL       | DL   |  1644   (1)
|   7 |       TABLE ACCESS FULL       | T    |  2744   (1)
|   8 |    VIEW                       |      |   267K  (1)
|   9 |     SORT UNIQUE               |      |   267K  (1)
|* 10 |      HASH JOIN                |      |   267K  (1)
|  11 |       PARTITION RANGE ITERATOR|      |   266K  (1)
|* 12 |        TABLE ACCESS FULL      | CH   |   266K  (1)
|  13 |       TABLE ACCESS FULL       | DL   |  1644   (1)
----------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   3 - SEL$3 / TRT_CELLS@SEL$1
   4 - SEL$3
   6 - SEL$3 / DL@SEL$3
   7 - SEL$3 / T@SEL$3
   8 - SEL$2 / CH_CELLS@SEL$1
   9 - SEL$2
  12 - SEL$2 / CH@SEL$2
  13 - SEL$2 / DL@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CH_CELLS"."UA_CH_TREATMENTCODE"="TRT_CELLS"."UA_TR_TREATMENTCODE")
   5 - access("T"."TREATMENTCODE"="DL"."TREATMENTCODE")
  10 - access("CH"."TREATMENTCODE"="DL"."TREATMENTCODE")
  12 - filter("CH"."CONTACTDATETIME">SYSDATE@!-5)

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"INS$1")
      FULL(@"INS$1" "CC"@"INS$1")
      NO_ACCESS(@"SEL$1" "TRT_CELLS"@"SEL$1")
      NO_ACCESS(@"SEL$1" "CH_CELLS"@"SEL$1")
      LEADING(@"SEL$1" "TRT_CELLS"@"SEL$1" "CH_CELLS"@"SEL$1")
      USE_HASH(@"SEL$1" "CH_CELLS"@"SEL$1")
      FULL(@"SEL$2" "CH"@"SEL$2")
      FULL(@"SEL$2" "DL"@"SEL$2")
      LEADING(@"SEL$2" "CH"@"SEL$2" "DL"@"SEL$2")
      USE_HASH(@"SEL$2" "DL"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$2")
      FULL(@"SEL$3" "DL"@"SEL$3")
      FULL(@"SEL$3" "T"@"SEL$3")
      LEADING(@"SEL$3" "DL"@"SEL$3" "T"@"SEL$3")
      USE_HASH(@"SEL$3" "T"@"SEL$3")
      USE_HASH_AGGREGATION(@"SEL$3")
      END_OUTLINE_DATA
  */

There is no DISTINCT Placement view and no hint.
The 10053 trace shows a new legend, "DP: Bypassed: Not SELECT",
implying that this transformation is possible only for SELECTs.

GBP/DP: Checking validity of GBP/DP for query block SEL$3 (#4)
DP: Checking validity of distinct placement for query block SEL$3 (#4)
DP: Bypassed: Not SELECT.
GBP/DP: Checking validity of GBP/DP for query block SEL$2 (#3)
DP: Checking validity of distinct placement for query block SEL$2 (#3)
DP: Bypassed: Not SELECT.

In 12.1 (and hopefully in 11.2.0.4 when released) the restriction on applying CBQT to some DMLs and DDLs (like CTAS) is lifted.
This is documented in BugTag Note 10013899.8 "Allow CBQT for some DML / DDL".
And interestingly enough, it is possible to get a one-off patch on 11.2.0.3.

SQL> select DESCRIPTION,OPTIMIZER_FEATURE_ENABLE,IS_DEFAULT  
  2  from v$system_fix_control where BUGNO='10013899';
DESCRIPTION
----------------------------------------------------------------
OPTIMIZER_FEATURE_ENABLE  IS_DEFAULT
------------------------- ----------
enable some transformations for DDL and DML statements
11.2.0.4                           1


Tuesday May 21, 2013

Diagnosis of a High Version Count


In the past I have participated in many situations where I had to debug the mechanisms that generate a large number of child cursors, and I learned that, more often than not, a cursor is not shared for many reasons, and those may change from version to version, so I wrote a tool to help debug it.

It seems pertinent to first explain a little about what child cursors are and what a high version count is.

A cursor is the instance of a query or DML statement in the library cache.
A hash key to locate the cursor in the library cache is calculated based on the text of the statement.
But if the text is the only criterion used to identify the cursor in the library cache, what happens if the cursor is not compatible with the conditions that execute it ?
For example, how can the same cursor be used when the same query is executed by 2 users with different privileges, or with a different language, or even different objects ?

The database creates a version of the cursor with those unique conditions that make it possible to execute.
A version of a cursor is also known as a child cursor. So if there is a child, is there a parent ?
Yes, the parent cursor is the root where all those child cursors are linked.

In order for a session to find the correct version to execute, it attempts to match the query text in the library cache, using the hash key generated from the text, to the parent cursor. Once found, it scans the list of versions (or children) one by one to find the one that has the correct set of conditions, and if it does not find one then it creates a new version and adds it to the list.

For example, let's say I have 3 schemas, A, B and C, a table T in schema A, and a public synonym for T.
Let's say further that A grants access on T to B.

What happens if I run the query : "SELECT * FROM T" from each schema ?

If the query is run by A, "T" points to the schema object "A"."T" and the query runs ok. A version is created that indicates that a session connected with schema "A" can access "A"."T".
If the query is run by B, "T" points to the public synonym "T" and that points to "A"."T", and because A granted access on T to B, the query runs ok. A version is created that indicates that a session connected with schema "B" can access "A"."T".
If the query is run by C, "T" points to the public synonym "T" and that points to "A"."T", but because A has not granted access on T to C, the query fails and no version is created.

If now, "A" grants access on "T" to "C" and runs the query again, then "T" is resolved again to to public synonym "T" and that points to "A"."T" and since it now has privileges the query runs ok and a version is created that indicates that a session connected with schema "C" can access "A"."T".
If now, "A" revokes access on "T" to "B", then the version associated with schema "B" is invalidated, to let any session that has a pointer to that child cursor that it is no longer valid.

If the query is ran by "A" but on a client that has German NLS settings and all the previous ones were done in English then a new version is created for German NLS settings.

I could easily continue the examples, as there are over 50 different reasons why a child needs to be created.
These are tracked in V$SQL_SHARED_CURSOR.
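
A minimal sketch of how to check the reason flags in that view for one statement; the SQL_ID is a placeholder, and each column shows 'Y' when that reason forced a new child :

SELECT child_number, optimizer_mismatch, bind_mismatch,
       language_mismatch, optimizer_mode_mismatch, px_mismatch
  FROM v$sql_shared_cursor
 WHERE sql_id = '8swypbbr0m372';   -- placeholder SQL_ID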

There are reasons related to binds, privileges, languages, parallel execution, optimization, and more,
and reasons are added, removed, or change in meaning with each version.

Even if in theory there could be 50+ combinations of reasons, and each reason may have "n" variables that can make a cursor non-shareable to anyone who does not have those same conditions, the number of versions usually remains low. The norm is that most cursors will have fewer than 100 versions, but there are times when hundreds or thousands of versions are made, and their management causes significant performance degradation in the system.

As each reason is related to a different feature of the database, there is no single way to debug a high version count situation.
For example, in order to diagnose when the reason for not sharing is language, it might be necessary to know what languages are involved.
If the reason is binds, it might be necessary to know how many binds there are, and what their datatypes and sizes are,
and so on.

For a more in depth explanation see : Troubleshooting: High Version Count Issues [ID 296377.1]

After working several of these high version count cases I came to identify some common diagnostics, their presentation and their interpretation, so I made a diagnostic tool to collect them.

Roundtrips between Support and the customer have always been annoying and riddled with mistakes and omissions, so the tool's objective is to collect as much relevant information as possible for exactly the reasons identified.

The tool is VERSION_RPT and can be found in : High SQL Version Counts - Script to determine reason(s) [ID:438755.1]

The tool adapts to the number of reasons in each version, so it works from 9.2 to 11gR2, and I expect it to work on 12c too once it is released.
The tool can be used to diagnose one or many cursors, including pseudocursors, using SQL_ID or HASH_VALUE.

The output of the report shows a summary of the reasons and information about the factors that each reason may take into consideration when deciding whether to share or not, like optimizer parameters when the reason is OPTIMIZER_MISMATCH.

These reasons are searchable in MOS. Searching for the phrase :

"Cursor not shared due to USER_BIND_PEEK_MISMATCH"

can find BugTag Notes like Bug 8981059 - High Version Count (due to USER_BIND_PEEK_MISMATCH) with bind peeking [ID 8981059.8]

To collect data for a group of SQLs, it can be done like this :


SELECT b.*
FROM v$sqlarea a ,
     TABLE(version_rpt(a.sql_id)) b
WHERE loaded_versions >=100;

Here is an example of its use for one cursor:

===============

SQL> set pages 2000 lines 180
SQL> SELECT * FROM TABLE(version_rpt('8swypbbr0m372'));

Version Count Report Version 3.2.1 -- Today's Date 02-may-12 11:50

RDBMS Version :11.2.0.3.0 Host: XXXXXXXXXXXXXXXXXXXX Instance 1 : XXXXXXXXXX
==================================================================
Addr: 00000003D8B6A570  Hash_Value: 3993603298  SQL_ID 8swypbbr0m372
Sharable_Mem: 208913 bytes   Parses: 477
Stmt:
0 select order#,columns,types from access$ where d_obj#=:1
1

Versions Summary
----------------
OPTIMIZER_MISMATCH :10
BIND_MISMATCH :1
LANGUAGE_MISMATCH :1
OPTIMIZER_MODE_MISMATCH :3
PX_MISMATCH :2
Total Versions:10
~
Plan Hash Value Summary
-----------------------
Plan Hash Value Count
=============== =====
      872636971 11
              0 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Details for OPTIMIZER_MISMATCH :

1 versions with is_recur_flags = 167
2 versions with is_recur_flags = 161
9 versions with _pga_max_size = 614400 KB
7 versions with optimizer_mode = choose
3 versions with is_recur_flags = 35
7 versions with hash_area_size = 10485760
1 versions with optimizer_mode = rule
2 versions with advanced_queuing_internal_cursor = 1
2 versions with is_recur_flags = 163
1 versions with parallel_execution_enabled = false
1 versions with _pga_max_size = 204800 KB
10 versions with _optimizer_order_by_elimination_enabled = false
2 versions with is_recur_flags = 39
1 versions with hash_area_size = 2000000
1 versions with sort_area_size = 1000000
8 versions with _db_file_optimizer_read_count = 128
10 versions with _optimizer_cost_based_transformation = on
1 versions with parallel_threads_per_cpu = 4
1 versions with parallel_max_degree = 16
1 versions with _parallel_syspls_obey_force = false
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Details for BIND_MISMATCH :

Consolidated details for :
BIND_MISMATCH,USER_BIND_PEEK_MISMATCH,BIND_UACS_DIFF and
BIND_EQUIV_FAILURE (Mislabled as ROW_LEVEL_SEC_MISMATCH BY bug 6964441 in 11gR1)

from v$sql_bind_capture
COUNT(*) POSITION MIN(MAX_LENGTH) MAX(MAX_LENGTH) DATATYPE (PRECISION,SCALE)
======== ======== =============== =============== ======== ================
      11        1              22              22        2 (,)

SUM(DECODE(column,Y, 1, 0) FROM V$SQL
IS_OBSOLETE IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE
=========== ================= ============= ============
          0                 0             0            9
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Details for LANGUAGE_MISMATCH :

No details available
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Details for OPTIMIZER_MODE_MISMATCH :

1 versions with RULE
9 versions with CHOOSE
2 versions with ALL_ROWS
~~~~~~~~~~~~~~~~~~~~~~~~~

Details for PX_MISMATCH :

No details available

####

To further debug Ask Oracle Support for the appropiate level LLL.

alter session set events 'immediate trace name cursortrace address 3993603298, level LLL';

To turn it off do use address 1, level 2147483648
================================================================

84 rows selected.


Monday Apr 29, 2013

Parse and Parse again.

Since the first release of JDBC, and even longer ago with the Pro* Precompilers, to this day I still see the following issue :

Please notice below, in this abstract of a 30 min AWR, the number of Parse Calls vs Executions, the SQL Module, and in the case of the first SQL, the SQL Text too.

Parse Calls  Executions  % Total Parses  SQL Id         SQL Module        SQL Text
    339,567     339,566           15.93  0mr7azgm9psws  JDBC Thin Client  SELECT COUNT(*) FROM DUAL
    101,574     101,880            4.77  5uta56d6syjv3  JDBC Thin Client  select TRIM(emp_fname || ' ' |...
     46,800      46,799            2.20  6vw7kzhpb1dm3  JDBC Thin Client  select "STATUS_ID" as "status...
     35,509      35,509            1.67  648yjr5s1w7uc  JDBC Thin Client  SELECT "PROTOCOL_GROUP_ID" as...
     31,292      32,540            1.47  0jvyqkxaytndr  bvlab.exe         select crf.CRF_LABEL from COMP...

Since the introduction of the shared pool in Oracle7, clients have had the capability to keep a SQL statement cache in order to follow the famous "Parse once, Execute Many" best practice. This feature is controlled on the RDBMS side by the SESSION_CACHED_CURSORS parameter, which makes many people believe that the feature is managed by the RDBMS alone, but this is far from the truth.

When this feature is not used, the client software parses the statement for each execution, no matter whether the parameter is set in the database.
Many take this feature for granted because SQL*Plus uses it, so they assume it is true for other client software, but it is a feature that has to be explicitly managed by the client software, working in conjunction with the RDBMS when the parameter is set.

In JDBC it is necessary to enable it explicitly, and the driver offers an implicit (setImplicitCachingEnabled) and an explicit (setExplicitCachingEnabled) way to manage the cache.

It is a little easier with the precompilers, by setting at precompile time the HOLD_CURSOR and RELEASE_CURSOR parameters.

Take a moment to look at your own AWR or Statspack and verify your application is using the Client Statement Cache as intended.
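
If you want a quick look directly in the shared pool instead of AWR, here is a hedged sketch; the thresholds are arbitrary assumptions :

SELECT sql_id, parse_calls, executions, module,
       SUBSTR(sql_text, 1, 40) AS sql_text
  FROM v$sqlarea
 WHERE executions > 1000                   -- arbitrary threshold
   AND parse_calls >= 0.9 * executions     -- parsed on almost every execution
 ORDER BY parse_calls DESC;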

Now the cherry on the sundae is the SQL "SELECT COUNT(*) FROM DUAL".

Over the years I've seen variations of queries over DUAL coming from JDBC in large quantities. I have been given the explanation by Customers and Developers that these queries are used as a heartbeat, to confirm the connection to the database is still open.

I can understand that this was necessary when JDBC and connection pooling were young technologies, but isn't there anything better nowadays that will do the same thing without parsing and parsing the same useless statement without end ?

I would like to hear from people about what other techniques/features are used that can avoid the need to parse a bogus SQL against the database.

swing out sister - windmills of your mind

Tuesday Apr 16, 2013

Disabled Fixes ... Hidden gems of the RDBMS

When making changes to the RDBMS, the developers at Oracle always wonder whether the new change will introduce a regression.
Every change is checked against a suite of regression tests to try to trap any regression the change introduces and fix it before it is released to our clients.
Just in case a change does introduce a regression, the developers add parameters and events to disable it. These are the workarounds used when a defect is encountered.
But sometimes the developers are overly cautious and decide that a change is better introduced in a more controlled manner, so they leave it disabled, and only very few people know how to enable it.
Another reason is that some changes are introduced to help one specific customer and there is no evidence to support whether they might help or hurt someone else.

These disabled changes are still tested for regressions anyway.

With the introduction of FIX CONTROL in 10.2.0.2 there is now discoverable information about these disabled fixes.

SQL> desc v$session_fix_control
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SESSION_ID                                         NUMBER
 BUGNO                                              NUMBER
 VALUE                                              NUMBER
 SQL_FEATURE                                        VARCHAR2(64)
 DESCRIPTION                                        VARCHAR2(64)
 OPTIMIZER_FEATURE_ENABLE                           VARCHAR2(25)
 EVENT                                              NUMBER
 IS_DEFAULT                                         NUMBER

SQL> select value,count(*) from v$system_fix_control group by value order by value;
     VALUE   COUNT(*)  -- 11.2.0.3
---------- ----------
         0         47  <-- Disabled
         1        600  <-- Enabled
         3          4  ] 
         7          1  ]__ Configurable
        15          1  ]
        20          1  ]

So in 11.2.0.3 there are 47 fixes that are dormant, waiting for an application to take advantage of them.

Let's see an example :

Bug 6708183 - CBO cannot get good cardinality estimate for TABLE() functions [ID 6708183.8]
Introduced in 11.1.0.7 and 11.2.0.1.

This fix allows for a limited version of dynamic sampling to look inside a Collection Table in an effort to get a better cardinality estimation.
By default the CBO guesses that the collection has as many 100-byte rows as can fit in 100 blocks, and that usually translates to 8168 rows.
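
To see that default guess in action, here is a small sketch with a hypothetical collection type and function; with an 8k block size the COLLECTION ITERATOR step typically shows 8168 rows :

CREATE OR REPLACE TYPE num_tab AS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION three_rows RETURN num_tab IS
BEGIN
  RETURN num_tab(1, 2, 3);   -- only 3 rows, but the CBO does not know that
END;
/
EXPLAIN PLAN FOR SELECT * FROM TABLE(three_rows());
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);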

There are exceptions when this fix is enabled where the cardinality is still not better estimated; they are corrected in later fixes or are still to be corrected, but all of them depend on fix 6708183 being enabled first.

The fix is enabled by "_fix_control" = '6708183:ON' or by adding a DYNAMIC_SAMPLING hint in the query.
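
For example, a sketch of both ways of enabling it, reusing the hypothetical three_rows function from the sketch above :

-- session-wide, via fix control
ALTER SESSION SET "_fix_control" = '6708183:ON';
-- or per query, via a dynamic sampling hint
SELECT /*+ dynamic_sampling(t 2) */ * FROM TABLE(three_rows()) t;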

If you wish to know more about one particular disabled (or enabled) fix, try its bugtag first.
If it does not have a bugtag, let me know and I'll see if one can be made.

Friday Apr 12, 2013

The Best Kept Secret in Support That is Not Supposed to be a Secret

BugTags 

What are Bugtags ?

The bug database keeps dialogue and fix information for suspected product defects for various Oracle products. 
Some bugs are true defects and get fixed in some code line, sometimes with detailed information about the defect and sometimes with very little comment.
Sometimes they are enhancements to features, or new features altogether, also with or without details.

A bugtag is a separate entity which can be created for any fixed defect or enhancement and includes summary information and categorizations of the defect or enhancement.
A bugtag is an optional extension to a fixed defect or added enhancement.

See : Introduction to "Bug Description" Articles [ID 245840.1] for more details.

What products do BugTags cover ?

Currently bugtags are created only for server products, and mostly only for the RDBMS.

What information is in a BugTag ?

A bugtag includes attributes such as:
  • Bug number that the bugtag describes
  • A summary title (intended to be a customer friendly bug title)
  • A short description of the bug issue (a few paragraphs at most)
  • A range of versions believed to be affected by the bug
  • A list of specific versions known to be affected by the bug
  • A list of specific versions where the bug is fixed (eg: Patch set versions / Patch set update versions etc. where the fix is included)
  • Some categorization of the bug (to give an indication of product area / feature affected, type/s of symptom seen etc..)
  • Hooks to indicate any key errors, parameters, etc. which may be relevant
  • Optional link to a more specific document about the bug
  • Some indication of if an issue is particularly notable or not
  • For port specific issues which platforms are affected
  • A simple testcase, if available, to reproduce the issue as an example of what to look for.

Where Can I find the BugTags ?

Bugtag data is propagated to customers via MOS Notes in several ways:

  • Each bugtag is represented by a single MOS Note (document id = bugno.8) 
Eg (Defect 16475397.8  / Enh 10248538.8)
  • Certain reference notes include automatically maintained lists of bugs derived from bugtags inlined into them. 
Eg: ORA-600/7445 reference notes include a "Known Bugs" section which is automatically populated using the bugtag data for bugtags marked with a HOOK for that ORA-600, etc..
ORA-600 [kclchkblkdma_3] [ID 468202.1]
  • Certain notes have sections created from reports against the bugtag data.
Eg: Init.ora Parameter "CURSOR_SHARING" Reference Note [ID 94036.1]
Eg: 11.2.0.3 Patch Set Updates - List of Fixes in each PSU [ID 1449750.1]

Who writes the BugTags ?

Bugtags are created and maintained internally by Oracle Support.
Among them, me (Abel Macias). I especially like writing the ones for CBO and performance related bugs, and for fixes included in the Exadata Bundle Patches.

What is the lifecycle of a BugTag ?

When a defect is fixed, it becomes eligible for a bugtag.
Support Analysts will pick up the task of making the bugtag note and posting it in MOS.
In the following events the note becomes eligible for an update (some updates are automatic, some are manual):
  • Someone adds a comment to the Note.
  • A new base version includes the fix in its code.
  Let's assume the fix for bug 1 is scheduled to be included in a future version like 12.1.0.1. While the future version is not officially released, the note will indicate 12g (future release).
  • When 12.1.0.1 is released and the fix is confirmed to be included in it, the note is updated to indicate it is fixed in 12.1.0.1.
  • A regression was introduced by this fix or has been superseded with another fix.
  • A second defect filed is closed as a duplicate of the original defect and has new symptoms not observed before.
  For example, a defect may address one particular ORA-600, but later a different ORA-600 is identified that is fixed by the same fix.
  Eg: Bug 10408668 - Assorted Dumps with Queries with distinct aggregates having subqueries as arguments [ID 10408668.8]

Thursday Apr 11, 2013

How to identify "think time" in a SQL trace ?

There are multiple blog posts, presentations and documentation about "think time" and "service time", and some tools that will translate that for you,
but if you do not have those tools available, or you need to understand a pattern of behavior from the client app, how can you identify "think time" in a SQL trace ?

This case is about a customer who reported a performance problem in its JDBC connections running reports that were constantly timing out, and who needed to identify where in the stack the problem was.

From the database point of view we got a 10046 level 12 trace (SQL Trace with Wait Events and Bind Variables).
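
For reference, a sketch of one common way to turn that trace on in the session of interest :

ALTER SESSION SET events '10046 trace name context forever, level 12';
-- reproduce the workload, then :
ALTER SESSION SET events '10046 trace name context off';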


=====================
PARSING IN CURSOR #4... len=50 dep=0 uid=888 oct=3 lid=888 tim=1365524355043575 
SELECT STATUS FROM <table> WHERE ID = :1
END OF STMT
PARSE #4...:c=1000,e=1126,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1365524355043574
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=2999,e=3049,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=3860417426,tim=1365524355046681
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355046730
WAIT #4...: nam='SQL*Net message from client' ela= 274 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047023
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047055
WAIT #4...: nam='SQL*Net message from client' ela= 259 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047330
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524355047385
FETCH #4...:c=0,e=41,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524355047402

*** 2013-04-09 12:19:17.047
WAIT #4...: nam='SQL*Net message from client' ela= 2000161 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047634
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047707
WAIT #4...: nam='SQL*Net message from client' ela= 258 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357047973
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524357048051
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524357048086
FETCH #4...:c=0,e=26,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524357048098

*** 2013-04-09 12:19:19.054
WAIT #4...: nam='SQL*Net message from client' ela= 2006775 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359054902
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055015
WAIT #4...: nam='SQL*Net message from client' ela= 294 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055324
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
EXEC #4...:c=0,e=74,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3860417426,tim=1365524359055435
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524359055482
FETCH #4...:c=0,e=40,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524359055503

*** 2013-04-09 12:19:21.055
WAIT #4...: nam='SQL*Net message from client' ela= 2000065 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055603
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055688
WAIT #4...: nam='SQL*Net message from client' ela= 267 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524361055967
BINDS #4...:
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=01 fl2=1000000 frm=01 csi=873 siz=24 off=0
  kxsbbbfp=2ab1d8538888  bln=22  avl=06  flg=05
value=123456789
.....
.....
Pattern repeats 30 times.
.....
.....
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415383807
FETCH #4...:c=0,e=45,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=1,plh=3860417426,tim=1365524415383827
WAIT #4...: nam='SQL*Net message from client' ela= 282 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384145
WAIT #4...: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384170
WAIT #4...: nam='SQL*Net message from client' ela= 317 driver id=1952673792 #bytes=1 p3=0 obj#=-1 tim=1365524415384497


The trace shows a query that is looking for the STATUS of a specific ID over and over again.
In between each roundtrip there is a delay of 2 seconds (notice the ela= 2000000+ in 'SQL*Net message from client').

I asked the customer if he considered it normal for the session to execute the same query with the same binds over and over again.
The answer was no. And there was no explanation either for the 2 second delay between calls.

From the point of view of the database, the query was parsed, bound, executed and its rows fetched in microseconds.
After that, the Oracle process sat idle waiting for the client software to issue the next statement, and that came after 2 seconds.

That is the application "think time" and it is reflected by the 'SQL*Net message from client' in the SQL trace.

After some digging, the customer recognized that this was a session that came from a JDBC program that was
in a loop keeping tabs on the execution of a report from another session, and it was doing it by looking
at the STATUS of the report over and over after sleeping for 2 seconds.

The objective of my post is to show what "think time" looks like, but just to complete the root cause :

Other traces showed the reports were finishing in a timely manner, but these "tracking" sessions were on the same connection pool as the reports.
The connection pooling was not recognizing that these "tracking" sessions were "idle" for those 2 seconds, so each connection was considered active and not yielded to other incoming reports; effectively, half the slots were used by useful work and half by tracking sessions.

To make things worse, for some reason the "tracking" sessions were not going away after the reports finished, and new ones were being created as new reports came in, so eventually the connection pool got exhausted and no work was being done.



About

I am Abel Macias, an Oracle support engineer specialized in Performance who belongs to Exadata Support.
Disclaimer: This blog aims to broadcast my experiences while diagnosing performance problems on Oracle's RDBMS, mostly on Exadata.
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.
