10g optimizer case study: Runtime Execution issues with View merging

Preface

After the 10g upgrade for our client’s 11i Application system, we were a little gung-ho, but the self-congratulations were short-lived as we discovered that a bulk load of custom concurrent programs were taking longer.

Aswath Rao, headshot aswath brighter who had solved many performance issues before using the HOTSOS tool, handled it smartly. In this article, we talk about how he approached the problem, and what else could have been looked into.

Essentially, after some SQL plan analysis, he found that the merging of some views was causing a poorly executing plan (which was probably looking good to the 10g optimizer – the stats were recent) to be selected and finally got around it by using a couple time tested and clever hints for avoiding merging. He had to try and test quite a bit, but avoiding the merging worked.

First thoughts..

When I first heard about it, my first thoughts went to an initialization variable called optimizer_secure_view_merging and when I searched on Google, I came across http://oracledb.wordpress.com/2007/04/10/execution-plans-differents-with-different-users/, which talks about something similar. Now, interestingly, if you run the bde_chk_cbo.sql script (obtainable from Metalink), it says that this parameter should be set to FALSE. The note 216205.1 – Mandatory parameters for 11i Apps corroborates the same. It was set to its expected value, in which Oracle does not use view merging or predicate move-around. Still, the view merging was definitely happening.

Where’s the bottleneck?

So there was this custom concurrent program called XXMRPSTKTRANS and its runtime parameters had suddenly changed (we had the history of previous concurrent programs), an sql  of the format INSERT INTO XXCI.XXMRP_STOCK_TRANSFER_RPT_ITEMS <followed by a select query> was taking approximately 97 % of the total concurrent request’s runtime.  The query used 967 million Logical IOs.

Lets look at the Hotsos profiling output:

Hotsos output1

skew graph

Lets look deeper..

From the sql traces, it could be seen that this query was dragging us down:

SELECT PLANS.COMPILE_DESIGNATOR,
ITEMS.ORGANIZATION_ID,
ITEMS.ORGANIZATION_CODE,
ITEMS.PLANNER_CODE,
ITEMS.INVENTORY_ITEM_ID,
ITEMS.DESCRIPTION, ITEMS.
FULL_LEAD_TIME, MSS.SAFETY_STOCK_QUANTITY, NULL
FROM MRP_PLANS_SC_V PLANS,
MRP_SYSTEM_ITEMS ITEMS,
MTL_CATEGORY_SETS CATSETS,
MTL_CATEGORIES CAT,
MTL_ITEM_CATEGORIES ITEMC,
(SELECT INVENTORY_ITEM_ID, ORGANIZATION_ID,
SAFETY_STOCK_QUANTITY,EFFECTIVITY_DATE
FROM MTL_SAFETY_STOCKS MSS1
WHERE EFFECTIVITY_DATE = (SELECT MAX(EFFECTIVITY_DATE)
FROM MTL_SAFETY_STOCKS
WHERE INVENTORY_ITEM_ID = MSS1.INVENTORY_ITEM_ID
AND ORGANIZATION_ID = MSS1.ORGANIZATION_ID
GROUP BY INVENTORY_ITEM_ID, ORGANIZATION_ID ) ) MSS
WHERE PLANS.COMPILE_DESIGNATOR = ITEMS.COMPILE_DESIGNATOR
AND PLANS.PLANNED_ORGANIZATION = ITEMS.ORGANIZATION_ID
AND CATSETS.CATEGORY_SET_ID = ITEMC.CATEGORY_SET_ID
AND CATSETS.STRUCTURE_ID = CAT.STRUCTURE_ID
AND CAT.CATEGORY_ID = ITEMC.CATEGORY_ID
AND ITEMC.INVENTORY_ITEM_ID = ITEMS.INVENTORY_ITEM_ID
AND ITEMC.ORGANIZATION_ID = ITEMS.ORGANIZATION_ID
AND ITEMS.INVENTORY_ITEM_ID = MSS.INVENTORY_ITEM_ID(+)
AND ITEMS.ORGANIZATION_ID = MSS.ORGANIZATION_ID(+)
AND CATSETS.CATEGORY_SET_NAME = :B8
AND ( (:B7 IS NOT NULL AND ITEMS.PLANNER_CODE = :B7 ) OR (:B7 IS NULL) )
AND PLANS.COMPILE_DESIGNATOR = :B6
AND ((:B5 IS NOT NULL AND LENGTH(:B5 ) < 4 AND ITEMS.ORGANIZATION_CODE = :B5 )
OR (:B5 IS NOT NULL AND LENGTH(:B5 ) > 3 AND ITEMS.ORGANIZATION_CODE IN
(SELECT LOOKUP_CODE FROM
FND_LOOKUP_VALUES WHERE
LOOKUP_TYPE = :B5 AND
LANGUAGE = 'US' ) )
OR (:B5 IS NULL ) )
AND ((:B4 IS NOT NULL AND CAT.SEGMENT1 = :B4 )OR (:B4 IS NULL))
AND ((:B3 IS NOT NULL AND CAT.SEGMENT2 = :B3 )OR (:B3 IS NULL))
AND ((:B2 IS NOT NULL AND CAT.SEGMENT3 = :B2 )OR (:B2 IS NULL))
AND ((:B1 IS NOT NULL AND CAT.SEGMENT4 = :B1 ) OR (:B1 IS NULL));

Playing with plans – Worst, Better, Best..

Having determined the sql on which the future tuning efforts must be focused on, lets dive into the different sql plans observed (with and without hints) and the accompanying logical I/O stats.



  • WITHOUT HINTS - 35 MILLION CONSISTENT GETS (Not good)



Lets talk about the out-of-the-box plan (no hints) first. Obviously, there is a lot of detail here and is difficult to be absorbed at a single glance, but the key thing to note here is that the cost is actually very low (103), so on paper this looks like a great plan. Sure, there are a couple Full table scans, but those tables are really small. The caveat with this plan is that its performance is really bad in real time due to high index scans, which further contribute to excessive CPU delays.

Default plan



  • ORDERED HINT - 15 MILLION CONSISTENT GETS



It got a little better with the ORDERED hint, although the plan cost went up a bit:

ordered hint plan



  • NO_MERGE(CATSETS) NO_MERGE(CAT) NO_MERGE (PLANS) - 26952 CONSISTENT GETS



Better than the ORDERED hint was the NO_MERGE hint for each of the views, which brought down the consistent gets to ~27,000, all the way from 35 million! Whoa, that was a drastic change. Note that the cost of this plan is 380, as compared to 1634 of the ORDERED plan’s (cost). Interesting observation.

no merge hint plan

Moral of the story?

Well, the NO_MERGE hint definitely made things better, when used on all the views or inline views in the query. The 10g optimizer was definitely having a challenge around this part. Also, while the overall compile time plan was showing lesser cost, the runtime performance was very skewed. 

I am also forced to think that if we had played with optimizer_index_cost_adj parameter a bit at the session level, maybe we could have made some indexes look more expensive and hence pushed them down the priority list for the optimizer. But since this idea did not get tested, its just a thought at this point (it wouldn’t have been a sustainable solution anyway, but I’ve seen it being effective at session level before and can be tried in unwieldy situations).

In retrospect, note that the MRP_PLANS_SC_V internally refers to three more views! This is probably what makes this sql plan a little more special.

SQL> select REFERENCED_NAME,REFERENCED_TYPE,REFERENCED_LINK_NAME,DEPENDENCY_TYPE from dba_dependencies where name like 'MRP_PLANS_SC_V' and REFERENCED_TYPE like 'VIEW';

REFERENCED_NAME           REFERENCED_TYPE           REFERENCED_LINK_NAME      DEPENDENCY_TYPE

------------------------- ------------------------- ------------------------- -------------------------


MFG_LOOKUPS               VIEW                                                HARD


MRP_DESIGNATORS_VIEW      VIEW                                                HARD


MRP_ORGANIZATIONS_V       VIEW                                                HARD

A digressive (?) discussion..

As this wasn’t the only sql tuning issue the customer encountered, this might as well be pointing to an innate issue with the 10g optimizer workload stats that are stored in the sys.aux_stats$ table. Although, the customer had gathered a representative workload sample of  system stats using DBMS_STATS.GATHER_SYSTEM_STATS(‘START’) and DBMS_STATS.GATHER_SYSTEM_STATS(‘STOP’), the 10g optimizer does not seem to be churning out realistic execution-wise sql plans.

An interesting face is that the customer’s production machine has 32 CPUs, 192G of RAM and a really fast I/O sub-system, so maybe the workload parameters estimated through the APIs were overly optimistic? Maybe during the sampling, the optimizer API said.. Wow, This is cool. I can really do a lot. It sounds like a possible cause, but one would argue that the oracle API would be smart enough to account for it. Interestingly, the values for CPUSPEEDNW (the parameter used without workload simulation) and CPUSPEED are not very far from each other.

Here is how the stats look right now:

SQL> select sname, pname, pval1 from sys.aux_stats$;

SNAME         PNAME      PVAL1

------------- ---------- ----------


SYSSTATS_INFO STATUS


SYSSTATS_INFO DSTART


SYSSTATS_INFO DSTOP


SYSSTATS_INFO FLAGS      0


SYSSTATS_MAIN CPUSPEEDNW 983.218164


SYSSTATS_MAIN IOSEEKTIM  10


SYSSTATS_MAIN IOTFRSPEED 4096


SYSSTATS_MAIN SREADTIM   3.866


SYSSTATS_MAIN MREADTIM   5.203


SYSSTATS_MAIN CPUSPEED   972


SYSSTATS_MAIN MBRC       7


SYSSTATS_MAIN MAXTHR     79357952


SYSSTATS_MAIN SLAVETHR   3072

13 rows selected.

One should be aware that these parameters can be set manually using DBMS_STATS.SET_SYSTEM_STATS API. While you do find some general information on the web, unfortunately, not a lot of specific case study material is available on how manually changing each parameter affects the optimizer behavior. I think this is also where most of us fail to understand the true potential of 10g optimizer, both before and after the upgrade.

I sincerely wish that someone authoritative from the product or server technologies support team comes up with informative material on them.

Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

bocadmin_ww

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today