Thursday Feb 10, 2011

New "How do I ..." series

Over the last year or so the Optimizer development team has presented at a number of conferences and we got a lot of questions that start with "How do I ...". Where people were looking for a specific command or set of steps to fix a problem they had encountered. So we thought it would be a good idea to create a series of small posts that deal with these "How do I" question directly. We will use a simple example each time, that shows exactly what commands and procedures should be used to address a given problem. If you have an interesting "How do I .." question you would like to see us answer on the blog please email me and we will do our best to answer them! Watch out for the first post in this series which addresses the problem of "How do I deal with a third party application that has embedded hints that result in a sub-optimal execution plan in my environment?"

Friday Jan 28, 2011

Concurrent Statistics Gathering

Gathering optimizer statistics is one of life's necessary evils even if it can take an extremely long time to complete. In this blog post, we discuss one remedy to improve the efficiency of statistics gathering.[Read More]

Monday Jan 03, 2011

Optimizer Transformation: Join Predicate Pushdown

Happy New Year to all of our readers! We hope you all had a great holiday season. We start the new year by continuing our series on Optimizer transformations. This time it is the turn of Predicate Pushdown. I would like to thank Rafi Ahmed for the content of this blog.

Normally,a view cannot be joined with an index-based nested loop (i.e., index access) join, since a view, in contrast with a base table, does not have an index defined on it. A view can only be joined with other tables using three methods: hash, nested loop, and sort-merge joins.[Read More]

Thursday Nov 04, 2010

Optimizer Transformations: Star Transformation

Star transformation was introduced in Oracle 8i to process star queries efficiently. These queries are commonly used in data warehouse applications that follow the Star Schema data model. The Star Schema is so called because the data model diagram resembles a star. The center of the star consists of one or more fact tables and the points of the star are the dimension tables.

star_model.PNG
The basic idea of this transformation is to steer clear of using a full table scan access method on large tables, referred to as fact tables in the Star Schema. In a typical star query, the fact table is joined to several much smaller dimension tables. The fact table typically contains one key (referred to as foreign key) for every dimension table as well as a number of measure columns such as sales amount. The corresponding key in the dimension table is referred to as the primary key. The join is based on a foreign key of the fact table with the corresponding primary key of the dimension table. The query also contains filter predicates on other columns of the dimension tables that typically are very restrictive. The combination of these filters help to dramatically reduce the data set processed from the fact table.  The goal of star transformation is to access only this reduced set of data from the fact table.


Consider the following star query Q1. The query is to find the total sales amount in all cities in California for quarters Q1 and Q2 of year 1999 through the Internet.

Q1:

SELECT c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount

FROM sales s, times t, customers c, channels ch

WHERE s.time_id = t.time_id

AND s.cust_id = c.cust_id

AND s.channel_id = ch.channel_id

AND c.cust_state_province = 'CA'

AND ch.channel_desc = 'Internet'

  AND t.calendar_quarter_desc IN ('1999-01','1999-02')

GROUP BY c.cust_city, t.calendar_quarter_desc;

Sales is the fact table while the other tables are considered as dimension tables. The Sales table contains one row for every sale of a product and thus it may contain billions of sales records. However only a few of them are sold to customers in California through the Internet for the specified quarters. The query is transformed into Q2.

Q2:

SELECT c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount

FROM sales s, times t, customers c

WHERE s.time_id = t.time_id

  AND s.cust_id = c.cust_id

  AND c.cust_state_province = 'CA'

  AND t.calendar_quarter_desc IN ('1999-01','1999-02')

  AND s.time_id IN (SELECT time_id
                            FROM times 
                            WHERE calendar_quarter_desc IN('1999-01','1999-02'))

  AND s.cust_id IN (SELECT cust_id
                           FROM customers
                           WHERE cust_state_province='CA')

  AND s.channel_id IN (SELECT channel_id  
                                 FROM channels 
                                 WHERE channel_desc = 'Internet')

GROUP BY c.cust_city, t.calendar_quarter_desc;


Star transformation is essentially about adding subquery predicates corresponding to the constraint dimensions. These subquery predicates are referred to as bitmap semi-join predicates. The transformation is performed when there are indexes on the fact join columns (s.timeid, s.custid...). By driving bitmap AND and OR operations (bitmaps can be from bitmap indexes or generated from regular B-Tree indexes) of the key values supplied by the subqueries, only the relevant rows from the fact table need to be retrieved.  If the filters on the dimension tables filter out a lot of data, this can be much more efficient than a full table scan on the fact table.  After the relevant rows have been retrieved from the fact table, they may need to be joined back to the dimension tables, using the original predicates. In some cases, the join back can be eliminated. We will discuss this situation later. 

Table 1 shows the query plan for the transformed query. Note that the sales table has a bitmap access path instead of a full table scan. For each key value coming from the subqueries (lines 11, 16, 21), the bitmaps are retrieved from the fact table indexes (lines 12, 17, 22).  Each bit in the bitmap corresponds to a row in fact table. The bit is set if the key value from the subquery is same as the value in the row of fact table.  For example, the bitmap [1][0][1][0][0][0]...(all 0s for remaining rows) indicate that rows 1 and 3 of fact table has matching key value from subquery.  Lets say the above bitmap is for a key value from customers table subquery.

The operations in lines 9, 14, 19 iterates over the keys from the subqueries and get the corresponding bitmaps. Lets say the customers subquery produces one more key value with the  bitmap [0][1][0][0][0][0]...

The bitmaps for each subquery are merged (ORed) (lines 8, 13 and 18). In the above example, it will produce a single bitmap [1][1][1][0][0][0]... for customers subquery after merging the two bitmaps.

The merged bitmaps are ANDed (line 7). Lets say the bitmap from channels is [1][0][0][0][0][0]...  If you AND this bitmap with the bitmap from customers subquery it will produce [1][0][0][0][0]...

The corresponding rowids of the final bitmap are generated (line 6).  The fact table rows are retrieved using the rowids (line 5).  In the above example, it will generate only 1 rowid corresponding to the first row and fetches only a single row instead of scanning the entire fact table.

The representation of bitmaps in the above example is for illustration purpose only. In oracle, they are represented and stored in a compressed form.


Table 1: The plan of the transformed query

Id

Operation

Name

0

SELECT STATEMENT

1

HASH GROUP BY

2

HASH JOIN

3

HASH JOIN

4

PARTITION RANGE SUBQUERY

5

TABLE ACCESS BY LOCAL INDEX ROWID

SALES

6

BITMAP CONVERSION TO ROWIDS

7

BITMAP AND

8

BITMAP MERGE

9

BITMAP KEY ITERATION

10

BUFFER SORT

11

TABLE ACCESS FULL

CHANNELS

12

BITMAP INDEX RANGE SCAN

SALES_CHANNEL_BIX

13

BITMAP MERGE

14

BITMAP KEY ITERATION

15

BUFFER SORT

16

TABLE ACCESS FULL

TIMES

17

BITMAP INDEX RANGE SCAN

SALES_TIME_BIX

18

BITMAP MERGE

19

BITMAP KEY ITERATION

20

BUFFER SORT

21

TABLE ACCESS FULL

CUSTOMERS

22

BITMAP INDEX RANGE SCAN

SALES_CUST_BIX

23

TABLE ACCESS FULL

CUSTOMERS

24

TABLE ACCESS FULL

TIMES


Join back elimination

The subqueries and their bitmap tree only filter the fact table based on the dimension filters, so it may still be necessary to join to the dimension table.  The join back of the dimension table is eliminated when all the predicates on dimension tables are part of the semijoin subquery predicate, the column(s) selected from the subquery are unique and the dimension columns are not in select list, group by etc. In the above example, the table channels is not joined back to the sales table since it is not referenced outside and channel_id is unique.

Temporary table transformation

If the join back is not eliminated, Oracle stores the results of the subquery in a temporary table to avoid re-scanning the dimension table (for bitmap key generation and join back). In addition to this, the results are materialized if the query is run in parallel, so that each slave can select the results from the temporary tables instead of executing the subquery again.

For example, if Oracle materializes the results of the subquery on customers into a temporary table, the transformed query Q3 will be as follows.

Q3:
SELECT t1.c1 cust_city, t.calendar_quarter_desc calendar_quarter_desc, 
              sum(s.amount_sold) sales_amount

FROM sales s, sh.times t, sys_temp_0fd9d6621_e7e24 t1

WHERE s.time_id=t.time_id

   AND s.cust_id=t1.c0

   AND (t.calendar_quarter_desc='1999-q1' OR t.calendar_quarter_desc='1999-q2')

   AND s.cust_id IN (SELECT  t1.c0 FROM sys_temp_0fd9d6621_e7e24 t1)

   AND s.channel_id IN (SELECT  ch.channel_id 
                                     FROM channels ch
                                     WHERE ch.channel_desc='internet')

  AND s.time_id IN (SELECT t.time_id
                               FROM times t
                               WHERE t.calendar_quarter_desc='1999-q1'
                                   OR t.calendar_quarter_desc='1999-q2')

GROUP BY t1.c1,  t.calendar_quarter_desc


Note that customers is replaced by the temporary table sys_temp_0fd9d6621_e7e24 and references to columns cust_id and cust_city are replaced by the corresponding columns of the temporary table. The temporary table will be created with 2 columns -  (c0 number, c1 varchar2(30)). These columns corresponds to cust_id and cust_city of customers table. The table will be populated using the following query Q4 at the beginning of the execution of the statement Q3.

Q4:
SELECT c.cust_id, c.cust_city FROM customers WHERE c.cust_state_province = 'CA'

Table 2 shows the plan for the transformed query.

Table 2: Plan with temporary table transformation


0

SELECT STATEMENT



1

TEMP TABLE TRANSFORMATION



2

LOAD AS SELECT

sys_temp_0fd9d6621_e7e24


3

TABLE ACCESS FULL

CUSTOMERS



4

HASH GROUP BY



5

HASH JOIN



6

HASH JOIN



7

PARTITION RANGE SUBQUERY



8

TABLE ACCESS BY LOCAL INDEX ROWID

SALES



9

BITMAP CONVERSION TO ROWIDS



10

BITMAP AND



11

BITMAP MERGE



12

BITMAP KEY ITERATION



13

BUFFER SORT



14

TABLE ACCESS FULL

CHANNELS



15

BITMAP INDEX RANGE SCAN

SALESCHANNELBIX



16

BITMAP MERGE



17

BITMAP KEY ITERATION



18

BUFFER SORT



19

TABLE ACCESS FULL

TIMES



20

BITMAP INDEX RANGE SCAN

SALESTIMEBIX



21

BITMAP MERGE



22

BITMAP KEY ITERATION



23

BUFFER SORT



24

TABLE ACCESS FULL

sys_temp_0fd9d6621_e7e24



25

BITMAP INDEX RANGE SCAN

SALESCUSTBIX



26

TABLE ACCESS FULL

sys_temp_0fd9d6621_e7e24



27

TABLE ACCESS FULL

TIMES


The lines 1,2 and 3 of the plan materialize the customers subquery into the temporary table. In line 24,  it scans the temporary table (instead of the subquery) to build the bitmap from the fact table. Line 26 is for scanning the temporary table for joining back instead of scanning customers. Note that the filter on customers is not needed to be applied on the temporary table since the filter is already applied while materializing the temporary table.


Enabling the transformation

Star transformation is controlled by the star_transformation_enabled parameter.  The parameter takes 3 values.

  • TRUE - The Oracle optimizer performs transformation by identifying fact and constraint dimension tables automatically. This is done in a cost-based manner, i.e. the transformation is performed only if the cost of the transformed plan is lower than the non-transformed plan. Also the optimizer will attempt temporary table transformation automatically whenever materialization improves performance.

  • FALSE - The transformation is not tried.

  • TEMP_DISABLE - This value has similar behavior as TRUE except that temporary table transformation is not tried.

The default value of the parameter is FALSE. You have to change the parameter value and create indexes on the joining columns of the fact table to take advantage of this transformation.


Summary
Star transformation improves the performance of queries with a very big fact table joined to multiple dimension tables when the dimension tables have very selective predicates. The transformation avoids the full scan of  the fact table. It fetches only relevant rows from the fact table that will eventually join to the constraint dimension rows. The transformation is performed based on cost - only when the cost of the transformed plan is lower than that of the non-transformed plan.  If the dimension filters do not significantly reduce the amount of data to be retrieved from the fact table, then a full table scan is more efficient.

In this post we have tried to illustrate the basic ideas behind star transformation by showing  simple example queries and plans. Oracle can do star transformation in more complex cases. For example, a query with multiple fact tables, snowflakes (dimension is a join of several normalized tables instead of denormalized single table), etc.

[Read More]

Friday Oct 29, 2010

Optimizer Transformations: View Merging part 2

This is the second of two posts on view merging. See the first post for the basics of view merging, how it works for simple select-project-join views and views appearing in outer joins, and why one of these views might not be merged. In this post we'll cover complex view merging.

Complex View Merging

We use the term "complex view merging" to describe merging of group by and distinct views. Like simple view merging, this allows the optimizer to consider additional join orders and access paths. In addition, the evaluation of the group-by/distinct operation can be delayed until after the joins have been evaluated. Delayed evaluation of group-by can make performance better or worse depending on the characteristics of the data. Delaying a group-by until after joins can result in a reduction in the data set on which the group-by operation is to be performed, if joins are filtering; on the other hand, early group-by can reduce the amount of data to be processed by subsequent joins or the joins could explode the amount of data to undergo group-by. The same is true for distinct operations. Because it is not always better to merge such a view, we choose whether to use this transformation in a cost-based manner. The two options - with and without view merging - are each costed by the optimizer, and we choose to merge the view only if it is cheaper to do so.

Consider the following group by view and query which refers to it:
create view cust_prod_totals_v as
select sum(s.quantity_sold) total, s.cust_id, s.prod_id
from sales s
group by s.cust_id, s.prod_id;

select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, cust_prod_totals_v
where c.country_id = 'US'
and c.cust_id = cust_prod_totals_v.cust_id
and cust_prod_totals_v.total > 100
and cust_prod_totals_v.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';
This query finds all of the customers from the US who have bought at least 100 of a particular item. The view is eligible for complex view merging. After merging, the query looks like this:
select c.cust_id, cust_first_name, cust_last_name, cust_email
from customers c, products p, sales s
where c.country_id = 'US'
and c.cust_id = s.cust_id
and s.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
group by s.cust_id, s.prod_id, p.rowid, c.rowid,
c.cust_email, c.cust_last_name, c.cust_first_name, c.cust_id
having sum(s.quantity_sold) > 100;
The transformed query is cheaper than the untransformed query, so the optimizer chooses to merge the view. Why is the transformed query cheaper? In the untransformed query, the group by operator applies to the entire sales table in the view. In the transformed query, the joins to products and customers (especially products) filter out a large portion of the rows from the sales table, so the group by operation is much cheaper. The join is more expensive because the sales table has not been reduced, but it is not that much more expensive, since the group-by operation does not reduce the data size that much in the original query. If any of these characteristics were to change, it may no longer be cheaper to merge the view. Hence the need for a cost-based decision. The final plan is as follows:
--------------------------------------------------------
| Id  | Operation             | Name      | Cost (%CPU)|
--------------------------------------------------------
|   0 | SELECT STATEMENT      |           |  2101  (18)|
|*  1 |  FILTER               |           |            |
|   2 |   HASH GROUP BY       |           |  2101  (18)|
|*  3 |    HASH JOIN          |           |  2099  (18)|
|*  4 |     HASH JOIN         |           |  1801  (19)|
|*  5 |      TABLE ACCESS FULL| PRODUCTS  |    96   (5)|
|   6 |      TABLE ACCESS FULL| SALES     |  1620  (15)|
|*  7 |     TABLE ACCESS FULL | CUSTOMERS |   296  (11)|
--------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("QUANTITY_SOLD")>100)
3 - access("C"."CUST_ID"="CUST_ID")
4 - access("PROD_ID"="P"."PROD_ID")
5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
7 - filter("C"."COUNTRY_ID"='US')
There is no view in the plan above, which is what one would expect after the view has been merged. However, there are some cases where a view will still appear in the plan even after view merging, with a name like VW_NWVW_1. We'll discuss the reasons why in a moment, but first let's look at an example. This also gives us a chance to look at an example of distinct view merging. Consider this query to find customers in the US that bought a particular product:
select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p,
(select distinct s.cust_id, s.prod_id
from sales s) cust_prod_v
where c.country_id = 'US'
and c.cust_id = cust_prod_v.cust_id
and cust_prod_v.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';
The view can be merged, though it is based on cost, since the reduction in data due to distinct may make the join cheaper. In this case, however, it is cheaper to merge the view, so we get this equivalent query:
select nwvw.cust_id, nwvw.cust_first_name, nwvw.cust_last_name, nwvw.cust_email
from (select distinct c.rowid, p.rowid, s.prod_id, s.cust_id, c.cust_id,
c.cust_first_name, c.cust_last_name, c.cujst_email
from customers c, products p, sales s
where c.country_id = 'US'
and c.cust_id = s.cust_id
and s.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater') nwvw;
and this plan:
-------------------------------------------
| Id  | Operation             | Name      |
-------------------------------------------
|   0 | SELECT STATEMENT      |           |
|   1 |  VIEW                 | VM_NWVW_1 |
|   2 |   HASH UNIQUE         |           |
|*  3 |    HASH JOIN          |           |
|*  4 |     HASH JOIN         |           |
|*  5 |      TABLE ACCESS FULL| PRODUCTS  |
|   6 |      TABLE ACCESS FULL| SALES     |
|*  7 |     TABLE ACCESS FULL | CUSTOMERS |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  3 - access("C"."CUST_ID"="S"."CUST_ID")
  4 - access("S"."PROD_ID"="P"."PROD_ID")
  5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
  7 - filter("C"."COUNTRY_ID"='US')
So why do we still have a view after we've supposedly merged the view? The new view is what we call a "projection view". When we merge the view, we move the distinct to the outer query block. But when we move the distinct, we have to add several additional columns, in order to maintain semantic equivalence with the original query. So we put all of that into a new view, so we can select out just the columns we want in the outer query block's select list. But we still get all of the benefits we promised from merging the view -- all of the tables are in one query block and the optimizer is free to permute them as it desires in the final join order, and the distinct operation has been delayed until after all of the joins are completed. These projection views appear in queries where a distinct view has been merged, or a group by view is merged into an outer query block which also contains group by, having, and/or aggregates. In the latter case, the projection view contains the group by, having, and aggregates from the original outer query block.

Now that this great mystery has been revealed, let's look at some of the reasons a group by or distinct view might not be merged. Aside from cost, there are several other reasons, including:
  • The outer query tables do not have a rowid or unique column
  • View appears in a connect by query block
  • View contains grouping sets, rollup, pivot
  • View or outer query block contains spreadsheet clause

Summary
View merging can improve plans by allowing additional join orders, access methods, and other transformations to be considered. In cases where view merging should always lead to a better plan, Oracle automatically merges a view; in other cases, this is determined based on cost. There are many reasons why a view may not be merged, including cost or validity restrictions. Note that view merging that is rejected on the basis of cost or heuristics can be overridden with hints; but view merging that is rejected based on validity may not.

Wednesday Oct 13, 2010

Optimizer Transformations: View Merging part 1

We continue our series on query transformations with a two-part discussion of view merging. In these posts, we will review the Oracle terminology related to view merging, explain the different types of view merging, and discuss the reasons that a view might not be merged. The examples in these posts use the Oracle sample schemas.
We use the term view to describe a sub-query block appearing in the FROM clause. Oracle can merge several different types of views:
  1. Simple view merging, for simple select-project-join views.
  2. Outer-join view merging for outer-joined views.
  3. Complex view merging, for distinct and group by views.
In today's post, we will discuss the first two. We''ll discuss complex view merging in the next post.
Simple View Merging
Consider a simple query with a view:
select e.first_name, e.last_name, dept_locs_v.street_address, dept_locs_v.postal_code
from employees e,
     (select d.department_id, d.department_name, l.street_address, l.postal_code
      from departments d, locations l
      where d.location_id = l.location_id) dept_locs_v
where dept_locs_v.department_id = e.department_id
and e.last_name = 'Smith';
The query joins the employees table with a view that returns the street address for each department. The view is itself a join of two tables. The query can be executed by joining departments and locations to produce the rows of the view, and then joining that result to employees. Because the query contains the view (V), the join orders that the optimizer can consider are constrained to the following:
E, V
V, E
Within the view, two join orders are considered:
D, L
L, D
So in combination, there are only four possible join orders for this form of the query. The join methods are also constrained; the index-based nested loops join is not feasible for the join order [E, V], since there is no index on the column from the view. Without view merging, the optimizer chooses the following plan:
-----------------------------------------------------------------
| Id  | Operation                    | Name        | Cost (%CPU)|
-----------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     7  (15)|
|*  1 |  HASH JOIN                   |             |     7  (15)|
|   2 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |     2   (0)|
|*  3 |    INDEX RANGE SCAN          | EMP_NAME_IX |     1   (0)|
|   4 |   VIEW                       |             |     5  (20)|
|*  5 |    HASH JOIN                 |             |     5  (20)|
|   6 |     TABLE ACCESS FULL        | LOCATIONS   |     2   (0)|
|   7 |     TABLE ACCESS FULL        | DEPARTMENTS |     2   (0)|
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT_LOCS_V"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")
3 - access("E"."LAST_NAME"='Smith')
5 - access("D"."LOCATION_ID"="L"."LOCATION_ID")
View merging merges the tables from the view into the outer query block, removing the view query block. After view merging, the query looks like this:
select e.first_name, e.last_name, l.street_address, l.postal_code
from employees e, departments d, locations l
where d.location_id = l.location_id
and d.department_id = e.department_id
and e.last_name = 'Smith';
Now that all three tables appear in one query block, the optimizer is not constrained by what join orders it can consider (there are a total of 6), and the joins to employees and departments can be index-based. The following plan is chosen with view merging:
-------------------------------------------------------------------
| Id  | Operation                      | Name        | Cost (%CPU)|
-------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |     4   (0)|
|   1 |  NESTED LOOPS                  |             |            |
|   2 |   NESTED LOOPS                 |             |     4   (0)|
|   3 |    NESTED LOOPS                |             |     3   (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |     2   (0)|
|*  5 |      INDEX RANGE SCAN          | EMP_NAME_IX |     1   (0)|
|   6 |     TABLE ACCESS BY INDEX ROWID| DEPARTMENTS |     1   (0)|
|*  7 |      INDEX UNIQUE SCAN         | DEPT_ID_PK  |     0   (0)|
|*  8 |    INDEX UNIQUE SCAN           | LOC_ID_PK   |     0   (0)|
|   9 |   TABLE ACCESS BY INDEX ROWID  | LOCATIONS   |     1   (0)|
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 5 - access("E"."LAST_NAME"='Smith')
 7 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
 8 - access("D"."LOCATION_ID"="L"."LOCATION_ID")
Oracle uses the term "simple" to refer to select-project-join views. The example above used simple view merging to select the better plan. Such views are automatically merged if it is legal to do so, since it is generally the case that the merged view will result in a plan that is at least as good as the unmerged view would. With the additional join orders and access paths available after a view has been merged, view merging can frequently result in a much better plan. View merging can also allow other transformations to take place; for instance, a table inside of the view may allow a table outside of the view to be join eliminated after the view has been merged and both tables reside in one query block.

There are several reasons why a select-project-join view might not be merged, typically because it is not semantically valid to do so. Some of the reasons a view may not be valid for simple view merging are listed below.
  • The view contains constructs other than select, project, join, including:
    • Group by
    • Distinct
    • Outer-join
    • Spreadsheet clause
    • Connect by
    • Set operators
    • Aggregation
  • The view appears on the right side of a semi- or anti-join.
  • The view contains subqueries in the select list.
  • The outer query block contains PL/SQL functions.
Note that some of these constructs do not disallow view merging in all queries, but depend on additional validity constraints.
Outer Join View Merging
If a view is involved in an outer join with tables from the outer query block or if the view contains outer-joined tables, there are many additional restrictions on whether it is valid to merge the view. After view merging, it must be possible to express the query in terms of Oracle outer join syntax. This imposes one significant restriction on views on the left of an outer join: each table from the outer query block can be outer-joined to at most one underlying table of the view. For instance, it is currently not possible to merge the view in this query:
select e1.first_name||' '||e1.last_name emp_name, dept_managers_v.manager_name,            
      dept_managers_v.department_name
from employees e1,
    (select e2.manager_id, e2.first_name||' '||e2.last_name as manager_name,
            d.department_id, d.department_name
     from departments d, employees e2
     where d.manager_id = e2.employee_id) dept_managers_v
where dept_managers_v.department_id = e1.department_id(+)
and dept_managers_v.manager_id = e1.manager_id(+);
If the view were merged, it would result in table e1 being outer joined to two tables, which is not legal in Oracle outer join. But the view in the following query can be merged:
select e1.first_name||' '||e1.last_name emp_name, dept_managers_v.manager_name,
      dept_managers_v.department_name
from employees e1,
    (select e2.manager_id, e2.first_name||' '||e2.last_name as manager_name,
            d.department_id, d.department_name
     from departments d, employees e2
     where d.manager_id = e2.employee_id) dept_managers_v
where dept_managers_v.department_id = e1.department_id(+);
The merged form of the query looks like this:
select e1.first_name||' '||e1.last_name emp_name,
      e2.first_name||' '||e2.last_name as manager_name,
      d.department_name
from employees e1, employees e2, departments d
where d.manager_id = e2.employee_id
and d.department_id = e1.department_id(+);
This allows the optimizer to consider additional join orders and access paths like we discussed earlier.
If a view appears on the right of an outer join, the view can be merged only if it contains a single table in the from-clause (which can be a table or another view). If a view contains more than one table, the semantics of the query require the join between those two tables to occur before the outer join. There are additional restrictions on merging of views participating in an outer join, but these are the most common reasons for merging of outer joined views to not be valid.
Summary
In this post we covered the basics of view merging, how it works for simple select-project-join views and views appearing in outer joins, and why one of these views might not be merged. In the next two weeks, we'll finish up the topic with a discussion of complex view merging, and we'll finally reveal the reason for one of the great mysteries of view merging - the VW_NWVW_* view!

Monday Sep 27, 2010

Optimizer Transformations: Subquery Unesting part 2

In Part one of our blog series on Optimizer transformations we discussed Subquery unesting. We explained how subqueries are evaluated, and the unnesting of EXISTS and ANY subqueries.

Here in part two of our series we will discuss the unnesting of NOT EXISTS and single-row subqueries, which appear in the WHERE clause. Once again I need to give credit to Rafi one of the senior Optimizer developers for the content of this blog!

[Read More]

Tuesday Sep 21, 2010

Update on Oracle Open World

Thanks to all those who stopped by the demoground yesterday to chat with the Optimizer developers and to check out what is new in Oracle Database 11g. If you haven't had a chance to stop by yet the developers will be there all day today and tomorrow.

To day at Open World we will be giving two sessions on the Optimizer. We will start at 11 am with the Explain the Explain plan session (session S316955 Moscone South/Rm 104), where we will examines the different aspects of an execution plan, from selectivity to parallel execution, and explains what information you should be gleaning from the plan. Then on this afternoon at 3:30 pm  we will build on what you learned in the morning session when we present Top tips for getting optimal SQL execution all the time (session S317019 Moscone South/Rm 103). This session will show you how to identify and resolving the most common SQL execution performance problems, such as poor cardinality estimations, bind peeking issues, and selecting the wrong access method.  

Looking forward to seeing you at our sessions or at the demoground!

Wednesday Sep 15, 2010

Optimizer Transformations: Subquery Unnesting part 1

We have received a ton of requests for more information on Oracle Optimizer Transformations so we thought we would put together a series of blog posts describing the most commonly used transformations. You should expect to see two blog posts a month on this over the next quarter.

Before we begin we should explain what we mean by transformation? When a SQL statement is parsed, the Optimizer will try to "transform" or rewrite the SQL statement into a semantically equivalent SQL statement that can be processed more efficiently. We first discussed the concepts of transformations in a blog post in June 2008 called why are some of tables in my query missing in my plan.

This original post dealt with just one of the new transformations introduced in Oracle Database 11g called table elimination. Part one of our new series will deal with subquery unnesting. I must give credit to Rafi, one of the senior Optimizer developers, who provided the content for this topic.

[Read More]

Monday Aug 16, 2010

Dynamic sampling and its impact on the Optimizer

Dynamic sampling (DS) was introduced in Oracle Database 9i Release 2 to improve the optimizer's ability to generate good execution plans. The most common misconception is that DS can be used as a substitute for optimizer statistics. The goal of DS is to augment the optimizer statistics; it is used when regular statistics are not sufficient to get good quality cardinality estimates.

So how and when will DS be use?

[Read More]

Friday Aug 06, 2010

What to expect from the Optimizer team at Oracle Open World 2010

Oracle Open World, the largest gathering of Oracle customers, partners, developers, and technology enthusiasts will happen in September (19-23) when Oracle will host its annual user conference in San Francisco. And the Optimizer development group will be there!

You have multiple opportunities to meet the team, either at one of our technical sessions, or at the Oracle Demo grounds.  This year the Optimizer team has 3 technical session, all of which build from one another.  We will start on Tuesday morning at 11 am with the Explain the Explain plan session (session S316955 Moscone South/Rm 104), where we will examines the different aspects of an execution plan, from selectivity to parallel execution, and explains what information you should be gleaning from the plan.

Then on Tuesday afternoon at 3:30 pm  we will build on what you learned in the morning session when we present Top tips for getting optimal SQL execution all the time (session S317019 Moscone South/Rm 103). This session will show you how to identify and resolving the most common SQL execution performance problems, such as poor cardinality estimations, bind peeking issues, and selecting the wrong access method.  

Finally Wednesday morning at 11:30 am you get an opportunity to put what you learned at both of the technical presentations to good use when you join us for our first ever hands on lab, Explain Plan: How to read and Analyze Execution Plans (Session S318755 Marriott Marquis/Salon 10 / 11). In this hands on lab you will get to solve 4 SQL tuning problems using the techniques described in the technical sessions.

The Optimizer development team will also be at the Oracle demo grounds (Monday through Wednesday) to answer all of your burning questions and to demonstrate the latest 11g features.

Monday Aug 02, 2010

We have moved!

You might have been wondering why things had gone so quiet on the Optimizer development team's blog Optimizer Magic over the last few months. Well the blog has moved to blogs.oracle.com/optimizer. All of the old articles have moved too and we plan to be a lot more active at our new home, with at least one new post every month.

Friday Mar 12, 2010

Explain adaptive cursor sharing behavior with cursor_sharing = similar and force.

Here's another question that was submitted during the OpenWorld Optimizer Roundtable. It's a common question that we've discussed a little bit in a couple other posts, but we wanted to summarize everything in one place. First, let's quickly review what the different values for the cursor_sharing parameter mean. We discussed this behavior in some detail in an earlier post about cursor_sharing. Below is a summary of the behavior of the different values in different cases (copied from the earlier post):

CURSOR_SHARING VALUE SPACE USED IN SHARED POOL QUERY PERFORMANCE
EXACT (No literal replacement) Worst possible case - each stmt issued has its own parent cursor Best possible case as each stmt has its own plan generated for it based on the value of the literal value present in the stmt
FORCE Best possible case as only one parent and child cursor for each distinct stmt Potentially the worst case as only one plan will be used for each distinct stmt and all occurrences of that stmt will use that plan
SIMILAR without histogram present Best possible case as only one parent and child cursor for each distinct stmt Potentially the worst case as only one plan will be used for each distinct stmt and all occurrences of that stmt will use that plan
SIMILAR with histogram present Not quite as much space used as with EXACT but close. Instead of each stmt having its own parent cursor they will have their own child cursor (which uses less space) Best possible case as each stmt has its own plan generated for it based on the value of the literal value present in the stmt

Adaptive cursor sharing (ACS) is another feature we've blogged about before, which allows the optimizer to generate a set of plans that are optimal for different sets of bind values. A common question is how the two interact, and whether users should consider changing the value of cursor_sharing when upgrading to 11g to take advantage of ACS. The simplest way to think about the interaction between the two features for a given query is to first consider whether literal replacement will take place for a query. Consider a query containing a literal:

select * from employees where job = 'Clerk'

As we see from the table above, the treatment of this query by literal replacement will depend on the value of the cursor_sharing parameter and whether there is a histogram on the job column. Here are the interesting cases:

  1. Cursor_sharing = exact. No literal replacement will take place, and the optimizer will see the query as is.
  2. Cursor_sharing = force. Whether there is a histogram or not, literal replacement will take place, and the optimizer will optimize the query as if it were: select * from employees where job = :b Bind peeking will take place, so that the value "Clerk" is used to generate cardinality estimates for the query. Subsequent executions of this query differing only in the literal value will share the same plan.
  3. Cursor_sharing = similar. There are two different cases for this:
    1. There is a histogram on the job column. In this case, literal replacement will not take place. The presence of a histogram indicates that the column is skewed, and the optimal plan may depend on the literal value. Hence, the optimizer sees the query as: select * from employees where job = 'Clerk' and subsequent executions with a different literal will not necessarily use the same plan.
    2. There is no histogram on the job column. This indicates that the column is not skewed, and the optimizer is likely to choose the same plan no matter the literal, so literal replacement takes place.
Now that we know when literal replacement will take place, and what the query looks like to the optimizer, we can consider adaptive cursor sharing. If literal replacement takes place, and the query that the optimizer optimizes contains a bind, then adaptive cursor sharing can take place. To adaptive cursor sharing, a bind variable is a bind variable, whether it comes from the user query or is inserted by literal replacement. On the other hand, if the query contains only literals (no binds), adaptive cursor sharing will not take place. In our example above, adaptive cursor sharing can be considered for cases 2 and 3.2. For case 3.2, it is likely that the optimizer will choose the same plan for different values of the literal. In case 2, if there is a histogram, then the optimizer may choose different plans depending on how popular the literal value is.

This example shows that if you use histograms, and want the optimizer to choose an optimal plan for different literal values using ACS, then you should set cursor_sharing to force. If it is set to similar, then literal replacement will not take place, and a child cursor will be created for each value of the literal. Setting cursor_sharing to similar effectively disables ACS for these kinds of queries. By setting cursor_sharing to force and letting adaptive cursor sharing kick in, the optimizer can choose optimal plans for different values, but if the same plan is appropriate for several values, they will share a single child cursor. Historically, cursor_sharing=similar has been recommended as a middle ground between no literal replacement (which causes a lot of cursors to be generated) and forced literal replacement (which causes a potentially sub-optimal plan to be shared for all literals). We now recommend using adaptive cursor sharing along with cursor_sharing=force instead.

So far we have only discussed cursor sharing in the presence of histograms. There are other cases where the optimizer's choice of plan can depend on the specific literal that appears in the query, for instance when binds appear in range predicates or when a bind value falls outside of a column's range (according to the optimizer statistics). Binds appearing in these kinds of predicates are also considered by adaptive cursor sharing, whereas they are not considered by cursor_sharing=similar.

Monday Dec 21, 2009

Upgrading from 9i to 11g and the implicit migration from RBO

Now that Oracle Database 11g Release 2 is out, more and more folks are considering upgrading to 11g. However, if you are currently on Oracle 9i then you will have to tackle both the upgrade to a new release and the migration from the Rule-Based Optimizer (RBO) to the Cost-based Optimizer (CBO). The RBO was de-supported in Oracle Database 10g, so in Oracle Database 11g you must use the CBO. Thanks to SQL Plan Management (SPM), originally discussed in our January post, you can handle the upgrade and the migration with ease. By loading the original RBO plans into SPM you can ensure the Optimizer won't change the execution plans during an upgrade and the implicit migration from RBO to CBO. In fact, there are two possible approaches you can take.[Read More]

Tuesday Nov 17, 2009

What should I do with old hints in my workload?

We promised in an earlier post to cover some of the questions from the Optimizer round table discussion at Oracle Open World. Here's our first in a series of posts that will address these questions.

Q: When moving from 10g to 11g, should hints in existing SQL be removed?

A: I was glad to see this question at the round table, since I think this is something people often overlook when adding hints to an application. Over time, hints can become stale. You may have added a hint to a query because of a weakness in an old version of the optimizer, but the hint might no longer be necessary to get the best plan. Even worse, the hint may force a plan which is suboptimal, for a couple of reasons:
  • Depending on how your data change over time, the plan choice that the hint enforces might no longer be a good choice. This problem can occur even without a database upgrade. For example, if the distribution of values in a column change over time, an old access path hint may no longer be appropriate.
  • The hint might prevent new optimizations from taking place. So after an upgrade, you will be stuck using the old hint-enforced plan, when the optimizer could have chosen something better. For example, you may have hinted a plan for a query with bind variables, to avoid a bad plan choice due to bind peeking in the pre-11g optimizer. In 11g, you will not get the benefits of adaptive cursor sharing because of the hints in the query.
These are just some of the reasons why it is good to re-evaluate the hints in your application from time to time. You can test this out using the _optimizer_ignore_hints parameter. Setting this parameter to TRUE will cause the optimizer to ignore the hints embedded in queries. You can set this on the session level, run your workload, and compare the performance to your baseline performance (with the hints). I recommend using SQL Performance Analyzer (SPA) to do this. Read the SPA white paper for more information on how to do that.

If you find that some of your queries really do still require hints to get an optimal plan, you should consider creating a SQL plan baseline, and allowing SPM to manage the plan for you. If you do this, then you will get the hinted plan for now, but if a better plan comes along later (for one of the reasons mentioned earlier), you will eventually get the benefits of the new plan, using the SPM evolve process. There are two ways that you can create a SQL plan baseline based on your hints that will be applied to the query without hints. The first way requires that a SQL plan baseline already exist for the unhinted query. That's described in an earlier post. Here are the steps to use if you don't already have a SQL plan baseline for the unhinted query.

1. Run the query with hints, and confirm that the plan is what you want:
var pid number
exec :pid := 100;
select /*+ leading(t) */ p.prod_name, s.amount_sold, t.calendar_year
from sales s, products p, times t
where s.prod_id = p.prod_id
and s.time_id = t.time_id
and p.prod_id < :pid; 
PROD_NAME                                          AMOUNT_SOLD CALENDAR_YEAR
-------------------------------------------------- ----------- -------------
...
select sql_id from v$sql where sql_text like 'select /*+ leading(t) */ p.prod_name%';
SQL_ID
-------------
2qtu6hy4rf1j9
select * from table(dbms_xplan.display_cursor(sql_id=>'2qtu6hy4rf1j9', 
                                              format=>'basic note'));
EXPLAINED SQL STATEMENT:
------------------------
select /*+ leading(t) */ p.prod_name, s.amount_sold, t.calendar_year from sales s, products p, times t where s.prod_id = p.prod_id and s.time_id = t.time_id and p.prod_id < :pid

Plan hash value: 2290436051
---------------------------------------------------------------
| Id  | Operation                            | Name           |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |
|   1 |  HASH JOIN                           |                |
|   2 |   HASH JOIN                          |                |
|   3 |    TABLE ACCESS FULL                 | TIMES          |
|   4 |    PARTITION RANGE ALL               |                |
|   5 |     TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |
|   6 |      BITMAP CONVERSION TO ROWIDS     |                |
|   7 |       BITMAP INDEX RANGE SCAN        | SALES_PROD_BIX |
|   8 |   TABLE ACCESS BY INDEX ROWID        | PRODUCTS       |
|   9 |    INDEX RANGE SCAN                  | PRODUCTS_PK    |
---------------------------------------------------------------

2. Load SQL plan baseline from cursor cache, and associate it with the unhinted query:

var sqltext clob;

begin
:sqltext := 'select p.prod_name, s.amount_sold, t.calendar_year
from sales s, products p, times t
where s.prod_id = p.prod_id
and s.time_id = t.time_id
and p.prod_id > :pid';
end;
/

exec :pls := dbms_spm.load_plans_from_cursor_cache( -
              sql_id => '2qtu6hy4rf1j9', -
              plan_hash_value => 2290436051, -
              sql_text => :sqltext);

3. Run the query without hints, and check that the SQL plan baseline was used.

select p.prod_name, s.amount_sold, t.calendar_year
from sales s, products p, times t
where s.prod_id = p.prod_id
and s.time_id = t.time_id
and p.prod_id < :pid;

PROD_NAME                                          AMOUNT_SOLD CALENDAR_YEAR
-------------------------------------------------- ----------- -------------
...

select * from table(dbms_xplan.display_cursor(sql_id=>'a1ax3265pq8x7',
                                              format=>'basic note'));

EXPLAINED SQL STATEMENT:
------------------------
select p.prod_name, s.amount_sold, t.calendar_year from sales s,
products p, times t where s.prod_id = p.prod_id and s.time_id =
t.time_id and p.prod_id < :pid

Plan hash value: 2290436051



---------------------------------------------------------------
| Id  | Operation                            | Name           |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                |
|   1 |  HASH JOIN                           |                |
|   2 |   HASH JOIN                          |                |
|   3 |    TABLE ACCESS FULL                 | TIMES          |
|   4 |    PARTITION RANGE ALL               |                |
|   5 |     TABLE ACCESS BY LOCAL INDEX ROWID| SALES          |
|   6 |      BITMAP CONVERSION TO ROWIDS     |                |
|   7 |       BITMAP INDEX RANGE SCAN        | SALES_PROD_BIX |
|   8 |   TABLE ACCESS BY INDEX ROWID        | PRODUCTS       |
|   9 |    INDEX RANGE SCAN                  | PRODUCTS_PK    |
---------------------------------------------------------------
Note ----- - SQL plan baseline SQL_PLAN_4rw2dhryc2w5h888547d3 used for this statement
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today