Optimizer Transformations: Subquery Unnesting part 1

We have received a ton of requests for more information on Oracle Optimizer Transformations so we thought we would put together a series of blog posts describing the most commonly used transformations. You should expect to see two blog posts a month on this over the next quarter.

Before we begin we should explain what we mean by transformation? When a SQL statement is parsed, the Optimizer will try to "transform" or rewrite the SQL statement into a semantically equivalent SQL statement that can be processed more efficiently. We first discussed the concepts of transformations in a blog post in June 2008 called why are some of tables in my query missing in my plan. This original post dealt with just one of the new transformations introduced in Oracle Database 11g called table elimination. Part one of our new series will deal with subquery unnesting. I must give credit to Rafi, one of the senior Optimizer developers, who provided the content for this topic.

Subquery Unnesting 

Subquery unnesting is an optimization that converts a subquery into a join in the outer query and allows the optimizer to consider subquery table(s) during access path, join method, and join order selection. Unnesting either merges the subquery into the body of the outer query or turns it into an inline view.

Without unnesting, the subquery is evaluated multiple times, for each row of the outer table, and thus many efficient access paths and join methods cannot be considered. Here we will discuss the unnesting of ANY and EXISTS subqueries, which appear in the WHERE clause.

Terminology

Any sub-query block in a query statement may be called a subquery; however, we use the term subquery for a sub-query block that appears in the WHERE, SELECT and HAVING clauses. Some Oracle documentation uses the term "nested subquery" for what we refer to as a subquery. A sub-query block that appears in the FROM clause is called a view or derived table.

There are many ways to classify a subquery. The main categorization comes from the way a subquery is used in SQL statements. A WHERE clause subquery belongs to one of the following types: SINGLE-ROW, EXISTS, NOT EXISTS, ANY, or ALL. A single-row subquery must return at most one row, whereas the other types of subquery can return zero or more rows. ANY and ALL subqueries are used with relational comparison operators: =, >,>=, <, <=, and <>. In SQL, the set operator IN is used as a shorthand for =ANY and the set operator NOT IN is used as a shorthand for <>ALL.

Query A shows an example of a correlated EXISTS subquery.

A.
SELECT C.cust_last_name, C.country_id
FROM    customers C
WHERE EXISTS (SELECT 1
                          FROM sales S
                          WHERE S.quantity_sold > 1000 and
                                        S.cust_id = C.cust_id);

A column that appears in a subquery is called a correlated column, if it comes from a table not defined by the subquery. The subquery in A is correlated, as it refers to a correlated column, C.cust_id, which comes from, customers, a table not defined by the subquery. The predicate, S.cust_id = C.cust_id, is called a correlating condition or a correlated predicate.

Consider query B, which contains an uncorrelated ANY subquery. Note that queries B and A are semantically equivalent.

B.
SELECT C.cust_last_name, C.country_id
FROM    customers C
WHERE C.cust_id =ANY (SELECT S.cust_id
                                       FROM sales S
                                       WHERE S.quantity_sold > 1000);

The subquery in B is uncorrelated, as it does not refer to a correlated column. "C.cust_id = ANY S.cust_id" in B is called a connecting condition.

Subquery Evaluation

A NOT EXISTS subquery evaluates to TRUE, if it returns no rows. The ANY/ALL subquery returns a set of values, and the predicate containing the ANY/ALL subquery will evaluate to TRUE, if it is satisfied. For example, at least one S.cust_id values must match C.cust_id in the connecting condition of the ANY subquery of query B. 

Note that in Oracle, a non-unnested ANY and ALL subquery is converted into a correlated EXISTS and NOT EXISTS subquery respectively.

When a correlated subquery is not unnested, the subquery is evaluated multiple times, for each row of the outer tables, substituting the values of correlated columns (e.g., customer.cust_id in A). Thus, table accesses and joins inside the subquery are repeatedly performed with each invocation and join orders involving subquery tables and outer query tables cannot be explored. This type of evaluation also inhibits parallelization.

XA shows the execution plan for query A. Here subquery unnesting has been disabled. Observe that the text of the non-unnested subquery filter is displayed in the predicate dump at the bottom of the plan. As the execution plan shows, the subquery will be evaluated multiple (i.e., 50K) times for each outer row of CUSTOMERS table.

XA.
explain_plan_for_A.png

EXISTS and ANY Subqueries

Semi-join is generally used for unnesting EXISTS and ANY subqueries. However, in some cases where duplicate rows are not relevant, inner join can also be used to unnest EXISTS and ANY subqueries. Here we represent semi-join by the following non-standard syntax: T1.x S= T2.y, where T1 is the left table and T2 is the right table of the semi-join. The semantics of semi-join is the following: A row of T1 is returned as soon as T1.x finds a match with any value of T2.y without searching for further matches.

Consider the previously shown query A. Unnesting of the subquery in A produces query C, where the body of the subquery has been merged into the outer query. Here the correlating condition has been turned into a join predicate; customers and sales become the left and right tables respectively in the semi-join.

C.
SELECT C.cust_last_name, C.country_id
FROM customers C, sales S
WHERE S.quantity_sold > 1000 and
C.cust_id S= S.cust_id;

The execution plan of C is shown below as XC. Note the difference between the costs of the plan XC and the plan XA; recall that XA was generated by disabling unnesting. Clearly, the plan with unnesting (XC) is much more optimal; the cost has come down from 5006K to 2300. (The query B also produces the same plan as XC.)
XC.
explain_plan_for_C.png












Now consider query D, which contains an uncorrelated ANY subquery that has two tables.
D.
SELECT C.cust_last_name, C.country_id
FROM    customers C
WHERE C.cust_id =ANY (SELECT S.cust_id
                                       FROM sales S, products P
                                       WHERE P.prod_id = S.prod_id and
                                                      P.prod_list_price > 105);

The subquery in D can be unnested by using a semi-join; however, the inner join of the tables in the subquery, sales and products must take place before the semi-join is performed. Therefore, an inline view needs to be generated in order to enforce the join order. The query E shows the unnesting transformation of D. Here the subquery is decorrelated and converted into an inline view, which becomes the right table in the semi-join; and the correlated predicate is turned into a join predicate.

E.
SELECT C.cust_last_name, C.country_id
FROM customers C,
          (SELECT S.cust_id as s_cust_id
           FROM    sales S, products P
           WHERE P.prod_id = S.prod_id and P.prod_list_price > 105) VW
WHERE C.cust_id S= VW.s_cust_id;

XE shows the execution plan of E. It has an optimizer-generated inline view named VW_SQ_1. Of the three available join methods (i.e., nested-loop, hash, and sort-merge), the hash method was selected by the optimizer to do the semi-join.

XE
explain_plan_for_E.png
















Look out for part 2 of our blog on Subquery Unnesting,  which will discuss NOT EXISTS subqueries, single-row aggregated subqueries, as well as the validity checks performed before unnesting is allowed.

Comments:

excellent explanation, very useful.I have a question - how to decide that whether we should use SEMI JOIN or no. what is the use of a hint called PUSH_SUBQ, I have read few places that push_subq has been changed in 11g . can you please elobarate in the use of hint push_subq in query unnesting.

Posted by ajeet on September 26, 2011 at 08:41 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today