Optimizer Transformations: View Merging part 2

This is the second of two posts on view merging. See the first post for the basics of view merging, how it works for simple select-project-join views and views appearing in outer joins, and why one of these views might not be merged. In this post we'll cover complex view merging.

Complex View Merging

We use the term "complex view merging" to describe merging of group by and distinct views. Like simple view merging, this allows the optimizer to consider additional join orders and access paths. In addition, the evaluation of the group-by/distinct operation can be delayed until after the joins have been evaluated. Delayed evaluation of group-by can make performance better or worse depending on the characteristics of the data. Delaying a group-by until after joins can result in a reduction in the data set on which the group-by operation is to be performed, if joins are filtering; on the other hand, early group-by can reduce the amount of data to be processed by subsequent joins or the joins could explode the amount of data to undergo group-by. The same is true for distinct operations. Because it is not always better to merge such a view, we choose whether to use this transformation in a cost-based manner. The two options - with and without view merging - are each costed by the optimizer, and we choose to merge the view only if it is cheaper to do so.

Consider the following group by view and query which refers to it:
create view cust_prod_totals_v as
select sum(s.quantity_sold) total, s.cust_id, s.prod_id
from sales s
group by s.cust_id, s.prod_id;

select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p, cust_prod_totals_v
where c.country_id = 'US'
and c.cust_id = cust_prod_totals_v.cust_id
and cust_prod_totals_v.total > 100
and cust_prod_totals_v.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';
This query finds all of the customers from the US who have bought at least 100 of a particular item. The view is eligible for complex view merging. After merging, the query looks like this:
select c.cust_id, cust_first_name, cust_last_name, cust_email
from customers c, products p, sales s
where c.country_id = 'US'
and c.cust_id = s.cust_id
and s.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
group by s.cust_id, s.prod_id, p.rowid, c.rowid,
c.cust_email, c.cust_last_name, c.cust_first_name, c.cust_id
having sum(s.quantity_sold) > 100;
The transformed query is cheaper than the untransformed query, so the optimizer chooses to merge the view. Why is the transformed query cheaper? In the untransformed query, the group by operator applies to the entire sales table in the view. In the transformed query, the joins to products and customers (especially products) filter out a large portion of the rows from the sales table, so the group by operation is much cheaper. The join is more expensive because the sales table has not been reduced, but it is not that much more expensive, since the group-by operation does not reduce the data size that much in the original query. If any of these characteristics were to change, it may no longer be cheaper to merge the view. Hence the need for a cost-based decision. The final plan is as follows:
| Id  | Operation             | Name      | Cost (%CPU)|
|   0 | SELECT STATEMENT      |           |  2101  (18)|
|*  1 |  FILTER               |           |            |
|   2 |   HASH GROUP BY       |           |  2101  (18)|
|*  3 |    HASH JOIN          |           |  2099  (18)|
|*  4 |     HASH JOIN         |           |  1801  (19)|
|*  5 |      TABLE ACCESS FULL| PRODUCTS  |    96   (5)|
|   6 |      TABLE ACCESS FULL| SALES     |  1620  (15)|
|*  7 |     TABLE ACCESS FULL | CUSTOMERS |   296  (11)|
Predicate Information (identified by operation id):
1 - filter(SUM("QUANTITY_SOLD")>100)
3 - access("C"."CUST_ID"="CUST_ID")
4 - access("PROD_ID"="P"."PROD_ID")
5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
7 - filter("C"."COUNTRY_ID"='US')
There is no view in the plan above, which is what one would expect after the view has been merged. However, there are some cases where a view will still appear in the plan even after view merging, with a name like VW_NWVW_1. We'll discuss the reasons why in a moment, but first let's look at an example. This also gives us a chance to look at an example of distinct view merging. Consider this query to find customers in the US that bought a particular product:
select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
from customers c, products p,
(select distinct s.cust_id, s.prod_id
from sales s) cust_prod_v
where c.country_id = 'US'
and c.cust_id = cust_prod_v.cust_id
and cust_prod_v.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';
The view can be merged, though it is based on cost, since the reduction in data due to distinct may make the join cheaper. In this case, however, it is cheaper to merge the view, so we get this equivalent query:
select nwvw.cust_id, nwvw.cust_first_name, nwvw.cust_last_name, nwvw.cust_email
from (select distinct c.rowid, p.rowid, s.prod_id, s.cust_id, c.cust_id,
c.cust_first_name, c.cust_last_name, c.cujst_email
from customers c, products p, sales s
where c.country_id = 'US'
and c.cust_id = s.cust_id
and s.prod_id = p.prod_id
and p.prod_name = 'T3 Faux Fur-Trimmed Sweater') nwvw;
and this plan:
| Id  | Operation             | Name      |
|   0 | SELECT STATEMENT      |           |
|   1 |  VIEW                 | VM_NWVW_1 |
|   2 |   HASH UNIQUE         |           |
|*  3 |    HASH JOIN          |           |
|*  4 |     HASH JOIN         |           |
|   6 |      TABLE ACCESS FULL| SALES     |

Predicate Information (identified by operation id):
  3 - access("C"."CUST_ID"="S"."CUST_ID")
  4 - access("S"."PROD_ID"="P"."PROD_ID")
  5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
  7 - filter("C"."COUNTRY_ID"='US')
So why do we still have a view after we've supposedly merged the view? The new view is what we call a "projection view". When we merge the view, we move the distinct to the outer query block. But when we move the distinct, we have to add several additional columns, in order to maintain semantic equivalence with the original query. So we put all of that into a new view, so we can select out just the columns we want in the outer query block's select list. But we still get all of the benefits we promised from merging the view -- all of the tables are in one query block and the optimizer is free to permute them as it desires in the final join order, and the distinct operation has been delayed until after all of the joins are completed. These projection views appear in queries where a distinct view has been merged, or a group by view is merged into an outer query block which also contains group by, having, and/or aggregates. In the latter case, the projection view contains the group by, having, and aggregates from the original outer query block.

Now that this great mystery has been revealed, let's look at some of the reasons a group by or distinct view might not be merged. Aside from cost, there are several other reasons, including:
  • The outer query tables do not have a rowid or unique column
  • View appears in a connect by query block
  • View contains grouping sets, rollup, pivot
  • View or outer query block contains spreadsheet clause

View merging can improve plans by allowing additional join orders, access methods, and other transformations to be considered. In cases where view merging should always lead to a better plan, Oracle automatically merges a view; in other cases, this is determined based on cost. There are many reasons why a view may not be merged, including cost or validity restrictions. Note that view merging that is rejected on the basis of cost or heuristics can be overridden with hints; but view merging that is rejected based on validity may not.

Post a Comment:
  • HTML Syntax: NOT allowed

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.


« July 2016