### Optimizer Transformations: Join Factorization

#### By Hong Su on Feb 25, 2011

We continue our series on optimizer transformations with a post that describes the Join Factorization transformation. The Join Factorization transformation was introduced in Oracle 11g Release 2 and applies to UNION ALL queries. Union all queries are commonly used in database applications, especially in data integration applications. In many scenarios the branches in a UNION All query share a common processing, i.e, refer to the same tables. In the current Oracle execution strategy, each branch of a UNION ALL query is evaluated independently, which leads to repetitive processing, including data access and join. The join factorization transformation offers an opportunity to share the common computations across the UNION ALL branches. Currently, join factorization only factorizes common references to base tables only, i.e, not views.

Consider a simple example of query Q1.

**Q1: select t1.c1, t2.c2 from t1, t2, t3 where **

**t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2**

union all

select t1.c1, t2.c2

from t1, t2, t4

where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3;

union all

select t1.c1, t2.c2

from t1, t2, t4

where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3;

Table *t1 *appears in both the branches. As does the filter predicates on* t1 *(*t1.c1 > 1*) and the join predicates involving *t1* (*t1.c1 = t2.c1*). Nevertheless, without any transformation, the scan (and the filtering) on* t1 *has to be done twice, once per branch. Such a query may benefit from join factorization which can transform Q1 into Q2 as follows:

**Q2: select t1.c1, VW_JF_1.item_2 from t1, (select t2.c1 item_1, t2.c2 item_2 from t2, t3 where t2.c2 = t3.c2 and t2.c2 = 2 union all select t2.c1 item_1, t2.c2 item_2 from t2, t4 where t2.c3 = t4.c3) VW_JF_1 where t1.c1 = VW_JF_1.item_1 and t1.c1 > 1;**

In Q2,

*t1*is "factorized" and thus the table scan and the filtering on

*t1*is done only once (it's shared). If

*t1*is large, then avoiding one extra scan of

*t1*can lead to a huge performance improvement.

Another benefit of join factorization is that it can open up more join orders. Let's look at query Q3.

**Q3: select * from t5, **

**(select t1.c1, t2.c2**

from t1, t2, t3

where

where

from t1, t2, t3

where

**t1.c1 = t2.c1 and t1.c1 > 1 and t2.c2 = 2 and t2.c2 = t3.c2**

union all

select t1.c1, t2.c2

from t1, t2, t4

where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3) V;union all

select t1.c1, t2.c2

from t1, t2, t4

where t1.c1 = t2.c1 and t1.c1 > 1 and t2.c3 = t4.c3) V;

where

**t5.c1 = V.c1**In Q3, view *V* is same as Q1. Before join factorization,* t1*,* t2 *and *t3 *must be joined first before they can be joined with *t5*. But if join factorization factorizes *t1 *from view *V*, *t1* can then be joined with *t5*. This opens up new join orders. That being said, join factorization imposes certain join orders. For example, in Q2, *t2 *and *t3* appear in the first branch of the UNION ALL query in view *VW_JF_1*. *T2 *must be joined with *t3 *before it can be joined with *t1 *which is outside of the *VW_JF_1 *view. The imposed join order may not necessarily be the best join order. For this reason, join factorization is performed under cost-based transformation framework; this means that we cost the plans with and without join factorization and choose the cheapest plan.

Note that if the branches in UNION ALL have DISTINCT clauses, join factorization is not valid. For example, Q4 is NOT semantically equivalent to Q5.

**Q4: select distinct t1.* from t1, t2 where t1.c1 = t2.c1 union all select distinct t1.* from t1, t2 where t1.c1 = t2.c1**

**Q5: select distinct t1.* from t1, (select t2.c1 item_1 from t2 union all select t2.c1 item_1 from t2) VW_JF_1 where t1.c1 = VW_JF_1.item_1**

Q4 might return more rows than Q5. Q5's results are guaranteed to be duplicate free because of the DISTINCT key word at the top level while Q4's results might contain duplicates.

The examples given so far involve inner joins only. Join factorization is also supported in outer join, anti join and semi join. But only the right tables of outer join, anti join and semi joins can be factorized. It is not semantically correct to factorize the left table of outer join, anti join or semi join. For example, Q6 is NOT semantically equivalent to Q7.

**Q6: select t1.c1, t2.c2 from t1, t2 where t1.c1 = t2.c1(+) and t2.c2 (+) = 2 union all select t1.c1, t2.c2 from t1, t2 where t1.c1 = t2.c1(+) and t2.c2 (+) = 3**

**Q7: select t1.c1, VW_JF_1.item_2 from t1, (select t2.c1 item_1, t2.c2 item_2**

from t2

where t2.c2 = 2

union all

select t2.c1 item_1, t2.c2 item_2

from t2

from t2

where t2.c2 = 2

union all

select t2.c1 item_1, t2.c2 item_2

from t2

where t2.c2 = 3) VW_JF_1

where t1.c1 = VW_JF_1.item_1(+)

where t2.c2 = 3) VW_JF_1

where t1.c1 = VW_JF_1.item_1(+)

However, the right side of an outer join can be factorized. For example, join factorization can transform Q8 to Q9 by factorizing *t2*, which is the right table of an outer join.

**Q8: select t1.c2, t2.c2 from t1, t2**

where t1.c1 = t2.c1 (+) and t1.c1 = 1

union all

select t1.c2, t2.c2

from t1, t2

where t1.c1 = t2.c1(+) and t1.c1 = 2

where t1.c1 = t2.c1 (+) and t1.c1 = 1

union all

select t1.c2, t2.c2

from t1, t2

where t1.c1 = t2.c1(+) and t1.c1 = 2

**Q9: select VW_JF_1.item_2, t2.c2 from t2,**

(select t1.c1 item_1, t1.c2 item_2

from t1

where t1.c1 = 1

union all

select t1.c1 item_1, t1.c2 item_2

from t1

where t1.c1 = 2) VW_JF_1

where VW_JF_1.item_1 = t2.c1(+)

(select t1.c1 item_1, t1.c2 item_2

from t1

where t1.c1 = 1

union all

select t1.c1 item_1, t1.c2 item_2

from t1

where t1.c1 = 2) VW_JF_1

where VW_JF_1.item_1 = t2.c1(+)

All of the examples in this blog show factorizing a single table from two branches. This is just for ease of illustration. Join factorization can factorize multiple tables and from more than two UNION ALL branches.

**Summary**Join factorization is a cost-based transformation. It can factorize common computations from branches in a UNION ALL query which can lead to huge performance improvement.