X

Insights into Statistics, Query Optimization and the Oracle Optimizer

Group-by and Aggregation Elimination

Nigel Bayliss
Product Manager

I get a fair number of questions on query transformations, and it’s especially true at the moment because we’re in the middle of the Oracle Database 12c Release 2 beta program. Sometimes people notice “something missing” or unusual in an execution plan and realize that a transformation is happening. For example, Join Elimination (thanks, Christian) can take you by surprise if you notice that a table in your query doesn’t appear in the SQL execution plan at all (and neither do any of its indexes).

I’m sure you’re aware that query transformations are an important step in query optimization because it’s often possible to reduce the cost of a query by eliminating steps like joins or sorts. Sometimes changing the shape of the query will allow it to use different access paths, different types of join, and entirely different types of query blocks. We cover most transformations in our “What to expect from the Optimizer” collateral and, in particular, release-specific Optimizer white papers (here's the Oracle Database 12c one).

In Oracle Database 12.1.0.2, we added a new transformation called Group-by and Aggregation Elimination and it slipped through any mention in our collateral. It happens to be one of the simplest transformations in the Oracle Optimizer’s repertoire and I know that some of you are very well-informed and know about it already. You might have seen it getting a mention in Mike Dietrich’s upgrade blog. Let’s take a look at what the transformation does…

Many applications contain queries that have a form where a group-by query block has a single table that is a “group-by view”. Here’s an example:

SELECT v.column1, v.column2, MAX(v.sm), SUM(v.sm)
FROM (SELECT t1.column1, t1.column2, SUM(t1.item_count) AS sm
      FROM   t1, t2
      WHERE  t1.column4 > 3
      AND    t1.id = t2.id
      AND    t2.column5 > 10
      GROUP BY t1.column1, t1.column2) V
GROUP BY v.column1, v.column2;

For the transformation to be possible, the outer query must have a single group-by view, but the inner query (that forms the view) can include multiple tables. Under certain conditions, we can transform these types of query to eliminate group-by aggregations, which is an expensive operation. The resulting query is simpler and contains fewer group-by clauses and aggregate functions.  For the query above, we can transform it into something like this:

SELECT t1.column1, t1.column2, SUM(t1.item_count) AS "MAX(V.SM)", SUM(t1.item_count) AS "SUM(V.SM)"
FROM   t1, t2
WHERE  t1.column4 > 3 AND
       t1.id = t2.id AND
       t2.column5 > 10
GROUP BY t1.column1, t1.column2; 

Here’s what the SQL execution plan looks like without the transformation. In my case there are 100,000 rows in each table and the elapsed time is 2.09 seconds:

------------------------------------------------------
   Id | Operation             | Name | Rows  | Bytes |
------------------------------------------------------
|   0 | SELECT STATEMENT      |      |       |       |
|   1 |  HASH GROUP BY        |      | 66521 |  1494K|
|   2 |   VIEW                |      | 66521 |  1494K|
|   3 |    HASH GROUP BY      |      | 66521 |  2143K|
|   4 |     HASH JOIN         |      | 99800 |  3216K|
|   5 |      TABLE ACCESS FULL| T2   | 99800 |   877K|
|   6 |      TABLE ACCESS FULL| T1   | 99998 |  2343K|
------------------------------------------------------

You can see from the plan above that there are two HASH GROUP BY steps, one for the view and one for the outer query block. I used database version 12.1.0.2 for this test, so I disabled the query transformation by setting the database hidden parameter _optimizer_aggr_groupby_elim to FALSE.

Looking at the plan for the transformed query, you can see that there is only a single HASH GROUP BY step. The elapsed time was much lower too – only 1.29 seconds in my case:

----------------------------------------------------
   Id | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |
|   1 |  HASH GROUP BY      |      | 66521 |  2143K|
|   2 |   HASH JOIN         |      | 99800 |  3216K|
|   3 |    TABLE ACCESS FULL| T2   | 99800 |   877K|
|   4 |    TABLE ACCESS FULL| T1   | 99998 |  2343K|
----------------------------------------------------

The example above is relatively easy to understand because the GROUP BY columns in the view and the outer query are the same. This doesn’t have to be the case. It’s sometimes possible to transform the query even if the outer query’s GROUP BY is a subset of the view’s GROUP BY. For example:

SELECT v.column1, v.column3, MAX(v.column1), SUM(v.sm)
FROM (SELECT t1.column1, t1.column2, t1.column3, SUM(t1.item_count) AS sm
      FROM   t1, t2
      WHERE  t1.column4 > 3 AND
             t1.id = t2.id  AND
             t2.column5 > 10
      GROUP BY t1.column1, t1.column2, t1.column3) V
GROUP BY v.column1, v.column3;

----------------------------------------------------
   Id | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |
|   1 |  HASH GROUP BY      |      | 49891 |  1607K|
|*  2 |   HASH JOIN         |      | 99800 |  3216K|
|*  3 |    TABLE ACCESS FULL| T2   | 99800 |   877K|
|*  4 |    TABLE ACCESS FULL| T1   | 99998 |  2343K|
----------------------------------------------------

You don’t need to do anything special to enable this query transformation. It is enabled by default and it happens automatically whenever a particular query passes the validation criteria. In real-world, enterprise-class systems you can of course expect to see much more significant time-savings when this transformation can be applied. Note that the transformation is not applied for queries that use ROLLUP or CUBE grouping functions.

Were there problems with this transformation? Yes, there were (and this is why Mike Dietrich mentioned it). With any transformation, the Oracle Optimizer has to figure out when it can and can’t be applied and the logic behind this can be incredibly complex. The bottom line is that there were some cases where the transformation was being applied and it shouldn’t have been. Generally, this was where the outer group-by query was truncating or casting columns used by the inner group-by. This is now fixed and it’s covered by patch number 21826068. Please use MOS to check availability for your platform and database version.

Join the discussion

Comments ( 3 )
  • Lukasz Mastalerz Monday, April 25, 2016

    Hi,

    I'd expect the transformed query to have SUM and MAX and in your example there seem to be two SUM aggregates. Is this a typo or I misunderstood something?

    SELECT t1.column1, t1.column2, SUM(t1.item_count) AS "MAX(V.SM)", SUM(t1.item_count) AS "SUM(V.SM)"

    Thanks,

    Lukasz


  • Nigel Bayliss Monday, April 25, 2016

    Hi Lukasz,

    The example is correct, but the reason is not obvious. The outer group-by is grouping single-row sets only, so MAX() and SUM() will return the same value.

    If this is the result of the inner group-by:

    column1 column2 item_count SUM(item_count)

    1 5 10 10

    7 3 12 12

    Then this is the result of the outer-group by:

    column1 column2 item_count SUM(item_count) MAX(item_count)

    1 5 10 10 10

    7 3 12 12 12

    Regards,

    Nigel.


  • Govind Batte Saturday, January 13, 2018
    I'm interested in learning about hash group by
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services