X

Insights into Statistics, Query Optimization and the Oracle Optimizer

How do I know if the cardinality estimates in a plan are accurate?

Maria Colgan
Master Product Manager

It is often recommended when you are tuning a SQL statement to confirm that the Optimizer's cardinality estimates, in the execution plan, are accurate. But how do you do that?

It is probably easier to answer this question with an example. Let's take the following example of a simple two table join.

SELECT   p.prod_name, SUM(s.quantity_sold)
FROM     sales s, products p
WHERE    s.prod_id =p.prod_id
AND      p.prod_desc = 'Envoy Ambassador'
GROUP By p.prod_name ;

The execution plan for the query is as follows:

The Optimizer's cardinality estimate is shown in the column Rows in the execution plan. For each line of the plan, the value in this column indicates the number of rows the Optimizer estimates will be returned by that operation. Traditionally you would have been told to do select count(*) style queries to confirm the cardinality estimates for each line in the plan . For example, you could confirm the cardinality estimate of 1 on line 3 of the above plan by issuing the following SQL statement.

Having to do a select count(*) query for every aspect of a complex plan can be extremely painful. However, life was made considerably easier with the introduction of the GATHER_PLAN_STATISTICS hint in Oracle Database 10g. The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement. These execution statistics are then shown next to the original Optimizer estimates in the execution plan if you use the function DBMS_XPLAN.DISPLAY_CURSOR to display the plan. You also have to set the  FORMAT parameter to 'ALLSTATS LAST' (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')). The original Optimizer estimates are shown in the E-Rows column while the actual statistics gathered during execution are shown in the A-Rows column.

If we add the GATHER_PLAN_STATISTICS hint to our simple SQL statement we should be able to see the actual cardinality of each operation at execution time alongside the Optimizer estimates for each cardinality in the plan.

SELECT     /*+ GATHER_PLAN_STATISTICS */ p.prod_name, SUM(s.quantity_sold)
FROM       sales s, products p
WHERE      s.prod_id =p.prod_id
AND        p.prod_desc = 'Envoy Ambassador'
GROUP By p.prod_name ;
The execution plan for the query is as follows:

In our example the Optimizer did an excellent job at the single table access cardinality estimates (line 3 & 5 of the plan), which are identical to the actual execution statistics. The estimate for the HASH JOIN on line 2 of the plan is slightly over but in the same ball park. You want the cardinality estimate to be at the same order of magnitude as the actual execution statistics. So in this case, the Optimizer estimated 12,941 rows when there were actually 9,591 rows.

Gotchas

Before you run off and start inserting the GATHER_PLAN_STATISTICS hint into every SQL statement you execute, you should be aware of some of the gotchas.

Gathering the execution statistics does incur some overhead. The amount of overhead is dependent on the operating system. Jonathan Lewis discusses this in more detail on his blog.

The second gotcha comes if you use Parallel Execution. Again it is probably easier to explain this using our example. If we were to do an 'Alter session force parallel query;' command before executing our simple query the information in the A-Rows column would look very different.

Alter session force parallel query;

SELECT   /*+ GATHER_PLAN_STATISTICS */ p.prod_name, SUM(s.quantity_sold)
FROM     sales s, products p
WHERE    s.prod_id =p.prod_id
AND      p.prod_desc = 'Envoy Ambassador'
GROUP By p.prod_name ;
SELECT *
FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));

 

As you can see from the parallel execution plan above the majority of the rows in the table have 0 in the A-Rows column. This is caused by the nature of Parallel Execution and how it interacts with the FORMAT we requested in the DBMS_XPLAN.DISPLAY_CURSOR command. The FORMAT parameter value 'ALLSTATS LAST' requests that the execution statistics of the last execution of the cursor be shown. In parallel execution the last process to execute the cursor is the Query Coordinator (QC). Typically the QC (a single process) executes only a small number of the operations in the execution plan, while the majority of the operations are done by the parallel server processes. So, when we issue the the  DBMS_XPLAN.DISPLAY_CURSOR and ask for the last execution we only get information about the operations in the plan that the QC actually executed. In this case the only operation the QC did was return the final result to our SQL*Plus session, which is why only line 0 and 1 have entries in the A-Rows column.

In order to see the A-Rows values for all of the operations in the plan you would have to use the FORMAT value  'ALLSTATS ALL', which will show you the execution statistics for ALL executions of the cursor. Be careful here as the values in the A-Rows column is now cumulative (incremented on every execution). The more the statement is executed the higher the values in the A-Rows column will become. This FORMAT will also show you a lot more columns in the execution plan. A much simpler approach to check cardinality estimates for parallel statements is to use SQL Monitor. It always shows the A-Rows column and you don't have to worry about it being cumulative. More information on using SQL Monitor can be found on Doug Burns blog and Greg Rahn's blog. The SQL Monitor plan for the parallel version of our SQL statement is below.

Join the discussion

Comments ( 13 )
  • guest Thursday, December 8, 2011

    The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement.

    Does it mean it starts to gather statistics of all tables in sql by using

    dbms_stats.gather_table_stats??


  • Maria Colgan Tuesday, December 13, 2011

    No the GATHER_PLAN_STATISTICS hint doesn't gather optimizer statistics as in those gather by dbms_stats.gather_table_stats. The GATHER_PLAN_STATISTICS hint captures execution statistics. For example, the number of rows actually returned by each operation in the execution plan.


  • guest Monday, January 9, 2012

    I tested the gather_plan_statistics hint on a query and looked at the plan which showed:

    -----------------------------------------------------------------------------------------------------------------------------------------

    | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |

    -----------------------------------------------------------------------------------------------------------------------------------------

    |* 1 | HASH JOIN | | 1 | 30M| 41555 |00:02:42.06 | 115K| 112K| 904K| 904K| 1237K (0)|

    | 2 | TABLE ACCESS FULL | O_15_OBJ_DIM | 1 | 364 | 364 |00:00:00.02 | 16 | 14 | | | |

    |* 3 | HASH JOIN | | 1 | 30M| 41555 |00:02:41.93 | 115K| 112K| 889K| 889K| 1229K (0)|

    |* 4 | TABLE ACCESS FULL | T_99_TIME_DIM | 1 | 143 | 192 |00:00:15.16 | 5797 | 5788 | | | |

    | 5 | INDEX FAST FULL SCAN| PK_R_99_STAT_RES | 1 | 30M| 30M|00:01:31.31 | 109K| 107K| | | |

    -----------------------------------------------------------------------------------------------------------------------------------------

    The estimate for the actual objects is close, but for the hash join it is way off .. why could this be?


  • Maria Colgan Monday, January 9, 2012

    Without seeing the actual query text it is hard to know why the hash join estimate is so off. It could because multiple columns were used in the join predicate or a number of other reason.


  • fritz Thursday, April 26, 2012

    Is it possible to use the actual rows to improve the execution plan? (I have correlated data, but 10g, so I can't use the new extended statsitics.)


  • John Tornblad Wednesday, August 15, 2012

    Is SQL Monitor / v$sql_monitor the "source" of A-Rows information? We have a very long running, complex parallel query running on hundreds of millions of rows. We've had a hard time making and keeping the query visible in the SQL Monitor to examine A-Rows.

    Using dbms_xplan.display_cursor(sql_id=>'xxx',format=>'allstats all')

    ... on the query while it is running does not display A-Rows. Is it supposed to? Our query is hinted to GATHER_PLAN_STATISTICS.

    First we learned of _sqlmon_max_planlines that limits plans shown in SQL Monitor to 300 lines or less (ours is 1500 lines). After changing that we saw the plan in SQL Monitor for about 90 minutes.. then it disappeared! Another support request lead to _sqlmon_recycle_time which we increased from 30 minutes to 12 hours in the hope that we'll be able to see the executing query for a longer amount of time.

    Bottom line, if the query is not visible in OEM's SQL Monitor / v$sql_monitor, then is there no other source for A-Rows?

    Thanks! Really enjoy the Optimizer blog.


  • Srikar Dasari Tuesday, April 9, 2013

    Hi Maria,

    Congratulations to you and your team on a very interesting blog, it is is very helpful. I enjoyed reading all the articles over this blog.

    Keep up the sounding good work, request you to write more articles.

    Regards,

    Srikar Dasari


  • guest Thursday, June 20, 2013

    It's wonderful blog. Thanks for sharing information with detailed example. As part of dataware house developer, I need to query huge partitioned and sub-partitioned tables in Oracle exadata database. I always use sql_monitor to see performance of long running query.

    There are many new cases in exadata such as dynamic sampling, MBRC value in aux_stats$ table which actually makes full table scan better option and HCC decompression.

    It would be nice to see some topics on these.

    Thanks.


  • Ted Thursday, July 11, 2013

    I'm unclear on the second execution plan output, in which you state that line five is an excellent cardinality estimate. Don't you need first to divide A-Rows by Starts, thus resulting in

    918,000 / 28 = ~ 32,786

    which is then two orders of magnitude under the estimate (E-Rows value)of 918,000?

    Thanks much.


  • guest Saturday, August 3, 2013

    Hi,

    We have issue with an SQL query and checking with DBMS_SQLTUNE package, it was advice that statistics are stale and to gather statistics.

    I have tried that SQL with GATHER_PLAN_STATISTICS hint but total execution of the Query increased by near to double.

    May I please request to explore the issue.

    Regards,

    Jay vardhan


  • guest Thursday, February 20, 2014

    Are there any licensing issues while on Standard edition using FORMAT parameter 'ALLSTATS LAST' in (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')).

    Raj


  • Gouranga Mohapatra Friday, February 23, 2018
    Hi,
    I am using following:
    explain plan for
    ;
    but, I am not getting estimated and actual rows. Only rows column is coming only with others.

    Please suggest.
  • Nigel Bayliss Monday, March 12, 2018
    Hi Gouranga, "explain plan" doesn't execute the query - it just shows you the plan. To get the plan statistics (such as A-Rows) the query must physically execute. So, run the query...

    SELECT /*+ gather_plan_statistics */ ...

    ...and then display the plan along with the statistics that were gathered while it executed...

    SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));

    Regards,
    Nigel
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
Oracle

Integrated Cloud Applications & Platform Services