How do I know if the cardinality estimates in a plan are accurate?

It is often recommended when you are tuning a SQL statement to confirm that the Optimizer's cardinality estimates, in the execution plan, are accurate. But how do you do that?

It is probably easier to answer this question with an example. Let's take the following example of a simple two table join.

SELECT      p.prod_name, SUM(s.quantity_sold)
FROM         sales s, products p
WHERE      s.prod_id =p.prod_id
AND            p.prod_desc = 'Envoy Ambassador'
GROUP By p.prod_name ;


The execution plan for the query is as follows:


The Optimizer's cardinality estimate is shown in the column Rows in the execution plan. For each line of the plan, the value in this column indicates the number of rows the Optimizer estimates will be returned by that operation. Traditionally you would have been told to do select count(*) style queries to confirm the cardinality estimates for each line in the plan . For example, you could confirm the cardinality estimate of 1 on line 3 of the above plan by issuing the following SQL statement.


Having to do a select count(*) query for every aspect of a complex plan can be extremely painful. However, life was made considerably easier with the introduction of the GATHER_PLAN_STATISTICS hint in Oracle Database 10g. The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement. These execution statistics are then shown next to the original Optimizer estimates in the execution plan if you use the function DBMS_XPLAN.DISPLAY_CURSOR to display the plan. You also have to set the  FORMAT parameter to 'ALLSTATS LAST' (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')). The original Optimizer estimates are shown in the E-Rows column while the actual statistics gathered during execution are shown in the A-Rows column.

If we add the GATHER_PLAN_STATISTICS hint to our simple SQL statement we should be able to see the actual cardinality of each operation at execution time alongside the Optimizer estimates for each cardinality in the plan.

SELECT /*+ GATHER_PLAN_STATISTICS */ p.prod_name, SUM(s.quantity_sold)
FROM         sales s, products p
WHERE      s.prod_id =p.prod_id
AND            p.prod_desc = 'Envoy Ambassador'
GROUP By p.prod_name ;

The execution plan for the query is as follows:



In our example the Optimizer did an excellent job at the single table access cardinality estimates (line 3 & 5 of the plan), which are identical to the actual execution statistics. The estimate for the HASH JOIN on line 2 of the plan is slightly over but in the same ball park. You want the cardinality estimate to be at the same order of magnitude as the actual execution statistics. So in this case, the Optimizer estimated 12,941 rows when there were actually 9,591 rows.


Gotchas


Before you run off and start inserting the GATHER_PLAN_STATISTICS hint into every SQL statement you execute, you should be aware of some of the gotchas.

Gathering the execution statistics does incur some overhead. The amount of overhead is dependent on the operating system. Jonathan Lewis discusses this in more detail on his blog.

The second gotcha comes if you use Parallel Execution. Again it is probably easier to explain this using our example. If we were to do an 'Alter session force parallel query;' command before executing our simple query the information in the A-Rows column would look very different.


Alter session force parallel query;


SELECT /*+ GATHER_PLAN_STATISTICS */ p.prod_name, SUM(s.quantity_sold)
FROM         sales s, products p
WHERE      s.prod_id =p.prod_id
AND            p.prod_desc = 'Envoy Ambassador'
GROUP By p.prod_name ;


SELECT *
FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));



As you can see from the parallel execution plan above the majority of the rows in the table have 0 in the A-Rows column. This is caused by the nature of Parallel Execution and how it interacts with the FORMAT we requested in the DBMS_XPLAN.DISPLAY_CURSOR command. The FORMAT parameter value 'ALLSTATS LAST' requests that the execution statistics of the last execution of the cursor be shown. In parallel execution the last process to execute the cursor is the Query Coordinator (QC). Typically the QC (a single process) executes only a small number of the operations in the execution plan, while the majority of the operations are done by the parallel server processes. So, when we issue the the  DBMS_XPLAN.DISPLAY_CURSOR and ask for the last execution we only get information about the operations in the plan that the QC actually executed. In this case the only operation the QC did was return the final result to our SQL*Plus session, which is why only line 0 and 1 have entries in the A-Rows column.


In order to see the A-Rows values for all of the operations in the plan you would have to use the FORMAT value  'ALLSTATS ALL', which will show you the execution statistics for ALL executions of the cursor. Be careful here as the values in the A-Rows column is now cumulative (incremented on every execution). The more the statement is executed the higher the values in the A-Rows column will become. This FORMAT will also show you a lot more columns in the execution plan. A much simpler approach to check cardinality estimates for parallel statements is to use SQL Monitor. It always shows the A-Rows column and you don't have to worry about it being cumulative. More information on using SQL Monitor can be found on Doug Burns blog and Greg Rahn's blog. The SQL Monitor plan for the parallel version of our SQL statement is below.


Comments:

The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement.

Does it mean it starts to gather statistics of all tables in sql by using

dbms_stats.gather_table_stats??

Posted by guest on December 08, 2011 at 02:06 AM PST #

No the GATHER_PLAN_STATISTICS hint doesn't gather optimizer statistics as in those gather by dbms_stats.gather_table_stats. The GATHER_PLAN_STATISTICS hint captures execution statistics. For example, the number of rows actually returned by each operation in the execution plan.

Posted by Maria Colgan on December 13, 2011 at 01:43 PM PST #

I tested the gather_plan_statistics hint on a query and looked at the plan which showed:

-----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------
|* 1 | HASH JOIN | | 1 | 30M| 41555 |00:02:42.06 | 115K| 112K| 904K| 904K| 1237K (0)|
| 2 | TABLE ACCESS FULL | O_15_OBJ_DIM | 1 | 364 | 364 |00:00:00.02 | 16 | 14 | | | |
|* 3 | HASH JOIN | | 1 | 30M| 41555 |00:02:41.93 | 115K| 112K| 889K| 889K| 1229K (0)|
|* 4 | TABLE ACCESS FULL | T_99_TIME_DIM | 1 | 143 | 192 |00:00:15.16 | 5797 | 5788 | | | |
| 5 | INDEX FAST FULL SCAN| PK_R_99_STAT_RES | 1 | 30M| 30M|00:01:31.31 | 109K| 107K| | | |
-----------------------------------------------------------------------------------------------------------------------------------------

The estimate for the actual objects is close, but for the hash join it is way off .. why could this be?

Posted by guest on January 09, 2012 at 04:24 AM PST #

Without seeing the actual query text it is hard to know why the hash join estimate is so off. It could because multiple columns were used in the join predicate or a number of other reason.

Posted by Maria Colgan on January 09, 2012 at 04:50 AM PST #

Is it possible to use the actual rows to improve the execution plan? (I have correlated data, but 10g, so I can't use the new extended statsitics.)

Posted by fritz on April 26, 2012 at 01:17 AM PDT #

Is SQL Monitor / v$sql_monitor the "source" of A-Rows information? We have a very long running, complex parallel query running on hundreds of millions of rows. We've had a hard time making and keeping the query visible in the SQL Monitor to examine A-Rows.

Using dbms_xplan.display_cursor(sql_id=>'xxx',format=>'allstats all')

... on the query while it is running does not display A-Rows. Is it supposed to? Our query is hinted to GATHER_PLAN_STATISTICS.

First we learned of _sqlmon_max_planlines that limits plans shown in SQL Monitor to 300 lines or less (ours is 1500 lines). After changing that we saw the plan in SQL Monitor for about 90 minutes.. then it disappeared! Another support request lead to _sqlmon_recycle_time which we increased from 30 minutes to 12 hours in the hope that we'll be able to see the executing query for a longer amount of time.

Bottom line, if the query is not visible in OEM's SQL Monitor / v$sql_monitor, then is there no other source for A-Rows?

Thanks! Really enjoy the Optimizer blog.

Posted by John Tornblad on August 15, 2012 at 02:27 PM PDT #

Hi Maria,

Congratulations to you and your team on a very interesting blog, it is is very helpful. I enjoyed reading all the articles over this blog.

Keep up the sounding good work, request you to write more articles.

Regards,
Srikar Dasari

Posted by Srikar Dasari on April 09, 2013 at 03:17 AM PDT #

It's wonderful blog. Thanks for sharing information with detailed example. As part of dataware house developer, I need to query huge partitioned and sub-partitioned tables in Oracle exadata database. I always use sql_monitor to see performance of long running query.
There are many new cases in exadata such as dynamic sampling, MBRC value in aux_stats$ table which actually makes full table scan better option and HCC decompression.
It would be nice to see some topics on these.
Thanks.

Posted by guest on June 20, 2013 at 12:34 AM PDT #

I'm unclear on the second execution plan output, in which you state that line five is an excellent cardinality estimate. Don't you need first to divide A-Rows by Starts, thus resulting in

918,000 / 28 = ~ 32,786

which is then two orders of magnitude under the estimate (E-Rows value)of 918,000?

Thanks much.

Posted by Ted on July 10, 2013 at 08:04 PM PDT #

Hi,

We have issue with an SQL query and checking with DBMS_SQLTUNE package, it was advice that statistics are stale and to gather statistics.
I have tried that SQL with GATHER_PLAN_STATISTICS hint but total execution of the Query increased by near to double.

May I please request to explore the issue.

Regards,
Jay vardhan

Posted by guest on August 03, 2013 at 11:37 AM PDT #

Are there any licensing issues while on Standard edition using FORMAT parameter 'ALLSTATS LAST' in (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')).

Raj

Posted by guest on February 19, 2014 at 08:50 PM PST #

Post a Comment:
  • HTML Syntax: NOT allowed
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today