It is often recommended when you are tuning a SQL statement to confirm that the Optimizer's cardinality estimates, in the execution plan, are accurate. But how do you do that?
It is probably easier to answer this question with an example. Let's take the following example of a simple two table join.
SELECT p.prod_name, SUM(s.quantity_sold) FROM sales s, products p WHERE s.prod_id =p.prod_id AND p.prod_desc = 'Envoy Ambassador' GROUP By p.prod_name ;
The execution plan for the query is as follows:
The Optimizer's cardinality estimate is shown in the column Rows in the execution plan. For each line of the plan, the value in this column indicates the number of rows the Optimizer estimates will be returned by that operation. Traditionally you would have been told to do select count(*) style queries to confirm the cardinality estimates for each line in the plan . For example, you could confirm the cardinality estimate of 1 on line 3 of the above plan by issuing the following SQL statement.
Having to do a select count(*) query for every aspect of a complex plan can be extremely painful. However, life was made considerably easier with the introduction of the GATHER_PLAN_STATISTICS hint in Oracle Database 10g. The GATHER_PLAN_STATISTICS hint tells Oracle to collect execution statistics for a SQL statement. These execution statistics are then shown next to the original Optimizer estimates in the execution plan if you use the function DBMS_XPLAN.DISPLAY_CURSOR to display the plan. You also have to set the FORMAT parameter to 'ALLSTATS LAST' (DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')). The original Optimizer estimates are shown in the E-Rows column while the actual statistics gathered during execution are shown in the A-Rows column.
If we add the GATHER_PLAN_STATISTICS hint to our simple SQL statement we should be able to see the actual cardinality of each operation at execution time alongside the Optimizer estimates for each cardinality in the plan.
SELECT /*+ GATHER_PLAN_STATISTICS */ p.prod_name, SUM(s.quantity_sold) FROM sales s, products p WHERE s.prod_id =p.prod_id AND p.prod_desc = 'Envoy Ambassador' GROUP By p.prod_name ;The execution plan for the query is as follows:
In our example the Optimizer did an excellent job at the single table access cardinality estimates (line 3 & 5 of the plan), which are identical to the actual execution statistics. The estimate for the HASH JOIN on line 2 of the plan is slightly over but in the same ball park. You want the cardinality estimate to be at the same order of magnitude as the actual execution statistics. So in this case, the Optimizer estimated 12,941 rows when there were actually 9,591 rows.
Before you run off and start inserting the GATHER_PLAN_STATISTICS hint into every SQL statement you execute, you should be aware of some of the gotchas.
Gathering the execution statistics does incur some overhead. The amount of overhead is dependent on the operating system. Jonathan Lewis discusses this in more detail on his blog.
The second gotcha comes if you use Parallel Execution. Again it is probably easier to explain this using our example. If we were to do an 'Alter session force parallel query;' command before executing our simple query the information in the A-Rows column would look very different.
Alter session force parallel query;
SELECT /*+ GATHER_PLAN_STATISTICS */ p.prod_name, SUM(s.quantity_sold) FROM sales s, products p WHERE s.prod_id =p.prod_id AND p.prod_desc = 'Envoy Ambassador' GROUP By p.prod_name ;
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
As you can see from the parallel execution plan above the majority of the rows in the table have 0 in the A-Rows column. This is caused by the nature of Parallel Execution and how it interacts with the FORMAT we requested in the DBMS_XPLAN.DISPLAY_CURSOR command. The FORMAT parameter value 'ALLSTATS LAST' requests that the execution statistics of the last execution of the cursor be shown. In parallel execution the last process to execute the cursor is the Query Coordinator (QC). Typically the QC (a single process) executes only a small number of the operations in the execution plan, while the majority of the operations are done by the parallel server processes. So, when we issue the the DBMS_XPLAN.DISPLAY_CURSOR and ask for the last execution we only get information about the operations in the plan that the QC actually executed. In this case the only operation the QC did was return the final result to our SQL*Plus session, which is why only line 0 and 1 have entries in the A-Rows column.
In order to see the A-Rows values for all of the operations in the plan you would have to use the FORMAT value 'ALLSTATS ALL', which will show you the execution statistics for ALL executions of the cursor. Be careful here as the values in the A-Rows column is now cumulative (incremented on every execution). The more the statement is executed the higher the values in the A-Rows column will become. This FORMAT will also show you a lot more columns in the execution plan. A much simpler approach to check cardinality estimates for parallel statements is to use SQL Monitor. It always shows the A-Rows column and you don't have to worry about it being cumulative. More information on using SQL Monitor can be found on Doug Burns blog and Greg Rahn's blog. The SQL Monitor plan for the parallel version of our SQL statement is below.