How do I know what extended statistics are needed for a given workload?

In our previous post we introduced extended statistics, which help the Optimizer improve the accuracy of cardinality estimates for SQL statements that contain predicates involving a function-wrapped column (e.g. UPPER(LastName)) or multiple columns from the same table used in filter predicates, join conditions, or group-by keys. Extended statistics are extremely useful, but how do you know which extended statistics should be created?

In Oracle Database we introduced Auto Column Group Creation, which automatically determines which column groups are required for a table based on a given workload. Please note that this functionality does not create extended statistics for function-wrapped columns; it is only for column groups. Auto Column Group Creation is a simple three-step process:

1. Seed column usage
Oracle must observe a representative workload in order to determine the appropriate column groups. Using the new procedure DBMS_STATS.SEED_COL_USAGE, you tell Oracle how long it should observe the workload. The following example turns on monitoring for 5 minutes (300 seconds). The information this procedure records differs from the traditional column usage information you see in sys.col_usage$; it is stored in sys.col_group_usage$.
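A minimal sketch of the monitoring call, as you might run it from SQL*Plus or SQLcl. Passing NULL for the SQL tuning set and owner arguments means no SQL tuning set is supplied, so Oracle observes the statements issued during the window; the 300-second limit matches the 5-minute window described above.

```sql
-- Record column group usage for the next 300 seconds (5 minutes).
-- sqlset_name and owner_name are NULL: no SQL tuning set is supplied,
-- so Oracle captures usage from statements parsed during the window.
BEGIN
  DBMS_STATS.SEED_COL_USAGE(
    sqlset_name => NULL,
    owner_name  => NULL,
    time_limit  => 300);
END;
/
```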


You don't need to execute all of the queries in your workload during this window. You can simply run EXPLAIN PLAN for some of your longer-running queries to ensure column group information is recorded for them. The example below uses two queries that run against the customers_test table (which is a copy of the customers table in the SH schema).
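The two queries below are an illustrative sketch, not necessarily the exact statements from the original example: the literal values are assumptions, chosen so that one query filters on the correlated CUST_CITY, CUST_STATE_PROVINCE, and COUNTRY_ID columns and the other groups by two correlated columns. EXPLAIN PLAN only parses and optimizes each statement without running it, which is enough for the usage to be recorded during the window.

```sql
-- Assumes customers_test was created beforehand, e.g.:
--   CREATE TABLE customers_test AS SELECT * FROM sh.customers;

-- Query 1 (illustrative values): filter predicates on three correlated columns.
EXPLAIN PLAN FOR
SELECT *
FROM   customers_test
WHERE  cust_city = 'Los Angeles'
AND    cust_state_province = 'CA'
AND    country_id = 52790;

-- Query 2: GROUP BY on two correlated columns.
EXPLAIN PLAN FOR
SELECT country_id, cust_state_province, COUNT(cust_city)
FROM   customers_test
GROUP  BY country_id, cust_state_province;
```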




Once the monitoring window has finished, you can review the column usage information recorded for a specific table using the new function DBMS_STATS.REPORT_COL_USAGE. This function generates a report that lists which columns from the table were seen in filter predicates, join predicates, and GROUP BY clauses in the workload.
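A minimal sketch of requesting the report from SQL*Plus or SQLcl, assuming the current user owns CUSTOMERS_TEST (hence USER as the schema argument). The function returns a CLOB, so raising LONG keeps the report from being truncated.

```sql
-- REPORT_COL_USAGE returns a CLOB; raise LONG so the whole report is displayed.
SET LONG 100000
SET LINESIZE 200
SET PAGESIZE 0

-- Column usage (filter, join, GROUP BY, column groups) seen for one table.
SELECT DBMS_STATS.REPORT_COL_USAGE(USER, 'CUSTOMERS_TEST')
FROM   dual;
```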


It is also possible to view a report for all the tables in a specific schema by running DBMS_STATS.REPORT_COL_USAGE and providing just the schema name and NULL for the table name.
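The schema-wide variant is the same call with NULL as the table name; this sketch again assumes the tables belong to the current user.

```sql
SET LONG 100000
SET PAGESIZE 0

-- NULL table name: report column usage for every table in the schema.
SELECT DBMS_STATS.REPORT_COL_USAGE(USER, NULL)
FROM   dual;
```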

2. Create the column groups
At this point you can get Oracle to automatically create the column groups for each table based on the usage information captured during the monitoring window. You simply call the DBMS_STATS.CREATE_EXTENDED_STATS function for each table. This function requires just two arguments, the schema name and the table name. From then on, statistics will be maintained for each column group whenever statistics are gathered on the table. In this example, two column groups were created based on the information captured from the two queries in the workload.
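A sketch of the per-table call, assuming the current user owns CUSTOMERS_TEST. CREATE_EXTENDED_STATS returns a CLOB report that includes the system-generated name of each column group it creates; querying USER_STAT_EXTENSIONS afterwards is one way to confirm which extensions now exist on the table.

```sql
SET LONG 100000
SET PAGESIZE 0

-- Create every column group recorded for CUSTOMERS_TEST during monitoring.
-- The returned CLOB lists each extension and its system-generated name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, 'CUSTOMERS_TEST')
FROM   dual;

-- Confirm the column groups now defined on the table.
SELECT extension_name, extension
FROM   user_stat_extensions
WHERE  table_name = 'CUSTOMERS_TEST';
```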


It is also possible to create all of the proposed column groups for a particular schema in one shot by running the DBMS_STATS.CREATE_EXTENDED_STATS function and passing NULL for the table name.
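The schema-wide form, sketched under the same assumption that the tables belong to the current user:

```sql
SET LONG 100000
SET PAGESIZE 0

-- NULL table name: create the proposed column groups for all tables in the schema.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS(USER, NULL)
FROM   dual;
```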

3. Regather statistics
The final step is to regather statistics on the affected tables so that the newly created column groups will have statistics created for them.
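A minimal sketch for the example table, assuming it belongs to the current user. METHOD_OPT is left at its default (FOR ALL COLUMNS SIZE AUTO, unless that preference has been changed), which lets Oracle use the recorded column usage to decide where histograms are worthwhile.

```sql
-- Regather table statistics so the new column groups get statistics too.
-- METHOD_OPT is left at its default so Oracle decides where histograms help.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'CUSTOMERS_TEST');
END;
/
```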


Once the statistics have been gathered, check the USER_TAB_COL_STATISTICS view to see what additional statistics were created. In this example you will see two new columns listed for the customers_test table. Both columns have system-generated names, the same names that were returned by the DBMS_STATS.CREATE_EXTENDED_STATS function.
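One way to list only the new column-group statistics is to join USER_TAB_COL_STATISTICS with USER_STAT_EXTENSIONS on the system-generated name; this sketch assumes the table belongs to the current user. The HISTOGRAM column shows which groups received a histogram.

```sql
-- Statistics for the column groups only, with each group's column list.
SELECT s.column_name,
       e.extension,
       s.num_distinct,
       s.histogram
FROM   user_tab_col_statistics s
JOIN   user_stat_extensions e
  ON   e.table_name     = s.table_name
 AND   e.extension_name = s.column_name
WHERE  s.table_name = 'CUSTOMERS_TEST';
```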


You will also notice that one of the column groups has a height-balanced histogram created on it. This column group was created on CUST_CITY, CUST_STATE_PROVINCE, and COUNTRY_ID. While monitoring column groups, Oracle also records whether a histogram may be useful for a column group, and subsequent statistics collection creates a histogram for that group. Now that the column groups are in place, let's see if they improved the cardinality estimates for the two queries we used in the monitoring window.
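A sketch of that check for a filter query on the three grouped columns. The predicate values are illustrative assumptions, not the original example's literals: compare the Rows estimate that DBMS_XPLAN.DISPLAY reports with the actual row count.

```sql
EXPLAIN PLAN FOR
SELECT *
FROM   customers_test
WHERE  cust_city = 'Los Angeles'
AND    cust_state_province = 'CA'
AND    country_id = 52790;

-- The Rows column shows the optimizer's cardinality estimate.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Actual number of matching rows, for comparison with the estimate.
SELECT COUNT(*)
FROM   customers_test
WHERE  cust_city = 'Los Angeles'
AND    cust_state_province = 'CA'
AND    country_id = 52790;
```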


In both cases the cardinality estimate is far more accurate than without extended statistics.



I would like to understand how extended statistics impact dynamic sampling in Oracle 11gR2.

Posted by guest on February 08, 2012 at 02:29 PM PST #


I would like to know if it is possible to create extended stats on multiple columns where one of the columns is an expression. For example, can we gather extended stats for (UPPER(ename), empid)?


Posted by guest on March 21, 2012 at 06:43 AM PDT #

Hi Senthil,

No, unfortunately column groups are not supported on expressions.


Posted by Maria Colgan on March 21, 2012 at 04:50 PM PDT #


Thanks for your reply! We were getting a weird "missing parenthesis" error message when we tried to create extended statistics for a mixed group of columns that involved an expression. Just wanted to check with you on this.

Thanks again,

Posted by Senthil on March 21, 2012 at 05:26 PM PDT #

Hi Maria,

I tried to implement this the way you explained and made the following observation. The columns in every column group reported by SEED_COL_USAGE and REPORT_COL_USAGE are in alphabetical order. For example, the actual column order in the WHERE clause is "RECEIVING_ORGANIZATION","IS_LATEST","IS_VIEWED","IS_PRINTED","IS_ABNORMAL", but the extended stats were created for "IS_ABNORMAL","IS_LATEST","IS_PRINTED","IS_VIEWED","RECEIVING_ORGANIZATION". I tried to drop the already created column groups using DROP_EXTENDED_STATS, but I got this error:
ORA-00001: unique constraint (SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST) violated
ORA-06512: at "SYS.DBMS_STATS", line 8576
ORA-06512: at "SYS.DBMS_STATS", line 8630
ORA-06512: at "SYS.DBMS_STATS", line 32711
ORA-06512: at line 1

I tried deleting the table stats, but the column groups that were already created were not deleted (they still appear in USER_STAT_EXTENSIONS). I thought of removing entries from WRI$_OPTSTAT_HISTHEAD_HISTORY, but I am afraid to do so since I am not sure about the side effects. Your help will be highly appreciated. I am still struggling with how to drop these extended stats. Thanks!

Posted by Karthick on August 16, 2013 at 12:53 PM PDT #

Works great, thanks for a good example. We will implement column groups to reduce wait time and overall DB time, and to save user CPU time.

Posted by Arvind Singh on September 15, 2015 at 11:46 AM PDT #

Nice insights, thank you.

Posted by Foued on May 27, 2016 at 04:05 AM PDT #


The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

