In our previous post we introduced extended statistics, which help the Optimizer improve the accuracy of cardinality estimates for SQL statements that contain predicates involving a function wrapped column (e.g. UPPER(LastName)) or multiple columns from the same table used in filter predicates, join conditions, or group-by keys. So extended statistics are extremely useful but how do you know which extended statistics should be created?
In Oracle Database 11.2.0.2 we introduced Auto Column Group Creation, which automatically determines which column groups are required for a table based on a given workload. Please note this functionality does not create extended statistics for function wrapped columns it is only for column groups. Auto Column Group Creation is a simple three step process:
1. Seed column usage
Oracle must observe a representative workload, in order to determine the appropriate column groups. Using the new procedure DBMS_STATS.SEED_COL_USAGE, you tell Oracle how long it should observe the workload. The following example turns on monitoring for 5 minutes or 300 seconds. This monitoring procedure records different information from the traditional column usage information you see in sys.col_usage$ and it is stored in sys.col_group_usage$.
You don't need to execute all of the queries in your work during this window. You can simply run explain plan for some of your longer running queries to ensure column group information is recorded for these queries. The example below uses two queries that run against the customers_test table (which is a copy of the customers table in the SH schema).
Once the monitoring window has finished, it is possible to review the column usage information recorded for a specific table using the new function DBMS_STATS.REPORT_COL_USAGE. This function generates a report, which lists what columns from the table were seen in filter predicates, join predicates and group by clauses in the workload.
It is also possible to view a report for all the tables in a specific schema by running DBMS_STATS.REPORT_COL_USAGE and providing just the schema name and NULL for the table name.
2. Create the column groups
At this point you can get Oracle to automatically create the column groups for each of the tables based on the usage information captured during the monitoring window. You simply have to call the DBMS_STATS.CREATE_EXTENDED_STATS function for each table.This function requires just two arguments, the schema name and the table name. From then on, statistics will be maintained for each column group whenever statistics are gathered on the table. In this example you will see two column groups were created based on the information captured from the two queries in this workload.
It is also possible to create all of the proposed column groups for a particular schema in one shot by running the DBMS_STATS.CREATE_EXTENDED_STATS function and passing NULL for the table name.
3. Regather statistics
The final step is to regather statistics on the affected tables so that the newly created column groups will have statistics created for them.
Once the statistics have been gathered you should check out the USER_TAB_COL_STATISTICS view to see what additional statistics were created. In this example you will see two new columns listed for the customers_test table. Both columns have system generated names, the same names that were returned from the DBMS_STATS.CREATE_EXTENDED_STATS function.
You will also notice that one of the column groups has a height-based histogram created on it. This column group was created on CUST_CITY, CUST_STATE_PROVINCE, and COUNTRY_ID. While monitoring column groups we also monitor the fact that histogram may be potentially useful for the column groups and subsequent statistics collection create a histogram for the group. So now that the column groups are in place, let's see if they improved the cardinality estimates for the two queries we used in the monitoring window.
In both cases the cardinality estimate is far more accurate than without extended statistics.
Maria Colgan+
Maria,
Would like to understand how extended statistics impact dynamic sampling in Oracle 11gR2.
Maria:
I would like to know if it is possible to create extended stats on multiple columns that involves an expression on one of the columns. For example, can we gather extended stats for (UPPER(ename),empid)?
Thanks,
Senthil
Hi Senthil,
No unfortunately, column groups are not supported on expressions.
Thanks,
Maria
Maria:
Thanks for your reply! We were getting weird missing paranthesis error message when we tried to create extended statistics for a mixed group of columns that involved an expression. Just wanted to check with you on this.
Thanks again,
Senthil
Hi Maria,
I try to implement the way you explained and found below observation. All the column orders in a column grouping reported by seed-col-usage and report_col_usage is in alphabetical order. Ex: Actual column order in WHERE clause => "RECEIVING_ORGANIZATION","IS_LATEST","IS_VIEWED","IS_PRINTED","IS_ABNORMAL"; But extended stats created for "IS_ABNORMAL","IS_LATEST","IS_PRINTED","IS_VIEWED","RECEIVING_ORGANIZATION". I thought of drop the already created extended stats column groups using DROP_EXTENDED_STATS, but I got this error:
ORA-00001: unique constraint (SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST) violated
ORA-06512: at "SYS.DBMS_STATS", line 8576
ORA-06512: at "SYS.DBMS_STATS", line 8630
ORA-06512: at "SYS.DBMS_STATS", line 32711
ORA-06512: at line 1
I try to delete table stats, but already created extended stats column groups didn't get delete (in USER_STAT_EXTENSIONS), I thought of removing entries from WRI$_OPTSTAT_HISTHEAD_HISTORY, but afraid to do so since I am not sure about the side effects. Your help will be highly appreciated on this. I am still struggling on how to drop this extended stats. Thanks!
Works great thanks for a good example. We will implement column groups and reduce wait time and over all DB time , preserve CPU user clocks.
Nice insights, thank you,
Foued