How do I drop an existing histogram on a column and stop the Auto Stats gathering job from creating it in the future?

Before answering this question, I want to provide a bit more background to this scenario. The person who submitted this question has a table with a VARCHAR2 column called ID whose data distribution is skewed. When the Auto Stats job kicks in for this table a histogram is automatically created on this column as it is used in nearly every SQL statement and it has a data skew. However, the values in the ID column are extremely long and the first 32 characters in each ID are identical.

When Oracle creates a histogram on a VARCHAR2 column it will only consider the first 32 characters in the column. In other words, if the first 32 bytes of the column values are identical all of the values end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets. The Optimizer can potentially misestimate the cardinality when using this histogram, which can result in suboptimal plan. Note that number of distinct values and other statistics will be accurate, only the histogram will have this issue. For this reason it would be better not to have a histogram on this particular column now or in the future. So how do you drop the existing histogram and stop it from being collected in the future?

The answer to this question depending on which version of the database you running, Oracle Database 11g or 10g and lower.

11G solution
=========

1. You should begin by dropping the histogram on the ID column. You can this by using the dbms_stats.delete_column_stats procedure and setting the col_stat_type parameter to HISTOGRAM.

BEGIN
dbms_stats.delete_column_stats(
ownname=>'SH', tabname=>'SALES', colname=>'PROD_ID', col_stat_type=>'HISTOGRAM');
END;

2. Use the new dbms_stats.set_table_pref procedure to set a specific value for the method_opt parameter for the table effected by this problem. The following value for the method_opt parameter tells Oracle to continue to collect histograms as usual on all of the columns in the SALES table except for the PROD_ID column, which should never have a histogram created on it.

BEGIN
dbms_stats.set_table_prefs('SH', 'SALES','METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID');
END;
/

3. The auto stats gathering job or your own statistics gathering commands will now use the table preference you set when it gathers statistics on this table and will no longer create a histogram on the ID column.

10G solution
=========

1. You should begin by dropping the existing statistics for the sales table. You can do this by using the dbms_stats.delete_table_stats procedure.

BEGIN
dbms_stats.delete_table_stats(ownname=>'SH', tabname=>'SALES');
END;
/

2. Next lock the statistics on the effected table using dbms_stats.lock_table_stats.
BEGIN
dbms_stats.lock_table_stats(ownname=>'SH', tabname=>'SALES');
END;
/

3. The auto-stats gathering job, which will now maintain statistics for all other tables but not the effected table.

4. Create a job of your own that will gather statistics on the effected table using the following command. This command will allow Oracle to automatically gather histograms on all of the other columns in the table but not on the ID column. Note you need to set the force parameter to true to override the locked statistics.

BEGIN
dbms_stats.gather_table_stats('SH', 'SALES',
method_opt=>'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 1 PROD_ID',
force => true);
END;
/

Comments:

Is the syntax for the 11g dbms_stats.set_table_prefs example correct? According to the documentation: DBMS_STATS.SET_TABLE_PREFS ( ownname IN VARCHAR2, tabname IN VARCHAR2, pname IN VARCHAR2, pvalue IN VARCHAR2); Should this be the way I call the procedure? SQL> BEGIN 2 dbms_stats.set_table_prefs('V500', 'DM_INFO','METHOD_OPT', 3 'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 1 INFO_LONG_ID'); 4 END; 5 / Even when calling the procedure in this manner, I receive errors: BEGIN * ERROR at line 1: ORA-20001: method_opt should follow the syntax "[FOR ALL [INDEXED|HIDDEN] COLUMNS [size_caluse]]" when gathering statistics on a group of tables ORA-06512: at "SYS.DBMS_STATS", line 9641 ORA-06512: at "SYS.DBMS_STATS", line 9716 ORA-06512: at "SYS.DBMS_STATS", line 10048 ORA-06512: at "SYS.DBMS_STATS", line 24573 ORA-06512: at "SYS.DBMS_STATS", line 24739 ORA-06512: at line 2 We were going to use this as a method of excluding columns from having histograms created, but have found that we cannot get the method_opt preference to work for anything other than "for all columns size auto" and "for all colums size x".

Posted by Bart Gaddis on April 27, 2011 at 06:50 AM PDT #

Hi Bart, Thanks for finding my cut and paste error. You are correct the original syntax I had in the DBMS_STATS.SET_TABLE_PREFS command was wrong. I have corrected it now in the blog text. It should be BEGIN dbms_stats.set_table_prefs('SH', 'SALES','METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID'); END; In your example, if you just add a comma after the word AUTO your command should work. Thanks, Maria

Posted by Maria Colgan on April 27, 2011 at 07:08 AM PDT #

How can we best detect when our columns may suffer from this issue? The histogram hurts us right when it gets created, so having a way to remove it as quickly as possible is best, but I cannot find a way to reliably detect it.

Posted by Eric Gold on April 27, 2011 at 07:17 AM PDT #

It turns out Bart was hitting bug 7479309 in Oracle Database 11g Release 1 (both 11.1.0.6 and 11.1.0.7), which prevents the set_table_prefs approach from working. This method_opt setting will only work in 11.2 or if you apply a one off patch for this bug. You can also work around this problem in 11g R1 by using the 10g approach of setting the method_opt parameter in the actual dbms_stats.gather_table_stats command. Thanks, Maria

Posted by Maria Colgan on April 27, 2011 at 09:45 AM PDT #

It would be nice to autodetect this via a script: columns that currently hold an histogram that the vast majority of 32 first chars is mostly identical. :)

Posted by Bruno Lavoie on May 11, 2011 at 04:35 AM PDT #

Hi Maria,

For 10g your solution doesn't work, because the parameter COL_STAT_TYPE in the procedure delete_column_stat was added on 11g.

To delete histograms on 10g I used to execute:
dbms_stat.gather_table_stats(..., method_opt=>'FOR COLUMNS name SIZE 1')

Until I changed to dbms_stat.set_table_stats to avoid analyzing the table, and use as parameter values from dba_tab_col_statistics.

Nelson

Posted by Nelson Calero on September 28, 2011 at 02:19 PM PDT #

hi,
from where to check that histograms are deleted for the table column.
here I collected the data for one from column from dba_tab_histograms
I cant see any difference before and after ..

OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
---------- -------------------- ------------------------------ --------------- -------------------------------------- --------------------
ORDER SUPPLY_ARCHIVE SURETY_TIME 0 2455792
ORDER SUPPLY_ARCHIVE SURETY_TIME 1 2464692

BEGIN
dbms_stats.delete_column_stats(
ownname=>'ORDER', tabname=>'SUPPLY_ARCHIVE', colname=>'SURETY_TIME', col_stat_type=> 'HISTOGRAM');
END;
/
PL/SQL procedure successfully completed

OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
---------- -------------------- ------------------------------ --------------- -------------------------------------- --------------------
ORDER SUPPLY_ARCHIVE SURETY_TIME 0 2455792
ORDER SUPPLY_ARCHIVE SURETY_TIME 1 2464692

Posted by Sam on October 20, 2011 at 11:08 AM PDT #

hi,
from where to check that histograms are deleted for the table column.
I checked from dba_tab_histograms before and after dropping histograms
I can't see any difference before and after ..

Posted by Sam on October 20, 2011 at 11:39 AM PDT #

Uwe, how can you verify histograms were removed after execution of the steps you describe in Oracle 10gr2? I have a table where, after following your method precisely, the flag stating histograms exist is set to "Y" for each column. (this same column was set to "N" in the intermediate step where the stats were deleted).

Please describe a process or script for validating the success/failure of removing histograms, and perhaps a way to debug the process if it fails?

Posted by Michael Fontana on March 26, 2012 at 09:28 AM PDT #

Michael, try using this query in 10gR2 to see how many histogram buckets there are for each column in a table. If NUM_BUCKETS is 1, there is effectively no histogram for the column.

-- Example: SH.SALES table
SELECT TABLE_NAME, COLUMN_NAME, NUM_BUCKETS
FROM DBA_TAB_COL_STATISTICS
WHERE TABLE_NAME = 'SALES'
AND OWNER = 'SH'
order by 1, 2
/

Posted by Kevin Garlow on June 05, 2012 at 08:54 AM PDT #

Thank you for this post! I have several long string columns which are indexed and very balanced as a whole, but very skewed in the first 32 characters. I have noticed that modifying queries to search both on equality, and on "like" with the first 31 characters, allows the optimizer to use the index.
It seems to me this is an optimizer bug -- shouldn't the presence of histogram data increase the quality of query plans rather than degrade it? Surely the optimizer can be made aware of the 32 character limitation on histograms and not rely on them for long string columns (or act as if the "like" clause had been applied). It would help me explain this to clients if I could point to an existing bug ticket.
Thanks again.

Posted by guest on June 25, 2012 at 11:54 AM PDT #

Hi Maria,

It seems whenever I gather stats or delete stats, in order for that change to take effect I have to do an "alter system flush shared_pool". I am running right now on a 11.1.07. Is there a better way than to clear the cache?

Thanks,
kr

Posted by guest on May 09, 2013 at 12:27 PM PDT #

Hi Krahim,

By default, when you gather or delete statistics the affected or dependent cursors in the shared-pool will be gradually aged out over the next 5 hours. This gradual aging out or invalidation of the affected cursors prevents a hard parse storm and contention on the shared pool from occurring. You should see the new statistics gradually taking effect.

You can change this behavior if you wish using the no_invalidate parameter of the DBMS_STATS.GATHER_*_STATS procedures. By default it is set to DBMS_STATS.AUTO_INVALIDATE, which is the gradual aging out of dependent cursors, I described above.

If you set no_invalidate to TRUE then none of the existing cursors will be invalidated. The newly gathered statistics will only take effect when a SQL statement is hard parsed and a new cursor is created.

However, if you set no_invalidate to false then all of the dependent cursors in the shared pool will be automatically invalidated as soon as the statistics gather command completes. If you have a large number of dependent cursors this can result in a large increase in the number of hard parses done on the system, which could result in shared pool contention.

Thanks,
Maria

Posted by guest on May 21, 2013 at 02:02 PM PDT #

Thanks, you explain so well. NO_INVALIDATE => False, is what I will do while testing, so much better than clearing sp (which I always knew was not the right thing to do)

kr

Posted by guest on May 21, 2013 at 03:57 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today