X

Insights into Statistics, Query Optimization and the Oracle Optimizer

How do I drop an existing histogram on a column and stop the Auto Stats gathering job from creating it in the future?

Nigel Bayliss
Product Manager

Before answering this question, I want to provide a bit more background to this scenario. The person who submitted this question has a table with a VARCHAR2 column called ID whose data distribution is skewed. When the Auto Stats job kicks in for this table a histogram is automatically created on this column as it is used in nearly every SQL statement and it has a data skew. However, the values in the ID column are extremely long and the first 32 characters in each ID are identical (the database is Oracle Database 11g).

When Oracle creates a histogram on a VARCHAR2 column it will only consider the first 64 characters of the column value in Oracle Database 12c or the first 32 characters of the column value in Oracle Database 11g. In other words, if the first 32/64 bytes of the column values are identical all of the values end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets. The Optimizer can potentially misestimate the cardinality when using this histogram, which can result in suboptimal plan. Note that number of distinct values and other statistics will be accurate, only the histogram will have this issue. For this reason it would be better not to have a histogram on this particular column now or in the future. So how do you drop the existing histogram and stop it from being collected in the future?

1. You should begin by dropping the histogram on the ID column. You can this by using the dbms_stats.delete_column_stats procedure and setting the col_stat_type parameter to HISTOGRAM.

BEGIN
   dbms_stats.delete_column_stats(
         ownname=>'SH', tabname=>'SALES', colname=>'PROD_ID',
                                  col_stat_type=>'HISTOGRAM');
END;

2. Use the DBMS_STATS.SET_TABLE_PREF procedure to set a specific value for the METHOD_OPT parameter for the table effected by this problem. The following value for the METHOD_OPT parameter tells Oracle to continue to collect histograms as usual on all of the columns in the SALES table except for the PROD_ID column, which should never have a histogram created on it.

BEGIN
   dbms_stats.set_table_prefs('SH', 'SALES',
   'METHOD_OPT', 
   'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID');
END;
/

3. The auto stats gathering job or your own statistics gathering commands will now use the table preference you set when it gathers statistics on this table and will no longer create a histogram on the ID column.

Join the discussion

Comments ( 15 )
  • Bart Gaddis Wednesday, April 27, 2011
    Is the syntax for the 11g dbms_stats.set_table_prefs example correct? According to the documentation:
    DBMS_STATS.SET_TABLE_PREFS (
    ownname IN VARCHAR2, tabname IN VARCHAR2, pname IN VARCHAR2, pvalue IN VARCHAR2);
    Should this be the way I call the procedure?
    SQL> BEGIN
    2 dbms_stats.set_table_prefs('V500', 'DM_INFO','METHOD_OPT',
    3 'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS SIZE 1 INFO_LONG_ID');
    4 END;
    5 /
    Even when calling the procedure in this manner, I receive errors:
    BEGIN
    *
    ERROR at line 1:
    ORA-20001: method_opt should follow the syntax "[FOR ALL [INDEXED|HIDDEN]
    COLUMNS [size_caluse]]" when gathering statistics on a group of tables
    ORA-06512: at "SYS.DBMS_STATS", line 9641
    ORA-06512: at "SYS.DBMS_STATS", line 9716
    ORA-06512: at "SYS.DBMS_STATS", line 10048
    ORA-06512: at "SYS.DBMS_STATS", line 24573
    ORA-06512: at "SYS.DBMS_STATS", line 24739
    ORA-06512: at line 2
    We were going to use this as a method of excluding columns from having histograms created, but have found that we cannot get the method_opt preference to work for anything other than "for all columns size auto" and "for all colums size x".
  • Maria Colgan Wednesday, April 27, 2011
    Hi Bart,
    Thanks for finding my cut and paste error. You are correct the original syntax I had in the DBMS_STATS.SET_TABLE_PREFS command was wrong. I have corrected it now in the blog text. It should be
    BEGIN
    dbms_stats.set_table_prefs('SH', 'SALES','METHOD_OPT',
    'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 PROD_ID');
    END;
    In your example, if you just add a comma after the word AUTO your command should work.
    Thanks,
    Maria
  • Eric Gold Wednesday, April 27, 2011
    How can we best detect when our columns may suffer from this issue? The histogram hurts us right when it gets created, so having a way to remove it as quickly as possible is best, but I cannot find a way to reliably detect it.
  • Maria Colgan Wednesday, April 27, 2011
    It turns out Bart was hitting bug 7479309 in Oracle Database 11g Release 1 (both 11.1.0.6 and 11.1.0.7), which prevents the set_table_prefs approach from working. This method_opt setting will only work in 11.2 or if you apply a one off patch for this bug.
    You can also work around this problem in 11g R1 by using the 10g approach of setting the method_opt parameter in the actual dbms_stats.gather_table_stats command.
    Thanks,
    Maria
  • Bruno Lavoie Wednesday, May 11, 2011
    It would be nice to autodetect this via a script: columns that currently hold an histogram that the vast majority of 32 first chars is mostly identical.
    :)
  • Nelson Calero Wednesday, September 28, 2011

    Hi Maria,

    For 10g your solution doesn't work, because the parameter COL_STAT_TYPE in the procedure delete_column_stat was added on 11g.

    To delete histograms on 10g I used to execute:

    dbms_stat.gather_table_stats(..., method_opt=>'FOR COLUMNS name SIZE 1')

    Until I changed to dbms_stat.set_table_stats to avoid analyzing the table, and use as parameter values from dba_tab_col_statistics.

    Nelson


  • Sam Thursday, October 20, 2011

    hi,

    from where to check that histograms are deleted for the table column.

    here I collected the data for one from column from dba_tab_histograms

    I cant see any difference before and after ..

    OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

    ---------- -------------------- ------------------------------ --------------- -------------------------------------- --------------------

    ORDER SUPPLY_ARCHIVE SURETY_TIME 0 2455792

    ORDER SUPPLY_ARCHIVE SURETY_TIME 1 2464692

    BEGIN

    dbms_stats.delete_column_stats(

    ownname=>'ORDER', tabname=>'SUPPLY_ARCHIVE', colname=>'SURETY_TIME', col_stat_type=> 'HISTOGRAM');

    END;

    /

    PL/SQL procedure successfully completed

    OWNER TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

    ---------- -------------------- ------------------------------ --------------- -------------------------------------- --------------------

    ORDER SUPPLY_ARCHIVE SURETY_TIME 0 2455792

    ORDER SUPPLY_ARCHIVE SURETY_TIME 1 2464692


  • Sam Thursday, October 20, 2011

    hi,

    from where to check that histograms are deleted for the table column.

    I checked from dba_tab_histograms before and after dropping histograms

    I can't see any difference before and after ..


  • Michael Fontana Monday, March 26, 2012

    Uwe, how can you verify histograms were removed after execution of the steps you describe in Oracle 10gr2? I have a table where, after following your method precisely, the flag stating histograms exist is set to "Y" for each column. (this same column was set to "N" in the intermediate step where the stats were deleted).

    Please describe a process or script for validating the success/failure of removing histograms, and perhaps a way to debug the process if it fails?


  • Kevin Garlow Tuesday, June 5, 2012

    Michael, try using this query in 10gR2 to see how many histogram buckets there are for each column in a table. If NUM_BUCKETS is 1, there is effectively no histogram for the column.

    -- Example: SH.SALES table

    SELECT TABLE_NAME, COLUMN_NAME, NUM_BUCKETS

    FROM DBA_TAB_COL_STATISTICS

    WHERE TABLE_NAME = 'SALES'

    AND OWNER = 'SH'

    order by 1, 2

    /


  • guest Monday, June 25, 2012

    Thank you for this post! I have several long string columns which are indexed and very balanced as a whole, but very skewed in the first 32 characters. I have noticed that modifying queries to search both on equality, and on "like" with the first 31 characters, allows the optimizer to use the index.

    It seems to me this is an optimizer bug -- shouldn't the presence of histogram data increase the quality of query plans rather than degrade it? Surely the optimizer can be made aware of the 32 character limitation on histograms and not rely on them for long string columns (or act as if the "like" clause had been applied). It would help me explain this to clients if I could point to an existing bug ticket.

    Thanks again.


  • guest Thursday, May 9, 2013

    Hi Maria,

    It seems whenever I gather stats or delete stats, in order for that change to take effect I have to do an "alter system flush shared_pool". I am running right now on a 11.1.07. Is there a better way than to clear the cache?

    Thanks,

    kr


  • guest Tuesday, May 21, 2013

    Hi Krahim,

    By default, when you gather or delete statistics the affected or dependent cursors in the shared-pool will be gradually aged out over the next 5 hours. This gradual aging out or invalidation of the affected cursors prevents a hard parse storm and contention on the shared pool from occurring. You should see the new statistics gradually taking effect.

    You can change this behavior if you wish using the no_invalidate parameter of the DBMS_STATS.GATHER_*_STATS procedures. By default it is set to DBMS_STATS.AUTO_INVALIDATE, which is the gradual aging out of dependent cursors, I described above.

    If you set no_invalidate to TRUE then none of the existing cursors will be invalidated. The newly gathered statistics will only take effect when a SQL statement is hard parsed and a new cursor is created.

    However, if you set no_invalidate to false then all of the dependent cursors in the shared pool will be automatically invalidated as soon as the statistics gather command completes. If you have a large number of dependent cursors this can result in a large increase in the number of hard parses done on the system, which could result in shared pool contention.

    Thanks,

    Maria


  • guest Tuesday, May 21, 2013

    Thanks, you explain so well. NO_INVALIDATE => False, is what I will do while testing, so much better than clearing sp (which I always knew was not the right thing to do)

    kr


  • Carlos Monday, September 15, 2014

    Awesome post, worked just as expected in 11g

    thanks


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.