How does the METHOD_OPT parameter work?

In last week’s post we addressed the most popular question surrounding statistics gathering, what sample size should be used? Once that decision has been made, folks typically turn their attention to histograms and the time honored question of, do I need histograms or not?

Regardless of which side of this religious debate you come down on, you are going to need to figure out what to set the METHOD_OPT parameter to.

This post explains in detail what the METHOD_OPT parameter controls and how it can be used to influence which columns get statistics and what type of statistics they get.

The METHOD_OPT parameter is probably the most misunderstood parameter in the DBMS_STATS.GATHER_*_STATS procedures. It’s most commonly known as the parameter that controls the creation of histograms but it actually does so much more than that. The METHOD_OPT parameter actually controls the following,

  • which columns will or will not have base column statistics gathered on them
  • the histogram creation,
  • the creation of extended statistics

The METHOD_OPT parameter syntax is made up of multiple parts. The first two parts are mandatory and are broken down in the diagram below.

The leading part of the METHOD_OPT syntax controls which columns will have base column statistics (min, max, NDV, number of nulls, etc) gathered on them. The default, FOR ALL COLUMNS, will collects base column statistics for all of the columns (including hidden columns) in the table.  The alternative values limit the collection of base column statistics as follows;

FOR ALL INDEXED COLUMNS limits base column gathering to only those columns that are included in an index. This value is not recommended as it is highly unlikely that only index columns will be used in the select list, where clause predicates, and group by clause of all of the SQL statement executed in the environment.

FOR ALL HIDDEN COLUMNS limits base column statistics gathering to only the virtual columns that have been created on a table. This means none of the actual columns in the table will have any column statistics gathered on them. Again this value is not recommended for general statistics gathering purposes. It should only be used when statistics on the base table columns are accurate and a new virtual column(s) has been created (e.g. a new column group is created). Then gathering statistics in this mode will gather statistics on the new virtual columns without re-gathering statistics on the base columns.

Note that if a column is not included in the list to have statistics gathered on it, then only its average column length is gathered. The average column length is used to correctly compute average row length and discarded (i.e., not saved to disk) after use.

The SIZE part of the METHOD_OPT syntax controls the creation of histograms and can have the following settings;

AUTO means Oracle will automatically determines the columns that need histograms based on the column usage information (SYS.COL_USAGE$), and the presence of a data skew.

An integer value indicates that a histogram will be created with at most the specified number of buckets. Must be in the range [1,254]. To force histogram creation it is recommend that the number of buckets be left at 254. Note SIZE 1 means no histogram will be created.

REPEAT ensures a histogram will only be created for any column that already has one. If the table is a partitioned table, repeat ensures a histogram will be created for a column that already has one on the global level.  However, this is not a recommended setting, as the number of buckets currently in each histogram will limit the maximum number of buckets used for the newly created histograms. Lets assume there are 5 buckets currently in a histogram. When the histogram is re-gathered with SIZE REPEAT, the newly created histogram will use at most 5 buckets and may not been of good quality.  

SKEWONLY automatically creates a histogram on any column that shows a skew in its data distribution.

If the default value of METHOD_OPT parameter, FOR ALL COLUMNS SIZE AUTO, doesn’t work in your particular environment then you most likely fall into one of the following categories,

  • Automatically create a histogram on any column in the table except a specific column
  • Only create a histogram on this specific column(s)

In an earlier blog post we showed an example of how to prevent Oracle from creating a histogram on a specific column, so let’s look at option 2, creating a histogram only on specific columns.  Let’s assume we only wanted to create a histogram on the CUST_ID of the SALES table. Remember, the METHOD_OPT parameter controls which columns get basic statistics as well as which columns get histograms, so we need to think about setting the METHOD_OPT parameter in two parts.

The first part will specify which columns will have base statistics gathered on them. In this case we wanted all of the columns in the table to have base statistics so we should use FOR ALL COLUMNS.  But what about the SIZE part of the parameter? We only want a histogram on one column, so for this leading edge of the parameter setting we need to specify SIZE 1 to prevent a histogram from being created on any column.

The second part of the parameter setting needs to specify that a histogram is needed on the CUST_ID column. This is achieved using an additional part of the METHOD_OPT syntax that we haven’t mentioned yet, called FOR COLUMNS. The FOR COLUMNS syntax allows us to provide explicit instructions for specific columns listed in this part of the parameter setting. In this case we would use, FOR COLUMNS SIZE 254 CUST_ID to specify we need a histogram on the CUST_ID column. So the final METHOD_OPT parameter setting will be;

Below is the DBMS_STATS.GATHER_TABLE_STATS command in action. The only column that has a histogram created on it is the CUST_ID, even though several of the columns in the SALES table were used in the where clause of queries executed on the system.

BEGIN 
 dbms_stats.Gather_table_stats('SH', 'SALES', - 
 method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS SIZE 254 CUST_ID'); 
END; 
/

PL/SQL procedure successfully completed.
SELECT column_name, num_distinct, histogram 
FROM   user_tab_col_statistics 
WHERE  table_name = 'SALES'; 

 

There are several other cases where you may need to provide a more complex setting for the METHOD_OPT parameter. Take for example a scenario where you don’t want any form of statistics on a column. There is no way to tell Oracle don’t gather statistics on a particular column, so you need to do the reverse and explicitly list the column you want to have statistics gathered on using the FOR COLUMNS syntax. Let’s assume we don’t want statistics of any kind on the PROD_ID column of the SALES TABLE. Then the DBMS_STATS.GATHER_TABLE_STATS command would be as follows;

BEGIN 
 dbms_stats.delete_column_stats('SH', 'SALES', 'PROD_ID'); 
END; 
/
PL/SQL procedure completed successfully. 
BEGIN 
 dbms_stats.Gather_table_stats('SH', 'SALES',-
 method_opt => 'FOR COLUMNS SIZE 254 CUST_ID TIME_ID CHANNEL_ID PROMO_ID 
 QUANTITY_SOLD AMOUNT_SOLD'); 
END; 
/

PL/SQL procedure completed successfully.   

-- The average row length still got recorded accurately even though we did not gather statistics on the PROD_ID column.

SELECT num_rows, avg_row_len FROM user_tables WHERE table_name = 'SALES';

 

SELECT column_name, num_distinct, histogram 
FROM   user_tab_col_statistics 
WHERE  table_name = 'SALES';

 

Note the FOR COLUMNS syntax is only valid for the GATHER_TABLE_STATS procedure.

Finally, at the start of this post I mentioned that the METHOD_OPT parameter can also be used to create extended statistics. Extended statistics encompasses two additional types of column statistics; column groups and expression statistics. In the example below, a column group will be automatically created on the PROD_ID and CUST_ID column in the SALES table. It will be given a system-generated name and will have all of the base column statistics gathered on it.


BEGIN
dbms_stats.Gather_table_stats('SH', 'SALES',
method_opt => 'FOR ALL COLUMNS SIZE 254 FOR COLUMNS SIZE 254(PROD_ID, CUST_ID)');
END;
/
 PL/SQL procedure successfully completed. 
 

SELECT column_name, num_distinct, histogram FROM user_tab_col_statistics WHERE table_name = 'SALES';

 

Rather than specifying the METHOD_OPT parameter in the statistics gathering command it is highly recommended that you specify any non-default value for the METHOD_OPT via DBMS_STATS.SET_TABLE_PREFS.


BEGIN
 dbms_stats.Set_table_prefs('SH', 'SALES', 'METHOD_OPT', -
 'FOR ALL COLUMNS SIZE 254 FOR COLUMNS SIZE 1 PROD_ID');
END;
/ 

In summary:

The METHOD_OPT parameter in the GATHER_DICTIONARY_STATS, GATHER_DATABASE_STATS, and GATHER_SCHEMA_STATS procedures only accepts 'FOR ALL [INDEXED|HIDDEN] columns' syntax. No specific column names can be specified.

When used in the GATHER_TABLE_STATS procedure, the METHOD_OPT parameter can accept an additional argument in the form of FOR columns …'. Using this syntax allows you to controls;

  • which columns to gather basic statistics
  • which columns to gather histograms and the bucket size
  • which extended statistics to create

Comments:

I want to make an addition for anyone who finds this page and needs to disable statistics on a particular column (such as the example of prod_id above).

It is possible to change the statistics gathering settings for a particular table.

BEGIN
dbms_stats.set_table_prefs('SH', 'SALES',-
'FOR COLUMNS SIZE 254 CUST_ID TIME_ID CHANNEL_ID PROMO_ID
QUANTITY_SOLD AMOUNT_SOLD');
END;
/

Posted by guest on April 16, 2013 at 07:23 AM PDT #

Hi Maria,

The link to the 'earlier blog post' is not working (not the public one).

I have a question about the FOR COLUMNS syntax. The syntax allows a SIZE after the FOR COLUMNS, and after the column name:
FOR COLUMNS [size clause] column [size_clause] [,column [size_clause]...]
Is that right that the SIZE before the column list is the default and the others overrides it when defined ? Documentation is not very clear about that.

Thanks,
Franck.

Posted by guest on April 16, 2013 at 07:54 AM PDT #

Hi Franck,

Thanks for the heads up on the broken link. It should be working fine now.

Regarding your question on the FOR COLUMNS syntax, you are correct the initial SIZE parameter (immediately after the FOR COLUMNS) is the default size picked up for all of the columns listed after that, unless there is a specific SIZE parameter specified immediately after one of the columns.

For example,
FOR COLUMNS SIZE 254 CUST_ID TIME_ID CHANNEL_ID PROMO_ID QUANTITY_SOLD SIZE 1 AMOUNT_SOLD

will collect a histogram on the CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,and the AMOUNT_SOLD columns but not on the QUANTITY_SOLD column.

SQL> SELECT column_name, num_distinct, histogram FROM user_tab_col_statistics WHERE table_name = 'SALES';

COLUMN_NAME NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
CUST_ID 630 HEIGHT BALANCED
TIME_ID 620 HEIGHT BALANCED
CHANNEL_ID 5 FREQUENCY
PROMO_ID 116 FREQUENCY
QUANTITY_SOLD 44 NONE
AMOUNT_SOLD 583 HEIGHT BALANCED

Posted by guest on April 16, 2013 at 09:48 AM PDT #

Hi Maria,

Clearly explained in layman language.
Congrats to your team, keep up the good work. Looking forward to read more on optimizer stuff.

Regards,
Srikar

Posted by Srikar Dasari on April 20, 2013 at 05:51 AM PDT #

Hi,
FOR all indexed colums size repeat , this option will compute histogram on indexed colums only?
Regards,

Posted by mosa on May 08, 2013 at 01:44 PM PDT #

Hi Mosa,

Setting the METHOD_OPT parameter to for all indexed colums size repeat will only gather statistics on indexed columns and will only create a histogram on any indexed column that currently has a histogram.

Although this setting is possible it is not a recommended configuration for most systems.

Thanks,
Maria

Posted by Maria Colgan on May 21, 2013 at 02:42 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

The Oracle Optimizer blog is written by members of the Optimizer development team. The goal of this blog is to provide an insight into the workings of the Optimizer and the statistics it relies on. The views expressed on this blog are our own and do not necessarily reflect the views of Oracle and its affiliates. The views and opinions expressed by visitors on this blog are theirs solely and may not reflect ours.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today