By JeffG-Oracle on Jul 11, 2014
Hello Everyone! Please check out the following important information regarding the cluster factor and the impact to the CBO. Please comment if you like.
The Cluster Factor (CF) of an index a very important statistic used by the Cost Based Optimizer (CBO) to determin
the most efficient execution plan. However, there is a major flaw in the current CF calculation. Simply put, the CF
is calculated by performing a Full Index Scan, looking at the rowid of each index entry. If the table block being referenced
is different from the previous index entry, the CF is incremented. However if the table block being referenced is the same
as the previous index entry, the CF is not incremented. So the CF gives an indication of how well ordered the data in the table is
in relation to the chosen index entries.
The lower the CF, closer to the number of blocks in the table vs the number of rows, the more efficient it is to use the index as
less table blocks would need to be accessed to retrieve the necessary data via the chosen index.
So far so good. However, what if part of the incoming row is stored in block one and the remainder in block 2? When the row is
required, block 1 and block 2 will be read into cache resulting in near zero waiting for the entire row. OK, fast forward, what if
the table has 100m rows and imagine that this situation occurs 30% of the time. Your current CF method is most likely skewed
Now for the good news, bug 13262857 - INDEX CLUSTERING FACTOR COMPUTATION IS PESSIMISTIC. The computation of the index clustering
factor in dbms_stats package is pessimistic about the caching ratio of the table blocks. It assumes that at most one block from the
table is cached.
This is an enhancement to allow a user to specify the number of blocks that dbms_stats package will consider when gathering the index clustering
factor statistics. Prior to this enhancement dbms_stats assumed 1 and it still does after the enhancement. This enhancement allows the user to
specify a value between 1 and 255. There is also an AUTO option which if specified then dbms_stats will use 1% of the table blocks up to
0.1% of the buffer cache size, in blocks.
The new CF approach is currently available with patches that can be applied on both Exadata databases and Oracle versions 22.214.171.124, 126.96.36.199 and 188.8.131.52.
The Patch ID is 15830250.
Once applied, there is a new statistics collection preference that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of
table blocks we can assume would already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering.
The default is 1, as stated, the same as today. But can be set up to be a value between 1 and 255, meaning that during the collection of index statistics
the process will not increment the CF if the table block being referenced by the current index entry has already been referenced by any of the prior 255 index
entries, if set to 255. It basically sets the appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering
to not increment the CF if the current table block has already been accessed “x” index entries previously.
The TABLE_CACHED_BLOCKS preference can be set by either the DBMS_STATS.SET_TABLE_PREFS, DBMS_STATS.SET_SCHEMA_PREFS or DBMS_STATS.SET_DATABASE_PREFS procedures.
Apply the patch, change the TABLE_CACHED_BLOCKS preference for this table and re-calculate the index statistics:
SQL> exec dbms_stats.set_table_prefs(ownname=>dm, tabname=>'SALES_DATA',
PL/SQL procedure successfully completed.
SQL> EXEC dbms_stats.gather_index_stats(ownname=>dm, indname=>'SALE_DATA', estimate_percent=> null);
PL/SQL procedure successfully completed.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='SALES_DATA';
Although I found this bug internally, many thanks to several on the WWW for their comments.