exec dbms_stats.gather_table_stats(null, 'BIGT', estimate_percent => 1);
It is not always easy for users to pick an appropriate sampling percentage. If the percentage is too high, gathering statistics can take a long time. Conversely, if the data is skewed and the percentage is too low, the resulting statistics can be inaccurate.
For this reason, Oracle introduced the AUTO value for the estimate_percent parameter. For example, you can gather statistics on BIGT as follows:
exec dbms_stats.gather_table_stats(null, 'BIGT', estimate_percent => dbms_stats.auto_sample_size);
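The accuracy problem with a low fixed percentage on skewed data can be illustrated with a small Python simulation. This is only a sketch under stated assumptions. The column is hypothetical, and the two estimators are deliberately naive rules chosen for illustration: reporting the distinct count seen in a 1% random sample as-is, or scaling that count up by 100. Neither is the estimator dbms_stats actually uses.

```python
import random

def sample_ndv(column, fraction, seed=0):
    """Naive NDV estimates from a Bernoulli sample (illustration only).

    Returns (distinct values seen in the sample, that count scaled by 1/fraction).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    seen = {v for v in column if rng.random() < fraction}
    return len(seen), len(seen) / fraction

# Hypothetical skewed column of 100,000 rows:
# 50 "popular" values with 1,980 rows each (99% of the rows),
# plus 1,000 "rare" values that occur exactly once.
popular = [v for v in range(50) for _ in range(1_980)]
rare = list(range(50, 1_050))
column = popular + rare
actual_ndv = len(set(column))  # 1,050 distinct values

raw, scaled = sample_ndv(column, fraction=0.01)
print(f"actual NDV            : {actual_ndv}")
print(f"distinct in 1% sample : {raw}")
print(f"scaled-up estimate    : {scaled:.0f}")
```

On this column the raw sample count misses almost all of the 1,000 rare values and badly underestimates the NDV, while scaling by 100 also multiplies the 50 popular values, which the sample had already captured in full, and overestimates it several times over.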
Column Name | Column Type |
l_shipdate | date |
l_orderkey | number |
l_discount | number |
l_extendedprice | number |
l_suppkey | number |
l_quantity | number |
l_returnflag | varchar2 |
l_partkey | number |
l_linestatus | varchar2 |
l_tax | number |
l_commitdate | date |
l_receiptdate | date |
l_shipmode | varchar2 |
l_linenumber | number |
l_shipinstruct | varchar2 |
l_comment | varchar2 |
Sampling Percentage | Elapsed Time (sec) |
1% sampling | 797 |
100% sampling (Compute) | 18772 |
Auto sampling in Oracle | 1908 |
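As a quick check on the elapsed times above, the ratios between the three runs can be computed directly. A minimal Python sketch; the numbers are copied from the table, and the dictionary keys and the `ratio` helper are just illustrative names:

```python
# Measured elapsed times in seconds, copied from the table above.
elapsed_sec = {
    "1% sampling": 797,
    "100% sampling (compute)": 18772,
    "AUTO sampling": 1908,
}

def ratio(slower: str, faster: str) -> float:
    """How many times longer the `slower` run took than the `faster` run."""
    return elapsed_sec[slower] / elapsed_sec[faster]

auto_vs_compute = ratio("100% sampling (compute)", "AUTO sampling")
auto_vs_one_pct = ratio("AUTO sampling", "1% sampling")
print(f"AUTO is {auto_vs_compute:.1f}x faster than 100% sampling")   # 9.8x
print(f"AUTO takes {auto_vs_one_pct:.1f}x as long as 1% sampling")   # 2.4x
```

In other words, AUTO sampling costs roughly 2.4 times as much as a 1% sample but only about a tenth of a full compute.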
We also compare the quality of the statistics gathered with different estimate percentages. Among all the statistics of a column, the number of distinct values (NDV) is the one whose accuracy used to be an issue. We define the accuracy rate of the NDV of a column as follows:
accuracy rate = 1 - |estimated NDV - actual NDV| / actual NDV

The accuracy rate is 100% for a perfect estimate and decreases as the estimate moves away from the actual NDV in either direction; the higher the accuracy rate, the more accurate the gathered statistics. Since 100% sampling always yields an accuracy rate of 100%, we do not report it. We focus on the columns that have an accuracy rate below 99.9% under at least one of the sampling settings. The following table shows the accuracy rates of these columns:
Column Name | Actual NDV | Accuracy (Auto Sampling) | Accuracy (1% Sampling) |
l_orderkey | 450,000,000 | 98.0% | 50.0% |
l_comment | 181,122,127 | 98.6% | 4.6% |
l_partkey | 60,000,000 | 99.2% | 98.2% |
l_suppkey | 3,000,000 | 99.6% | 99.9% |
l_extendedprice | 3,791,320 | 99.6% | 94.3% |
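The accuracy-rate definition used for this table can be written as a small Python function (a sketch; the function name is illustrative). As a worked example derived from the orderkey row, not an additional measurement, a 50% accuracy rate against an actual NDV of 450,000,000 corresponds to an estimate of either 225,000,000 or 675,000,000:

```python
def accuracy_rate(estimated_ndv: float, actual_ndv: float) -> float:
    """1 - |estimated NDV - actual NDV| / actual NDV, as a fraction (1.0 == 100%)."""
    if actual_ndv <= 0:
        raise ValueError("actual NDV must be positive")
    return 1 - abs(estimated_ndv - actual_ndv) / actual_ndv

actual = 450_000_000  # actual NDV of orderkey, from the table above
for estimate in (225_000_000, 675_000_000, 450_000_000):
    print(f"estimate={estimate:,}: accuracy rate = {accuracy_rate(estimate, actual):.1%}")
```

The formula is symmetric: underestimating and overestimating the NDV by the same amount yield the same accuracy rate.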
In short, AUTO sampling is roughly 10 times faster than 100% sampling (1,908 vs. 18,772 seconds), yet yields statistics of similar quality (accuracy rates close to 100%).