I've read a number of blog entries from various people who've clearly put great diligence into trying to understand how the decision to use the buffer cache for a table scan or whether it is better to use direct read for the scan is made. Since this is critical decision from a performance perspective, I've decided to write a more definitive account. Part 1 deals with a few principles and part 2 (which will follow very shortly) will show how this works with tracing and clear up a few misunderstandings. Part 3 will deal with PQ in more detail as soon as I have time.
Serial scans use direct path reads when the object is not a small table and, since 184.108.40.206, the size used to determine whether it is "small" has come stats rather than the size from segment header.
_small_table_threshold defaults to the 2% buffer cache or to 20 blocks whichever is bigger (NB: this parameter ignores the effects of compression). An object is small enough for buffer cache reads to be more effective than direct path read if the number of blocks it contains is less than or equal to the value of _small_table_threshold. Such scans will bump the stat “table scans (short tables)”
So what happens when an object is bigger than STT? We fall into the MTT case: MTT is not directly configurable and defaults to 10% of the buffer cache blocks. But it is also depends on a cost based decision that includes how busy the cache is.
NB: PQ and Exadata have different rules.
NB: MTT also depends on _object_statistics being enabled.
For STT < #blocks < MTT, if the cost based decision means a buffer cache scan is chosen, the stats “table scans (short tables)” will still get bumped.
So when is a medium table not worth caching?
First, when MTT < #blocks (ignore VLOT in blogs, it hasn't been used for many years but persists in the trace and blogs for historic reasons).
Second, if the scan is not a select for update: different rules apply because cache scans are more useful if the segment being updated is first the subject of a query. Table larger than MTT also do factor in storage reduction factor. The ratios used are 10X for HCC and 3.3.X for OLTP.
Third, a cost based analysis is done comprising the % currently in cache, the % buffers are dirty (check-pointing cost), the current I/O bottleneck, and any RAC access costs. If the buffer cache is still used the stat “table scans (long tables)” will get bumped. But, if Direct Read is chosen the stat “table scans (direct read)” will get bumped.
Automatic Big Table Caching (ABTC)
So does this mean the buffer cache never helps with large tables? Not at all, large tables can still use the buffer cache for a portion of their scan based on the public parameter DB_BIG_TABLE_CACHE_PERCENT_TARGET which sets aside a percentage of the buffer cache for larger tables
On a single instance, ABTC works with both Serial and PQ scans BUT mixing Serial and PQ may lead to fewer cache hits. On RAC, it only works with PQ and requires that PARALLEL_DEGREE_POLICY set to AUTO or ADAPTIVE.
ABTC works by tracking the heat of medium and large tables and is designed to take advantage of any runs of buffers currently cached. Then if the decision to use ABTC is made, it periodically checks the cache for how many buffers in the next section are already in memory to see whether Direct Read or Buffer Cache scan would be better for the next chunk. Let's see how the heat tracking works which looking at the v$bt_scan_obj_temps view.
SQL> select DATAOBJ#,SIZE_IN_BLKS,TEMPERATURE from V$BT_SCAN_OBJ_TEMPS;
DATAOBJ# SIZE_IN_BLKS TEMPERATURE
---------- ------------ -----------
79042 1689 3000
Each time we scan a medium or larger table it will bump that object's temperature by 1000. Then, the object with the highest temperature will get absolute priority for caching and, if needed, will evict any previously cached ABTC blocks of other objects.
If the hottest object only partially fits in the buffer cache (which is quite likely), the first section that fits will be read through the buffer cache and cached then the rest of the scan will switch to Direct Read when the ABTC allocation in the cache is fully utilized.
NB: The object will still be check-pointed at the beginning so that it is safe to switch to Direct Read at any time.
Part 2 will include tracing to how how to make use of this feature.