Everything about Table Scans including offload to Exadata, Big Data SQL, and External Tables

  • January 12, 2017

When bloggers get it wrong - part 1

Roger MacNicol
Software Architect

I've read a number of blog entries from various people who've clearly put great diligence into trying to understand how the decision to use the buffer cache for a table scan or whether it is better to use direct read for the scan is made. Since this is critical decision from a performance perspective, I've decided to write a more definitive account. Part 1 deals with a few principles and part 2 (which will follow very shortly) will show how this works with tracing and clear up a few misunderstandings. Part 3 will deal with PQ in more detail as soon as I have time.

Serial Scans

Small Table Threshold (STT) 

Serial scans use direct path reads when the object is not a small table and, since, the size used to determine whether it is "small" has come stats rather than the size from segment header.

_small_table_threshold defaults to the 2% buffer cache or to 20 blocks whichever is bigger (NB: this parameter ignores the effects of compression). An object is small enough for buffer cache reads to be more effective than direct path read if the number of blocks it contains is less than or equal to the value of _small_table_threshold. Such scans will bump the stat “table scans (short tables)”

Medium Table Threshold (MTT)

So what happens when an object is bigger than STT? We fall into the MTT case: MTT is not directly configurable and defaults to 10% of the buffer cache blocks. But it is also depends on a cost based decision that includes how busy the cache is. 
NB: PQ and Exadata have different rules.
NB: MTT also depends on _object_statistics being enabled.

For STT < #blocks < MTT, if the  cost based decision means a buffer cache scan is chosen, the stats “table scans (short tables)” will still get bumped.

So when is a medium table not worth caching?

First, when MTT < #blocks (ignore VLOT in blogs, it hasn't been used for many years but persists in the trace and blogs for historic reasons).

Second, if  the scan is not a select for update: different rules apply because cache scans are more useful if the segment being updated is first the subject of a query. Table larger than MTT also do factor in storage reduction factor. The ratios used are 10X for HCC and 3.3.X for OLTP.

Third, a cost based analysis is done comprising the % currently in cache, the % buffers are dirty (check-pointing cost), the current I/O bottleneck, and any RAC access costs. If the buffer cache is still used the stat “table scans (long tables)” will get bumped. But, if Direct Read is chosen the stat “table scans (direct read)” will get bumped.

Automatic Big Table Caching (ABTC)

So does this mean the buffer cache never helps with large tables? Not at all, large tables can still use the buffer cache for a portion of their  scan based on the public parameter DB_BIG_TABLE_CACHE_PERCENT_TARGET which sets aside a percentage of the buffer cache for larger tables

On a single instance, ABTC works with both Serial and PQ scans BUT mixing Serial and PQ may lead to fewer cache hits. On RAC, it only works with PQ and requires that PARALLEL_DEGREE_POLICY set to AUTO or ADAPTIVE.

ABTC works by tracking the heat of medium and large tables and is designed to take advantage of any runs of buffers currently cached. Then if the decision to use ABTC is made, it periodically checks the cache for how many buffers in the next section are already in memory to see whether Direct Read or Buffer Cache scan would be better for the next chunk. Let's see how the heat tracking works which looking at the v$bt_scan_obj_temps view.


---------- ------------ -----------    
     79042         1689        3000

Each time we scan a medium or larger table it will bump that object's temperature by 1000. Then, the object with the highest temperature will get absolute priority for caching and, if needed, will evict any previously cached ABTC blocks of other objects.
If the hottest object only partially fits in the buffer cache (which is quite likely), the first section that fits will be read through the buffer cache and cached then the rest of the scan will switch to Direct Read when the ABTC allocation in the cache is fully utilized.
NB: The object will still be check-pointed at the beginning so that it is safe to switch to Direct Read at any time.

Part 2 will include tracing to how how to make use of this feature. 

Join the discussion

Comments ( 4 )
  • Alberto Dell'Era Tuesday, June 27, 2017
    Hi Roger,

    might you please shed some light on the phrase "Table larger than MTT also do factor in storage reduction factor. The ratios used are 10X for HCC and 3.3X for OLTP" ?

    I mean, are compressed tables more likely to be cached than scanned by direct read, or the other way around? And why for #size>MTT only ?

    Also, I would be especially interested in the rationale for considering the compression ratio in the decision.

    Thanks in advance!
  • Sayan Malakshinov Sunday, November 19, 2017
    "since, the size used to determine whether it is "small" has come stats rather than the size from segment header."

    At least since Oracle again takes it from segment header:

    "ignore VLOT in blogs, it hasn't been used for many years"

    Could you tell us since which version?
  • Roger Wednesday, November 21, 2018
    Hi Sayan,

    Yes the sizing reverted back to using space layer metadata as part of the In-Memory project in
    The first call in your tracing is still getting it from the table statistics block which gets overwritten by hints.

    The VLO calculation is still there but has had no practical effect for as long as I have worked at Oracle - being set at 500% the cost model decides to use direct read to avoid thrashing the cache long before reaching that. I haven't tried to trace that back prior to 11.2 since so few people still use earlier releases.
  • Arturo Wednesday, November 11, 2020
    Hi Roger,
    Thanks for your blog, it provides very interesting information.
    Thanks for your blog, it provides very interesting information.
    I have observed that the _small_table_threshold parameter does not change if changes are made to the size of the database buffer cache. Or at least querying the internal view that shows its value, it doesn't reflect it.
    It is normal for the value of the database buffer cache to change dynamically, due to dynamic tuning mechanisms. Is this value then determined at instance startup, and is it not re-evaluated afterwards?

Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.