Everything about Table Scans including offload to Exadata, Big Data SQL, and External Tables

  • January 21, 2017

When bloggers get it wrong - part 2

Roger Macnicol
Software Architect

In Part 2 we are going to look at making use of the trace events that show what was discussed in Part 1. 
NB: Oracle no longer adds new numeric trace events, going forward new trace events use the Unified Tracing Service whose grammer is much simpler. The elements we need are:

trace[[x.]y] disk = [ lowest | low | medium | high | highest ]

For example Table Scan tracing is in the DATA hierachy:

[1] DATA
[2] KDS    “Kernel Data Scan”
[3] KDSFTS  “Full Table Scan”
[3] KDSRID  “RowID”

‘trace[KDSFTS] disk low’ - only trace full table scans
‘trace[KDSRID] disk low’ - only trace fetch by rowid
‘trace[KDS.*] disk low’ - trace both table scans and fetch by rowid
NB: don’t use ‘lowest’ with KDS – it is used for memory tracing only

Tracing Full Table Scans: KDSFTS

At the beginning of a granule we see if it is possible to use Turbo Scan (which is a prerequisite for using Exadata Smart Scan) and the data object number being scanned:

kdst_mode(): kdst_mode_possible: 1 fastPathFetch: 1 
kdst_mode(): push_complex: 1 sage_push: 0 pred_split: 0 objd_ktid: 78768

If we can use Turbo Scan, we see the initialization of the row source and the sql_id and text of the query:

kdsirsf(): not DR|DW td: 0x7fb2a16245e0 flag: 0x1080008 heap: 1

The trace also told us this was not an ABTC scan (see further down this entry)
NB: several scans may be interleaved so use the ‘td’ address to follow the scan of interest:

kdsirsf(): kcb_clone on td: 0x7fb2a16245e0 parent: 0x7fb2a16245f8 child: 0x7fb2a1624c70

Dumping sqltext for sqlid: dch22s8kyzgp8
kxsDumpSqlTextFromSQLID SQL= select sum(S_ACCTBAL) from supplier

This is important because it enables us to tie the trace into the SQL v$ views. The scan then starts:

kdsttgr: start td 0x7fb2a16245e0 for 32767 rows at slot 0 blkrows 0
kdstf000010100001000km -> FCO ROWP RSET ARR

First we see the extent map to be scanned get loaded, the start rdba of each extent, and the number of blocks in the extent: 

map with 10 extents
Ext: 0, dba: 0x140000e, len: 3, skp: 0
Ext: 1, dba: 0x1400011, len: 4, skp: 0

Second, we will see each block in turn get fetched, since extent 0 started at rdba  0x140000e, we will see 0x140000e, 0x140000e, 0x1400010 get fetched in order etc 
Since this is a serial scan, it will be followed by extent 1 (a PQ granule would have had the extents sorted by rdba)

kdst_fetch0(): req tsn: 5 dba: 0x140000e dsf: 0x1 dsf2: 0x0
kdst_fetch0(): req tsn: 5 dba: 0x140000f dsf: 0x100001 dsf2: 0x4000

Finally when all the blocks have been consumed we will see the scan  terminate:

kdsers(): ending td: 0x7fb2a16245e0
kdsers(): kcb_endscan on td: 0x7fb2a16245e0

Tracing the decision to use buffer cache or direct read

To understand the decisoin whether to use buffer cache or direct read for a scan, you need to enable tracing for buffer cache decision making:

alter session set events ‘10358 trace name context forever, level 2’;
alter session set events 'trace [NSMTIO] disk highest‘; 

The Medium Table Threshold (MTT) 

Our query against the Supplier table starts with "Is Small Table"?
Our table has 1689 blocks, the buffer cache has space for 13,512 blocks, and the Small Table Threshold is therefore 270 blocks:

kcbism: islarge 1 next 0 nblks 1689 type 2, bpid 3, kcbisdbfc 0 kcbnhl 1024  kcbstt 270  keep_nb 0 kcbnbh 13512 kcbnwp 1

Clearly this isn't a small table; so "Is Medium Table"? Well, 1689 blocks is greater than 1/10 of the buffer cache i.e. 1,351 blocks so it isn't medium:

kcbimd: nblks 1689 kcbstt 270 kcbpnb 1351 bpid 3 kcbisdbfc 0 is_medium 0

So "Is Very Large Object"? Well it's small enough relative to the buffer cache to be worth costing out so its not large:

kcbivlo: nblks 1689 vlot 500 pnb 13512 kcbisdbfc 0 is_large 0

NSMTIO shows the decision (but we don't give any internal details of the cost model used). Here we see the "Direct Path Check" shows the cost model predicted using Direct Read would be best and that we need to force a check point.

NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote)
 and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc: DirectRead: tsn: 5, objd: 79042, objn: 79042 ckpt: 1, nblks: 1689, ntcache: 0, ntdist:0

Let's make the buffer cache much bigger to make MTT bigger than the table to see the tracing for that. With space for 28,368 blocks in the cache, MTT is now 2,836 blocks, and STT is therefore 567 blocks so we see the is_medium check set to true

kcbism:  islarge 1 next 0 nblks 1689 type 2, bpid 3, kcbisdbfc 0 kcbnhl 2048 kcbstt 567
               keep_nb 0 kcbnbh 28368 kcbnwp 1
kcbimd: nblks 1689 kcbstt 567 kcbpnb 2836 bpid 3 kcbisdbfc 0 is_medium 1

So is it worth it to "Cache Medium Table"? This has to factor in such things as the current state of the cache and the aging of the current contents, and whether the entire database can be cached (kcbisdbfc):

kcbcmt1: scann age_diff adjts last_ts nbuf nblk has_val kcbisdbfc 0 8168 0 28368 1689 0 0
kcbivlo: nblks 1689 vlot 500 pnb 28368 kcbisdbfc 0 is_large 0

NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc: DirectRead: tsn: 5, objd: 79042, objn: 79042 ckpt: 1, nblks: 1689, ntcache: 0, ntdist:0

and here we see that even though the table is smaller than 10% of the buffer cache, the current state of the buffer cache meant that it was not worth caching and so we will still use Direct Read.

Forcing Buffer Cache scan using ABTC

Since we couldn't get our scan to use the buffer cache, lets try using Automatuc Big Table Caching by setting the system parameter: DB_BIG_TABLE_CACHE_PERCENT_TARGET=50. Now we see it costing Serial DW scan:

kcbism: islarge 1 next 0 nblks 1689 type 2, bpid 3, kcbisdbfc 0 kcbnhl 2048 kcbstt 567 keep_nb 0 kcbnbh 28368 kcbnwp 1
kcbivlo: nblks 1689 vlot 500 pnb 28368 kcbisdbfc 0 is_large 0

The difference is that now we look at the object's temperature and here the ABTC policy returns that the table should be cached (ABTC policy is true) and we no longer choose Direct Read.:

NSMTIO: kcbimd: serial DW scan <0.5, 79042> nblks 1689 policy 1 imc 0 pq 0)
NSMTIO: kcbimd: serial DW cache scan <0.5, 79042> nblks 1689 pq
NSMTIO: qertbFetch: NoDirectRead:[- STT < OBJECT_SIZE < MTT]: Obect's size: 1689 (blocks), Threshold: MTT(2836 blocks)

Join the discussion

Comments ( 10 )
  • Seb Wednesday, March 1, 2017

    Hi Roger, could you enlighten us about the meaning of pnb (and kcbpnb) ?

    I noticed it seems always equal to VLOT / vlot * 100 but I do wonder what "pnb" stands for...

    Thanks a lot.



  • guest Thursday, March 2, 2017

    Hi Seb, in kcbism and kcbimd kcbpnb is the medium table threshold (MTT) and typically 10% of the cache. In kcbivlo pnb is the number of block buffers

  • Seb Rigaud Friday, March 3, 2017

    Hello, OK thanks. But then... what does kcbnbh represent ?

    In your examples kcbnbh and pnb are indeed always equal,

    but I got NSMTIO traces over here on our databases where

    these values are always different (for instance kcbnbh =

    474106 and pnb = 557664...) - *always*.

    Would you mind telling us more about what kcbnbh stands for ?

    Thanks !


  • guest Tuesday, March 28, 2017

    Hi Seb, nbh is the actual number of buffers while kcbnbh is roughly the target number of buffers. Beyond that we're getting into internals so that's as far as I can go. Sorry.

  • Alberto Dell'Era Sunday, June 18, 2017
    Very interesting post(s), thanks a lot!

    Might you please give us some hints about PQ, or better yet, invest the time to write part III? Thanks in advance, at least for reading!
  • Maurice Müller Thursday, October 3, 2019
    Hi Roger,
    How does auto dop and it's in-memory parallel execution feature influence the direct IO/buffer cache reads decision on HCC tables? The db version is 12.2 and it's not a RAC.
  • Roger Thursday, October 3, 2019
    Hi Maurice, if the table is more than 80% loaded in memory the decision is over-ridden and we force buffer cache reads for the rest of the table. If it is less than 80% populated we may check point and use direct reads. Note: this behaviour varies across different releases and I will try to clarify after returning from vacation. Roger
  • Parth soni Wednesday, November 20, 2019
    Can you describe conditions under which direct path read occurs instead of cell smart scan on Exadata?
  • Tomasz Wednesday, November 20, 2019
    Hi Roger
    Thanks for great post.
    Any chance let us know what's the meaning of ntcache - is my assumption correct that it's the number of blocks already cached?
  • Atif Monday, June 22, 2020
    Hi Roger,

    How does Cache option on table affects the direct read decision and is it ? Recently I have discovered a strange behavior on a production DBs. One DB with same table structure but cache option and blocks < 5*STT always goes for buffer read while on another DB same table without cache and blocks > 5*STT always goes for direct path read and hence consume more time although at times 50 to 90% block of that table is already in cache.
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.