
Everything about Table Scans including offload to Exadata, Big Data SQL, and External Tables

  • February 24, 2015

Examining the new Columnar Cache with v$cell_state

Roger Macnicol
Software Architect

12.1.0.2 introduced the new Columnar Flash Cache, where 1MB of blocks that are all in HCC format are rewritten so as to make each column from each CU contiguous. This works by first writing all the block headers to an array, then writing all the CU headers to an array, and finally writing all the Column 1 column-level-CUs, then all the Column 2 column-level-CUs, and so on.
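The rewrite order described above can be sketched in a few lines of Python. This is a toy model, not Oracle code; the block and CU structures are invented purely for illustration:

```python
# Toy sketch of the 1MB columnar rewrite: all block headers first,
# then all CU headers, then each column's column-level CUs laid out
# contiguously across CUs. (Invented structures, not cellsrv internals.)

def rewrite_region(blocks):
    """blocks: list of dicts like
       {"header": bytes, "cus": [{"header": bytes, "cols": [bytes, ...]}]}"""
    out = []
    # 1. every block header
    out += [b["header"] for b in blocks]
    # 2. every CU header
    out += [cu["header"] for b in blocks for cu in b["cus"]]
    # 3. column N from every CU, one column at a time
    ncols = len(blocks[0]["cus"][0]["cols"])
    for col in range(ncols):
        out += [cu["cols"][col] for b in blocks for cu in b["cus"]]
    return b"".join(out)

region = [
    {"header": b"BH0", "cus": [{"header": b"CH0", "cols": [b"c0a", b"c1a"]}]},
    {"header": b"BH1", "cus": [{"header": b"CH1", "cols": [b"c0b", b"c1b"]}]},
]
print(rewrite_region(region))  # b'BH0BH1CH0CH1c0ac0bc1ac1b'
```

Note how the two column-0 pieces (`c0a`, `c0b`) end up adjacent even though they came from different blocks; that adjacency is what makes column-selective reads cheap.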

The flash cache hash table maintains a simple mapping of column numbers to 64KB flash pages so, for any given query, we can simply do asynchronous disk I/O of the minimum set of 64KB pages required to cover the referenced columns.
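A minimal sketch of that mapping, with invented names (the real hash table is internal to cellsrv): given a map from column number to the 64KB flash pages holding that column's data, the pages a query must read are simply the union over its referenced columns.

```python
# Hedged illustration of the hash-table idea: each column maps to the
# 64KB flash pages holding its data; a query reads only the union of
# pages for the columns it references.

def pages_to_read(column_pages, referenced_cols):
    """column_pages: {col_number: set of page ids}; returns the minimal page set."""
    needed = set()
    for col in referenced_cols:
        needed |= column_pages[col]
    return needed

column_pages = {1: {0, 1}, 2: {1, 2}, 3: {3}}
print(sorted(pages_to_read(column_pages, [1, 3])))  # [0, 1, 3]
```

A query touching columns 1 and 3 skips page 2 entirely, which is the disk I/O the columnar cache saves.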

Within the "flashcache" cell stats there is a new nested cell stat called "columnarcache" that enables you to track how it is working.

> set long 50000000
> set pagesize 10000
> select xmltype(STATISTICS_VALUE).getclobval(2,2) from v$cell_state;

XMLTYPE(STATISTICS_VALUE).GETCLOBVAL(2,2)
--------------------------------------------------------------------------------
....

  <stats type="flashcache">
    <stats type="columnarcache">
      <stat name="columnar_cache_size">0</stat>
      <stat name="columnar_cache_keep_size">0</stat>
      <stat name="columnar_cache_keep_read">0</stat>
      <stat name="columnar_cache_keep_read_bytes">0</stat>
      <stat name="columnar_attempted_read">0</stat>
      <stat name="columnar_attempted_read_bytes">0</stat>
      <stat name="columnar_cache_hits">0</stat>
      <stat name="columnar_cache_hits_read_bytes">0</stat>
      <stat name="columnar_cache_hits_saved_bytes">0</stat>
      <stat name="columnar_cache_pop">0</stat>
      <stat name="columnar_cache_pop_bytes">0</stat>
      <stat name="columnar_cache_pop_throttled">0</stat>
      <stat name="columnar_cache_pop_invalid_buf">0</stat>
      <stat name="columnar_cache_pop_size_align">0</stat>
      <stat name="columnar_cache_pop_findchdrfailure_mainchdr">0</stat>
      <stat name="columnar_cache_pop_findchdrfailure_childchdr">0</stat>
    </stats>
  </stats>
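If you prefer to post-process the spooled XML outside the database, the nested stats are easy to pull into a dictionary. This sketch uses Python's standard library on a trimmed copy of the structure shown above:

```python
# Parse the columnarcache stats out of the XML returned by v$cell_state.
import xml.etree.ElementTree as ET

xml = """<cell_stats>
  <stats type="flashcache">
    <stats type="columnarcache">
      <stat name="columnar_cache_size">179306496</stat>
      <stat name="columnar_cache_hits_read_bytes">1909456896</stat>
    </stats>
  </stats>
</cell_stats>"""

root = ET.fromstring(xml)
cc = root.find('.//stats[@type="columnarcache"]')
stats = {s.get("name"): int(s.text) for s in cc.findall("stat")}
print(stats["columnar_cache_size"])  # 179306496
```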

I typically spool the output of this and wade through it in an editor, but if we want to monitor how the cache is working under some workload, we need to extract individual stats. For example, I bounced the cells and verified the cache was empty:

> select xmlcast(xmlquery('/cell_stats/stats/stats/stat[@name="columnar_cache_size"]' passing xmltype(STATISTICS_VALUE) returning content) as varchar2(200) ) "COLUMNAR CACHE SIZE"
from v$cell_state
where xmlexists('/cell_stats/stats/stats[@type="columnarcache"]' passing xmltype(STATISTICS_VALUE));
COLUMNAR CACHE SIZE
--------------------------------------------------------------------------------
0

I am using the 1GB TPC-H schema, which takes a little more than 400MB on disk when compressed with Query Low:

SQL> select sum(bytes) from user_segments where SEGMENT_NAME in ('SUPPLIER','PARTSUPP','LINEITEM','ORDERS','PART','CUSTOMER');

SUM(BYTES)
----------
 420675584

and checking the columnar cache again shows about half of the data has been rewritten into columnar cache format instead of caching raw blocks:


SQL> select xmlcast(xmlquery('/cell_stats/stats/stats/stat[@name="columnar_cache_size"]' passing xmltype(STATISTICS_VALUE) returning content) as varchar2(200) ) "COLUMNAR CACHE SIZE"
from v$cell_state
where xmlexists('/cell_stats/stats/stats[@type="columnarcache"]' passing xmltype(STATISTICS_VALUE));
COLUMNAR CACHE SIZE
--------------------------------------------------------------------------------
179306496

So let's look at how the cache helped:

SQL> select xmlcast(xmlquery('/cell_stats/stats/stats/stat[@name="columnar_cache_hits_read_bytes"]' passing xmltype(STATISTICS_VALUE) returning content) as varchar2(200) ) "HIT READ BYTES"
from v$cell_state
where xmlexists('/cell_stats/stats/stats[@type="columnarcache"]' passing xmltype(STATISTICS_VALUE));

HIT READ BYTES
--------------------------------------------------------------------------------
1909456896

SQL> select xmlcast(xmlquery('/cell_stats/stats/stats/stat[@name="columnar_cache_hits_saved_bytes"]' passing xmltype(STATISTICS_VALUE) returning content) as varchar2(200) ) "HIT SAVED BYTES"
from v$cell_state
where xmlexists('/cell_stats/stats/stats[@type="columnarcache"]' passing xmltype(STATISTICS_VALUE));

HIT SAVED BYTES
--------------------------------------------------------------------------------
1128267776

which shows we were able to eliminate about 35% of the disk I/O for this query run!
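The arithmetic behind those figures (values copied from the output above): the I/O eliminated is the saved bytes divided by what would have been read without the cache, which comes out to just over a third; the cache size from earlier can likewise be compared with the on-disk segment size.

```python
# Ratios derived from the stats shown above.
hit_read  = 1909456896   # columnar_cache_hits_read_bytes
hit_saved = 1128267776   # columnar_cache_hits_saved_bytes
io_saved  = hit_saved / (hit_read + hit_saved)
print(f"fraction of disk I/O eliminated: {io_saved:.1%}")

cache_size   = 179306496   # columnar_cache_size
segment_size = 420675584   # sum(bytes) over the TPC-H segments
print(f"fraction of schema in columnar format: {cache_size / segment_size:.1%}")
```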

We could, of course, have gotten that information more simply with the regular stat "cell physical IO bytes saved by columnar cache" but I wanted to show how to pull values from v$cell_state for use in scripts.


Many people only use Query High compression, as they find the increased disk I/O from Query Low more than offsets the savings from cheaper decompression costs. However, with the columnar cache in place, those trade-offs have changed, and it may be worth re-evaluating when to use Query Low vs. Query High, particularly on CPU-bound cells.

Addendum: performance bug 20525311 affecting the columnar cache with selective predicates is fixed in the next rpm.

Roger MacNicol
Oracle Data Storage Technology 


Comments (6)
  • Vishal Desai Wednesday, February 25, 2015

    What is 179306496? - does this represent 100% of 420675584 bytes stored in flash columnar cache?

    Can you explain columnar_cache_hits_read_bytes and columnar_cache_hits_saved_byte in detail?

    How did you arrive to 35% savings?


  • Roger Wednesday, November 25, 2015

    179306496 means that of the 400 MB total, 171 MB had been cached, i.e. 42% of the schema was cached (this was 1 GB of raw data stored in HCC form). The flash cache will have a mixture of 1 MB regions stored in original block form and 1 MB regions stored in columnar cache form; 171 MB is the total that has been reformatted.

    I get ~35% from 1128267776/(1128267776 + 1909456896) - the query stream would have had to read (1128267776 + 1909456896) bytes without the cache, but because of columnar I/O it actually read only 1909456896, saving 1128267776.


  • Kevin Closson Thursday, August 18, 2016

    Hi Roger,

    I hope it's OK to resurrect this older blog post. I have an AWR where "cell physical IO bytes saved by columnar cache" is a value 25% greater than 'cell physical IO bytes eligible for predicate offload.' This makes no sense to me unless the "eligible" math is not aware of how deeply compressed the columnar cache is and perhaps the columnar cache savings is the *decompressed* amount. In other words, is the columnar cache savings an "effective savings?"


  • guest Thursday, August 18, 2016

    Hi Kevin, which cell release are you using? Thanks.


  • Kevin Closson Thursday, August 18, 2016

    Hi Roger,

    All I have is an AWR. I'll have to see if I can chase down the cellsrv rev.


  • guest Monday, August 29, 2016

    Hi Kevin, the saving is calculated as the percentage of the decompressed size that was saved. Maybe this is less useful for a DBA working on tuning their flash I/O subsystem, but it was chosen to emphasize the combined savings from HCC plus the columnar cache. This gets more interesting when the IMC-format columnar cache is made available, since the IMC format used (dictionary encoding plus LZO) typically uses 20% more flash than HCC Query High (ZLIB), and the columnar cache becomes vital in not increasing the usage of the flash subsystem (note that NVMe's high bandwidth is not particularly susceptible to I/O pressure, and even with LZO it tends to exceed the CPU bandwidth).

