X

Enabling Real-Time Analytics With Database In-Memory

Why does BYTES_NOT_POPULATED suddenly show a non-zero value?

Maria Colgan
Master Product Manager

I recently got a question from a customer who has just begun adopting Database In-Memory and thought it was worth answering as a blog post in case anyone else had encountered the same scenario and was wondering what was going on. 

Here is the question I got:

“I've observed a scenario where a table is fully populated in the In-Memory column store with 0 in the  BYTES_NOT_POPULATED column in v$im_segments, but after some time, the BYTES_NOT_POPULATED column shows a non-zero value. Why is Oracle purging part of my table from memory?”

Let me begin to answer this by assuring you all that Oracle does not purge your data out of the In-Memory column store (IM column store). Remember, it’s a store not a cache and any data populated into the IM column store will remain there until you either mark the object NO INMEMORY or the instance is shutdown.

So, if data isn’t being purged why would the BYTES_NOT_POPULATED column shows a non-zero value?

The non-zero value is the result of a data load operation, most likely a direct path load operation if the value in the BYTES_NOT_POPULATED column is large, as it was in this case.

If you are not familiar with the term direct path load let me explain what I mean by that.

A direct path load occurs if you do a CREATE TABLE AS SELECT command or an INSERT statement with the APPEND hint.

A direct path load parses the input data, converts the data for each input field to its corresponding Oracle data type, and then builds a column array structure for the data. These column array structures are used to format Oracle data blocks and build index keys. The newly formatted database blocks are then written directly to the database, bypassing the standard SQL processing engine and the database buffer cache.

A direct path load operation is an all or nothing operation. This means that the operation is not committed until all of the data has been loaded. Should something go wrong in the middle of the operation, the entire operation will be aborted. To meet this strict criterion, a direct path loads inserts data into database blocks that are created above the segment high water mark (maximum number of database blocks used so far by an object or segment).


Once the direct path load is committed, the high water mark is moved to encompass the newly created blocks into the segment and the blocks will be made visible to other SQL operations on the same table. Up until this point the IM column store is not aware that any data change occurred on the segment.
Once the operation has been committed, the IM column store is instantly aware it does not have all of the data populated for the object. The size of the missing data will be visible in the BYTES_NOT_POPULATED column of the v$IM_SEGMENTS view.

Let’s take a look at an example. Imagine I have the SALES2 table fully populated into the IM column store as:


Now, lets insert a new set of data into the SALES2 table via an INSERT statement with an APPEND hint.


If we check the output of v$IM_SEGMENTS before we commit the INSERT statement we still see the BYTES_NOT_POPULATED column is still set to zero. This is because the data inserted into the SALES2 table has gone into database blocks above the high water mark and are therefore not considered to be part of the SALES2 table yet.

Let’s now commit our INSERT  command and check the output of v$IM_SEGMENTS.

Now the BYTES_NOT_POPULATED column shows a non-zero value as the IM column store becomes aware that we are missing some of the data from the SALES2 table as soon as the commit statement completes. If we had specified a PRIORITY on the SALES2 table then the newly added data would be automatically populated into the IM column store the next time the IMCO (In-Memory coordinator) background process woke up. Remember it’s on a two-minute cycle.

Since we didn’t specify a priority, the missing data won’t be populated until we query the SALES2 table again. Let’s try that now and check what happens in v$IM_SEGMENTS.


As you can see the POPULATE_STATUS has changed to STARTED, as the missing data is populated into the IM column store. If we wait a minute and check again we will see that the POPULATE_STATUS has changed again to COMPLETED and the BYTES_NOT_POPULATED column is 0.


Join the discussion

Comments ( 1 )
  • Parth Divekar Sunday, October 23, 2016

    Hi,

    I recently started adopting in-memory option.

    We have two setups

    Set up 1:

    Its a two node RAC but the database runs only on 1 node. On this database, I have a table (LIST PARTITIONED) with 500 mill rows. However only few (let's say 6 columns are used in querying). So i set up this table to be "inmemory" and 5 columns to be in memory for HIGH QUERY.This table gets loaded only once a month so there are no DML happening on this table at this point. So all the data gets loaded (Verified using gv$segment_details view. No non-zero value in "BYTES NOT POPULATED". Column. Query that was taking close to 2 mins is not down to under 10 seconds. So Great!

    Set up 2:

    Its a two node RAC but the database instance active on both the node. On this database, we have same table from scenario #1. The "in-memory" options set exactly same as Set up#1. However, not all the bytes are getting loaded "in-memory". And the query performance obviously is very bad. Around 2-3 mins.

    I tried searching everywhere but I think I am missing something. Mostly in concepts!

    Could you please guide to the right direction!

    Thanks in advance...

    Parth


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.