Thursday, March 29, 2018

New in 18c: Introduction to In-Memory External Tables in 18c

By: Roger MacNicol | Software Architect

On June 10th 2014, Larry Ellison officially launched the new Oracle Database In-Memory option, which introduced revolutionary dual-format storage to access the same data transparently as either rows or columns. At OOW that year, two ISVs put in a request for us to also support In-Memory External Tables (IMXT). The use cases for IMXTs include:

  • Low-value or ephemeral data that the customer doesn’t wish to load into Oracle storage but needs to scan repeatedly in a short space of time
  • Big Data that has been summarized through Map/Reduce or other Hadoop aggregation tools and now needs to be joined to Enterprise Data for reporting
  • Data that needs to be queried both from the RDBMS side and from Hadoop tools and thus doesn't need to be duplicated in Oracle storage

Simply integrating Enterprise Data and Big Data will not generate value for a company; instead, advanced analytics that draw on data from both sources are required. Such queries often require an iterative approach as a data analyst explores multiple avenues for revenue enhancement. These analytics need to run within Oracle, where the richest set of SQL tools in any ecosystem exists, and they need high performance access to both data sources. Any time external data needs to be queried repeatedly in a relatively short space of time, going out to HDFS or other external disk formats is not cost effective, and such workloads can benefit from Oracle's DBIM technology.

Prior to 18.1, there was no way to get fast scans without loading external data into Oracle. One mechanism that people did use in prior releases to perform repeated loads and high-performance scans was to define an INMEMORY Materialized View:

Create Materialized View <imxtmv> INMEMORY
Build Immediate Refresh Complete On Demand As
Select * From <external table>;

You would then set the session Query_Rewrite_Integrity parameter to STALE_TOLERATED, and the query plan would show MAT_VIEW ACCESS INMEMORY FULL as its rowsource.
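
A minimal sketch of driving that workaround, assuming the Materialized View is called IMXTMV (for the rewrite to happen, the view also needs to be created with ENABLE QUERY REWRITE):

SQL> -- Allow rewrite to a snapshot that may be stale relative to the files
SQL> alter session set query_rewrite_integrity = stale_tolerated;

SQL> -- Rebuild the INMEMORY snapshot on demand after the external files change
SQL> exec dbms_mview.refresh('IMXTMV', method => 'C');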

Other related requests that I'm aware of include:

  • In-Memory DBLINKs
  • In-Memory only Materialized Views

I personally am not aware of any plans to implement these.

What's in Oracle 18.1

In 18.1 we implemented the INMEMORY MEMCOMPRESS clause for External Tables that use the two legacy drivers (ORACLE_LOADER and ORACLE_DATAPUMP). We are still working on adding support for the Big Data drivers (ORACLE_HDFS and ORACLE_HIVE).

The INMEMORY clause goes at the end of the table definition in the same section as the Reject Limit clause:

create table s_et(
    s_suppkey            number ,
    s_name               char(25) ,
    s_address            varchar2(40) ,
    s_nationkey          number ,
    s_phone              char(15) ,
    s_acctbal            number ,
    s_comment            varchar2(101)
)
organization external (
type ORACLE_LOADER
default directory T_WORK
access parameters
(
    records delimited by newline
    nobadfile
    nologfile
    fields terminated by '|'
    missing field values are null
  )
  location (T_WORK:'supplier.tbl'))
reject limit unlimited
INMEMORY MEMCOMPRESS FOR CAPACITY;

All the DBIM compression levels are supported and have the same meanings as they do with heap tables:

  • NO INMEMORY
  • INMEMORY
  • INMEMORY NO MEMCOMPRESS
  • INMEMORY MEMCOMPRESS FOR QUERY LOW
  • INMEMORY MEMCOMPRESS FOR QUERY HIGH
  • INMEMORY MEMCOMPRESS FOR CAPACITY LOW
  • INMEMORY MEMCOMPRESS FOR CAPACITY HIGH

If you use 'INMEMORY' on its own, you will get 'INMEMORY MEMCOMPRESS FOR QUERY LOW'.
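
For illustration, and assuming ALTER TABLE accepts the same inmemory clause on an External Table as it does on a heap table, the level can be changed after creation:

-- Bare INMEMORY defaults to MEMCOMPRESS FOR QUERY LOW
alter table s_et inmemory;

-- Trade some scan speed for a smaller footprint in the In-Memory area
alter table s_et inmemory memcompress for capacity high;

-- Mark the table as no longer eligible for the In-Memory area
alter table s_et no inmemory;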

Table must be fully loaded before use

One key difference with heap tables is that an In-Memory External Table must be fully loaded into memory before a table scan can use it. External Tables do not currently support hybrid In-Memory/On-Disk scans. See below under 'Data Dictionary' for how to use v$im_segments to check its population status.

Session must set Query_Rewrite_Integrity

In-Memory External Tables function like refresh-on-demand Materialized Views: once we've populated the In-Memory area, we are unaware of any changes to the underlying external files pointed to by the Location clause. Unlike heap tables, External Tables don't generally support DML, so the issue is mostly that one or more of the files may have been replaced with different (newer?) versions.

Therefore, since an In-Memory scan could return different results from an external scan, we need the user to tell us explicitly that this is OK (just as you have to when using refresh-on-demand Materialized Views). This is done by setting:

SQL> alter session set query_rewrite_integrity=stale_tolerated;

For the use cases outlined at the top, this actually provides a real benefit. Let's say, for example, that an external process is producing output in csv format or a map-reduce job is creating HDFS files. The decoupling of the In-Memory snapshot from the disk file allows the external process to complete without interrupting any queries that are running. Then, when the external process is complete, the last step can be to call dbms_inmemory.repopulate to create a new In-Memory snapshot of the external data. Any queries currently running against the old snapshot will hold a read latch on IMCUs and so should complete successfully.
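
For example, the tail end of such an external job could be as simple as this (using the S_ET table from the example above):

SQL> -- New supplier.tbl files are in place; build a fresh In-Memory snapshot.
SQL> -- Queries already running keep reading the old IMCUs until they finish.
SQL> exec dbms_inmemory.repopulate(USER, 'S_ET');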

Data Dictionary

The In-Memory attributes of an External Table can be seen in any of these six views:

  1. USER_EXTERNAL_TABLES
  2. ALL_EXTERNAL_TABLES
  3. DBA_EXTERNAL_TABLES
  4. USER_TABLES
  5. ALL_TABLES
  6. DBA_TABLES

So for the example table shown above we could see:

SQL> column TABLE_NAME format a10
SQL> select table_name, inmemory, inmemory_compression
  2  from user_tables where EXTERNAL = 'YES';

TABLE_NAME INMEMORY INMEMORY_COMPRESS
---------- -------- -----------------
R_ET       DISABLED
N_ET       DISABLED
S_ET       ENABLED  FOR QUERY LOW

3 rows selected.

We have also enhanced some of the v$ views used by DBIM to include External Tables.

v$im_segments and v$im_segment_detail both gain a new column called IS_EXTERNAL, which has the values TRUE and FALSE. For example, let's populate our example table and also a heap table loaded from it:

SQL> exec dbms_inmemory.populate(USER,'S_ET');

PL/SQL procedure successfully completed.

SQL> exec dbms_inmemory.populate(USER,'SUPPLIER');

PL/SQL procedure successfully completed.

SQL> select SEGMENT_NAME,INMEMORY_SIZE,BYTES_NOT_POPULATED,POPULATE_STATUS,IS_EXTERNAL
  2  from v$im_segments;

SEGMENT_NAME INMEMORY_SIZE BYTES_NOT_POPULATED POPULATE_STAT IS_EX
------------ ------------- ------------------- ------------- -----
SUPPLIER           2359296                   0 COMPLETED     FALSE
S_ET               2228224                   0 COMPLETED     TRUE     

Before querying an In-Memory External Table, you need to check the bytes_not_populated and populate_status columns of v$im_segments to be sure that the table is indeed fully loaded.
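
A narrowed-down version of the v$im_segments query above is enough for that check (using the example table name):

select segment_name, populate_status, bytes_not_populated
  from v$im_segments
 where segment_name = 'S_ET'
   and is_external = 'TRUE';

The In-Memory scan is only usable once POPULATE_STATUS shows COMPLETED and BYTES_NOT_POPULATED is 0.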

Notes:

  • The following External Table drivers are currently supported: ORACLE_LOADER and ORACLE_DATAPUMP.
  • You have to explicitly invoke DBMS_INMEMORY.(re)populate to update the In-Memory contents of the External Table.
  • Population of the In-Memory contents of the External Table is serial.
  • Only serial scans of the In-Memory External Table are supported; PQ is not yet supported.
    • This is unlikely to be a serious performance impediment since an In-Memory scan is so many orders of magnitude faster than a scan that goes out to an external table.
  • Only the MEMCOMPRESS sub-clause of the INMEMORY clause is supported; the PRIORITY, DISTRIBUTE and DUPLICATE sub-clauses are not yet supported.
  • Only unpartitioned External Tables are supported; specifying INMEMORY on a partition or on the top level of a partitioned External Table will give an error.

Roger MacNicol
