On June 10th 2014, Larry Ellison officially launched the new Oracle Database In-Memory option which introduced revolutionary dual-format storage to access the same data as either rows or columns transparently. At OOW that year, two ISVs put in a request for us to also support In-Memory External Tables (IMXT). The use cases of IMXTs include:
Simply integrating Enterprise Data and Big Data will not generate value for a company, instead advanced analytics that draw on data from both sources are required. Such queries often require an iterative approach as a data analyst explores multiple avenues for revenue enhancement. These analytics need to run within Oracle where the richest set of SQL tools in any ecosystem exist, and they need high performance access to both data sources. Any time external data needs to be queried repeatedly in a relatively short space of time, going out to HDFS or other external disk formats is not cost effective and hence can benefit from Oracle's DBIM technology.
Prior to 18.1, there was no way to get fast scans without loading external data into Oracle. One mechanism that people did use to perform repeated loads and high performance scans in prior releases use was to define an:
Create Materialized View <imxtmv>
Build Immediate Refresh Complete On Demand As
Select * From <external table> INMEMORY;
INMEMORY Materialized View and set the session Query_Rewrite_Integrity parameter to STALE_TOLERATED. In this case you would see the query plan say: MAT_VIEW ACCESS INMEMORY FULL as its rowsource.
Other related requests that I'm aware have been made include
I personally am not aware of any plans to implement these.
In 18.1 we implemented the INMEMORY MEMCOMPRESS clause for External Tables that use the two legacy drivers (ORACLE_LOADER and ORACLE_DATAPUMP). We are still working on the adding support for the Big data drivers (ORACLE_HDFS and ORACLE_HIVE).
The INMEMORY clause goes at the end of the table definition in the same section as the Reject Limit clause:.
create table s_et(
s_suppkey number ,
s_name char(25) ,
s_address varchar2(40) ,
s_nationkey number ,
s_phone char(15) ,
s_acctbal number ,
organization external (
default directory T_WORK
records delimited by newline
fields terminated by '|'
missing field values are null
reject limit unlimited
INMEMORY MEMCOMPRESS FOR CAPACITY;
All the DBIM compression levels are supported and have the same meanings as they do with heap tables:
If you use 'INMEMORY' on its own you will get 'INMEMORY MEMCOMPRESS FOR QUERY LOW'
One key difference with heap tables is that an In-Memory External Table must be fully loaded into memory before a table scan can use it. External Tables do not currently support hybrid In-Memory/On-Disk scans. See below under 'Data Dictionary' for how to use v$im_segments to check its population status.
In-Memory External Tables function like refresh on-demand Materialized Views: once we've populated the In-Memory area we are unaware of any changes to the underlying external files pointed to by the Location clause. Unlike heap tables, External Tables don't generally support DMLs so the issue is mostly that one or more of the files may have been replaced with different (newer?) versions.
Therefore, since an In-Memory scan could return different results to an external scan we need the user to tell us explicitly that this is OK (just as you have to when using refresh on-demand Materialized Views). This is done by setting:
SQL> alter session set query_rewrite_integrity=stale_tolerated;
For the use cases outlined at the top, this actually provides a real benefit. Let's say, for example, that an external process is producing output in csv format or a map-reduce job is creating HDFS files. The decoupling of the In-Memory snapshot from the disk file allows the external process to complete without interrupting any queries that are running. Then, when the external process is complete, the last step can be to call dbms_inmemory.repopulate to create a new In-Memory snapshot of the external data. Any queries currently running against the old snapshot will hold a read latch on IMCUs and so should complete successfully.
The In-Memory attributes of an External Table can be seen in any of six views
So for the example table shown above we could see
SQL> column TABLE_NAME format a10
SQL> select table_name, inmemory, inmemory_compression
2 from user_tables where EXTERNAL = 'YES'
TABLE_NAME INMEMORY INMEMORY_COMPRESS
---------- -------- -----------------
S_ET ENABLED FOR QUERY LOW
3 rows selected.
We have also enhanced some of the v$ views used by DBIM to include External Tables
v$im_segments and v$im_segment_detail both gain a new column called 'IS_EXTERNAL' which has values TRUE, FALSE. For example, let's load our example table and also a heap table loaded from it:
SQL> exec dbms_inmemory.populate(USER,'S_ET');
PL/SQL procedure successfully completed.
SQL> exec dbms_inmemory.populate(USER,'SUPPLIER');
PL/SQL procedure successfully completed.
SQL> select SEGMENT_NAME,INMEMORY_SIZE,BYTES_NOT_POPULATED,POPULATE_STATUS,IS_EXTERNAL
SEGMENT_NAME INMEMORY_SIZE BYTES_NOT_POPULATED POPULATE_STAT IS_EX
------------ ------------- ------------------- ------------- -----
SUPPLIER 2359296 0 COMPLETED FALSE
S_ET 2228224 0 COMPLETED TRUE
Before querying an In-Memory External Table, you need to check the bytes_not_populated, and populate_status columns of v$im_segments to be sure that the table is indeed fully loaded.