Objective & Scope

The objective of this blog is to show how Oracle Database In-Memory features accelerate queries made on JSON data stored in Oracle Database.

The below contents works with all depolyment methods of Oracle Database (Base DB, Autonomous, ExaCS, ExaCC, Exadata).

The example below is made using Base DB, however the result varies from one deployment method to another.

Introduction

JSON Data In Oracle Database

Oracle Database 23ai supports JSON data type natively with all relational database features like transactions, indexing, declarative querying, views, external table, ….

It is also possible to project JSON data type relationally, making it available for applications expecting relational data via Duality Views.

Oracle Database automatically creates virtual columns for columns with JSON data type.

Oracle Database In-Memory

The In-Memory Column Store is an optional SGA pool that stores copies of tables, partitions and materialzied views in a special columnar format optimized for rapid scans. It supplements the row-based storage in the database Buffer Cache. In-Memory offers great performance because it also takes advantage of a type of hardware acceleration called SIMD processing. 

Columns with JSON data type accessed via the Columnar Store of 23ai offer a great performance compared to earlier releases. One of the reasons is that JSON columns are represented as virtual columns.

In-Memory Exprssions is an In-Memory feature introduced in 12.2. It improves performance by pre-calculating expressions in queries and storing them in virtual columns.
For example: SELECT avg(lo_ordtotalprice / 130) “Average Price” FROM lineorder. Instead of spending time calculating the average of the division for each retrieved row at runtime, In-Memory Expressions pre-calculates the average of the division and stores the value in an extra virtual column in the Columnar Store. Hence queries run much faster due to that.

In-Memory Expressions work with JSON SQL functions and conditions like JSON_TABLE, JSON_QUERY, JSON_VALUE, JSON_EXISTS, and JSON_TEXTCONTAINS. They work in exactly the same way as the above example. 

Oracle stores the value of each column in separate In-Memory Compression Units (IMCU). In the below diagram, we see one table with each column stored in 1 or more IMCUs:

IMCUs

To view the columns stored for a specific table, we can issue the following SQL:

SELECT column_name, virtual_column, hidden_column 
FROM dba_tab_cols 
WHERE table_name = 'LINEORDER'
ORDER BY column_name DESC;

COLUMN_NAME             VIRTUAL HIDDEN
----------------------- ------- ------
LO_TAX                  NO      NO
LO_SUPPLYCOST           NO      NO
LO_SUPPKEY              NO      NO
LO_SHIPPRIORITY         NO      NO
LO_SHIPMODE             NO      NO
LO_REVENUE              NO      NO
LO_QUANTITY             NO      NO
LO_PARTKEY              NO      NO
LO_ORDTOTALPRICE        NO      NO
LO_ORDERPRIORITY        NO      NO
...

When we describe a JSON Collection Table, we see that it will always have 1 ccolumn called DATA of type JSON.

CREATE JSON COLLECTION TABLE json_flights_col;
Table Created.

DESC JSON_FLIGHTS_COL
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
DATA                                               JSON

Examining DBA_TAB_COLS to check the associated columns there are always other hidden columns (virtual and physical):

SELECT column_name, virtual_column, hidden_column
FROM dba_tab_cols
WHERE table_name = 'JSON_FLIGHTS_COL'
ORDER BY column_name;
COLUMN_NAME                                   VIRTUAL HIDDEN
------------------------------                ------- ------
DATA                                          NO      NO
ETAG                                          NO      YES
RESID                                         NO      YES
SYS_IME_OSON_FB28D4C028A44FE0BF532D33EF2840D6 YES     YES

The column SYS_IME_OSON_* hidden column is used when data is loaded into the IM column store, to optimize in-memory performance.
Also, when In-Memory Expressions are added to a query, then virtual columns are added for each expression discovered (see below).

Note: It is also posible to create JSON virtual columns explictly via DBMS_JSON.ADD_VIRTUAL_COLUMNS procedure. These manually created columns can be made hidden or visible.

Setting Parameters

For the below example, we are using a JSON Collection Table called JSON_Flights_Col. The table has 16 million rows of open source aeroplane flights information.
Below is a sample of the data:

{"FL_DATE":"2006-01-01","DEP_DELAY":5,"ARR_DELAY":19,"AIR_TIME":350,"DISTANCE":2475,"DEP_TIME":9.083333,"ARR_TIME":12.483334}
{"FL_DATE":"2006-01-02","DEP_DELAY":167,"ARR_DELAY":216,"AIR_TIME":343,"DISTANCE":2475,"DEP_TIME":11.783334,"ARR_TIME":15.766666}
{"FL_DATE":"2006-01-03","DEP_DELAY":-7,"ARR_DELAY":-2,"AIR_TIME":344,"DISTANCE":2475,"DEP_TIME":8.883333,"ARR_TIME":12.133333}
{"FL_DATE":"2006-01-04","DEP_DELAY":-5,"ARR_DELAY":-13,"AIR_TIME":331,"DISTANCE":2475,"DEP_TIME":8.916667,"ARR_TIME":11.95}
{"FL_DATE":"2006-01-05","DEP_DELAY":-3,"ARR_DELAY":-17,"AIR_TIME":321,"DISTANCE":2475,"DEP_TIME":8.95,"ARR_TIME":11.883333}
{"FL_DATE":"2006-01-06","DEP_DELAY":-4,"ARR_DELAY":-32,"AIR_TIME":320,"DISTANCE":2475,"DEP_TIME":8.933333,"ARR_TIME":11.633333}

INMEMORY_SIZE controls the size of the Columnar Store. Please check Enabling and Sizing the IM Column Store for sizing.

$sqlplus sys/pw@PDB as Sysdba
ALTER SYSTEM SET INMEMORY_SIZE = xxxg;

To be able to use In-Memory Expressions, the following init.ora parameters must be set:

-- Necessary for In-Memory to start populating Expressions based on Virtual Columns
ALTER SYSTEM SET INMEMORY_EXPRESSIONS_USAGE = ENABLE; -- default value

-- Controls which user-defined virtual columns are stored as In-Memory virtual columns.
-- for JSON data the value maybe STATIC_ONLY or ENABLE.
ALTER SYSTEM SET INMEMORY_VIRTUAL_COLUMNS = <STATIC_ONLY | ENABLE>; -- default MANUAL 

-- Only in 23ai, date-based analytic queries run much faster by leveraging the In-Memory Expressions framework. 
ALTER SYSTEM SET inmemory_optimized_date = ENABLE; -- default DISABLE

-- The following parameter accelerates In-Memory access via native Number calculations using SIMD hardware acceleration.
-- Used in conjunction with For Query Low In-Memory Compression
-- Does not affect In-Memory Expressions usage.
ALTER SYSTEM SET inmemory_optimized_arithmetic = ENABLE; -- default DISABLE

Loading Data

After setting the parameters, we can initiate loading the JSON Collection Table into the Columnar Store:

-- Load the table into the Columnar Store
ALTER TABLE json_flights_col INMEMORY PRIORITY CRITICAL MEMCOMPRESS FOR QUERY LOW;

-- Monitor the loading process by checking BYTES_NOT_POPULATED & POPULATION_STATUS columns for the loading status.
-- Note that for RAC databases, In-Memory can distribute the data across RAC nodes, hence checking INST_ID and using GV$IM_SEGMENTS.
SELECT Inst_Id, Owner, Segment_Name, Bytes_Not_Populated/1024/1024 Pending_MB, Populate_Status FROM GV$IM_Segments;

Loading is complete when BYTES_NOT_POPULATED = 0 and POPULATE_STATUS = COMPLETE.

Execute The Queries Without Expressions

Start by running the queries against the Buffer Cache for the sake of benchmarking by using NO_INMEMORY hint: 

-- Enable query timing, tracing without producing a query output, and setting the line size to accomodate the Plan Table
SET autoTrace traceOnly timing on lines 264

-- Let us first retrieve data from the Buffer Cache by using NO_INMEMORY hint for comparison.
-- Note the exeuction time, and the access method TABLE ACCESS FULL.
SELECT /*+ NO_INMEMORY parallel (2) */ avg(JSON_VALUE(fl.data, '$.ARR_DELAY')) "Average Delay Time",
                                       avg(JSON_VALUE(fl.data, '$.AIR_TIME')) "Average Air Time"
FROM JSON_flights_col fl
ORDER BY "Average Delay Time";

-- Retrieve data from the Columnar Store - default behaviour
-- Note the exeuction time, and the access method TABLE ACCESS INMEMORY FULL.
SELECT /*+ parallel (2) */ avg(JSON_VALUE(fl.data, '$.ARR_DELAY')) "Average Delay Time", 
                           avg(JSON_VALUE(fl.data, '$.AIR_TIME')) "Average Air Time" 
FROM JSON_flights_col fl 
ORDER BY "Average Delay Time";

Capture Expressions and Re-Execute The Queries

Next, we will capture the IMEs that were implicitly created based on the query we just issued:

-- Flush database monitoring information to disk.
-- This happens automatically every 15 minutes, but we want don't want to wait, so we issue the below statement.
Exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- Capture In-Memory Expressions for the query we just executed.
Exec DBMS_INMEMORY_ADMIN.IME_CAPTURE_EXPRESSIONS('CURRENT');

-- Populate captured IM Expressions into the Columnar Store
-- This step will re-populate the Columar Store because we added the newly discovered IME in a form of a virtual column.
Exec DBMS_INMEMORY_ADMIN.IME_POPULATE_EXPRESSIONS();

Confirm that the In-Memory Expressions have been captured:

SELECT table_name, column_name, sql_expression 
FROM dba_im_expressions;

TABLE_NAME           COLUMN_NAME               SQL_EXPRESSION
-------------------- ------------------------- --------------------------------------------------------------------------------
JSON_FLIGHTS_COL     SYS_IME000100000A09D45F   TO_NUMBER(JSON_VALUE("DATA" FORMAT OSON , '$.ARR_DELAY' RETURNING VARCHAR2(4000)
JSON_FLIGHTS_COL     SYS_IME0001000018FFFB6C   TO_NUMBER(JSON_VALUE("DATA" FORMAT OSON , '$.AIR_TIME' RETURNING VARCHAR2(4000)

Note: Because the query included 2 columns from the JSON Collection table (ARR_DELAY & AIR_TIME), we see that the In-Memory engine captured 2 expressions.
This is because we used the function JSON_VALUE in the query. So this is equivalent of making a calculation on the retrieved column for non-JSON Collection Tables. 

The table definition will now include the new virtual column that is visible from DBA_TAB_COLUMNS also:

SELECT column_name, virtual_column, hidden_column
FROM dba_tab_cols
WHERE table_name = 'JSON_FLIGHTS_COL'
ORDER BY column_name;

COLUMN_NAME                                  VIRTUAL HIDDEN
-------------------------------------------- ------- ------
DATA                                          NO      YES
ETAG                                          NO      YES
RESID                                         NO      YES
SYS_IME_OSON_FB28D4C028A44FE0BF532D33EF2840D6 YES     YES
SYS_IME000100000A09D45F                       YES     YES
SYS_IME0001000018FFFB6C                       YES     YES

Note: we also see that the JSON_FLIGHTS_COL collection table also included 2 extra virtual columns, each representing the JSON columns (ARR_DELAY & AIR_TIME). 

The procedure DBMS_INMEMORY_ADMIN.IME_POPULATE_EXPRESSSIONS automatically reloads the Columnar Store with the captured virtual column(s).
Hence, we need to make sure that the data has been completely reloaded:

SELECT Inst_Id, Owner, Segment_Name, Populate_Status, Bytes_Not_Populated 
FROM GV$IM_Segments;

Confirm Whether In-Memory Expressions Are Used Or Not

To be able to confirm if In-Memory Expressions are used, we use the following query on the same session that issued the query.
If we see “IM scan EU ..” in the query result, then it is a confirmation that IMEs have been used.
Below we see that “IM scan EU” statistic is used, indiciating that In-Memory Expression Units are used:

SELECT b.Name, To_Char(a.Value,'9,999,999,999') Value
FROM V$SesStat a, V$StatName b
WHERE a.Statistic# = b.Statistic#
  AND a.Value > 0
  AND b.Name Like 'IM scan EU%';
NAME                                   VALUE 
-------------------------------------- ------------ 
IM scan EUs memcompress for query low            72 
IM scan EU rows                          20,827,654 
IM scan EUs columns accessed                     72

Sample Results

The table below summarizes the results ran on OCI base DB 23.8:

Results

It is important to note that each database yields different results, depending on the hardware used (server, storage and network) and data distribution.

On the previous section we ran a single query 3 times:

  1. Reading from the Buffer Cache, using NO_INMEMORY hint — 35 seconds.
  2. Once from the Columnar Store – no Expressions — 4 seconds.
  3. Once from the Columnar Store – with Expressions enabled — 0.11 seconds.

Restrictions

For In-Memory Expressions to be used to accelerate JSON data, the following restrictions apply:

  • Database compatibility is 12.2.0.0 or higher.JSON Tables / Collection Tables data resides in In-Memory using INMEMORY clause.
  • For 23ai, data type is JSON. For earlier releases initialization parameter MAX_STRING_SIZE must be set to EXTENDED.
  • INMEMORY_EXPRESSIONS_USAGE = STATIC_ONLY or ENABLE.
  • JSON data must reside in an In-Memory table by use of ALTER | CREATE TABLE .. INMEMORY clause.
  • For Autonomous Database In-Memory is available on an instance with a minimum of 16 ECPUs.
  • The In-Memory Column Store is supported only for JSON documents smaller than 32 KBytes.

Conclusion

  • Oracle Database 23ai make use of JSON column data type.
  • Each JSON column used in a query, In-Memory automatically creates a corresponding virtual column to the queried table.  
  • Once a table is loaded into the Columnar Store, the performance is greatly enhanced.
  • In-Memory Expressions offers a superior performance for JSON data stored in Oracle Database 23ai.

References