Another new feature in Oracle Database 21c is a feature we call In-Memory Full Text Columns. In the past we didn’t support predicates for non-scalar documents. In other words, if you were using domain indexes for Oracle full text index, XML Search Index, or JSON Search Index then we didn’t populate those in the IM column store. This limited the speed with which you could search on non-scalar type data. Now in Oracle Database 21c Database In-Memory supports the ability to store, or populate, these types of domain-specific indexes. The In-Memory Full Text Columns feature supports the following data types:
• CHAR
• VARCHAR2
• CLOB
• BLOB
• JSON
This means that for queries that use CONTAINS() and JSON_TEXTCONTAINS() Database In-Memory can now push those operators into the scan of the in-memory columnar data as SQL predicates just as we push other predicates and aggregations into the in-memory scan.
Let’s take a look at a simple example to help illustrate what’s going on with In-Memory Full Text Columns. Since In-Memory Full Text Columns is implemented with In-Memory Expression/Virtual Column technology we need to make sure that the database is enabled with MAX_STRING_SIZE set to extended and INMEMORY_EXPRESSIONS_USAGE and INMEMORY_VIRTUAL_COLUMNS set to appropriate values. You can find more information about In-Memory Expressions in our three part blog series starting here.
OK, now that we’ve got the pre-requisites out of the way let’s try an example. I’m using the Chicago Crimes data set and will focus on the DESCRIPTION column to create an In-Memory Full Text Column. The dataset doesn’t really matter since I’m only interested in showing you the mechanics of how to set up and use an In-Memory Full Text Column, but the SSB schema, which I usually use for examples, doesn’t have any meaningful text columns.
The resulting table looks like the following:
SQL> desc chicago_data Name Null? Type ----------------------------- -------- -------------------- ID NUMBER CASE_NUMBER VARCHAR2(8) C_DATE VARCHAR2(30) BLOCK VARCHAR2(35) IUCR VARCHAR2(10) PRIMARY_TYPE VARCHAR2(40) DESCRIPTION VARCHAR2(100) LOCATION_DESC VARCHAR2(100) ARREST VARCHAR2(20) DOMESTIC VARCHAR2(20) BEAT VARCHAR2(20) DISTRICT VARCHAR2(20) WARD NUMBER COMMUNITY VARCHAR2(20) FBI_CODE VARCHAR2(20) X_COORD NUMBER Y_COORD NUMBER C_YEAR NUMBER UPDATED_ON VARCHAR2(30) LATTITUDE NUMBER LONGITUDE NUMBER LOCATION VARCHAR2(40) SQL>
Next we’ll define the In-Memory Full Text Column, and enable and populate the table in the IM column store:
SQL> alter table chicago_data inmemory text (description);
Table altered.
SQL> alter table chicago_data inmemory priority high;
Table altered.
SQL> exec dbms_inmemory.populate('CHICAGO','CHICAGO_DATA');
SQL> @imseg
BYTES NOT
SEGMENT_NAME BYTES POP STATUS INMEMORY_SIZE POPULATED
-------------------- ---------------- ------------- ---------------- ------------
CHICAGO_DATA 1,739,448,320 COMPLETED 657,981,440 0
SQL>
And finally we’ll query the table searching for text within the DESCRIPTION column:
SQL> select count(*)
2 from chicago_data
3 where district = '009'
4 and CONTAINS(description, 'BATTERY', 1) > 0;
COUNT(*)
----------
29099
SQL>
Note that I’ve used a CONTAINS() operator to make use of the In-Memory Full Text Column. That query generates the following execution plan:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2220 (100)| |
| 1 | SORT AGGREGATE | | 1 | 217 | | |
|* 2 | TABLE ACCESS INMEMORY FULL| CHICAGO_DATA | 29099 | 6166K| 2220 (4)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - inmemory(("DISTRICT"='009' AND SYS_CTX_CONTAINS2("DESCRIPTION" , 'BATTERY' ,
SYS_CTX_MKIVIDX("DESCRIPTION" RETURNING RAW(32767)))>0))
filter(("DISTRICT"='009' AND SYS_CTX_CONTAINS2("DESCRIPTION" , 'BATTERY' ,
SYS_CTX_MKIVIDX("DESCRIPTION" RETURNING RAW(32767)))>0))
Notice that we have performed an in-memory query (i.e. TABLE ACCESS INMEMORY FULL) and that in the predicate information section we see a SYS_CTX_CONTAINS2 function with input for the DESCRIPTION column and the search value “BATTERY”. This indicates a context index access and is the result of the CONTAINS clause in the SQL WHERE clause and our definition of an In-Memory Full Text Column. This is documented in the Application Developer’s Guide if you want to explore this in more detail.
Let’s also take a look at the session stats generated by the query:
NAME VALUE -------------------------------------------------- -------------------- CPU used by this session 11 IM scan CUs columns accessed 14 IM scan CUs memcompress for query low 7 IM scan EU rows 6821896 IM scan EUs columns accessed 7 IM scan EUs memcompress for query low 7 IM scan rows 6821896 IM scan rows projected 29099 IM scan segments minmax eligible 7 session logical reads 212426 session logical reads - IM 212335 session pga memory 25168120 table scans (IM) 1
The astute may notice that we have some “IM scan EU” stats. These statistics occur when In-Memory Expressions are used. As I said at the beginning, In-Memory Full Text Columns are implemented using In-Memory Expressions under the covers. We can see this if we take a look at the view USER_TAB_COLS for the CHICAGO_DATA table and look for SYS type column names:
TABLE_NAME COLUMN_NAME DATA_DEFAULT
------------ --------------------------------------------- ---------------------------------------------------
CHICAGO_DATA SYS_IME_IVDX_8D0B6C4734154FFEBFA621566E6E3C1F SYS_CTX_MKIVIDX("DESCRIPTION" RETURNING RAW(32767))
Note that the column name starts with SYS_IME_ and the DATA_DEFAULT has the call to the same function we saw in the predicate section of the execution plan.
To conclude, the In-Memory Full Text Columns feature provides a faster way to perform searches on text type data and does not require maintaining a separate Oracle Text index. This fits in nicely with our work to support all types of application workload with the Oracle converged database.
Original publish date: November 9, 2021
