Oracle Database 21c In-Memory Full Text Columns

November 9, 2021 | 10 minute read
Andy Rivenes
Product Manager
Text Size 100%:

Another new feature in Oracle Database 21c is a feature we call In-Memory Full Text Columns. In the past we didn't support predicates for non-scalar documents. In other words, if you were using domain indexes for Oracle full text index, XML Search Index, or JSON Search Index then we didn't populate those in the IM column store. This limited the speed with which you could search on non-scalar type data. Now in Oracle Database 21c Database In-Memory supports the ability to store, or populate, these types of domain-specific indexes. The In-Memory Full Text Columns feature supports the following data types:

•    CHAR
•    VARCHAR2
•    CLOB
•    BLOB
•    JSON

This means that for queries that use CONTAINS() and JSON_TEXTCONTAINS() Database In-Memory can now push those operators into the scan of the in-memory columnar data as SQL predicates just as we push other predicates and aggregations into the in-memory scan.

Let's take a look at a simple example to help illustrate what's going on with In-Memory Full Text Columns. Since In-Memory Full Text Columns is implemented with In-Memory Expression/Virtual Column technology we need to make sure that the database is enabled with MAX_STRING_SIZE set to extended and INMEMORY_EXPRESSIONS_USAGE and INMEMORY_VIRTUAL_COLUMNS set to appropriate values. You can find more information about In-Memory Expressions in our three part blog series starting here.

OK, now that we've got the pre-requisites out of the way let's try an example. I'm using the Chicago Crimes data set and will focus on the DESCRIPTION column to create an In-Memory Full Text Column. The dataset doesn't really matter since I'm only interested in showing you the mechanics of how to set up and use an In-Memory Full Text Column, but the SSB schema, which I usually use for examples, doesn't have any meaningful text columns.

The resulting table looks like the following:

SQL> desc chicago_data
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 CASE_NUMBER                            VARCHAR2(8)
 C_DATE                                 VARCHAR2(30)
 BLOCK                                  VARCHAR2(35)
 IUCR                                   VARCHAR2(10)
 PRIMARY_TYPE                           VARCHAR2(40)
 DESCRIPTION                            VARCHAR2(100)
 LOCATION_DESC                          VARCHAR2(100)
 ARREST                                 VARCHAR2(20)
 DOMESTIC                               VARCHAR2(20)
 BEAT                                   VARCHAR2(20)
 DISTRICT                               VARCHAR2(20)
 WARD                                   NUMBER
 COMMUNITY                              VARCHAR2(20)
 FBI_CODE                               VARCHAR2(20)
 X_COORD                                NUMBER
 Y_COORD                                NUMBER
 C_YEAR                                 NUMBER
 UPDATED_ON                             VARCHAR2(30)
 LATTITUDE                              NUMBER
 LONGITUDE                              NUMBER
 LOCATION                               VARCHAR2(40)

SQL> 

Next we'll define the In-Memory Full Text Column, and enable and populate the table in the IM column store:

SQL> alter table chicago_data inmemory text (description);

Table altered.

SQL> alter table chicago_data inmemory priority high;

Table altered.

SQL> exec dbms_inmemory.populate('CHICAGO','CHICAGO_DATA');

SQL> @imseg

                                                                        BYTES NOT
SEGMENT_NAME                    BYTES POP STATUS       INMEMORY_SIZE    POPULATED
-------------------- ---------------- ------------- ---------------- ------------
CHICAGO_DATA            1,739,448,320 COMPLETED          657,981,440            0
SQL>


And finally we'll query the table searching for text within the DESCRIPTION column:

SQL> select count(*)
  2  from chicago_data
  3  where district = '009'
  4  and CONTAINS(description, 'BATTERY', 1) > 0;

  COUNT(*)
----------
     29099

SQL>

Note that I've used a CONTAINS() operator to make use of the In-Memory Full Text Column. That query generates the following execution plan:

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |       |       |  2220 (100)|          |
|   1 |  SORT AGGREGATE             |              |     1 |   217 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| CHICAGO_DATA | 29099 |  6166K|  2220   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - inmemory(("DISTRICT"='009' AND SYS_CTX_CONTAINS2("DESCRIPTION" , 'BATTERY' ,
              SYS_CTX_MKIVIDX("DESCRIPTION" RETURNING RAW(32767)))>0))
       filter(("DISTRICT"='009' AND SYS_CTX_CONTAINS2("DESCRIPTION" , 'BATTERY' ,
              SYS_CTX_MKIVIDX("DESCRIPTION" RETURNING RAW(32767)))>0))

Notice that we have performed an in-memory query (i.e. TABLE ACCESS INMEMORY FULL) and that in the predicate information section we see a SYS_CTX_CONTAINS2 function with input for the DESCRIPTION column and the search value "BATTERY". This indicates a context index access and is the result of the CONTAINS clause in the SQL WHERE clause and our definition of an In-Memory Full Text Column. This is documented in the Application Developer's Guide if you want to explore this in more detail.

Let's also take a look at the session stats generated by the query:

NAME                                                              VALUE
-------------------------------------------------- --------------------
CPU used by this session                                             11
IM scan CUs columns accessed                                         14
IM scan CUs memcompress for query low                                 7
IM scan EU rows                                                 6821896
IM scan EUs columns accessed                                          7
IM scan EUs memcompress for query low                                 7
IM scan rows                                                    6821896
IM scan rows projected                                            29099
IM scan segments minmax eligible                                      7
session logical reads                                            212426
session logical reads - IM                                       212335
session pga memory                                             25168120
table scans (IM)                                                      1

The astute may notice that we have some "IM scan EU" stats. These statistics occur when In-Memory Expressions are used. As I said at the beginning, In-Memory Full Text Columns are implemented using In-Memory Expressions under the covers. We can see this if we take a look at the view USER_TAB_COLS for the CHICAGO_DATA table and look for SYS type column names:

TABLE_NAME   COLUMN_NAME                                   DATA_DEFAULT
------------ --------------------------------------------- ---------------------------------------------------
CHICAGO_DATA SYS_IME_IVDX_8D0B6C4734154FFEBFA621566E6E3C1F SYS_CTX_MKIVIDX("DESCRIPTION" RETURNING RAW(32767))

Note that the column name starts with SYS_IME_ and the DATA_DEFAULT has the call to the same function we saw in the predicate section of the execution plan.

To conclude, the In-Memory Full Text Columns feature provides a faster way to perform searches on text type data and does not require maintaining a separate Oracle Text index. This fits in nicely with our work to support all types of application workload with the Oracle converged database.

 

 

Andy Rivenes

Product Manager

Andy Rivenes is a Senior Principal Product Manager at Oracle Corporation and is a Product Manager for Database In-Memory. Andy has been working with Oracle products since 1992 and spent a large part of his career managing Oracle databases for both custom and E-Business Suite applications. Andy now spends his time helping customers implement Database In-Memory and providing information about Database In-Memory through blogs, conferences, and customer visits.


Previous Post

Oracle Analytics Cloud Uses Database In-Memory

Andy Rivenes | 1 min read

Next Post


Oracle Database 21c Enhanced In-Memory External Table Support

Andy Rivenes | 9 min read