New full-text search in 23ai: Ubiquitous Database Search

Do you know how to search in Oracle Database? There is a functionality called Oracle Text which allows a full-text search that is fully integrated into Oracle Database. In Oracle Database 23, there is a new feature called Ubiquitous Database Search. It helps in cases where you need to search in different data sources. Only one command is needed to create a self managed index and another one to add different sources. Without writing additional PL/SQL code you can include different table data sources in your search.

The following posting is subdivided in

A Short Oracle Text Review
Step 1: Index Creation for Ubiquitous Database Search
Step 2: Adding Data Sources
Step 3: Review the virtual document
Step 4: Run queries against your index
Adding more Data Sources
Monitoring

Oracle Text Review

Oracle Text is included in all database editions and is available directly without any further installation. You can start immediately in a “normal” database schema. The simplest use case is to perform a full-text search within a single column. But even more complex cases are supported: Search within multiple columns in the same data source (MULTI_COLUMN_DATASTORE). Also searches within multiple data sources are supported (USER_DATASTORE). In this case a stored procedures with a special signature to synthesize documents during indexing is required. In all scenarios only one index is required to fullfill all kind of full-text search criteria. Examples are operators such as fuzzy, soundex, near etc.

What is new and special with Ubiquitous Database Search? It is much easier now to use Oracle Text in complex use cases where multiple data sources are referenced. There is a new simple PL/SQL interface called DBMS_SEARCH to create an index across multiple objects, easily add data sources (specific tables or views) into the index or also remove them. This strongly simplifies indexing tasks. For example it is not necessary to define complex USER_DATASTORE procedures. This enhances developer productivity a lot. In addition the synchronization of the index will be maintained automatically.

So let’s demonstrate this with a simple example to get an idea how it works.

Step 1: Index Creation for Ubiquitous Database Search

The new interface DBMS_SEARCH allows the indexing of multiple schema objects in a single index. Detailed information can be found in the documentation in Oracle Text Reference: DBMS_SEARCH

Before you start you may install the Oracle Database sample schema from github. In my example I will use the tables from the schema “customer orders” (user CO) to illustrate the functionality. After gathering schema statistics it should look like this:

SQL> select table_name, num_rows from user_tables;

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
CUSTOMERS                             392
STORES                                 23
PRODUCTS                               46
ORDERS                               1950
SHIPMENTS                            1892
ORDER_ITEMS                          3914
INVENTORY                             566

Let’s create an ubiquitous index with the name SEARCH_PRODUCTS. Use the CREATE_INDEX procedure to create an ubiquitous search index. It uses a JSON search index technology with some default indexing preferences.

execute DBMS_SEARCH.CREATE_INDEX(index_name=>'SEARCH_PRODUCTS');

It creates the infrastructure for Oracle Text including a table SEARCH_PRODUCTS with two JSON columns DATA and METADATA. The METADATA column uniquely identifies each row of a particular data source, using the OWNER, SOURCE, and KEY (or ROWID) JSON values.

SQL> desc SEARCH_PRODUCTS
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 METADATA                                  NOT NULL JSON
 DATA                                               JSON
 OWNER                                              VARCHAR2(128)
 SOURCE                                             VARCHAR2(128)
 KEY                                                VARCHAR2(1024)

If you need to drop the index first, use this.

execute DBMS_SEARCH.DROP_INDEX('SEARCH_PRODUCTS');

Step 2: Add Data Sources

You can add a set of tables, external tables, or views as data sources to this index. All the columns in the specified sources are indexed and available then for a full-text search. In our case let’s add the table SHIPMENTS and STORES. The ADD_SOURCE procedure adds the tables to the DBMS_SEARCH index as a data source.

Note: You can use the DBMS_SEARCH PL/SQL package to create a ubiquitous search index on multiple tables and views within a schema. In general, you can create this index only on the views that have a primary key and a foreign key constraint relationship with the component table (see also 3.1.8 Indexing and Views) .

In our case I made sure that there are primary keys and foreign Keys defined for the two tables.

execute DBMS_SEARCH.ADD_SOURCE(index_name =>'SEARCH_PRODUCTS', source_name => 'SHIPMENTS');
execute DBMS_SEARCH.ADD_SOURCE(index_name =>'SEARCH_PRODUCTS', source_name => 'STORES');

Step 3: Review the virtual document

To get an idea what you can search for, execute the GET_DOCUMENT procedure. The GET_DOCUMENT procedure returns a virtual indexed document that is created after populating it with multiple columns, tables, or views. You can review this document before running a query against the DBMS_SEARCH index as follows.
Note: It uses a user data store to assemble a JSON document on the fly.

SQL> set long 1000 longc 500

SQL> select DBMS_SEARCH.GET_DOCUMENT (INDEX_NAME=>'SEARCH_PRODUCTS', DOCUMENT_METADATA=>METADATA) output 
     from SEARCH_PRODUCTS;
OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"CO":{"STORES":{"LOGO_CHARSET":null,"LOGO_FILENAME":null,"LONGITUDE":34.862138,"STORE_ID":23,"LOGO":null,"LOGO_MIME_TYPE":null,"LATITUDE":32.100664,"STORE_NAME":"Tel Aviv","LOGO_LAST_UPDATED":null,"WEB_ADDRESS":null,"PHYSICAL_ADDRESS":"B, Aharon Bart St 18, Petah Tikva, 4951400, Israel"}}}
...
{"CO":{"SHIPMENTS":{"SHIPMENT_ID":1956,"STORE_ID":1,"CUSTOMER_ID":376,"DELIVERY_ADDRESS":"Pecos, NM 87552 USA","SHIPMENT_STATUS":"IN-TRANSIT"}}}

1915 rows selected.

Step 4: Run queries against your index

You can run queries using the JSON_TEXTCONTAINS and CONTAINS operators.

First let’s use CONTAINS with the operator FUZZY. Remember the FUZZY operator expands queries to include words that are spelled similarly to the specified term. The result from the tables SHIPMENTS and STORES looks like this. It contains results from both tables.

SQL> SELECT METADATA output from SEARCH_PRODUCTS 
     WHERE CONTAINS(data,'fuzzy(Los)')>0;

OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"STORES","KEY":{"STORE_ID":10}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":976}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":469}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":470}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1484}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":545}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":546}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1806}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1807}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":854}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":855}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1833}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1834}}

-- let's check
SQL> select PHYSICAL_ADDRESS from stores where store_id=10;

PHYSICAL_ADDRESS
--------------------------------------------------------------------------------
C/ Jos?? Echegaray 6B
    Las Rozas          -- here fuzzy(Los) finds the term 'Las' 
    28230 Madrid

SQL> select DELIVERY_ADDRESS from shipments where shipment_id=976;

DELIVERY_ADDRESS
--------------------------------------------------------------------------------
Los Angeles, CA 90026 USA

Let’s try a query with JSON_TEXTCONTAINS. JSON_TEXTCONTAINS checks if a specified string exists in JSON property values or not.

SQL> SELECT metadata from SEARCH_PRODUCTS
     WHERE JSON_TEXTCONTAINS(data,'$.CO.STORES.PHYSICAL_ADDRESS','fuzzy(LOS)');  

METADATA
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"STORES","KEY":{"STORE_ID":10}}

Adding more data sources

Now let’s add another source and see if you can include it in a search. Let’s add the table CUSTOMERS.

execute DBMS_SEARCH.ADD_SOURCE(index_name =>'SEARCH_PRODUCTS', source_name => 'CUSTOMERS');

Now let’s query. We use the term ‘jon’ – more precisely fuzzy(jon) – to find results in the table CUSTOMERS.

SQL> SELECT METADATA output from SEARCH_PRODUCTS WHERE CONTAINS(data,'fuzzy(jon)')>0;

OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":66}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":368}}

2 rows selected.

-- Let's check
SQL> select full_name from customers where customer_id=66;

FULL_NAME
--------------------------------------------------------------------------------
John Venzl

Let’s add a row to the table CUSTOMERS und search for it again.

SQL> insert into customers values (1000,'john.johnson@oracle.com','John Johnson');

1 row created.

SQL> SELECT METADATA output from SEARCH_PRODUCTS WHERE CONTAINS(data,'fuzzy(jon)')>0;

OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":66}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":368}}

2 rows selected.

There are still only two rows. You will find the new row when you commit the change. To make this happen JSON search indexes use a method known as sync on commit by default. That means the index is synchronized automatically as part of the commit.

SQL> commit;
Commit complete.

SQL> SELECT METADATA output from SEARCH_PRODUCTS WHERE CONTAINS(data,'fuzzy(jon)')>0;

OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":66}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":368}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":1000}}

3 rows selected.

Monitoring

If you have some experience with Oracle Text, you may know that there are special data dictionary views to monitor and administer the Oracle text index.

Let’s start with USER_INDEXES to get basic information about the created index.

SQL> select INDEX_TYPE,TABLE_OWNER, TABLE_NAME, TABLE_TYPE, STATUS, ITYP_OWNER
     from user_indexes where index_name='SEARCH_PRODUCTS';

INDEX_TYPE                  TABLE_OWNER     TABLE_NAME           TABLE_TYPE  STATUS   ITYP_OWNER
--------------------------- --------------- -------------------- ----------- -------- ----------
DOMAIN                      CO              SEARCH_PRODUCTS      TABLE       VALID    CTXSYS

For those users which are more interested in technical details, query e.g. CTX_INDEX_VALUES. It displays attribute values for each object used in the index. Remember this view is queryable by user CTXSYS only.

SQL> select IXV_CLASS, IXV_OBJECT, IXV_ATTRIBUTE, IXV_VALUE
     from ctxsys.CTX_INDEX_VALUES where IXV_INDEX_OWNER='CO';

IXV_CLASS                      IXV_OBJECT                IXV_ATTRIBUTE             IXV_VALUE
------------------------------ ------------------------- ------------------------- ----------------------------------------
DATASTORE                      USER_DATASTORE            PROCEDURE                 "CTXSYS"."DRVXMD"."GET_DOCUMENT"
DATASTORE                      USER_DATASTORE            OUTPUT_TYPE               JSON
SECTION_GROUP                  PATH_SECTION_GROUP        JSON_ENABLE               YES
SECTION_GROUP                  PATH_SECTION_GROUP        DATAGUIDE                 ON
...
WORDLIST                       BASIC_WORDLIST            WILDCARD_INDEX            YES
WORDLIST                       BASIC_WORDLIST            WILDCARD_INDEX_K          3
STORAGE                        BASIC_STORAGE             R_TABLE_CLAUSE            lob (data) store as (cache)
STORAGE                        BASIC_STORAGE             I_INDEX_CLAUSE            compress 2
STORAGE                        BASIC_STORAGE             STAGE_ITAB                YES
STORAGE                        BASIC_STORAGE             STAGE_ITAB_MAX_ROWS       10000

Let’s explain some details in short: The index type is a JSON search index usually used for indexing and querying JSON documents. As expected it uses the USER_DATASTORE preference with output JSON. The section group tells us if querying within sections with JSON is enabled, and how the document sections are defined. Dataguide is a summary of the structural and type information contained in a set of JSON documents it records metadata about the fields used in those documents.To create persistent dataguide information as part of a JSON search index DATAGUIDE is specified.The BASIC_WORDLIST enables stemming and fuzzy matching. WILDCARD_INDEX is enabled, i.e. fast and efficient wildcard search for all wildcard expressions is possible.
There is also information about the index storage. Important to mention is that the STAGE_ITAB option is enabled. When the STAGE_ITAB index option is enabled, the new documents are stored temporarily in the $G staging table, not in the $I table. This storage ensures that the $I table is not fragmented and does not deteriorate the query performance. Starting with Oracle Database Release 21c, the contents of $G are automatically moved to $I during index synchronization when $G has more than 10K rows by default. This value is controlled by the STAGE_ITAB_MAX_ROWS attribute of the STORAGE preference. More information on that can be found in the documentation.

Further reading

Application Developer’s Guide: Using Ubiquitous Database Search
Oracle Text Reference: DBMS_SEARCH

New full-text search in 23ai: Ubiquitous Database Search

Ulrike Schwinn

Distinguished Data Management Expert

Lock-free reservation: scale your apps

Oracle Database Monthly News - June 2023 - Quick Links

New full-text search in 23ai: Ubiquitous Database Search

Authors

Ulrike Schwinn

Distinguished Data Management Expert

Lock-free reservation: scale your apps

Oracle Database Monthly News - June 2023 - Quick Links