Do you know how to search in Oracle Database? There is a functionality called Oracle Text which allows a full-text search that is fully integrated into Oracle Database. In Oracle Database 23, there is a new feature called Ubiquitous Database Search. It helps in cases where you need to search in different data sources. Only one command is needed to create a self managed index and another one to add different sources. Without writing additional PL/SQL code you can include different table data sources in your search.
The following posting is subdivided in
- A Short Oracle Text Review
- Step 1: Index Creation for Ubiquitous Database Search
- Step 2: Adding Data Sources
- Step 3: Review the virtual document
- Step 4: Run queries against your index
- Adding more Data Sources
- Monitoring
Oracle Text Review
Oracle Text is included in all database editions and is available directly without any further installation. You can start immediately in a “normal” database schema. The simplest use case is to perform a full-text search within a single column. But even more complex cases are supported: Search within multiple columns in the same data source (MULTI_COLUMN_DATASTORE). Also searches within multiple data sources are supported (USER_DATASTORE). In this case a stored procedures with a special signature to synthesize documents during indexing is required. In all scenarios only one index is required to fullfill all kind of full-text search criteria. Examples are operators such as fuzzy, soundex, near etc.
What is new and special with Ubiquitous Database Search? It is much easier now to use Oracle Text in complex use cases where multiple data sources are referenced. There is a new simple PL/SQL interface called DBMS_SEARCH to create an index across multiple objects, easily add data sources (specific tables or views) into the index or also remove them. This strongly simplifies indexing tasks. For example it is not necessary to define complex USER_DATASTORE procedures. This enhances developer productivity a lot. In addition the synchronization of the index will be maintained automatically.
So let’s demonstrate this with a simple example to get an idea how it works.
Step 1: Index Creation for Ubiquitous Database Search
The new interface DBMS_SEARCH allows the indexing of multiple schema objects in a single index. Detailed information can be found in the documentation in Oracle Text Reference: DBMS_SEARCH
Before you start you may install the Oracle Database sample schema from github. In my example I will use the tables from the schema “customer orders” (user CO) to illustrate the functionality. After gathering schema statistics it should look like this:
SQL> select table_name, num_rows from user_tables; TABLE_NAME NUM_ROWS ------------------------------ ---------- CUSTOMERS 392 STORES 23 PRODUCTS 46 ORDERS 1950 SHIPMENTS 1892 ORDER_ITEMS 3914 INVENTORY 566
Let’s create an ubiquitous index with the name SEARCH_PRODUCTS. Use the CREATE_INDEX procedure to create an ubiquitous search index. It uses a JSON search index technology with some default indexing preferences.
execute DBMS_SEARCH.CREATE_INDEX(index_name=>'SEARCH_PRODUCTS');
It creates the infrastructure for Oracle Text including a table SEARCH_PRODUCTS with two JSON columns DATA and METADATA. The METADATA column uniquely identifies each row of a particular data source, using the OWNER, SOURCE, and KEY (or ROWID) JSON values.
SQL> desc SEARCH_PRODUCTS Name Null? Type ----------------------------------------- -------- ---------------------------- METADATA NOT NULL JSON DATA JSON OWNER VARCHAR2(128) SOURCE VARCHAR2(128) KEY VARCHAR2(1024)
If you need to drop the index first, use this.
execute DBMS_SEARCH.DROP_INDEX('SEARCH_PRODUCTS');
Step 2: Add Data Sources
You can add a set of tables, external tables, or views as data sources to this index. All the columns in the specified sources are indexed and available then for a full-text search. In our case let’s add the table SHIPMENTS and STORES. The ADD_SOURCE procedure adds the tables to the DBMS_SEARCH index as a data source.
Note: You can use the DBMS_SEARCH PL/SQL package to create a ubiquitous search index on multiple tables and views within a schema. In general, you can create this index only on the views that have a primary key and a foreign key constraint relationship with the component table (see also 3.1.8 Indexing and Views) .
In our case I made sure that there are primary keys and foreign Keys defined for the two tables.
execute DBMS_SEARCH.ADD_SOURCE(index_name =>'SEARCH_PRODUCTS', source_name => 'SHIPMENTS'); execute DBMS_SEARCH.ADD_SOURCE(index_name =>'SEARCH_PRODUCTS', source_name => 'STORES');
Step 3: Review the virtual document
To get an idea what you can search for, execute the GET_DOCUMENT procedure. The GET_DOCUMENT procedure returns a virtual indexed document that is created after populating it with multiple columns, tables, or views. You can review this document before running a query against the DBMS_SEARCH index as follows.
Note: It uses a user data store to assemble a JSON document on the fly.
SQL> set long 1000 longc 500
SQL> select DBMS_SEARCH.GET_DOCUMENT (INDEX_NAME=>'SEARCH_PRODUCTS', DOCUMENT_METADATA=>METADATA) output
from SEARCH_PRODUCTS;
OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"CO":{"STORES":{"LOGO_CHARSET":null,"LOGO_FILENAME":null,"LONGITUDE":34.862138,"STORE_ID":23,"LOGO":null,"LOGO_MIME_TYPE":null,"LATITUDE":32.100664,"STORE_NAME":"Tel Aviv","LOGO_LAST_UPDATED":null,"WEB_ADDRESS":null,"PHYSICAL_ADDRESS":"B, Aharon Bart St 18, Petah Tikva, 4951400, Israel"}}}
...
{"CO":{"SHIPMENTS":{"SHIPMENT_ID":1956,"STORE_ID":1,"CUSTOMER_ID":376,"DELIVERY_ADDRESS":"Pecos, NM 87552 USA","SHIPMENT_STATUS":"IN-TRANSIT"}}}
1915 rows selected.
Step 4: Run queries against your index
You can run queries using the JSON_TEXTCONTAINS and CONTAINS operators.
First let’s use CONTAINS with the operator FUZZY. Remember the FUZZY operator expands queries to include words that are spelled similarly to the specified term. The result from the tables SHIPMENTS and STORES looks like this. It contains results from both tables.
SQL> SELECT METADATA output from SEARCH_PRODUCTS
WHERE CONTAINS(data,'fuzzy(Los)')>0;
OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"STORES","KEY":{"STORE_ID":10}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":976}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":469}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":470}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1484}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":545}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":546}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1806}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1807}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":854}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":855}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1833}}
{"OWNER":"CO","SOURCE":"SHIPMENTS","KEY":{"SHIPMENT_ID":1834}}
-- let's check
SQL> select PHYSICAL_ADDRESS from stores where store_id=10;
PHYSICAL_ADDRESS
--------------------------------------------------------------------------------
C/ Jos?? Echegaray 6B
Las Rozas -- here fuzzy(Los) finds the term 'Las'
28230 Madrid
SQL> select DELIVERY_ADDRESS from shipments where shipment_id=976;
DELIVERY_ADDRESS
--------------------------------------------------------------------------------
Los Angeles, CA 90026 USA
Let’s try a query with JSON_TEXTCONTAINS. JSON_TEXTCONTAINS checks if a specified string exists in JSON property values or not.
SQL> SELECT metadata from SEARCH_PRODUCTS
WHERE JSON_TEXTCONTAINS(data,'$.CO.STORES.PHYSICAL_ADDRESS','fuzzy(LOS)');
METADATA
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"STORES","KEY":{"STORE_ID":10}}
Adding more data sources
Now let’s add another source and see if you can include it in a search. Let’s add the table CUSTOMERS.
execute DBMS_SEARCH.ADD_SOURCE(index_name =>'SEARCH_PRODUCTS', source_name => 'CUSTOMERS');
Now let’s query. We use the term ‘jon’ – more precisely fuzzy(jon) – to find results in the table CUSTOMERS.
SQL> SELECT METADATA output from SEARCH_PRODUCTS WHERE CONTAINS(data,'fuzzy(jon)')>0;
OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":66}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":368}}
2 rows selected.
-- Let's check
SQL> select full_name from customers where customer_id=66;
FULL_NAME
--------------------------------------------------------------------------------
John Venzl
Let’s add a row to the table CUSTOMERS und search for it again.
SQL> insert into customers values (1000,'john.johnson@oracle.com','John Johnson');
1 row created.
SQL> SELECT METADATA output from SEARCH_PRODUCTS WHERE CONTAINS(data,'fuzzy(jon)')>0;
OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":66}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":368}}
2 rows selected.
There are still only two rows. You will find the new row when you commit the change. To make this happen JSON search indexes use a method known as sync on commit by default. That means the index is synchronized automatically as part of the commit.
SQL> commit;
Commit complete.
SQL> SELECT METADATA output from SEARCH_PRODUCTS WHERE CONTAINS(data,'fuzzy(jon)')>0;
OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":66}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":368}}
{"OWNER":"CO","SOURCE":"CUSTOMERS","KEY":{"CUSTOMER_ID":1000}}
3 rows selected.
Monitoring
If you have some experience with Oracle Text, you may know that there are special data dictionary views to monitor and administer the Oracle text index.
Let’s start with USER_INDEXES to get basic information about the created index.
SQL> select INDEX_TYPE,TABLE_OWNER, TABLE_NAME, TABLE_TYPE, STATUS, ITYP_OWNER
from user_indexes where index_name='SEARCH_PRODUCTS';
INDEX_TYPE TABLE_OWNER TABLE_NAME TABLE_TYPE STATUS ITYP_OWNER
--------------------------- --------------- -------------------- ----------- -------- ----------
DOMAIN CO SEARCH_PRODUCTS TABLE VALID CTXSYS
For those users which are more interested in technical details, query e.g. CTX_INDEX_VALUES. It displays attribute values for each object used in the index. Remember this view is queryable by user CTXSYS only.
SQL> select IXV_CLASS, IXV_OBJECT, IXV_ATTRIBUTE, IXV_VALUE from ctxsys.CTX_INDEX_VALUES where IXV_INDEX_OWNER='CO'; IXV_CLASS IXV_OBJECT IXV_ATTRIBUTE IXV_VALUE ------------------------------ ------------------------- ------------------------- ---------------------------------------- DATASTORE USER_DATASTORE PROCEDURE "CTXSYS"."DRVXMD"."GET_DOCUMENT" DATASTORE USER_DATASTORE OUTPUT_TYPE JSON SECTION_GROUP PATH_SECTION_GROUP JSON_ENABLE YES SECTION_GROUP PATH_SECTION_GROUP DATAGUIDE ON ... WORDLIST BASIC_WORDLIST WILDCARD_INDEX YES WORDLIST BASIC_WORDLIST WILDCARD_INDEX_K 3 STORAGE BASIC_STORAGE R_TABLE_CLAUSE lob (data) store as (cache) STORAGE BASIC_STORAGE I_INDEX_CLAUSE compress 2 STORAGE BASIC_STORAGE STAGE_ITAB YES STORAGE BASIC_STORAGE STAGE_ITAB_MAX_ROWS 10000
Let’s explain some details in short: The index type is a JSON search index usually used for indexing and querying JSON documents. As expected it uses the USER_DATASTORE preference with output JSON. The section group tells us if querying within sections with JSON is enabled, and how the document sections are defined. Dataguide is a summary of the structural and type information contained in a set of JSON documents it records metadata about the fields used in those documents.To create persistent dataguide information as part of a JSON search index DATAGUIDE is specified.The BASIC_WORDLIST enables stemming and fuzzy matching. WILDCARD_INDEX is enabled, i.e. fast and efficient wildcard search for all wildcard expressions is possible.
There is also information about the index storage. Important to mention is that the STAGE_ITAB option is enabled. When the STAGE_ITAB index option is enabled, the new documents are stored temporarily in the $G staging table, not in the $I table. This storage ensures that the $I table is not fragmented and does not deteriorate the query performance. Starting with Oracle Database Release 21c, the contents of $G are automatically moved to $I during index synchronization when $G has more than 10K rows by default. This value is controlled by the STAGE_ITAB_MAX_ROWS attribute of the STORAGE preference. More information on that can be found in the documentation.
Further reading
Application Developer’s Guide: Using Ubiquitous Database Search
Oracle Text Reference: DBMS_SEARCH
