X

All Things Database: Education, Best Practices,
Use Cases & More

Linguistic analysis comes to Autonomous Database

Keith Laker
Senior Principal Product Manager

We are pleased to announce that Oracle Text indexes has been enabled in Oracle Autonomous Database. Which means you can now do linguistic analysis in Autonomous Database! If you are completely new to Oracle Autonomous Database (where have you been for the last 18 months?) then here is a quick recap of the key features:

What is  Oracle Autonomous Database?

Oracle Autonomous Database provides a self-driving, self-securing, self-repairing cloud service that eliminate the overhead and human errors associated with traditional database administration. Oracle Autonomous Database takes care of configuration, tuning, backup, patching, encryption, scaling, and more. Additional information can be found at https://www.oracle.com/database/autonomous-database.html.

Special Thanks...

This post has been prepared by Roger Ford who is the product manager for Oracle Text.He is extremely well known within the Oracle developer community and if you want to follow Roger's posts on using Text features and functions then use this link and his Oracle Text blog is here.

Let's Get Started With Oracle Text...

Those developers with a strong Oracle Database background will no doubt be familiar with Oracle Text indexes, but this post is aimed at those with some knowlege of SQL but new to Oracle Database, as well as those experienced Oracle users who need a refresher on this functionality.

What is Oracle Text?

Oracle Text allows you to do full-text searching on textual content in the database. Storing names and addresses? You can search them by looking for a lastname, a street name or a zip code. Storing full-length documents such as PDFs or Excel spreadsheets? You can find them by any word, phrase or code used in the document. It's a search engine built into the database, and accessible directly through SQL or PL/SQL.

Oracle Text indexes use the databases extensibility framework which allows you to create new indextypes. Oracle Text provides three of these:

  • CONTEXT - general purpose full-text index
  • CTXCAT - Specialist Catalog index
  • CTXRULE - An index on a pre-defined set of queries to be run against one document at a time.

Additionally, Oracle Text technology is used in the JSONSEARCH index, part of Oracle's suite of JSON support tools.

In this discussion, we're going to focus on the CONTEXT and JSONSEARCH indexes.

CONTEXT index

A CONTEXT index is a word-based index. That is, it allows to find any word (or combination of words) within a document. Now, I should probably define document: Although it can be a full-sized document such as a PDF or MS Word document in a BLOB column of the database, it can also be just a simple VARCHAR2 column value.

To create a CONTEXT index, we use the standard create index syntax, with the addition of phrase 'INDEXTYPE IS CTXSYS.CONTEXT' (the CTXSYS schema owns all Oracle Text objects).

So let's see an example:

CREATE TABLE demo (text VARCHAR2(200));
INSERT INTO demo VALUES ('David Copperfield: A book by Charles Dickens');
CREATE INDEX demoindex on demo(text) INDEXTYPE IS CTXSYS.CONTEXT;
Now each Oracle Text indextype has an associated search operator. In the case of a CONTEXT index, that is the CONTAINS function. CONTAINS takes the name of the column to search and a search expression, and returns a value which represents a hit or not (that is, whether the row contents matches the search expression. The value returned can be considered as zero for no match, or greater than zero for a match. So we can do a search like:
SELECT text FROM demo WHERE CONTAINS (text, 'copperfield') > 0;
Many people think a CONTEXT index is just an indexed version of the LIKE operator. This is not the case. You will note that, unlike a LIKE search, we didn't need to put wildcards around "copperfield", and nor did we need to worry about case sensitivity. As it's a full-word index, doing a search like:
SELECT text FROM demo WHERE CONTAINS (text, 'copper') > 0;
will NOT succeed. 'copper' as a word does not appear in the text. We could have used a trailing wild card to make the query work:
SELECT text FROM demo WHERE CONTAINS (text, 'copper%') > 0;
But note that the wild card is making 'copper' match 'copperfield', it's not just doing a substring search on the whole string.

Now, what else can we do with a CONTEXT search? Lots. Way too much to cover here. But here's a few examples which will hopefully be self-expanatory:

-- Phrase Search 
SELECT text FROM demo WHERE CONTAINS (text, 'charles dickens') > 0;
-- AND search
SELECT text FROM demo WHERE CONTAINS (text, 'copperfield AND dickens') > 0;
-- OR search
SELECT text FROM demo WHERE CONTAINS (text, 'copperfield OR (two cities)') > 0;
-- Proximity search : two terms within 20 words of each other
SELECT text FROM demo WHERE CONTAINS (text, 'NEAR((copperfield, dickens),20)') >0;
It should be noted that CONTEXT indexes are not synchronous. The index is only updated when they are SYNC'd. By default, an index is only SYNC'd by a call to the PL/SQL procedure: ctx_ddl.sync_index(''). However we can make that automatic using the optional PARAMETERS clause of the CREATE INDEX statement. The simplest way is to specify SYNC(ON COMMIT) which means the index is automatically updated when changes are committed:
CREATE INDEX demoindex on demo(text) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC(ON COMMIT)');
This is fine for low-DML indexes. However, if you have have many processes all doing inserts and updates on the indexed table, you may find that the many commits start to block each other. In this case, you might want to use SYNC(EVERY ) instead. Timeperiod is specified as a scheduler interval. For example the following will invoke SYNC every 15 seconds:
CREATE INDEX demoindex on demo(text) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('SYNC(EVERY SYSDATE+1/24/60/4)');
There are many other customizations of the text index which are performed through the PARAMETERS clause. Interested readers are directed to the following books:
Text Reference Manual : https://docs.oracle.com/en/database/oracle/oracle-database/18/ccref/index.html
Text Application Developer's Guide : https://docs.oracle.com/en/database/oracle/oracle-database/18/ccapp/index.html

JSON Search index

When dealing with JSON (JavaScript Object Notation) documents, you can choose to index them in different ways. If you know the layout (schema) of all your documents, it's often efficient to create a function-based index on particular elements of your JSON documents.

For example, if we have a table called j_purchaseorder which has an element PONumber, we might create an index thus:

CREATE UNIQUE INDEX po_num_idx1
    ON j_purchaseorder (json_value(po_document, '$.PONumber' 
    RETURNING NUMBER));
But one of the great things about working with JSON is that you don't need to know the schema in advance. How can we therefore ensure that everything in the document is indexed, even though we don't know in advance the layout of documents? In that case we can use the JSON search index.

JSON Search indexes only work on tables with an 'IS JSON' constraint - if you don't have one then you must add it to your table (it's a good idea anyway to avoid rogue, invalid documents getting into your system).

Although the index is a variation on the CONTEXT indextype, the syntax to create a JSON search index is somewhat simpler: CREATE SEARCH INDEX indexname ON jsontable(jsoncolumn) FOR JSON;

That will automatically add all the name/value pairs found in the document set into the index.

The index will be used automatically, where appropriate, to speed up any JSON searches, such as queries using the JSON_VALUE predicate. But it also has it's own search function, quite similar to the CONTAINS clause used for CONTEXT indexes. For example you could call:

SELECT id FROM table WHERE JSON_TEXTCONTAINS ( jsoncolumn, '$', 'dickens AND copperfield')
(note there's no '> 0' needed at the end here, unlike CONTAINS).

The second argument of JSON_TEXTCONTAINS is a path specification. Here we're searching the whole document, but we could restrict it to a specific part of the document if we chose.

Let's look at a full worked example:

-- create the table, not forgetting the IS JSON constraint
create table jsontab (id number, jsoncol varchar2(200), constraint colIsJson check (jsoncol is json));

-- insert some test data
insert into jsontab values (1, '{ booktitle: "David Copperfield", bookauthor: "Charles Dickens" }'); 

-- create our Search index
create search index jsonindex on jsontab (jsoncol) for json;

-- search for Dickens and Copperfield anywhere in the doc
select * from jsontab where json_textcontains(jsoncol, '$', 'dickens AND copperfield');

-- search only bookauthor values for Dickens
select * from jsontab where json_textcontains(jsoncol, '$.bookauthor', 'dickens');
JSON_TEXTCONTAINS allows most of the syntax used in the Context CONTAINS clause, and has similar characteristics with regard to case sensitivity, word splitting, and so on.

Another thing that a JSON search index gives us is the DATAGUIDE. This is a description of the layout of our JSON documents, which can be really useful if you don't know exactly what's been inserted into your table. For the simple example above, we can the dataguide using the get_index_dataguide procedure in the dbms_json package:

select dbms_json.get_index_dataguide (
    'jsontab', 'jsoncol', dbms_json.format_hierarchical, dbms_json.pretty) from dual;
This gives the output:
{
  "type" : "object",
  "properties" : {
    "booktitle" : {
      "type" : "string",
      "o:length" : 32,
      "o:preferred_column_name" : "JSONCOL$booktitle"
    },
    "bookauthor" : {
      "type" : "string",
      "o:length" : 16,
      "o:preferred_column_name" : "JSONCOL$bookauthor"
    }
  }
}

So we can see the layout of our documents, including information about the maximum length of any values in the table.

What's Next?

Over on the Oracle Text blog, Roger has posted a series of articles on getting started with Oracle Text:

 

Summary

As with CONTEXT indexes, we've only touched the surface of JSON search capabilities. For more information, the reader is encouraged to look at the Oracle Database JSON Developer's Guide : https://docs.oracle.com/en/database/oracle/oracle-database/18/adjsn/

 

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.