Indexing data from multiple tables with Oracle Text
By Roger Ford on Apr 10, 2012
select * from mytable where contains (title, 'dog') > 0 or contains (body, 'cat') > 0will tend to perform much worse than
select * from mytable where contains (text, 'dog WITHIN title OR cat WITHIN body') > 0For this reason, Oracle Text provides the MULTI_COLUMN_DATASTORE which will combine data from multiple columns into a single index. Effectively, it constructs a "virtual document" at indexing time, which might look something like:
<title>the big dog</title>
<body>the ginger cat smiles</body>
This virtual document can be indexed using either AUTO_SECTION_GROUP, or by explicitly defining sections for title and body, allowing the query as expressed above. Note that we've used a column called "text" - this might have been a dummy column added to the table simply to allow us to create an index on it - or we could created the index on either of the "real" columns - title or body.
It should be noted that MULTI_COLUMN_DATASTORE doesn't automatically handle updates to columns used by it - if you create the index on the column text, but specify that columns title and body are to be indexed, you will need to arrange triggers such that the text column is updated whenever title or body are altered.
That works fine for single tables. But what if we actually want to combine data from multiple tables?
In that case there are two approaches which work well:
- Create a real table which contains a summary of the information, and create the index on that using the MULTI_COLUMN_DATASTORE. This is simple, and effective, but it does use a lot of disk space as the information to be indexed has to be duplicated.
- Create our own "virtual" documents using the USER_DATASTORE. The user datastore allows us to specify a PL/SQL procedure which will be used to fetch the data to be indexed, returned in a CLOB, or occasionally in a BLOB or VARCHAR2. This PL/SQL procedure is called once for each row in the table to be indexed, and is passed the ROWID value of the current row being indexed. The actual contents of the procedure is entirely up to the owner, but it is normal to fetch data from one or more columns from database tables.
I've written full examples of both these techniques, as SQL scripts to be run in the SQL*Plus tool. You will need to run them as a user who has CTXAPP role and CREATE DIRECTORY privilege.
Part of the data to be indexed is a Microsoft Word file called "1.doc". You should create this file in Word, preferably containing the single line of text: "test document". This file can be saved anywhere, but the SQL scripts need to be changed so that the "create or replace directory" command refers to the right location. In the example, I've used C:\doc.