Friday Mar 15, 2013

A staging table for Oracle Text index updates

Oracle Text CONTEXT indexes differ from standard BTREE indexes in that they are not synchronous.  That is, the index is not updated at the same time that the data changes. It is necessary to invoke an index sync in some manner before any updates (or inserts) to the table are reflected in the index.

The simplest way to do this is to specify SYNC (ON COMMIT) in the parameters clause of the index creation. That means that as soon as changes are committed, the index is sync'd.

 (This still isn't quite synchronous. The indexing doesn't start until the commit, so although changes to the base table will be visible to the user who made them before doing a commit, that user will NOT see changes reflected in the index. Additionally, there will be a short period between the end of the commit and the end of the sync when other userswill be able to see the table changes but not the index changes).

 SYNC(ON COMMIT) is popular, but it's not necessarily the best option.  This is because frequent syncs cause fragmentation of the index. New data is written in small chunks to the end of the $I postings list table, and the more and smaller these are, the worse query performance will be until the index is optimized.

So there is usually a trade-off between update frequency, and the fragmentation of an index. Users would like to have their updates available for searching immediately by means of SYNC(ON COMMIT), but DBAs would prefer to have SYNCs spaced widely apart to avoid fragmentation of the index.

One way around this is to create a staging table to hold recent updates to the main table.  The staging table will be a copy of the main table's layout, with the addition of a timestamp column.

Indexes are created on both tables (main and stage).  These indexes are identical, except the main index has manual sync, and the staging table has SYNC(ON COMMIT).

We then create triggers which copy inserts and updates from the main table to the stage table, where they are indexed near-instantaneously.  Because the stage table is kept small, we needn't worry about fragmentation.  It's probably advisable to arrange things such that the stage table is "pinned" in the SGA buffer cache, but if it's used regularly, it will probably stay in memory anyway.

Queries which would previously have run against the main table are modified so that they do a UNION between the main table and the staging table. That way, we get all the unmodified rows from the main table, and the modified rows from the stage table.  If you're thinking "won't we get the modified rows in their old form from the base table?" then remember that DELETEs are actioned immediately in a CONTEXT index, and that updates are implemented internally as DELETE followed by INSERT).

Then we just need to put a wrapper around the SYNC for the main table which performs the actual SYNC then deletes all the rows from the staging table which were inserted before the start of the SYNC (hence the need for a timestamp column).  If we choose, we can also optimize the stage table index at this point to.

I've written some sample code to demonstrate this on a simple table. It is intended to be run in SQL*Plus but will probably work in SQL Developer with little or no modification.  Be aware that it drops and recreates a user called TESTUSER.  You will need to edit the CONNECT line unless your SYSTEM password is "manager".

Download link (right click and save): staging_index.sql or view in browser: staging_index.sql.txt



About

bocadmin_ww

Search

Categories
Archives
« March 2013 »
MonTueWedThuFriSatSun
    
1
2
3
4
5
6
7
8
9
10
11
12
13
14
16
17
18
19
20
21
23
24
25
26
27
28
29
30
31
       
Today