Monday Aug 04, 2014

Getting started Part 3 - Index maintenance

This post follows on from Part 1 and Part 2, and uses the same example "quickstart" table.

One thing that surprised new users is that Oracle Text CONTEXT indexes are not synchronous. That is, updates to the table are not immediately reflected in the index. Instead, the index must be synchronized. We can do this manually with a call to CTX_DLL.SYNC_INDEX, or we can arrange to have it done automatically for us.

Let's show the manual method first:. We'll insert a new run then search for it. We won't find it because the index is not up-to-date. Then we'll call SYNC_INDEX, giving it the name of the index, and search again:

SQL> insert into quickstart values (4, 'GB', 'Michael Smith');
1 row created.

SQL> select score(99), id, full_name from quickstart where contains ( full_name, 'michael', 99) > 0 order by score(99) desc;
no rows selected

SQL> execute ctx_ddl.sync_index('full_name_index')
PL/SQL procedure successfully completed.

SQL> select score(99), id, full_name from quickstart where contains ( full_name, 'michael', 99) > 0 order by score(99) desc;

 SCORE(99)         ID FULL_NAME
---------- ---------- ----------------------------------------
         5          4 Michael Smith

If we don't want to bother with manual SYNC calls,  we can use the parameter string "SYNC (ON COMMIT)" when we create the index. That means that immediately after updates are committed to the table, a SYNC operation will be run to get the index into step with the table.

SQL> drop index full_name_index;
Index dropped.

SQL> create index full_name_index on quickstart( full_name ) indextype is ctxsys.context parameters ('sync (on commit)');
Index created.

SQL> insert into quickstart values (5, 'US', 'Scott Peters');
1 row created

We can search for that, but we won't find it since we haven't yet committed - so the SYNC has not run: 

SQL> select score(99), id, full_name from quickstart where contains ( full_name, 'scott', 99) > 0 order by score(99) desc;
no rows selected

Now we can commit...

SQL> commit;
Commit complete.

And this time it will find it:

SQL> select score(99), id, full_name from quickstart where contains ( full_name, 'scott', 99) > 0 order by score(99) desc;

 SCORE(99)         ID FULL_NAME
---------- ---------- ----------------------------------------
         5          5 Scott Peters

SYNC(ON COMMIT) is fine if you don't make too many updates to your table, but is not ideal if you have a very large table and lots of changes take place on it.

Because of the way CONTEXT indexes are built, frequent changes will cause fragmentation of the index, decreasing performance over time. So a common way is to arrange for indexes to be synchronized at certain fixed intervals.  We can do this manually, using the database scheduler, or we can have it done automatically with a "SYNC (EVERY ... )" string in the parameters clause.  The time clause in that uses scheduler syntax, and can be a little convoluted. Every 1 minute can be represented in days as 1/24/60", so we can try the following:

SQL> drop index full_name_index;
Index dropped.

SQL> create index full_name_index on quickstart( full_name ) indextype is ctxsys.context parameters ('sync ( every SYSDATE+1/24/60 )');
Index created.

SQL> insert into quickstart values (6, 'US', 'Paul Peters');
1 row created.

SQL> commit;
Commit complete.

SQL> select score(99), id, full_name from quickstart where contains ( full_name, 'paul', 99) > 0 order by score(99) desc;

no rows selected

So we've committed, but the regular SYNC hasn't kicked off yet. We can wait a minute and try again, and this time we'll find it:

SQL> select score(99), id, full_name from quickstart where contains ( full_name, 'paul', 99) > 0 order by score(99) desc;

 SCORE(99)         ID FULL_NAME
---------- ---------- ----------------------------------------
         5          6 Paul Peters

Back in Part 1 we said that it was necessary to grant the "CREATE JOB" privilege to our user - this is why.  If the user doesn't have CREATE JOB privilege, then we will get an "insufficient privileges" error if we try to create the index with "SYNC (EVERY ...)".

Finally, we should talk about index optimization. CONTEXT indexes, over time, get less efficient as they get updated.  This inefficiency takes two forms:

  1. The index gets filled with garbage - it contains references to deleted or updated documents which are no longer current
  2. The "postings lists" get fragmented. The pointers to which documents contain each word get broken up into small chunks instead of the idea long strings.

We can fix this by running index optimization.  As with the manual call to sync indexes, this is done with a call to the PL/SQL package CTX_DDL:

SQL> execute ctx_ddl.optimize_index('full_name_index', 'FULL')
PL/SQL procedure successfully completed. 

The "FULL" parameter there is the mode of optimization. You can choose from the following:

  • FAST: Only posting fragmentation is fixed - no garbage collection occurs
  • FULL: Both fragmentation and garbage collection is dealt with
  • REBUILD: The entire index is copied to a fresh table (requires more disk space but produces the best results).

The more frequently you run SYNC, the more fragmented your index will become, and the more often you will need to optimize.

A typical "maintenance regime" at many customers is to run SYNC every five minutes, then run OPTIMIZE in FULL mode nightly, and OPTIMIZE in REBUILD mode once a week. 

The next topic is for advanced users: Getting Started Part 4 - Advanced Index Customization

About

bocadmin_ww

Search

Categories
Archives
« August 2015
MonTueWedThuFriSatSun
     
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
      
Today