Oracle Text query parser

Oracle Text provides a rich query syntax which enables powerful text searches.

However, this syntax isn't intended for use by inexperienced end-users.  If you provide a simple search box in your application, you probably want users to be able to type "Google-like" searches into the box, and have your application convert that into something that Oracle Text understands.

For example if your user types "windows nt networking" then you probably want to convert this into something like
"windows ACCUM nt ACCUM networking".  But beware - "NT" is a reserved word, and needs to be escaped.  So let's escape all words:
"{windows} ACCUM {nt} ACCUM {networking}".  That's fine - until you start introducing wild cards. Then you must escape only non-wildcarded searches:
"win% ACCUM {nt} ACCUM {networking}".  There are quite a few other "gotchas" that you might encounter along the way.

Then there's the issue of scoring.  Given a query for "oracle text query syntax", it would be nice if we could score a full phrase match higher than a hit where all four words are present but not in a phrase.  And then perhaps lower than that would be a document where three of the four terms are present.  Progressive relaxation helps you with this, but you need to code the "progression" yourself in most cases.

To help with this, I've developed a query parser which will take queries in Google-like syntax, and convert them into Oracle Text queries. It's designed to be as flexible as possible, and will generate either simple queries or progressive relaxation queries. The input string will typically just be a string of words, such as "oracle text query syntax" but the grammar does allow for more complex expressions:

  word : score will be improved if word exists
  +word : word must exist
  -word : word CANNOT exist
  "phrase words" : words treated as phrase (may be preceded by + or -)
  field:(expression) : find expression (which allows +,- and phrase as above) within "field".

So for example if I searched for
  +"oracle text" query +syntax -ctxcat
Then the results would have to contain the phrase "oracle text" and the word syntax. Any documents mentioning ctxcat would be excluded from the results.

All the instructions are in the top of the file (see "Downloads" at the bottom of this blog entry).  Please download the file, read the instructions, then try it out by running "parser.pls" in either SQL*Plus or SQL Developer.

I am also uploading a test file "test.sql". You can run this and/or modify it to run your own tests or run against your own text index. test.sql is designed to be run from SQL*Plus and may not produce useful output in SQL Developer (or it may, I haven't tried it).

I'm putting the code up here for testing and comments. I don't consider it "production ready" at this point, but would welcome feedback.  I'm particularly interested in comments such as
  • "The instructions are unclear - I couldn't figure out how to do XXX"
  • "It didn't work in my environment" (please provide as many details as possible)
  • "We can't use it in our application" (why not?)
  • "It needs to support XXX feature"
  • "It produced an invalid query output when I fed in XXXX"


The "download links" are for saving to a file - on most browsers you can right click and "Save Target As" or "Save Link As". 

If you want to view the code in a browser just click on "open in browser" (the file will have an extra .txt suffix if you then save it from the browser page).

parser.pls - main program file download link / open in browser

test.sql - for testing procedure download link / open in browser


Have a look at my other entry:

In order to prioritize columns you can use multipliers in the query.
For example:
select * from tab where contains (text, ' ( oracle WITHIN title )*2 OR ( oracle WITHIN body )*0.5' ) > 0

In that query, hits in the title section will score 4x as much as hits in the body section.

Posted by guest on January 09, 2014 at 11:58 AM GMT #


Is there any updated version of this code which is more production-ready?
Because I'm working exactly on this kind of single input box interface to Oracle full text search.


Posted by Dmitry on February 24, 2016 at 02:58 AM GMT #

The current links have code which fixes a couple of issues people have found during testing. The code IS in use in production systems, so to that extent I would say it is now production ready. However, the code still has the status of "sample code" - if you use it, you must own it and be prepared to maintain it, if necessary. I'm very happy to personally help with any issues you might find (in terms of bugs, probably not upgrades to add further functionality), but Oracle Support will not be able to help with it.

Posted by guest on February 24, 2016 at 10:21 AM GMT #

Hi, very nice project. However, it has significant bug.

reParse procedure uses regular expressions to split string into single words and when there are two or more following phrases that should be treated as words, they will be matched as single phrase. So, having search string like

"this is phrase1" "this is phrase2" "this is phrase3" word

the coded regular expression will match only 2 words:
1) "this is phrase1" "this is phrase2" "this is phrase3"
2) word

What I expected was outcome with 4 words:

1) "this is phrase1"
2) "this is phrase2"
3) "this is phrase3"
4) word

I think the code that generates following regular expression: ([+-]?[[:alnum:]]+[*]?)|([+-]?".+") should be changed to generate ([+-]?[[:alnum:]]+[*]?)|([+-]?"[^"]+")

Posted by guest on February 26, 2016 at 05:22 PM GMT #

Post a Comment:
  • HTML Syntax: NOT allowed



« August 2016