Tuesday Jul 10, 2012

Adjusting the Score on Oracle Text search results

When you sort the results of a search by Score using OracleTextSearch as the search engine in WebCenter Content, the results coming back are based on the relevancy on the document.  In theory, the more relevant the search term is to the document, the higher ranked Score it should receive.  But in practice, the relevancy score can seem somewhat of a mystery.  It's not entirely clear how it ranks the importance of some documents over others based on the search term.  And often times, once a word appears a certain number of times within a document, the Score simply maxes out at 100 and the top results can be difficult to discern from one another.  Take for example the search for 'vacation' on this set of documents:

Score by relevance

Out of 7 results, 6 of them have a Score of '100' which means they are basically ranked the same.  This doesn't make the sort by Score very meaningful.  

Besides sorting by relevance, you can also tell Oracle Text to sort by occurrence.  In that case, it is a much more predictable result in how they would be ranked. And for many cases provide a more meaningful sorting of results then relevance. To change this takes a small component change to the SearchOperatorMap resource.  By default, the query used for full-text searching looks like:

<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), RELEVANCE * .1)</td>
<td>text</td>

Overriding this resource and changing it to:

<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), OCCURRENCE * .01)</td> 
<td>text</td>

will force it to now use occurrence (note the change in scale to .01 as well).  So running the same search and sort options as the example above, the results come out quite a bit differently:

Sort by occurrence

In this case, there is a clear understanding of how the items rank.   And generally, if the search term appears 3 times more in one document then another, it's got a better chance of being a document I'm interested in. 

You may or may not feel the relevance ranking is better then the search term occurrence, but this provides the opportunity to try an alternate method that might work better for your results.  A pre-built component is available for download here.

There is one caveat in using this method.  The occurrence ranking also maxes out at 100, so if a search term is in the document more then that, the Score result will stay at 100.

About

Kyle Hatlestad is a Solution Architect in the WebCenter Architecture group (A-Team) who works with WebCenter Content and other products in the WebCenter & Fusion Middleware portfolios. The WebCenter A-Team blog can be found at: https://blogs.oracle.com/ ateam_webcenter/

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today