Adjusting the Score on Oracle Text search results

When you sort search results by Score using OracleTextSearch as the search engine in WebCenter Content, the results that come back are ordered by the relevancy of the document.  In theory, the more relevant the search term is to the document, the higher the Score it should receive.  But in practice, the relevancy score can seem like somewhat of a mystery.  It's not entirely clear how it ranks the importance of some documents over others based on the search term.  And oftentimes, once a word appears a certain number of times within a document, the Score simply maxes out at 100 and the top results can be difficult to tell apart.  Take, for example, a search for 'vacation' on this set of documents:

Score by relevance

Out of 7 results, 6 have a Score of '100', which means they are effectively ranked the same.  That doesn't make sorting by Score very meaningful.

Besides sorting by relevance, you can also tell Oracle Text to sort by occurrence.  That gives a much more predictable ranking and, in many cases, a more meaningful sort order than relevance.  Making the change takes a small component that overrides the SearchOperatorMap resource.  By default, the query used for full-text searching looks like:

<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), RELEVANCE * .1)</td>
<td>text</td>

Overriding this resource and changing it to:

<td>(ORACLETEXTSEARCH)fullText</td>
<td>DEFINESCORE((%V), OCCURRENCE * .01)</td> 
<td>text</td>

will force it to use occurrence instead (note the change in scale to .01 as well).  Running the same search and sort options as in the example above, the results come out quite differently:

Sort by occurrence

In this case, it is clear how the items rank.  And generally, if the search term appears 3 times more often in one document than in another, it has a better chance of being a document I'm interested in.
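
To make the difference between the two resource rows concrete, here is a minimal sketch of how such a template might be expanded into the Oracle Text expression for a search term. It assumes the %V placeholder is simply replaced by the user's search expression; the function name and the substitution logic are hypothetical and are not taken from the WebCenter Content source.

```python
# Illustration only: the two templates are copied from the resource rows
# above. expand_template() is a hypothetical helper, and plain string
# substitution of %V is an assumption about how the placeholder is filled.

RELEVANCE_TEMPLATE = "DEFINESCORE((%V), RELEVANCE * .1)"
OCCURRENCE_TEMPLATE = "DEFINESCORE((%V), OCCURRENCE * .01)"


def expand_template(template: str, search_value: str) -> str:
    """Substitute the search expression for the %V placeholder."""
    return template.replace("%V", search_value)


if __name__ == "__main__":
    # Show both expansions side by side for the example search term.
    for name, template in (("relevance", RELEVANCE_TEMPLATE),
                           ("occurrence", OCCURRENCE_TEMPLATE)):
        print(f"{name}: {expand_template(template, 'vacation')}")
```

Comparing the two expanded strings with the query shown in the 'searchquery' trace output is one way to see which scoring mode a search is actually using.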

You may or may not feel that the relevance ranking is better than ranking by search term occurrence, but this gives you the opportunity to try an alternative method that might work better for your results.  A pre-built component is available for download here.

There is one caveat to using this method.  The occurrence ranking also maxes out at 100, so if a search term appears in a document more than 100 times, the Score will stay at 100.
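
The effect of that cap can be sketched with a simplified model. This is not how Oracle Text tokenizes or scores documents; it just counts whole-word, case-insensitive matches and clamps the count at 100, and the function name and sample documents are made up for illustration.

```python
import re

MAX_SCORE = 100  # the ceiling on the Score described above


def occurrence_score(text: str, term: str) -> int:
    """Count whole-word, case-insensitive matches of term, capped at MAX_SCORE.

    A simplified stand-in for occurrence scoring, not Oracle Text's algorithm.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return min(len(pattern.findall(text)), MAX_SCORE)


if __name__ == "__main__":
    # Hypothetical documents: two of them exceed the cap.
    docs = {
        "policy.txt": "vacation " * 3,
        "handbook.txt": "vacation " * 150,
        "faq.txt": "vacation " * 240,
    }
    scores = {name: occurrence_score(body, "vacation") for name, body in docs.items()}
    for name in sorted(scores, key=scores.get, reverse=True):
        print(name, scores[name])
```

In this model the documents with 150 and 240 occurrences both score 100, so they tie at the top just as heavily matching documents would in the real search.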

Comments:

I wasn't aware of how content was scored. This is very interesting. Thanks for the information.

Posted by guest on July 10, 2012 at 06:47 PM CDT #

Hello
Kyle

Great article. Does this component work on UCM 10g? If not, how can I change it?

Posted by guest on July 25, 2012 at 07:41 AM CDT #

Hi Kyle,

I was facing difficulties with Oracle text search and meta search scoring and have a SR opened with Oracle for the same. I came across your blog and implemented the component, but unfortunately that doesn't seem to take effect. I've restarted the servers, made sure my component is enabled, but still, it doesn't seem to work.

I tried all the options but the scoring of the content doesn't change and if I deliberately make any error in the component, it's not even throwing any error. I wanted to use the IGNORE option so that the content's score is ignored and I can apply weights to the contents.

Is there anything else I need to take care of before implementing this component? I created that through component manager and simply enabled it and restarted UCM.

Posted by guest on August 28, 2012 at 05:31 AM CDT #

No, there is nothing additional needed beyond installing and enabling the component, and restarting.

To verify it's working, you can go to Administration -> System Audit Information and add 'searchquery' in the Active Sections to trace. Then go to the 'View Server Output', clear the output, and then perform your search. Go back to the output and you should now see the query that was actually executed. There you should see whether it is doing relevance or occurrence and if the component is taking effect.

Thanks,
-Kyle

Posted by Kyle Hatlestad on August 28, 2012 at 09:30 AM CDT #

Hi Kyle,

Thanks for the reply. I checked into the audit information but that doesn't tell me if the SCORE is calculated based on RELEVANCE or OCCURRENCE.

Actually, I must ask this: does this work with meta search, or does it only work for text search? The query I'm using is:

((((SDATA(sdxSubDocType LIKE 'SECTOR') and ( ((foo*10|bar*8|cat*6|dog*4) WITHIN xINMClassification) ))))) [1,200] sort(SCORE Desc)

where xINMClassification is a meta data which is used to tag a content. My purpose was to provide weight to the contents according to my preference, but the final SCORE which gets calculated is a bit arbitrary and doesn't comply with the weights I provide. Hence I thought of using the component with IGNORE option to IGNORE the Oracle SCORE so that only my weights can decide the order of the contents. But this doesn't seem to work.

Any advice?

Posted by dalia on August 28, 2012 at 11:50 AM CDT #

As per the component, the OperatorName is (ORACLETEXTSEARCH)fullText, so the score definition only applies when you do a full-text search. For your requirement, you need to change the operator.
Also, in your example you haven't specified whether the score should be calculated on relevance (the default) or occurrence.

Posted by guest on September 07, 2012 at 01:11 PM CDT #

About

Kyle Hatlestad is a Solution Architect in the WebCenter Architecture group (A-Team) who works with WebCenter Content and other products in the WebCenter & Fusion Middleware portfolios. The WebCenter A-Team blog can be found at: https://blogs.oracle.com/ateam_webcenter/
