Highlighting search results in Minion

I've just posted a new piece of Minion documentation about how search results highlighting works.

It's kind of complicated, but then again getting the highlighting that you want is kind of complicated. The short version is: if you have a set of query terms and a document that you want to highlight that contains (some of) those terms, then:


  1. Tell the passage retrieval API what fields you want to highlight and how to treat the passages in that field.

  2. Use the passage retrieval algorithm to find a set of passages.

  3. Pull out the highlighted passages and display them.
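
To make the three steps concrete, here is a minimal, self-contained sketch in Java. It does not use Minion's actual passage retrieval API: FieldSpec, Passage, findPassages and highlight are hypothetical names invented for illustration, and the matching is a plain case-insensitive substring search rather than Minion's passage retrieval algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the three steps; none of these names are Minion's API.
final class HighlightSketch {

    /** Step 1: say which field to highlight and how much context to keep around a hit. */
    record FieldSpec(String field, int contextChars) {}

    /** A passage: the text to display plus the offsets of the hit inside that text. */
    record Passage(String field, String text, int hitStart, int hitEnd) {}

    /** Step 2: find one passage per occurrence of each query term in the field text. */
    static List<Passage> findPassages(FieldSpec spec, String fieldText, List<String> terms) {
        List<Passage> passages = new ArrayList<>();
        // Assumption: lower-casing does not change string length (true for ASCII text).
        String lower = fieldText.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            String t = term.toLowerCase(Locale.ROOT);
            if (t.isEmpty()) {
                continue;
            }
            int at = lower.indexOf(t);
            while (at >= 0) {
                int start = Math.max(0, at - spec.contextChars());
                int end = Math.min(fieldText.length(), at + t.length() + spec.contextChars());
                passages.add(new Passage(spec.field(), fieldText.substring(start, end),
                        at - start, at - start + t.length()));
                at = lower.indexOf(t, at + t.length());
            }
        }
        return passages;
    }

    /** Step 3: wrap the hit in markers so the passage can be displayed. */
    static String highlight(Passage p, String open, String close) {
        String text = p.text();
        return text.substring(0, p.hitStart())
                + open + text.substring(p.hitStart(), p.hitEnd()) + close
                + text.substring(p.hitEnd());
    }

    public static void main(String[] args) {
        FieldSpec spec = new FieldSpec("body", 5);
        String body = "The quick brown fox jumps";
        for (Passage p : findPassages(spec, body, List.of("brown"))) {
            System.out.println(highlight(p, "<b>", "</b>"));
        }
    }
}
```

Because the sketch matches raw substrings, it would miss morphological variations such as "jumped" for "jump"; that is one of the things a real passage retrieval algorithm takes care of.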

Using the passage retrieval algorithm to find the passages has a handy side effect: it easily handles things like morphological variations of the query terms.

A major improvement in this version over previous versions is that the process of figuring out how to build a passage of a particular size (e.g., you want to display a 500-character passage from the body of an email message) is a lot more robust.
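
For a sense of what building a passage of a particular size involves, here is one possible approach, sketched in Java. This is not how Minion does it: PassageWindow and window are hypothetical names, and the strategy (center a character budget on the hit, then pull the edges inward to whitespace so no word is cut in half) is just an assumption about what a reasonable implementation might do.

```java
// Hypothetical sketch of sizing a passage around a hit; not Minion's implementation.
final class PassageWindow {

    /**
     * Returns {start, end} (end exclusive) of a window of at most maxChars
     * characters that contains the hit at [hitStart, hitEnd).
     */
    static int[] window(String text, int hitStart, int hitEnd, int maxChars) {
        int hitLen = hitEnd - hitStart;
        if (hitLen >= maxChars) {
            // The hit alone uses up the budget, so show as much of it as fits.
            return new int[] {hitStart, hitStart + maxChars};
        }
        // Split the leftover budget evenly on both sides of the hit.
        int slack = maxChars - hitLen;
        int start = Math.max(0, hitStart - slack / 2);
        int end = Math.min(text.length(), start + maxChars);
        // If the right edge hit the end of the text, spend the unused budget on the left.
        start = Math.max(0, end - maxChars);
        // Pull both edges inward to whitespace, but never into the hit itself.
        while (start > 0 && start < hitStart
                && !Character.isWhitespace(text.charAt(start - 1))) {
            start++;
        }
        while (end < text.length() && end > hitEnd
                && !Character.isWhitespace(text.charAt(end))) {
            end--;
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        int[] w = window(text, 16, 19, 20); // the hit is "fox"
        System.out.println(text.substring(w[0], w[1]));
    }
}
```

Snapping inward rather than outward keeps the passage within the requested size, at the cost of sometimes returning fewer characters than the budget allows.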

Comments:

I assume it only works for fields whose text is stored, right? (It doesn't "reverse engineer" the text based on the terms.)
If someone wanted to save disk space and lower the size of the index by only indexing data and not storing the original text (assuming it is stored in a database on a separate system), it wouldn't be able to give back a highlighted passage, right? In such a case, it would be up to the system to create the snippet on its own?

And to add... I think the way you addressed highlighting of context vs passage is very handy.

Posted by Ron Kass on June 13, 2008 at 11:53 PM EDT #

About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into recommendation and other technologies that can use a good text search engine. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.
