Friday Feb 27, 2009

Results Filtering in Minion

One of the capabilities that we've recently added to Minion is results filtering.

Let's say that you run a search to find all of the documents that contain the word dog. When you get back the ResultSet from the search, you want to show the user the top 10 results. But let's also say that you only want to show the top 10 results where the breed field is german shepherd. Now, you could have added a clause to the query that would have added this restriction to the search, but in an interactive system, it would be nice if you could change the breed restriction without having to re-run the query every time.

This is where results filtering comes in. Rather than re-run the search, you can build a results filter that you pass to the ResultSet.getResults method.

The ResultsFilter interface describes the methods that a results filter needs to implement. The main method is ResultsFilter.filter, which takes an accessor for the result currently under consideration. If the implementation of this method returns true then the result currently under consideration can be returned in the list of results. If this method returns false then the result won't be returned.

The ResultAccessor interface gives you access to the saved field values for the result currently under consideration. In addition, you can fetch the score for the result or the key for the result.
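
To make that concrete, here's a minimal sketch of a filter for the breed example. The two interfaces at the top are simplified stand-ins so that the example compiles on its own; they are not Minion's actual declarations. The accessor method names (getSingleFieldValue, getScore, getKey), the case-insensitive comparison, and the MapAccessor helper are all assumptions made for illustration, so check the Minion javadoc for the real signatures.

```java
import java.util.Map;

// Simplified stand-ins for Minion's interfaces (assumed shapes, not the real API).
interface ResultAccessor {
    Object getSingleFieldValue(String field); // assumed name: one saved value of a field
    float getScore();                         // score of the result under consideration
    String getKey();                          // key of the result under consideration
}

interface ResultsFilter {
    boolean filter(ResultAccessor ra); // true means the result may be returned
    int getTested();                   // how many results the filter has seen
    int getPassed();                   // how many of those it let through
}

// Lets a result through only when its saved breed field matches the wanted breed.
final class BreedFilter implements ResultsFilter {
    private final String breed;
    private int tested;
    private int passed;

    BreedFilter(String breed) {
        this.breed = breed;
    }

    @Override
    public boolean filter(ResultAccessor ra) {
        tested++;
        Object value = ra.getSingleFieldValue("breed");
        // Assumption: compare ignoring case; a missing field never matches.
        boolean ok = value != null && breed.equalsIgnoreCase(value.toString());
        if (ok) {
            passed++;
        }
        return ok;
    }

    @Override
    public int getTested() {
        return tested;
    }

    @Override
    public int getPassed() {
        return passed;
    }
}

// Hypothetical in-memory accessor, here only so the filter can be tried without an index.
final class MapAccessor implements ResultAccessor {
    private final String key;
    private final float score;
    private final Map<String, Object> fields;

    MapAccessor(String key, float score, Map<String, Object> fields) {
        this.key = key;
        this.score = score;
        this.fields = fields;
    }

    @Override
    public Object getSingleFieldValue(String field) {
        return fields.get(field);
    }

    @Override
    public float getScore() {
        return score;
    }

    @Override
    public String getKey() {
        return key;
    }
}
```

Changing the breed restriction is then just a matter of building a new BreedFilter with a different value and asking the same ResultSet for its results again.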

All of this raises the question: when does a result come under consideration? A simple implementation of results filtering would run the filter against all of the documents that satisfied a query. This approach would require fetching field values for a lot of results that would never make it into the top n results.

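The approach that Minion takes is the following: while we're building the heap of the top n results so far, if we decide that the search result that we're looking at should replace the top element of the heap (the weakest of the results that we've kept so far), then we run the results filter on that result. If the results filter says that we should add the result, then we replace the top of the heap; if it doesn't, the heap is left as it is. This way the filter only runs on results that could actually make it into the top n.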
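
Here's a rough sketch of that idea using a plain java.util.PriorityQueue. This is not Minion's code: the Hit class, the FilteredTopN.topN method, and the use of a Predicate in place of a ResultsFilter are all made up for the example, and I'm assuming that the filter is also consulted while the heap still has fewer than n entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Predicate;

// Hypothetical stand-in for a scored search result; not a Minion class.
final class Hit {
    final String key;
    final float score;
    final String breed;

    Hit(String key, float score, String breed) {
        this.key = key;
        this.score = score;
        this.breed = breed;
    }
}

final class FilteredTopN {
    // Returns up to n hits that pass the filter, best score first.
    static List<Hit> topN(Iterable<Hit> hits, int n, Predicate<Hit> filter) {
        List<Hit> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap: the top element is the lowest-scoring hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (Hit h : hits) {
            if (heap.size() < n) {
                // Assumption: while the heap is filling, every candidate is filtered.
                if (filter.test(h)) {
                    heap.add(h);
                }
            } else if (h.score > heap.peek().score && filter.test(h)) {
                // The filter runs only because h beats the current top of the heap.
                heap.poll();
                heap.add(h);
            }
        }
        out.addAll(heap);
        out.sort(byScore.reversed());
        return out;
    }
}
```

Once the heap is full, any hit that scores no better than the top of the heap is skipped without the filter ever being called on it, which is where the savings in field fetches come from.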

We've been using results filters a lot in building recommenders in the AURA Project and they've turned out to be pretty ferociously useful.

One final word: Minion doesn't actually do anything with the ResultsFilter.getPassed or ResultsFilter.getTested methods: they're there for your benefit, so you can feel free to have them just return 0 if you don't want to keep track of that information. Those methods are there because our first runs with filters turned on were so fast that I thought the filters weren't getting called!

About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into recommendation and other technologies that can use a good text search engine. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.
