By searchguy on Feb 27, 2009
One of the capabilities that we've recently added to Minion is results filtering.
Let's say that you run a search to find all of the documents that contain the word dog. When you get back the ResultSet from the search, you want to show the user the top 10 results. But let's also say that you only want to show the top 10 results where the breed field is german shepherd. Now, you could have added a clause to the query that would have added this restriction to the search, but in an interactive system, it would be nice if you could change the breed restriction without having to re-run the query every time.
This is where results filtering comes in. Rather than re-run the search, you can build a results filter that you pass to the result set when you ask it for results.
The ResultsFilter interface describes the methods that a results filter needs to implement. The main method is ResultsFilter.filter, which takes an accessor for the result currently under consideration. If the implementation of this method returns true, then the result can be returned in the list of results. If it returns false, the result won't be returned.
The ResultAccessor interface gives you access to the saved field values for the result currently under consideration. In addition, you can fetch the score for the result or the key for the result.
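To make the shape of a filter concrete, here is a minimal sketch of the german shepherd filter from the example above. The two interfaces are hypothetical stand-ins written for this post, not Minion's real declarations: the method names getSingleFieldValue, getScore, getKey and getPassed are assumptions, and the actual signatures may well differ. MapAccessor is a toy accessor that exists only so the sketch can run on its own.

```java
import java.util.Map;

// Hypothetical stand-in for Minion's accessor; real method names may differ.
interface ResultAccessor {
    Object getSingleFieldValue(String field); // saved field value, or null if absent
    float getScore();
    String getKey();
}

// Hypothetical stand-in for Minion's filter interface; getPassed is an assumed name.
interface ResultsFilter {
    boolean filter(ResultAccessor ra);
    int getTested();
    int getPassed();
}

/** Passes only results whose saved field equals the wanted value (ignoring case). */
class FieldEqualsFilter implements ResultsFilter {
    private final String field;
    private final String wanted;
    private int tested; // how many results the filter was asked about
    private int passed; // how many of those it let through

    FieldEqualsFilter(String field, String wanted) {
        this.field = field;
        this.wanted = wanted;
    }

    @Override
    public boolean filter(ResultAccessor ra) {
        tested++;
        Object value = ra.getSingleFieldValue(field);
        boolean ok = value != null && wanted.equalsIgnoreCase(value.toString());
        if (ok) {
            passed++;
        }
        return ok;
    }

    @Override
    public int getTested() { return tested; }

    @Override
    public int getPassed() { return passed; }
}

/** Toy accessor backed by a map, for demonstration only. */
class MapAccessor implements ResultAccessor {
    private final String key;
    private final float score;
    private final Map<String, String> fields;

    MapAccessor(String key, float score, Map<String, String> fields) {
        this.key = key;
        this.score = score;
        this.fields = fields;
    }

    @Override
    public Object getSingleFieldValue(String field) { return fields.get(field); }

    @Override
    public float getScore() { return score; }

    @Override
    public String getKey() { return key; }
}

class FieldEqualsFilterDemo {
    public static void main(String[] args) {
        ResultsFilter f = new FieldEqualsFilter("breed", "german shepherd");
        ResultAccessor rex = new MapAccessor("doc1", 0.9f, Map.of("breed", "German Shepherd"));
        ResultAccessor fido = new MapAccessor("doc2", 0.8f, Map.of("breed", "beagle"));
        System.out.println(rex.getKey() + " passes: " + f.filter(rex));
        System.out.println(fido.getKey() + " passes: " + f.filter(fido));
        System.out.println(f.getPassed() + " of " + f.getTested() + " passed");
    }
}
```

Changing the breed restriction then just means building a new filter with a different value and asking the same result set for results again, with no need to re-run the query.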
All of this raises the question: when does a result come under consideration? A simple implementation of results filtering would run the filter against all of the documents that satisfied a query. This approach would require fetching field values for a lot of results that would never make it into the top n results.
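The approach that Minion takes is the following: while we're building the heap of the top n results so far, if we decide that the search result that we're looking at should replace the top element of the heap, then we run the results filter on that result. If the results filter says that we should add the result, then we replace the top of the heap; if not, the heap is left alone. This way the filter only runs on results that are already good enough to make it into the top n, so we fetch field values for far fewer results.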
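The idea can be sketched in a few lines of Java. This is not Minion's code, just an illustration of the technique under some assumptions: Hit is a hypothetical result type, a plain Predicate stands in for the results filter, and the heap is a min-heap on score so that its top element is the weakest result kept so far. The post only describes the replacement case; running the filter on every candidate while the heap is still filling up is my assumption here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Predicate;

/** Hypothetical search hit; stands in for whatever the engine tracks per result. */
class Hit {
    final String key;
    final float score;
    final String breed;

    Hit(String key, float score, String breed) {
        this.key = key;
        this.score = score;
        this.breed = breed;
    }
}

class LazyTopN {
    /** Returns the n best hits that pass the filter, best first. */
    static List<Hit> topN(Iterable<Hit> hits, int n, Predicate<Hit> filter) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap on score: the top element is the weakest result kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (Hit h : hits) {
            boolean full = heap.size() >= n;
            // A hit that can't beat the weakest kept result is dropped
            // without ever running the filter on it.
            if (full && h.score <= heap.peek().score) {
                continue;
            }
            // Only now pay for the filter (and the field fetches behind it).
            if (!filter.test(h)) {
                continue;
            }
            if (full) {
                heap.poll(); // evict the weakest kept result
            }
            heap.add(h);
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(byScore.reversed());
        return out;
    }
}

class LazyTopNDemo {
    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("a", 0.9f, "german shepherd"),
            new Hit("b", 0.8f, "beagle"),
            new Hit("c", 0.7f, "german shepherd"),
            new Hit("d", 0.6f, "german shepherd"));
        for (Hit h : LazyTopN.topN(hits, 2, x -> "german shepherd".equals(x.breed))) {
            System.out.println(h.key + " " + h.score);
        }
    }
}
```

How many filter calls this saves depends on the order in which hits arrive: once the heap holds n strong results, most of the remaining hits fail the cheap score comparison and are never filtered at all.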
We've been using results filters a lot in building recommenders in the AURA Project and they've turned out to be pretty ferociously useful.
One final word: Minion doesn't actually do anything with counting methods like ResultsFilter.getTested. They're there for your benefit, so feel free to have them just return 0 if you don't want to keep track of that information. Those methods are there because our first runs with filters turned on were so fast that I thought the filters weren't getting called!