Minion and Lucene

A commenter asked if I had any advice about when someone should choose Minion instead of Lucene. When we started the open source process for Minion we took some time to figure out exactly what distinguished Minion from Lucene. We're not by any means Lucene experts, so if we're incorrect in any of our assessments, please let me know. None of what I'm going to say is meant to disparage Lucene: it's a good engine with a great community of developers and users. In an alternate world where Sun opened up a bit earlier, I would have been working on Lucene from the get-go, rather than starting from scratch.

In a fundamental way, the engines are very similar: both model documents as a number of field/value pairs, both use inverted files and compressed postings to store their indices, both provide similar query capabilities, and both are extensible. Where the engines differ is in the default capabilities that they provide.
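For readers who haven't looked inside a search engine, here is a minimal sketch of that shared design: documents as field/value pairs, an inverted file keyed by field and term, and postings compressed by storing gaps between document ids as variable-length bytes. This is a toy illustration, not how Minion or Lucene actually lays out its index; the function names and the tokenization (lowercase, split on whitespace) are assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map (field, term) -> ascending list of ids of documents containing the term.

    Each document is a dict of field name -> text value (hypothetical model).
    """
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            # set() ensures each document id appears once per (field, term).
            for term in set(value.lower().split()):
                index[(field, term)].append(doc_id)
    return index


def encode_postings(doc_ids):
    """Delta-encode ascending ids, writing each gap as a variable-length byte run."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        # Low 7 bits per byte; the high bit marks "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild the ascending list of document ids."""
    doc_ids = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            doc_ids.append(current)
            gap = 0
            shift = 0
    return doc_ids


docs = [
    {"title": "Minion and Lucene", "body": "two search engines"},
    {"title": "Lucene fields", "body": "search engines index fields"},
]
index = build_index(docs)
compressed = encode_postings(index[("title", "lucene")])
```

Keying the index on the field as well as the term is what lets a query be restricted to one field. Small gaps between document ids fit in a single byte each, which is why gap encoding pays off for frequent terms.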

Over the next few days, I'll provide a series of posts explaining what I think the distinguishing characteristics of Minion are, starting with how the two engines treat fields.

About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into Machine Learning and statistical NLP. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.
