Dictionaries in Minion: An Introduction

As I've mentioned previously, our design philosophy for Minion is that everything is a dictionary and postings.

To recap, in Minion, a dictionary is a map from entry names to entries. The entry name may be a term from a document, a value from a saved field, or a bigram that will be used for doing wildcard matching. The entry pointed to by the name contains information about that name, including frequency of occurrence, the dictionary that the entry was drawn from, and (most importantly) a pointer to the postings associated with that entry.

There are actually two kinds of dictionaries in Minion. The first kind is a MemoryDictionary, which is used to hold a dictionary at indexing time. The second kind is a DiskDictionary, which is used to hold a dictionary at query time. Both of these dictionary types are implementations of the Dictionary interface that describes the capabilities that all dictionaries share.

Over the next few posts I'll be describing the entries that you'll find in dictionaries, and how the MemoryDictionary and DiskDictionary classes are used in Minion.

Posts in this series:

Comments:

Post a Comment:
Comments are closed for this entry.
About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into recommendation and other technologies that can use a good text search engine. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today