The rest of the story...
By searchguy on Apr 23, 2008
Paul blogged today about using the search engine to implement a persistent set of strings. He called it abusing the search engine, but it was so simple to do that it seems like more of a use than an abuse, IMHO.
One of Minion's strengths is that it offers a fairly small "public" API that is supposed to offer all of the functionality that you need for indexing and searching documents. Paul's persistent set uses an interface called SimpleIndexer that, as the name suggests, provides a simple way to index documents.
Recall that a document in Minion is just a bunch of fields, so to index using a SimpleIndexer you just do something like:
SearchEngine e = SearchEngineFactory.getSearchEngine(indexDir);
SimpleIndexer si = e.getSimpleIndexer();
to get a search engine and the simple indexer. Then for each document you want to index you can say:
When you're done, you need to tell the engine that you're done with the simple indexer so that any information that's accumulated in memory can be flushed to disk:
Don't worry about "indexing too much" with a simple indexer. The engine will flush data to the disk when the heap starts to fill. Also, don't forget to close the engine when you're done with all your indexing:
As it stands right now, if you forget to call finish, some of the data that you've indexed might be discarded. This is the kind of infelicity that I'm hoping to fix over the next little while. Paul was complaining about having to remember to close the engine yesterday, so we'll probably make that a little easier to deal with as well.
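In the meantime, one way to avoid forgetting either call is to put them in finally blocks. Here's a minimal sketch of that pattern. Note that DocIndexer, Engine and InMemoryEngine are made-up stand-ins so that the example runs on its own; they are not Minion's actual interfaces, and the index method in particular is an assumption, not a real SimpleIndexer call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexingLifecycleSketch {

    /** Hypothetical stand-in for a simple indexer (NOT Minion's real interface). */
    public interface DocIndexer {
        void index(String key, Map<String, String> fields);
        void finish();
    }

    /** Hypothetical stand-in for the search engine (NOT Minion's real interface). */
    public interface Engine {
        DocIndexer getIndexer();
        void close();
    }

    /** Toy engine that keeps everything in memory so the sketch runs without Minion. */
    public static class InMemoryEngine implements Engine {
        public final List<String> buffered = new ArrayList<>(); // indexed, not yet flushed
        public final List<String> flushed = new ArrayList<>();  // stands in for data on disk
        public boolean closed = false;

        @Override
        public DocIndexer getIndexer() {
            return new DocIndexer() {
                @Override
                public void index(String key, Map<String, String> fields) {
                    buffered.add(key);
                }

                @Override
                public void finish() {
                    // Move everything accumulated in memory to the "disk" list.
                    flushed.addAll(buffered);
                    buffered.clear();
                }
            };
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Indexes every document, then always finishes the indexer and closes the engine. */
    public static void indexAll(Engine engine, Map<String, Map<String, String>> docs) {
        try {
            DocIndexer indexer = engine.getIndexer();
            try {
                for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
                    indexer.index(doc.getKey(), doc.getValue());
                }
            } finally {
                indexer.finish(); // runs even if indexing a document throws
            }
        } finally {
            engine.close(); // runs even if finish() throws
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "The rest of the story");
        Map<String, Map<String, String>> docs = new LinkedHashMap<>();
        docs.put("post-1", fields);

        InMemoryEngine engine = new InMemoryEngine();
        indexAll(engine, docs);
        System.out.println("flushed=" + engine.flushed + " closed=" + engine.closed);
    }
}
```

The nesting matters: the inner finally flushes whatever has been buffered, and the outer one closes the engine no matter what happened before it.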
Generally speaking, when Paul complains about the engine I listen. His (constructive!) criticism is the reason that we have SimpleIndexer in the first place.
There are a couple of other ways of indexing documents, but the SimpleIndexer is a remarkably powerful way to index a whole host of things (blogs, email, databases, etc.).