Clustering in Minion
By searchguy on Mar 18, 2009
Mostly just a note to myself here, but this post on speeding up K-means clustering should come in handy when I get back to Minion's clustering code.
I think we might already be covered, because we're usually clustering document vectors, and those are represented sparsely by nature. The memory locality is something we'd need to keep in mind, though.
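To make the sparse-vector point concrete, here's a rough sketch of the assignment step of K-means over sparse document vectors. This is not Minion's code: the dict-of-term-weights representation, the function names, and the use of cosine similarity as the closeness measure are all assumptions made for illustration.

```python
import math

def sparse_dot(doc, centroid):
    """Dot product that only visits the document's nonzero terms."""
    return sum(w * centroid.get(term, 0.0) for term, w in doc.items())

def norm(vec):
    """Euclidean length of a sparse vector stored as {term: weight}."""
    return math.sqrt(sum(w * w for w in vec.values()))

def nearest_centroid(doc, centroids):
    """Index of the centroid with the highest cosine similarity to doc.

    Cosine similarity is an assumption here; a Euclidean-distance
    variant would follow the same sparse iteration pattern.
    """
    doc_norm = norm(doc)
    best_index, best_sim = -1, -1.0
    for i, c in enumerate(centroids):
        denom = doc_norm * norm(c)
        sim = sparse_dot(doc, c) / denom if denom else 0.0
        if sim > best_sim:
            best_index, best_sim = i, sim
    return best_index

# Hypothetical toy data: two short documents and two centroids.
docs = [
    {"search": 1.0, "index": 2.0},
    {"cluster": 3.0, "kmeans": 1.0},
]
centroids = [
    {"search": 0.5, "index": 1.0, "query": 0.2},
    {"cluster": 1.0, "kmeans": 1.0, "vector": 0.5},
]
assignments = [nearest_centroid(d, centroids) for d in docs]
```

The cost of each dot product scales with the number of distinct terms in the document, not with the vocabulary size, which is why short docs are cheap to assign. Hash-based lookups like these say nothing about memory locality, though, so that part would still need separate attention.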
They're getting a minute per epoch on a "few hundred thousand" text messages. I'll have to see what Minion's clustering performance is like on data of that size. In our internal use, K-means clustering is fast enough for interactive use (i.e., you can run it as part of an async HTTP request in a Web app) up to about a thousand (short) docs.