Clsutering in Minion

Mostly just a note to myself here, but this post on speeding up K-means clustering should come in handy when I get back to Minion's clustering code.

I think we might already be covered, because we're usually clustering document vectors, and those are represented sparsely by nature. The memory locality is something we'd need to keep in mind, though.

They're getting a minute per epoch on a "few hundred thousand" text messages. I'll have to see what Minion's clustering performance would be like on that size of data. In our internal use of the clustering we have good interactive-time (i.e., you can run clustering as part of an async HTTP request in a Web app) clustering performance up to about a thousand (short) docs using K-means.

Comments:

Post a Comment:
Comments are closed for this entry.
About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into recommendation and other technologies that can use a good text search engine. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today