My new favorite kind of music

I was computing tag-tag similarities among last.fm tags for Frank, using the data from a 50K-artist crawl that we did last month with an Aura instance running on EC2.

I wrote a quick program to pull the document vectors for the tags. (Here the tags are the "documents," and the artists to whom each tag has been applied are the "words" in those documents.)
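To make the "tags as documents" idea concrete, here is a minimal sketch of building such vectors in plain Python. It assumes the raw data is a list of (artist, tag, count) triples and that the raw tag count is used as the term weight. Both are assumptions for illustration: build_tag_vectors is a hypothetical helper, not part of Aura's API, and Aura may weight the terms differently.

```python
from collections import defaultdict


def build_tag_vectors(taggings):
    """Turn (artist, tag, count) triples into tag "documents" whose "words" are artists.

    Returns {tag: {artist: weight}}, using the raw count as the weight (an assumption).
    """
    vectors = defaultdict(dict)
    for artist, tag, count in taggings:
        # Sum counts in case the same (artist, tag) pair shows up more than once.
        vectors[tag][artist] = vectors[tag].get(artist, 0) + count
    return dict(vectors)


# Placeholder artists and made-up counts, purely for illustration.
taggings = [
    ("artist_a", "8bit", 10),
    ("artist_a", "chiptune", 8),
    ("artist_b", "chiptune", 5),
    ("artist_b", "8bit", 3),
]
vecs = build_tag_vectors(taggings)
print(vecs["8bit"])  # {'artist_a': 10, 'artist_b': 3}
```

Each tag ends up as a sparse vector keyed by artist, which is exactly the shape a text search engine uses for a document keyed by word.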

Given the vectors, it was easy to compute the complete similarity table for the tags and output the similarities for each tag in decreasing order of similarity.

For 1500 tags pulled from an index of 1.8 million documents, this takes about 55 seconds to run.
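Computing the complete similarity table amounts to an all-pairs comparison of the sparse tag vectors. The sketch below assumes cosine similarity: the post doesn't name the measure, though the self-similarity of 1.000 in the output further down is consistent with a normalized measure like cosine. The helpers norm, cosine and similarity_table are hypothetical, not Aura's implementation, and the toy vectors use placeholder names rather than real last.fm data.

```python
import math


def norm(vec):
    """Euclidean length of a sparse {artist: weight} vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(a, b, norm_a, norm_b):
    """Cosine similarity of two sparse vectors, given their precomputed norms."""
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Walk the shorter vector and look up shared artists in the longer one.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    return dot / (norm_a * norm_b)


def similarity_table(vectors):
    """Map each tag to a list of (tag, similarity) pairs, most similar first."""
    norms = {tag: norm(vec) for tag, vec in vectors.items()}
    tags = list(vectors)
    sims = {tag: [] for tag in tags}
    for i, t1 in enumerate(tags):
        # Cosine is symmetric, so compute each pair (including the self-pair) once.
        for t2 in tags[i:]:
            s = cosine(vectors[t1], vectors[t2], norms[t1], norms[t2])
            sims[t1].append((t2, s))
            if t2 != t1:
                sims[t2].append((t1, s))
    for tag in tags:
        sims[tag].sort(key=lambda pair: pair[1], reverse=True)
    return sims


# Toy vectors with placeholder names, not real last.fm data.
vectors = {
    "tag_a": {"artist_1": 1, "artist_2": 1},
    "tag_b": {"artist_1": 1},
    "tag_c": {"artist_3": 1},
}
table = similarity_table(vectors)
# Print the top 10 (here, all 3) in the same "<tag, score>" style as the log.
print([f"<{t}, {s:.3f}>" for t, s in table["tag_a"][:10]])
# ['<tag_a, 1.000>', '<tag_b, 0.707>', '<tag_c, 0.000>']
```

Because cosine is symmetric, each pair only needs to be computed once: for 1,500 tags that is 1500 × 1499 / 2 = 1,124,250 distinct pairs plus the 1,500 self-comparisons, so the cost grows quadratically with the number of tags.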

I wanted to make sure that it was doing something reasonable, so I had it dump the 10 most similar tags for each tag as it ran, and I found this:


19/1510 artist-tag:8bit computing 1510 similarities
Most similar: ["<artist-tag:8bit, 1.000>", "<artist-tag:chiptune, 0.795>", "<artist-tag:chiptunes, 0.726>", "<artist-tag:bitpop, 0.584>", "<artist-tag:chipmusic, 0.341>", "<artist-tag:blipblop, 0.258>", "<artist-tag:nintendocore, 0.194>", "<artist-tag:nintendo, 0.189>", "<artist-tag:c64, 0.174>", "<artist-tag:vgm, 0.166>"]

Can you tell what caught my eye? Nintendocore? Awesome!

About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into recommendation and other technologies that can use a good text search engine. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.
