Running Mahout on Elastic MapReduce

Here in the Labs we have a Big Data reading group. The idea is that we get together once a week and discuss a paper of interest. We've covered a lot of the famous ones, like the initial papers for GFS and MapReduce. A couple of weeks ago, I volunteered to tackle the paper from Stanford that lays out methods for running a number of standard machine learning techniques in a MapReduce framework.

The Apache Mahout project was started to build the algorithms described in the paper on the Hadoop MapReduce framework (the original paper describes running the algorithms on multicore processors). They've also brought in the Taste Collaborative Filtering framework, which is interesting to us as recommendation folks. As it turns out, they had just released Mahout 0.1 around the time we were going to read the paper.

Coincidentally, Amazon had just announced their Elastic MapReduce (EMR) service that lets you run a MapReduce job on EC2 instances, so I decided to see what it would take to get Mahout running on EMR.

I didn't manage to get it running in time for the reading group, but one Mahout issue and a few "Oh, that's the way it works"es later, I had it running.

Apparently I'm the first person to have run Mahout on Elastic MapReduce, which just shows, as my father used to say, that brute force has an elegance all its own.

If you're interested, the details are on the Mahout wiki.

Comments:

is it not possible to try on your own project caroline ?

Posted by anon on May 06, 2009 at 07:31 AM EDT #

Hi, anon.

We haven't really spent any time trying to get Hadoop running on Caroline (the no-fork provisions would mean that we would have some problems with some of the execs that HDFS wants to do), so we had to run it on EC2.

Posted by Stephen Green on May 06, 2009 at 08:44 AM EDT #

About

This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into recommendation and other technologies that can use a good text search engine. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.
