Speedy and Scalable Analytics
By David Dorf on Mar 18, 2009
Mike Olson, formerly of Sleepycat Software and Oracle, has a new startup called Cloudera. Much as Sleepycat supported the open-source Berkeley DB database, Cloudera will support Hadoop, an open-source framework for distributed storage and data processing. So what is Hadoop, and how is it relevant to retail?
How in the world does Google return such accurate search results and relevant advertising, all within a matter of seconds? The advertisements displayed by Google, Yahoo!, and Facebook are all selected using analytics that run across distributed systems built to scale very well. Google developed a programming model called MapReduce that splits large volumes of data into smaller chunks and processes them in parallel on low-cost servers. By breaking a problem down into smaller parts, Google was able to scale its search and analytics. Hadoop is an open-source implementation of MapReduce, paired with its own distributed file system, written in Java and hosted by the Apache Software Foundation.
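To make the "break it into smaller parts" idea concrete, here is a minimal single-machine sketch of the MapReduce pattern in Python. It is not Hadoop's API or Google's code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are made up for illustration. In a real cluster, each chunk would live on a different server and the map calls would run in parallel.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)

def word_count(chunks):
    pairs = []
    for chunk in chunks:  # each chunk stands in for data held on a separate server
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical data, already split into three small chunks.
chunks = ["red shoes blue shoes", "blue hat", "red hat red scarf"]
counts = word_count(chunks)
print(counts)
```

Because each map call looks at only its own chunk, adding more servers lets the same job handle more data, which is the scaling property described above.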
Cloudera hopes to help industries like retail put Hadoop to good use. Hadoop can quickly analyze customer data, for example, to determine alternative offers and promotions. This very scalable approach on low-cost hardware could significantly benefit e-commerce sites and might eventually be used in stores. One of a retailer's most important assets is its data, and making use of that data has always been a challenge. Hadoop may be one solution.
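As a rough illustration of what analyzing customer data for promotions could look like in the same map-and-reduce style, the Python sketch below totals spending per customer and category, then picks each customer's top category as a candidate for an offer. The purchase records, field layout, and function names are all hypothetical, and a real Hadoop job would be written against Hadoop's own interfaces rather than plain Python.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, product_category, amount_spent)
purchases = [
    ("c1", "shoes", 80.0),
    ("c2", "hats", 25.0),
    ("c1", "hats", 15.0),
    ("c1", "shoes", 40.0),
    ("c2", "hats", 30.0),
    ("c2", "scarves", 20.0),
]

def map_purchase(record):
    """Map step: key each purchase by (customer, category)."""
    customer, category, amount = record
    return (customer, category), amount

def total_spend(records):
    """Reduce step: sum the amounts that share the same key."""
    totals = defaultdict(float)
    for key, amount in map(map_purchase, records):
        totals[key] += amount
    return totals

def top_category(totals):
    """Pick the category each customer spent the most on."""
    best = {}
    for (customer, category), amount in totals.items():
        if customer not in best or amount > best[customer][1]:
            best[customer] = (category, amount)
    return {customer: category for customer, (category, _) in best.items()}

offers = top_category(total_spend(purchases))
print(offers)
```

A retailer could use a result like this to decide which category to promote to each customer; the choice of "highest total spend" as the rule is only one assumption among many possible ones.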
More details in this NYTimes article.