The Oldest Big Data Problem: Parsing Human Language
By dan.mcclary on Oct 15, 2012
There's a new whitepaper up on Oracle Technology Network which details the use of Digital Reasoning Systems' Synthesys software on Oracle Big Data Appliance. Digital Reasoning's approach is inherently "big data friendly," as it leverages multiple components of the Hadoop ecosystem. Moreover, the paper addresses the oldest big data problem of them all: extracting knowledge from human text.
From the Executive Summary:
There is a wealth of information to be extracted from natural language, but that extraction is challenging. The volume of human language we generate constitutes a natural Big Data problem, while its complexity and nuance require a particular expertise to model and mine. In this paper we illustrate the impressive combination of Oracle Big Data Appliance and Digital Reasoning Synthesys software. The combination of Synthesys and Big Data Appliance makes it possible to analyze tens of millions of documents in a matter of hours. Moreover, this powerful combination achieves four times the throughput of the equivalent analysis conducted on a much larger cloud-deployed Hadoop cluster.