MySQL and Hadoop Integration - Unlocking New Insight
By Mat Keep on Nov 29, 2012
“Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes.
As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL.
The new Guide to MySQL and Hadoop presents the tools enabling integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to analysis and visualisation / decision, as shown in the figure below
The Guide details each of these stages and the technologies supporting them:
Acquire: Through new NoSQL APIs, MySQL is able to ingest high volume, high velocity data, without sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run against newly acquired data, enabling immediate business insight, before data is loaded into Hadoop. In addition, sensitive data can be pre-processed, for example healthcare or financial services records can be anonymized, before transfer to Hadoop.
Organize: Data is transferred from MySQL tables to Hadoop using Apache Sqoop. With the MySQL Binlog (Binary Log) API, users can also invoke real-time change data capture processes to stream updates to HDFS.
Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within the Hadoop platform.
Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop where they inform real-time operational processes or provide source data for BI analytics tools.
So how are companies taking advantage of this today? As an example, on-line retailers can use big data from their web properties to better understand site visitors’ activities, such as paths through the site, pages viewed, and comments posted. This knowledge can be combined with user profiles and purchasing history to gain a better understanding of customers, and the delivery of highly targeted offers.
Of course, it is not just in the web that big data can make a difference. Every business activity can benefit, with other common use cases including:
- Sentiment analysis;
- Marketing campaign analysis;
- Customer churn modeling;
- Fraud detection;
- Research and Development;
- Risk Modeling;
- And more.
As the guide discusses, Big Data is promising a significant transformation of the way organizations leverage data to run their businesses. MySQL can be seamlessly integrated within a Big Data lifecycle, enabling the unification of multi-structured data into common data platforms, taking advantage of all new data sources and yielding more insight than was ever previously imaginable.
Download the guide to MySQL and Hadoop integration to learn more. I'd also be interested in hearing about how you are integrating MySQL with Hadoop today, and your requirements for the future, so please use the comments on this blog to share your insights.