Monday Apr 22, 2013

Announcing the MySQL Applier for Apache Hadoop

Enabling Real-Time MySQL to HDFS Integration

Batch processing delivered by MapReduce remains central to Apache Hadoop, but as the pressure grows to gain competitive advantage from “speed of thought” analytics, Hadoop itself is undergoing significant evolution. Technologies enabling real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by a new generation of resource management in Apache YARN.

To support this growing emphasis on real-time operations, we are releasing a new MySQL Applier for Hadoop to enable the replication of events from MySQL to Hadoop / Hive / HDFS (Hadoop Distributed File System) as they happen.  The Applier complements existing batch-based Apache Sqoop connectivity.

The MySQL Applier for Hadoop implements replication by connecting to the MySQL master, reading binary log events as soon as they are committed, and writing them to a file in HDFS.


The Applier for Hadoop uses an API provided by libhdfs, a C library to manipulate files in HDFS. The library comes precompiled with Hadoop distributions.

It connects to the MySQL master to read the binary log and then:

  • Fetches the row insert events occurring on the master
  • Decodes each event, extracts the data inserted into each field of the row, and uses content handlers to convert it into the required format
  • Appends the result to a text file in HDFS.
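The fetch / decode / append pipeline above can be sketched in a few lines. This is a minimal, self-contained Python simulation, not the Applier's actual C++ implementation: the event dictionary shape, the handler names, and the in-memory stand-in for the HDFS append are all illustrative assumptions.

```python
import csv
import io

def decode_event(event):
    """Extract the inserted field values from a (simulated) row-insert event."""
    return [event["values"][col] for col in event["columns"]]

def format_row(fields, delimiter=","):
    """Content handler: render decoded fields as one delimited text line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()

def apply_events(events, appender, delimiter=","):
    """Fetch -> decode -> append, mirroring the steps described above."""
    for event in events:
        if event["type"] != "write_rows":  # only row inserts are replicated
            continue
        line = format_row(decode_event(event), delimiter)
        appender(event["database"], event["table"], line)

# Stand-in for the HDFS append (the real Applier writes via libhdfs).
written = {}
def append_to_hdfs(database, table, line):
    written.setdefault((database, table), []).append(line)

events = [
    {"type": "write_rows", "database": "shop", "table": "orders",
     "columns": ["id", "item"], "values": {"id": 1, "item": "book"}},
]
apply_events(events, append_to_hdfs)
print(written[("shop", "orders")][0].strip())  # 1,book
```

The real Applier performs the decode step inside its binary log reader and delegates the final write to libhdfs; the simulation only illustrates the order of operations.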

Databases are mapped as separate directories, with their tables mapped as sub-directories within a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive / HDFS. By default, data is written in comma-separated format; the delimiter is configurable via command-line arguments.
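Assuming the usual Hive warehouse layout, the mapping from a MySQL database and table to the datafile the Applier appends to can be illustrated as follows. The helper function is hypothetical, and the warehouse root shown is Hive's common default, which is configurable in practice.

```python
import posixpath

def datafile_path(database, table, warehouse="/user/hive/warehouse"):
    """Hypothetical helper: database -> directory, table -> sub-directory,
    inserted rows -> datafile1.txt inside that sub-directory."""
    return posixpath.join(warehouse, database, table, "datafile1.txt")

print(datafile_path("shop", "orders"))
# /user/hive/warehouse/shop/orders/datafile1.txt
```

Because the layout matches Hive's expectations, a Hive external table pointed at the table's directory can query the appended data directly.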


You can learn more about the design of the MySQL Applier for Hadoop from this blog post.

The installation, configuration and implementation are discussed in detail in this Applier for Hadoop blog. Integration with Hive is also documented.

You can also see it in action in this MySQL Hadoop Applier Video Tutorial.

With the growth in big data projects and Hadoop adoption, it would be great to get your feedback on how we can further develop the Applier to meet your real-time integration needs. Please use the comments section to let the MySQL team know your priorities.

See the benefits of MySQL Cluster through Oracle Training

If the following items describe what you need in a high-availability solution, then MySQL Cluster is for you:

  • High scale, reads and writes
  • 99.999% availability
  • Real-time
  • SQL and NoSQL
  • Low TCO

And what better way to get started on MySQL Cluster than taking the authentic MySQL Cluster training course.

In this 3-day course, you learn important cluster concepts and get hands-on experience installing, configuring and managing a cluster. Some events already on the schedule for this course include:

 Location                    Date            Delivery Language
 London, England             12 June 2013    English
 Hamburg, Germany            1 July 2013     German
 Munich, Germany             10 June 2013    German
 Budapest, Hungary           19 June 2013    Hungarian
 Warsaw, Poland              10 June 2013    Polish
 Barcelona, Spain            12 August 2013  Spanish
 Madrid, Spain               10 June 2013    Spanish
 Istanbul, Turkey            27 May 2013     Turkish
 Irvine, CA, United States   24 July 2013    English
 Edison, NJ, United States   29 May 2013     English
 Jakarta, Indonesia          5 August 2013   English

For more information on this course or other courses on the authentic MySQL curriculum, go to http://oracle.com/education/mysql.
