Monday Apr 22, 2013

Announcing the MySQL Applier for Apache Hadoop

Enabling Real-Time MySQL to HDFS Integration

Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure grows to gain competitive advantage from “speed of thought” analytics, Hadoop itself is undergoing significant evolution. Technologies enabling real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by a new generation of resource management in Apache YARN.

To support this growing emphasis on real-time operations, we are releasing a new MySQL Applier for Hadoop to enable the replication of events from MySQL to Hadoop / Hive / HDFS (Hadoop Distributed File System) as they happen.  The Applier complements existing batch-based Apache Sqoop connectivity.

Replication via the MySQL Applier for Hadoop works by connecting to the MySQL master, reading binary log events as soon as they are committed, and writing them into a file in HDFS.


The Applier for Hadoop uses an API provided by libhdfs, a C library for manipulating files in HDFS. The library comes precompiled with Hadoop distributions.

It connects to the MySQL master to read the binary log and then:

  • Fetches the row insert events occurring on the master
  • Decodes these events, extracts the data inserted into each field of the row, and uses content handlers to transform it into the required format
  • Appends it to a text file in HDFS

Databases are mapped as separate directories, with their tables mapped as sub-directories within a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive / HDFS. Data can be written in comma-separated format, or in any other format configurable via command-line arguments.
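As an illustration of the HDFS side of this write path, here is a minimal sketch in C using libhdfs calls (hdfsConnect, hdfsOpenFile, hdfsWrite). The NameNode address, the warehouse path and the row data are assumptions for the example rather than the Applier's own code, and append mode requires a Hadoop build with append support enabled.

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>    /* O_WRONLY, O_APPEND */
    #include "hdfs.h"     /* libhdfs header shipped with Hadoop */

    int main(void) {
        /* Connect to HDFS; "default" picks up the NameNode from the Hadoop config. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (fs == NULL) {
            fprintf(stderr, "failed to connect to HDFS\n");
            return 1;
        }

        /* Illustrative path: <warehouse>/<database>.db/<table>/datafile1.txt */
        const char *path = "/user/hive/warehouse/db1.db/t1/datafile1.txt";
        hdfsFile file = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
        if (file == NULL) {
            fprintf(stderr, "failed to open %s for append\n", path);
            hdfsDisconnect(fs);
            return 1;
        }

        /* One decoded row-insert event, rendered as a comma-separated line. */
        const char *row = "101,John,2013-04-22\n";
        hdfsWrite(fs, file, (void *)row, (tSize)strlen(row));
        hdfsFlush(fs, file);

        hdfsCloseFile(fs, file);
        hdfsDisconnect(fs);
        return 0;
    }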


You can learn more about the design of the MySQL Applier for Hadoop from this blog.

The installation, configuration and implementation are discussed in detail in this Applier for Hadoop blog. Integration with Hive is also documented.

You can also see it in action in this MySQL Hadoop Applier Video Tutorial.

With the growth in big data projects and Hadoop adoption, it would be great to get your feedback on how we can further develop the Applier to meet your real-time integration needs. Please use the comments section to let the MySQL team know your priorities.

Friday Mar 08, 2013

MySQL Web Reference Architectures - Your Guide to Innovating on the Web

MySQL is deployed in 9 of the top 10 most-trafficked sites on the web, including Facebook, Twitter, eBay and YouTube, as well as in some of the fastest-growing services, such as Tumblr, Pinterest and box.com.

Working with these companies has given MySQL developers, consultants and support engineers unique insight into how to design database-driven web architectures – whether deployed on-premise or in the cloud.

The MySQL Web Reference Architectures are a set of documented and repeatable best practices for building infrastructure that delivers the highest levels of scalability, agility and availability with the lowest levels of cost, risk and complexity.

Four components common to most web and mobile properties are sized, with optimum deployment architectures for each:

- User authentication and session management

- Content management

- Ecommerce

- Analytics and big data integration

The sizing is defined by database size and load, as shown below.

For each reference architecture, strategies for scaling the service and ensuring high availability are discussed, along with approaches to secure, audit and backup user data, and tools to monitor and manage the environment.

The Reference Architectures cover the core underlying technologies supporting today’s most successful web services including:

- MySQL Database

- MySQL Cluster

- MySQL Replication

- Caching with Memcached and Redis

- Big Data with Hadoop

- NoSQL APIs

- Geographic Redundancy

- Hardware Recommendations

- Operational Best Practices

An example of the "Large" reference architecture is shown below.

To learn more:

- Download the MySQL Web Reference Architectures Guide

- View the MySQL Web Reference Architectures slides

The Reference Architectures are designed as a starting point which we hope will enable you to build the next web and mobile phenomenon!

Tuesday Jan 15, 2013

Upcoming MySQL Events

Oracle's MySQL team is running or participating in a number of events over the upcoming weeks and months. Don't miss this chance to learn about the latest developments straight from the source and to get all your questions answered!

Additional events will likely be scheduled down the road and posted on our events page, but you can already register for the following ones:

- MySQL Tech Tour: Big Data and High Availability with MySQL – Pleasanton, California (January 22)

- MySQL Tech Tour: Big Data and High Availability with MySQL – Belmont, California (January 23)

- FOSDEM – Brussels, Belgium (February 2-3)

- MySQL Tech Tour: From the Web to the Cloud – Istanbul, Turkey (February 5)

- Oracle & Zend LAMP Seminar – Milan, Italy (February 15)

- MySQL Tech Tour for Software & Hardware Vendors – Arie Petach Tikva, Israel (February 19)

- MySQL Tech Tour: From the Web to the Cloud – Oslo, Norway (February 21)

- MySQL Tech Tour: From the Web to the Cloud – Brussels, Belgium (February 21)

- MySQL Tech Tour: From the Web to the Cloud – Stockholm, Sweden (March 20)

- MySQL Tech Tour: From the Web to the Cloud – Munich, Germany (April 17)

- MySQL Tech Tour: From the Web to the Cloud – Baden, Switzerland (April 24)

We hope to see many of you at those events!

Thursday Nov 29, 2012

MySQL and Hadoop Integration - Unlocking New Insight

“Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes.

As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL.

The new Guide to MySQL and Hadoop presents the tools that enable integration between the two data platforms, supporting the data lifecycle from acquisition and organisation to analysis and visualisation / decision, as shown in the figure below.


The Guide details each of these stages and the technologies supporting them:

Acquire: Through new NoSQL APIs, MySQL is able to ingest high-volume, high-velocity data without sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run against newly acquired data, enabling immediate business insight before the data is loaded into Hadoop. In addition, sensitive data can be pre-processed before transfer to Hadoop; for example, healthcare or financial services records can be anonymized.
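As a hedged illustration of the Acquire stage, the sketch below uses the standard libmemcached client to write a key/value pair over the memcached protocol, which is how the InnoDB memcached plugin in MySQL 5.6 exposes its NoSQL API. The host, port, key naming and value layout are assumptions for the example and depend entirely on how the plugin's containers are configured.

    #include <stdio.h>
    #include <string.h>
    #include <libmemcached/memcached.h>

    int main(void) {
        /* Point the client at the InnoDB memcached plugin
           (host/port are assumptions; 11211 is the memcached default). */
        memcached_st *memc = memcached_create(NULL);
        memcached_server_add(memc, "127.0.0.1", 11211);

        /* Write one high-velocity event; the key and value layout are
           illustrative and map to an InnoDB table via the plugin's config. */
        const char *key = "user:42:last_click";
        const char *value = "2012-11-29T10:15:00,/products/widget";
        memcached_return_t rc = memcached_set(memc, key, strlen(key),
                                              value, strlen(value), 0, 0);
        if (rc != MEMCACHED_SUCCESS)
            fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

        memcached_free(memc);
        return 0;
    }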

Organize: Data is transferred from MySQL tables to Hadoop using Apache Sqoop. With the MySQL Binlog (Binary Log) API, users can also invoke real-time change data capture processes to stream updates to HDFS.

Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within the Hadoop platform.

Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop where they inform real-time operational processes or provide source data for BI analytics tools.
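To make the Decide stage concrete, here is a minimal sketch using the MySQL C API (libmysqlclient) to read results that have been exported back into MySQL. The connection parameters and the churn_scores table, with its customer_id and churn_score columns, are hypothetical names used purely for illustration.

    #include <stdio.h>
    #include <mysql.h>

    int main(void) {
        MYSQL *conn = mysql_init(NULL);
        if (conn == NULL) {
            fprintf(stderr, "mysql_init failed\n");
            return 1;
        }

        /* Hypothetical connection details for the operational MySQL instance. */
        if (mysql_real_connect(conn, "localhost", "app_user", "app_password",
                               "analytics", 0, NULL, 0) == NULL) {
            fprintf(stderr, "connect error: %s\n", mysql_error(conn));
            mysql_close(conn);
            return 1;
        }

        /* Read back scores produced in Hadoop and loaded via Sqoop
           (table and column names are illustrative). */
        if (mysql_query(conn, "SELECT customer_id, churn_score FROM churn_scores "
                              "ORDER BY churn_score DESC LIMIT 10") != 0) {
            fprintf(stderr, "query error: %s\n", mysql_error(conn));
            mysql_close(conn);
            return 1;
        }

        MYSQL_RES *res = mysql_store_result(conn);
        MYSQL_ROW row;
        while ((row = mysql_fetch_row(res)) != NULL)
            printf("%s\t%s\n", row[0], row[1]);

        mysql_free_result(res);
        mysql_close(conn);
        return 0;
    }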

So how are companies taking advantage of this today? As an example, on-line retailers can use big data from their web properties to better understand site visitors’ activities, such as paths through the site, pages viewed, and comments posted. This knowledge can be combined with user profiles and purchasing history to gain a better understanding of customers and to deliver highly targeted offers.

Of course, it is not just in the web that big data can make a difference. Every business activity can benefit, with other common use cases including:

- Sentiment analysis;

- Marketing campaign analysis;

- Customer churn modeling;

- Fraud detection;

- Research and development;

- Risk modeling;

- And more.

As the guide discusses, Big Data is promising a significant transformation of the way organizations leverage data to run their businesses. MySQL can be seamlessly integrated within a Big Data lifecycle, enabling the unification of multi-structured data into common data platforms, taking advantage of all new data sources and yielding more insight than was ever previously imaginable.

Download the guide to MySQL and Hadoop integration to learn more. I'd also be interested in hearing about how you are integrating MySQL with Hadoop today, and your requirements for the future, so please use the comments on this blog to share your insights.



