Streaming relational transactions to Hive
By Ananth R. Tiru on Oct 09, 2013
This post follows the introductory blog post on the topic, 'Stream your transactions into Big Data Systems', and the blog post on 'Streaming relational transactions to HDFS'. Here I will discuss the architecture for streaming relational transactions into Hive.
Referring to the architecture diagram below, integrating a database with Hive is accomplished by developing a custom handler using Oracle GoldenGate's Java API and the Hadoop HDFS APIs.
The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump process and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file. The Pump process executes the adapter in its own address space. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. Based on the configuration, the adapter writes the transactions in the desired format, with the appropriate content, to the file defined by the Hive DDL for the table.
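To make the last step more concrete, here is a minimal sketch of how an adapter might format one row and append it to the data file behind a Hive text table. It is an illustration only, and it is not the sample implementation from My Oracle Support. The class HiveRowWriter and its methods are hypothetical names, the code uses only the Java standard library rather than the Oracle GoldenGate Java API or the Hadoop HDFS APIs, and it writes to a local temporary file instead of HDFS. It assumes the Hive table uses the default text format, where fields are separated by Ctrl-A (\u0001) and NULL is written as \N.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any Oracle GoldenGate or Hadoop API.
class HiveRowWriter {
    // Assumption: the Hive table uses the default text field delimiter, Ctrl-A.
    static final char FIELD_DELIMITER = '\u0001';
    // Assumption: the Hive table uses the default text representation of NULL.
    static final String NULL_VALUE = "\\N";

    // Joins the column values of one row into a single delimited line.
    static String formatRow(List<String> columnValues) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < columnValues.size(); i++) {
            if (i > 0) {
                line.append(FIELD_DELIMITER);
            }
            String value = columnValues.get(i);
            line.append(value == null ? NULL_VALUE : value);
        }
        return line.toString();
    }

    // Appends one formatted row to the data file, creating the file if needed.
    // A real adapter would write to HDFS; a local file stands in for it here.
    static void appendRow(Path dataFile, List<String> columnValues) throws IOException {
        String line = formatRow(columnValues) + "\n";
        Files.write(dataFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path dataFile = Files.createTempFile("hive_table_", ".txt");
        // Made-up rows standing in for transactions read from the Trail File.
        appendRow(dataFile, Arrays.asList("1001", "Alice", "2013-10-09"));
        appendRow(dataFile, Arrays.asList("1002", "Bob", null));
        for (String row : Files.readAllLines(dataFile, StandardCharsets.UTF_8)) {
            // Show the delimiter as '|' so the rows are readable on a console.
            System.out.println(row.replace(FIELD_DELIMITER, '|'));
        }
    }
}
```

In a real adapter, the delimiter, the NULL representation and the target location would have to match whatever the Hive DDL for the table declares, which is why the article describes them as driven by configuration.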
A sample implementation of the Hive adapter is provided on My Oracle Support (Knowledge ID - 1586188.1). This is provided to illustrate the capability and to assist in the adoption of the Oracle GoldenGate Java API in developing custom solutions. The sample implementation illustrates the configuration and the code required for replicating database transactions on an example table to a corresponding Hive table. The instructions for configuring Oracle GoldenGate, compiling and running the sample implementation are also provided.
The sample code and configuration may be extended to develop custom solutions. However, please note that Oracle will not provide support for the code and the configuration illustrated in the knowledge base paper.
It would be great if you could share your use case for leveraging Oracle GoldenGate in your Big Data strategy, along with your feedback on using the custom handler to integrate a relational database with Hive. Please post your comments on this blog or in the Oracle GoldenGate public forum - https://forums.oracle.com/community/developer/english/business_intelligence/system_management_and_integration/goldengate