In prior articles, we introduced architectures for streaming transactions from Oracle GoldenGate to HDFS, Hive, and HBase. In this article we extend that list with a solution for streaming transactions into the Apache Flume service.
Apache Flume is a distributed, fault-tolerant, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into HDFS. It can be configured for a variety of delivery topologies, such as fan-out to multiple clusters, or rolling of HDFS files based on time or size.
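To make the rolling behavior concrete, the sketch below is a minimal Flume agent configuration with an Avro RPC source and an HDFS sink. The agent name, channel name, paths, and port are hypothetical; the property keys follow standard Flume-NG conventions, and the exact values you use will depend on your cluster.

```properties
# Hypothetical agent "agent1": Avro source -> memory channel -> HDFS sink
agent1.sources  = avroSrc
agent1.channels = memCh
agent1.sinks    = hdfsSink

# Avro RPC source that receives events from the GoldenGate adapter
agent1.sources.avroSrc.type     = avro
agent1.sources.avroSrc.bind     = 0.0.0.0
agent1.sources.avroSrc.port     = 41414
agent1.sources.avroSrc.channels = memCh

# In-memory channel buffering events between source and sink
agent1.channels.memCh.type     = memory
agent1.channels.memCh.capacity = 10000

# HDFS sink; files roll every 5 minutes or at ~64 MB, whichever comes first
agent1.sinks.hdfsSink.type              = hdfs
agent1.sinks.hdfsSink.channel           = memCh
agent1.sinks.hdfsSink.hdfs.path         = /user/flume/gg_transactions
agent1.sinks.hdfsSink.hdfs.fileType     = DataStream
agent1.sinks.hdfsSink.hdfs.rollInterval = 300
agent1.sinks.hdfsSink.hdfs.rollSize     = 67108864
agent1.sinks.hdfsSink.hdfs.rollCount    = 0
```

Setting rollCount to 0 disables event-count-based rolling, so only the time and size thresholds trigger a new HDFS file.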
As shown in the diagram below, streaming database transactions to Flume is accomplished by developing a custom handler using Oracle GoldenGate's Java API and Flume's Avro RPC APIs.
The custom handler is deployed as an integral part of the Oracle GoldenGate Pump process, which executes the adapter in its own address space. The Pump process and the custom adapter are configured through the Pump parameter file and the adapter's properties file, respectively. The Pump reads the Trail File created by the Oracle GoldenGate Capture process and passes the transactions to the adapter. The adapter then writes the transactions to a Flume Avro RPC source at the host and port defined in the configuration file. The Flume Agent streams the data to the final destination; in the supplied example, Flume writes to an HDFS directory for subsequent consumption by Hive.
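As a rough illustration of the wiring described above, a Pump parameter file for the Java adapter typically loads the GoldenGate Java library and points it at a properties file. The group name, schema, class name, and file paths below are hypothetical placeholders; the actual names and property keys are defined by the sample implementation and the GoldenGate Java Adapter version in use.

```
EXTRACT flumepmp
-- Hypothetical Pump that hands trail records to the Java adapter
TARGETDB LIBFILE libggjava.so SET property=dirprm/flume.props
TABLE MYSCHEMA.*;
```

The referenced properties file would then register the custom handler and supply the Avro RPC endpoint of the Flume agent, along the lines of:

```properties
# Hypothetical adapter properties; key names follow Java Adapter conventions
gg.handlerlist=flumehandler
gg.handler.flumehandler.type=com.example.gg.FlumeHandler
gg.handler.flumehandler.host=flume-host.example.com
gg.handler.flumehandler.port=41414
```

The host and port here must match the bind address and port of the Avro source configured in the Flume agent.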
A sample implementation of the Flume Adapter for Oracle GoldenGate is provided at the My Oracle Support site as Knowledge Base article 1926867.1.