Are you one of the organizations adopting ‘big data systems’ to manage and analyze the class of data typically referred to as big data? If so, you may know that big data can be structured, semi-structured, or unstructured, and that it originates from a variety of different sources. Big data is also characterized by its volume, velocity, and veracity. Because big data solutions promise to help harness the data deluge we face, their adoption is becoming quite pervasive. In this blog post I’d like to discuss how to leverage Oracle GoldenGate’s real-time replication for big data systems.
The term 'big data systems' is an umbrella term for a wide variety of technologies, each used for a specific purpose. Broadly speaking, big data technologies address batch, transactional, and real-time processing requirements, and choosing the appropriate technology depends heavily on the use case being addressed.
While gaining business intelligence from transactional data continues to be a dominant factor in decision making, businesses have realized that gaining intelligence from other forms of data they have been collecting enables them to achieve a more complete view, address additional business objectives, and make better decisions. The following table illustrates some examples of various industry verticals, forms of data, and the objectives businesses attempt to achieve using these other forms of data.
| Forms of data | Business objective |
|---|---|
| Practitioner’s notes, machine statistics | Best practices and reduced hospitalization |
| Weblog, click streams | |
| Weblogs, fraud reports | Fraud detection, risk analysis |
| Smart meter reading, call center data | Real-time and predictive utilization analysis |
When other forms of data are used for analytics, better contextual intelligence is obtained when the analysis is combined with transactional data. Low-latency transactional data in particular brings value to dynamically changing operations that day-old data cannot deliver. In most organizations, the vast majority of application transactional data is captured in relational databases. To ensure an efficient supply of transactional data for big data analytics, the data integration solution should address several requirements:
· Reliable change data capture and delivery mechanism
· Minimal resource consumption when extracting data from the relational data source
· Secure data delivery
· Ability to customize data delivery
· Support for heterogeneous database sources
· Ease of installation, configuration, and maintenance
A solution that can reliably stream database transactions to a desired target lets teams spend their effort on data analysis rather than data acquisition. In addition, a solution that is non-intrusive and has minimal impact on the source database reduces the need for additional resources and changes on that database.
Oracle GoldenGate is a time-tested, proven product for real-time, heterogeneous relational database replication. It addresses the requirements listed above and is widely used by organizations for mission-critical data replication among relational databases. Furthermore, GoldenGate moves transactional data in real time to support timely operational business intelligence needs.
Oracle GoldenGate offers a variety of integration options that facilitate delivering transactions from relational databases into non-relational targets.
Oracle GoldenGate provides pre-built adapters that integrate with flat files and messaging systems. Please refer to the Oracle GoldenGate for Java - Administration Guide and the Oracle GoldenGate for Flat Files - Administration Guide for more information.
Oracle GoldenGate also provides Java APIs and a framework for developing custom integrations with Java-enabled targets. Using this capability, you can develop custom adapters or handlers to address specific requirements. In this blog post I’d like to focus on the Oracle GoldenGate Java APIs for developing custom integrations with big data systems.
As mentioned earlier, 'big data systems' is an umbrella term covering a wide variety of technologies, each used for a specific purpose. Among them, Hadoop and its suite of technologies are widely adopted by organizations for processing big data. The diagram below illustrates a general, high-level architecture for integrating with Hadoop.
You can implement a custom adapter or handler for your big data system using the Oracle GoldenGate Java API. The custom adapter is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump is configured through its parameter file, and the custom adapter through its own properties file. Which properties the custom adapter needs depends on your requirements; you determine and implement them as part of the adapter.
(Figures: Pump parameter file; adapter properties file.)
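As a minimal sketch, assuming a custom adapter that writes delimited records into a Hadoop directory, the adapter could read its settings from its properties file with `java.util.Properties` and validate them at startup. The class name `AdapterConfig` and the property keys (`handler.targetDir`, `handler.delimiter`, `handler.batchSize`) are hypothetical illustrations, not documented Oracle GoldenGate properties; a real adapter defines whatever keys its requirements call for.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical configuration holder for a custom big data adapter.
// The property keys are illustrative only; they are NOT documented GoldenGate keys.
public class AdapterConfig {
    final String targetDir;  // e.g. an HDFS directory the adapter writes into
    final String delimiter;  // field separator for written records
    final int batchSize;     // intended number of records to buffer before flushing

    private AdapterConfig(String targetDir, String delimiter, int batchSize) {
        this.targetDir = targetDir;
        this.delimiter = delimiter;
        this.batchSize = batchSize;
    }

    // Parses adapter properties text, applies defaults, and validates values.
    static AdapterConfig load(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Properties.load declares IOException even for an in-memory reader.
            throw new UncheckedIOException(e);
        }

        String dir = p.getProperty("handler.targetDir");
        if (dir == null || dir.trim().isEmpty()) {
            throw new IllegalArgumentException("handler.targetDir is required");
        }
        String delimiter = p.getProperty("handler.delimiter", ",");
        int batchSize = Integer.parseInt(p.getProperty("handler.batchSize", "100").trim());
        if (batchSize <= 0) {
            throw new IllegalArgumentException("handler.batchSize must be positive");
        }
        return new AdapterConfig(dir.trim(), delimiter, batchSize);
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# hypothetical adapter properties file",
                "handler.targetDir=/user/gg/orders",
                "handler.batchSize=500");
        AdapterConfig cfg = load(text);
        System.out.println(cfg.targetDir + " " + cfg.delimiter + " " + cfg.batchSize);
    }
}
```

Validating required properties when the adapter starts surfaces configuration mistakes before the Pump begins passing it transactions.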
The Pump process executes the adapter within its own address space. The Pump reads the trail file created by the Oracle GoldenGate Capture process and passes the transactions to the adapter, which writes them into Hadoop according to its configuration.
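The hand-off from the Pump to the adapter can be pictured as a stream of begin, operation, and commit events. The sketch below is a minimal illustration under stated assumptions: `TransactionHandler`, `Operation`, `BufferingHandler`, `replay`, the `SALES.ORDERS` table, and the in-memory list standing in for a Hadoop file are all hypothetical, so the example runs without GoldenGate or Hadoop libraries. It loosely mirrors the begin/operation/commit callback style of GoldenGate's Java handlers but does not use that API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the Pump-to-adapter hand-off. None of these types are
// GoldenGate classes; they only model begin/operation/commit callbacks.
public class PumpAdapterSketch {

    // One captured row change, as the Pump might present it after reading the trail file.
    static final class Operation {
        final String table;
        final String opType; // e.g. INSERT, UPDATE, DELETE
        final String rowData;

        Operation(String table, String opType, String rowData) {
            this.table = table;
            this.opType = opType;
            this.rowData = rowData;
        }
    }

    // Hypothetical callback interface that the Pump would drive.
    interface TransactionHandler {
        void transactionBegin(long txId);
        void operationAdded(long txId, Operation op);
        void transactionCommit(long txId);
    }

    // Buffers a transaction's operations and appends them to the target only on commit.
    static final class BufferingHandler implements TransactionHandler {
        private final List<String> target;              // stand-in for a Hadoop file
        private final List<String> pending = new ArrayList<>();

        BufferingHandler(List<String> target) {
            this.target = target;
        }

        @Override public void transactionBegin(long txId) {
            pending.clear();
        }

        @Override public void operationAdded(long txId, Operation op) {
            pending.add(txId + "|" + op.table + "|" + op.opType + "|" + op.rowData);
        }

        @Override public void transactionCommit(long txId) {
            target.addAll(pending);
            pending.clear();
        }
    }

    // Simulates the Pump replaying one committed transaction to the adapter.
    static void replay(long txId, List<Operation> ops, TransactionHandler handler) {
        handler.transactionBegin(txId);
        for (Operation op : ops) {
            handler.operationAdded(txId, op);
        }
        handler.transactionCommit(txId);
    }

    public static void main(String[] args) {
        List<String> hdfsStandIn = new ArrayList<>();
        BufferingHandler handler = new BufferingHandler(hdfsStandIn);
        replay(1001L, Arrays.asList(
                new Operation("SALES.ORDERS", "INSERT", "id=7,amount=42"),
                new Operation("SALES.ORDERS", "UPDATE", "id=7,amount=40")), handler);
        hdfsStandIn.forEach(System.out::println);
    }
}
```

Buffering until commit means that, in this sketch, only complete transactions reach the target; a production adapter would replace the list with an actual HDFS writer and handle batching and failures.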
Enabling big data systems to coexist with relational systems will help organizations better serve customers and improve their decision-making capabilities. Oracle GoldenGate, which has an excellent record of empowering IT across various aspects of data management, provides the capability to integrate with big data systems. In upcoming blog posts, we will discuss in depth how to implement and configure the integration of Oracle GoldenGate with Hadoop technologies.