Thursday Sep 12, 2013

Stream Relational Transactions into Big Data Systems

Are you one of the organizations adopting ‘big data systems’ to manage and analyze a class of data typically referred to as big data? If so, you may know that big data can be structured, semi-structured, or unstructured, and that it originates from a wide variety of sources. Big data is also commonly characterized by its volume, velocity, and veracity. Because big data solutions promise to help harness the data deluge we face, their adoption is becoming pervasive. In this blog post I’d like to discuss how to leverage Oracle GoldenGate’s real-time replication for big data systems.

The term 'big data systems' is an umbrella term for a wide variety of technologies, each of which is used for a specific purpose. Broadly speaking, big data technologies address batch, transactional, and real-time processing requirements, and which technology is appropriate depends heavily on the use case being addressed.

While gaining business intelligence from transactional data continues to be a dominant factor in decision making, businesses have realized that gaining intelligence from the other forms of data they have been collecting will enable them to achieve a more complete view, address additional business objectives, and make better decisions. The following table lists examples from various industry verticals, the forms of data involved, and the objective the business aims to achieve using those other forms of data.

Industry | Data | Objective
Healthcare | Practitioner’s notes, machine statistics | Best practices and reduced hospitalization
Retail | Weblog, click streams | Micro-segmentation recommendations
Banking | Weblogs, fraud reports | Fraud detection, risk analysis
Utilities | Smart meter reading, call center data | Real-time and predictive utilization analysis

Role of Transactional Data

When other forms of data are used for analytics, combining the analysis with transactional data yields better contextual intelligence. Low-latency transactional data in particular brings value to dynamically changing operations that day-old data cannot deliver. Within organizations, the vast majority of applications' transactional data is captured in relational databases. To ensure an efficient supply of transactional data for big data analytics, the data integration solution should address several requirements:

· Reliable change data capture and delivery mechanism

· Minimal resource consumption when extracting data from the relational data source

· Secure data delivery

· Ability to customize data delivery

· Support for heterogeneous database sources

· Ease of installation, configuration, and maintenance

A solution that reliably streams database transactions to the desired target lets teams spend their effort on data analysis rather than data acquisition. In addition, a non-intrusive solution that minimally impacts the source database reduces the need for additional resources on, and changes to, the source database.
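To make the "reliable change data capture and delivery" requirement more concrete, here is a minimal Java sketch of checkpoint-based delivery: the target remembers the position of the last transaction it fully applied and skips any transaction at or before that position when the stream is replayed after a restart. The classes `SourceTxn` and `CheckpointedSink` and the `long` position are hypothetical illustrations, not part of any Oracle GoldenGate API (GoldenGate maintains its own checkpoints internally).

```java
import java.util.ArrayList;
import java.util.List;

/** A committed source transaction identified by a monotonically increasing position (hypothetical). */
final class SourceTxn {
    final long position;
    final List<String> rows;

    SourceTxn(long position, List<String> rows) {
        this.position = position;
        this.rows = rows;
    }
}

/** Hypothetical target that applies each transaction at most once, even if the stream is replayed. */
final class CheckpointedSink {
    private long checkpoint = -1L;                  // position of the last fully applied transaction
    private final List<String> applied = new ArrayList<>();

    /** Applies the transaction unless it was already applied; returns true if rows were written. */
    boolean apply(SourceTxn txn) {
        if (txn.position <= checkpoint) {
            return false;                           // replayed after a restart: skip it
        }
        applied.addAll(txn.rows);                   // write all rows of the transaction...
        checkpoint = txn.position;                  // ...then advance the checkpoint
        return true;
    }

    long checkpoint() {
        return checkpoint;
    }

    List<String> applied() {
        return applied;
    }
}
```

For example, feeding transactions at positions 1, 2, 2, and 3 applies three transactions and leaves the checkpoint at 3. In a real deployment the rows and the checkpoint would have to be persisted together (or the writes made idempotent); otherwise a crash between the two steps could still produce duplicates.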

Oracle GoldenGate is a time-tested, proven product for real-time, heterogeneous relational database replication. It addresses the requirements listed above and is widely used by organizations for mission-critical data replication among relational databases. Furthermore, GoldenGate moves transactional data in real time to support timely operational business intelligence needs.

Oracle GoldenGate Integration Options for Big Data Analytics

Oracle GoldenGate offers a variety of integration options for delivering transactions from relational databases into non-relational targets.

Oracle GoldenGate provides pre-built adapters that integrate with flat files and messaging systems. Please refer to the Oracle GoldenGate for Java - Administration Guide and the Oracle GoldenGate for Flat Files - Administration Guide for more information.

Oracle GoldenGate also provides Java APIs and a framework for developing custom integrations to Java-enabled targets. Using this capability, custom adapters or handlers can be developed to address specific requirements. In this blog post I’d like to focus on the Oracle GoldenGate Java APIs for developing custom integrations to big data systems.

As mentioned earlier, 'big data systems' covers a wide variety of technologies, each used for a specific purpose. Among them, Hadoop and its suite of technologies are widely adopted by organizations for processing big data. The diagram below illustrates a general high-level architecture for integrating with Hadoop.

[Diagram: high-level architecture for integrating Oracle GoldenGate with Hadoop, showing the custom adapter together with the Pump parameter file and the adapter properties file.]

You can implement a custom adapter or handler for the big data system using Oracle GoldenGate's Java API. The custom adapter is deployed as an integral part of the Oracle GoldenGate Pump process. The Pump and the custom adapter are configured through the Pump parameter file and the custom adapter's properties file, respectively. The properties the custom adapter needs depend on your requirements and will have to be determined and implemented accordingly.
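Because the custom adapter is configured through its properties file, a handler typically reads and validates its settings at startup. The following Java sketch does this with `java.util.Properties`. The property keys (`adapter.hdfs.dir`, `adapter.field.delimiter`) and the `HandlerConfig` class are hypothetical placeholders chosen for illustration; they are not documented Oracle GoldenGate properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical settings for a custom Hadoop handler, read from the adapter's properties file. */
final class HandlerConfig {
    final String targetDir;
    final String delimiter;

    private HandlerConfig(String targetDir, String delimiter) {
        this.targetDir = targetDir;
        this.delimiter = delimiter;
    }

    /** Parses properties text; the key names used here are illustrative assumptions. */
    static HandlerConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // not expected for an in-memory reader
        }
        String dir = props.getProperty("adapter.hdfs.dir");
        if (dir == null || dir.isBlank()) {
            // Fail fast: a required setting is missing.
            throw new IllegalArgumentException("adapter.hdfs.dir is required");
        }
        // Fall back to a comma when no delimiter is configured.
        String delimiter = props.getProperty("adapter.field.delimiter", ",");
        return new HandlerConfig(dir.trim(), delimiter);
    }
}
```

`Properties.load` keeps trailing whitespace in values, hence the `trim()`, and it interprets `\t` as a tab, so `adapter.field.delimiter=\t` yields tab-separated output. Failing fast on a missing required setting, rather than defaulting it, means a misconfigured Pump stops at startup instead of silently writing to an unexpected location.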

The Pump process executes the adapter in its own address space. The Pump reads the trail file created by the Oracle GoldenGate Capture process and passes the transactions to the adapter, which then writes them into Hadoop based on its configuration.
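The flow described above, where the Pump hands each transaction's operations to the adapter and the adapter writes them to Hadoop, can be sketched as a handler that buffers operations until the transaction commits and then emits them as delimited text lines. Everything here is hypothetical: `ChangeOperation`, `BufferingHandler`, and its `begin`/`operation`/`commit` callbacks only stand in for the event callbacks of the Oracle GoldenGate Java API, and the `Consumer<String>` writer stands in for an HDFS output stream. Consult the Oracle GoldenGate for Java documentation for the actual interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** One captured row change (hypothetical model of a trail-file operation). */
final class ChangeOperation {
    final String table;
    final String opType;                            // e.g. "INSERT", "UPDATE", "DELETE"
    final List<String> values;

    ChangeOperation(String table, String opType, List<String> values) {
        this.table = table;
        this.opType = opType;
        this.values = values;
    }
}

/** Hypothetical handler: buffers a transaction's operations and writes them only on commit. */
final class BufferingHandler {
    private final String delimiter;
    private final Consumer<String> writer;          // stand-in for an HDFS file writer
    private final List<ChangeOperation> pending = new ArrayList<>();

    BufferingHandler(String delimiter, Consumer<String> writer) {
        this.delimiter = delimiter;
        this.writer = writer;
    }

    void begin() {
        pending.clear();                            // start a fresh transaction buffer
    }

    void operation(ChangeOperation op) {
        pending.add(op);                            // hold until the transaction commits
    }

    /** Writes one delimited line per buffered operation: table, op type, then column values. */
    void commit() {
        for (ChangeOperation op : pending) {
            List<String> fields = new ArrayList<>();
            fields.add(op.table);
            fields.add(op.opType);
            fields.addAll(op.values);
            writer.accept(String.join(delimiter, fields));
        }
        pending.clear();
    }
}
```

Buffering until commit keeps uncommitted work out of Hadoop. For very large transactions, a real handler might instead stage the operations to a temporary file and rename it on commit, rather than holding everything in memory.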

Enabling big data systems to co-exist with relational systems will help organizations better serve customers and improve their decision-making capabilities. Oracle GoldenGate, which has an excellent record of empowering IT across many aspects of data management, provides the capability to integrate with big data systems. In upcoming blog posts, we will discuss in depth how to implement and configure the integration of Oracle GoldenGate with Hadoop technologies.

Friday Mar 15, 2013

Pervasive Access to Any Data

In my previous blog post, I shared the five key data integration requirements, which can be summarized as: integrating any data from any source, stored on premises or in the cloud, with maximum performance and availability, to achieve 24/7 access to timely and trusted information. Today, I want to focus on the requirement for integrating “any data”.

We all feel the impact of the huge growth in the amount of raw data collected daily, and big data is a popular information technology topic these days. Highly complex, large volumes of data bring opportunities and challenges to IT and business audiences alike. The opportunities, as discussed in McKinsey’s report, are vast, and companies are ready to tap into big data to differentiate and innovate in today’s competitive world.

One of the key challenges of big data is managing unstructured data, which is estimated to make up 80% of enterprise data. Structured and unstructured data must coexist and be used in conjunction with each other to yield maximum insight. This means organizations must collect, organize, and analyze data from sensors, conversations, e-commerce websites, social networks, and many other sources.

Big data also changes the perspective on information management. It changes the question from “How do you look at your data?” to “How do you look at the data that is relevant to you?” This shift in perspective has huge implications for information management best practices and the technologies applied. Data integration technologies now need to support unstructured and semi-structured data, in addition to structured transactional data, in order to provide a complete picture of the enterprise that will drive higher efficiency, productivity, and innovation.

Oracle addresses big data requirements with a complete solution.

In addition to Oracle Big Data Appliance for acquiring and organizing big data, Oracle offers Oracle Big Data Connectors, a software suite that integrates big data across the enterprise and enables an integrated data set for analysis. Oracle Data Integrator offers an Application Adapter for Hadoop, part of the Big Data Connectors, which allows organizations to build Hadoop metadata within Oracle Data Integrator, load data into Hadoop, transform data within Hadoop, and load data directly into Oracle Database using Oracle Loader for Hadoop. Oracle Data Integrator has the capabilities needed to integrate structured, semi-structured, and unstructured data, helping organizations transform any type of data into real value.

If you would like to learn more about how to use Oracle’s data integration offering for your big data initiatives, take a look at our resources on Bridging the Big Data Divide with Oracle Data Integration.

About

Learn the latest trends, use cases, product updates, and customer success examples for Oracle's data integration products, including Oracle Data Integrator, Oracle GoldenGate, and Oracle Enterprise Data Quality.
