
Big Data Matters with ODI12c

contributed by Mike Eisterer

On October 17th, 2013, Oracle announced the release of Oracle Data Integrator 12c (ODI12c). This release brings significant improvements to Oracle's Data Integration portfolio of solutions, particularly for Big Data integration.

Why Big Data = Big Business

Organizations are gaining greater insight and actionability from the storage, processing, and analytical capabilities offered by Big Data solutions. Technologies and frameworks such as HDFS, NoSQL, Hive, and MapReduce make these capabilities practical today. As more data is collected, analytical requirements grow, the complexity of managing data transformations and aggregations compounds, and organizations need scalable Data Integration solutions.

ODI12c provides enterprise solutions for the movement, translation, and transformation of data across heterogeneous and Big Data environments through:

  • The ability for existing ODI and SQL developers to
    leverage new Big Data technologies.
  • A metadata-focused approach for cataloging, defining, and
    reusing Big Data technologies, mappings, and process executions.
  • Integration between many heterogeneous environments and technologies
    such as HDFS and Hive.
  • Generation of Hive Query Language.
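
As a rough illustration of the kind of Hive Query Language ODI can generate from a mapping, a simple aggregation might come out looking like the following. The table and column names here are hypothetical, chosen only for illustration, and not taken from the product:

```sql
-- Hypothetical sketch of generated HiveQL: a simple
-- transformation from a source table into a target table.
INSERT INTO TABLE sales_summary
SELECT region,
       SUM(amount) AS total_amount
FROM   sales_raw
GROUP  BY region;
```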

Working with Big Data using Knowledge Modules

ODI12c provides developers with the ability to define sources and targets and to visually develop mappings that effect the movement and transformation of data. As mappings are created, ODI12c leverages a rich library of prebuilt integrations, known as Knowledge Modules (KMs). These KMs are contextual to the technologies and platforms being integrated. The steps and actions needed to manage the data integration are pre-built and configured within the KMs.

The Oracle Data Integrator Application Adapter for Hadoop provides a series of KMs specifically designed to integrate with Big Data technologies. The Big Data KMs include:

  • Check Knowledge Module
  • Reverse Engineer Knowledge Module
  • Hive Transform Knowledge Module
  • Hive Control Append Knowledge Module
  • File to Hive (LOAD DATA) Knowledge Module
  • File-Hive to Oracle (OLH-OSCH) Knowledge Module 

Nothing Beats an Example

To demonstrate the use of the KMs that are part of the ODI Application Adapter for Hadoop, a mapping may be defined to move data between files and Hive.

The mapping is defined by dragging the source and target into the mapping, performing the attribute (column) mapping (see Figure 1), and then selecting the KM that will govern the process.

In this
mapping example, movie data is being moved from an HDFS source into a Hive
table.  Some of the attributes, such as
“CUSTID to custid”, have been mapped over.
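
Done by hand, this movement might look roughly like the HiveQL below. Only MOVIAPP_LOG_STAGE and the CUSTID/custid attribute pair come from this example; the HDFS path, file format, and remaining columns are assumptions for illustration:

```sql
-- Hand-written approximation of the mapping (path, format, and
-- columns other than custid are illustrative assumptions).
CREATE EXTERNAL TABLE movie_src (
  CUSTID  INT,
  MOVIEID INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/odi/movie_data';

INSERT INTO TABLE moviapp_log_stage
SELECT CUSTID  AS custid,   -- the "CUSTID to custid" attribute mapping
       MOVIEID AS movieid
FROM   movie_src;
```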

Figure 1  Defining the Mapping

Before the proper KM can be assigned to the mapping, it needs to be added to the ODI project. The Big Data KMs are made available to the project through the KM import process; generally, this is done prior to defining the mapping.

Figure 2 
Importing the Big Data Knowledge Modules

After the import, the KMs are available in the Designer Navigator.

Figure 3 
The Project View in Designer, Showing Installed IKMs

Once the KM is imported, it may be assigned to the mapping target. This is done by selecting the Physical View of the mapping and examining the Properties of the target. In this case, MOVIAPP_LOG_STAGE is the target of our mapping.

Figure 4 
Physical View of the Mapping and Assigning the Big Data Knowledge Module
to the Target

Other KMs could have been selected as well, providing flexibility and abstracting the logical mapping from the physical implementation. Our mapping may be applied to other technologies as well.

The mapping
is now complete and is ready to run.  We
will see more in a future blog about running a mapping to load Hive.

To complete this quick ODI for Big Data overview, let us take a closer look at what the IKM File to Hive is doing for us. ODI provides differentiated capabilities by pre-building into the KM the process and steps that would normally have to be manually developed, tested, and implemented. As shown in Figure 5, the KM prepares the Hive session, manages the Hive tables, performs the initial load from HDFS, and then performs the insert into Hive. HDFS and Hive options are selected graphically, as shown in the properties in Figure 4.

Figure 5 
Process and Steps Managed by the KM
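
The four steps in Figure 5 correspond, roughly, to HiveQL of the following shape. The session setting, path, and table definitions are illustrative assumptions, not the KM's actual generated code:

```sql
-- 1. Prepare the Hive session (illustrative setting).
SET hive.exec.dynamic.partition = true;

-- 2. Manage the Hive tables: create a staging table if needed.
CREATE TABLE IF NOT EXISTS moviapp_log_stage_tmp (
  custid   INT,
  movieid  INT,
  activity STRING
);

-- 3. Perform the initial load from HDFS into the staging table.
LOAD DATA INPATH '/user/odi/movie_data'
OVERWRITE INTO TABLE moviapp_log_stage_tmp;

-- 4. Perform the insert from staging into the target Hive table.
INSERT INTO TABLE moviapp_log_stage
SELECT custid, movieid, activity
FROM   moviapp_log_stage_tmp;
```

The value of the KM is that these steps do not have to be written, sequenced, or error-handled by hand; they are configured through the options shown in the Properties panel.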

What’s Next

Big Data, the shape-shifting business challenge that it is, is fast evolving into the deciding factor between market leaders and the rest.

Now that an introduction to ODI and Big Data has been provided, look for additional blogs coming soon that use the Knowledge Modules making up the Oracle Data Integrator Application Adapter for Hadoop:

  • Importing Big Data
    Metadata into ODI, Testing Data Stores and Loading Hive Targets
  • Generating Transformations
    using Hive Query language
  • Loading Oracle from
    Hadoop Sources

For more
information now, please visit the Oracle Data Integrator Application Adapter
for Hadoop web site, http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html

Do not forget to tune in to the ODI12c Executive Launch webcast on the 12th to hear more about ODI12c and GG12c.
