Monday Nov 04, 2013

Big Data Matters with ODI12c

contributed by Mike Eisterer

On October 17th, 2013, Oracle announced the release of Oracle Data Integrator 12c (ODI12c).  This release signifies improvements to Oracle’s Data Integration portfolio of solutions, particularly Big Data integration.

Why Big Data = Big Business

Organizations are gaining greater insights and actionability through increased storage, processing and analytical benefits offered by Big Data solutions.  New technologies and frameworks like HDFS, NoSQL, Hive and MapReduce support these benefits now. As further data is collected, analytical requirements increase and the complexity of managing transformations and aggregations of data compounds and organizations are in need for scalable Data Integration solutions.

ODI12c provides enterprise solutions for the movement, translation and transformation of information and data heterogeneously and in Big Data Environments through:

  • The ability for existing ODI and SQL developers to leverage new Big Data technologies.
  • A metadata focused approach for cataloging, defining and reusing Big Data technologies, mappings and process executions.
  • Integration between many heterogeneous environments and technologies such as HDFS and Hive.
  • Generation of Hive Query Language.

Working with Big Data using Knowledge Modules

 ODI12c provides developers with the ability to define sources and targets and visually develop mappings to effect the movement and transformation of data.  As the mappings are created, ODI12c leverages a rich library of prebuilt integrations, known as Knowledge Modules (KMs).  These KMs are contextual to the technologies and platforms to be integrated.  Steps and actions needed to manage the data integration are pre-built and configured within the KMs. 

The Oracle Data Integrator Application Adapter for Hadoop provides a series of KMs, specifically designed to integrate with Big Data Technologies.  The Big Data KMs include:

  • Check Knowledge Module
  • Reverse Engineer Knowledge Module
  • Hive Transform Knowledge Module
  • Hive Control Append Knowledge Module
  • File to Hive (LOAD DATA) Knowledge Module
  • File-Hive to Oracle (OLH-OSCH) Knowledge Module 

Nothing to beat an Example:

To demonstrate the use of the KMs which are part of the ODI Application Adapter for Hadoop, a mapping may be defined to move data between files and Hive targets. 

The mapping is defined by dragging the source and target into the mapping, performing the attribute (column) mapping (see Figure 1) and then selecting the KM which will govern the process. 

In this mapping example, movie data is being moved from an HDFS source into a Hive table.  Some of the attributes, such as “CUSTID to custid”, have been mapped over.

Figure 1  Defining the Mapping

Before the proper KM can be assigned to define the technology for the mapping, it needs to be added to the ODI project.  The Big Data KMs have been made available to the project through the KM import process.   Generally, this is done prior to defining the mapping.


Figure 2  Importing the Big Data Knowledge Modules

Following the import, the KMs are available in the Designer Navigator.

Figure 3  The Project View in Designer, Showing Installed IKMs

Once the KM is imported, it may be assigned to the mapping target.  This is done by selecting the Physical View of the mapping and examining the Properties of the Target.  In this case MOVIAPP_LOG_STAGE is the target of our mapping.


Figure 4  Physical View of the Mapping and Assigning the Big Data Knowledge Module to the Target

Alternative KMs may have been selected as well, providing flexibility and abstracting the logical mapping from the physical implementation.  Our mapping may be applied to other technologies as well.

The mapping is now complete and is ready to run.  We will see more in a future blog about running a mapping to load Hive.

To complete the quick ODI for Big Data Overview, let us take a closer look at what the IKM File to Hive is doing for us.  ODI provides differentiated capabilities by defining the process and steps which normally would have to be manually developed, tested and implemented into the KM.  As shown in figure 5, the KM is preparing the Hive session, managing the Hive tables, performing the initial load from HDFS and then performing the insert into Hive.  HDFS and Hive options are selected graphically, as shown in the properties in Figure 4.


Figure 5  Process and Steps Managed by the KM

What’s Next

Big Data being the shape shifting business challenge it is is fast evolving into the deciding factor between market leaders and others.

Now that an introduction to ODI and Big Data has been provided, look for additional blogs coming soon using the Knowledge Modules which make up the Oracle Data Integrator Application Adapter for Hadoop:

  • Importing Big Data Metadata into ODI, Testing Data Stores and Loading Hive Targets
  • Generating Transformations using Hive Query language
  • Loading Oracle from Hadoop Sources

For more information now, please visit the Oracle Data Integrator Application Adapter for Hadoop web site, http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html

Do not forget to tune in to the ODI12c Executive Launch webcast on the 12th to hear more about ODI12c and GG12c.

Sunday Oct 27, 2013

ODI 12c's Mapping Designer - Combining Flow Based and Expression Based Mapping

post by David Allan

ODI is renowned for its declarative designer and minimal expression based paradigm. The new ODI 12c release has extended this even further to provide an extended declarative mapping designer. The ODI 12c mapper is a fusion of ODI's new declarative designer with the familiar flow based designer while retaining ODI’s key differentiators of:

  • Minimal expression based definition,
  • The ability to incrementally design an interface and to extract/load data from any combination of sources, and most importantly
  • Backed by ODI’s extensible knowledge module framework.

The declarative nature of the product has been extended to include an extensible library of common components that can be used to easily build simple to complex data integration solutions. Big usability improvements through consistent interactions of components and concepts all constructed around the familiar knowledge module framework provide the utmost flexibility.

Here is a little taster:

So what is a mapping? A mapping comprises of a logical design and at least one physical design, it may have many. A mapping can have many targets, of any technology and can be arbitrarily complex. You can build reusable mappings and use them in other mappings or other reusable mappings.

In the example below all of the information from an Oracle bonus table and a bonus file are joined with an Oracle employees table before being written to a target. Some things that are cool include the one-click expression cross referencing so you can easily see what's used where within the design.

The logical design in a mapping describes what you want to accomplish  (see the animated GIF here illustrating how the above mapping was designed) . The physical design lets you configure how it is to be accomplished. So you could have one logical design that is realized as an initial load in one physical design and as an incremental load in another. In the physical design below we can customize how the mapping is accomplished by picking Knowledge Modules, in ODI 12c you can pick multiple nodes (on logical or physical) and see common properties. This is useful as we can quickly compare property values across objects - below we can see knowledge modules settings on the access points between execution units side by side, in the example one table is retrieved via database links and the other is an external table.


In the logical design I had selected an append mode for the integration type, so by default the IKM on the target will choose the most suitable/default IKM - which in this case is an in-built Oracle Insert IKM (see image below). This supports insert and select hints for the Oracle database (the ANSI SQL Insert IKM does not support these), so by default you will get direct path inserts with Oracle on this statement.