New Feature in ODI 11.1.1.6: ODI for Big Data
By Julien Testut on Mar 28, 2012
By Ananth Tirupattur
Starting with Oracle Data Integrator 11.1.1.6.0, ODI offers a solution for processing Big Data. This post provides an overview of this feature.
With all the buzz around Big Data, and before getting into the details of ODI for Big Data, I will briefly introduce Big Data and Oracle's solution for Big Data.
So, what is Big Data?
Big data includes:
- structured data (this includes data from relational data stores and XML data stores),
- semi-structured data (this includes data from weblogs),
- unstructured data (this includes data from text blobs and images).
Traditionally, business decisions are based on information gathered from transactional data. For example, transactional data from CRM applications is fed to a decision system for analysis and decision making. Products such as ODI play a key role in enabling decision systems. However, with the emergence of massive amounts of semi-structured and unstructured data, it is important for decision systems to include this data in the analysis to achieve better decision-making capability.
While Big Data offers businesses abundant opportunities to gain a competitive advantage, processing it has challenges. The challenges of processing Big Data include:
- Volume of data
- Velocity of data (the high rate at which data is generated)
- Variety of data
To address these challenges and convert them into opportunities, we need an appropriate framework, a platform and the right set of tools.
Hadoop is an open source framework: a highly scalable, fault-tolerant system for storing and processing large amounts of data. Hadoop provides two key services: distributed and reliable storage, called the Hadoop Distributed File System or HDFS, and a framework for parallel data processing, called Map-Reduce.
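To make the Map-Reduce idea concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step that Hadoop performs across a cluster is simulated here with a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run in parallel on many nodes over data stored in HDFS; the sketch only shows how the two functions fit together.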
Hadoop and its related technologies continue to evolve rapidly, so it is highly recommended to follow information on the web to keep up with the latest developments.
Oracle's vision is to provide a comprehensive solution to address the challenges of Big Data. Oracle is providing the necessary hardware, software and tools for processing Big Data.
Oracle's solution includes:
- Big Data Appliance
- Oracle NoSQL Database
- Cloudera distribution for Hadoop
- Oracle R Enterprise (R is a statistical package which is very popular among data scientists)
- ODI solution for Big Data
- Oracle Loader for Hadoop for loading data from Hadoop to Oracle.
Further details can be found here: http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
ODI Solution for Big Data:
ODI’s goal is to minimize the need to understand the complexity of the Hadoop framework and to simplify the adoption of Big Data processing in an enterprise.
ODI provides the capabilities for an integrated architecture for processing Big Data. This includes the capability to load data into Hadoop, process data in Hadoop and load data from Hadoop into Oracle.
ODI is expanding its support for Big Data by providing the following out-of-the-box Knowledge Modules (KMs):
- IKM File to Hive (LOAD DATA)
Load unstructured data from File (local file system or HDFS) into Hive
- IKM Hive Control Append
Transform and validate structured data on Hive
- IKM Hive Transform
Transform unstructured data on Hive
- IKM File/Hive to Oracle (OLH)
Load processed data in Hive to Oracle
- RKM Hive
Reverse engineer Hive tables to generate models
Using the IKM File to Hive (LOAD DATA) loading KM, you can map files (local and HDFS files) to the corresponding Hive tables. For example, you can map weblog files categorized by date into a corresponding partitioned Hive table schema.
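As a rough illustration of what loading a dated weblog file into a partitioned Hive table involves, the sketch below builds a HiveQL LOAD DATA statement in Python. This is not the code the KM generates; the table name weblogs, the partition column log_date and the file path are hypothetical names chosen for the example.

```python
from datetime import date


def load_data_statement(path: str, table: str, day: date, local: bool = False) -> str:
    """Build a HiveQL LOAD DATA statement targeting one date partition.

    `local=True` reads from the local file system; otherwise the path is
    resolved in HDFS. The partition column name `log_date` is an assumption.
    """
    local_kw = "LOCAL " if local else ""
    return (
        f"LOAD DATA {local_kw}INPATH '{path}' "
        f"INTO TABLE {table} PARTITION (log_date='{day.isoformat()}')"
    )


# Hypothetical example: one weblog file per day, one Hive partition per day.
statement = load_data_statement(
    "/user/odi/weblogs/2012-03-28.log", "weblogs", date(2012, 3, 28)
)
```

Each daily file ends up in its own partition, so later queries that filter on the date only need to read the matching files.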
Using the Hive Control Append KM, you can validate and transform data in Hive. In the example below, two source Hive tables are joined and mapped to a target Hive table.
The Hive Transform KM facilitates processing of semi-structured data in Hive. In the example below, the data from a weblog is processed using a Perl script and mapped to a target Hive table.
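Hive hands rows to an external transform script as lines on standard input and reads tab-separated rows back from standard output. The post's example uses a Perl script; the sketch below is a Python stand-in, not the original script. It assumes each input row is one raw weblog line in Apache Common Log Format, and the chosen output fields (host, timestamp, method, URL, status) are illustrative.

```python
import re
import sys
from typing import Iterable, Iterator, Optional, TextIO, Tuple

# Apache Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)


def parse_line(line: str) -> Optional[Tuple[str, str, str, str, str]]:
    """Return (host, timestamp, method, url, status), or None if unparseable."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, timestamp, method, url, status, _size = match.groups()
    return host, timestamp, method, url, status


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Yield one tab-separated output row per parseable input line."""
    for line in lines:
        fields = parse_line(line.rstrip("\n"))
        if fields is not None:
            yield "\t".join(fields)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Stream rows from stdin to stdout, as a transform script would."""
    for row in transform(stdin):
        stdout.write(row + "\n")
```

Adding a call to main() at the bottom of the file would turn this into a standalone script; lines that do not match the pattern are silently skipped, which may or may not be what a real job wants.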
Using the Oracle Loader for Hadoop (OLH) KM, you can load data from a Hive table or HDFS to a corresponding table in Oracle. OLH is available as a standalone product. ODI extends OLH by generating the configuration and mapping files for OLH based on the configuration provided in the interface and KM options. ODI then invokes OLH when executing the scenario. In the example below, an HDFS file is mapped to a table in Oracle.
Development and Deployment:
The following diagram illustrates the development and deployment of the ODI solution for Big Data.
Using ODI Studio on your development machine, create and develop the ODI solution for processing Big Data by connecting to a MySQL DB or Oracle database on a BDA machine or Hadoop cluster. Schedule the ODI scenarios to be executed on the ODI agent deployed on the BDA machine or Hadoop cluster.
The ODI solution for Big Data provides several new capabilities to facilitate the adoption of Big Data in an enterprise. You can find more information about the Oracle Big Data connectors on OTN.
You can find an overview of all the new features introduced in ODI 11.1.1.6 in the following document: ODI 11.1.1.6 New Features Overview