By Alex Kotopoulis-Oracle on Jul 13, 2014
Oracle Data Integrator (ODI) 12.1.3 extends its Hadoop capabilities through a number of exciting new cababilities. The new features include:
- Loading of RDBMS data from and to Hadoop using Sqoop
- Support for Apache HBase databases
- Support for Hive append functionality
The diagram below shows all ODI Hadoop knowledge modules with KMs added in ODI 12.1.3 in red.
Apache Sqoop is designed for efficiently transferring bulk amounts of data between Hadoop and relational databases such as Oracle, MySQL, Teradata, DB2, and others. Sqoop operates by creating multiple parallel map-reduce processes across a Hadoop cluster and connecting to an external database and transfering data from or to Hadoop storage in a partitioned fashion. Data can be stored in Hadoop using HDFS, Hive, or HBase. ODI adds two knowledge modules IKM SQL to Hive- HBase-File (SQOOP) and IKM File-Hive to SQL (SQOOP).
Loading from and to Sqoop in ODI is straightforward. Create a mapping with the database source and hadoop target (or vice versa) and apply any necessary transformation expressions.
In the physical design of the map, make sure to set the LKM of the target to LKM SQL Multi-Connect.GLOBAL and choose a Sqoop IKM, such as IKM SQL to Hive- HBase-File (SQOOP). Change the MapReduce Output Directory IKM property MAPRED_OUTPUT_BASE_DIR to an appropriate HDFS dir. Review all other properties and tune as necessary. Using these simple steps you should be able to perform a quick Sqoop load.
ODI adds support for HBase as a source and target. HBase metadata can be reverse-engineered using the RKM HBase knowledge module, and HBase can be used as source and target of a Hive transformation using LKM HBase to Hive and IKM Hive to HBase. Sqoop KMs also support HBase as a target for loads from a database.
For more information please read the ODI 12.1.3 documentation about HBase.
Hive Append support
Prior to Hive 0.8 there had been no direct way to append data to an existing table. Prior Hive KMs emulated such logic by renaming the existing table and concatenating old and new data into a new table with the prior name. This emulated append operation caused major data movement, particularly when the target table has been large.
Starting with version 0.8 Hive has been enhanced to support appending. All ODI 12.1.3 Hive KMs have been updated to support the append capability by default but provide backward compatibility to the old behavior through the KM property HIVE_COMPATIBLE=0.7.
ODI 12.1.3 provides an optimal and easy-to use way to perform data integration in a Big Data environment. ODI utilizes the processing power of the data storage and processing environment rather than relying on a proprietary transformation engine. This core "ELT" philosophy has its perfect match in a Hadoop environment, where ODI can provide unique value by providing a native and easy-to-use data integration envionment.