By Alex Kotopoulis-Oracle on Jan 29, 2014
Update 4/14/2015: This blog is outdated; instructions for the current Big Data Lite 4.1 are located here.
Oracle's big data team has just announced the Oracle BigDataLite Virtual Machine, a pre-built environment that gets you started with the core software of Oracle's Big Data Appliance 2.4. BigDataLite is a VirtualBox VM that contains a fully configured Cloudera Hadoop distribution CDH 4.5, an Oracle Database 12c, Oracle's Big Data Connectors, Oracle Data Integrator 12.1.2, and other software.
You can use this environment to see ODI 12c in action integrating big data with Oracle Database using ODI's declarative graphical design, efficient E-LT loads, and Knowledge Modules designed to optimize big data integration.
The sample data contained in BigDataLite represents the fictional Oracle MoviePlex online movie streaming company. The ODI sample performs the following two steps:
- Pre-process application logs within Hadoop: All user activity on the MoviePlex web site is gathered on HDFS in Avro format. ODI reads these logs through Hive and processes activities by aggregating, filtering, joining, and unioning the records in an ODI flow-based mapping. All processing is performed inside Hive MapReduce jobs controlled by ODI, and the resulting data is stored in a staging table within Hive.
- Loading user activity data from Hadoop into Oracle: The pre-processed data is loaded from Hadoop into an Oracle 12c database, where it can serve as the basis for Business Intelligence reports. ODI uses the Oracle Loader for Hadoop (OLH) connector, which executes distributed MapReduce processes to load data in parallel from Hadoop into Oracle. ODI transparently configures and invokes this connector through the Hive-to-Oracle Knowledge Module.
Both steps are orchestrated and executed through an ODI Package workflow.
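To make the first step more concrete, here is a minimal sketch in plain Python of the same union/filter/aggregate/join pattern that the ODI mapping pushes down into Hive MapReduce jobs. The record layout, field names, and sample values are hypothetical illustrations, not the actual MoviePlex schema:

```python
from collections import defaultdict

# Hypothetical activity records, standing in for Avro log rows on HDFS.
web_logs = [
    {"user_id": 1, "movie_id": 10, "activity": "play"},
    {"user_id": 1, "movie_id": 10, "activity": "play"},
    {"user_id": 2, "movie_id": 11, "activity": "browse"},
]
mobile_logs = [
    {"user_id": 2, "movie_id": 11, "activity": "play"},
]

# Union the two sources, then filter to the activity of interest.
plays = [r for r in web_logs + mobile_logs if r["activity"] == "play"]

# Aggregate: count plays per (user, movie), as a GROUP BY would in Hive.
counts = defaultdict(int)
for r in plays:
    counts[(r["user_id"], r["movie_id"])] += 1

# Join against a (hypothetical) movie dimension to build the staging rows
# that would land in the Hive staging table.
movies = {10: "The Matrix", 11: "Up"}
staging = [
    {"user_id": u, "movie": movies[m], "play_count": c}
    for (u, m), c in counts.items()
]
```

In the actual demo none of this runs in Python: ODI generates the equivalent HiveQL from the flow-based mapping, so the set operations execute as MapReduce jobs inside the Hadoop cluster rather than on the client.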
Please follow these steps to execute the ODI demo in BigDataLite:
- Download and install BigDataLite. Please follow the instructions in the Deployment Guide at the download page.
- Start the VM and log in as user oracle, password welcome1.
- Start the Oracle Database 12c by double-clicking the icon on the desktop.
- Start ODI 12.1.2 by clicking the icon on the toolbar.
- Press Connect To Repository... on the ODI Studio window.
- Press OK in the ODI Login dialog.
- Switch to the Designer tab, open the Projects accordion and expand the projects tree to Movie > First Folder > Mappings. Double-click on the mapping Transform Hive Avro to Hive Staging.
- Review the mapping that transforms source Avro data by aggregating, joining, and unioning data within Hive. You can also review the mapping Load Hive Staging to Oracle the same way.
- In the Projects accordion expand the projects tree to Movie > First Folder > Packages. Double-click on the package Process Movie Data.
- The Package workflow for Process Movie Data opens. You can review the package.
- Press the Run icon on the toolbar. Press OK for the Run and Information: Session started dialogs.
- You can follow the progress of the load by switching to the Operator tab and expanding All Executions and the topmost Process Movie Data entry. You can refresh the display by pressing the refresh button or setting Auto-Refresh.
- Depending on the environment, the load can take 5-15 minutes. When the load is complete, the execution will show all green checkmarks. You can traverse the operator log and double-click entries to explore statistics and executed commands.
This demo shows only some of ODI's big data capabilities. You can find more information at:
- ODI Application Adapter for Hadoop
- White paper: Bridging Two Worlds: Big Data and Enterprise Data
- Oracle Learning Library: ODI Application Adapter for Hadoop Tutorial