When moving data between Hadoop/Hive/HDFS and Oracle Database, a common question asked is, what is the best tool to use?
Broadly, the wide array of products can be categorized as below:
Oracle Shell for Hadoop Loaders (OHSH for short): This tool is a universal command-line UI for the three data movement products described below.
Load into Oracle Database
Oracle Loader for Hadoop: Load from Hive/HDFS (and Kafka) to Oracle Database
Oracle SQL Connector for HDFS: Load text data from Hive/HDFS to Oracle Database using SQL
Copy from Oracle Database
Copy to Hadoop: Copy data from Oracle Database tables to HDFS and access from Hive (and Spark through a Hive context)
In addition to the command-line OHSH these products are available for use through SQL Developer.
Data Access (in place)
Oracle Big Data SQL: Oracle SQL access of data in Hive/HDFS (and Kafka)
Oracle Table Access for Apache Hadoop: Hive SQL access to data in Oracle Database tables
Oracle Data Integrator has KMs for data movement, some of them use the above products.
Load Speed and Performance
The data movement tools are architected to minimize CPU usage by the database system during the load process. So for large loads, in particular when the load jobs co-exist with a database running applications, the data movement tools of Oracle Loader for Hadoop and Copy to Hadoop are ideal. They enable fast data movement with minimal impact on the database. Oracle Loader for Hadoop works very well for continuous loads or frequent loads into a database, as it can load data without impacting other applications. Oracle SQL Connector for HDFS is faster, but it uses more database CPU cycles. When used with Oracle Big Data Appliance and Oracle Exadata connected by InfiniBand, Oracle SQL Connector for HDFS can load into the database at 15 TB an hour.
SQL Developer can be used with all the data movement tools.
ODI KMs are very popular among customers who are already using ODI for data integration. With the same skillset and a familiar product they can use ODI KMs to move data between Hadoop and Oracle Database.
The data access tools enable data movement through SQL, and so are appealing to SQL programmers. Once the external table is created data can be moved using a CTAS (Create Table as Select) statement that reads the external table and writes the data to a local table. However the primary goal of these tools is data access.
Cloud: Oracle Loader for Hadoop and Copy to Hadoop (and the UI Oracle Shell for Hadoop Loaders) are available with Big Data Cloud Service and Big Data Cloud licenses. No additional licenses are required.
On-prem: Oracle Loader for Hadoop and Oracle SQL Connector for HDFS (and the UI Oracle Shell for Hadoop Loaders) are licensed with Oracle Big Data Connectors, along with Oracle Table Access for Apache Hadoop. Copy to Hadoop (and the UI Oracle Shell for Hadoop Loaders) is licensed with Oracle Big Data SQL.