X

Move data between Oracle Database and Apache Hadoop using high speed connectors.

  • February 1, 2018

Which Data Movement Tool Should I Use?

When moving data between Hadoop/Hive/HDFS and Oracle Database, a common question asked is, what is the best tool to use?

Broadly, the wide array of products can be categorized as below:

Data Movement

 

Oracle Shell for Hadoop Loaders (OHSH for short): This tool is a universal command-line UI for the three data movement products described below.

 

Load into Oracle Database

Oracle Loader for Hadoop: Load from Hive/HDFS (and Kafka) to Oracle Database

Oracle SQL Connector for HDFS: Load text data from Hive/HDFS to Oracle Database using SQL

 

Copy from Oracle Database

Copy to Hadoop:  Copy data from Oracle Database tables to HDFS and access from Hive (and Spark through a Hive context)

 

In addition to the command-line OHSH these products are available for use through SQL Developer.

 

Data Access (in place)

 

Oracle Big Data SQL: Oracle SQL access of data in Hive/HDFS (and Kafka)

 

Oracle Table Access for Apache Hadoop: Hive SQL access to data in Oracle Database tables

 

 

Oracle Data Integrator has KMs for data movement, some of them use the above products.

 

Load Speed and Performance

The data movement tools are architected to minimize CPU usage by the database system during the load process.  So for large loads, in particular when the load jobs co-exist with a database running applications, the data movement tools of Oracle Loader for Hadoop and Copy to Hadoop are ideal.   They enable fast data movement with minimal impact on the database.  Oracle Loader for Hadoop works very well for continuous loads or frequent loads into a database, as it can load data without impacting other applications.   Oracle SQL Connector for HDFS is faster, but it uses more database CPU cycles.      When used with Oracle Big Data Appliance and Oracle Exadata connected by InfiniBand, Oracle SQL Connector for HDFS can load into the database at 15 TB an hour. 

UI/Skill Sets

SQL Developer can be used with all the data movement tools.

ODI KMs are very popular among customers who are already using ODI for data integration.   With the same skillset and a familiar product they can use ODI KMs to move data between Hadoop and Oracle Database. 

The data access tools enable data movement through SQL, and so are appealing to SQL programmers.  Once the external table is created data can be moved using a CTAS (Create Table as Select) statement that reads the external table and writes the data to a local table.    However the primary goal of these tools is data access.

Licensing

Cloud: Oracle Loader for Hadoop and Copy to Hadoop (and the UI Oracle Shell for Hadoop Loaders) are available with Big Data Cloud Service and Big Data Cloud licenses.   No additional licenses are required.

On-prem: Oracle Loader for Hadoop and Oracle SQL Connector for HDFS (and the UI Oracle Shell for Hadoop Loaders) are licensed with Oracle Big Data Connectors, along with Oracle Table Access for Apache Hadoop.     Copy to Hadoop (and the UI Oracle Shell for Hadoop Loaders) is licensed with Oracle Big Data SQL.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.