
Move data between Oracle Database and Apache Hadoop using high speed connectors.

  • October 6, 2016

Copy to Hadoop with Oracle Shell for Hadoop Loaders (OHSH)

Sue Mavris
Senior Director

Here are some Copy to Hadoop examples to try on the Oracle Big Data Lite VM. Copy to Hadoop is a feature of Oracle Big Data SQL.

Starting with Big Data SQL 3.1 (available with Oracle Big Data Lite VM 4.7), there are two options when using Copy to Hadoop.

  • directcopy: A Hadoop job copies data from the database table directly to HDFS
  • stage: Data is first written as Oracle Data Pump files on the database file system, and then copied over to HDFS
The default is directcopy. The stage option is useful when there is no direct connection between Oracle Database and the Hadoop cluster.
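Whichever method is used, it can be named explicitly with a using clause; you will see using stage at work at the end of this post. As a sketch (once the resources below are created), an explicit directcopy request would look like the following. The using directcopy form is shown by analogy with using stage and should be redundant, since directcopy is the default.

# Sketch: explicitly request the directcopy method (by analogy with "using stage" below)
ohsh> create hive table hive_moviedemo:movie_sessions_tab from oracle table \
moviedemo:movie_sessions_tab using directcopy;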

# Start up OHSH

prompt> ohsh

Set up some resources 

# Create a hive resource for hive database MOVIEDEMO

ohsh> create hive resource hive_moviedemo \
connectionurl="jdbc:hive2://bigdatalite.localdomain:10000/moviedemo"

# Create a SQL*Plus resource for Oracle Database. You will be prompted for the username and password (moviedemo/welcome1).

ohsh> create sqlplus resource sql  \
connectid="bigdatalite.localdomain:1521/orcl"

# Create a JDBC resource to connect to Oracle Database. You will be prompted for the username and password (moviedemo/welcome1).

ohsh> create oracle jdbc resource moviedemo \
connectid="bigdatalite.localdomain:1521/orcl"
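
# Before moving on, you can confirm that the three resources are registered.
# OHSH's show resources command lists every defined resource (including the
# predefined bash0 resource used later in this post).

ohsh> show resources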

Now you are ready to run some Copy to Hadoop examples.

# Copy the table movie_sessions_tab in Oracle Database schema MOVIEDEMO to Hive schema MOVIEDEMO
ohsh> create hive table hive_moviedemo:movie_sessions_tab from oracle table \
moviedemo:movie_sessions_tab;

# Verify the results of the copy by comparing the number of rows
ohsh> %sql select count(*) from movie_sessions_tab;
ohsh> %hive_moviedemo select count(*) from movie_sessions_tab;

# Append additional rows to the table
ohsh> load hive table hive_moviedemo:movie_sessions_tab from oracle table \
moviedemo:movie_sessions_tab where "(cust_id = 1446522)";

# Verify the results again - you will have four more rows
ohsh> %hive_moviedemo select count(*) from movie_sessions_tab;

Using the stage option

You will need file system directories for staging the Oracle Data Pump files and for log files, along with database directory objects that point to them. You can create them outside of OHSH, or from OHSH itself for convenience. For example:

# Create directories for staging data pump files and log files

ohsh> %bash0 mkdir /home/oracle/src/samples/c2hadoop/dmpfiles
ohsh> %bash0 mkdir /home/oracle/src/samples/c2hadoop/logs

# Create database directory objects for these directories

# Temporary database directory for staging data pump files

ohsh> %sql create or replace directory C2H_DP_DIR as \
'/home/oracle/src/samples/c2hadoop/dmpfiles';

# Database directory for log files

ohsh> %sql create or replace directory C2H_LOG_DIR as \
'/home/oracle/src/samples/c2hadoop/logs';

# Set OHSH defaults to use these database directories

ohsh> set locationdirectory C2H_DP_DIR
ohsh> set defaultdirectory C2H_LOG_DIR

# You can now again copy the table movie_sessions_tab in Oracle Database schema MOVIEDEMO to Hive schema
# MOVIEDEMO, this time copying to the Hive table movie_sessions_tab_stage.
# Note the using stage clause at the end.

ohsh> create hive table hive_moviedemo:movie_sessions_tab_stage from oracle table \
moviedemo:movie_sessions_tab using stage;
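
As with the directcopy example, you can verify the staged copy by comparing row counts; the second query targets the new Hive table movie_sessions_tab_stage.

# Verify the results of the staged copy by comparing the number of rows
ohsh> %sql select count(*) from movie_sessions_tab;
ohsh> %hive_moviedemo select count(*) from movie_sessions_tab_stage;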
