Wednesday Apr 10, 2013

Accessing a Hive Table from Oracle Database

Hive is a popular tool to query data in HDFS.  Oracle SQL Connector for HDFS can read data directly from a Hive table in version 2.0. Also, the command-line tool of Oracle SQL Connector for HDFS has been simplified greatly. The connector only needs the Hive table and Hive database name, specified as Hadoop configuration properties, to access data from a Hive table.

oracle.hadoop.exttab.sourceType=hive
oracle.hadoop.exttab.hive.tableName
=<hive table name>
oracle.hadoop.exttab.hive.databaseName=<hive database name>

A command-line tool creates an external table in Oracle Database. This external table is queried to access data in the Hive table. Since an external table can be queried like any other table in the database, Oracle SQL functionality is available to query the Hive table.

The name of the external table and database connection information are also specified as Hadoop configuration properties.

hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable
-D oracle.hadoop.exttab.sourceType=hive \
-D oracle.hadoop.exttab.hive.tableName=<hive table name> \
-D oracle.hadoop.exttab.hive.databaseName=<hive database name> \
-D oracle.hadoop.connection.url=<database connection url> \
-D oracle.hadoop.exttab.tableName=<external table name> \
….
-createTable

While creating the external table the Hive table data types are automatically mapped to Oracle data types in the external table. Below are some sample mappings:

Hive Tinyint, Smallint, Int Oracle NUMBER
Hive Float, Double Oracle NUMBER (default precision)
Hive Decimal Oracle NUMBER
Hive Boolean Oracle VARCHAR2(5)
Hive String Oracle VARCHAR2(4000)
Hive Timestamp Oracle TIMESTAMP

Thursday Nov 08, 2012

Welcome!

Welcome to the Oracle Big Data Connectors blog, which will focus on posts related to integrating data on a Hadoop cluster with Oracle Database.

In particular the blog will focus on best practices, usage notes, and performance tips for using Oracle Loader for Hadoop and Oracle Direct Connector for HDFS, which are part of Oracle Big Data Connectors. Oracle Big Data Connectors 1.0 also includes Oracle R Connector for Hadoop and Oracle Data Integrator Application Adapters for Hadoop.

Oracle Loader for Hadoop: Oracle Loader for Hadoop loads data from Hadoop to Oracle Database. It runs as a MapReduce job on Hadoop to partition, sort, and convert the data into an Oracle-ready format, offloading to Hadoop the processing that is typically done using database CPUs. The data is thenloaded to the database by the Oracle Loader for Hadoop job (online load) or written out as Oracle Data Pump files for load and access later (offline load) with Oracle Direct Connector for HDFS.

Oracle Direct Connector for HDFS: Oracle Direct Connector for HDFS is a connector for high speed access of data on HDFS from Oracle Database. With this connector Oracle SQL can be used to directly query data on HDFS. The data can be Oracle Data Pump files generated by Oracle Loader for Hadoop or delimited text files. The connector can also be used to load data into the database using SQL.

About

Oracle Loader for Hadoop and Oracle SQL Connector for HDFS

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today