
Jeff Taylor's Weblog

Oracle SQL Connector for HDFS on SPARC

This blog describes the steps necessary to configure Oracle SQL Connector for Hadoop Distributed File System (OSCH) to enable Oracle Database running on Oracle Solaris to access and analyze data residing in Cloudera HDFS running on an Oracle Big Data Appliance. The steps have been verified against a Cloudera Distribution including Apache Hadoop running in the Oracle Big Data Lite Virtual Machine. My configuration adventures regarding the Oracle Big Data Lite Virtual Machine and VirtualBox are shown in this previous blog entry.


Although similar steps are expected to work on many similar
hardware and software configurations, it is worth noting that this
document has been tested using the following two configurations:

Tested Configuration 1

RDBMS
  • Oracle SPARC T4-2 Server
  • Oracle Solaris 11.2
  • Oracle Database 12c Enterprise Edition Release 12.1.0.2.0

Connector
  • Oracle SQL Connector for Hadoop Distributed File System v3.1.0

Hadoop
  • Oracle Big Data Appliance
  • Oracle Enterprise Linux 6.4
  • Cloudera Manager (5.1.0)
  • Cloudera Distribution including Apache Hadoop (CDH 5.1.0)



Tested Configuration 2

RDBMS
  • SPARC SuperCluster T4-4
  • Oracle Solaris 11.1
  • Oracle Database 12c Enterprise Edition Release 12.1.0.2.0

Connector
  • Oracle SQL Connector for Hadoop Distributed File System v3.1.0

Hadoop
  • Oracle Big Data Lite Virtual Machine 4.0.1
  • Oracle Enterprise Linux 6.4
  • Cloudera Manager (5.1.2)
  • Cloudera Distribution including Apache Hadoop (CDH 5.1.2)


Part A: Preparing the initial environment

Step 1: Oracle Database 12c deployment


* Follow the typical database deployment guides:

Oracle Database Online Documentation 12c Release 1 (12.1)
  -> Installing and Upgrading
    --> Under "Solaris Installation Guides", follow the guide that suits your needs:
      ---> Grid Infrastructure Installation Guide for Oracle Solaris
      ---> Database Installation Guide for Oracle Solaris
      ---> Database Quick Installation Guide for Oracle Solaris on SPARC (64-Bit)




Step 2: Oracle Big Data deployment

* For the Big Data Appliance, follow the instructions in the Oracle Big Data Documentation.


* For the Oracle Big Data Lite Virtual Machine, follow the Quick Deployment Step-by-step instructions. Also, see my configuration adventures recounted in this previous blog entry.


Part B: Enable the Cloudera HDFS client on Solaris


* It is assumed that both the Oracle database and the Hadoop cluster are up and running independently before you attempt to install the connector.
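
A quick sanity check before starting can save debugging later. This is only a minimal sketch: dbmc1 is the database name from my environment, and bda-node03.example.com is a placeholder for the node running Cloudera Manager.

$ export ORACLE_SID=dbmc1
$ . /usr/local/bin/oraenv                 # sets ORACLE_HOME and PATH for this shell
$ srvctl status database -d dbmc1         # each instance should report "is running"
$ ping bda-node03.example.com             # the Cloudera Manager node should answer "is alive"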

Step 1: Make sure that Java is installed on Solaris


$ sudo pkg install --accept jdk-7


$ /usr/java/bin/java -version

java version "1.7.0_65"

Java(TM) SE Runtime Environment (build 1.7.0_65-b17)

Java HotSpot(TM) Server VM (build 24.65-b04, mixed mode)



Step 2: Determine the Version of Cloudera that is running on your Big Data Appliance:

* Starting with the BDA up and running:

[Screenshot: Cloudera Manager home page with the BDA cluster up and running]

* Click on the "Support" pull-down menu, then click "About":

[Screenshot: the Support menu showing the About item]


* In my case, I was using Cloudera Express 5.1.2

[Screenshot: the About dialog showing Cloudera Express 5.1.2]
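
If you prefer the command line, the same version information is exposed by the Cloudera Manager REST API. Treat this as a sketch: the admin/admin credentials and the bda-node03.example.com hostname are placeholders for your own Cloudera Manager login and host.

$ # highest API version supported by this Cloudera Manager instance
$ curl -s -u admin:admin http://bda-node03.example.com:7180/api/version
$ # cluster listing; the JSON includes the full CDH version of each cluster
$ curl -s -u admin:admin http://bda-node03.example.com:7180/api/v6/clusters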




Step 3: On the Oracle Database servers, install the version of Hadoop that matches your cluster


* Visit Cloudera Downloads, click on "CDH Download Now", and choose your version. In my case, CDH 5.1.2.

[Screenshot: Cloudera Downloads page]


* This took me to the Packaging and Tarballs page for all of the CDH components. Download the Apache Hadoop Tarball.


[Screenshot: CDH Packaging and Tarballs download page]


Place this tarball onto your Solaris server or, in the case of Oracle RAC, copy the tarball to every node (see the loop sketched after the listing below). There are many ways to get the tarball onto your Solaris server. In my case, the most convenient method was to use wget:


$ export http_proxy=http://the-proxy-server.bigcorp.com:80

$ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.1.2.tar.gz

$ scp hadoop-2.3.0-cdh5.1.2.tar.gz oracle@my-rac-node-2:



Unpack the tarball on your Solaris server, or on every RAC node:

$ tar xzf ../Downloads/hadoop-2.3.0-cdh5.1.2.tar.gz

And verify:


$ ls hadoop-2.3.0-cdh5.1.2

bin                  examples             libexec

bin-mapreduce1       examples-mapreduce1  sbin

cloudera             include              share

etc                  lib                  src
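
On a RAC configuration, the copy and unpack have to be repeated on every node. A small loop is one way to do it; this sketch assumes passwordless ssh as the oracle user and uses made-up node names (my-rac-node-1, my-rac-node-2):

$ for node in my-rac-node-1 my-rac-node-2; do
>   scp hadoop-2.3.0-cdh5.1.2.tar.gz oracle@${node}:
>   ssh oracle@${node} tar xzf hadoop-2.3.0-cdh5.1.2.tar.gz
> done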


Step 4: Download the HDFS configuration files to the Solaris RDBMS server



* In the Cloudera Manager, go to the hdfs page:

[Screenshot: Cloudera Manager hdfs service page]

* From the hdfs page, download the client configuration, and place it on the Solaris server(s):

[Screenshot: the Download Client Configuration action]



* Unpack the client configuration on your Solaris server(s):

$ unzip ./Downloads/hdfs-clientconfig.zip


$ ls hadoop-conf
core-site.xml     hdfs-site.xml     topology.map
hadoop-env.sh     log4j.properties  topology.py
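
The client configuration can also be fetched without the browser, using the Cloudera Manager REST API. Treat this as a sketch: the credentials, the Cloudera Manager host, and the cluster name (cluster1) are placeholders, and the service name must match the name of your hdfs service in Cloudera Manager.

$ curl -s -u admin:admin -o hdfs-clientconfig.zip \
    "http://bda-node03.example.com:7180/api/v6/clusters/cluster1/services/hdfs/clientConfig"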



Step 5: Configure the Hadoop client software on the Solaris server(s)


* Edit hadoop-conf/hadoop-env.sh to set JAVA_HOME correctly:

export JAVA_HOME=/usr/jdk/instances/jdk1.7.0



* Move the configuration files into place:


$ cp hadoop-conf/* hadoop-2.3.0-cdh5.1.2/etc/hadoop/



* Add the Hadoop bin directory to your PATH. You may want to do this in your current shell or in your .bashrc:

$ export PATH=~/hadoop-2.3.0-cdh5.1.2/bin:$PATH
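
Before involving the cluster at all, it is worth confirming that the client itself starts up and finds the JDK; the version it prints should match the CDH tarball that was unpacked (2.3.0-cdh5.1.2 in my case):

$ hadoop version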


Step 6: Test the Hadoop client software on Solaris


* Verify that the remote hdfs filesystem is visible from your Solaris server(s)

$ hdfs dfs -ls

14/12/12 09:35:23 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable

Found 6 items

drwx------   - oracle oracle          0 2014-08-25 02:55 .Trash

drwx------   - oracle oracle          0 2014-09-23 10:25 .staging

drwxr-xr-x   - oracle oracle          0 2014-01-12 15:15 moviedemo

drwxr-xr-x   - oracle oracle          0 2014-09-24 06:38 moviework

drwxr-xr-x   - oracle oracle          0 2014-09-08 12:50 oggdemo

drwxr-xr-x   - oracle oracle          0 2014-09-20 10:59 oozie-oozi
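
A listing only exercises read access to the NameNode. To confirm that the Solaris client can also write to the cluster, a small round trip like the following works; the target path under /user/oracle is an assumption based on the listing above:

$ hdfs dfs -put /etc/release /user/oracle/solaris_release
$ hdfs dfs -cat /user/oracle/solaris_release
$ hdfs dfs -rm /user/oracle/solaris_release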



Part C: Install "Oracle SQL Connector for HDFS" (OSCH) on your Solaris server(s)

Step 1: Download OSCH



* Download Oracle SQL Connector for Hadoop Distributed File System
Release 3.1.0 from the Oracle
Big Data Connectors Downloads
page.


* Unzip OSCH

$ unzip oraosch-3.1.0.zip

Archive:  oraosch-3.1.0.zip

 extracting: orahdfs-3.1.0.zip      

  inflating: README.txt             


$ unzip orahdfs-3.1.0

Archive:  orahdfs-3.1.0.zip

replace orahdfs-3.1.0/doc/README.txt? [y]es, [n]o, [A]ll, [N]one, [r]ename: A

  inflating: orahdfs-3.1.0/doc/README.txt 

  inflating: orahdfs-3.1.0/jlib/ojdbc6.jar 

  inflating: orahdfs-3.1.0/jlib/osdt_cert.jar 

  inflating: orahdfs-3.1.0/jlib/orahdfs.jar 

  inflating: orahdfs-3.1.0/jlib/oraclepki.jar 

  inflating: orahdfs-3.1.0/jlib/osdt_core.jar 

  inflating: orahdfs-3.1.0/jlib/ora-hadoop-common.jar 

  inflating: orahdfs-3.1.0/bin/hdfs_stream 

  inflating: orahdfs-3.1.0/examples/sql/mkhive_unionall_view.sql  


Step 2: Install OSCH


* Follow the instructions in the Connectors User's Guide, which is available as part of the Oracle Big Data Documentation. Also see Getting Started with Oracle Big Data Connectors.


* I'm tempted to cut & paste everything from "1.4.3 Installing
Oracle SQL Connector for HDFS" into this document, but I won't.


* Your mileage may vary, but for me, it looked like this:

$ tail -1 /var/opt/oracle/oratab

dbmc1:/u01/app/oracle/product/12.1.0.2/dbhome_1:N               # line added by Agent


$ export PATH=/usr/local/bin:$PATH


$ export ORACLE_SID=dbmc1


$ . /usr/local/bin/oraenv

ORACLE_SID = [dbmc1] ?

The Oracle base has been set to /u01/app/oracle


$ env | grep ORA

ORACLE_SID=dbmc1

ORACLE_BASE=/u01/app/oracle

ORACLE_HOME=/u01/app/oracle/product/12.1.0.2/dbhome_1


$ srvctl status database -d dbmc1

Instance dbmc11 is running on node etc20dbadm01

Instance dbmc12 is running on node etc20dbadm02


$ export OSCH_HOME=/export/home/oracle/orahdfs-3.1.0


$ export HADOOP_CLASSPATH=$OSCH_HOME/jlib/*


$ sqlplus / as sysdba

SQL> CREATE USER hdfsuser IDENTIFIED BY n0ne0fyourBusiness
   DEFAULT TABLESPACE users
   QUOTA UNLIMITED ON users;

SQL> GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW TO hdfsuser;

SQL> GRANT EXECUTE ON sys.utl_file TO hdfsuser;

SQL> CREATE OR REPLACE DIRECTORY osch_bin_path AS '/export/home/oracle/orahdfs-3.1.0/bin';
SQL> GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO hdfsuser;


Notice that MOVIEDEMO_DIR needs to be on shared storage, visible to both RAC nodes. From a Solaris shell prompt, create the MOVIEDEMO_DIR, substituting the ZFS_SA InfiniBand hostname for XXX, below. Then allow the database user to access the directory:

$ mkdir /net/XXX/export/test/hdfsuser

$ sqlplus / as sysdba
SQL> CREATE OR REPLACE DIRECTORY MOVIEDEMO_DIR AS '/net/XXX/export/test/hdfsuser';
SQL> GRANT READ, WRITE ON DIRECTORY MOVIEDEMO_DIR TO hdfsuser;
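
Before moving on, it is worth confirming that both directory objects exist and point where you expect. A quick query against the data dictionary, run as SYSDBA, is enough:

$ sqlplus / as sysdba
SQL> SELECT directory_name, directory_path
     FROM dba_directories
     WHERE directory_name IN ('OSCH_BIN_PATH', 'MOVIEDEMO_DIR');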

Step 3: Test using the Movie Demo


Test using the movie demo, which is documented in the Big Data Connectors User's Guide. Cut and paste moviefact_hdfs.sh and moviefact_hdfs.xml from Example 2-1, "Accessing HDFS Data Files from Oracle Database".


In moviefact_hdfs.sh, for my configuration I needed to change the paths to OSCH_HOME and to moviefact_hdfs.xml.
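
For reference, the working part of moviefact_hdfs.sh boils down to an invocation of the ExternalTable tool shipped in orahdfs.jar, roughly like the sketch below; the configuration file path is a placeholder, and the exact script in the User's Guide may wrap this differently:

$ export OSCH_HOME=/export/home/oracle/orahdfs-3.1.0
$ export HADOOP_CLASSPATH=$OSCH_HOME/jlib/*
$ hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
    oracle.hadoop.exttab.ExternalTable \
    -conf /path/to/moviefact_hdfs.xml \
    -createTable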


In moviefact_hdfs.xml, I needed to change two properties, as follows. For the database connection, use the Oracle Single Client Access Name (SCAN).

    <property>

      <name>oracle.hadoop.connection.url</name>

      <value>jdbc:oracle:thin:@sc-scan:1521/dbmc1</value>

    </property>


    <property>

      <name>oracle.hadoop.connection.user</name>

      <value>hdfsuser</value>

    </property>



Run the script:


$ sh moviefact_hdfs.sh

Oracle SQL Connector for HDFS Release 3.1.0 - Production


Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.


[Enter Database Password:]

14/12/12 12:36:00 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable

The create table command succeeded.


CREATE TABLE "HDFSUSER"."MOVIE_FACTS_EXT"

(

 "CUST_ID"                        VARCHAR2(4000),

 "MOVIE_ID"                       VARCHAR2(4000),

 "GENRE_ID"                       VARCHAR2(4000),

 "TIME_ID"                        TIMESTAMP(9),

 "RECOMMENDED"                    NUMBER,

 "ACTIVITY_ID"                    NUMBER,

 "RATING"                         NUMBER,

 "SALES"                          NUMBER

)

ORGANIZATION EXTERNAL

(

   TYPE ORACLE_LOADER

   DEFAULT DIRECTORY "MOVIEDEMO_DIR"

   ACCESS PARAMETERS

   (

     RECORDS DELIMITED BY 0X'0A'

     CHARACTERSET AL32UTF8

     PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'

     FIELDS TERMINATED BY 0X'09'

     MISSING FIELD VALUES ARE NULL

     (

       "CUST_ID" CHAR(4000),

       "MOVIE_ID" CHAR(4000),

       "GENRE_ID" CHAR(4000),

       "TIME_ID" CHAR DATE_FORMAT TIMESTAMP MASK 'YYYY-MM-DD:HH:MI:SS',

       "RECOMMENDED" CHAR,

       "ACTIVITY_ID" CHAR,

       "RATING" CHAR,

       "SALES" CHAR

     )

   )

   LOCATION

   (

     'osch-20141212123602-2522-1',

     'osch-20141212123602-2522-2',

     'osch-20141212123602-2522-3',

     'osch-20141212123602-2522-4'

   )

) PARALLEL REJECT LIMIT UNLIMITED;


The following location files were created.


osch-20141212123602-2522-1 contains 1 URI, 12754882 bytes


    12754882 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00001


osch-20141212123602-2522-2 contains 1 URI, 438 bytes


         438 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00002


osch-20141212123602-2522-3 contains 1 URI, 432 bytes


         432 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00003


osch-20141212123602-2522-4 contains 1 URI, 202 bytes


         202 hdfs://ofsaa-bdl.us.oracle.com:8020/user/oracle/moviework/data/part-00004
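
Before opening SQL Developer, a quick query from sqlplus as hdfsuser confirms that the external table really reads the HDFS data; the row count will depend on your copy of the movie demo data set:

$ sqlplus hdfsuser
SQL> SELECT COUNT(*) FROM movie_facts_ext;
SQL> SELECT cust_id, movie_id, rating FROM movie_facts_ext WHERE ROWNUM <= 5;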




Final verification with SQL Developer:

[Screenshot: final verification in SQL Developer]
