Jeff Taylor's Weblog

Cloudera Hive client on Solaris

Goal: From an Oracle Solaris server, access Hive data that is stored
in a Cloudera Hadoop cluster, for example, access Hive data in an
Oracle Big Data Appliance from a Solaris SPARC RDBMS server.

This blog assumes that the Cloudera Hadoop cluster is already up and running. To test, I started with the Oracle Big Data Lite Virtual Machine, which was installed using the process documented at Big Data Lite with a static IP address on Oracle Linux 6 / EL6.

Step 1: On the Solaris server, create an "oracle" user:


The user doesn't have to be "oracle", but that fits well with my planned usage.

# /usr/sbin/groupadd oinstall
# /usr/sbin/groupadd dba
# /usr/sbin/useradd -d localhost:/export/home/oracle -m -g oinstall -G dba oracle
# passwd oracle
# echo "oracle ALL=(ALL) ALL" > /etc/sudoers.d/oracle
# chmod 440 /etc/sudoers.d/oracle
# su - oracle
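
Once logged in as oracle, a quick check (one I'm adding here; the uid/gid numbers on your system will differ) confirms that the group memberships took effect:

$ id -a
uid=100(oracle) gid=100(oinstall) groups=100(oinstall),101(dba)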



Set up a VNC server using the process at Installing VNC server on Solaris 11.

Step 2: Visit the Cloudera Manager to determine the version of Cloudera that is running:

  • Click on the "Support" pull-down, and then "About":

  • In my case, I was using Cloudera Express 5.1.2.

Step 3: On the Oracle Solaris server, install the Hive and Hadoop tarballs that match your cluster:

  • Choose your version. In my case, CDH 5.1.2.

  • Unpack the tarballs on your Solaris server:
$ tar xzf Downloads/hadoop-2.3.0-cdh5.1.2.tar.gz
$ tar xzf Downloads/hive-0.12.0-cdh5.1.2.tar.gz


  • And verify:
$ ls hadoop-2.3.0-cdh5.1.2
bin                  examples             libexec
bin-mapreduce1       examples-mapreduce1  sbin
cloudera             include              share
etc                  lib                  src

$ ls hive-0.12.0-cdh5.1.2/
bin                examples           LICENSE            RELEASE_NOTES.txt
conf               hcatalog           NOTICE             scripts
docs               lib                README.txt


Step 4: Download the HDFS configuration files to the Solaris server:
  • In the Cloudera Manager, go to the HDFS page:

  • From the HDFS page, download the client configuration and place it on the Solaris server:

  • Unpack the HDFS client configuration on your Solaris server:

$ unzip Downloads/hdfs-clientconfig.zip
Archive:  Downloads/hdfs-clientconfig.zip
  inflating: hadoop-conf/hdfs-site.xml 
  inflating: hadoop-conf/log4j.properties 
  inflating: hadoop-conf/topology.py 
  inflating: hadoop-conf/topology.map 
  inflating: hadoop-conf/hadoop-env.sh 
  inflating: hadoop-conf/core-site.xml 
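
Before copying the files into place, it is worth confirming which NameNode the client configuration points at (a check I'm adding; the hostname shown is from my Big Data Lite VM, and I'm using GNU grep, which Solaris 11 ships in /usr/gnu/bin):

$ /usr/gnu/bin/grep -A1 '<name>fs.defaultFS</name>' hadoop-conf/core-site.xml
  <name>fs.defaultFS</name>
  <value>hdfs://bigdatalite.localdomain:8020</value>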

Step 5: Configure the HDFS client software on the Solaris server

  • Edit hadoop-conf/hadoop-env.sh to set JAVA_HOME correctly:

export JAVA_HOME=/usr/jdk/instances/jdk1.7.0
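
If you aren't sure where the JDK lives on your Solaris server, check before editing (the instance name and version below are from my Solaris 11 system; adjust to whatever is actually installed):

$ ls /usr/jdk/instances
jdk1.7.0
$ /usr/jdk/instances/jdk1.7.0/bin/java -version
java version "1.7.0_60"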

  • Move the configuration files into place:

$ cp hadoop-conf/* hadoop-2.3.0-cdh5.1.2/etc/hadoop/

  • Add the Hadoop bin directory to your PATH. You may want to do this in your current shell or in your .bashrc:

$ export PATH=~/hadoop-2.3.0-cdh5.1.2/bin:$PATH
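
Confirm that this client is the one your PATH resolves to (a check I'm adding; the paths reflect my home directory layout, and hadoop version prints several more lines that I've omitted):

$ which hadoop
/home/oracle/hadoop-2.3.0-cdh5.1.2/bin/hadoop
$ hadoop version
Hadoop 2.3.0-cdh5.1.2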

Step 6: Test the HDFS client software on Solaris

  • Verify that the remote hdfs filesystem is visible from your Solaris server

$ hdfs dfs -ls
15/04/02 14:40:54 WARN util.NativeCodeLoader: Unable to
load native-hadoop library for your platform... using builtin-java
classes where applicable

Found 6 items
drwx------   - oracle oracle          0 2014-08-25 01:55 .Trash
drwx------   - oracle oracle          0 2014-09-23 09:25 .staging
drwxr-xr-x   - oracle oracle          0 2014-01-12 15:15 moviedemo
drwxr-xr-x   - oracle oracle          0 2014-09-24 05:38 moviework
drwxr-xr-x   - oracle oracle          0 2014-09-08 11:50 oggdemo
drwxr-xr-x   - oracle oracle          0 2014-09-20 09:59 oozie-oozi
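
Beyond listing directories, a simple round trip through HDFS confirms both read and write access (a quick test I'm adding; the file name and contents are arbitrary):

$ echo "hello from solaris" > /tmp/hello.txt
$ hdfs dfs -put /tmp/hello.txt hello.txt
$ hdfs dfs -cat hello.txt
hello from solaris
$ hdfs dfs -rm hello.txt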


Step 7: Download the Hive configuration files to the Solaris server:

  • In the Cloudera Manager, go to the Hive page:

  • From the Hive page, download the client configuration and place it on the Solaris server:

  • Unpack the hive client configuration on your Solaris server:

$ unzip Downloads/hive-clientconfig.zip
Archive:  Downloads/hive-clientconfig(1).zip
  inflating: hive-conf/hive-site.xml 
  inflating: hive-conf/hive-env.sh  
  inflating: hive-conf/log4j.properties 
  inflating: hive-conf/hadoop-env.sh 
  inflating: hive-conf/core-site.xml 
  inflating: hive-conf/mapred-site.xml 
  inflating: hive-conf/topology.py  
  inflating: hive-conf/yarn-site.xml 
  inflating: hive-conf/hdfs-site.xml 
  inflating: hive-conf/topology.map  


Step 8: Configure the Hive client software on the Solaris server

  • Move the configuration files into place:

$ cp hive-conf/* hive-0.12.0-cdh5.1.2/conf/
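
The Hive client finds the remote metastore through hive.metastore.uris in hive-site.xml. A quick look confirms it points at the cluster (a check I'm adding; the hostname is from my Big Data Lite VM, and 9083 is the default metastore port):

$ /usr/gnu/bin/grep -A1 '<name>hive.metastore.uris</name>' hive-0.12.0-cdh5.1.2/conf/hive-site.xml
  <name>hive.metastore.uris</name>
  <value>thrift://bigdatalite.localdomain:9083</value>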

Step 9: YARN configuration option for Hadoop:

  • The HDFS client configuration files don't include yarn-site.xml. Copy the YARN configuration file from the Hive tree to the Hadoop tree:

$ cp hive-0.12.0-cdh5.1.2/conf/yarn-site.xml hadoop-2.3.0-cdh5.1.2/etc/hadoop/
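
yarn-site.xml tells the client where to submit jobs. A look at the ResourceManager address confirms it points at the cluster (again, my Big Data Lite hostname; 8032 is the default ResourceManager port):

$ /usr/gnu/bin/grep -A1 '<name>yarn.resourcemanager.address</name>' hadoop-2.3.0-cdh5.1.2/etc/hadoop/yarn-site.xml
  <name>yarn.resourcemanager.address</name>
  <value>bigdatalite.localdomain:8032</value>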

Step 10: Hide hive-env.sh:

  • The downloaded hive-env.sh references paths that exist on the cluster's Linux nodes but typically not on the Solaris client, so move it out of the way:

$ mv hive-0.12.0-cdh5.1.2/conf/hive-env.sh hive-0.12.0-cdh5.1.2/conf/hive-env.sh.HIDDEN

  • Add the Hive bin directory to your PATH. You may want to do this in your current shell or in your .bashrc:

$ export PATH=~/hive-0.12.0-cdh5.1.2/bin:$PATH
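
As with Hadoop, confirm which hive binary your PATH resolves to (the path reflects my home directory layout):

$ which hive
/home/oracle/hive-0.12.0-cdh5.1.2/bin/hive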

Step 11: Test the Hive client software on Solaris

  • Verify that the remote Hive tables are visible from your Solaris server

$ hive
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/04/03 12:20:37 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/04/03 12:20:37 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.

Logging initialized using configuration in jar:file:/home/oracle/hive-0.12.0-cdh5.1.2/lib/hive-common-0.12.0-cdh5.1.2.jar!/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/oracle/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/oracle/hive-0.12.0-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive> show tables;
OK
cust
movie
movie_rating
movieapp_log_avro
movieapp_log_json
movieapp_log_odistage
movieapp_log_stage
movielog
session_stats
Time taken: 1.799 seconds, Fetched: 9 row(s)
hive> 
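
As a final smoke test, try a query that does more than read metadata (movie is one of the Big Data Lite sample tables listed above):

hive> select count(*) from movie;

Unlike show tables, which only talks to the metastore, a count(*) launches a MapReduce job on the cluster, so it also exercises the YARN configuration from Step 9.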

Conclusion: We've successfully configured an Oracle Solaris server to access Hive data stored in a Cloudera Hadoop cluster.
