
Jeff Taylor's Weblog

May 22, 2013

Debugging Hadoop using Solaris Studio in a Solaris 11 Zone



I've found Orgad Kimchi's How to Set Up a Hadoop Cluster Using Oracle Solaris Zones
to be very useful; however, it is too complex for a development environment.
When map/reduce tasks are running in a clustered environment, it is
challenging to isolate bugs. Debugging is easier when working within a
standalone Hadoop installation. I've put together the following instructions
for installing a standalone Hadoop configuration in a Solaris Zone with
Solaris Studio for application development.


A lovely feature of Solaris is that your global zone can simultaneously host
both a Hadoop cluster set up in a manner similar to Orgad's instructions and
a development zone running a standalone Hadoop configuration.

Create the Zone


These instructions assume that Solaris 11.1 is already running in the Global Zone.


Add the Hadoop Studio Zone

# dladm create-vnic -l net0 hadoop_studio


# zonecfg -z hadoop-studio
Use 'create' to begin configuring a new zone.
zonecfg:hadoop-studio> create
create: Using system default template 'SYSdefault'
zonecfg:hadoop-studio> set zonepath=/ZONES/hadoop-studio
zonecfg:hadoop-studio> add net
zonecfg:hadoop-studio:net> set physical=hadoop_studio
zonecfg:hadoop-studio:net> end
zonecfg:hadoop-studio> verify
zonecfg:hadoop-studio> commit
zonecfg:hadoop-studio> exit



Install and boot the zone

# zoneadm -z hadoop-studio install
# zoneadm -z hadoop-studio boot


Log in to the zone console to set the network, time, root password, and unprivileged user.

# zlogin -C hadoop-studio


After the zone's initial configuration steps, nothing else needs to be
done from within the global zone. You should be able to log into the
Hadoop Studio zone with ssh as the unprivileged user and gain privileges
with "su" and "sudo".


All of the remaining instructions are from inside the Hadoop Studio Zone.


Install extra Solaris software and set up the development environment

I like to start with both JDKs installed rather than relying on the "/usr/java" symbolic link:

# pkg install jdk-6
# pkg install --accept jdk-7



Verify the JDKs:

# /usr/jdk/instances/jdk1.6.0/bin/java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) Server VM (build 20.10-b01, mixed mode)

# /usr/jdk/instances/jdk1.7.0/bin/java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)



Add VNC Remote Desktop software

# pkg install --accept solaris-desktop



Create a Hadoop user:

# groupadd hadoop
# useradd -d localhost:/export/home/hadoop -m -g hadoop hadoop
# passwd hadoop

# usermod -R root hadoop



Edit /home/hadoop/.bashrc:

export PATH=/usr/bin:/usr/sbin

export PAGER="/usr/bin/less -ins"

typeset +x PS1="\u@\h:\w\\$ "


# Hadoop

export HADOOP_PREFIX=/home/hadoop/hadoop

export PATH=$HADOOP_PREFIX/bin:$PATH


# Java

export JAVA_HOME=/usr/jdk/instances/jdk1.6.0

export PATH=$JAVA_HOME/bin:$PATH


# Studio

export PATH=$PATH:/opt/solarisstudio12.3/bin
alias solstudio='solstudio --jdkhome /usr/jdk/instances/jdk1.6.0'

Edit /home/hadoop/.bash_profile:

. ~/.bashrc


And make sure that the ownership and permission make sense:

# ls -l /home/hadoop/.bash*      
-rw-r--r--   1 hadoop   hadoop        12 May 22 05:24 /home/hadoop/.bash_profile
-rw-r--r--   1 hadoop   hadoop       372 May 22 05:24 /home/hadoop/.bashrc
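The exports in .bashrc stack the PATH so that the JDK is found first, then Hadoop, then the base system, with Studio appended at the end. A quick sketch of the resulting search order (same paths as above):

```shell
# Reproduce the PATH assembly from .bashrc, in order:
export PATH=/usr/bin:/usr/sbin
export HADOOP_PREFIX=/home/hadoop/hadoop
export PATH=$HADOOP_PREFIX/bin:$PATH           # Hadoop prepended
export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
export PATH=$JAVA_HOME/bin:$PATH               # JDK prepended (wins)
export PATH=$PATH:/opt/solarisstudio12.3/bin   # Studio appended (last)
echo $PATH
# -> /usr/jdk/instances/jdk1.6.0/bin:/home/hadoop/hadoop/bin:/usr/bin:/usr/sbin:/opt/solarisstudio12.3/bin
```

Because the JDK directory is prepended last, `java` resolves to the 1.6.0 instance even if another JDK appears elsewhere on the PATH.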



Now is a good time to start a remote VNC desktop for this zone:

# su - hadoop


$ vncserver


You will require a password to access your desktops.

Password:
Verify:
xauth:  file /home/hadoop/.Xauthority does not exist

New 'hadoop-studio:1 ()' desktop is hadoop-studio:1

Creating default startup script /home/hadoop/.vnc/xstartup
Starting applications specified in /home/hadoop/.vnc/xstartup
Log file is /home/hadoop/.vnc/hadoop-studio:1.log


Access the remote desktop with your favorite VNC client.


The default 10-minute timeout on the VNC desktop is too short for my preferences:

System -> Preferences -> Screensaver
  Display Modes:
  Blank after: 100
  Close the window (I always look for a "save" button, but no, just close the window without explicitly saving.)



Download and Install Hadoop


For this article, I used the "12 October, 2012 Release 1.0.4" release. Download the Hadoop tarball and copy it into the home directory of hadoop:

$ ls -l hadoop-1.0.4.tar.gz

-rw-r--r--   1 hadoop   hadoop   62793050 May 21 12:03 hadoop-1.0.4.tar.gz


Unpack the tarball into the home directory of the hadoop user:


$ gzip -dc hadoop-1.0.4.tar.gz  | tar -xvf -

$ mv hadoop-1.0.4 hadoop


Hadoop comes pre-configured in Standalone Mode


Edit /home/hadoop/hadoop/conf/hadoop-env.sh, and set JAVA_HOME:

export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
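If you prefer to script the change rather than open an editor, a sed one-liner can rewrite the commented JAVA_HOME line. This sketch works on a scratch file; the commented default shown is my assumption about what the 1.0.4 tarball ships, so check your copy of conf/hadoop-env.sh before running it for real:

```shell
# Demonstrate on a scratch copy; substitute conf/hadoop-env.sh for the real edit.
f=$(mktemp)
echo '# export JAVA_HOME=/usr/lib/j2sdk1.5-sun' > "$f"   # assumed shipped default
sed 's|^#* *export JAVA_HOME=.*|export JAVA_HOME=/usr/jdk/instances/jdk1.6.0|' "$f"
rm "$f"
# -> export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
```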


That is all. Now, you can run a Hadoop example:

$ hadoop jar hadoop/hadoop-examples-1.0.4.jar pi 2 10
Number of Maps  = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
...
Job Finished in 10.359 seconds
Estimated value of Pi is 3.80000000000000000000
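The coarse estimate makes sense once you see the arithmetic: the pi example scatters sample points over a square, counts how many land inside the inscribed quarter circle, and reports 4 * inside / total. With only 2 maps x 10 samples = 20 points, 19 hits produces exactly 3.8 (the hit count is inferred here from the printed estimate, not shown in the job output):

```shell
# pi ~= 4 * inside / total; with 19 of 20 sample points inside:
awk 'BEGIN { printf "%.2f\n", 4 * 19 / 20 }'
# -> 3.80
```

Raising the sample count (e.g. `pi 16 100000`) tightens the estimate considerably.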



Install Solaris Studio:

Visit https://pkg-register.oracle.com/ to obtain
Oracle_Solaris_Studio_Support.key.pem and
Oracle_Solaris_Studio_Support.certificate.pem, then follow the
instructions for "pkg set-publisher" and "pkg update" or "pkg
install".

# sudo pkg set-publisher \
          -k /var/pkg/ssl/Oracle_Solaris_Studio_Support.key.pem \
          -c /var/pkg/ssl/Oracle_Solaris_Studio_Support.certificate.pem \
          -G '*' -g https://pkg.oracle.com/solarisstudio/support solarisstudio

# pkg install developer/solarisstudio-123/*



If your network requires a proxy, you will need to set the proxy before starting Solaris Studio:

proxy.jpg



Start Solaris Studio:

$ solstudio


(Notice the alias in .bashrc that adds --jdkhome to the solstudio start up command.)


Go to "Tools -> Plugins".


Click on "Reload Catalog"


Load the Java SE plugins. I ran into a problem when the Maven plug-in
was installed; something to diagnose at a future date.

plugins.jpg




Create a New Project:


File -> New Project


Step 1:

- Category: Java

- Project: Java Application

- Next

NewProject.jpg


Step 2: Fill in similar to this:

NewJavaApplication.jpg


Copy the example source into the project:

$ cp -r \
    $HADOOP_PREFIX/src/examples/org/apache/hadoop/examples/* \
    ~/SolStudioProjects/examples/src/org/apache/hadoop/examples/



Starting to look like a development environment:

DevEnvironment.jpg



Modify the Project to compile with Hadoop jars. Right-click on the project and select "Properties"

properties.jpg



Add in the necessary Hadoop compile jars:

CompileJars.jpg


I found that I needed these jars at run time:

RunJars.jpg


Add Program Arguments (2 10):

Arguments.jpg


Now, if you click on the "Run" button, PiEstimator will run inside the IDE:

PiRuns.jpg


And the setup behaves as expected if you set a break point and click on "Debug":

debug.jpg
