Hadoop on an Oracle SPARC T4-2 Server

I recently configured an Oracle SPARC T4-2 server to store and process two types of data:
  1. Critical, sensitive data that requires ACID transactions and strong security. This data needs to be stored in an Oracle Database.
  2. High-volume, low-risk data that needs to be processed with Apache Hadoop. This data is stored in HDFS.

Based on the requirements, I configured the server using a combination of:

  1. Oracle VM Server for SPARC, used for hard partitioning of system resources such as CPU, memory, and PCIe buses and devices.
  2. Oracle Solaris Zones to host a Hadoop cluster, as shown in Orgad Kimchi's How to Set Up a Hadoop Cluster Using Oracle Solaris Zones.

The configuration is shown in the following diagram:

[Diagram: Oracle SPARC T4 Configuration]
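For reference, the commands behind this kind of layout look roughly like the sketch below. The domain name (hadoop-ldom) and the core and memory sizes are illustrative assumptions, and the virtual disk, virtual network, and PCIe device assignments are omitted; the data-node zone names and the /ZONES pool match the ones that appear in the iostat output later in this post.

From the control domain, carve out a guest domain for Hadoop:

# ldm add-domain hadoop-ldom              # "hadoop-ldom" is an illustrative name
# ldm set-core 8 hadoop-ldom              # one T4 socket's worth of cores
# ldm set-memory 128G hadoop-ldom         # illustrative size
# ldm bind-domain hadoop-ldom
# ldm start-domain hadoop-ldom

Then, inside the guest domain, create one Oracle Solaris Zone per Hadoop data node (repeat for data-node2 and data-node3):

# zonecfg -z data-node1 "create; set zonepath=/ZONES/data-node1"
# zoneadm -z data-node1 install
# zoneadm -z data-node1 boot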

Low Hadoop CPU utilization:

As you can see in the diagram, a T4 CPU is allocated for Hadoop map/reduce processing. The T4 CPU has 8 cores and 64 virtual processors, enabling it to simultaneously run up to 64 software threads:

# psrinfo -pv
The physical processor has 8 cores and 64 virtual processors (0-63)
  The core has 8 virtual processors (0-7)
  The core has 8 virtual processors (8-15)
  The core has 8 virtual processors (16-23)
  The core has 8 virtual processors (24-31)
  The core has 8 virtual processors (32-39)
  The core has 8 virtual processors (40-47)
  The core has 8 virtual processors (48-55)
  The core has 8 virtual processors (56-63)
    SPARC-T4 (chipid 0, clock 2848 MHz)
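If you just want the counts, for example to decide how many job slots to configure later, psrinfo can report them directly:

# psrinfo -p              # number of physical processors in this domain
# psrinfo | wc -l         # number of virtual processors (one line per hardware thread)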

To test the Hadoop configuration, I created a large Hive table and launched Hadoop map/reduce processes using Hive:

INSERT INTO TABLE Table2
  SELECT ColumnA, SUM(ColumnB)
  FROM Table1
  GROUP BY ColumnA;
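For reference, the test only requires the two tables to be shaped so that the GROUP BY forces a full map/reduce pass over a large Table1. A minimal, hypothetical setup would look like the following; the column types, the SumB column name, and the HDFS input path are assumptions, and the real Table1 held far more data:

$ hive -e "CREATE TABLE Table1 (ColumnA STRING, ColumnB BIGINT)"
$ hive -e "CREATE TABLE Table2 (ColumnA STRING, SumB BIGINT)"          # SumB is an illustrative column name
$ hive -e "LOAD DATA INPATH '/user/hadoop/table1_data' INTO TABLE Table1"   # hypothetical HDFS path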

Out of the box, Hadoop performance was not well tuned: simple Hive queries were not able to take advantage of the T4's CPU resources.

While the Hive job was running, I ran iostat in the Global Zone and could see that:

  1. The CPU was not very busy.
  2. The three data-node disks would spike, but were not stressed.

# iostat -MmxPznc 60
...
     cpu
 us sy wt id
 12  1  0 88
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    2.2   19.1    0.0    0.1  0.0  0.0    0.0    1.8   0   1 (Primary LDOM root)
   23.4    2.9    3.0    0.0  0.0  0.0    0.0    1.3   0   2 (data-node1)
  105.5    5.5   11.7    0.0  0.0  0.4    0.0    3.2   0  10 (data-node2)
    0.0   23.7    0.0    0.3  0.0  0.0    0.0    1.9   0   1 (Guest LDOM root)
   24.2    2.9    1.9    0.0  0.0  0.0    0.0    1.2   0   2 (data-node3)
    7.2   22.9    0.4    0.3  0.0  0.1    0.0    5.0   0   6 (/ZONES)

     cpu
 us sy wt id
 12  1  0 87
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    2.3   19.2    0.0    0.1  0.0  0.0    0.0    1.8   0   1 (Primary LDOM root)
    3.8    4.0    0.4    0.0  0.0  0.0    0.0    1.4   0   1 (data-node1)
   47.9    5.4    4.1    0.0  0.0  0.1    0.0    1.6   0   3 (data-node2)
    0.0   25.6    0.0    0.3  0.0  0.0    0.0    1.5   0   1 (Guest LDOM root)
   38.2    3.9    3.2    0.0  0.0  0.1    0.0    1.4   0   3 (data-node3)
    9.5   21.9    0.6    0.3  0.0  0.1    0.0    4.4   0   6 (/ZONES)

     cpu
 us sy wt id
 11  1  0 88
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    5.3   18.6    0.1    0.1  0.0  0.1    0.0    4.4   0   4 (Primary LDOM root)
    0.5    3.6    0.0    0.0  0.0  0.0    0.0    1.1   0   0 (data-node1)
    0.4    3.6    0.0    0.0  0.0  0.0    0.0    0.8   0   0 (data-node2)
    0.0   23.5    0.0    0.3  0.0  0.0    0.0    1.3   0   1 (Guest LDOM root)
  124.9    7.2   10.3    0.0  0.0  0.2    0.0    1.8   0  10 (data-node3)
    8.5   24.4    0.6    0.4  0.0  0.2    0.0    4.6   0   6 (/ZONES)

To understand the low CPU activity, I looked at the active software threads on the Hadoop cluster from the Global Zone. Only six threads were busy, one thread per map/reduce process.


$ prstat -mL
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 

 27529 hadoop    98 0.5 0.0 0.0 0.0 1.0 0.0 0.0  14  46  3K   0 java/2
 27534 hadoop    98 0.5 0.0 0.0 0.0 1.1 0.0 0.0  13  46  3K   0 java/2
 27575 hadoop    98 0.5 0.0 0.0 0.0 1.1 0.0 0.0  15  53  3K   0 java/2
 27577 hadoop    98 0.6 0.0 0.0 0.0 1.2 0.0 0.0  14  53  3K   0 java/2
 27576 hadoop    98 0.6 0.0 0.0 0.0 1.1 0.8 0.0  46  57  3K   0 java/2
 27578 hadoop    97 0.6 0.0 0.0 0.0 1.1 1.0 0.0  53  53  4K   0 java/2
 27575 hadoop   5.8 0.0 0.0 0.0 0.0  94 0.0 0.0  19   4  26   0 java/32
 27578 hadoop   5.6 0.0 0.0 0.0 0.0  94 0.0 0.0  19   5  35   0 java/33
 27529 hadoop   5.6 0.0 0.0 0.0 0.0  94 0.0 0.0   2   8   2   0 java/32
 27576 hadoop   5.5 0.0 0.0 0.0 0.0  95 0.0 0.0  21   6  36   0 java/33
 27028 hadoop   1.2 1.3 0.0 0.0 0.0 0.0  97 0.1 254   5  2K   0 java/87
 27028 hadoop   1.2 1.2 0.0 0.0 0.0 0.0  97 0.1 251   2  2K   0 java/86
   958 root     1.9 0.1 0.0 0.0 0.0  98 0.4 0.0   9   4  27   0 fmd/36
 27005 hadoop   1.2 0.8 0.0 0.0 0.0 0.0  98 0.0  99   2  2K   0 java/86
 27005 hadoop   1.1 0.8 0.0 0.0 0.0 0.0  98 0.0  98   3  2K   0 java/87
 11956 root     1.8 0.1 0.0 0.0 0.0 0.0  98 0.0  44   3 882   0 Xvnc/1
 27016 hadoop   1.0 0.8 0.0 0.0 0.0 0.0  98 0.0  95   2  2K   0 java/86
 27577 hadoop   1.7 0.0 0.0 0.0 0.0  98 0.0 0.0  18   1  28   0 java/32
 27016 hadoop   1.0 0.7 0.0 0.0 0.0 0.0  98 0.0  93   2  2K   0 java/85
 27576 hadoop   1.7 0.0 0.0 0.0 0.0  98 0.0 0.0  20   1  33   0 java/32
 27577 hadoop   1.6 0.0 0.0 0.0 0.0  98 0.0 0.0  18   0  31   0 java/33
Total: 619 processes, 4548 lwps, load averages: 3.46, 8.63, 27.90


The Hadoop JobTracker UI showed that Hadoop was only scheduling a few of the tasks at a time.

[Screenshot: SixMapProcesses.jpg]

This was because there were only 6 map job slots and 6 reduce job slots (see the "Map Task Capacity" and "Reduce Task Capacity" columns):

[Screenshot: SixJobSlots.jpg]
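The slot capacity is easiest to read from the JobTracker web UI (by default on port 50030 in a Hadoop 1.x/MRv1 deployment such as this one), but the live TaskTrackers and running jobs can also be listed from the command line:

$ hadoop job -list-active-trackers      # one line per live TaskTracker
$ hadoop job -list                      # currently running jobs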

Increasing the number of Job Slots:

The T4 CPU is able to run 64 software threads simultaneously, and over-subscribing the CPUs for Hadoop is recommended. I enabled 25 map and 25 reduce job slots per data node, for a total of 75 of each, by adding these properties to mapred-site.xml on each node (and restarting the TaskTrackers, as shown after the XML):


  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>25</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>25</value>
  </property>
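The new limits take effect once the TaskTracker in each data-node zone is restarted. Assuming a Hadoop 1.x tarball install with HADOOP_HOME set (paths vary by installation), that is:

$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker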

Improved Hadoop CPU utilization:

After adding the job slots, while running iostat in the Global Zone, I could see that:

  1. The CPU was very busy.
  2. The three data-node disks were active, but not stressed.

# iostat -MmxPznc 30
...
     cpu
 us sy wt id
 98  2  0  0
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   20.6    0.0    0.1  0.0  0.0    0.0    1.5   0   1 c0t5000CCA025261B74d0s0
    0.9   12.9    0.0    0.1  0.0  0.0    0.0    3.2   0   2 c0t5000CCA025311930d0s0
    1.0    9.7    0.0    0.0  0.0  0.0    0.0    1.2   0   1 c0t5000CCA02530B058d0s0
    0.0   22.4    0.0    0.2  0.0  0.0    0.0    1.0   0   1 c0t5000CCA0250CB198d0s0
    1.3   15.7    0.0    0.1  0.0  0.0    0.0    1.1   0   1 c0t5000CCA025324D98d0s0
    2.8   28.3    0.1    0.4  0.0  0.1    0.0    3.5   0   4 c0t5000CCA0253C11B0d0s0
     cpu
 us sy wt id
 98  2  0  0
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    1.3   21.1    0.0    0.1  0.0  0.1    0.0    3.0   0   3 c0t5000CCA025261B74d0s0
    0.6    7.2    0.0    0.0  0.0  0.0    0.0    1.3   0   0 c0t5000CCA025311930d0s0
    0.8    5.7    0.0    0.0  0.0  0.0    0.0    1.5   0   0 c0t5000CCA02530B058d0s0
    0.0   22.6    0.0    0.3  0.0  0.0    0.0    1.1   0   1 c0t5000CCA0250CB198d0s0
    0.5    7.7    0.0    0.1  0.0  0.0    0.0    1.1   0   0 c0t5000CCA025324D98d0s0
    2.2   24.7    0.1    0.3  0.0  0.1    0.0    2.4   0   2 c0t5000CCA0253C11B0d0s0
     cpu
 us sy wt id
 98  2  0  0
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.9   20.6    0.0    0.1  0.0  0.1    0.0    2.6   0   2 c0t5000CCA025261B74d0s0
    0.5    7.1    0.0    0.0  0.0  0.0    0.0    0.9   0   0 c0t5000CCA025311930d0s0
    4.8   10.3    0.4    0.0  0.0  0.0    0.0    1.2   0   1 c0t5000CCA02530B058d0s0
    0.0   21.7    0.0    0.2  0.0  0.0    0.0    1.1   0   1 c0t5000CCA0250CB198d0s0
    2.4   15.1    0.2    0.1  0.0  0.0    0.0    1.2   0   1 c0t5000CCA025324D98d0s0
    4.8   25.3    0.5    0.4  0.0  0.2    0.0    5.4   0   5 c0t5000CCA0253C11B0d0s0

Now all 69 map tasks can run in parallel:

[Screenshot: 69JobsRunning.jpg]

This is because there are now 75 map job slots and 75 reduce job slots (see the "Map Task Capacity" and "Reduce Task Capacity" columns):

[Screenshot: 75slots.jpg]

Hadoop Memory Usage:

For this simple case, I found that allocating additional memory to the Java map/reduce processes did not help, but for reference, this is how I increased the memory available to the Hive map/reduce processes:

$ hive --hiveconf mapred.map.child.java.opts="-Xmx1000m -Xms500m"
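If the reduce side also needs more heap, the reduce-side counterpart can be set the same way (assuming this Hadoop version exposes mapred.reduce.child.java.opts alongside the map-side property used above):

$ hive --hiveconf mapred.map.child.java.opts="-Xmx1000m -Xms500m" \
       --hiveconf mapred.reduce.child.java.opts="-Xmx1000m -Xms500m"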




Comments:

Came across your blog when I was searching about Solaris. This is unrelated to the current blog entry.

Hope you don't mind.
I am looking at a SPARC server installation and I would like to get the core-wise utilization.

What is the Solaris report (like http://www.percona.com/doc/percona-toolkit/2.1/pt-summary.html#system-requirements) that I could gather?

What are the Solaris forums or resources that I should search?

Posted by Mohan Radhakrishnan on October 21, 2013 at 04:28 AM EDT #

Hi Mohan,

Darryl Gove's books and blogs are a great place to start.
You will want to understand the difference between HW utilization and SW utilization.
Both are reported by pgstat. See https://blogs.oracle.com/d/entry/pginfo_pgstat

Jeff

Posted by guest on October 21, 2013 at 10:52 AM EDT #
