Wednesday Feb 26, 2014

Quadratic data in Oracle R Enterprise and Oracle Data Mining

I was working with some data which was stored in an Oracle database on a SPARC T4 server. I thought that the data had a quadratic component and I wanted to analyze the data using SQL Developer and Oracle Data Mining, a component of the Oracle Advanced Analytics Option. When I reviewed the initial analysis, I wasn't getting results that I had expected, and the fit of the model wasn't very good. I decided to feed some simple, synthetic quadratic data into Oracle Data Miner to ensure that I was using the tool properly.

Oracle R Enterprise was used as the tool to create and view the synthetic data.

From an R session that has the Oracle R Enterprise package installed, it is easy to access an Oracle Database:

require(ORE)
## Loading required package: ORE
## Loading required package: OREbase
## 
## Attaching package: 'OREbase'
## 
## The following object(s) are masked from 'package:base':
## 
##     cbind, data.frame, eval, interaction, order, paste, pmax,
##     pmin, rbind, table
## 
## Loading required package: OREstats
## Loading required package: MASS
## Loading required package: OREgraphics
## Loading required package: OREeda
## Loading required package: OREdm
## Loading required package: lattice
## Loading required package: OREpredict
## Loading required package: ORExml
ore.connect("SCOTT", "orcl", "sparc-T4", "TIGER", 1521)
## Loading required package: ROracle
## Loading required package: DBI

The following R function, quad1(), is used to calculate "y=ax^2 + bx + c",

where:
 - the data frame that is passed in has a column of x values.
 - a is in coefficients[feature, 1]
 - b is in coefficients[feature, 2]
 - c is in coefficients[feature, 3]

The function will simply calculate points along a parabolic line and is more complicated than it needs to be. I will leave it in this complicated format so that I can extend it to work with more interesting functions, such as a parabolic surface, later.  

quad1 <- function(df, coefficients) {
    feature <- 1

    coefficients[feature, 1] * df[, feature] * df[, feature] +
      coefficients[feature, 2] * df[, feature] +
      coefficients[feature, 3]
}

The following function, genData(), creates random "x" data points and uses func() to calculate the y values that correspond to the random x values.

genData <- function(nObservations, func, coefficients, nFeatures, scale) {
    dframe <- data.frame(x1 = rep(1, nObservations))
    
    for (feature in seq(nFeatures)) {
        name <- paste("x", feature, sep = "")
        dframe[name] <- runif(nObservations, -scale[feature], scale[feature])
    }
    dframe["y"] <- func(dframe, coefficients)
    return(dframe)
}

The following function, quadGraph(), is used for graphing. The points in dframe are displayed in a scatter plot. The coefficients for the known synthetic data are passed in and the corresponding curve is sketched in blue. (Obviously, if you aren't working with synthetic data, it is unlikely that you will know the "true" coefficients.) The R model that is the best estimate of the data based on regression is also passed in, and its predictions are sketched in red.

quadGraph <- function(dframe, coefficients = NULL, model = NULL, ...) {
    with(dframe, plot(x1, y))
    title(main = "Quadratic Fit")
    legend("topright", inset = 0.05, c("True", "Model"), lwd = c(2.5, 2.5), 
        col = c("blue", "red"))
    xRange <- range(dframe[, "x1"])
    smoothX <- seq(xRange[1], xRange[2], length.out = 50)
    trueY <- quad1(data.frame(smoothX), coefficients)
    lines(smoothX, trueY, col = "blue")
    
    new = data.frame(x1 = smoothX)
    y_estimated <- predict(model, new)
    lines(smoothX, y_estimated, col = "red")
}

Here are the settings that will be used.

nFeatures <- 1  # one feature can sketch a line, 2 a surface, ...
nObservations <- 20  # How many rows of data to create for modeling
degree <- 2  # 2 is quadratic, 3 is cubic, etc
set.seed(2)  # I'll get the same coefficients every time I run 
coefficients <- matrix(rnorm(nFeatures * (degree + 1)), nFeatures, degree + 1)
scale <- (10^rpois(nFeatures, 2)) * rnorm(nFeatures, 3)

Here, synthetic data is created that matches the quadratic function and the random coefficients.

modelData <- genData(nObservations, quad1, coefficients, nFeatures, scale)

We can make this exercise at least slightly more realistic by adding some irreducible error for the regression algorithm to deal with. Add noise.

yRange <- range(modelData[, "y"])
yJitter <- (yRange[2] - yRange[1])/10
modelData["y"] <- modelData["y"] + rnorm(nObservations, 0, yJitter)

Great. At this point I have good quadratic synthetic data which can be analyzed. Feed the synthetic data to the Oracle Database.

oreDF <- ore.push(modelData)
tableName <- paste("QuadraticSample_", nObservations, "_", nFeatures, sep = "")
ore.drop(table = tableName)
ore.create(oreDF, table = tableName)

The Oracle R Enterprise function to fit the linear model works as expected.

m = ore.lm(y ~ x1 + I(x1 * x1), dat = oreDF)
summary(m)
## 
## Call:
## ore.lm(formula = y ~ x1 + I(x1 * x1), data = oreDF)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.149 -0.911 -0.156  0.888  1.894 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.3264     0.4308    3.08   0.0068 ** 
## x1           -0.0640     0.1354   -0.47   0.6428    
## I(x1 * x1)   -0.8392     0.0662  -12.68  4.3e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.28 on 17 degrees of freedom
## Multiple R-squared: 0.912,	Adjusted R-squared: 0.901 
## F-statistic: 87.7 on 2 and 17 DF,  p-value: 1.1e-09
coefficients
##         [,1]   [,2]  [,3]
## [1,] -0.8969 0.1848 1.588

Notice that the "true" coefficients that were used to create the synthetic data are close to the values from the regression. For example, the true "a" is stored in coefficients[1,1] = -0.8969 and is close to the model's I(x1 * x1) coefficient of -0.8392. Not bad given that the model was created from only 20 data points.

quadGraph(modelData, coefficients, m)

The 20 data points, which were calculated from the "true" equation but with noisy irreducible error added, are shown in the graph. The model, estimated by ore.lm() from the 20 noisy data points, is close to the true curve.

Scatter plot of the 20 noisy data points with the true (blue) and fitted (red) quadratic curves

At this point, my job is either complete, or ready to start, depending on your perspective. I'm happy that ore.lm() does a nice job of fitting, so maybe I'm done. But if you remember that my initial goal was to validate that SQL Developer and Oracle Data Miner work with quadratic data, my job has just begun. Now that I have known good quadratic synthetic data in the database, I'll use SQL Developer and Oracle Data Mining to verify that everything is fine.

One more step in R. Create a second Oracle Database table that will be used to test the regression model. 

testData <- genData(nObservations, quad1, coefficients, nFeatures, scale)
oreTestData <- ore.push(testData)
tableName <- paste("QuadraticTest_", nObservations, "_", nFeatures, sep = "")
ore.drop(table = tableName)
ore.create(oreTestData, table = tableName)  

Here is the SQL Developer workflow that will be used. The synthetic data is in the Oracle Database table "QuadraticSample_20_1". The "Regress Build" node will run linear regression on the synthetic data. The test data, which was generated using R in the previous paragraph, is stored in an Oracle Database table named "QuadraticTest_20_1". The Apply node will apply the regression model to the "x1" values from the test data, storing the predicted y values in an Oracle Database table named "QUADTESTRESULTS".

SQL Developer Data Mining Work Flow

So how did it work? A PhD in statistics would quickly tell you, "not well", and might look at you like you're an idiot if you don't know that a Model F Value Statistic of 3.25 isn't good. My more pedestrian approach is to plot the results of applying the model to the test data. 

The predictive confidence of the model that was created is poor:

Predictive Confidence

Pull the test result data into R for viewing:

ore.sync()
ore.attach()
testResults <- ore.pull(QUADTESTRESULTS)
## Warning: ORE object has no unique key - using random order
colnames(testResults)[1] <- "y"
with(testResults, plot(x1, y))
title(main = "Results of Applying Model to Test Data")

 Hmm, that doesn't look parabolic to me:

Linear fit, not quadratic

Now that I'm quite sure that SQL Developer and Oracle Data Mining aren't giving the expected fit, check through the advanced settings:

Advanced Setting

There it is!!

Feature Selection

Set feature generation to use quadratic candidates and re-run the model.

The predictive confidence has improved. 

Bring the new results back into R:

Data viewed from R looks parabolic

Also, your statistician friends will be happy because the new model has a Model F Value Statistic of 124. Exciting, right? 

Now, off to work on parabolic surfaces...

Monday Feb 10, 2014

Finding datasets for South African Heart Disease

I've been working with R recently and have been looking for interesting data sets, for example:

http://statweb.stanford.edu/~tibs/ElemStatLearn/data.html

The following command is documented but fails because R can't follow the chain of redirects: two 302 status codes and a "301 Moved Permanently":

> read.table("http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.data", sep=",",head=T,row.names=1)
Error in file(file, "rt") : cannot open the connection


This worked:

> read.table("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.data", sep=",",head=T,row.names=1)

Even better:

http://cran.r-project.org/web/packages/ElemStatLearn/ElemStatLearn.pdf

> install.packages("ElemStatLearn")
> help(package="ElemStatLearn")
> library( package = ElemStatLearn )
> str(SAheart)
> summary(SAheart)
> data(SAheart)

R attributes and regexpr

I've been working with R recently.

Here is an example of using the match.length attribute that is returned from regexpr:

Three strings in a vector; the first and third include an embedded quoted string:

data=c("<a href=\"ch4.html\">Chapter 1</a>",
       "no quoted string is embedded in this string",
       "<a   href=\"appendix.html\">Appendix</a>")

Use regexpr to locate the embedded strings:

> locations <- regexpr("\"(.*?)\"", data)

Matches are in the first string (at 9 with length 10) and third string (at 11 with length 15):

> locations
[1]  9 -1 11
attr(,"match.length")
[1] 10 -1 15
attr(,"useBytes")
[1] TRUE

Vector from the attribute:

> attr(locations,"match.length")
[1] 10 -1 15

Use substr and the attribute vector to extract the strings:

> quoted_strings=substr( data, 
                         locations, 
                         locations+attr(locations,"match.length")-1 )    
> quoted_strings
[1] "\"ch4.html\""      ""                  "\"appendix.html\""

Maybe you'd like to remove the embedded quote characters from your strings:

> gsub("\"", "", quoted_strings)
[1] "ch4.html"      ""              "appendix.html"

An alternative is to use regmatches:

> regmatches(data,locations)
[1] "\"ch4.html\""      "\"appendix.html\""

Friday Jan 24, 2014

Gathering Database Statistics in a Test Harness

Use of the Automatic Workload Repository (AWR) is fundamental to understanding Oracle Database performance. By default, Oracle Database automatically generates snapshots of the performance data once every hour and retains the statistics in the workload repository for 8 days. You can also manually create snapshots, but this is usually not necessary.

When benchmarking, it is helpful to be able to create an AWR report that covers the exact interval of benchmark execution. Since my benchmarks don't begin or end at the top of the hour, I take an AWR snapshot immediately before and after benchmark execution. At benchmark completion, an AWR report is generated that exactly covers the benchmark execution interval.

Here is a simple shell script "snap.sh" that takes a new snapshot and returns the ID of the snapshot. I call this script before and after benchmark execution.

#!/bin/bash

result=$(sqlplus -S sys/SysPass as sysdba << EOF

set head off
set term off
set echo off
set trim on
set trimspool on
set feedback off
set pagesize 0
exec sys.dbms_workload_repository.create_snapshot;
select max(snap_id) from dba_hist_snapshot;
EOF
)

echo $result


Here is a script "gen_awr.sh". I call this script after benchmark execution with my preferred name for the report, and the before and after snap IDs.

#!/bin/bash
# Usage: gen_awr.sh <ReportName> <BeginSnapId> <EndSnapId>
# The lines in the here-document answer the awrrpt prompts in order:
# report type, number of days, begin snap, end snap, and report name.

sqlplus / as sysdba << @EOF
@$ORACLE_HOME/rdbms/admin/awrrpt
html
1
$2
$3
$1
exit
@EOF
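
For example, a hedged sketch of an invocation, where "MyTest" is a placeholder report name and 123/124 are the snapshot IDs returned by snap.sh before and after the run:

./gen_awr.sh MyTest 123 124

The report is written to the current directory; run.sh below moves it into the per-run statistics directory.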

Here is a script "run.sh" that controls the gathering of data for the benchmark execution environment:


#!/usr/bin/env bash

###
# Usage
###
[ $# -eq 0 ] && { echo "Usage: $0 <TestSize>"; exit 1; }

###
# Set up
###
size=$1
export DateStamp=`date +%y%m%d_%H%M%S`
export DIR=/STATS/`hostname`_${size}_$DateStamp
mkdir -p $DIR

###
# Start gathering statistics
###
vmstat 1 > $DIR/vmstat.out &
vmstat_pid=$!
# Take an AWR snapshot before execution
first=`./snap.sh`

###
# Run the benchmark
###
time sqlplus User/UserPass @RunAll $size > $DIR/std.out 2> $DIR/std.err

###
# Stop gathering statistics
###
# Take an AWR snapshot after execution
last=`./snap.sh`

kill $vmstat_pid

###
# Generate the report
###
# Generate the AWR report
./gen_awr.sh $size $first $last > /dev/null
mv $size.lst $DIR/awr_$size.html


###
# View the AWR report
###
firefox $DIR/awr_$size.html

After many benchmark runs, I have one directory per benchmark execution, named <nodename>_<size>_<date>_<time>, and each directory contains one AWR report that exactly covers the benchmark interval.


Friday Sep 06, 2013

Hadoop on an Oracle SPARC T4-2 Server

I recently configured an Oracle SPARC T4-2 server to store and process a combination of 2 types of data:
  1. Critical and sensitive data. ACID transactions are required. Security is critical. This data needs to be stored in an Oracle Database.
  2. High-volume/low-risk data that needs to be processed using Apache Hadoop. This data is stored in HDFS.

Based on the requirements, I configured the server using a combination of:

  1. Oracle VM Server for SPARC, used for hard partitioning of system resources such as CPU, memory, PCIe buses and devices.
  2. Oracle Solaris Zones to host a Hadoop cluster as shown in Orgad Kimchi's How to Set Up a Hadoop Cluster Using Oracle Solaris Zones

The configuration is shown in the following diagram:

Oracle SPARC T4 Configuration

Hadoop Low CPU utilization:

As you can see in the diagram, a T4 CPU is allocated for Hadoop map/reduce processing. The T4 CPU has 8 cores and 64 virtual processors, enabling it to simultaneously run up to 64 software threads:

# psrinfo -pv
The physical processor has 8 cores and 64 virtual processors (0-63)
  The core has 8 virtual processors (0-7)
  The core has 8 virtual processors (8-15)
  The core has 8 virtual processors (16-23)
  The core has 8 virtual processors (24-31)
  The core has 8 virtual processors (32-39)
  The core has 8 virtual processors (40-47)
  The core has 8 virtual processors (48-55)
  The core has 8 virtual processors (56-63)
    SPARC-T4 (chipid 0, clock 2848 MHz)

To test the Hadoop configuration, I created a large Hive table and launched Hadoop map/reduce processes using Hive:

INSERT into table Table2
  SELECT ColumnA, SUM(ColumnB)
  FROM Table1
  GROUP BY ColumnA;

Out-of-the-box Hadoop performance was not well tuned. Simple Hive functions were not able to take advantage of the T4's CPU resources.

While the Hive job was running, I ran iostat in the Global Zone and could see that:

  1. The CPU was not very busy
  2. The 3 data-node disks would spike, but were not stressed. 

# iostat -MmxPznc 60
...
     cpu
 us sy wt id
 12  1  0 88
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    2.2   19.1    0.0    0.1  0.0  0.0    0.0    1.8   0   1 (Primary LDOM root)
   23.4    2.9    3.0    0.0  0.0  0.0    0.0    1.3   0   2
(data-node1)
  105.5    5.5   11.7    0.0  0.0  0.4    0.0    3.2   0  10 (data-node2)
    0.0   23.7    0.0    0.3  0.0  0.0    0.0    1.9   0   1 
(Guest LDOM root)
   24.2    2.9    1.9    0.0  0.0  0.0    0.0    1.2   0   2
(data-node3)
    7.2   22.9    0.4    0.3  0.0  0.1    0.0    5.0   0   6 (/ZONES)

     cpu
 us sy wt id
 12  1  0 87
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    2.3   19.2    0.0    0.1  0.0  0.0    0.0    1.8   0   1
(Primary LDOM root)
    3.8    4.0    0.4    0.0  0.0  0.0    0.0    1.4   0   1
(data-node1)
   47.9    5.4    4.1    0.0  0.0  0.1    0.0    1.6   0   3 
(data-node2)
    0.0   25.6    0.0    0.3  0.0  0.0    0.0    1.5   0   1
(Guest LDOM root)
   38.2    3.9    3.2    0.0  0.0  0.1    0.0    1.4   0   3
(data-node3)
    9.5   21.9    0.6    0.3  0.0  0.1    0.0    4.4   0   6
(/ZONES)

     cpu
 us sy wt id
 11  1  0 88
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    5.3   18.6    0.1    0.1  0.0  0.1    0.0    4.4   0   4
(Primary LDOM root)
    0.5    3.6    0.0    0.0  0.0  0.0    0.0    1.1   0   0 
(data-node1)
    0.4    3.6    0.0    0.0  0.0  0.0    0.0    0.8   0   0
(data-node2)
    0.0   23.5    0.0    0.3  0.0  0.0    0.0    1.3   0   1
(Guest LDOM root)
  124.9    7.2   10.3    0.0  0.0  0.2    0.0    1.8   0  10 (data-node3)
    8.5   24.4    0.6    0.4  0.0  0.2    0.0    4.6   0   6
(/ZONES)

To understand the low CPU activity, I looked at the active software threads on the Hadoop cluster from the Global Zone. Only six (6) threads were busy, one thread per map/reduce process.


$ prstat -mL
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID 

 27529 hadoop    98 0.5 0.0 0.0 0.0 1.0 0.0 0.0  14  46  3K   0 java/2
 27534 hadoop    98 0.5 0.0 0.0 0.0 1.1 0.0 0.0  13  46  3K   0 java/2
 27575 hadoop    98 0.5 0.0 0.0 0.0 1.1 0.0 0.0  15  53  3K   0 java/2
 27577 hadoop    98 0.6 0.0 0.0 0.0 1.2 0.0 0.0  14  53  3K   0 java/2
 27576 hadoop    98 0.6 0.0 0.0 0.0 1.1 0.8 0.0  46  57  3K   0 java/2
 27578 hadoop    97 0.6 0.0 0.0 0.0 1.1 1.0 0.0  53  53  4K   0 java/2
 27575 hadoop   5.8 0.0 0.0 0.0 0.0  94 0.0 0.0  19   4  26   0 java/32
 27578 hadoop   5.6 0.0 0.0 0.0 0.0  94 0.0 0.0  19   5  35   0 java/33
 27529 hadoop   5.6 0.0 0.0 0.0 0.0  94 0.0 0.0   2   8   2   0 java/32
 27576 hadoop   5.5 0.0 0.0 0.0 0.0  95 0.0 0.0  21   6  36   0 java/33
 27028 hadoop   1.2 1.3 0.0 0.0 0.0 0.0  97 0.1 254   5  2K   0 java/87
 27028 hadoop   1.2 1.2 0.0 0.0 0.0 0.0  97 0.1 251   2  2K   0 java/86
   958 root     1.9 0.1 0.0 0.0 0.0  98 0.4 0.0   9   4  27   0 fmd/36
 27005 hadoop   1.2 0.8 0.0 0.0 0.0 0.0  98 0.0  99   2  2K   0 java/86
 27005 hadoop   1.1 0.8 0.0 0.0 0.0 0.0  98 0.0  98   3  2K   0 java/87
 11956 root     1.8 0.1 0.0 0.0 0.0 0.0  98 0.0  44   3 882   0 Xvnc/1
 27016 hadoop   1.0 0.8 0.0 0.0 0.0 0.0  98 0.0  95   2  2K   0 java/86
 27577 hadoop   1.7 0.0 0.0 0.0 0.0  98 0.0 0.0  18   1  28   0 java/32
 27016 hadoop   1.0 0.7 0.0 0.0 0.0 0.0  98 0.0  93   2  2K   0 java/85
 27576 hadoop   1.7 0.0 0.0 0.0 0.0  98 0.0 0.0  20   1  33   0 java/32
 27577 hadoop   1.6 0.0 0.0 0.0 0.0  98 0.0 0.0  18   0  31   0 java/33
Total: 619 processes, 4548 lwps, load averages: 3.46, 8.63, 27.90


From the Hadoop JobTracker UI, it could be seen that Hadoop was only scheduling a few of the tasks.

SixMapProcesses.jpg

This was because there were only 6 Map and 6 Reduce job slots (see the "Map Task Capacity" and "Reduce Task Capacity" columns):

SixJobSlots.jpg

Increasing the number of Job Slots:

The T4 CPU is able to run 64 software threads simultaneously, and over-subscribing the Hadoop CPUs is recommended. I enabled 25 job slots per node, for a total of 75, by adding these properties to mapred-site.xml:


  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>25</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>25</value>
  </property>
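
TaskTrackers read these limits at startup, so the MapReduce daemons need a restart to pick them up. A minimal sketch, assuming the Hadoop 1.0.4 layout used above, with HADOOP_PREFIX set and data-node1 through data-node3 as the node names (adjust for your cluster):

# Push the updated configuration to each data node (node names are placeholders)
for node in data-node1 data-node2 data-node3; do
  scp $HADOOP_PREFIX/conf/mapred-site.xml hadoop@$node:$HADOOP_PREFIX/conf/
done

# Restart the JobTracker and all TaskTrackers
$HADOOP_PREFIX/bin/stop-mapred.sh
$HADOOP_PREFIX/bin/start-mapred.sh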

Improved Hadoop CPU utilization:

After adding job slots, while running iostat in the Global Zone, I could see that:

  1. The CPU was very busy
  2. The 3 data-node disks were active (but not stressed)

# iostat -MmxPznc 30
...
 us sy wt id

 98  2  0  0
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    1.2   20.6    0.0    0.1  0.0  0.0    0.0    1.5   0   1 c0t5000CCA025261B74d0s0
    0.9   12.9    0.0    0.1  0.0  0.0    0.0    3.2   0   2 c0t5000CCA025311930d0s0
    1.0    9.7    0.0    0.0  0.0  0.0    0.0    1.2   0   1 c0t5000CCA02530B058d0s0
    0.0   22.4    0.0    0.2  0.0  0.0    0.0    1.0   0   1 c0t5000CCA0250CB198d0s0
    1.3   15.7    0.0    0.1  0.0  0.0    0.0    1.1   0   1 c0t5000CCA025324D98d0s0
    2.8   28.3    0.1    0.4  0.0  0.1    0.0    3.5   0   4 c0t5000CCA0253C11B0d0s0
     cpu
 us sy wt id
 98  2  0  0
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    1.3   21.1    0.0    0.1  0.0  0.1    0.0    3.0   0   3 c0t5000CCA025261B74d0s0
    0.6    7.2    0.0    0.0  0.0  0.0    0.0    1.3   0   0 c0t5000CCA025311930d0s0
    0.8    5.7    0.0    0.0  0.0  0.0    0.0    1.5   0   0 c0t5000CCA02530B058d0s0
    0.0   22.6    0.0    0.3  0.0  0.0    0.0    1.1   0   1 c0t5000CCA0250CB198d0s0
    0.5    7.7    0.0    0.1  0.0  0.0    0.0    1.1   0   0 c0t5000CCA025324D98d0s0
    2.2   24.7    0.1    0.3  0.0  0.1    0.0    2.4   0   2 c0t5000CCA0253C11B0d0s0
     cpu
 us sy wt id
 98  2  0  0
                    extended device statistics             
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.9   20.6    0.0    0.1  0.0  0.1    0.0    2.6   0   2 c0t5000CCA025261B74d0s0
    0.5    7.1    0.0    0.0  0.0  0.0    0.0    0.9   0   0 c0t5000CCA025311930d0s0
    4.8   10.3    0.4    0.0  0.0  0.0    0.0    1.2   0   1 c0t5000CCA02530B058d0s0
    0.0   21.7    0.0    0.2  0.0  0.0    0.0    1.1   0   1 c0t5000CCA0250CB198d0s0
    2.4   15.1    0.2    0.1  0.0  0.0    0.0    1.2   0   1 c0t5000CCA025324D98d0s0
    4.8   25.3    0.5    0.4  0.0  0.2    0.0    5.4   0   5 c0t5000CCA0253C11B0d0s0

Now all 69 map jobs can run in parallel:

69JobsRunning.jpg

This is because there are now 75 map job slots (see the "Map Task Capacity" and "Reduce Task Capacity" columns):

75slots.jpg

Hadoop Memory Usage:

For this simple case, I found that allocating additional memory to the Java map/reduce processes did not help, but for reference, this is how I increased the memory available to the Hive map/reduce processes:

$ hive --hiveconf mapred.map.child.java.opts="-Xmx1000m -Xms500m"




Tuesday Aug 20, 2013

Error in invoking target 'all_no_orcl' of makefile

While installing the Oracle 12c RDBMS on Solaris 11.1, I got this:

NeedAssembler3.png


In /u01/app/oraInventory/logs/installActions2013-08-20_02-25-50PM.log, I found:

INFO: sh[1]: /usr/ccs/bin/as: not found [No such file or directory]

INFO: *** Error code 127
The following command caused the error:
cd /u01/app/hadoop/product/12.1.0/dbhome_1/rdbms/lib/; \
/usr/ccs/bin/as -P -m64 config.s; \
/usr/ccs/bin/ar r /u01/app/hadoop/product/12.1.0/dbhome_1/lib/libserver12.a /u01/app/hadoop/product/12.1.0/dbhome_1/rdbms/lib/config.o

INFO: make: Fatal error: Command failed for target `/u01/app/hadoop/product/12.1.0/dbhome_1/rdbms/lib/config.o'


Thanks to Giri: https://blogs.oracle.com/mandalika/entry/solaris_tips_assembler_file_descriptors

The solution is to install the assembler:

# pkg install pkg:/developer/assembler
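
Before clicking retry, a quick sanity check (using the package name installed above and the assembler path from the error message) confirms that everything is in place:

# pkg list developer/assembler
# ls -l /usr/ccs/bin/as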

After installing the assembler, I clicked on the "retry" button and the installation successfully completed.

Tuesday Aug 13, 2013

Hive 0.11 (May 15, 2013) and Rank() within a category

This is a follow up to a Stack Overflow question HiveQL and rank():

libjack recommended that I upgrade to Hive 0.11 (May 15, 2013) to take advantage of Windowing and Analytics functions. His recommendation worked immediately, but it took a while for me to find the right syntax to sort within categories. This blog entry records the correct syntax.


1. Sales Rep data

Here is a CSV file with Sales Rep data:

$ more reps.csv
1,William,2
2,Nadia,1
3,Daniel,2
4,Jana,1


Create a Hive table for the Sales Rep data:

create table SalesRep (
  RepID INT,
  RepName STRING,
  Territory INT
  )
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';

... and load the CSV into the Hive Sales Rep table:

LOAD DATA
 LOCAL INPATH '/home/hadoop/MyDemo/reps.csv'
 INTO TABLE SalesRep;



2. Purchase Order data

Here is a CSV file with PO data:

$ more purchases.csv
4,1,100
2,2,200
2,3,600
3,4,80
4,5,120
1,6,170
3,7,140


Create a Hive table for the PO's:

create table purchases (
  SalesRepId INT,
  PurchaseOrderId INT,
  Amount INT
  )
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';


... and load CSV into the Hive PO table:

LOAD DATA
 LOCAL INPATH '/home/hadoop/MyDemo/purchases.csv'
 INTO TABLE purchases;



3. Hive JOIN

So this is the underlying data that is being worked with:

SELECT p.PurchaseOrderId, s.RepName, p.amount, s.Territory
FROM purchases p JOIN SalesRep s
WHERE p.SalesRepId = s.RepID;


PO ID  Rep      Amount  Territory
1      Jana     100     1
2      Nadia    200     1
3      Nadia    600     1
4      Daniel   80      2
5      Jana     120     1
6      William  170     2
7      Daniel   140     2


4. Hive Rank by Volume only

SELECT
  s.RepName, s.Territory, V.volume,
rank() over (ORDER BY V.volume DESC) as rank
FROM
  SalesRep s
  JOIN
    ( SELECT
      SalesRepId, SUM(amount) as Volume
      FROM purchases
      GROUP BY SalesRepId) V
  WHERE V.SalesRepId = s.RepID
  ORDER BY V.volume DESC;



Rep      Territory  Amount  Rank
Nadia    1          800     1
Daniel   2          220     2
Jana     1          220     2
William  2          170     4

The ranking over the entire data set - Daniel is tied for second among all Reps.


5. Hive Rank within Territory, by Volume

SELECT
  s.RepName, s.Territory, V.volume,
  rank() over (PARTITION BY s.Territory ORDER BY V.volume DESC) as rank
FROM
  SalesRep s
  JOIN
    ( SELECT
      SalesRepId, SUM(amount) as Volume
      FROM purchases
      GROUP BY SalesRepId) V
  WHERE V.SalesRepId = s.RepID
  ORDER BY V.volume DESC;



Rep      Territory  Amount  Rank
Nadia    1          800     1
Jana     1          220     2
Daniel   2          220     1
William  2          170     2

The ranking is within the territory - Daniel is the best in his territory.


6. FYI: this example was developed on a SPARC T4 server with Oracle Solaris 11 and Apache Hadoop 1.0.4

Tuesday Jul 23, 2013

Ganglia on Solaris 11.1

Here are some notes that I took while building Ganglia Core 3.6.0 and Ganglia Web 3.5.7 with Solaris Studio 12.3 and installing on Solaris 11.1. These notes are only intended to augment (not replace) other Ganglia install guides.

1) Add a ganglia user to build with (this is an optional step; you may build as any user, and gmond will run as root)

    # useradd -d localhost:/export/home/ganglia -m ganglia
    # passwd ganglia
    # usermod -R root ganglia
    # echo "ganglia ALL=(ALL) ALL" > /etc/sudoers.d/ganglia
    # chmod 440 /etc/sudoers.d/ganglia
# su - ganglia



2) These packages are needed:

    $ sudo pkg install system/header gperf glibmm apache-22 php-53 apache-php53 apr-util-13 libconfuse rrdtool
$ sudo svcadm enable apache22

3) Download and unpack:

$ gzip -dc ganglia-3.6.0.tar.gz | tar xvf -

$ gzip -dc ganglia-web-3.5.7.tar.gz | tar xvf -

$ cd ganglia-3.6.0


4) Compiler error when building libgmond.c: "default_conf.h", line 75: invalid directive. In lib/default_conf.h, remove the 3 lines that contain a '#' character:

udp_recv_channel {\n\
mcast_join = 239.2.11.71\n\
port = 8649\n\
  bind = 239.2.11.71\n\
retry_bind = true\n\
  # Size of the UDP buffer. If you are handling lots of metrics you really\n\
  # should bump it up to e.g. 10MB or even higher.\n\
# buffer = 10485760\n\
}\n\

5) Compiler error: "data_thread.c", line 143: undefined symbol: FIONREAD. In gmetad/data_thread.c, add:

#include <sys/filio.h>

6) Runtime error: "Cannot load /usr/local/lib/ganglia/modcpu.so metric module: ld.so.1: gmond: fatal: relocation error: file /usr/local/lib/ganglia/modcpu.so: symbol cpu_steal_func: referenced symbol not found". Add this stub to the bottom of ./libmetrics/solaris/metrics.c:

g_val_t
cpu_steal_func ( void )
{
  static g_val_t val=0;
  return val;
}


7) Build Ganglia Core 3.6.0 without gmetad for all of the machines in the cluster, except the primary node:

$ export PATH=/usr/local/bin:/usr/local/sbin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/lib:/usr/apr/1.3/lib
$ export PKG_CONFIG_PATH=/usr/apr/1.3/lib/pkgconfig

$ cd ganglia-3.6.0

$ ./configure

$ make

8) Install Ganglia Core 3.6.0 (on all machines in the cluster):

$ sudo make install
$ gmond --default_config > /tmp/gmond.conf
$ sudo cp /tmp/gmond.conf /usr/local/etc/gmond.conf

$ sudo vi /usr/local/etc/gmond.conf  # (Remove these lines)

metric {
  name = "cpu_steal"
  value_threshold = "1.0"
  title = "CPU steal"
}

9) Start gmond as root (on all machines in the cluster):

# export PATH=/usr/local/bin:/usr/local/sbin:$PATH
# export LD_LIBRARY_PATH=/usr/local/lib:/usr/apr/1.3/lib
# gmond
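
To sanity-check that gmond is up and serving metrics, you can pull the XML dump from its default tcp_accept_channel port (8649 in the generated gmond.conf); any TCP client will do, telnet is simply what is usually at hand:

# telnet localhost 8649

A healthy gmond immediately prints a <GANGLIA_XML> document and closes the connection.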

10) Build Ganglia Core 3.6.0 with gmetad for the primary node:

    $ export PATH=/usr/local/bin:/usr/local/sbin:$PATH
    $ export LD_LIBRARY_PATH=/usr/local/lib:/usr/apr/1.3/lib
    $ export PKG_CONFIG_PATH=/usr/apr/1.3/lib/pkgconfig

    $ cd ganglia-3.6.0

    $ ./configure --with-gmetad

    $ make


11) Install ganglia-web-3.5.7 (on the primary node)

    $ cd ganglia-web-3.5.7

    $ vi Makefile # Set these variables

    GDESTDIR = /var/apache2/2.2/htdocs/
    APACHE_USER =  webservd


    $ sudo make install


    $ sudo mkdir -p /var/lib/ganglia/rrds
$ sudo chown -Rh nobody:nobody /var/lib/ganglia

12) Start gmond and gmetad on the primary node


13) You need to remove the "It works" page so that index.php is the default:

    # sudo rm /var/apache2/2.2/htdocs/index.html

Now you can visit the primary node with a web browser.

Tuesday May 28, 2013

Adding users in Solaris 11 with power like the initial account

During Solaris 11.1 installation, the system administrator is prompted for a user name and password which will be used to create an unprivileged account. For security reasons, by default, root is a role, not a user; therefore the initial login can't be to a root account. The first login must use the initial unprivileged account. Later, the initial unprivileged user can acquire privileges through either "su" or "sudo".

For enterprise class deployments, the system administrator should be familiar with RBAC and create users with least privileges.

In contrast, I'm working in a lab environment and want to be able to simply and quickly create new users with power like the initial user. With Solaris 10, this was straightforward, but Solaris 11 adds a couple of twists.

Create a new user jenny in Solaris 11:

# useradd -d localhost:/export/home/jenny -m jenny
# passwd jenny

Jenny can't su to root:

jenny@app61:~$ su -
Password:
Roles can only be assumed by authorized users
su: Sorry

Because Jenny doesn't have that role:

jenny@app61:~$ roles
No roles

Give her the role:

root@app61:~# usermod -R root jenny


And then Jenny can su to root:

jenny@app61:~$ roles
root

jenny@app61:~$ su -
Password:
Oracle Corporation      SunOS 5.11      11.1    September 2012
You have new mail.
root@app61:~#


But even when jenny has the root role, she can't use sudo:

jenny@app61:~$ sudo -l
Password:
Sorry, user jenny may not run sudo on app61.

jenny@app61:~$ sudo touch /jenny
Password:
jenny is not in the sudoers file.  This incident will be reported.


Oh no, she is in big trouble, now.

User jeff was created as the initial account, and he can use sudo:

jeff@app61:~$ sudo -l
Password:
User jeff may run the following commands on this host:
    (ALL) ALL


But jeff isn't in the sudoers file:

root@app61:~# grep jeff /etc/sudoers

So how do you make jenny as powerful as jeff with respect to sudo?

Turns out that jeff, created during the Solaris installation, is in here:

root@app61:~# cat /etc/sudoers.d/svc-system-config-user
jeff ALL=(ALL) ALL


My coworker, Andrew, offers the following advice: "The last line of /etc/sudoers is a directive to read "drop-in" files from the /etc/sudoers.d directory. You can still edit /etc/sudoers. It may be better to leave svc-system-config-user alone and create another drop-in file for local edits. If you want to edit sudoers as part of an application install then you should create a drop-in for the application - this makes the edits easy to undo if you remove the application. If you have multiple drop-ins in /etc/sudoers.d they are processed in alphabetical (sorted) order. There are restrictions on file names and permissions for drop-ins. The permissions must be 440 (read only for owner and group) and the file name can't have a dot or ~ in it. These are in the very long man page."
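
Following that advice, here is a minimal sketch of a drop-in that gives jenny the same sudo rights as jeff (run as root; note the 440 permissions and the no-dot file name rules mentioned above):

# echo "jenny ALL=(ALL) ALL" > /etc/sudoers.d/jenny
# chmod 440 /etc/sudoers.d/jenny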



Wednesday May 22, 2013

Debugging Hadoop using Solaris Studio in a Solaris 11 Zone


I've found Orgad Kimchi's How to Set Up a Hadoop Cluster Using Oracle Solaris Zones to be very useful; however, for a development environment, it is too complex. When map/reduce tasks are running in a clustered environment, it is challenging to isolate bugs. Debugging is easier when working within a standalone Hadoop installation. I've put the following instructions together for installing a standalone Hadoop configuration in a Solaris Zone with Solaris Studio for application development.

A lovely feature of Solaris is that your global zone may simultaneously host both a Hadoop cluster, set up in a manner similar to Orgad's instructions, and a development zone running a standalone Hadoop configuration.

Create the Zone

These instructions assume that Solaris 11.1 is already running in the Global Zone.

Add the Hadoop Studio Zone

# dladm create-vnic -l net0 hadoop_studio

# zonecfg -z hadoop-studio
Use 'create' to begin configuring a new zone.
zonecfg:hadoop-studio> create
create: Using system default template 'SYSdefault'
zonecfg:hadoop-studio> set zonepath=/ZONES/hadoop-studio
zonecfg:hadoop-studio> add net
zonecfg:hadoop-studio:net> set physical=hadoop_studio
zonecfg:hadoop-studio:net> end
zonecfg:hadoop-studio> verify
zonecfg:hadoop-studio> commit
zonecfg:hadoop-studio> exit


Install and boot the zone

# zoneadm -z hadoop-studio install
# zoneadm -z hadoop-studio boot

Login to the zone console to set the network, time, root password, and unprivileged user.

# zlogin -C hadoop-studio

After the zone's initial configuration steps, nothing else needs to be done from within the global zone. You should be able to log into the Hadoop Studio zone with ssh as the unprivileged user and gain privileges with "su" and "sudo".

All of the remaining instructions are from inside the Hadoop Studio Zone.


Install extra Solaris software and set up the development environment

I like to start with both JDKs installed and not rely on the "/usr/java" symbolic link:

# pkg install  jdk-6
# pkg install --accept jdk-7


Verify the JDKs:

# /usr/jdk/instances/jdk1.6.0/bin/java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) Server VM (build 20.10-b01, mixed mode)

# /usr/jdk/instances/jdk1.7.0/bin/java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)


Add VNC Remote Desktop software

# pkg install --accept solaris-desktop


Create a Hadoop user:

# groupadd hadoop
# useradd -d localhost:/export/home/hadoop -m -g hadoop hadoop
# passwd hadoop
# usermod -R root hadoop


Edit /home/hadoop/.bashrc:

export PATH=/usr/bin:/usr/sbin
export PAGER="/usr/bin/less -ins"
typeset +x PS1="\u@\h:\w\\$ "

# Hadoop
export HADOOP_PREFIX=/home/hadoop/hadoop
export PATH=$HADOOP_PREFIX/bin:$PATH

# Java
export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
export PATH=$JAVA_HOME/bin:$PATH

# Studio
export PATH=$PATH:/opt/solarisstudio12.3/bin
alias solstudio='solstudio --jdkhome /usr/jdk/instances/jdk1.6.0'

Edit /home/hadoop/.bash_profile:

. ~/.bashrc

And make sure that the ownership and permission make sense:

# ls -l /home/hadoop/.bash*      
-rw-r--r--   1 hadoop   hadoop        12 May 22 05:24 /home/hadoop/.bash_profile
-rw-r--r--   1 hadoop   hadoop       372 May 22 05:24 /home/hadoop/.bashrc


Now is a good time to start a remote VNC desktop for this zone:

# su - hadoop

$ vncserver


You will require a password to access your desktops.

Password:
Verify:
xauth:  file /home/hadoop/.Xauthority does not exist

New 'hadoop-studio:1 ()' desktop is hadoop-studio:1

Creating default startup script /home/hadoop/.vnc/xstartup
Starting applications specified in /home/hadoop/.vnc/xstartup
Log file is /home/hadoop/.vnc/hadoop-studio:1.log

Access the remote desktop with your favorite VNC client

The default 10-minute screensaver timeout on the VNC desktop is too short for my preferences:

System -> Preferences -> Screensaver
  Display Modes:
  Blank after: 100
  Close the window (I always look for a "save" button, but no, just close the window without explicitly saving.)



Download and Install Hadoop

For this article, I used the "12 October, 2012 Release 1.0.4" release. Download the Hadoop tarball and copy it into the home directory of hadoop:

$ ls -l hadoop-1.0.4.tar.gz
-rw-r--r--   1 hadoop   hadoop   62793050 May 21 12:03 hadoop-1.0.4.tar.gz

Unpack the tarball into the home directory of the hadoop user:

$ gzip -dc hadoop-1.0.4.tar.gz  | tar -xvf -
$ mv hadoop-1.0.4 hadoop


Hadoop comes pre-configured in Standalone Mode

Edit /home/hadoop/hadoop/conf/hadoop-env.sh, and set JAVA_HOME:

export JAVA_HOME=/usr/jdk/instances/jdk1.6.0

That is all. Now, you can run a Hadoop example:

$ hadoop jar hadoop/hadoop-examples-1.0.4.jar pi 2 10
Number of Maps  = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
...
Job Finished in 10.359 seconds
Estimated value of Pi is 3.80000000000000000000



Install Solaris Studio:

Visit https://pkg-register.oracle.com/ to obtain Oracle_Solaris_Studio_Support.key.pem, Oracle_Solaris_Studio_Support.certificate.pem and follow the instructions for "pkg set-publisher" and "pkg update" or "pkg install"

# sudo pkg set-publisher \
          -k /var/pkg/ssl/Oracle_Solaris_Studio_Support.key.pem \
          -c /var/pkg/ssl/Oracle_Solaris_Studio_Support.certificate.pem \
          -G '*' -g https://pkg.oracle.com/solarisstudio/support solarisstudio

# pkg install developer/solarisstudio-123/*


If your network requires a proxy, you will need to set the proxy before starting Solaris Studio:

proxy.jpg


Start Solaris Studio:

$ solstudio

(Notice the alias in .bashrc that adds --jdkhome to the solstudio start up command.)

Go to "Tools -> Plugins.

Click on "Reload Catalog"

Load the Java SE plugins. I ran into a problem when the Maven plug-in was installed, something that I should diagnose at a future date.

plugins.jpg



Create a New Project:

File -> New Project

Step 1:
- Category: Java
- Project: Java Application
- Next

NewProject.jpg

Step 2: Fill in similar to this:

NewJavaApplication.jpg

Copy the example source into the project:

$ cp -r \
    $HADOOP_PREFIX/src/examples/org/apache/hadoop/examples/* \
    ~/SolStudioProjects/examples/src/org/apache/hadoop/examples/


Starting to look like a development environment:

DevEnvironment.jpg


Modify the Project to compile with Hadoop jars. Right-click on the project and select "Properties"

properties.jpg


Add in the necessary Hadoop compile jars:

CompileJars.jpg

I found that I needed these jars at run time:

RunJars.jpg

Add Program Arguments (2 10):

Arguments.jpg

Now, if you click on the "Run" button, PiEstimators will run inside the IDE:

PiRuns.jpg

And the set up behaves as expected if you set a break point and click on "Debug":

debug.jpg

Tuesday May 21, 2013

non-interactive zone configuration

When creating new Solaris zones, at initial boot up, the system administrator is prompted for the new hostname, network settings, etc. of the new zone. I get tired of the brittle process of manually entering the initial settings and I prefer to automate the process. I had previously figured out the process for Solaris 10, but I've only recently figured out the process for Solaris 11.



As a review, with Solaris 10, use your favorite editor to create a sysidcfg file:

system_locale=C
terminal=dtterm
security_policy=NONE
network_interface=primary {
                hostname=app-41
}
name_service=DNS {
    domain_name=us.mycorp.com
    name_server=232.23.233.33,154.45.155.15,77.88.21.211
    search=us.mycorp.com,yourcorp.com,thecorp.com
}
nfs4_domain=dynamic
timezone=US/Pacific
root_password=xOV2PpE67YUzY


1) Solaris 10 Install: Using sysidcfg to avoid answering the configuration questions in a newly installed zone:

After the "zoneadm -z app-41 install" you can copy the sysidcfg file to "/ZONES/app-41/root/etc/sysidcfg"  (assuming your "zonepath" is "/ZONES/app-41") and the initial boot process will read the settings from the file and not prompt the system administrator to manually enter the settings.

2) Solaris 10 Clone: Using sysidcfg when cloning the zone 

I used a similar trick on Solaris 10 when cloning old zone "app-41 to new zone "app-44":

#  zonecfg -z app-41 export | sed -e 's/app-41/app-44/g' | zonecfg -z app-44
#  zoneadm -z app-44 clone app-41
#  cat /ZONES/app-41/root/etc/sysidcfg | sed -e 's/app-41/app-44/g' > /ZONES/app-44/root/etc/sysidcfg
#  zoneadm -z app-44 boot




With Solaris 11, instead of a small, human-readable file containing the configuration information, the information is contained in an XML file that would be difficult to create with an editor. Instead, create the initial profile by executing "sysconfig":

# sysconfig create-profile -o sc_profile.xml
# mkdir /root/profiles/app-61
# mv sc_profile.xml /root/profiles/app-61/sc_profile.xml

The new XML format is longer so I won't include it in this blog entry and it is left as an exercise for the reader to review the file that has been created.

1) Solaris 11 Install

# dladm create-vnic -l net0 app_61

# zonecfg -z app-61
Use 'create' to begin configuring a new zone.
zonecfg:p3231-zone61> create
create: Using system default template 'SYSdefault'
zonecfg:app-61> set zonepath=/ZONES/app-61
zonecfg:app-61> add net
zonecfg:app-61:net> set physical=app_61
zonecfg:app-61:net> end
zonecfg:app-61> verify
zonecfg:app-61> commit
zonecfg:app-61> exit

# zoneadm -z app-61 install -c /root/profiles/app-61
# zoneadm -z app-61 boot
# zlogin -C app-61


2) Solaris 11 Clone: If you want to clone app-61 to app-62 and have an existing sc_profile.xml, you can re-use most of the settings and only adjust what has changed:


# dladm create-vnic -l net0 app_62

# zoneadm -z app-61 halt

# mkdir /root/profiles/app-62

# sed \
-e 's/app-61/app-62/g' \
-e 's/app_61/app_62/g' \
-e 's/11.22.33.61/11.22.33.62/g' \
< /root/profiles/app-61/sc_profile.xml \
> /root/profiles/app-62/sc_profile.xml

# zonecfg -z app-61 export | sed -e 's/61/62/g' | zonecfg -z app-62

# zoneadm -z app-62 clone -c /root/profiles/app-62 app-61
# zoneadm -z app-62 boot
# zlogin -C app-62

I hope this trick saves you some time and makes your process less brittle.


Thursday May 16, 2013

Hadoop Java Error logs

I was having trouble isolating a problem with "reduce" tasks running on Hadoop slave servers. 

After poking around on the Hadoop slave, I found an interesting lead in /var/log/hadoop/userlogs/job_201302111641_0057/attempt_201302111641_0057_r_000001_1/stdout:

$ cat /tmp/hadoop-hadoop/mapred/local/userlogs/job_201302111641_0059/attempt_201302111641_0059_r_000001_1/stdout

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xfe67cb31, pid=25828, tid=2
#
# JRE version: 6.0_35-b10
# Java VM: Java HotSpot(TM) Server VM (20.10-b01 mixed mode solaris-x86 )
# Problematic frame:
# C  [libc.so.1+0xbcb31]  pthread_mutex_trylock+0x29
#
# An error report file with more information is saved as:
# /tmp/hadoop-hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201302111641_0059/attempt_201302111641_0059_r_000001_1/work/hs_err_pid25828.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

The HotSpot crash log (hs_err_pid25828.log in my case) will be very interesting because it contains information obtained at the time of the fatal error, including, where possible:

  • The operating exception or signal that provoked the fatal error
  • Version and configuration information
  • Details on the thread that provoked the fatal error and thread's stack trace
  • The list of running threads and their state
  • Summary information about the heap
  • The list of native libraries loaded
  • Command line arguments
  • Environment variables
  • Details about the operating system and CPU

Great, but hs_err_pid25654.log had been cleaned up before I could get to it. In fact, I found that the hs_err_pid.log files were available for less than a minute and they were always gone before I could capture one.

To try to retain the Java error log file, my first incorrect guess was:

 <property>
   <name>keep.failed.task.files</name>
   <value>true</value>
 </property>


My next approach was to add "-XX:ErrorFile=/tmp/hs_err_pid%p.log" to the Java command line for the reduce task.

When I tried adding the Java option to HADOOP_OPTS in /usr/local/hadoop/conf/hadoop-env.sh, I realized that this setting isn't applied to the Map and Reduce Task JVMs.

Finally, I found that adding the Java option to the mapred.child.java.opts property in mapred-site.xml WORKED!!

$ cat /usr/local/hadoop/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

     <property>
         <name>mapred.job.tracker</name>
         <value>p3231-name-node:8021</value>
     </property>

     <property>
         <name>mapred.child.java.opts</name>
         <value>-XX:ErrorFile=/tmp/hs_err_pid%p.log</value>
     </property>


</configuration>


Now I can view the Java error logs on my Hadoop slaves:

$ ls -l /tmp/*err*
-rw-r--r--   1 hadoop   hadoop     15626 May 16 15:42 /tmp/hs_err_pid10028.log
-rw-r--r--   1 hadoop   hadoop     15795 May 16 15:43 /tmp/hs_err_pid10232.log

Tuesday Dec 04, 2012

Solaris 11 VNC Server is "blurry" or "smeared"

I've been annoyed by the quality of the image that is displayed by my VNC viewer when I visit a Solaris 11 VNC server. How should I describe the image? Blurry? Grainy? Smeared? Low resolution? Compressed? Badly encoded?

This is what I have gotten used to seeing on Solaris 11:

Solaris 11 Blurry VNC image

This is not a problem for me when I view Solaris 10 VNC servers. I've finally taken the time to investigate, and the solution is simple. On the VNC client, don't allow "Tight" encoding.

My VNC Viewer will negotiate to Tight encoding if it is available. When negotiating with the Solaris 10 VNC server, Tight is not a supported option, so the Solaris 10 server and my client will agree on ZRLE. 
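
How you disable Tight encoding depends on the viewer. As a hedged example, with a TigerVNC-style vncviewer (the option name and the host below are assumptions; check your own client's documentation):

$ vncviewer -PreferredEncoding ZRLE solaris11-server:1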

Now that I have disabled Tight encoding on my VNC client, the Solaris 11 VNC server looks much better:

Solaris 11 crisp VNC image

How should I describe the display when my VNC client is forced to negotiate to ZRLE encoding with the Solaris 11 VNC Server? Crisp? Clear? Higher resolution? Using a lossless compression algorithm?

When I'm on a low-bandwidth connection, I may re-enable Tight compression on my laptop. In the meantime, the ZRLE encoding is sufficient for a coast-to-coast desktop, through the corporate firewall, encrypted by VPN, through my ISP and onto my laptop. YMMV.

Tuesday Nov 27, 2012

ZFS for Database Log Files


I've been troubled by drop outs in CPU usage in my application server, characterized by the CPUs suddenly going from close to 90% CPU busy to almost completely CPU idle for a few seconds. Here is an example of a drop out as shown by a snippet of vmstat data taken while the application server is under a heavy workload.

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s3 s4 s5 s6   in   sy   cs us sy id
 1 0 0 130160176 116381952 0 16 0 0 0 0  0  0  0  0  0 207377 117715 203884 70 21 9
 12 0 0 130160160 116381936 0 25 0 0 0 0 0  0  0  0  0 200413 117162 197250 70 20 9
 11 0 0 130160176 116381920 0 16 0 0 0 0 0  0  1  0  0 203150 119365 200249 72 21 7
 8 0 0 130160176 116377808 0 19 0 0 0 0  0  0  0  0  0 169826 96144 165194 56 17 27
 0 0 0 130160176 116377800 0 16 0 0 0 0  0  0  0  0  1 10245 9376 9164 2  1 97
 0 0 0 130160176 116377792 0 16 0 0 0 0  0  0  0  0  2 15742 12401 14784 4 1 95
 0 0 0 130160176 116377776 2 16 0 0 0 0  0  0  1  0  0 19972 17703 19612 6 2 92

 14 0 0 130160176 116377696 0 16 0 0 0 0 0  0  0  0  0 202794 116793 199807 71 21 8
 9 0 0 130160160 116373584 0 30 0 0 0 0  0  0 18  0  0 203123 117857 198825 69 20 11


This behavior occurred consistently while the application server was processing synthetic transactions: HTTP requests from JMeter running on an external machine.

I explored many theories trying to explain the drop outs, including:
  • Unexpected JMeter behavior
  • Network contention
  • Java Garbage Collection
  • Application Server thread pool problems
  • Connection pool problems
  • Database transaction processing
  • Database I/O contention

Graphing the CPU %idle led to a breakthrough:

AppServerIdle.jpg

Several of the drop outs were 30 seconds apart. With that insight, I went digging through the data again, looking for other outliers that were 30 seconds apart. In the database server statistics, I found spikes in the iostat "asvc_t" (average response time of disk transactions, in milliseconds) for the disk drive that was being used for the database log files.

Here is an example:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device

    0.0 2053.6    0.0 8234.3  0.0  0.2    0.0    0.1   0  24 c3t60080E5...F4F6d0s0
    0.0 2162.2    0.0 8652.8  0.0  0.3    0.0    0.1   0  28 c3t60080E5...F4F6d0s0
    0.0 1102.5    0.0 10012.8  0.0  4.5    0.0    4.1   0  69 c3t60080E5...F4F6d0s0
    0.0   74.0    0.0 7920.6  0.0 10.0    0.0  135.1   0 100 c3t60080E5...F4F6d0s0
    0.0  568.7    0.0 6674.0  0.0  6.4    0.0   11.2   0  90 c3t60080E5...F4F6d0s0
    0.0 1358.0    0.0 5456.0  0.0  0.6    0.0    0.4   0  55 c3t60080E5...F4F6d0s0
    0.0 1314.3    0.0 5285.2  0.0  0.7    0.0    0.5   0  70 c3t60080E5...F4F6d0s0

Here is a little more information about my database configuration:
  • The database and application server were running on two different SPARC servers.
  • Storage for the database was on a storage array connected via 8 gigabit Fibre Channel
  • Data storage and log file were on different physical disk drives
  • Reliable low latency I/O is provided by battery backed NVRAM
  • Highly available:
    • Two Fibre Channel links accessed via MPxIO
    • Two Mirrored cache controllers
    • The log file physical disks were mirrored in the storage device
  • Database log files on a ZFS Filesystem with cutting-edge technologies, such as copy-on-write and end-to-end checksumming

Why would I be getting service time spikes in my high-end storage? First, I wanted to verify that the database log disk service time spikes aligned with the application server CPU drop outs, and they did:

AppServerIdleLogService.jpg

At first, I guessed that the disk service time spikes might be related to flushing the write through cache on the storage device, but I was unable to validate that theory.

After searching the WWW for a while, I decided to try using a separate log device:

# zpool add ZFS-db-41 log c3t60080E500017D55C000015C150A9F8A7d0

The ZFS log device is configured in a similar manner as described above: two physical disks mirrored in the storage array.

This change to the database storage configuration eliminated the application server CPU drop outs:

AppServerIdleAfter.jpg

Here is the zpool configuration:

# zpool status ZFS-db-41
  pool: ZFS-db-41
 state: ONLINE
 scan: none requested
config:

        NAME                                     STATE
        ZFS-db-41                                ONLINE
          c3t60080E5...F4F6d0  ONLINE
        logs
          c3t60080E5...F8A7d0  ONLINE


Now, the I/O spikes look like this:

                    extended device statistics             
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1053.5    0.0 4234.1  0.0  0.8    0.0    0.7   0  75 c3t60080E5...F8A7d0s0
                    extended device statistics             
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1131.8    0.0 4555.3  0.0  0.8    0.0    0.7   0  76 c3t60080E5...F8A7d0s0
                    extended device statistics             
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1167.6    0.0 4682.2  0.0  0.7    0.0    0.6   0  74 c3t60080E5...F8A7d0s0
    0.0  162.2    0.0 19153.9  0.0  0.7    0.0    4.2   0  12 c3t60080E5...F4F6d0s0
                    extended device statistics             
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1247.2    0.0 4992.6  0.0  0.7    0.0    0.6   0  71 c3t60080E5...F8A7d0s0
    0.0   41.0    0.0   70.0  0.0  0.1    0.0    1.6   0   2 c3t60080E5...F4F6d0s0
                    extended device statistics             
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1241.3    0.0 4989.3  0.0  0.8    0.0    0.6   0  75 c3t60080E5...F8A7d0s0
                    extended device statistics             
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0 1193.2    0.0 4772.9  0.0  0.7    0.0    0.6   0  71 c3t60080E5...F8A7d0s0

We can see the steady flow of 4k writes to the ZIL device from O_SYNC database log file writes. The spikes are from flushing the transaction group.

Like almost all problems that I run into, once I thoroughly understand the problem, I find that other people have documented similar experiences. Thanks to all of you who have documented alternative approaches.

Saved for another day: now that the problem is obvious, I should try "zfs:zfs_immediate_write_sz" as recommended in the ZFS Evil Tuning Guide.
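
If I do get to it, that tunable would go into /etc/system alongside the other settings I already push out. A hedged sketch only; the value is a placeholder to be chosen per the tuning guide, and a reboot is required for /etc/system changes:

# echo "set zfs:zfs_immediate_write_sz=0x20000" >> /etc/system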


Thursday Jun 07, 2012

ndd on Solaris 10

This is mostly a repost of LaoTsao's Weblog, with some tweaks and additions. In 2008, his blog pointed out that with Solaris 9 and earlier, an rc3 script would be used to specify ndd parameters at boot up. With Solaris 10 and later, it is more elegant to use SMF.

The last time that I tried to cut & paste directly off of his page, some of the XML was messed up, so I am reposting working Solaris 10 XML in this blog entry.

Additionally, I am including scripts that I use to distribute the settings to multiple servers. I run the distribution scripts from my MacBook, but they should also work from a windows laptop using cygwin, or from an existing Solaris installation.

Why is it necessary to set ndd parameters at boot up?

The problem being addressed is how to set ndd parameters so that they survive a reboot. It is easy to specify ndd settings from a shell, but they only apply to the running OS and don't survive reboots.
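
For example, checking and changing a parameter from a shell takes effect immediately, but the change is lost at the next reboot (32768 is the Solaris 10 Update 10 default shown in the table below):

# ndd -get /dev/tcp tcp_smallest_anon_port
32768
# ndd -set /dev/tcp tcp_smallest_anon_port 9000
# ndd -get /dev/tcp tcp_smallest_anon_port
9000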

Examples of ndd settings being necessary include performance tuning, as described in NFS Tuning for HPC Streaming Applications, and meeting the installation prerequisites for Oracle Database 11gR2 on Solaris 10, as shown here:

11gr2_ndd_check.jpg

On Solaris 10 Update 10, the default network settings don't match the Oracle 11gR2 prerequisites:


Parameter               Expected Value  Actual Value
tcp_smallest_anon_port  9000            32768
tcp_largest_anon_port   65500           65535
udp_smallest_anon_port  9000            32768
udp_largest_anon_port   65500           65535

To distribute the SMF files, and for future administration, it is helpful to enable passwordless ssh from your secure laptop:

================
If not already present, create an ssh key on your laptop
================

# ssh-keygen -t rsa

================
Enable passwordless ssh from my laptop.
Need to type in the root password for the remote machines.
Then, I no longer need to type in the password when I ssh or scp from my laptop to servers.
================

#!/usr/bin/env bash

for server in `cat servers.txt`
do
  echo root@$server
  cat ~/.ssh/id_rsa.pub | ssh root@$server "cat >> .ssh/authorized_keys"
done

Specify the servers to distribute to:

================
servers.txt
================

testhost1
testhost2

In addition to ndd values, I often use the following /etc/system settings:

================
etc_system_addins
================

set rpcmod:clnt_max_conns=8
set zfs:zfs_arc_max=0x1000000000
set nfs:nfs3_bsize=131072
set nfs:nfs4_bsize=131072

Modify ndd-nettune.txt with the ndd values that are appropriate for your deployment: 

================
ndd-nettune.txt
================

#!/sbin/sh
#
# ident   "@(#)ndd-nettune.xml    1.0     01/08/06 SMI"

. /lib/svc/share/smf_include.sh
. /lib/svc/share/net_include.sh

# Make sure that the libraries essential to this stage of booting  can be found.
LD_LIBRARY_PATH=/lib; export LD_LIBRARY_PATH
echo "Performing Directory Server Tuning..." >> /tmp/smf.out
#
# Performance Settings
#
/usr/sbin/ndd -set /dev/tcp tcp_max_buf 2097152
/usr/sbin/ndd -set /dev/tcp tcp_xmit_hiwat 1048576
/usr/sbin/ndd -set /dev/tcp tcp_recv_hiwat 1048576
#
# Oracle Database 11gR2 Settings
#
/usr/sbin/ndd -set /dev/tcp tcp_smallest_anon_port 9000
/usr/sbin/ndd -set /dev/tcp tcp_largest_anon_port 65500
/usr/sbin/ndd -set /dev/udp udp_smallest_anon_port 9000
/usr/sbin/ndd -set /dev/udp udp_largest_anon_port 65500

# Reset the library path now that we are past the critical stage
unset LD_LIBRARY_PATH


================
ndd-nettune.xml
================

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!-- ident "@(#)ndd-nettune.xml 1.0 04/09/21 SMI" -->
<service_bundle type='manifest' name='SUNWcsr:ndd'>
  <service name='network/ndd-nettune' type='service' version='1'>
    <create_default_instance enabled='true' />
    <single_instance />
    <dependency name='fs-minimal' type='service' grouping='require_all' restart_on='none'>
      <service_fmri value='svc:/system/filesystem/minimal' />
    </dependency>
    <dependency name='loopback-network' grouping='require_any' restart_on='none' type='service'>
      <service_fmri value='svc:/network/loopback' />
    </dependency>
    <dependency name='physical-network' grouping='optional_all' restart_on='none' type='service'>
      <service_fmri value='svc:/network/physical' />
    </dependency>
    <exec_method type='method' name='start' exec='/lib/svc/method/ndd-nettune' timeout_seconds='3' > </exec_method>
    <exec_method type='method' name='stop'  exec=':true'                       timeout_seconds='3' > </exec_method>
    <property_group name='startd' type='framework'>
      <propval name='duration' type='astring' value='transient' />
    </property_group>
    <stability value='Unstable' />
    <template>
      <common_name>
    <loctext xml:lang='C'> ndd network tuning </loctext>
      </common_name>
      <documentation>
    <manpage title='ndd' section='1M' manpath='/usr/share/man' />
      </documentation>
    </template>
  </service>
</service_bundle>

Execute this shell script to distribute the files. The ndd values will be modified immediately and will then survive reboot. The servers will need to be rebooted to pick up the /etc/system settings:

================
system_tuning.sh
================

#!/usr/bin/env bash

for server in `cat servers.txt`
do
  cat etc_system_addins | ssh root@$server "cat >> /etc/system"

  scp ndd-nettune.xml root@${server}:/var/svc/manifest/site/ndd-nettune.xml
  scp ndd-nettune.txt root@${server}:/lib/svc/method/ndd-nettune
  ssh root@$server chmod +x /lib/svc/method/ndd-nettune
  ssh root@$server svccfg validate /var/svc/manifest/site/ndd-nettune.xml
  ssh root@$server svccfg import /var/svc/manifest/site/ndd-nettune.xml
done
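
After distribution, a quick check from the laptop confirms that the SMF service imported cleanly and that the new values took effect (a hedged sketch using the service and parameter names defined above):

#!/usr/bin/env bash

for server in `cat servers.txt`
do
  ssh root@$server svcs ndd-nettune
  ssh root@$server ndd -get /dev/tcp tcp_smallest_anon_port
done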




