Graphing Solaris Performance Stats with gnuplot

It is not unusual to see an engineer import text from "vmstat" or "iostat" into a spreadsheet application such as Microsoft Office Excel or OpenOffice Calc to visualize the data.  This is a fine approach when used occasionally but impractical when used frequently.  Transferring the data to a laptop, massaging it by hand, launching the office application, importing the data and selecting the columns to chart is too cumbersome as a daily process, or when a large number of machines are being monitored.  In my case, I needed to visualize the performance of a few servers that were under test, and needed a few graphs from those servers, a few times a day.  I used some traditional Unix scripts and gnuplot (http://www.gnuplot.info) from the Companion CD (http://www.sun.com/software/solaris/freeware) to quickly graph the data.

The right tool for graphing Solaris performance data depends on your use case:

  • One or two graphs, now and then: import the data into your favorite spreadsheet application.
  • Historic data, more graphs, more frequently: use gnuplot.
  • Many graphs, real-time or historic data, for many machines, such as a grid of servers managed by Sun Grid Engine: a formal tool such as Ganglia (http://ganglia.info, http://www.sunfreeware.com/) is recommended. An advantage of Ganglia is that performance data is exposed via a web interface to a potentially large number of viewers in real time.

That being said, here are some scripts that I used to view Solaris Performance data with gnuplot.

1. Gathering data.  For each benchmark run, a script was used to start gathering performance data:


#!/usr/bin/ksh

dir=$1
mkdir $dir
vmstat 1        > $dir/vmstat.out        2>&1 &
zpool iostat 1  > $dir/zpool_iostat.out  2>&1 &
nicstat 1       > $dir/nicstat.out       2>&1 &
iostat -nmzxc 1 > $dir/iostat.out        2>&1 &
/opt/DTraceToolkit-0.99/Bin/iopattern 1 > $dir/iopattern.out   2>&1 &

The statistics gathering processes were all killed at the end of the benchmark run. Hence, each test had a directory with a comprehensive set of statistics files.
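
A matching stop script is not shown here; a minimal sketch using pkill could look like the following.  Note that pkill matches by process name, so this assumes no other vmstat, iostat, nicstat or iopattern processes on the machine need to survive; recording each collector's PID with $! in the start script and killing only those PIDs would be a gentler alternative.

#!/usr/bin/ksh

# Sketch of a stop script: terminate the background statistics collectors
# started above.  WARNING: this kills every matching process on the machine.
pkill -x vmstat
pkill -x iostat
pkill -x nicstat
pkill -f 'zpool iostat'
pkill -f iopattern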

Next it was necessary to write a set of scripts to operate on the directories.

2. Graphing CPU utilization from "vmstat".

This script was fairly short and straightforward.  The "User CPU Utilization" and "System CPU Utilization" are in the 20th and 21st columns.  I added an optional argument to truncate the graph after a specific amount of time to account for the cases where the vmstat process was not killed immediately after the benchmark.  A bash "here document" is used to enter gnuplot commands.

#!/usr/bin/bash

dir=$1
file=$1/vmstat.out

if [ $# == 2 ] ; then
  minutes=$2
  (( seconds = minutes * 60 ))
  cat $file | head -$seconds > /tmp/data
  file=/tmp/data
fi

gnuplot -persist <<EOF
set title "$dir"
plot "$file" using 20 title "%user" with lines, \\
     "$file" using 21 title "%sys" with lines

EOF
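
Assuming the script above is saved as graph_vmstat.bash (a name chosen here only for illustration), graphing the first five minutes of CPU utilization for a run is a one-liner:

./graph_vmstat.bash NFS_client_10GbE 5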

Graph of CPU utilization based on vmstat output

3. Graphing IO throughput from "iostat -nmzxc 1" data

This script was a little bit more complicated for three reasons:

  • The data file contains statistics for several devices and filesystems that are not interesting and must be filtered out.  The script is launched with an argument that selects one device.
  • I used the 'z' option to iostat, which does not print traces when the device is idle (zero I/O).  The 'z' option makes a smaller file that is more human readable, but it is not good for graphing.  Thus I needed to synthesize the zero traces before passing the data to gnuplot.
  • I wanted to include a smooth line for the iostat "%w" and "%b" columns with a scale of 0 to 100.

#!/usr/bin/bash

# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
#                     extended device statistics
#     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
#     0.0    0.9    0.8    3.8  0.0  0.0    0.0    0.5   0   0 c0t1d0
#     0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0 sge_master:/opt/sge6-2/default/common
#     0.0    0.8    1.9  184.5  0.0  0.0    4.1   31.1   0   1 192.168.2.9:/jbod



if [ $# -lt 2 -o $# -gt 3 ] ; then
  echo "Usage: $0 pattern dir [minutes]"
  exit 1
fi

pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day

if [ $# == 3 ] ; then
  minutes=$3
fi

(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data

if [ ! -r $all_data ] ; then
  echo "can not read $all_data"
  exit 1
fi

# For each time interval, either:
#   print the trace for the device that matches the pattern, or
#   print a "zero" trace if there is not one in the data file 
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line 
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
  printf("%s\\n",$0);
  found = 1 ;
}

/extended/ {
  if (found == 0)
    printf("    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 192.168.2.9:/jbod \\n")
  found = 0;
} ' $all_data | head -$seconds > $plot_data

gnuplot -persist <<EOF
set title "$pattern - $dir"
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_data" using  3 title "read (kb/sec)" axis x1y1 with lines, \\
     "$plot_data" using  4 title "write (kb/sec)" axis x1y1 with lines, \\
     "$plot_data" using  9 title "%w" axis x1y2 smooth bezier with lines, \\
     "$plot_data" using 10 title "%b" axis x1y2 smooth bezier with lines

EOF

I created the following graph with the command "graph_iostat.bash jbod NFS_client_10GbE 5", which selects data only for the "jbod" NFS mount, reads the iostat data stored in the directory named "NFS_client_10GbE", and graphs only the first 5 minutes' worth of data.


iostat_NFS_client_10GbE.png

The iostat data was collected on an NFS client connected to a 10 gigabit network.  There is some write activity (green) at the start of the 5-minute sample period, followed by several minutes of intense reading (red) during which the client hits speeds of 600-700 MB/sec.  The purple "%b" line, plotted against the right-hand (x1y2) axis, indicates that during the intense read phase the mount point is busy about 90% of the time.

4. Graphing I/O Service time from "iostat -nmzxc" data.

I also find that columns 6 and 7 from iostat are very interesting and can be graphed using a simplification of the previous script.

  • actv: average number of transactions actively being serviced
  • wsvc_t: average service time in the wait queue, in milliseconds


#!/usr/bin/bash

#                     extended device statistics
#     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
#     0.0    0.9    0.8    3.8  0.0  0.0    0.0    0.5   0   0 c0t1d0
#     0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0 sge_master:/opt/sge6-2/default/common
#     0.0    0.8    1.9  184.5  0.0  0.0    4.1   31.1   0   1 192.168.2.9:/jbod


if [ $# -lt 2 -o $# -gt 3 ] ; then
  echo "Usage: $0 pattern dir [minutes]"
  exit 1
fi

pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day

if [ $# == 3 ] ; then
  minutes=$3
fi

(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data

# For each time interval, either:
#   print the trace for the device that matches the pattern, or
#   print a "zero" trace if there is not one in the data file 
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line 
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
  printf("%s\\n",$0);
  found = 1 ;
}

/extended/ {
  if (found == 0)
    printf("    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 192.168.2.9:/jbod \\n")
  found = 0;
} ' $all_data | head -$seconds > $plot_data

gnuplot -persist <<EOF
set title "$pattern - $dir"
set log y
plot "$plot_data" using  6 title "wsvc_t" with lines, \\
     "$plot_data" using  7 title "asvc_t" with lines \\

EOF

Here is the graph produced by the command "graph_iostat_svc_t.bash jbod NFS_client_10GbE 5":

iostat_NFS_client_svc_t_10GbE

5. Graphing network throughput data from "nicstat"

Another very valuable Solaris performance statistics tool is "nicstat".  For the download link, see http://blogs.sun.com/timc/entry/nicstat_the_solaris_and_linux .  A script to graph the data from nicstat follows the same pattern.
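
The script assumes the classic nicstat column layout, in which columns 3 and 4 are the read and write throughput in KB/sec.  The layout can vary between nicstat versions, so check the header of your own nicstat.out and adjust the "using" clauses if necessary; the header assumed here looks roughly like this:

#    Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs %Util    Sat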

#!/usr/bin/bash

if [ $# -lt 2 -o $# -gt 3 ] ; then
  echo "Usage: $0 interface dir [minutes]"
  exit 1
fi

interface=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day

if [ $# == 3 ] ; then
  minutes=$3
fi

(( seconds = minutes * 60 ))
all_data=$dir/nicstat.out
plot_data=/tmp/plot_data

if [ ! -r $all_data ] ; then
  echo "can not read $all_data"
  exit 1
fi

grep $interface $all_data | head -$seconds > $plot_data

gnuplot -persist <<EOF
set title "$interface - $dir"
plot "$plot_data" using 3 title "read" with lines, \\
     "$plot_data" using 4 title "write" with lines
EOF

 "graph_nicstat.bash ixgbe2 NFS_server_10GbE 5"

nicstat_NFS_server_10GbE

6. Graphing IO throughput from "zpool iostat" data

The challenge with plotting "zpool iostat" data is that the bandwidth columns are not printed in constant units (values carry K, M or G suffixes), so it is necessary to re-compute the data in a constant unit, in this example MB/sec, before plotting.
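
The script below handles the conversion by using sed to rewrite each unit suffix as a multiplication and bc to evaluate the resulting expression; the divisor 1048576 converts bytes to megabytes.  As a small illustration of the conversion (note that bc's default scale of 0 truncates sub-megabyte rates to zero):

# A value such as "609M" in a bandwidth column becomes the bc expression
# "609*1048576/1048576", which evaluates to 609 MB/sec.  A value such as
# "1.21K" becomes "1.21*1024/1048576", which bc truncates to 0.
echo "609M/1048576" | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc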

#!/usr/bin/bash

if [ $# -lt 2 -o $# -gt 3 ] ; then
  echo "Usage: $0 pattern dir [minutes]"
  exit 1
fi

pool=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day

if [ $# == 3 ] ; then
  minutes=$3
fi

(( seconds = minutes * 60 ))
all_data=$dir/zpool_iostat.out
plot_data1=/tmp/plot_data1
plot_data2=/tmp/plot_data2

if [ ! -r $all_data ] ; then
  echo "can not read $all_data"
  exit 1
fi

grep $pool $all_data | awk '{printf("%s/1048576\n",$6)}' | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc | head -$seconds > $plot_data1
grep $pool $all_data | awk '{printf("%s/1048576\n",$7)}' | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc | head -$seconds > $plot_data2

gnuplot -persist <<EOF
set title "$pool - $dir"
set log y
plot "$plot_data1" using 1 title "read (MB/sec)" with lines, \\
     "$plot_data2" using 1 title "write (MB/sec)" with lines

EOF

Graphing the I/O throughput of the zpool named "jbod" with this script, using the same "jbod NFS_client_10GbE 5" arguments, shows that the zpool can deliver data at speeds of close to one gigabyte per second.

zpool_iostat_NFS_client_10GbE.png

It is easy to modify the scripts above to graph the output of many tools that output a table of data in text format.
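
For example, a minimal variant of the vmstat script from step 2 can graph free memory instead of CPU utilization just by selecting a different column and title.  Here is a sketch, assuming the standard Solaris vmstat layout in which column 5 is the free list in KB (the script name is made up for illustration):

#!/usr/bin/bash

# graph_vmstat_free.bash (hypothetical) - plot free memory from vmstat.out.
# Assumes column 5 of the vmstat output is the size of the free list, in KB.
dir=$1
file=$dir/vmstat.out

gnuplot -persist <<EOF
set title "$dir - free memory"
plot "$file" using 5 title "free (KB)" with lines
EOF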

Comments:

Nice writeup.
Thank you for all the scripts, which are ready to go.

Posted by Stefan Schneider on January 22, 2010 at 06:25 AM EST #

Thanks, saved me from re-inventing the wheel!

Posted by Thomas on August 23, 2010 at 06:24 PM EDT #

Hello. I was having issues with our systems because we have so many devices. I found that a pattern like d0 would also pick up devices named d01, d02, d03, etc., and a pattern like c2t0d1 would also grab and plot c2t0d13, c2t0d15, etc.

I also had an issue because I was creating so many plot_data files in /tmp, so I randomized the file names.

So I made a couple changes and here are the scripts.

------------------------------------

cat io_throughputMD.sh
#!/usr/bin/bash

# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
#                     extended device statistics
#     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
#     0.0    0.9    0.8    3.8  0.0  0.0    0.0    0.5   0   0 c0t1d0
#     0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0 sge_master:/opt/sge6-2/default/common
#     0.0    0.8    1.9  184.5  0.0  0.0    4.1   31.1   0   1 192.168.2.9:/jbod

if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi

pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day

if [ $# == 3 ] ; then
minutes=$3
fi

(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data`echo $RANDOM`
plot_dataf=/tmp/plot_dataf`echo $RANDOM`

if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi

# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\\n",$0);
found = 1 ;
}

/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \\n")
found = 0;
} ' $all_data | head -$seconds > $plot_data

grep "$pattern " $plot_data >$plot_dataf

gnuplot -persist <<EOF
set title "$pattern - $dir"
set grid
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_dataf" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
"$plot_dataf" using 10 title "%b" axis x1y2 smooth bezier with lines

EOF

-----------------------------

cat io_throughputDevice.sh
#!/usr/bin/bash

# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
#                     extended device statistics
#     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
#     0.0    0.9    0.8    3.8  0.0  0.0    0.0    0.5   0   0 c0t1d0
#     0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.4   0   0 sge_master:/opt/sge6-2/default/common
#     0.0    0.8    1.9  184.5  0.0  0.0    4.1   31.1   0   1 192.168.2.9:/jbod

if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi

pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day

if [ $# == 3 ] ; then
minutes=$3
fi

(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data`echo $RANDOM`
plot_dataf=/tmp/plot_dataf`echo $RANDOM`

if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi

# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\\n",$0);
found = 1 ;
}

/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \\n")
found = 0;
} ' $all_data | head -$seconds > $plot_data

grep $pattern $plot_data |grep -v ${pattern}[0-9] > $plot_dataf

gnuplot -persist <<EOF
set title "$pattern - $dir"
set grid
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_dataf" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
"$plot_dataf" using 10 title "%b" axis x1y2 smooth bezier with lines

EOF

Posted by Lucas Gardner on February 10, 2011 at 06:19 AM EST #
