
Jeff Taylor's Weblog

  • Sun
    January 21, 2010

Graphing Solaris Performance Stats

Graphing Solaris Performance Stats with gnuplot

It is not unusual to see an engineer import text from "vmstat" or "iostat" into a spreadsheet application such as Microsoft Office Excel or OpenOffice Calc to visualize the data.  This is a fine approach for occasional use but impractical for frequent use.  Transferring the data to a laptop, manually massaging it, launching the office application, importing the data and selecting the columns to chart is too cumbersome as a daily process, or when a large number of machines are being monitored.  In my case, I needed a few graphs, a few times a day, from a few servers that were under test.  I used some traditional Unix scripts and gnuplot (http://www.gnuplot.info) from the Companion CD (http://www.sun.com/software/solaris/freeware) to quickly graph the data.

The right tool for graphing Solaris data depends on your use case:

  • One or two graphs, now and then: import the data into your favorite spreadsheet application.
  • Historic data, more graphs, more frequently: use gnuplot.
  • Many graphs, real-time or historic data, for more machines, such as a grid of servers being managed by Sun Grid Engine: a formal tool such as Ganglia (http://ganglia.info, http://www.sunfreeware.com/) is recommended. An advantage of Ganglia is that performance data is exposed via a web interface to a potentially large number of viewers in real time.

That being said, here are some scripts that I used to view Solaris Performance data with gnuplot.

1. Gathering data.  For each benchmark run, a script was used to start gathering performance data:


#!/usr/bin/ksh
dir=$1
mkdir $dir
vmstat 1 > $dir/vmstat.out 2>&1 &
zpool iostat 1 > $dir/zpool_iostat.out 2>&1 &
nicstat 1 > $dir/nicstat.out 2>&1 &
iostat -nmzxc 1 > $dir/iostat.out 2>&1 &
/opt/DTraceToolkit-0.99/Bin/iopattern 1 > $dir/iopattern.out 2>&1 &

The statistics gathering processes were all killed at the end of the benchmark run. Hence, each test had a directory with a comprehensive set of statistics files.
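
The post does not show how the collectors were stopped. One possible way to make that step repeatable is to record each background process ID as it is started, so that a companion function can later kill exactly those processes. The sketch below is an assumption, not the method used for these benchmarks; the function names and the "collector.pids" file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original scripts): start collectors in
# the background, remember their PIDs, and stop them when the benchmark ends.

# usage: start_collector DIR NAME COMMAND [ARGS...]
start_collector() {
    sc_dir=$1
    sc_name=$2
    shift 2
    mkdir -p "$sc_dir"
    # Run the collector in the background, capturing stdout and stderr.
    "$@" > "$sc_dir/$sc_name.out" 2>&1 &
    # Remember the PID of the process that was just started.
    echo $! >> "$sc_dir/collector.pids"
}

# usage: stop_collectors DIR
stop_collectors() {
    sc_dir=$1
    [ -r "$sc_dir/collector.pids" ] || return 0
    while read -r sc_pid; do
        kill "$sc_pid" 2>/dev/null || true
        # Reap the child so that it does not linger as a zombie.
        wait "$sc_pid" 2>/dev/null || true
    done < "$sc_dir/collector.pids"
    rm -f "$sc_dir/collector.pids"
}

# Example (commented out so that the file can be sourced safely):
#   start_collector run1 vmstat vmstat 1
#   start_collector run1 iostat iostat -nmzxc 1
#   ... run the benchmark ...
#   stop_collectors run1
```

Keeping the PID list inside the run directory means several benchmark runs can be collected side by side without one run's cleanup killing another run's collectors.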

Next it was necessary to write a set of scripts to operate on the directories.

2. Graphing CPU utilization from "vmstat".

This script was fairly short and straightforward.  The "User CPU Utilization" and "System CPU Utilization" are in the 20th and 21st columns.  I added an optional argument to truncate the graph after a specific amount of time to account for the cases where the vmstat process was not killed immediately after the benchmark.  A bash "here document" is used to enter gnuplot commands.

#!/usr/bin/bash
dir=$1
file=$1/vmstat.out
if [ $# == 2 ] ; then
minutes=$2
(( seconds = minutes * 60 ))
cat $file | head -$seconds > /tmp/data
file=/tmp/data
fi
gnuplot -persist <<EOF
set title "$dir"
plot "$file" using 20 title "%user" with lines, \\
"$file" using 21 title "%sys" with lines
EOF

Graph of CPU utilization based on vmstat output
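
One detail worth handling when adapting this script: vmstat periodically repeats its two header lines, so the data file is not purely numeric. The following is a minimal sketch, assuming the 22-column Solaris vmstat layout described above, of a filter that keeps only the numeric rows before they reach gnuplot. The function name is hypothetical and is not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: print only the numeric data rows of vmstat output,
# dropping the header lines that vmstat repeats periodically.
# usage: vmstat_data_rows [FILE...]     (reads stdin when no file is given)
vmstat_data_rows() {
    # A data row has at least 21 fields, and both the first field and the
    # "us" field (column 20) are plain integers.
    awk 'NF >= 21 && $1 ~ /^[0-9]+$/ && $20 ~ /^[0-9]+$/' "$@"
}
```

With this in place, the truncation step could read, for example, `vmstat_data_rows "$file" | head -n "$seconds" > /tmp/data`, so that the number of seconds counts data rows only.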

3. Graphing IO throughput from "iostat -nmzxc 1" data

This script was a little bit more complicated for three reasons:

  • The data file contains statistics for several filesystems that are not interesting and will be filtered out.  The script needs to be launched with an argument that will be used to select one device.
  • I used the 'z' option to iostat, which does not print traces when the device is idle (zero I/O).  The 'z' option makes a smaller file that is more human readable, but it is not good for graphing.  Thus I needed to synthesize the zero traces before passing the data to gnuplot.
  • I wanted to include a smooth line for the iostat "%w" and "%b" columns with a scale of 0 to 100.

#!/usr/bin/bash
# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
# extended device statistics
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
# 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
# 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\n",$0);
found = 1 ;
}
/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
found = 0;
} ' $all_data | head -$seconds > $plot_data
gnuplot -persist <<EOF
set title "$pattern - $dir"
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_data" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
"$plot_data" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
"$plot_data" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
"$plot_data" using 10 title "%b" axis x1y2 smooth bezier with lines
EOF

I created the following graph with the command "graph_iostat.bash jbod NFS_client_10GbE 5" to select data only from the "jbod" NFS mount, where the data is stored in the directory named "NFS_client_10GbE" and only graph the first 5 minutes worth of data.


iostat_NFS_client_10GbE.png

The iostat data was collected on an NFS client connected with a 10 gigabit network.  There is some write activity (green) at the start of the 5 minute sample period, followed by several minutes of intense reading (red) where the client hits speeds of 600-700MB/sec. The purple "%b" line, with values on the right-hand x1y2 axis, indicates that during the intense read phase, the mount point is busy about 90% of the time.
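
The zero-trace synthesis is the least obvious step in this script, so here it is in isolation as a minimal sketch. It uses the same idea as the gawk filter above, with a few assumed variations that are not in the original: it matches the pattern against the device column only, it labels the synthetic row with the pattern instead of a fixed device name, and it emits a zero row only for intervals that have actually ended. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit exactly one row per "iostat -nmzxc" interval for
# the device matching PATTERN, synthesizing an all-zero row for intervals in
# which the 'z' option suppressed the idle device.
# usage: fill_idle_intervals PATTERN [FILE...]
fill_idle_intervals() {
    fi_pattern=$1
    shift
    awk -v pattern="$fi_pattern" '
        # Each interval starts with an "extended device statistics" line.
        /extended device statistics/ {
            # Close the previous interval: it had no row for the device.
            if (seen_header && !found)
                print "0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 " pattern
            seen_header = 1
            found = 0
            next
        }
        # The device name is the last column of a data row.
        $NF ~ pattern { print; found = 1 }
        # Close the final interval at end of input.
        END {
            if (seen_header && !found)
                print "0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 " pattern
        }
    ' "$@"
}
```

For example, `fill_idle_intervals jbod NFS_client_10GbE/iostat.out | head -n 300` would produce at most one row per second for the first five minutes, whether or not the mount was busy in a given second.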

4. Graphing I/O Service time from "iostat -nmzxc" data.

I also find that columns 6 and 7 from iostat ("actv" and "wsvc_t" in the header shown in the script comments) are very interesting and can be graphed using a simplification of the previous script.

  • actv: average number of transactions actively being serviced
  • wsvc_t: average service time of transactions in the wait queue, in milliseconds


#!/usr/bin/bash
# extended device statistics
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
# 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
# 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data
# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\n",$0);
found = 1 ;
}
/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
found = 0;
} ' $all_data | head -$seconds > $plot_data
gnuplot -persist <<EOF
set title "$pattern - $dir"
set log y
plot "$plot_data" using 6 title "actv" with lines, \\
"$plot_data" using 7 title "wsvc_t" with lines
EOF

Here is the graph produced by the command "graph_iostat_svc_t.bash jbod NFS_client_10GbE 5"

iostat_NFS_client_svc_t_10GbE

5. Graphing network throughput data from "nicstat"

Another very valuable Solaris performance statistics tool is "nicstat".  For the download link, see http://blogs.sun.com/timc/entry/nicstat_the_solaris_and_linux .  A script to graph the data from nicstat follows the same pattern.

#!/usr/bin/bash
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 interface dir [minutes]"
exit 1
fi
interface=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/nicstat.out
plot_data=/tmp/plot_data
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
grep $interface $all_data | head -$seconds > $plot_data
gnuplot -persist <<EOF
set title "$interface - $dir"
plot "$plot_data" using 3 title "read" with lines, \\
"$plot_data" using 4 title "write" with lines
EOF

Here is the graph produced by the command "graph_nicstat.bash ixgbe2 NFS_server_10GbE 5":

nicstat_NFS_server_10GbE

6. Graphing IO throughput from "zpool iostat" data

The challenge for plotting "zpool iostat" data is that the traces are not in constant units and therefore it is necessary to re-compute the data in constant units, in this example, MB/sec. 

#!/usr/bin/bash
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pool=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/zpool_iostat.out
plot_data1=/tmp/plot_data1
plot_data2=/tmp/plot_data2
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
grep $pool $all_data | awk '{printf("%s/1048576\n",$6)}' | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc -l | head -$seconds > $plot_data1
grep $pool $all_data | awk '{printf("%s/1048576\n",$7)}' | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc -l | head -$seconds > $plot_data2
gnuplot -persist <<EOF
set title "$pool - $dir"
set log y
plot "$plot_data1" using 1 title "read (MB/sec)" with lines, \\
"$plot_data2" using 1 title "write (MB/sec)" with lines
EOF

Graphing the IO throughput of the zpool named "jbod" with this script, using the arguments "jbod NFS_client_10GbE 5", shows that the zpool can deliver data at speeds of close to one gigabyte per second.

zpool_iostat_NFS_client_10GbE.png
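
The unit conversion can also be done in a single awk pass instead of the awk, sed and bc pipeline. The following is a minimal sketch, assuming bandwidth values carry at most one of the suffixes K, M or G as handled by the script above. The function name is hypothetical, and the three-decimal output format is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: convert "zpool iostat" style bandwidth values
# (for example 0, 512K, 1.5M, 2G) into MB/sec, one value per line.
# usage: to_mb_per_sec [FILE...]     (reads stdin when no file is given)
to_mb_per_sec() {
    awk '{
        value = $1 + 0                  # leading numeric part: "1.5M" -> 1.5
        unit = substr($1, length($1))   # last character: "M"
        if (unit == "K") value *= 1024
        else if (unit == "M") value *= 1048576
        else if (unit == "G") value *= 1073741824
        # value is now in bytes/sec; report it in MB/sec
        printf "%.3f\n", value / 1048576
    }' "$@"
}
```

The read column could then be prepared with something like `grep "$pool" "$all_data" | awk '{print $6}' | to_mb_per_sec | head -n "$seconds" > "$plot_data1"`, and the write column in the same way with `$7`.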

It is easy to modify the scripts above to graph the output of many tools that output a table of data in text format.
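
As an illustration of that kind of reuse, the gnuplot commands that all of the scripts share can be generated by one small function. This is a sketch only: the function name and the COLUMN:LABEL argument convention are hypothetical and are not part of the scripts above.

```shell
#!/bin/sh
# Hypothetical helper: print gnuplot commands that plot one or more columns
# of a whitespace-separated data file as lines.
# usage: gnuplot_script TITLE DATAFILE COLUMN:LABEL [COLUMN:LABEL...]
gnuplot_script() {
    gs_title=$1
    gs_data=$2
    shift 2
    printf 'set title "%s"\n' "$gs_title"
    gs_sep='plot '
    for gs_spec in "$@"; do
        gs_col=${gs_spec%%:*}     # text before the first colon
        gs_label=${gs_spec#*:}    # text after the first colon
        printf '%s"%s" using %s title "%s" with lines' \
            "$gs_sep" "$gs_data" "$gs_col" "$gs_label"
        gs_sep=', '
    done
    printf '\n'
}
```

For example, `gnuplot_script "ixgbe2 - NFS_server_10GbE" /tmp/plot_data 3:read 4:write | gnuplot -persist` would draw the same two lines as the nicstat script.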


Comments ( 3 )
  • Stefan Schneider Friday, January 22, 2010

    Nice writeup.

    Thank you all the scripts which are ready to go.


  • Thomas Monday, August 23, 2010

    Thanks, saved me from re-inventing the wheel!


  • Lucas Gardner Thursday, February 10, 2011

    Hello. I was having issues with our systems because we have so many devices. I found that devices that were named d0 would pick up all devices named d01 d02 d03 etc. Also any devices that are c2t0d1 will grab and plot c2t0d13 and c2t0d15 etc.

    I also had an issue because I was creating so many plot_data in /tmp so i randomized it.

    So I made a couple changes and here are the scripts.

    ------------------------------------

    cat io_throughputMD.sh

    #!/usr/bin/bash
    # This script is used to parse "iostat -nmzxc" data which is formatted like this:
    #
    # extended device statistics
    # r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    # 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
    # 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
    # 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
    if [ $# -lt 2 -o $# -gt 3 ] ; then
    echo "Usage: $0 pattern dir [minutes]"
    exit 1
    fi
    pattern=$1
    dir=$2
    (( minutes = 24 * 60 )) #default: graph 1 day
    if [ $# == 3 ] ; then
    minutes=$3
    fi
    (( seconds = minutes * 60 ))
    all_data=$dir/iostat.out
    plot_data=/tmp/plot_data`echo $RANDOM`
    plot_dataf=/tmp/plot_dataf`echo $RANDOM`
    if [ ! -r $all_data ] ; then
    echo "can not read $all_data"
    exit 1
    fi
    # For each time interval, either:
    # print the trace for the device that matches the pattern, or
    # print a "zero" trace if there is not one in the data file
    # You can tell that there was no trace for the device during an
    # interval if you reach the "extended device statistics" line
    # without finding a trace
    gawk -v pattern=$pattern '
    $0 ~ pattern {
    printf("%s\n",$0);
    found = 1 ;
    }
    /extended/ {
    if (found == 0)
    printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
    found = 0;
    } ' $all_data | head -$seconds > $plot_data
    grep "$pattern " $plot_data >$plot_dataf
    gnuplot -persist <<EOF
    set title "$pattern - $dir"
    set grid
    set ytics nomirror
    set y2range [0:100]
    set y2tics 0, 20
    plot "$plot_dataf" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
    "$plot_dataf" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
    "$plot_dataf" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
    "$plot_dataf" using 10 title "%b" axis x1y2 smooth bezier with lines
    EOF

    -----------------------------

    cat io_throughputDevice.sh

    #!/usr/bin/bash
    # This script is used to parse "iostat -nmzxc" data which is formatted like this:
    #
    # extended device statistics
    # r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    # 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
    # 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
    # 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
    if [ $# -lt 2 -o $# -gt 3 ] ; then
    echo "Usage: $0 pattern dir [minutes]"
    exit 1
    fi
    pattern=$1
    dir=$2
    (( minutes = 24 * 60 )) #default: graph 1 day
    if [ $# == 3 ] ; then
    minutes=$3
    fi
    (( seconds = minutes * 60 ))
    all_data=$dir/iostat.out
    plot_data=/tmp/plot_data`echo $RANDOM`
    plot_dataf=/tmp/plot_dataf`echo $RANDOM`
    if [ ! -r $all_data ] ; then
    echo "can not read $all_data"
    exit 1
    fi
    # For each time interval, either:
    # print the trace for the device that matches the pattern, or
    # print a "zero" trace if there is not one in the data file
    # You can tell that there was no trace for the device during an
    # interval if you reach the "extended device statistics" line
    # without finding a trace
    gawk -v pattern=$pattern '
    $0 ~ pattern {
    printf("%s\n",$0);
    found = 1 ;
    }
    /extended/ {
    if (found == 0)
    printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
    found = 0;
    } ' $all_data | head -$seconds > $plot_data
    grep $pattern $plot_data |grep -v ${pattern}[0-9] > $plot_dataf
    gnuplot -persist <<EOF
    set title "$pattern - $dir"
    set grid
    set ytics nomirror
    set y2range [0:100]
    set y2tics 0, 20
    plot "$plot_dataf" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
    "$plot_dataf" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
    "$plot_dataf" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
    "$plot_dataf" using 10 title "%b" axis x1y2 smooth bezier with lines
    EOF

