It is not unusual to see an engineer import text from "vmstat" or "iostat" into a spreadsheet application such as Microsoft Office Excel or OpenOffice Calc to visualize the data. This is a fine approach when used occasionally but impractical when used frequently. Transferring the data to a laptop, manually massaging it, launching the office application, importing the data and selecting the columns to chart is too cumbersome as a daily process or when a large number of machines are being monitored. In my case, I needed a few graphs, a few times a day, from a few servers that were under test. I used some traditional Unix scripts and gnuplot (http://www.gnuplot.info) from the Companion CD (http://www.sun.com/software/solaris/freeware) to quickly graph the data.
The right tool for graphing Solaris data depends on your use case scenario:
That being said, here are some scripts that I used to view Solaris Performance data with gnuplot.
1. Gathering data. For each benchmark run, a script was used to start gathering performance data:
#!/usr/bin/ksh
dir=$1
mkdir $dir
vmstat 1 > $dir/vmstat.out 2>&1 &
zpool iostat 1 > $dir/zpool_iostat.out 2>&1 &
nicstat 1 > $dir/nicstat.out 2>&1 &
iostat -nmzxc 1 > $dir/iostat.out 2>&1 &
/opt/DTraceToolkit-0.99/Bin/iopattern 1 > $dir/iopattern.out 2>&1 &
The statistics gathering processes were all killed at the end of the benchmark run. Hence, each test had a directory with a comprehensive set of statistics files.
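The gathering script above does not show how the collectors were stopped. One possible approach, sketched here under my own assumptions, is to record each collector's process ID when it is started and kill the recorded IDs when the benchmark ends. The helper names start_collector and stop_collectors and the "pids" file are hypothetical and are not part of the original scripts.

```shell
# Hypothetical helpers, not from the original post.

# start_collector dir name command [args...]
# Runs the command in the background, saves its output in dir/name.out
# and appends its process ID to dir/pids.
start_collector() {
    local dir="$1" name="$2"
    shift 2
    "$@" > "$dir/$name.out" 2>&1 &
    echo $! >> "$dir/pids"
}

# stop_collectors dir
# Kills every process recorded in dir/pids, then removes the file.
stop_collectors() {
    local dir="$1" pid
    [ -r "$dir/pids" ] || return 0
    while read -r pid; do
        kill "$pid" 2>/dev/null
    done < "$dir/pids"
    rm -f "$dir/pids"
}

# Example use in the gathering script (assumed, shown as comments):
#   start_collector "$dir" vmstat vmstat 1
#   start_collector "$dir" iostat iostat -nmzxc 1
#   ... run the benchmark ...
#   stop_collectors "$dir"
```

Keeping the process IDs inside the run directory means each test directory can be stopped independently of any other run.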
Next it was necessary to write a set of scripts to operate on the directories.
2. Graphing CPU utilization from "vmstat".
This script was fairly short and straightforward. The "User CPU Utilization" and "System CPU Utilization" are in the 20th and 21st columns. I added an optional argument to truncate the graph after a specific amount of time to account for the cases where the vmstat process was not killed immediately after the benchmark. A bash "here document" is used to enter gnuplot commands.
#!/usr/bin/bash
dir=$1
file=$1/vmstat.out
if [ $# == 2 ] ; then
minutes=$2
(( seconds = minutes * 60 ))
cat $file | head -$seconds > /tmp/data
file=/tmp/data
fi
gnuplot -persist <<EOF
set title "$dir"
plot "$file" using 20 title "%user" with lines, \\
"$file" using 21 title "%sys" with lines
EOF
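Because vmstat reprints its header lines periodically, the data file mixes text rows with numeric samples. A small filter, sketched below, keeps only the numeric sample rows and prints just the two CPU columns. The function name vmstat_cpu is hypothetical, and the filter assumes the 22-column layout described above, with user and system time in columns 20 and 21.

```shell
# vmstat_cpu file
# Hypothetical helper: print "user sys" for each numeric sample row.
# Assumes 22 columns per sample, with %user in column 20 and %sys in 21.
vmstat_cpu() {
    awk 'NF >= 22 && $1 ~ /^[0-9]+$/ { print $20, $21 }' "$1"
}

# Example use (assumed): vmstat_cpu "$dir/vmstat.out" > /tmp/data
# The result can then be plotted with "using 1" and "using 2".
```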
3. Graphing IO throughput from "iostat -nmzxc 1" data
This script was a little bit more complicated for three reasons:
#!/usr/bin/bash
# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
# extended device statistics
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
# 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
# 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\n",$0);
found = 1 ;
}
/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
found = 0;
} ' $all_data | head -$seconds > $plot_data
gnuplot -persist <<EOF
set title "$pattern - $dir"
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_data" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
"$plot_data" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
"$plot_data" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
"$plot_data" using 10 title "%b" axis x1y2 smooth bezier with lines
EOF
I created the following graph with the command "graph_iostat.bash jbod NFS_client_10GbE 5" to select data only from the "jbod" NFS mount, where the data is stored in the directory named "NFS_client_10GbE" and only graph the first 5 minutes worth of data.
The iostat data was collected on an NFS client connected with a 10 gigabit network. There is some write activity (green) at the start of the 5 minute sample period, followed by several minutes of intense reading (red) where the client hits speeds of 600-700MB/sec. The purple "%b" line, with values on the right-hand (y2) axis, indicates that during the intense read phase, the mount point is busy about 90% of the time.
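The graph is plotted in kb/sec while the discussion quotes MB/sec. If you prefer to plot MB/sec directly, a conversion step such as the following sketch could be run on the filtered data before it is handed to gnuplot. The function name iostat_mb is hypothetical, and it assumes the "iostat -nmzxc" column layout shown in the script comments (kr/s and kw/s in columns 3 and 4, %w and %b in columns 9 and 10).

```shell
# iostat_mb file
# Hypothetical helper: print "read_MB write_MB %w %b" for each line,
# converting kr/s and kw/s (columns 3 and 4) from KB/sec to MB/sec.
iostat_mb() {
    awk '{ printf("%.1f %.1f %s %s\n", $3 / 1024, $4 / 1024, $9, $10) }' "$1"
}

# Example use (assumed): iostat_mb /tmp/plot_data > /tmp/plot_data_mb
# The plot command would then use columns 1 to 4 of the new file.
```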
4. Graphing I/O Service time from "iostat -nmzxc" data.
I also find the service time columns from iostat, "wsvc_t" and "asvc_t", very interesting. They can be graphed using a simplification of the previous script.
#!/usr/bin/bash
# extended device statistics
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
# 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
# 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data
# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\n",$0);
found = 1 ;
}
/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
found = 0;
} ' $all_data | head -$seconds > $plot_data
gnuplot -persist <<EOF
set title "$pattern - $dir"
set log y
plot "$plot_data" using 7 title "wsvc_t" with lines, \\
"$plot_data" using 8 title "asvc_t" with lines
EOF
Here is the graph produced by the command "graph_iostat_svc_t.bash jbod NFS_client_10GbE 5"
5. Graphing network throughput data from "nicstat"
Another very valuable Solaris performance statistics tool is "nicstat". For the download link, see http://blogs.sun.com/timc/entry/nicstat_the_solaris_and_linux . A script to graph the data from nicstat follows the same pattern.
#!/usr/bin/bash
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 interface dir [minutes]"
exit 1
fi
interface=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/nicstat.out
plot_data=/tmp/plot_data
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
grep $interface $all_data | head -$seconds > $plot_data
gnuplot -persist <<EOF
set title "$interface - $dir"
plot "$plot_data" using 3 title "read" with lines, \\
"$plot_data" using 4 title "write" with lines
EOF
Here is the graph produced by the command "graph_nicstat.bash ixgbe2 NFS_server_10GbE 5"
6. Graphing IO throughput from "zpool iostat" data
The challenge for plotting "zpool iostat" data is that the traces are not in constant units and therefore it is necessary to re-compute the data in constant units, in this example, MB/sec.
#!/usr/bin/bash
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pool dir [minutes]"
exit 1
fi
pool=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/zpool_iostat.out
plot_data1=/tmp/plot_data1
plot_data2=/tmp/plot_data2
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
# bc -l keeps the fractional part; plain bc would truncate rates below 1 MB/sec to 0
grep $pool $all_data | awk '{printf("%s/1048576\n",$6)}' | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc -l | head -$seconds > $plot_data1
grep $pool $all_data | awk '{printf("%s/1048576\n",$7)}' | sed -e 's/K/*1024/g' -e 's/M/*1048576/g' -e 's/G/*1073741824/g' | bc -l | head -$seconds > $plot_data2
gnuplot -persist <<EOF
set title "$pool - $dir"
set log y
plot "$plot_data1" using 1 title "read (MB/sec)" with lines, \\
"$plot_data2" using 1 title "write (MB/sec)" with lines
EOF
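The unit conversion in the script above relies on a sed and bc pipeline. The same idea can be sketched in awk alone, which may be easier to adapt. The function name to_mb is hypothetical, and the sketch assumes that values carry at most one of the suffixes K, M or G and are otherwise plain bytes.

```shell
# to_mb
# Hypothetical helper: read one bandwidth value per line on stdin
# (for example 512K, 300M, 1.5G or a plain byte count) and print MB/sec.
to_mb() {
    awk '{
        v = $1
        unit = substr(v, length(v), 1)
        mult = 1
        if (unit == "K") mult = 1024
        else if (unit == "M") mult = 1048576
        else if (unit == "G") mult = 1073741824
        if (mult > 1) v = substr(v, 1, length(v) - 1)   # strip the suffix
        printf("%.3f\n", v * mult / 1048576)
    }'
}

# Example use (assumed):
#   grep $pool $all_data | awk '{print $6}' | to_mb | head -$seconds > $plot_data1
```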
Graphing the IO throughput of the zpool named "jbod" using the command "graph_iostat_svc_t.bash jbod NFS_client_10GbE 5" shows that zpool can deliver data at speeds of close to one gigabyte per second.
It is easy to modify the scripts above to graph the output of many tools that output a table of data in text format.
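As an illustration of that point, the plot command itself can be generated from a list of column numbers, so that one helper covers any whitespace-separated table. This is only a sketch: the function name gnuplot_cmds and the "column:title" argument format are my own assumptions, not part of the scripts above.

```shell
# gnuplot_cmds file column[:title] ...
# Hypothetical helper: print a gnuplot "plot" command that draws each
# requested column of the data file as a line.
gnuplot_cmds() {
    file="$1"
    shift
    sep='plot'
    for spec in "$@"; do
        col=${spec%%:*}      # text before the first colon
        title=${spec#*:}     # text after the first colon
        printf '%s "%s" using %s title "%s" with lines' "$sep" "$file" "$col" "$title"
        sep=','
    done
    printf '\n'
}

# Example use (assumed):
#   gnuplot_cmds /tmp/plot_data 3:read 4:write | gnuplot -persist
```

Generating the command as text keeps the helper independent of the tool that produced the data; only the column numbers and titles change.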
Nice writeup.
Thank you for all the scripts, which are ready to go.
Thanks, saved me from re-inventing the wheel!
Hello. I was having issues with our systems because we have so many devices. I found that a pattern of d0 would also pick up devices named d01, d02, d03, etc. Likewise, a pattern of c2t0d1 would grab and plot c2t0d13, c2t0d15, etc.
I also had an issue because I was creating so many plot_data files in /tmp, so I randomized the file names.
So I made a couple of changes, and here are the scripts.
------------------------------------
cat io_throughputMD.sh
#!/usr/bin/bash
# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
# extended device statistics
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
# 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
# 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data`echo $RANDOM`
plot_dataf=/tmp/plot_dataf`echo $RANDOM`
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\n",$0);
found = 1 ;
}
/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
found = 0;
} ' $all_data | head -$seconds > $plot_data
grep "$pattern " $plot_data >$plot_dataf
gnuplot -persist <<EOF
set title "$pattern - $dir"
set grid
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_dataf" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
"$plot_dataf" using 10 title "%b" axis x1y2 smooth bezier with lines
EOF
-----------------------------
cat io_throughputDevice.sh
#!/usr/bin/bash
# This script is used to parse "iostat -nmzxc" data which is formatted like this:
#
# extended device statistics
# r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# 0.0 0.9 0.8 3.8 0.0 0.0 0.0 0.5 0 0 c0t1d0
# 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.4 0 0 sge_master:/opt/sge6-2/default/common
# 0.0 0.8 1.9 184.5 0.0 0.0 4.1 31.1 0 1 192.168.2.9:/jbod
if [ $# -lt 2 -o $# -gt 3 ] ; then
echo "Usage: $0 pattern dir [minutes]"
exit 1
fi
pattern=$1
dir=$2
(( minutes = 24 * 60 )) #default: graph 1 day
if [ $# == 3 ] ; then
minutes=$3
fi
(( seconds = minutes * 60 ))
all_data=$dir/iostat.out
plot_data=/tmp/plot_data`echo $RANDOM`
plot_dataf=/tmp/plot_dataf`echo $RANDOM`
if [ ! -r $all_data ] ; then
echo "can not read $all_data"
exit 1
fi
# For each time interval, either:
# print the trace for the device that matches the pattern, or
# print a "zero" trace if there is not one in the data file
# You can tell that there was no trace for the device during an
# interval if you reach the "extended device statistics" line
# without finding a trace
gawk -v pattern=$pattern '
$0 ~ pattern {
printf("%s\n",$0);
found = 1 ;
}
/extended/ {
if (found == 0)
printf(" 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 192.168.2.9:/jbod \n")
found = 0;
} ' $all_data | head -$seconds > $plot_data
grep $pattern $plot_data |grep -v ${pattern}[0-9] > $plot_dataf
gnuplot -persist <<EOF
set title "$pattern - $dir"
set grid
set ytics nomirror
set y2range [0:100]
set y2tics 0, 20
plot "$plot_dataf" using 3 title "read (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 4 title "write (kb/sec)" axis x1y1 with lines, \\
"$plot_dataf" using 9 title "%w" axis x1y2 smooth bezier with lines, \\
"$plot_dataf" using 10 title "%b" axis x1y2 smooth bezier with lines
EOF
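-----------------------------
The two problems described in this comment (a pattern such as d0 also matching d01, and colliding files in /tmp) might also be handled with the sketch below. It compares the last field of each iostat line to the device name exactly, and it uses mktemp for the temporary file. The function names select_device and new_plot_file are hypothetical, and the sketch assumes that the device name is the last field on the line and that mktemp is installed.

```shell
# select_device device file
# Hypothetical helper: print only the lines whose last field is exactly
# the given device name, so "d0" does not match "d01".
select_device() {
    awk -v dev="$1" '$NF == dev' "$2"
}

# new_plot_file
# Hypothetical helper: create a uniquely named file in /tmp and print its name.
new_plot_file() {
    mktemp /tmp/plot_data.XXXXXX
}

# Example use (assumed):
#   plot_data=$(new_plot_file)
#   select_device c2t0d1 "$dir/iostat.out" | head -$seconds > "$plot_data"
```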