Vdbench 5.02 contains numerous large and small enhancements.
Highlights:
- Data Validation for file system testing. For years Vdbench has been very successful running Data Validation against raw disks or files. Release 5.02 now introduces the same powerful Data Validation functionality for its file system testing.
- Numerous enhancements for multi-client file system performance testing.
- A Data Validation post-processing GUI, giving you a quick understanding of what data is corrupted and (possibly) why.
The objective of Vdbench is to measure storage performance. When you save the performance information generated by Vdbench for future use, it can happen that six months down the road you ask yourself: "what were the details of the system status at the time of this run?" That is why, on Solaris, Vdbench runs the 'config.sh' script distributed in the /solaris/ or /solx86/ subdirectory at the start of each execution. This script includes commands like prtdiag and cfgadm; the output is stored in the file 'config.html'.
With systems getting larger and more complex every day, these commands can take 30-60 seconds to complete, delaying the actual start of Vdbench.
If you do not care about recording this data, create a file named 'noconfig' in your Vdbench install directory, and from that point on Vdbench will bypass running 'config.sh'.
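For example, assuming a stand-in install directory, creating the marker file is a one-liner:

```shell
# Stand-in for your real Vdbench install directory (path is illustrative)
VDBENCH_DIR=$(mktemp -d)

# An empty file named 'noconfig' is enough; Vdbench only checks for its existence
touch "$VDBENCH_DIR/noconfig"

ls "$VDBENCH_DIR"
```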
In short: doing random I/O against very small files can inflate your throughput numbers.
When doing random I/O against a file using File Workload Definition (FWD) parameters, Vdbench needs to know when to stop using the currently selected file and move on to the next one. The 'stopafter=' parameter (default 100) tells Vdbench to stop after 100 blocks. Starting with Vdbench 5.02 you can also specify 'stopafter=nn%', meaning 'nn%' of the size of the file.
This all works great, but here's the catch: if the file size is very small, for instance just 8k, the default stopafter=100 causes the same block to be read 100 times. The stopafter= parameter was really only meant for larger files, and this side effect was not anticipated.
For Vdbench 5.01, change 'stopafter=' to a value that matches the file size. 'stopafter=' allows for only one fixed value, so if you have multiple different file sizes this won't work for you.
For Vdbench 5.02 (beta), use stopafter=100%. This makes sure that you never read or write more blocks than the file contains. I will modify 5.02 as soon as possible to change the default value to be no more than the size of the file.
Note: 5.02 is currently only available in beta.
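As a sketch, a 5.02-style file system parameter file using the percentage form could look like this (the anchor directory, file count, sizes, and rates are purely illustrative):

```
fsd=fsd1,anchor=/tmp/vdbtest,depth=1,width=1,files=100,size=8k
fwd=fwd1,fsd=fsd1,operation=read,fileio=random,xfersize=8k,stopafter=100%,threads=4
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=60,interval=1
```

With stopafter=100%, Vdbench never touches more blocks than the currently selected file contains, regardless of how small the files are.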
A sequential workload for me is the sequential reading of
blocks 1,2,3,4,5 etc.
Running concurrent sequential workloads against the same lun or file will then result in reading blocks 1,1,2,2,3,3,4,4,5,5 etc., something that I have considered incorrect since day one of Vdbench.
When spreading out random I/O workloads across multiple Vdbench
slaves, I allow each slave to do some of the random work. For sequential
workloads however, the above issue forces me to make sure that only one slave
receives a sequential workload. This is all transparent to the user.
That all worked fine until last Friday, when I received an email about the following Vdbench abort message: “rd=rd1,wd=wd2 not used. Could it be that more hosts have been requested than there are threads?”
It took me a while to figure this one out, until it became
clear that this was caused by making sure that a sequential workload does not
run more than once across slaves. In this case there were two different sequential
workloads however that were specifically requested to run against the same
device, one to read and one to write. The result was that Vdbench ignored the
second workload without notifying the user. This was not a case of the same workload being kept from spreading across slaves; instead there were two different workloads requested against the same device.
Somewhere in the bowels of the Vdbench code is a check to
make sure that I did not lose track of one or more workloads (believe me, it can
get complex allowing multiple concurrent different workloads to run across
different slaves and/or hosts). This code noticed that the second workload was
not included at all. Therefore the “wd2 not used” abort.
So how do you get around this if you still really want to do this? The check above only looks at a 100% sequential workload (seekpct=0, seekpct=seq, or seekpct=eof). By specifying for instance seekpct=1 you can tell Vdbench to generate a new random lba for, on average, one in every hundred (1%) of the I/Os generated. Then, again on average, 100 blocks will be read or written sequentially. Specify seekpct=0.01 and a new random lba will be generated only once every 10,000 I/Os. This should suffice without changing the Vdbench logic.
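A hedged sketch of the workaround, with the lun, transfer size, and rates purely illustrative: both workloads are made "almost sequential" so that neither is treated as a 100% sequential workload and both can run against the same device:

```
sd=sd1,lun=/dev/rdsk/c0t0d0s2
wd=wd1,sd=sd1,rdpct=100,xfersize=64k,seekpct=1
wd=wd2,sd=sd1,rdpct=0,xfersize=64k,seekpct=1
rd=rd1,wd=wd*,iorate=max,elapsed=60,interval=1
```

With seekpct=1, each workload still reads or writes roughly 100 blocks in a row between random seeks, so the I/O remains essentially sequential in character.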
Vdbench on Solaris scans /var/adm/messages every 5 seconds to see if any new messages have been generated. Just in case the new message is related to the testing (for instance scsi timeouts), Vdbench displays the new message.
Frequently this message is not related to the Vdbench run, so it only pollutes your terminal window.
To suppress the message display, add '-d25' as an execution parameter or 'debug=25' at the top of your parameter file. Realize though that if there is an important message you won't see it on your terminal; when you use this option the message will instead be written to your 'localhost-0.stdout.html' file.
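For example, using the execution-parameter form (the parameter file name is illustrative):

```
./vdbench -f parmfile -d25
```

Alternatively, place 'debug=25' on the first line of the parameter file itself.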
The 'patterns=' parameter allows you to tell Vdbench what data pattern to write on your storage.
Alas, I just noticed that this no longer works since the LFSR rewrite in Vdbench 5.01. I am not sure if it is worth it to put in any effort to correct this. To be honest, it is unlikely anyone is using this option, I would have expected a question about this by now.
There are two sets of 'start_cmd=' and 'end_cmd=' parameters. One set is used as a 'general' parameter, allowing these commands to be executed at the start and at the end of an execution. The other set is a sub parameter of a Run Definition (RD), allowing these commands to be executed at the start and at the end of each run (RD). Francois just notified me that the former 'end_cmd=' command is not executed at the end of a Vdbench execution, but instead at the end of each RD.
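To illustrate the two sets (the echo commands and run details are illustrative, and the '*' lines are Vdbench parameter file comments), the general form goes at the top of the parameter file while the other form is a Run Definition sub-parameter:

```
* General: intended to run once, at the start and end of the whole execution
start_cmd="echo execution starting"
end_cmd="echo execution complete"

* Per-RD: runs at the start and end of each Run Definition
rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=1,start_cmd="echo rd1 starting",end_cmd="echo rd1 done"
```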
To fix this, place 'endcmd.tar' in your Vdbench install directory and untar it (tar -xvf endcmd.tar); you should now have a new directory and file: vdbench/classes/Vdb/Reporter.class.
I noticed a few days ago that the code attempting to translate the GetLastError() error codes returned by Windows reads and writes has disappeared. Vdbench therefore now reports a Unix error text with a Windows error code, and that's pretty confusing.
Two new problems were discovered last week, both resulting in a NullPointerException.
With the first one, four hosts were defined, but only three were used. Vdbench was trying to print an 'rd started' message on the device reports for the unused host, but of course no devices were busy there. Just remove the unused host to get around this problem.
The second one was caused by the way Vdbench calculates the number of JVMs needed. Normally that is one JVM per SD per 5000 iops, with a maximum of eight. This run had 4 SDs per host for a total of 16. Vdbench should have taken multi-host into consideration here, but did not. The resulting eight JVMs per host then caused the first JVM (slave 0) on a host not to have any work to do. Since slave 0 on each host is responsible for collecting Kstat statistics and slave 0 was idle, no Kstat data was returned, causing the NullPointerException. You can work around this problem by either specifying '-m4' as an execution parameter, or adding 'hd=default,jvms=4' as your first Host Definition.
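As a sketch, with hypothetical host names, the Host Definition form of the workaround looks like this:

```
hd=default,jvms=4
hd=one,system=systema
hd=two,system=systemb
```

Specifying '-m4' on the command line accomplishes the same thing without editing the parameter file.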
After I sent out Vdbench 5.01 some interesting problems were discovered around Data Validation and Journaling. See the release notes on vdbench.org.
I also restored the Vdbench '-print' function that was accidentally dropped during the 5.01 build.
Enter './vdbench -print device lba xfersize', and the data block at logical byte address 'lba' of device or file 'device', for a length of 'xfersize' bytes, will be printed. This function is very useful when analyzing data corruption issues identified by Vdbench.
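For example (the device name, lba, and xfersize are illustrative):

```
./vdbench -print /dev/rdsk/c0t0d0s2 4096 512
```

This prints the 512 bytes starting at logical byte address 4096 of that device.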