Thursday Feb 11, 2010

Fixes for Swat 3.02

There are two small but annoying bugs in Swat 3.02:

- When using the 'swat=' parameter in a Vdbench parameter file, Swat runs into a date parsing error (java.text.ParseException: Unparseable date).

- The Workload Visualizer in Swat Trace Facility (STF) does not work.

Download http://blogs.sun.com/henk/resource/fixes/swat302fix1.jar and copy it to your Swat installation directory as 'swat.jar'.

 

Henk.

 

Vdbench 5.02 now available for download

Vdbench 5.02 contains numerous large and small enhancements.

Highlights:
- Data Validation for file system testing. For years Vdbench has been very successful running Data Validation against raw disks or files; release 5.02 now introduces the same powerful Data Validation functionality for its file system testing.
- Numerous enhancements to deal with multi-client file system performance testing.
- A Data Validation post-processing GUI, giving you a quick understanding of what data is corrupted and (possibly) why.

For more detail, see http://blogs.sun.com/henk/resource/stuff/vdbench502_notes.html

Questions or problems? Contact me at vdbench@sun.com

 

Henk.

Friday Jan 29, 2010

Vdbench running prtdiag, cfgadm, etc., slowing down Vdbench startup

The objective of Vdbench is to measure storage performance. When you save the performance information generated by Vdbench for future use, it can happen that six months down the road you ask yourself: "what were the details of the system status at the time of this run?" For that reason Vdbench on Solaris runs the 'config.sh' script distributed in the /solaris/ or /solx86/ sub directory at the start of each execution. This script includes commands like prtdiag and cfgadm; the output is stored in the file 'config.html'.

With systems getting larger and more complex every day, these commands can take quite a while to complete, sometimes 30-60 seconds, delaying the actual start of Vdbench.

If you do not care about recording this data, create a file named 'noconfig' in your Vdbench install directory, and from that point on Vdbench will bypass running 'config.sh'.
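For example, assuming Vdbench is installed in /export/vdbench (substitute your own install directory):

    touch /export/vdbench/noconfig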

Henk.

Tuesday Jan 26, 2010

Vdbench: dangerous use of stopafter=100, possibly inflating throughput results.

In short: doing random I/O against very small files can inflate your throughput numbers.

When doing random I/O against a file using File system Workload Definition (FWD) parameters, Vdbench needs to know when to stop using the currently selected file.
The 'stopafter=' parameter (default 100) tells Vdbench to stop after 100 blocks. For Vdbench 5.02 you can also specify 'stopafter=nn%', meaning nn% of the size of the file.

This all works great, but here’s the catch: if your file size is very small, for instance just 8k, the default stopafter=100 value will cause the same block to be read 100 times.

The stopafter= parameter was really only meant for large files, and this side effect was not anticipated.

Solution:
For Vdbench 5.01, change 'stopafter=' to a value that matches the file size. 'stopafter=' allows only one fixed value, so if you have multiple different file sizes this won't work for you.
For Vdbench 5.02 (beta), use stopafter=100%. This makes sure that you never read or write more blocks than the file contains.
I will modify 5.02 as soon as possible to change the default so that it is never more than the current file size.
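As an illustration, here is a minimal 5.02-style file system parameter file sketch using the percentage form (the anchor directory, file counts and sizes are made up for this example):

    * 1000 small 8k files, read randomly
    fsd=fsd1,anchor=/tmp/vdbtest,depth=1,width=1,files=1000,size=8k
    * stopafter=100%: never process more blocks than the file contains
    fwd=fwd1,fsd=fsd1,operation=read,fileio=random,fileselect=random,xfersize=8k,threads=4,stopafter=100%
    rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=60,interval=1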

Note: 5.02 is currently only available (in beta) internally at Sun/Oracle.

Henk.

Monday Dec 21, 2009

Shared library available for AIX, 32- and 64-bit

For 32-bit Java, download libvdbench.so.32 and place it in /vdbench501fix1/aix/libvdbench.so

For 64-bit Java, download libvdbench.so.64 and place it in /vdbench501fix1/aix/libvdbench.so
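For example, for 64-bit Java (assuming /vdbench501fix1 is your install directory; 'java -version' usually tells you whether you are running a 64-bit JVM):

    cp libvdbench.so.64 /vdbench501fix1/aix/libvdbench.so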

Henk.

Shared library available for HP/UX (Itanium)

Download this file for Itanium. Place it in /vdbench501fix1/hp/libvdbench.sl

Henk

Monday Dec 14, 2009

Vdbench and concurrent sequential workloads.

A sequential workload for me is the sequential reading of blocks 1,2,3,4,5 etc.

Running concurrent sequential workloads against the same lun or file will then result in reading blocks 1,1,2,2,3,3,4,4,5,5 etc., something that I have considered incorrect since day one of Vdbench.

When spreading out random I/O workloads across multiple Vdbench slaves, I allow each slave to do some of the random work. For sequential workloads however, the above issue forces me to make sure that only one slave receives a sequential workload. This is all transparent to the user.

That all worked fine until last Friday, when I received an email about the following Vdbench abort message: “rd=rd1,wd=wd2 not used. Could it be that more hosts have been requested than there are threads?”

It took me a while to figure this one out, until it became clear that it was caused by the logic that makes sure a sequential workload does not run more than once across slaves. In this case, however, there were two different sequential workloads that were specifically requested to run against the same device, one to read and one to write. The result was that Vdbench ignored the second workload without notifying the user. This was not a case of refusing to spread the same workload across slaves; instead there were two different sequential workloads.

Somewhere in the bowels of the Vdbench code there is a check to make sure that I did not lose track of one or more workloads (believe me, it can get complex allowing multiple different workloads to run concurrently across different slaves and/or hosts). This code noticed that the second workload was not included at all, hence the “wd2 not used” abort.

So how do you get around this if you still really want to do this? The code above only looks at 100% sequential workloads (seekpct=0, seekpct=seq, or seekpct=eof). By specifying for instance seekpct=1 you tell Vdbench to generate a new random lba for, on average, 1% (one in a hundred) of the I/Os generated; then, again on average, 100 blocks will be read or written sequentially. Specify seekpct=0.01 and a new random lba will be generated only once every 10,000 I/Os. This should suffice without changing the Vdbench logic.
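For example, a minimal raw I/O sketch of two such 'almost sequential' workloads (the lun name is just a placeholder):

    sd=sd1,lun=/dev/rdsk/c1t0d0s2
    * one read and one write workload against the same SD;
    * seekpct=1 generates a new random lba for roughly one in a hundred I/Os
    wd=wd1,sd=sd1,rdpct=100,xfersize=64k,seekpct=1
    wd=wd2,sd=sd1,rdpct=0,xfersize=64k,seekpct=1
    rd=rd1,wd=(wd1,wd2),iorate=max,elapsed=60,interval=1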

Henk

Thursday Nov 19, 2009

/var/adm/messages in Vdbench on Solaris

Vdbench on Solaris scans /var/adm/messages every 5 seconds to see if any new messages have been generated. Just in case a new message is related to the testing (for instance SCSI timeouts), Vdbench displays it.

Frequently these messages are not related to the Vdbench run, so they only pollute your terminal window.

To suppress the message display, add '-d25' as an execution parameter or 'debug=25' at the top of your parameter file. Realize though that if there is an important message you won't see it; when you use this option the messages will instead be written to your 'localhost-0.stdout.html' file.
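For example (the parameter file name is just a placeholder):

    ./vdbench -f my_parmfile -d25

or, as the first line of the parameter file:

    debug=25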

Henk.

Wednesday Nov 18, 2009

'patterns=' parameter no longer works

The 'patterns=' parameter allows you to tell Vdbench what data pattern to write on your storage.

Alas, I just noticed that this no longer works since the LFSR rewrite in Vdbench 5.01. I am not sure it is worth the effort to correct this. To be honest, it is unlikely anyone is using this option; I would have expected a question about it by now.

Henk.

Thursday Oct 22, 2009

Fix for NullPointerException in Vdb.Report.reportKstatDetail

This applies to vdbench501fix1. In an earlier blog entry I mentioned a problem with Vdbench spreading the requested work over its available JVMs: http://blogs.sun.com/henk/entry/nullpointerexception_running_multi_host_vdbench501.

Here is a fix: download seqjvm.tar and place it in the Vdbench install directory. Then untar it (tar -xvf seqjvm.tar) and you'll have a new directory and file: /vdbenchxx/classes/Vdb/RD_entry.class

Henk.

Wednesday Oct 21, 2009

'end_cmd=' parameter executed too often

There are two sets of 'start_cmd=' and 'end_cmd=' parameters. One set is used as a 'general' parameter, allowing these commands to be executed at the start and at the end of a Vdbench execution. The other set is a sub-parameter of a Run Definition (RD), allowing these commands to be executed at the start and at the end of each run (RD). Francois just notified me that the general 'end_cmd=' command is not executed at the end of the Vdbench execution, but instead at the end of each RD.
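To illustrate how the two sets are meant to behave, here is a minimal parameter file sketch (the commands and lun are placeholders); the first pair should run only once for the whole execution, the second pair only for its RD:

    * general parameters: once at the very start and very end of the execution
    start_cmd="echo execution starting"
    end_cmd="echo execution complete"

    sd=sd1,lun=/dev/rdsk/c1t0d0s2
    wd=wd1,sd=sd1,rdpct=100,xfersize=4k

    * RD-level: at the start and end of this run only
    rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=1,start_cmd="echo rd1 starting",end_cmd="echo rd1 done"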

To fix this, place 'endcmd.tar' in your Vdbench install directory, untar it (tar -xvf endcmd.tar), and you should now have a new directory and file: vdbench/classes/Vdb/Reporter.class.

Henk.

Tuesday Oct 06, 2009

Vdbench 5.01 reporting incorrect error code text for Windows

I noticed a few days ago that the code that translates the GetLastError() error codes returned by Windows reads and writes has disappeared. Vdbench therefore now reports a Unix error text together with a Windows error code, and that's pretty confusing.

Until I fix this, just look up your error codes at http://msdn.microsoft.com/en-us/library/ms681381%28VS.85%29.aspx

Monday Sep 28, 2009

NullPointerException running Multi-host Vdbench501

Two new problems were discovered last week, both resulting in a NullPointerException.

With the first one, four hosts were defined, but only three were used. Vdbench was trying to print an 'rd started' message on the device reports for that host, but of course no devices were busy. Just remove the unused host to get around this problem.

The second one was caused by the way Vdbench calculates the number of JVMs needed. Normally that is one JVM per SD per 5000 iops, with a maximum of eight. This run had 4 SDs per host for a total of 16. Vdbench should have taken multi-host into consideration here, but did not. The resulting eight JVMs per host then caused the first JVM (slave 0) on a host to have no work to do. Since slave 0 on each host is responsible for collecting Kstat statistics and slave 0 was idle, no Kstat data was returned, causing the NullPointerException. You can work around this problem by either specifying '-m4' as an execution parameter, or by adding 'hd=default,jvms=4' as your first Host Definition.
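For example (the parameter file name and host names are placeholders), either:

    ./vdbench -f my_parmfile -m4

or, in the parameter file, before the individual hosts:

    hd=default,jvms=4
    hd=host1,system=systemA
    hd=host2,system=systemB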

Henk.

Tuesday Sep 15, 2009

New Version of Vdbench: vdbench501fix1

After I sent out Vdbench 5.01, some interesting problems were discovered around Data Validation and Journaling. See the release notes on vdbench.org.

I also added back the new Vdbench '-print' function that was accidentally dropped during the 5.01 build.

Enter './vdbench -print device lba xfersize', and the data block at logical byte address 'lba' of device or file 'device' will be printed for a length of 'xfersize'. This new function is very useful when analyzing data corruption issues identified by Vdbench.
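For example, to print the 8192-byte block at byte address 1048576 of a (placeholder) raw device:

    ./vdbench -print /dev/rdsk/c1t0d0s2 1048576 8192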

Henk.

Wednesday Sep 09, 2009

Fix for 'STF trace start not working' in Swat 3.02

The 'start trace' function in STF for Swat 3.02 does not work.

Symptoms: 'invalid parameter: s' or 'Parameter scan error'.

Just put this file into your /swat/ directory, run 'tar -xvf stf_start.tar', and you're done.


About

Blog for Henk Vandenbergh, author of Vdbench and the Sun StorageTek Workload Analysis Tool (Swat). This blog is used to keep you up to date about anything revolving around Swat and Vdbench.
