Thursday Apr 22, 2010

Vdbench file system testing WITHOUT using 'format=yes'

Just found a bug in Vdbench 5.02 around file system testing.

When you do not have Vdbench create all the files using the 'format=yes' parameter, but instead code fileselect=sequential,fileio=sequential,operation=write in the File system Workload Definition (FWD) to simulate the format, Vdbench only creates the first 'threads=n' files and then overwrites those same files again and again. If you are in desperate need of a fix, let me know.
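For reference, this is the kind of parameter combination I mean (just a minimal sketch; the anchor directory, file count, sizes, and thread count are made up):

fsd=fsd1,anchor=/fstest,depth=1,width=1,files=1000,sizes=1m
fwd=fwd_format,fsd=fsd1,fileselect=sequential,fileio=sequential,operation=write,xfersize=64k,threads=8
rd=rd_format,fwd=fwd_format,fwdrate=max,elapsed=600,interval=1

With the bug, instead of creating all 1000 files, only the first 8 (threads=8) files are created and then overwritten over and over.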

 Henk.

Vdbench and SSD alignment

These last months I have heard a lot about issues related to solid state devices not being properly aligned to the expected data transfer sizes. Each OS has its own way of creating volumes and partitions, so trying to figure out whether everything is neatly aligned is not an easy job. Add to that the possibility that the OS thinks everything is in order, but that somewhere further down, in one of the many possible layers of software involved with virtual volumes, the alignment is not accurate.

Without really being interested in the 'how to figure it all out and how to fix alignment issues' I created a small Vdbench parameter file that will allow you to at least figure out whether things are properly aligned or not. It revolves around the use of the Vdbench 'offset=' parameter that allows you to artificially change the alignment from Vdbench's point of view.

If your SSDs are on a storage subsystem that has a large cache, make sure that your volume is much larger than that cache. You really need to make sure you are getting your data from the SSD, not from cache.

Henk.

hd=default,jvms=1
sd=default,th=32
sd=sd_0000,lun=/dev/rdsk/c7t0d0s4,offset=0000
sd=sd_0512,lun=/dev/rdsk/c7t0d0s4,offset=0512
sd=sd_1024,lun=/dev/rdsk/c7t0d0s4,offset=1024
sd=sd_1536,lun=/dev/rdsk/c7t0d0s4,offset=1536
sd=sd_2048,lun=/dev/rdsk/c7t0d0s4,offset=2048
sd=sd_2560,lun=/dev/rdsk/c7t0d0s4,offset=2560
sd=sd_3072,lun=/dev/rdsk/c7t0d0s4,offset=3072
sd=sd_3584,lun=/dev/rdsk/c7t0d0s4,offset=3584
sd=sd_4096,lun=/dev/rdsk/c7t0d0s4,offset=4096
wd=wd1,sd=sd_*,xf=4k,rdpct=100
rd=default,iorate=max,elapsed=60,interval=1,dist=d,wd=wd1
rd=rd_0000,sd=sd_0000
rd=rd_0512,sd=sd_0512
rd=rd_1024,sd=sd_1024
rd=rd_1536,sd=sd_1536
rd=rd_2048,sd=sd_2048
rd=rd_2560,sd=sd_2560
rd=rd_3072,sd=sd_3072
rd=rd_3584,sd=sd_3584
rd=rd_4096,sd=sd_4096
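To run it, point Vdbench at the parameter file (the file and output directory names here are just examples):

./vdbench -f alignment.parm -o alignment_output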


These are the 'avg' lines:

offset=0000   avg_2-3   19223.00    75.09    4096 100.00    1.580    2.803    0.231     1.1   0.9
offset=0512   avg_2-3    3655.50    14.28    4096 100.00    8.772    9.473    0.067     0.3   0.2
offset=1024   avg_2-3    3634.00    14.20    4096 100.00    8.784    9.390    0.064     0.3   0.2
offset=1536   avg_2-3    3633.00    14.19    4096 100.00    8.799    9.472    0.062     0.3   0.2
offset=2048   avg_2-3    3614.50    14.12    4096 100.00    8.831    9.440    0.066     0.3   0.2
offset=2560   avg_2-3    3604.00    14.08    4096 100.00    8.852    9.477    0.067     0.2   0.2
offset=3072   avg_2-3    3602.50    14.07    4096 100.00    8.853    9.430    0.059     0.3   0.2
offset=3584   avg_2-3    3597.50    14.05    4096 100.00    8.888    9.468    0.069     0.2   0.2
offset=4096   avg_2-3   20050.50    78.32    4096 100.00    1.584    2.811    0.231     1.0   0.9

As you can see, the runs with offset=0 and offset=4096 deliver more than five times the throughput of the others. This tells me that this volume is properly aligned.
If, for instance, the results showed that offset=512 gave the best numbers, the volume would be sitting on a 512-byte offset.
To then run properly 4k-aligned tests with Vdbench, add the following to all your runs:
sd=default,offset=512
Vdbench, after generating each lba, will then always add 512 to it.




Tuesday Apr 13, 2010

Swat Trace Facility (STF) memory problems during the Analyze phase.

java.lang.OutOfMemoryError 

During the Analyze phase STF keeps thirty seconds worth of I/O detail in memory. That is done so that when a trace probe of an I/O completion is found, STF can still find the I/O start probe if that occurred less than thirty seconds earlier. Believe me, in some very difficult customer problem scenarios I have seen I/Os that took that long.

If you run 5000 iops, keeping thirty seconds worth of detailed information in memory is not a real problem. Even 10k or 20k will work fine.

And now we have solid state devices, running 100k, 200k iops and up. Just do the math: at 200,000 iops, thirty seconds of history means roughly six million I/O records held in memory, and you know you will run into memory problems.

There is an undocumented option in STF to lower this 30-second value. In STF, go to the Settings tab, click on ‘batch_prm’, enter ‘-a5’, click ‘Save’, and run the Analyze phase again. Of course, -a2 works fine too. Note that any I/O that takes longer than the new ‘age’ value you specified will not be recognized by STF, but usually two seconds should just be fine.

Henk.


Thursday Apr 01, 2010

Vdbench and multipath I/O

A question just asked: does Vdbench support multipath I/O?

Vdbench tries to be "just any other user application". To make sure that Vdbench is portable across multiple platforms I decided to depend on the OS for doing i/o. All I need from the OS is a working open/close/read/write() function and I am happy.
Just imagine writing a simple customer application, my favorite (and I am sure yours too): the payroll application. Would you as a programmer have to worry about HOW things get read or written? The days when the application had to know and handle that level of detail are over.
That's now the role of the OS.

So, Vdbench is not multipathing-aware whatsoever. It's the OS's responsibility.

Henk.

AIX shared libraries for Vdbench 5.02

Thank you IBM!

Please download https://blogs.oracle.com/henk/resource/502/aix-32.so and http://blogs.oracle.com/henk/resource/502/aix-64.so and place them in your /vdbench502/aix/ directory.


Henk

Heartbeat timeouts during Vdbench startup

One of the most complex things in Vdbench (and also Swat) on Solaris is translating file names, volume names, and device numbers to the proper Kstat instance names. To accomplish that I use the output of iostat and of the 'ls /dev/rdsk' command, and match and merge those outputs. Never realizing how bad that could get, I did not bother with efficiency, so I do numerous sequential lookups in both the iostat and the ls output. And then of course I run into iostat outputs with several thousand lines, and ls output with at least that many. At times it can take over two minutes to do all this matching and merging, causing heartbeat timeouts during Vdbench startup. Add -d27 as an extra execution parameter to change the default two-minute heartbeat timeout to 15 minutes.
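For example (the parameter file name is just a placeholder):

./vdbench -f parmfile -d27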

The next release of Vdbench will have a fix for this problem by creating hash tables for all the ls and iostat output.

Henk.

Tuesday Mar 16, 2010

Mac OS X shared library for Vdbench 5.02

Here is a copy of the MAC shared library for Vdbench 5.02. Download this file and place it in your /vdbench502/mac/ directory.

Henk.

Monday Mar 08, 2010

Vdbench Replay run not terminating properly

The i/o collected in an i/o trace created by the Sun StorageTek Workload Analysis Tool (Swat) can be replayed using Vdbench. I just noticed a problem: when you give Vdbench more Storage Definitions (SDs) than needed, Vdbench does not terminate after the last i/o has been replayed. Each SD in Vdbench keeps track of when it executes its last i/o and then checks with all other SDs to see whether they are all done too. If one or more of the SDs is never used, this 'end-of-run' checking is not done, causing Vdbench to just wait until the elapsed= time is reached.

To correct this, remove the unused SDs. A quick way to see whether an SD has been used is to look at its SD report: if all interval counters are zero, you know it has not been used.

Henk.

Thursday Feb 11, 2010

Fixes for Swat 3.02

There are two annoying little bugs in Swat 3.02:

- When using the 'swat=' parameter in a Vdbench parameter file, Swat runs into a date parsing error. (java.text.ParseException: Unparseable date)

- The Workload Visualizer in Swat Trace Facility (STF) does not work.

Download http://blogs.sun.com/henk/resource/fixes/swat302fix1.jar, and copy it as 'swat.jar' in your Swat installation directory.

 

Henk.

 

Vdbench 5.02 now available for download

Vdbench 5.02 contains numerous large and small enhancements.

Highlights:
- Data Validation for file system testing. Vdbench for years has had a huge success running Data Validation against raw disks or files. Release 5.02 now introduces the same powerful Data Validation functionality for its file system testing.
- Numerous enhancements to deal with multi-client file system performance testing.
- A Data Validation post-processing GUI, giving you a quick understanding of what data is corrupted and (possibly) why.

For more detail, see http://blogs.sun.com/henk/resource/stuff/vdbench502_notes.html

Questions or problems? Contact me at vdbench@sun.com

 

Henk.

Tuesday Jan 26, 2010

Vdbench: dangerous use of stopafter=100, possibly inflating throughput results.

In short: doing random I/O against very small files can inflate your throughput numbers.

When doing random I/O against a file using File system Workload Definition (FWD) parameters, Vdbench needs to know when to stop using the currently selected file.
The 'stopafter=' parameter (default 100) tells Vdbench to stop after 100 blocks. For Vdbench 5.02 you can also specify 'stopafter=nn%', meaning 'nn%' of the size of the file.

This all works great, but here’s the catch: if your file size is very small, for instance just 8k, the default stopafter=100 value will cause the same block to be read 100 times.

The stopafter= parameter was really only meant for large files, and this side effect was not anticipated.

Solution:
For Vdbench 5.01, change 'stopafter=' to a value that matches the file size. 'stopafter=' allows for only one fixed value, so if you have multiple different file sizes this won't work for you.
For Vdbench 5.02 (beta), use stopafter=100%. This makes sure that you never read or write more blocks than the file contains.
I will modify 5.02 as soon as possible to change the default value to be no more than the current file size.
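For example, for a bunch of small 8k files something like this (just a sketch; the anchor, file count, and thread count are made up) makes sure each randomly selected file is read only once, one 8k block, before Vdbench picks the next file:

fsd=fsd1,anchor=/smallfiles,depth=1,width=1,files=10000,sizes=8k
fwd=fwd1,fsd=fsd1,fileselect=random,fileio=random,operation=read,xfersize=8k,stopafter=100%,threads=16
rd=rd1,fwd=fwd1,fwdrate=max,elapsed=60,interval=1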

Note: 5.02 is currently only available (in beta) internally at Sun/Oracle.

Henk.

Monday Dec 21, 2009

Shared library available for AIX 32 and 64bit

For 32 bit java, download libvdbench.so.32 and place it in /vdbench501fix1/aix/libvdbench.so

For 64 bit java, download libvdbench.so.64  and place it in /vdbench501fix1/aix/libvdbench.so

Henk.

Monday Dec 14, 2009

Vdbench and concurrent sequential workloads.

A sequential workload for me is the sequential reading of blocks 1,2,3,4,5 etc.

Running concurrent sequential workloads against the same lun or file will then result in reading blocks 1,1,2,2,3,3,4,4,5,5 etc., something that I have considered incorrect since day one of Vdbench.

When spreading out random I/O workloads across multiple Vdbench slaves, I allow each slave to do some of the random work. For sequential workloads however, the above issue forces me to make sure that only one slave receives a sequential workload. This is all transparent to the user.

That all has worked fine, until last Friday I received an email about the following Vdbench abort message: “rd=rd1,wd=wd2 not used. Could it be that more hosts have been requested than there are threads?”

It took me a while to figure this one out, until it became clear that it was caused by this rule that a sequential workload must not run more than once across slaves. In this case, however, there were two different sequential workloads that were specifically requested to run against the same device, one reading and one writing. The result was that Vdbench ignored the second workload without notifying the user. This was not a case of spreading the same workload across slaves; instead there were two different sequential workloads.

Somewhere in the bowels of the Vdbench code is a check to make sure that I did not lose track of one or more workloads (believe me, it can get complex allowing multiple concurrent different workloads to run across different slaves and/or hosts). This code noticed that the second workload was not included at all. Therefore the “wd2 not used” abort.

So how do you get around this if you really do want to do this? The code above only looks at a 100% sequential workload (seekpct=0, seekpct=seq, or seekpct=eof). By specifying for instance seekpct=1 you can tell Vdbench to generate a new random lba on average once every hundred I/Os (1%). Then, again on average, 100 blocks will be read or written sequentially. Specify seekpct=0.01 and a new random lba will be generated only once every 10,000 I/Os. This should suffice without changing the Vdbench logic.
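For example, something like this (just a sketch; the lun name is a placeholder) runs two almost-sequential workloads, one reading and one writing, against the same device without triggering the 'not used' abort:

sd=sd1,lun=/dev/rdsk/cXtYdZs2,threads=16
wd=wd_read,sd=sd1,xfersize=64k,rdpct=100,seekpct=1
wd=wd_write,sd=sd1,xfersize=64k,rdpct=0,seekpct=1
rd=rd1,wd=(wd_read,wd_write),iorate=max,elapsed=60,interval=1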

Henk

Thursday Nov 19, 2009

/var/adm/messages in Vdbench on Solaris

Vdbench on Solaris scans /var/adm/messages every 5 seconds to see if any new messages have been generated. Just in case the new message is related to the testing (for instance scsi timeouts), Vdbench displays the new message.

Frequently this message is not related to the Vdbench run so it only pollutes your terminal window.

To suppress the message display, add '-d25' as an execution parameter or 'debug=25' at the top of your parameter file. Realize though that IF there is an important message you won't see it. When you use this option the message will instead be written to your 'localhost-0.stdout.html' file.
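For example (the parameter file name is just a placeholder):

./vdbench -f parmfile -d25

or add it as the first line of the parameter file:

debug=25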

Henk.

Wednesday Nov 18, 2009

'patterns=' parameter no longer works

The 'patterns=' parameter allows you to tell Vdbench what data pattern to write on your storage.

Alas, I just noticed that this no longer works since the LFSR rewrite in Vdbench 5.01. I am not sure whether it is worth the effort to correct this. To be honest, it is unlikely that anyone is using this option; I would have expected a question about it by now.

Henk.

About

Blog for Henk Vandenbergh, author of Vdbench, and Sun StorageTek Workload Analysis Tool (Swat). This blog is used to keep you up to date about anything revolving around Swat and Vdbench.
