Thursday Apr 22, 2010

Vdbench and SSD alignment, continued.

Of course, it took only a few minutes before someone asked 'how can I run this against files, not volumes'. Here is the response:

Just change the lun to a file name (use the same file name each time) and add a size.
Vdbench will then first create the file for you.
The catch is that you need to make sure Vdbench does not read from the file system cache or the file server cache, so the file size must be at least 5 times the system's cache size. Unless of course you mount the file system directio, but then you still have the file server cache to deal with.
Just take your time and let Vdbench create a large file (I am using 100g).

BTW: the elapsed time needs to be long enough to make sure you get away from cache. I set it here to 60 seconds, which should be a good start.

Henk.

hd=default,jvms=1
sd=default,th=32
sd=default,size=100g
sd=sd_0000,lun=/dir/filename,offset=0000
sd=sd_0512,lun=/dir/filename,offset=0512
sd=sd_1024,lun=/dir/filename,offset=1024
sd=sd_1536,lun=/dir/filename,offset=1536
sd=sd_2048,lun=/dir/filename,offset=2048
sd=sd_2560,lun=/dir/filename,offset=2560
sd=sd_3072,lun=/dir/filename,offset=3072
sd=sd_3584,lun=/dir/filename,offset=3584
sd=sd_4096,lun=/dir/filename,offset=4096
wd=wd1,sd=sd_1,xf=4k,rdpct=100
rd=default,iorate=max,elapsed=60,interval=1,dist=d,wd=wd1
rd=rd_0000,sd=sd_0000
rd=rd_0512,sd=sd_0512
rd=rd_1024,sd=sd_1024
rd=rd_1536,sd=sd_1536
rd=rd_2048,sd=sd_2048
rd=rd_2560,sd=sd_2560
rd=rd_3072,sd=sd_3072
rd=rd_3584,sd=sd_3584
rd=rd_4096,sd=sd_4096
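
The SD and RD lines above differ only in the offset, so they can be generated instead of typed. Below is a minimal Python sketch that emits the same parameter file. The helper name `build_param_file` and its arguments are hypothetical and not part of Vdbench; the workload and run lines are copied verbatim from the listing above.

```python
def build_param_file(lun, size=None, step=512, max_offset=4096):
    """Return the offset-sweep parameter file as a string.

    Illustrative helper only; it is not shipped with Vdbench.
    """
    offsets = range(0, max_offset + 1, step)
    lines = ["hd=default,jvms=1", "sd=default,th=32"]
    if size is not None:
        # Only needed when the lun is a file that Vdbench has to create.
        lines.append(f"sd=default,size={size}")
    # One SD per offset, all pointing at the same lun.
    lines += [f"sd=sd_{off:04d},lun={lun},offset={off:04d}" for off in offsets]
    # Workload and run defaults copied verbatim from the listing above.
    lines.append("wd=wd1,sd=sd_1,xf=4k,rdpct=100")
    lines.append("rd=default,iorate=max,elapsed=60,interval=1,dist=d,wd=wd1")
    # One run per SD.
    lines += [f"rd=rd_{off:04d},sd=sd_{off:04d}" for off in offsets]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_param_file("/dir/filename", size="100g"), end="")
```

Leaving out `size` gives the raw volume variant, for example `build_param_file("/dev/rdsk/c7t0d0s4")`.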


Vdbench file system testing WITHOUT using 'format=yes'

Just found a bug in Vdbench 5.02 around file system testing.

When you do not have Vdbench create all the files using the 'format=yes' parameter, but instead code fileselect=sequential,fileio=sequential,operation=write in the File system Workload Definition (FWD) to simulate the format, Vdbench only creates the first 'threads=n' files and then overwrites them again and again. If you are in desperate need of a fix, let me know.

Henk.

Vdbench and SSD alignment

These last months I have heard a lot about issues with solid state devices not being properly aligned to the expected data transfer sizes. Each OS has its own way of creating volumes and partitions, so figuring out whether everything is neatly aligned is not an easy job. Add to that the possibility that the OS thinks everything is in order while, with virtual volumes, the alignment is off somewhere down the line in one of the many possible layers of software.

Without really being interested in the 'how to figure it all out and how to fix alignment issues' I created a small Vdbench parameter file that will allow you to at least figure out whether things are properly aligned or not. It revolves around the use of the Vdbench 'offset=' parameter that allows you to artificially change the alignment from Vdbench's point of view.

If your SSDs are on a storage subsystem that has a large cache, make sure that your volume is much larger than that cache. You really need to make sure you are getting your data from the SSD, not from cache.

Henk:

hd=default,jvms=1
sd=default,th=32
sd=sd_0000,lun=/dev/rdsk/c7t0d0s4,offset=0000
sd=sd_0512,lun=/dev/rdsk/c7t0d0s4,offset=0512
sd=sd_1024,lun=/dev/rdsk/c7t0d0s4,offset=1024
sd=sd_1536,lun=/dev/rdsk/c7t0d0s4,offset=1536
sd=sd_2048,lun=/dev/rdsk/c7t0d0s4,offset=2048
sd=sd_2560,lun=/dev/rdsk/c7t0d0s4,offset=2560
sd=sd_3072,lun=/dev/rdsk/c7t0d0s4,offset=3072
sd=sd_3584,lun=/dev/rdsk/c7t0d0s4,offset=3584
sd=sd_4096,lun=/dev/rdsk/c7t0d0s4,offset=4096
wd=wd1,sd=sd_1,xf=4k,rdpct=100
rd=default,iorate=max,elapsed=60,interval=1,dist=d,wd=wd1
rd=rd_0000,sd=sd_0000
rd=rd_0512,sd=sd_0512
rd=rd_1024,sd=sd_1024
rd=rd_1536,sd=sd_1536
rd=rd_2048,sd=sd_2048
rd=rd_2560,sd=sd_2560
rd=rd_3072,sd=sd_3072
rd=rd_3584,sd=sd_3584
rd=rd_4096,sd=sd_4096


These are the 'avg' lines:

offset=0000   avg_2-3   19223.00    75.09    4096 100.00    1.580    2.803    0.231     1.1   0.9
offset=0512   avg_2-3    3655.50    14.28    4096 100.00    8.772    9.473    0.067     0.3   0.2
offset=1024   avg_2-3    3634.00    14.20    4096 100.00    8.784    9.390    0.064     0.3   0.2
offset=1536   avg_2-3    3633.00    14.19    4096 100.00    8.799    9.472    0.062     0.3   0.2
offset=2048   avg_2-3    3614.50    14.12    4096 100.00    8.831    9.440    0.066     0.3   0.2
offset=2560   avg_2-3    3604.00    14.08    4096 100.00    8.852    9.477    0.067     0.2   0.2
offset=3072   avg_2-3    3602.50    14.07    4096 100.00    8.853    9.430    0.059     0.3   0.2
offset=3584   avg_2-3    3597.50    14.05    4096 100.00    8.888    9.468    0.069     0.2   0.2
offset=4096   avg_2-3   20050.50    78.32    4096 100.00    1.584    2.811    0.231     1.0   0.9

As you can see, the runs with offset=0 and offset=4096 deliver more than 5 times the throughput of the others. This tells me that this volume is properly aligned.
If, for instance, the results showed that offset=512 performs best, the volume would be on a 512-byte offset.
To then run properly 4k aligned tests with Vdbench, add to all your runs:
sd=default,offset=512
and Vdbench, after generating each lba, will always add 512.
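
Picking the aligned offset out of the 'avg' lines can also be done mechanically. The following is a minimal Python sketch, assuming the third field of each line is the i/o rate and that the `offset=` prefix has been added to each line as shown above. The function name `aligned_offsets` and the 0.8 threshold are arbitrary choices for illustration, not anything Vdbench provides.

```python
AVG_LINES = """\
offset=0000   avg_2-3   19223.00    75.09    4096 100.00    1.580    2.803    0.231     1.1   0.9
offset=0512   avg_2-3    3655.50    14.28    4096 100.00    8.772    9.473    0.067     0.3   0.2
offset=1024   avg_2-3    3634.00    14.20    4096 100.00    8.784    9.390    0.064     0.3   0.2
offset=1536   avg_2-3    3633.00    14.19    4096 100.00    8.799    9.472    0.062     0.3   0.2
offset=2048   avg_2-3    3614.50    14.12    4096 100.00    8.831    9.440    0.066     0.3   0.2
offset=2560   avg_2-3    3604.00    14.08    4096 100.00    8.852    9.477    0.067     0.2   0.2
offset=3072   avg_2-3    3602.50    14.07    4096 100.00    8.853    9.430    0.059     0.3   0.2
offset=3584   avg_2-3    3597.50    14.05    4096 100.00    8.888    9.468    0.069     0.2   0.2
offset=4096   avg_2-3   20050.50    78.32    4096 100.00    1.584    2.811    0.231     1.0   0.9
"""


def aligned_offsets(report, threshold=0.8):
    """Return the offsets whose i/o rate is within `threshold` of the best run."""
    rates = {}
    for line in report.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("offset="):
            continue  # skip anything that is not an annotated 'avg' line
        offset = int(fields[0].split("=", 1)[1])
        rates[offset] = float(fields[2])  # assumed: third field is the i/o rate
    best = max(rates.values())
    return sorted(off for off, rate in rates.items() if rate >= threshold * best)


if __name__ == "__main__":
    print(aligned_offsets(AVG_LINES))  # prints [0, 4096]
```

If the list came back as `[512]`, that would be the value to use in `sd=default,offset=512`.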




Tuesday Apr 13, 2010

Swat Trace Facility (STF) memory problems during the Analyze phase.

java.lang.OutOfMemoryError 

During the Analyze phase, STF keeps thirty seconds' worth of I/O detail in memory. That is done so that when a trace probe of an I/O completion is found, STF can still find the I/O start probe if that occurred less than thirty seconds earlier. Believe me, in some very difficult customer problem scenarios I have seen I/Os that took that long.

If you run 5000 iops, keeping thirty seconds worth of detailed information in memory is not a real problem. Even 10k or 20k will work fine.

And now we have solid state devices, running 100k, 200k iops and up. Just do the math, and you know you will run into memory problems.

There is an undocumented option in STF to lower this 30-second value. In STF, go to the Settings tab, click on ‘batch_prm’, enter ‘-a5’, click ‘Save’, and run the Analyze phase again. Of course, -a2 works fine too. Note that any I/O that takes longer than the new ‘age’ value you specified will not be recognized by STF, but usually two seconds should be just fine.
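
To make "just do the math" concrete, here is a small Python sketch that counts how many I/O entries are held at once for a given rate and 'age' value. It only counts entries; how many bytes STF needs per entry is not stated here, so no memory size is estimated. The function name `entries_in_memory` is made up for this example.

```python
def entries_in_memory(iops, age_seconds=30):
    """Entries held at once: arrival rate times the retention window."""
    return iops * age_seconds


if __name__ == "__main__":
    for iops in (5_000, 20_000, 200_000):
        for age in (30, 5, 2):
            count = entries_in_memory(iops, age)
            print(f"{iops:>7} iops, age {age:>2}s: {count:>9,} entries")
```

At 200k iops the default of thirty seconds means six million entries, while -a2 brings that down to 400,000.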

Henk.


Thursday Apr 01, 2010

Vdbench and multipath I/O

A question just asked: does Vdbench support multipath I/O?

Vdbench tries to be "just any other user application". To make sure that Vdbench is portable across multiple platforms I decided to depend on the OS for doing i/o. All I need from the OS is a working open/close/read/write() function and I am happy.
Just imagine writing a simple customer application, my favorite (and I am sure yours too): the payroll application. Would you as a programmer have to worry about HOW things get read or written? The days when the application had to know and handle this level of detail are over.
That's now the role of the OS.

So, Vdbench is not multipathing-aware whatsoever. It's the OS's responsibility.

Henk.

AIX shared libraries for Vdbench 5.02

Thank you IBM!

Please place https://blogs.oracle.com/henk/resource/502/aix-32.so and http://blogs.oracle.com/henk/resource/502/aix-64.so in your /vdbench502/aix/ directory.


Henk

Heartbeat timeouts during Vdbench startup

One of the most complex things in Vdbench (and Swat also) on Solaris is translating file names, volume names and device numbers to the proper Kstat instance names. To accomplish that I take the output of iostat and of the ls /dev/rdsk command, and match and merge those outputs. Never realizing how bad that could get, I did not bother with efficiency, so I do numerous sequential lookups in both the iostat and the ls output. And then of course I run into iostat output with several thousand lines, and ls output with at least that many. At times it can take over two minutes to do all this matching and merging, causing heartbeat timeouts during Vdbench startup. Add -d27 as an extra execution parameter to change the default two-minute heartbeat timeout to 15 minutes.

The next release of Vdbench will fix this problem by creating hash tables for all the ls and iostat output.
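
The difference between the two approaches can be sketched in a few lines of Python. This is not the Vdbench code: the row layout (pairs of instance name and device name, and pairs of device name and path) is a simplified, hypothetical stand-in for parsed iostat and ls output, and both function names are made up.

```python
def match_sequential(iostat_rows, ls_rows):
    """For every iostat row, scan the ls rows from the top: O(n * m) comparisons."""
    matched = {}
    for instance, device in iostat_rows:
        for ls_device, path in ls_rows:
            if ls_device == device:
                matched[instance] = path
                break
    return matched


def match_hashed(iostat_rows, ls_rows):
    """Build a hash table from the ls rows once, then do constant-time lookups."""
    path_by_device = {}
    for ls_device, path in ls_rows:
        # Keep the first entry per device, the same one the sequential scan finds.
        path_by_device.setdefault(ls_device, path)
    return {
        instance: path_by_device[device]
        for instance, device in iostat_rows
        if device in path_by_device
    }


if __name__ == "__main__":
    # Made-up sample data, several thousand lines as in the scenario described.
    iostat_rows = [(f"sd{i}", f"dev{i}") for i in range(3000)]
    ls_rows = [(f"dev{i}", f"/dev/rdsk/c0t{i}d0s2") for i in range(3000)]
    assert match_sequential(iostat_rows, ls_rows) == match_hashed(iostat_rows, ls_rows)
```

With several thousand rows on each side the sequential version does millions of comparisons, while the hashed version does one pass over each list.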

Henk.

Tuesday Mar 16, 2010

Mac OS X shared library for Vdbench 5.02

Here is a copy of the Mac shared library for Vdbench 5.02. Download this file and place it in your /vdbench502/mac/ directory.

Henk.

Monday Mar 08, 2010

Vdbench Replay run not terminating properly

The i/o captured in a trace by the Sun StorageTek Workload Analysis Tool (Swat) can be replayed using Vdbench. I just noticed a problem: when you give Vdbench more Storage Definitions (SDs) than needed, Vdbench does not terminate after the last i/o has been replayed. Each SD in Vdbench keeps track of when it executes its last i/o and then checks with all other SDs to see if they are all done as well. If one or more of the SDs is never used, this 'end-of-run' check is never done, causing Vdbench to just wait until the elapsed= time is reached.

To correct this, remove the last 'n' SDs. A quick way to see if an SD is used is by looking at the SD reports. If all interval counters are zero then you know it has not been used.

Henk.

Thursday Feb 11, 2010

Vdbench, problems with the patterns= parameter

I have written a blog entry about problems with the patterns= parameter before, even mentioning that I may no longer support it. I have since concluded that I need to continue supporting it, though in a different format than in older versions, where you could specify 127 different data patterns.

In Vdbench 5.01 and 5.02 (brand new), patterns= works as follows: patterns=/pattern/dir where file name '/pattern/dir/default' gets picked up, and its contents stored in the data buffer used for writing data. That works.

However, (and these things always happen when it is too late) a few hours after I did the last build of Vdbench 5.02 I realized that yes, I put the pattern in the buffer, but I use the same buffer for reading which means that if you have a mixed read/write workload your data pattern can be overlaid by whatever data is on your disk. Since the pattern is copied only once into your buffer all new writes will NOT contain this pattern. So, until I fix this, if you want a fixed pattern to be written, do not include reads in your test.

In normal operations I use only a single data buffer, both for reads and writes. This is done to save on the amount of memory needed during the run. Loads of luns * Loads of threads = Loads of memory. This now needs to change when using specific data patterns.
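
The effect of sharing one buffer between reads and writes can be shown with a toy model in Python. This is only an illustration of the problem described above, not Vdbench's implementation: the "disk" is a dictionary, the blocks are 8 bytes, and all names are invented for the example.

```python
PATTERN = b"PATTERN!"  # stands in for the contents of /pattern/dir/default


def shared_buffer_demo():
    """One buffer for reads and writes: a read wipes out the pattern."""
    disk = {0: b"old-data", 1: b"junkjunk"}  # toy disk: lba -> 8-byte block
    buffer = bytearray(PATTERN)              # pattern copied in exactly once

    disk[0] = bytes(buffer)                  # first write carries the pattern
    first_write = disk[0]
    buffer[:] = disk[1]                      # a read lands in the same buffer
    disk[0] = bytes(buffer)                  # next write sends what was read
    return first_write, disk[0]


def separate_buffer_demo():
    """A dedicated write buffer keeps the pattern intact."""
    disk = {0: b"old-data", 1: b"junkjunk"}
    write_buffer = bytearray(PATTERN)
    read_buffer = bytearray(len(PATTERN))

    disk[0] = bytes(write_buffer)
    first_write = disk[0]
    read_buffer[:] = disk[1]                 # reads no longer touch the pattern
    disk[0] = bytes(write_buffer)
    return first_write, disk[0]


if __name__ == "__main__":
    print(shared_buffer_demo())    # (b'PATTERN!', b'junkjunk')
    print(separate_buffer_demo())  # (b'PATTERN!', b'PATTERN!')
```

The second variant is what costs the extra memory: one more buffer for every thread on every lun.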

 

Henk.

Fixes for Swat 3.02

There are two annoying little bugs in Swat 3.02:

- When using the 'swat=' parameter in a Vdbench parameter file, Swat runs into a date parsing error. (java.text.ParseException: Unparseable date)

- The Workload Visualizer in Swat Trace Facility (STF) does not work.

Download http://blogs.sun.com/henk/resource/fixes/swat302fix1.jar, and copy it as 'swat.jar' into your Swat installation directory.

 

Henk.

 

Vdbench 5.02 now available for download

Vdbench 5.02 contains numerous large and small enhancements.

Highlights:
- Data Validation for file system testing. For years Vdbench has had huge success running Data Validation against raw disks or files. Release 5.02 now introduces the same powerful Data Validation functionality for its file system testing.
- Numerous enhancements to deal with multi-client file system performance testing.
- A Data Validation post-processing GUI, giving you a quick understanding of what data is corrupted and (possibly) why.

For more detail, see http://blogs.sun.com/henk/resource/stuff/vdbench502_notes.html

Questions or problems? Contact me at vdbench@sun.com

 

Henk.

Friday Jan 29, 2010

Vdbench running prtdiag, cfgadm, etc, slowing down vdbench startup

The objective of Vdbench is to measure storage performance. When you save the performance information generated by Vdbench for future use, it can happen that 6 months down the road you ask yourself: "what were the details of the system status at the time of this run?" On Solaris, every Vdbench run executes the 'config.sh' script distributed in the /solaris/ or /solx86/ subdirectory. This script includes commands like prtdiag and cfgadm; the output is stored in the file 'config.html'.

With systems getting larger and more complex every day, these commands can sometimes take 30-60 seconds to complete, delaying the actual start of Vdbench.

If you do not care about recording this data, create a file named 'noconfig' in your Vdbench install directory, and from that point on Vdbench will bypass running 'config.sh'.

Henk.

Tuesday Jan 26, 2010

Vdbench: dangerous use of stopafter=100, possibly inflating throughput results.

In short: doing random I/O against very small files can inflate your throughput numbers.

When doing random I/O against a file using File system Workload Definition (FWD) parameters Vdbench needs to know when to stop using the currently selected file.
The ‘stopafter=100’ parameter (default 100) tells Vdbench to stop after 100 blocks. For Vdbench 5.02 you can also specify ‘stopafter=nn%’, meaning ‘nn%’ of the size of the file.

This all works great, but here’s the catch: if your file size is very small, for instance just 8k, the default stopafter=100 value will cause the same block to be read 100 times.

The stopafter= parameter was really only meant for large files, and this side effect was not anticipated.

Solution:
For Vdbench 5.01, change ‘stopafter=’ to a value that matches the file size. ‘stopafter=’ allows only one fixed value, so if you have multiple different file sizes this won’t work for you.
For Vdbench 5.02 (beta), use stopafter=100%. This makes sure that you never read or write more blocks than the file contains.
I will modify 5.02 as soon as possible to change the default value to be no more than the current file size.
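
The arithmetic behind the inflated numbers can be sketched in Python. The 8k transfer size used below is an assumption chosen to match the 8k file in the example above, and both function names are invented here; neither is a Vdbench parameter.

```python
def reads_per_block(file_size, xfersize, stopafter=100):
    """Average number of times each block in the file is touched before moving on."""
    blocks_in_file = max(1, file_size // xfersize)
    return stopafter / blocks_in_file


def capped_stopafter(file_size, xfersize, stopafter=100):
    """The stopafter=100% idea: never do more i/o than the file has blocks."""
    blocks_in_file = max(1, file_size // xfersize)
    return min(stopafter, blocks_in_file)


if __name__ == "__main__":
    KB = 1024
    GB = 1024 ** 3
    print(reads_per_block(8 * KB, 8 * KB))   # 100.0: the same block, over and over
    print(reads_per_block(1 * GB, 8 * KB))   # well below 1: most blocks untouched
    print(capped_stopafter(8 * KB, 8 * KB))  # 1
```

Reading one block 100 times almost certainly comes out of cache after the first read, which is where the inflated throughput comes from.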

Note: 5.02 is currently only available (in beta) internally at Sun/Oracle.

Henk.

Monday Dec 21, 2009

Shared library available for AIX 32 and 64bit

For 32-bit Java, download libvdbench.so.32 and place it in /vdbench501fix1/aix/libvdbench.so

For 64-bit Java, download libvdbench.so.64 and place it in /vdbench501fix1/aix/libvdbench.so

Henk.

About

Blog for Henk Vandenbergh, author of Vdbench, and Sun StorageTek Workload Analysis Tool (Swat). This blog is used to keep you up to date about anything revolving around Swat and Vdbench.
