
Creating Vdbench503 Swat charts using Swat302

A few weeks ago I was reminded of an incompatibility between Vdbench503 and Swat302 that I introduced quite a while back: the file format that Swat302 'File' 'Import Vdbench data' expects has changed for Vdbench503 data. Since at this time no new version of Swat is available outside of Oracle, that of course is a problem. To convert the new 503 'swat_mon.txt' file contents so that they can be used by Swat302, run the AWK program below.

Henk

#
# Conversion of Vdbench503 swat_mon.txt file to a format that can be read
# by Swat302 using 'Import Vdbench data'.
#
# Note that this AWK program reads file swat_mon.txt (or a copy of it) and
# then replaces swat_mon.txt again.
# I suggest therefore that you first make a copy of swat_mon.txt and use that
# as input, because if you accidentally run TWICE using swat_mon.txt as
# both input AND output you will have destroyed the original content of the file.
#
# To run:
# - cd /vdbench503
# - awk -f convert.awk output/swat_mon.txt.copy > output/swat_mon.txt
#
# Henk.
#
BEGIN {}
{
  if ($1 == ":vdbench503_vdbench_data_for_swat303")
    print ":vdbench_data";
  else if (NF != 14)
    print($0);
  else
  {
    printf("%d %d %d %d %d %d %d %d %d %d %d %d %d %d\n",
            $1, $2, $3, $4, $5/1000, $6, $7, $8, $9/1000, 0, 0, 0, 0, 0);
  }
}


Vdbench and Swat: how to identify what is what when using 'ps -ef'

For obvious reasons I frequently have multiple Swat or Vdbench processes running, and sometimes get confused as to what is what. 'ps' output is not very helpful. I can maybe compare the heap size values, but I don't always remember them:

hvxxxx 21027 21008   0 09:02:01 pts/12      0:06 java -Xmx512m -Xms128m -cp ./:./classes:./swat.jar:./javachart.jar:./swing-layo
hvxxxx 21060 21041   0 09:02:04 pts/6       0:03 java -Xmx1024m -Xms512m -cp ./:./classes:./swat.jar:./javachart.jar:./swing-lay

A primitive little trick now makes my life easier: the -D java parameter. It shows me that I am running one background data collector and one local real time monitor (swat -c and swat -l):

hvxxxx 21102 21083   0 09:04:49 pts/12      0:07 java -Dreq=-c -Xmx512m -Xms128m -cp ./:./classes:./swat.jar:./javachart.jar:./s
hvxxxx 21240 21221   0 09:06:01 pts/6       0:34 java -Dreq=-l -Xmx1024m -Xms512m -cp ./:./classes:./swat.jar:./javachart.jar:./

This is the update to the swat script; you can make a similar change to the Vdbench script:

if ("$1" == "-t" || "$1" == "-p" || "$1" == "-l") then
  $java -Dreq=$1 -Xmx1024m -Xms512m -cp $cp Swt.swat $*
else
  $java -Dreq=$1 -Xmx512m  -Xms128m -cp $cp Swt.swat $*
endif

Henk.
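Once the scripts tag each JVM this way, a simple sketch of a one-liner (assuming a Bourne-style shell; adjust the pattern to taste) is enough to list only the tagged processes and see which option each one was started with:

# List the tagged Swat/Vdbench JVMs and the -Dreq= value each was started with.
ps -ef | grep 'java -Dreq=' | grep -v grep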


Vdbench and SSD alignment, continued.

Of course, it took only a few minutes before someone asked 'how can I run this against files, not volumes'. Here is the response: just change the lun to a file name (use the same file name each time) and add a size; Vdbench will then first create the file for you. The problem is that you need to make sure Vdbench will not read from file system or file server cache, so the file size must be at least 5 times the system's cache size. Unless of course you mount stuff directio, but then you still have the file server cache to deal with. Just take your time and create a large file (I am using 100g); Vdbench will automatically create it for you. BTW: the elapsed time needs to be long enough to make sure you get away from cache. I set it here to 60 seconds, which should be a good start.

Henk.

hd=default,jvms=1
sd=default,th=32
sd=default,size=100g
sd=sd_0000,lun=/dir/filename,offset=0000
sd=sd_0512,lun=/dir/filename,offset=0512
sd=sd_1024,lun=/dir/filename,offset=1024
sd=sd_1536,lun=/dir/filename,offset=1536
sd=sd_2048,lun=/dir/filename,offset=2048
sd=sd_2560,lun=/dir/filename,offset=2560
sd=sd_3072,lun=/dir/filename,offset=3072
sd=sd_3584,lun=/dir/filename,offset=3584
sd=sd_4096,lun=/dir/filename,offset=4096
wd=wd1,sd=sd_1,xf=4k,rdpct=100
rd=default,iorate=max,elapsed=60,interval=1,dist=d,wd=wd1
rd=rd_0000,sd=sd_0000
rd=rd_0512,sd=sd_0512
rd=rd_1024,sd=sd_1024
rd=rd_1536,sd=sd_1536
rd=rd_2048,sd=sd_2048
rd=rd_2560,sd=sd_2560
rd=rd_3072,sd=sd_3072
rd=rd_3584,sd=sd_3584
rd=rd_4096,sd=sd_4096


Vdbench and SSD alignment

These last months I have heard a lot about issues related to solid state devices not being properly aligned to the expected data transfer sizes. Each OS has its own way of creating volumes and partitions, so trying to figure out whether everything is neatly aligned is not an easy job. Add to that the possibility of the OS thinking everything is in order but alignment somewhere down the line not being accurate in one of the many possible layers of software when we have virtual volumes. Without really being interested in the 'how to figure it all out and how to fix alignment issues', I created a small Vdbench parameter file that will allow you to at least figure out whether things are properly aligned or not. It revolves around the use of the Vdbench 'offset=' parameter that allows you to artificially change the alignment from Vdbench's point of view. If your SSDs are on a storage subsystem that has a large cache, make sure that your volume is much larger than that cache. You really need to make sure you are getting your data from the SSD, not from cache.

Henk.

hd=default,jvms=1
sd=default,th=32
sd=sd_0000,lun=/dev/rdsk/c7t0d0s4,offset=0000
sd=sd_0512,lun=/dev/rdsk/c7t0d0s4,offset=0512
sd=sd_1024,lun=/dev/rdsk/c7t0d0s4,offset=1024
sd=sd_1536,lun=/dev/rdsk/c7t0d0s4,offset=1536
sd=sd_2048,lun=/dev/rdsk/c7t0d0s4,offset=2048
sd=sd_2560,lun=/dev/rdsk/c7t0d0s4,offset=2560
sd=sd_3072,lun=/dev/rdsk/c7t0d0s4,offset=3072
sd=sd_3584,lun=/dev/rdsk/c7t0d0s4,offset=3584
sd=sd_4096,lun=/dev/rdsk/c7t0d0s4,offset=4096
wd=wd1,sd=sd_1,xf=4k,rdpct=100
rd=default,iorate=max,elapsed=60,interval=1,dist=d,wd=wd1
rd=rd_0000,sd=sd_0000
rd=rd_0512,sd=sd_0512
rd=rd_1024,sd=sd_1024
rd=rd_1536,sd=sd_1536
rd=rd_2048,sd=sd_2048
rd=rd_2560,sd=sd_2560
rd=rd_3072,sd=sd_3072
rd=rd_3584,sd=sd_3584
rd=rd_4096,sd=sd_4096

These are the 'avg' lines:

offset=0000   avg_2-3   19223.00    75.09    4096  100.00    1.580    2.803    0.231     1.1   0.9
offset=0512   avg_2-3    3655.50    14.28    4096  100.00    8.772    9.473    0.067     0.3   0.2
offset=1024   avg_2-3    3634.00    14.20    4096  100.00    8.784    9.390    0.064     0.3   0.2
offset=1536   avg_2-3    3633.00    14.19    4096  100.00    8.799    9.472    0.062     0.3   0.2
offset=2048   avg_2-3    3614.50    14.12    4096  100.00    8.831    9.440    0.066     0.3   0.2
offset=2560   avg_2-3    3604.00    14.08    4096  100.00    8.852    9.477    0.067     0.2   0.2
offset=3072   avg_2-3    3602.50    14.07    4096  100.00    8.853    9.430    0.059     0.3   0.2
offset=3584   avg_2-3    3597.50    14.05    4096  100.00    8.888    9.468    0.069     0.2   0.2
offset=4096   avg_2-3   20050.50    78.32    4096  100.00    1.584    2.811    0.231     1.0   0.9

As you can see, the runs with offset=0 and offset=4096 offer more than 5 times the throughput of the others. This tells me that this volume is properly aligned. If for instance the results showed that offset=512 gives the best results, the volume is on a 512-byte offset. To then run properly 4k-aligned tests with Vdbench, add to all your runs:

sd=default,offset=512

and Vdbench, after generating each lba, will always add 512.
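As a hedged sketch of such a follow-up run (the lun, thread count, and workload values below are placeholders, not part of the original post), a 4k random-read test on a volume that turned out to sit on a 512-byte offset could look like this:

* Hypothetical 4k-aligned test: offset=512 is added to every generated lba
* so the i/o's line up with the underlying 4k boundaries.
sd=default,offset=512
sd=sd1,lun=/dev/rdsk/c7t0d0s4,threads=32
wd=wd1,sd=sd1,xfersize=4k,rdpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=1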


Running high IOPS against a single lun/SSD

On Solaris, and I expect the same with other operating systems, whenever an I/O is requested some process-level lock is set. This means that if you try to run very high IOPS, this lock can become 100% busy, causing all threads that need this lock to start spinning. The end result is two-fold: high CPU utilization and/or lower than expected IOPS.

This is not a new problem. It was discovered several years ago when storage subsystems became fast enough to handle 5000 IOPS and more. Since that time CPUs have become much faster and Solaris code has been enhanced several times to lower the need for and duration of these locks. I have seen Vdbench runs where we were able to do 100k IOPS without problems.

Vdbench is written in Java, and Java runs as a single process. Vdbench therefore introduced what is called multi-JVM mode, the ability of Vdbench to split the requested workload over multiple JVMs (Java Virtual Machines). By default Vdbench starts one JVM for each 5000 IOPS requested, with a maximum of 8, and no more than one per Storage Definition (SD). The 5000 number probably should be changed some day; it is a leftover of the initial discovery of this problem.

So, when you ask for iorate=max with only a single SD and you're lucky enough to be running against a Solid State Device (SSD), guess what: you may run into this locking problem. To work around this you have to override the default JVM count, as shown in the sketch below:
- Specify hd=localhost,jvms=nn (I suggest you request one JVM for each 50k IOPS that you expect), or
- Add '-m nn' as an execution parameter, for instance '-m4'.

There is one exception though, and that is for 100% sequential workloads using the seekpct=sequential or seekpct=eof Workload Definition (WD) parameter. A sequential workload will only run using one single JVM. This is done to prevent that, for instance with two JVMs, the workload would look like this: read blocks 1,1,2,2,3,3,4,4,5,5, etc. The performance numbers of course will look great because the second read of a block is guaranteed to be a cache hit, but this is not really a valid sequential workload.

Henk.
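For reference, a minimal sketch of what overriding the JVM count can look like (the lun, thread count, and JVM count are made-up placeholders, not a recommendation):

* Hypothetical single-SSD run; 4 JVMs instead of the one JVM this single SD would get by default.
hd=localhost,jvms=4
sd=sd1,lun=/dev/rdsk/c7t0d0s4,threads=32
wd=wd1,sd=sd1,xfersize=4k,rdpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=1

The same override can be requested without touching the parameter file by adding '-m 4' to the command line.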


Vdbench, problems with the patterns= parameter

I have written a blog entry about problems with the patterns= parameter before, even mentioning that I might no longer support it. I have since concluded that I need to continue supporting it, though in a different format than currently, where you (in older versions) could specify 127 different data patterns.

In Vdbench 5.01 and 5.02 (brand new), patterns= works as follows: patterns=/pattern/dir, where the file named '/pattern/dir/default' gets picked up and its contents stored in the data buffer used for writing data. That works.

However (and these things always happen when it is too late), a few hours after I did the last build of Vdbench 5.02 I realized that yes, I put the pattern in the buffer, but I use the same buffer for reading, which means that if you have a mixed read/write workload your data pattern can be overlaid by whatever data is on your disk. Since the pattern is copied only once into your buffer, all new writes will NOT contain this pattern. So, until I fix this, if you want a fixed pattern to be written, do not include reads in your test.

In normal operations I use only a single data buffer, both for reads and writes. This is done to save on the amount of memory needed during the run: loads of luns * loads of threads = loads of memory. This now needs to change when using specific data patterns.

Henk.
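For illustration only, a write-only run that avoids the read/write mix issue described above might look like the sketch below. The pattern directory, lun, and sizes are placeholders, and the exact placement of patterns= should be checked against the documentation for your release:

* Hypothetical write-only run: the pattern from /pattern/dir/default is written,
* and no reads are issued that could overlay the buffer contents.
patterns=/pattern/dir
sd=sd1,lun=/dev/rdsk/c1t0d0s2,threads=8
wd=wd1,sd=sd1,xfersize=4k,rdpct=0
rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=1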


Vdbench: dangerous use of stopafter=100, possibly inflating throughput results.

In short: doing random I/O against very small files can inflate your throughput numbers.

When doing random I/O against a file using File system Workload Definition (FWD) parameters, Vdbench needs to know when to stop using the currently selected file. The 'stopafter=100' parameter (default 100) tells Vdbench to stop after 100 blocks. For Vdbench 5.02 you can also specify 'stopafter=nn%', or 'nn%' of the size of the file.

This all works great, but here's the catch: if your file size is very small, for instance just 8k, the default stopafter=100 value will cause the same block to be read 100 times. The stopafter= parameter was really only meant for large files, and this side effect was not anticipated.

Solution:
- For Vdbench 5.01, change 'stopafter=' to a value that matches the file size. 'stopafter=' allows for only one fixed value, so if you have multiple different file sizes this won't work for you.
- For Vdbench 5.02 (beta), use stopafter=100%. This makes sure that you never read or write more blocks than the file contains.

I will modify 5.02 as soon as possible to change the default value to be no more than the current file size.

Note: 5.02 is currently only available (in beta) internally at Sun/Oracle.

Henk.
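As a hedged sketch of the 5.02-style form (the anchor directory, file counts, and sizes are invented for illustration; check the FWD chapter of the documentation for the exact parameter placement in your release):

* Hypothetical file system workload: many small 8k files, random reads.
* stopafter=100% keeps Vdbench from re-reading the same block of a small file 100 times.
fsd=fsd1,anchor=/dir/smallfiles,depth=1,width=10,files=1000,size=8k
fwd=fwd1,fsd=fsd1,operation=read,fileio=random,fileselect=random,xfersize=8k,stopafter=100%,threads=8
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=60,interval=1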


Vdbench and concurrent sequential workloads.

A sequential workload for me is the sequential reading of blocks 1,2,3,4,5 etc. Running concurrent sequential workloads against the same lun or file will then result in reading blocks 1,1,2,2,3,3,4,4,5,5 etc., something that I have considered incorrect since day one of Vdbench.

When spreading out random I/O workloads across multiple Vdbench slaves, I allow each slave to do some of the random work. For sequential workloads however, the above issue forces me to make sure that only one slave receives a sequential workload. This is all transparent to the user.

That all has worked fine, until last Friday I received an email about the following Vdbench abort message: "rd=rd1,wd=wd2 not used. Could it be that more hosts have been requested than there are threads?"

It took me a while to figure this one out, until it became clear that this was caused by making sure that a sequential workload does not run more than once across slaves. In this case however there were two different sequential workloads that were specifically requested to run against the same device, one to read and one to write. The result was that Vdbench ignored the second workload without notifying the user. This was not a case of not spreading out the same workload across slaves; instead there were two different sequential workloads.

Somewhere in the bowels of the Vdbench code is a check to make sure that I did not lose track of one or more workloads (believe me, it can get complex allowing multiple concurrent different workloads to run across different slaves and/or hosts). This code noticed that the second workload was not included at all, therefore the "wd2 not used" abort.

So how do you get around this if you still really want to do this? The code above only looks at a 100% sequential workload (seekpct=0, seekpct=seq, or seekpct=eof). By specifying for instance seekpct=1 you can tell Vdbench to generate a new random lba on average once every 1% (one in a hundred) of the I/Os generated. Then, on average again, 100 blocks will be sequentially read or written. Specify seekpct=0.01 and a new random lba will be generated only every 10,000 I/Os. This should suffice without changing the Vdbench logic.

Henk
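For illustration (lun, transfer size, and thread count are placeholders), the workaround could be expressed like this:

* Hypothetical 'nearly sequential' read and write workloads against the same lun.
* seekpct=1 picks a new random lba on average once every 100 i/o's, so these are
* no longer 100% sequential and the 'wd2 not used' restriction does not apply.
sd=sd1,lun=/dev/rdsk/c0t0d0s2,threads=8
wd=wd1,sd=sd1,xfersize=64k,rdpct=100,seekpct=1
wd=wd2,sd=sd1,xfersize=64k,rdpct=0,seekpct=1
rd=rd1,wd=wd*,iorate=max,elapsed=60,interval=1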


Vdbench Data Validation, synchronous vs. asynchronous journal files

There are times for all of us when our job becomes monotonous. You just grind it out and wait for better days. This week has not been easy, but I survived it partly because of a very fun problem that we ran into yesterday (thank you Jim Kapus).

Imagine the following scenario: "Henk, I ran two identical tests against two identical storage devices on identical servers, and I got around 17,000 IOPS on each. But I added journaling and one system now gets only 1000 IOPS and the other gets 11,000 IOPS. Shouldn't I get the same IOPS on both?" Jim loves to use Swat, so that is how he noticed the huge discrepancy between these runs.

Because of the overhead involved with journaling a drop in throughput is expected, but going from 17,000 IOPS down to 1000 while the other system gets 11,000 is just too much.

After a lot of checking, guessing, comparing and looking at loads of different Swat charts, all of a sudden a light bulb went on above my head when Jim mentioned the word 'journaling' again. Oops. Journaling by default does a synchronous write to the Vdbench journal file just before the write is issued, followed by another synchronous journal write just after the write is completed. 'Synchronous' of course means 'slow', which has a clear negative impact on the response time of the journal file and therefore also hinders the throughput on the devices being tested.

But again, why such a big difference between the two systems? It turned out that, though the servers were identical, the internal disk drives were not. I bet you one drive probably even had its write cache turned on. The response time difference of the journal files was the cause of the huge discrepancy in IOPS.

Solution: of course we could look for identical disk drives, but since there were no expectations during these tests that the OS or the disk drives where the journal files reside would fail, there was a much simpler solution. We did not really need to do synchronous writes. Using the '-jn' Vdbench execution parameter we switched to using asynchronous journal writes. Not only did we end up with equal IOPS on both systems, the IOPS even went back to 17,000 because we no longer depended on any journal file speed.

Henk

PS: by the end of this week I will release Vdbench 5.01-1, a release that contains fixes and enhancements to Data Validation and Journaling.
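For reference, a hedged sketch of the two invocations (parameter file and output directory names are placeholders; see the 'Data Validation and Journaling' chapter of the documentation for the authoritative flag combinations):

# Default journaling: synchronous journal writes before and after each data write.
./vdbench -f parmfile -j -o output_sync

# Asynchronous journal writes: removes the dependency on journal-file write latency.
./vdbench -f parmfile -jn -o output_async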


Vdbench, VmWare, and bash

Today I had a user who tried to run Vdbench on some custom-made Linux version that does not contain csh, only bash. This happened before when running Vdbench on VmWare, and below you'll find a replacement for the vdbench script needed to fix this. First copy /bin/bash to /bin/csh, cloning bash. Then replace the ./vdbench script with the script below, or for the soon-to-be-out Vdbench 5.01, replace it with ./vdbench.bash. The reason for the cloning of bash is that Vdbench internally also calls csh.

Henk.

#
# This script was written specifically for running Vdbench on native VmWare.
# It turns out that VmWare does NOT have the C shell.
# This also works for some brand-x version of Linux that did not have csh.
#
# Instructions:
# - cp /bin/bash /bin/csh   ===> This creates a clone of bash, naming it csh.
# - cp vdbench.bash vdbench ===> vdbench will now use THIS script instead of ./vdbench
#

# Directory where script was started from:
dir=`dirname $0`

# If the first parameter equals SlaveJvm then this means that
# the script must start vdbench with more memory.
# Since all the real work is done in a slave, vdbench itself can be
# started with just a little bit of memory, while the slaves must
# have enough memory to handle large amounts of threads and buffers.

# Set classpath.
# $dir                 - parent of $dir/solaris/solx86/linux/aix/hp/mac subdirectory
# $dir/../classes      - for development overrides
# $dir/vdbench.jar     - everything, including vdbench.class
cp=$dir/:$dir/classes:$dir/vdbench.jar

# Proper path for java:
java=java

# When out of memory, modify the first set of memory parameters. See above.
# '-client' is an option for Sun's Java. Remove if not needed.
if [ "$1" = "SlaveJvm" ]; then
  $java -client -Xmx1024m -Xms128m -cp $cp Vdb.SlaveJvm $*
  exit $status
else
  $java -client -Xmx512m  -Xms64m  -cp $cp Vdb.Vdbmain $*
  exit $status
fi


Vdbench: Sun StorageTek Vdbench, a storage I/O workload generator.

This is a copy of the blog entry I just created on Sun's BestPerf blog: http://blogs.sun.com/BestPerf

Vdbench is written in Java (and a little C) and runs on Solaris Sparc and X86, Windows, AIX, Linux, zLinux, HP/UX, and OS/X. I wrote the SPC1 and SPC2 workload generator using the Vdbench base code for the Storage Performance Council: http://www.storageperformance.org

Vdbench is a disk and tape I/O workload generator, allowing detailed control over numerous workload parameters like:

· For raw disk (and tape) and large disk files:
  o Read vs. write
  o Random vs. sequential or skip-sequential
  o I/O rate
  o Data transfer size
  o Cache hit rates
  o I/O queue depth control
  o Unlimited amount of concurrent devices and workloads
  o Compression (tape)
· For file systems:
  o Number of directories and files
  o File sizes
  o Read vs. write
  o Data transfer size
  o Directory create/delete, file create/delete
  o Unlimited amount of concurrent file systems and workloads

Single host or multi-host: all work is centrally controlled, running either on a single host or on multiple hosts concurrently.

Reporting: centralized reporting, reporting and reporting, using the simple idea that you can't understand performance of a workload unless you can see the detail. If you just look at run totals you'll miss the fact that for some reason the storage configuration was idle for several seconds or even minutes!
· Second by second detail of Vdbench accumulated performance statistics for the total workload and for each individual logical device used by Vdbench.
· For Solaris Sparc and X86: second by second detail of Kstat statistics for the total workload and for each physical lun or NFS mounted device used.
· All Vdbench reports are HTML files. Just point your browser to the summary.html file in your Vdbench output directory and all the reports link together.
· Swat (another of my tools) allows you to display performance charts of the data created by Vdbench: just start SPM, then 'File' 'Import Vdbench data'.
· Vdbench will (optionally) automatically call Swat to create JPG files of your performance charts.
· Vdbench has a GUI that will allow you to compare the results of two different Vdbench workload executions. It shows the differences between the two runs in different grades of green, yellow and red. Green is good, red is bad.

Data Validation: Data Validation is a highly sophisticated methodology to assure data integrity by always writing unique data contents to each block and then doing a compare after the next read or before the next write. The history tables containing information about what is written where are maintained in memory and optionally in journal files. Journaling allows data to be written to disk in one execution of Vdbench with Data Validation and then continued in a future Vdbench execution to make sure that after a system shutdown all data is still there. Great for testing mirrors: write some data using journaling, break the mirror, and have Vdbench validate the contents of the mirror.

I/O Replay: a disk I/O workload traced using Swat (another of my tools) can be replayed using Vdbench on any test system to any type of storage. This allows you to trace a production I/O workload, bring the trace data to your lab, and then replay your I/O workload on whatever storage you want. Want to see how the storage performs when the I/O rate doubles? Vdbench Replay will show you. With this you can test your production workload without the hassle of having to get your database software and licenses, your application software, or even your production data on your test system.

For more detailed information about Vdbench go to http://vdbench.org where you can download the documentation or the latest GA version of Vdbench.

You can find continuing updates about Swat and Vdbench on my blog: http://blogs.sun.com/henk/

Henk Vandenbergh

PS: If you're wondering where the name Vdbench came from: Henk Vandenbergh benchmarking.


Storage performance and workload analysis using Swat.

This is a copy of the blog entry I just created on Sun's BestPerf blog: http://blogs.sun.com/BestPerf

Swat (Sun StorageTek Workload Analysis Tool) is a host-based, storage-centric Java application that thoroughly captures, summarizes, and analyzes storage workloads for both Solaris and Windows environments. This tool was written to help Sun's engineering, sales and service organizations and Sun's customers understand storage I/O workloads.

Swat can be used for, among many other reasons:
- Problem analysis
- Configuration sizing (just buying x GB of storage just won't do anymore)
- Trend analysis: is my workload growing, and can I identify/resolve problems before they happen?

Swat is storage agnostic, so it does not matter what type or brand of storage you are trying to report on. Swat reports the host's view of the storage performance and workload, using the same Kstat (Solaris) data that iostat uses.

Swat consists of several different major functions:
· Swat Performance Monitor (SPM)
· Swat Trace Facility (STF)
· Swat Trace Monitor (STM)
· Swat Real Time Monitor
· Swat Local Real Time Monitor
· Swat Reporter

Swat Performance Monitor (SPM): works on Solaris and Windows. An attempt has been made in the current Swat 3.02 to also collect data on AIX and Linux. Swat 3.02 also reports Network Adapter statistics on Solaris, Windows, and Linux. A Swat Data Collector (agent) runs on some or all of your servers/hosts, collecting I/O performance statistics every 5, 10, or 15 minutes and writing the data to a disk file, one new file every day, automatically switched at midnight. The data can then be analyzed using the Swat Reporter.

Swat Trace Facility (STF): for Solaris and Windows. STF collects detailed I/O trace information. This data then goes through a data Extraction and Analysis phase that generates hundreds or thousands of second-by-second statistics counters. That data can then be analyzed using the Swat Reporter. You create this trace for between 30 and 60 minutes, for instance at a time when you know you will have a performance problem.

A disk I/O workload traced using Swat can be replayed on any test system to any type of storage using Vdbench (another of my tools, available at http://vdbench.org). This allows you to trace a production I/O workload, bring the trace data to your lab, and then replay that I/O workload on whatever storage you want. Want to see how the storage performs when the I/O rate doubles or triples? Vdbench Replay will show you. With this you can test your production workload without the hassle of having to get your database software and licenses, your application software and licenses, or even your production data.

Note: STF is currently limited to the collection of about 20,000 IOPS. Some development effort is required to handle the current increase in IOPS made possible by Solid State Devices (SSDs).

Note: STF, while collecting the trace data, is the only Swat function that requires root access. This functionality is all handled by one single KSH script which can be run independently. (The script uses TNF and ADB.)

Swat Trace Monitor (STM): with STF you need to know when the performance problem will occur so that you can schedule the trace data to be collected. Not every performance problem however is predictable. STM will run an in-memory trace and then monitor the overall storage performance. Once a certain threshold is reached, for instance response time greater than 100 milliseconds, the in-memory trace buffer is dumped to disk and the trace then continues collecting trace data for an amount of seconds before terminating.

Swat Real Time Monitor: when a Data Collector is active on your current or any network-connected host, Swat Real Time Monitor will open a Java socket connection with that host, allowing you to actively monitor the current storage performance either from your local host or any of your remote hosts.

Swat Local Real Time Monitor: Local Real Time Monitor is the quickest way to start using Swat. Just enter './swat -l' and Swat will start a private Data Collector for your local system and then show you exactly what is happening to your current storage workload. No more fiddling trying to get some useful data out of a pile of iostat output.

Swat Reporter: the Swat Reporter ties everything together. All data collected by the above Swat functions can be displayed using this powerful GUI reporting and charting function. You can generate hundreds of different performance charts or tabulated reports giving you intimate understanding of your storage workload and performance. Swat will even create JPG files for you that can then be included in documents and/or presentations. There is even a batch utility (Swat Batch Reporter) that will automate the JPG generation for you. If you want, Swat will even create a script for this batch utility for you.

Some of the many available charts:
- Response time per controller or device
- I/O rate per controller or device
- Read percentage
- Data transfer size
- Queue depth
- Random vs. sequential (STF only)
- CPU usage
- Device skew
- Etc. etc.

Swat has been written in Java. This means that once your data has been collected on its originating system, the data can be displayed and analyzed using the Swat Reporter on ANY Java-enabled system, including any type of laptop.

For more detailed information go to (longURL) where you can download the latest release, Swat 3.02.

You can find continuing updates about Swat and Vdbench on my blog: http://blogs.sun.com/henk/

Henk Vandenbergh


Swat 3.02 now generally available, including Sun Flash Analyzer (SSD)

Sun StorageTek Workload Analysis Tool (Swat) version 3.02 is now available. The main change is the addition of the Sun Flash Analyzer, a tool that allows you to look at the workload and performance data of your systems, assisting you with the identification of disks/luns that can benefit from placement on a Solid State Device (SSD).

Swat 3.02 can be found here:

Summary of changes since Swat version 3.01

'SSD Configurator' or 'Sun Flash Analyzer' functionality: the SSD Configurator has been added to assist Swat users with the identification of those disk devices whose performance may benefit from placement on a Solid State Device (SSD).
· To use the SSD Configurator in beginner mode, just start the flash or flash.bat script.
· To use the SSD Configurator in expert mode, just start Swat using the swat or swat.bat script, and then, once data has been loaded, select 'SSD Configurator'.

Miscellaneous changes:
- Data Collector (swat -c) and Trace Start (swat -i) options have changed to accommodate being able to run the Data Collector for a specific amount of time, instead of only 7*24:
  - 'swat -c -v nnn' has been replaced with 'swat -c -i nnn'.
  - 'swat -c' now allows you to specify elapsed time: 'swat -c -i nnn'.
  - 'swat -i' has been replaced by 'swat -s'.
- The 'Detail' tab has been replaced by a 'Detail' menu option.
- The 'Device Translation' menu option has been removed.
- The 'Show Columns' menu option has been removed.
- The 'busy%' column has been removed.
- The addition of the 'dad' device driver, allowing Swat to recognize certain Solaris laptop disk drives.


Vdbench Data Validation and Journaling

Vdbench Data Validation (-v) gives Vdbench the ability to make sure that data, once written, can be read back and compared to make sure that the data is still correct. To accomplish this, Vdbench has a large table in memory that tells it what was written where. Once Vdbench terminates, that in-memory table of course is gone. By using the '-j' execution parameter Vdbench does not only do Data Validation, it also maintains a journal file that is used to keep a copy of the in-memory table, allowing the table to be recovered upon a Vdbench journal restart (-jr), so that Vdbench knows what was written in the previous run. (There is a 'Data Validation and Journaling' chapter in the doc explaining all this.)

I saw an interesting case yesterday: one of my users was running journaling without really needing it. This can cause problems. Why? Journaling causes, for each write to a target lun, two synchronized writes of a journal record: one before the write to the lun, and one just after the write to the lun is completed. Synchronized writes, especially when done against a storage device that does not have non-volatile write cache, can be pretty slow. The result is that the IOPS running against the lun being tested is quite a bit lower than when you do not use journaling.

The objective of the test in question was to recreate a data integrity problem in a lab environment. Such a data integrity problem in theory can, and usually will, be faster to recreate when running higher IOPS. If for instance your problem happens after 100,000 I/O operations, the higher your IOPS, the sooner you hit the problem.

Henk.
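As a hedged reminder of the difference (parameter file and output names are placeholders; see the 'Data Validation and Journaling' chapter for the authoritative syntax):

# Data Validation only: in-memory tables, no journal writes slowing down the run.
./vdbench -f parmfile -v -o output_dv

# Data Validation plus journaling: adds two synchronized journal writes per lun write,
# but allows a later journal restart with '-jr' to validate what an earlier run wrote.
./vdbench -f parmfile -j -o output_journal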


Swat Trace Facility (STF), high IOPS, and memory problems during STF Analyze.

During the STF Analyze phase, STF reads and interprets all the trace data generated by TNF. Since the trace probes describe I/O starts and I/O completions, and TNF frequently generates duplicate I/O completion probes or sometimes does not generate a completion probe at all, STF needs to keep all I/O start and I/O completion probes in memory so that it can identify and ignore these duplicate probes.

After three minutes STF finally says: OK, if any duplicate data shows up or if any (infrequent) I/O completion does not get generated, three minutes wait time is enough, and it starts aging the retained probes out of memory. This means however that if there are any I/O operations that take longer than three minutes, the I/O start probe is already gone when the I/O completion finally shows up. That I/O is then completely lost to STF.

You may think that three-minute I/O response times never happen, and I agree with that. However, in our lab environments people do a lot of fancy things trying to break our hardware and software to make sure that our customers ultimately get a product that is as good as can be. I used to have 30 seconds set as the aging limit, but there were occasions where that was not enough, so I increased it to 180 seconds, three minutes. This all worked fine when running a 'decent' amount of IOPS.

Storage devices however are becoming faster, especially since the arrival of solid-state devices (SSDs). And now the very high IOPS are starting to create memory problems for STF: keeping three minutes worth of data for 30,000 IOPS in memory starts becoming a problem (30,000 * 180 seconds * (start + completion probe) = about 10 million probes kept in memory).

Two solutions:
- Increase the Java heap space using the STF 'Settings' tab from -Xmx1024m to -Xmx3500m, but that sometimes is not enough (3500m is some 32-bit Java limit; I have never run STF in a 64-bit environment).
- Lower the probe aging in STF. Use the 'Settings' tab again, and enter '-a30' as a batch_prm parameter for 30-second aging, or lower if you know for sure that in your environment no I/O response time is ever longer than the value that you specify.

Henk.


Swat Trace Facility (STF) with very high IOPS workloads.

On Solaris, STF uses TNF to collect detailed I/O trace data. Since TNF uses only a single kernel trace buffer of at most 128 megabytes, this trace buffer will overflow quickly, therefore not allowing for the collection of trace data over longer periods of time.

STF therefore checks every 5 seconds to see how full the trace buffer is. When it is 80% full, STF will use tnfxtract to offload the trace buffer to disk, while allowing TNF to continue adding data to the trace buffer. Then 80% later the same will happen, and so on.

Offloading the trace buffer when it is 80% full allows for a 20% overlap in available buffer space, but with very high IOPS that is not always enough when STF checks only every 5 seconds. In that case STF will generate the following messages:

====> Trace buffer is filling faster than we can offload to disk.
====> Trace will be cancelled after this happens 20 times.

I just noticed in one trace that some small amount of data was even lost without this message being displayed, so this logic is not 100% perfect.

I hope to start using Dtrace at some point in time, but the first time I tried it a few years back the overhead of using Dtrace was so significantly higher than using TNF that I had to give up on that effort. Dtrace has improved since then, so time permitting I will try again.

Until then, make sure that when you collect a high-IOPS STF trace, you offload your trace data to a device that gives you decent throughput. The above-mentioned trace at 30,000 IOPS needed to offload trace data at about 2.5 megabytes per second.

Note: when using SVM each I/O is traced twice (logical and physical); with VXVM each I/O is traced three times (logical, multi-path, and physical), so be aware of the extra amount of trace data that will be created.


Vdbench, Java garbage collection, memory, and CPU problems.

Vdbench keeps the detailed performance statistics generated for each reporting interval in memory. This is done so that at the end of a run it can calculate run totals. Starting with Vdbench 5.00 the amount of detailed reporting done by Vdbench has tripled, therefore tripling the amount of memory needed to store all that detailed data.

Normally that should not be a problem, unless you have a lot of different Storage Definitions (SDs) AND a very long run with very small reporting intervals. With the default amount of Java memory set to 256m in the vdbench script, what can happen for these long runs is that Java runs out of memory and starts spending huge amounts of CPU cycles on garbage collection, continually trying to find the extra little pieces of memory that it needs. And that can take so many CPU cycles that there are hardly any cycles left to do any real work, causing Vdbench to slow down more and more, ultimately completely hanging or timing out with a heartbeat failure message.

In the next major release of Vdbench I will rewrite the code to eliminate the need to save all the detailed data, and instead accumulate the statistics continually at the end of each interval. The reason that all the detailed data was kept was the old 'steady_rate=' parameter, which automatically detected steady state for performance workloads. This option was removed in Vdbench 5.00.

Until then, modify the vdbench and vdbench.bat scripts, replacing the -Xmx256m for Vdbmain with -Xmx512m.

Henk.
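A hedged sketch of the change (the exact surrounding flags in your copy of the script may differ; the point is only the -Xmx value on the line that starts Vdb.Vdbmain):

# In the vdbench (and vdbench.bat) script, find the java invocation for Vdb.Vdbmain
# and raise its maximum heap from 256m to 512m, for example:
$java -Xmx512m -cp $cp Vdb.Vdbmain $*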


Missing multi-host sample parameter file in Vdbench 5.00

I just realized that example 5, in both the documentation and in the installation directory, is a multi-host parameter file from earlier versions of Vdbench which is completely incompatible with Vdbench 5.00. Below are some sample files from my original Vdbench 5.00 release notes.

Henk

Sample multi-host parameter files.

Simple multi-host parameter file:

host=default,vdbench=/home/user/vdbench,user=user
host=(systema,one)
host=(systemb,two)
sd=sd1,host=*,lun=/home/user/junk/vdbench_test,size=10m
wd=wd1,sd=sd*,rdpct=100,xf=4k
rd=rd1,wd=wd1,el=3,in=1,io=10

Multi-host, different lun names:

host=default,vdbench=/home/user/vdbench
host=(systema,one),user=user
host=(systemb,two),user=user
sd=sd1,host=one,lun=/dev/rdsk/c0t0d0s2,host=two,lun=/dev/rdsk/c1t0d0s2
wd=wd1,sd=sd*,rdpct=100,xf=4k
rd=rd1,wd=wd1,el=3,in=1,io=10

This parameter file was used for testing a VTL across three hosts:

compression=100
host=default,user=root
host=(xx.yy.29.6,h6)
host=(xx.yy.29.8,h8)
host=(xx.yy.29.4,h4)
sd=default,size=36g,threads=2
sd=sd4_,host=h4,lun=/dev/rmt/,count=(1,149)
sd=sd5_,host=h5,lun=/dev/rmt/,count=(1,149)
sd=sd6_,host=h6,lun=/dev/rmt/,count=(1,149)
sd=sd8_,host=h8,lun=/dev/rmt/,count=(1,149)
wd=wd1,sd=sd*
rd=default,io=max,elapsed=30,interval=1,wd=wd1
rd=rd1,sd=sd*,forx=(64k-2m,d),forrdpct=(0,100)


A common mistake made when using Vdbench data validation.

When using Vdbench data validation, the content of a data block is validated only when this block is accessed for the second or third, etc., time. This means that if you have a very large LUN or file, and your run has a relatively small elapsed time, a run can complete without any data block ever having been accessed more than once. The run then appears successful, but no real data validation has ever been done. Now of course you can have undetected data integrity problems!

I therefore decided to make a change in Vdbench 5.00: I will count the number of data block validations that have been done, and if at the end of a run that count is still zero, Vdbench aborts with an explanation of what happened and suggestions on how to fix it, things like:
- Use larger block sizes.
- Use longer elapsed= times.
- Use only a portion of the LUN, using:
  sd=sd1,...,size=1g
  sd=sd1,...,range=(xx,yy) (Vdbench 5.00)
  wd=wd1,...,range=(xx,yy)

Error message:

08:31:10.258 No read validations done during a Data Validation run.
08:31:10.258 This means very likely that your run was not long enough to
08:31:10.258 access the same data block twice.
08:31:10.258 There are several solutions to this:
08:31:10.259 - increase elapsed time.
08:31:10.259 - use larger xfersize.
08:31:10.259 - use only a subset of your lun by using the 'sd=...,size='
08:31:10.259   parameter or the 'wd=...,range=' parameter.

Normal completion message on logfile.html:

08:33:23.847 localhost-0: Total amount of blocks read and validated: 1271


Vdbench: workload skew

These past few weeks some questions have come up about what workload we expect Vdbench to run, and what is actually done when looking at the output reports. I spent several days trying to figure out how to explain everything; first I ended up planning five different blog entries, but then realized that they tie together in such a way that writing just one entry appears to be the right thing to do. The main problem that I am trying to explain is described in 'Vdbench and sequential read/write', but you'll need to read the rest first.

So here is one entry, but with five chapters:
1. Vdbench and multi-JVM processing
2. Assignment of workloads to JVMs
3. Vdbench skew control
4. Vdbench: controlled vs. uncontrolled workloads
5. Vdbench and sequential read/write

1. Vdbench and multi-JVM processing

JVM: Java Virtual Machine. Java runs as a single process, with the result that if you request too many IOPS within that single process you can run into some serious process lock contention. When Vdbench was initially written, Solaris would get bogged down when going above 5000 IOPS. Since then systems have become faster and huge performance improvements have been made to Solaris, so by now that IOPS limit per process is much higher. I have seen successful tests at 100,000 IOPS per process.

Vdbench multi-JVM processing allows Vdbench to start multiple copies of itself, therefore spreading out the many IOPS over more than one process. Vdbench by default will start up to 8 JVMs, limited by the number of processors and the number of SDs. You can override the number of JVMs to be used with either the (Vdbench 5.00) 'hd=localhost,jvms=nn' or the '-m' execution parameter.

In multi-JVM mode, Vdbench has one master and one or more slaves. The master does all the parameter parsing and reporting and also does the scheduling of work. Vdbench 4.07 runs in multi-JVM mode only if the amount of work requested is relatively small. Vdbench 5.00 always runs in multi-JVM mode. (There used to be three different modes of operation in Vdbench: single JVM, multi-JVM, and multi-host. This is now all encapsulated into one: multi-host, where it does not matter that there is only one host, the current one.)

2. Assignment of workloads to JVMs

Vdbench 4.07: SDs are given to the next JVM in a round-robin fashion; an SD is used by only one JVM.

Vdbench 5.00: SDs are given to each JVM. 100% sequential workloads that have a skew specified, however, all go to the first JVM. This is done to prevent each JVM from reading the same blocks, e.g. blocks 1/1, 2/2, 3/3, etc. That would NOT be a proper sequential workload.

The big change here is that in Vdbench 5.00 random workloads will be using multiple JVMs instead of only one. This has been done to accommodate the high-IOPS solid-state devices.

3. Vdbench skew control

When creating a Vdbench workload in a parameter file you can specify more than one workload to run at the same time. Optionally you can specify a workload skew if you want the different workloads to run with different IOPS, using the 'skew=nn' parameter.

Vdbench controls individual IOPS and workload skew by sending new I/O requests to each SD's internal work queue. This work queue has a maximum queue depth of 2000 per SD. The I/O threads for each SD then pick up a new request from this queue to start the I/O.

When an SD cannot keep up with its requested workload and the SD's internal work queue fills up, Vdbench will not generate new I/O requests for this and all other SDs until space in this queue becomes available again. This means that if you send 1000 IOPS to an SD that can handle only 100 IOPS, and 50 IOPS to a similar device, the queue for the first device will fill up, and I/O request generation for the second device will be held up. This has been done to enable Vdbench to preserve the requested workload skew while still allowing for a temporary 'backlog' of 2000 requested I/Os.

Vdbench only controls the skew within one JVM. The amount of CPU cycles needed to control the skew between different JVMs would be far too much.

4. Vdbench: controlled vs. uncontrolled workloads

When asking Vdbench to run a workload requesting as many IOPS as the storage/server combination can handle, there are two options:
- Specify 'iorate=max' (uncontrolled workload).
- Specify a high numerical value, for instance 'iorate=9999999' (any high value) (controlled workload).

There is an important difference between these two when using multiple SDs or multiple concurrent workloads. Example:
- Two or more SDs that are so different that one SD's response time is much better than the other's, for instance a cached vs. an uncached device, or
- Two or more workloads that are so different that one workload's response time is much better than the other's, for instance 512-byte reads vs. 1mb reads.

Uncontrolled workload: when using iorate=max, without any skew= parameters, Vdbench allows each SD or workload to run as fast as it can.

Controlled workload: when using numeric 'iorate=' values, Vdbench will make sure that the total IOPS generated for each SD and/or workload honors the requested workload skew. If no skew has been specified, the IOPS will be evenly spread out over all workloads and SDs.

5. Vdbench and sequential read/write

This past week there were two occasions where my users used the following parameters in a Vdbench run:

wd=wd1,sd=sd1,xfersize=4096,seekpct=seq,rdpct=50

What the users were expecting is that Vdbench would sequentially read and write to the requested storage. And indeed, Vdbench does just that. Great! But what does Vdbench really do? It generates a sequential workload with on average 50% reads and 50% writes. This means that (depending on the results of the randomizers used) this is how things really look:

Read block 1, write block 2, read block 3, write block 4, read block 5, write block 6, etc. etc.

Code works as designed. But this is not what the user wanted. He wanted a sequential read workload and a sequential write workload, not a sequential read/write mix.

What to do to get a sequential read workload and a sequential write workload? Ask Vdbench for two workloads, one for reads and one for writes:

wd=wd1,sd=sd1,xfersize=4096,seekpct=seq,rdpct=100
wd=wd2,sd=sd1,xfersize=4096,seekpct=seq,rdpct=0

This will create TWO workloads with the following results:

Workload wd1: read blocks 1,2,3,4,5,6, etc.
Workload wd2: write blocks 1,2,3,4,5,6, etc.

Note that both workloads are reading and writing the same blocks. If this is not what you want, then there are several options:
- Use two separate devices, one to read from and one to write to.
- Use the 'range=' parameter to send the reads and the writes to different portions of a device, e.g.

  wd=wd1,sd=sd1,xfersize=4096,seekpct=seq,rdpct=100,range=(0,50)
  wd=wd2,sd=sd1,xfersize=4096,seekpct=seq,rdpct=0,range=(50,100)

  For Vdbench 5.00 the range parameter is also available in the SD:

  sd=sd1,lun=/dev/rdsk/abc,range=(0,50)
  sd=sd2,lun=/dev/rdsk/abc,range=(50,100)

- Change 'seekpct=seq' (or 'seekpct=0') to 'seekpct=1', which will cause a new random lba to be selected on average every 100 blocks, or even 'seekpct=0.1' to select a random lba once every 1000 blocks. Note: since each sequential workload always starts at block 1, it may take a few hundred I/Os before a new random lba is selected.

But now there is a problem! When you request two (or more) controlled workloads and these workloads end up running on different JVMs, Vdbench can no longer control the IOPS since the JVMs do not communicate with each other. Example:

wd=wd1,sd=sd1,xfersize=1m,rdpct=100
wd=wd2,sd=sd2,xfersize=1m,rdpct=0
rd=rd1,wd=wd*,iorate=999999

Reads are usually so much faster than writes that the reads will monopolize the storage, and you can end up with a 90/10% read/write mix and not the expected 50/50%.

Note: when you specify iorate=max then of course we have an uncontrolled workload where we accept the fact that the workloads will have different IOPS.

So how to avoid this problem? The only way is to make sure that these workloads all run in the same JVM by overriding the JVM count: hd=localhost,jvms=1 (Vdbench 5.00), or add the '-m1' execution parameter.

Henk.
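To tie chapters 3 and 4 together, here is a hedged sketch of a controlled workload with an explicit skew (the lun, rates, and percentages are made up for illustration):

* Hypothetical controlled workload: a fixed 20,000 IOPS total, split 70/30 between
* an 8k read workload and an 8k write workload. jvms=1 keeps both workloads in the
* same JVM so the requested skew can be honored.
hd=localhost,jvms=1
sd=sd1,lun=/dev/rdsk/c0t0d0s2,threads=16
wd=wd_read,sd=sd1,xfersize=8k,rdpct=100,skew=70
wd=wd_write,sd=sd1,xfersize=8k,rdpct=0,skew=30
rd=rd1,wd=wd_*,iorate=20000,elapsed=60,interval=1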


Vdbench: a disk and tape I/O workload generator and performance reporter.

Vdbench: Sun StorageTek Vdbench, a storage I/O workload generator. (For those interested in details: Vdbench stands for Vandenbergh benchmarking.)

Vdbench is written in Java (and a little C) and runs on Solaris Sparc and X86, Windows, AIX, Linux, HP, and OS/X. I wrote the SPC1 and SPC2 workload generator using the Vdbench base code for the Storage Performance Council: http://www.storageperformance.org

Vdbench is a disk and tape I/O workload generator, allowing detailed control over numerous workload parameters like:

- For raw disk (and tape) and large disk files:
  o Read vs. write
  o Random vs. sequential or skip-sequential
  o I/O rate
  o Data transfer size
  o Cache hit rates
  o I/O queue depth control
  o Unlimited amount of concurrent devices and workloads
  o Compression (tape)
- For file systems:
  o Number of directories and files
  o File sizes
  o Read vs. write
  o Data transfer size
  o Directory create/delete, file create/delete, read/write, copy/move (Vdbench 5.00 beta)
  o Unlimited amount of concurrent file systems and workloads

Centralized control, running either on a single host or on multiple hosts (Vdbench 5.00 beta) concurrently.

Centralized reporting, reporting and reporting, using the simple idea that you can't understand performance of a workload unless you can see the detail. If you just look at run totals you'll miss the fact that for some reason the storage configuration was idle for several seconds or even minutes!
- Second by second detail of Vdbench accumulated performance statistics for the total workload and for each individual logical device used by Vdbench.
- For Solaris Sparc and X86: second by second detail of Kstat statistics for the total workload and for each physical lun or NFS mounted device used.
- All Vdbench reports are HTML files. Just point your browser to the summary.html file in your Vdbench output directory and all the reports link together.
- Swat (another of my tools) allows you to display performance charts of the data created by Vdbench: just start SPM, then 'File' 'Import Vdbench data'.
- Starting with Vdbench 5.00 (beta), Vdbench will (optionally) automatically call Swat to create JPG files of your performance charts.
- Vdbench has a GUI that will allow you to compare the results of two different Vdbench workload executions. It shows the differences between the two runs in different grades of green, yellow and red. Green is good, red is bad.

Data Validation: a highly sophisticated methodology to assure data integrity by always writing unique data contents to each block and then doing a compare after the next read or before the next write. The history tables containing information about what is written where are maintained in memory or optionally in journal files. Journaling allows data to be written to disk in one execution of Vdbench with Data Validation and then continued in a future Vdbench execution to make sure that after a system shutdown all data is still there.

I/O Replay: a disk I/O workload traced using Swat (another of my tools) can be replayed using Vdbench on any test system to any type of storage. This allows you to trace a customer's production I/O workload, bring the trace data to your lab, and then replay your customer's I/O workload on whatever storage you want. Want to see how the storage performs when the I/O rate doubles? Vdbench Replay will show you. With this you can test your customer's production workload without the hassle of having to get your customer's database software and licenses, his application software, or even his production data.

Vdbench is now available for the general public. This old 4.07 link is obsolete: you can find Vdbench at vdbench.org
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=VDB-4.07-OTH-G-F@CDS-CDS_SMI

The version currently available for the general public will expire on April 25, 2009. By then there will be a non-expiring version at vdbench.org.

Henk.


Swat: a disk and tape I/O performance and workload reporter.

Swat: Sun StorageTek Workload Analysis Tool. This tool was written to help Sun's engineering and sales and service organizations understand a customer's storage I/O workload.

Swat can be used for, among many other reasons:
- Problem analysis
- Configuration sizing (just buying x GB of storage just won't do anymore)
- Trend analysis: is my workload growing, and can I identify/resolve problems before they happen?

Swat is storage agnostic, so it does not matter what type or brand of storage you are trying to report on. Swat reports the host's view of the storage performance, using the same Kstat (Solaris) data that iostat uses.

Swat consists of several different major functions:
- Swat Performance Monitor (SPM)
- Swat Trace Facility (STF)
- Swat Trace Monitor (STM)
- Swat Real Time Monitor
- Swat Local Real Time Monitor
- Swat Reporter

Swat Performance Monitor (SPM): for Solaris and Windows, with an attempt in the current Swat 3.01 beta to also collect data on AIX and Linux. Swat 3.01 also reports Network Adapter statistics on Solaris, Windows, and Linux. A Swat Data Collector (agent) runs on some or all of your servers/hosts, collecting I/O performance statistics every 5, 10, or 15 minutes and writing the data to a disk file, one new file every day, automatically switched at midnight. The data can then be analyzed using the Swat Reporter.

Swat Trace Facility (STF): for Solaris and Windows. STF collects detailed I/O trace information. This data then goes through a data Extraction and Analysis phase that generates hundreds or thousands of second-by-second statistics counters. That data can then be analyzed using the Swat Reporter. You create this trace for between 30 and 60 minutes, for instance at a time when you know you will have a performance problem.

A disk I/O workload traced using Swat can be replayed on any test system to any type of storage using Vdbench (another of my tools). This allows you to trace a customer's production I/O workload, bring the trace data to your lab, and then replay your customer's I/O workload on whatever storage you want. Want to see how the storage performs when the I/O rate doubles or triples? Vdbench Replay will show you. With this you can test your customer's production workload without the hassle of having to get your customer's database software and licenses, his application software and licenses, or even his production data.

Solaris uses TNF to collect trace data. Solaris alas only allows for a 128MB trace buffer. To allow for longer traces Swat monitors how full the trace buffer is, and once it is 80% full it will offload the trace buffer to disk, all the while allowing the creation of new trace data to continue. This continues until the requested trace duration has been reached.

Swat Trace Monitor (STM): with STF you need to know when the performance problem will occur so that you can schedule it to run. Not every performance problem however is predictable. STM will run an in-memory trace and then monitor the overall storage performance. Once a certain threshold is reached, for instance response time greater than 100 milliseconds, the in-memory trace buffer is dumped to disk and the trace then continues for another nn seconds before terminating.

Swat Real Time Monitor: when a data collector is active on your current or any network-connected host, Swat Real Time Monitor will open a Java socket connection with that host, allowing you to actively monitor the current storage performance either from your local host or any of your remote hosts.

Swat Local Real Time Monitor: the quickest way to start using Swat. Just enter './swat -l' and Swat will start a private Data Collector for your local system and then show you exactly what is happening to your current storage workload. No more fiddling trying to get some useful data out of a pile of iostat output.

Swat Reporter: the Swat Reporter ties everything together. All data collected by the above Swat functions can be displayed using this powerful GUI reporting and charting function. You can generate hundreds of different performance charts or tabulated reports giving you intimate understanding of your storage workload and performance. Swat will even create JPG files for you that can then be included in documents and/or presentations. There is even a batch utility (Swat Batch Reporter) that will automate the JPG generation for you. If you want, Swat will create a script for this batch utility for you.

Some of the many available charts:
- Response time per controller or device
- I/O rate per controller or device
- Read percentage
- Data transfer size
- Queue depth
- Random vs. sequential (STF only)
- CPU usage
- Device skew
- Etc. etc.

Swat has been written in Java. This means that once your data has been collected on its originating system, the data can be displayed and analyzed using the Swat Reporter on ANY Java-enabled system, including any type of laptop.

Swat is now available for the general public:
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=SWAT-3.00-OTH-G-F@CDS-CDS_SMI

The version currently available for the general public (Swat 3.00) will expire on September 3, 2009.

We are also planning to make Swat open source. One major issue that needs to be resolved is the fact that Swat uses a licensed third-party Java-based charting package named KavaChart. All that code will have to be rewritten. I'll keep you posted.

Henk.
