Vdbench 5.03 beta

 

A lot of work and complexity went into this new release of Vdbench. I have done my best to test everything, but we all know through experience that things still can and will go wrong.

The most difficult part of the change was Data Validation. DV is built around knowing that every single block has unique content; dedup is built around knowing that not every block is unique. That’s a huge contradiction, but I have done my best to make sure that Data Validation still works within these limitations.

Because of this complexity please be careful with this beta release. If Vdbench calls out a Data Validation-identified data corruption, check and double-check everything, and also send me a copy of the output directory so that I can personally verify that this is not a code bug. Data corruptions are bad, but Vdbench erroneously calling out a corruption is really bad.

 

1         Summary of changes since Vdbench 5.02

 

  1. Addition of Data Deduplication functionality, using the dedupratio=n parameter.
  2. ‘compression=pct’ has been replaced with ‘compratio=n’. While the former was a ‘percentage of the data remaining’, the new compratio= parameter is now indeed a ratio: compratio=2 means ‘a 2:1 ratio’.
  3. Major changes to the default data patterns and therefore the patterns= parameter.
  4. Warning: by default Vdbench now always generates a brand-new random data pattern for each new write request, and this may impact your maximum throughput and/or performance, especially for high-bandwidth tests. You may override the default though.
  5. Warning: due to the need to switch from shared read/write buffers to now each thread having separate read and write buffers you may need more memory.
  6. ‘start_cmd’ and ‘end_cmd’ now allow a series of commands.
  7. Default data transfer size for file system testing changed from 64k to 128k.
  8. For single-threaded tape I/O there is no longer a requirement that a tape be written in the same Vdbench execution before it can be read.
  9. ‘range=’ wraparound for boundary testing.
  10. Added compratio and dedupratio to flatfile.html.

1.1      Data Deduplication:

 

Dedup is built into Vdbench with the understanding that the dedup logic included in the target storage device looks at each n-byte data block to see if a block with identical content already exists. When there is a match the block no longer needs to be written to storage and a pointer to the already existing block is stored instead.

Since it is possible for dedup and data compression algorithms to be used at the same time, dedup by default generates data patterns that do not compress.

 

dedupratio=n

Ratio between the original data and the actually written data, e.g. dedupratio=2 for a 2:1 ratio. Default: no dedup.

compratio=n

Ratio between the original data and the actually written data, e.g. compratio=2 for a 2:1 ratio. Default: see ‘Changes to default data pattern and the pattern= parameter’.

dedupunit=nnn

The size of a data block that dedup tries to match with already existing data. Default: dedupunit=128k

dedupsets=nn

How many different sets or groups of duplicate blocks to have. See below. Default: dedupsets=5% (You can also just code a numeric value, e.g. dedupsets=100)

 

For a Storage Definition (SD) dedup is controlled on an SD level; for a File System Definition (FSD) dedup is controlled on an FSD level, not on a file level.

 

There are two different dedup data patterns that Vdbench creates:

1.1.1      Unique blocks

Unique blocks: These blocks are unique and will always stay unique, even when they are rewritten. In other words, a unique block will be rewritten with content that differs from all its previous versions.

1.1.2      Duplicate blocks:  

With dedupratio=1 there of course will not be any duplicate blocks.

Duplicate blocks, as the name indicates, are duplicates of each other. They are not all duplicates of one single block though; that would have been too easy. There are ‘nn’ sets or groups of duplicate blocks, and all blocks within a set are duplicates of each other. How many sets are there? I have set the default to dedupsets=5%, or 5% of the estimated total number of dedupunit=nn blocks.

 

Example: a 128m SD file, for 1024 128k blocks. There will be (5% of 1024) 51 sets of duplicate blocks.

Dedupratio=2 ultimately means wanting 512 data blocks to be written to disk, with the other 512 blocks being duplicates of those.

 

‘512 - 51 = 461 unique blocks’ + ‘512 + 51 = 563 duplicate blocks’ = 1024 blocks.

The 461 unique blocks and the 51 sets make for a total of 512 different data blocks that are written to disk. 1024 / 512 = 2:1 dedup ratio. (The real numbers will be slightly different because of integer rounding and/or truncation.)
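
Here is the same arithmetic as a small Java sketch that you can run to reproduce the numbers; the variable names are illustrative, not Vdbench internals:

    public class DedupMath {
        public static void main(String[] args) {
            // Worked example: 128m SD, dedupunit=128k, dedupratio=2, dedupsets=5%.
            long totalBlocks = (128L * 1024 * 1024) / (128 * 1024); // 1024 blocks
            long distinct    = (long) (totalBlocks / 2.0);          // 512 blocks really stored
            long sets        = totalBlocks * 5 / 100;               // 51 sets of duplicates
            long uniques     = distinct - sets;                     // 512 - 51 = 461 unique
            long duplicates  = totalBlocks - uniques;               // 512 + 51 = 563 duplicates
            System.out.printf("%d unique + %d duplicates (in %d sets) = %d blocks%n",
                              uniques, duplicates, sets, totalBlocks);
            // 461 + 51 = 512 different contents; 1024 / 512 = 2:1.
        }
    }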

 

1.1.3      Vdbench xfersize= limitations.

Since the accuracy of dedup revolves entirely around the dedupunit= parameter, all read and write operations must be a multiple of that dedupunit= size. This means that if you use dedupunit=8k, all data transfer sizes must be multiples of that: 8k, 16k, 24k, etc. Vdbench will fail if it finds transfer sizes that do not follow these rules.

Technically of course (unless you are running with data validation) there is no need for read requests to follow these rules, but I thought it best to follow the same rules for both reads and writes.
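
As an illustration of the rule, a check along these lines (a hypothetical helper, not Vdbench’s actual validation code) would reject a mismatched transfer size:

    public class XfersizeCheck {
        // Reject any transfer size that is not a multiple of dedupunit.
        static void validate(long xfersize, long dedupunit) {
            if (xfersize % dedupunit != 0)
                throw new IllegalArgumentException("xfersize=" + xfersize +
                    " is not a multiple of dedupunit=" + dedupunit);
        }
        public static void main(String[] args) {
            validate(16 * 1024, 8 * 1024);           // ok: 16k is a multiple of 8k
            try {
                validate(12 * 1024, 8 * 1024);       // 12k is not: throws
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());
            }
        }
    }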

 

1.1.4      Rewriting of data blocks.

As mentioned above, the unique blocks’ data contents will change each time they are written. The data pattern includes the current time of day in microseconds, together with the SD or FSD name. This makes the content pretty unique, unless of course the same block is written to the same SD more than once within the same microsecond.
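
A rough Java sketch of how such a content stamp could be built; the field layout here is an assumption for illustration, not the actual Vdbench buffer format:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class UniquePattern {
        public static void main(String[] args) {
            ByteBuffer block = ByteBuffer.allocate(512);
            // A microsecond-resolution timestamp (stand-in for a time-of-day
            // value) plus the SD/FSD name makes each rewrite (almost)
            // certainly unique.
            block.putLong(System.nanoTime() / 1000);
            block.put("sd1".getBytes(StandardCharsets.US_ASCII));
        }
    }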

 

For the duplicate blocks this was a whole different trail of discovery. Initially I had planned to never change these blocks, until I realized that if I do not change them there will never be another physical disk write, because there will always be an already existing copy of each duplicate block. That makes for great benchmark numbers, but that is never my objective. Honesty is always the best way to go.

But changing the contents of these blocks for each write operation causes a new problem: I won’t get my expected dedupratio. Catch-22.

That’s when I decided that yes, I will change the contents, but only once; the next time the block is written it will be changed back to its original content (flip-flop). It still means that my expected dedupratio can be a little off, because within a set of duplicates there now can be two different data contents, but it stays close. I typically see 1.87:1 instead of the requested 2:1, which is close enough.
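
A duplicate block’s content can thus be derived from its set number plus a single flip-flop bit that toggles on every rewrite. A minimal sketch of that idea; the names are illustrative, not Vdbench internals:

    public class FlipFlop {
        // All blocks in a set share a seed; the flip bit selects one of the
        // two possible contents a duplicate block toggles between.
        static long duplicateSeed(int setNumber, boolean flipped) {
            return ((long) setNumber << 1) | (flipped ? 1 : 0);
        }
        public static void main(String[] args) {
            boolean flipped = false;
            for (int write = 0; write < 3; write++) {
                System.out.println("write " + write + " uses seed " +
                                   duplicateSeed(7, flipped));
                flipped = !flipped;              // flip-flop on every rewrite
            }
        }
    }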

 

1.1.5      Use of Data Validation code.

So how do we keep track of the current content of a duplicate data block?

Data Validation already had everything that is needed: it knows exactly what is written where. Using this ability was a very easy decision to make. Of course, unless specifically requested, the actual contents of a block after a read operation will not be validated.

So now, when dedup is used, Data Validation keeps track of only two different data patterns per block, to support the flip-flop mentioned above, instead of the usual 126.

One more problem needed to be resolved: how to pass on the information about each block’s current content between Vdbench runs? Of course there is Journaling, but journaling is very expensive and is only needed for serious testing around possible data integrity issues, where optimum performance is not a 100% requirement.

I therefore decided against the use of Journaling and instead moved the in-memory Data Validation maps from Java heap space to a memory-mapped (mmap) disk file. Unless your operating system goes down, memory mapping assures that the information about what is written where is preserved on disk.
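
Here is a minimal Java sketch of the technique, assuming one byte of state per block; the file name and layout are illustrative only:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedDvMap {
        public static void main(String[] args) throws Exception {
            long blocks = 1024;                          // one state byte per block (assumption)
            try (RandomAccessFile raf = new RandomAccessFile("dv_map.bin", "rw");
                 FileChannel channel = raf.getChannel()) {
                MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, blocks);
                map.put(42, (byte) 1);                   // block 42 now holds its 'flipped' content
                map.force();                             // push dirty pages to the file
            }
            // As long as the OS stays up, the map survives a JVM exit and can
            // be picked up again by a later run.
        }
    }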

To ask Vdbench to reuse the existing information, code ‘validate=continue’. By default Vdbench will create a brand new map, but this parameter reuses the existing contents.

 

1.1.6      Swat/Vdbench Replay with dedup.

One of the great features when you combine these two tools is the fact that you can take any customer I/O trace (Solaris and Windows) and replay the exact I/O workload whenever and wherever you want.

Of course, the originally traced I/O workload does not follow the above-mentioned requirement that all data transfer sizes, and therefore all lba’s, be a multiple of the dedupunit= size.

For Replay, Vdbench adjusts each data transfer size and lba to the nearest multiple of the required size, and then also reports the average difference between the original and modified sizes.
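
A sketch of that adjustment, assuming ‘nearest multiple’ rounding with a minimum of one dedupunit for transfer sizes; the helper name is hypothetical, not part of the Replay code:

    public class ReplayAlign {
        // Round a traced value to the nearest multiple of dedupunit,
        // with a minimum of one unit.
        static long nearestMultiple(long value, long unit) {
            long rounded = ((value + unit / 2) / unit) * unit;
            return Math.max(rounded, unit);
        }
        public static void main(String[] args) {
            long dedupunit = 8 * 1024;
            System.out.println(nearestMultiple(13 * 1024, dedupunit)); // 13k -> 16k
            System.out.println(nearestMultiple(100_500,  dedupunit)); // lba 100500 -> 98304
        }
    }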

 

1.1.7      offset= and align= parameters and dedup.

Because of the requirement for everything to be properly aligned the offset= and align= parameters of course cannot be used with dedup.

 

1.1.8      Changes to default data pattern and the pattern= parameter.

 

Until Vdbench 503 the default data pattern was very primitive. With the introduction of dedup and data compression this can lead to artificially good performance numbers. That is fine with me, as long as it is not abused. I therefore decided that the default pattern had to change. If the user still wants to do it ‘the old way’ that’s fine, but the Vdbench output directory will show that he purposely overrode the default and did not ‘accidentally’ get these overly good performance numbers because his storage happens to support compression or dedup.

 

Starting with 503 a default random data pattern will be generated for each write. This makes sure that no accidental compression or dedup will occur. There is a price for this however: the extra CPU cycles needed to accomplish it. Especially for high-bandwidth write workloads this can be a problem, because it may lower the maximum throughput that you observed with earlier versions of Vdbench.

As in the past, on Solaris, Windows and Linux systems Vdbench reports overall CPU utilization, and it will continue to display a large warning if average CPU usage goes above 80%, alerting you that you may have run into problems.

 

pattern=random

This is the default.

For each new write request a brand-new data pattern is generated using Linear Feedback Shift Register (LFSR) logic, which guarantees random data. See the sketch after these parameter descriptions.

pattern=randomonce

Uses LFSR only once, before the first write operation. This means that all blocks will be written with the same contents; however, a portion of each 4k of each data block will be modified to prevent the block from being a dedup candidate. Since in 503 each thread has its own write buffer, the contents of this buffer will be preserved.

pattern=old

Fall back to the default data pattern used in Vdbench 5.02:

Data buffers for each SD are initialized once with the values 0,1,2,3, etc. in each 32 bits of the data block; data buffers for FSD files are always initialized with 0x68656e6b ☺ in each 32 bits.

Note that since in the past the same buffer was used for reads AND writes, a mixed read/write workload could overlay this one-time generated data pattern. 5.03 will always have separate data buffers for reads and writes. The 4k dedup update mentioned above will also be done.

pattern=file/name

The contents of the file are copied into the data buffer. If the length of the file is smaller than the buffer, its contents will be repeated until the end of the buffer.
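
As promised above, here is a minimal sketch of filling a write buffer with LFSR-generated data. The tap polynomial and the per-write seed are my choices for illustration, not necessarily what Vdbench uses:

    import java.nio.ByteBuffer;

    public class LfsrPattern {
        // Taps for a maximal-length 32-bit Galois LFSR: x^32 + x^22 + x^2 + x + 1.
        private static final int TAPS = 0x80200003;

        static void fill(ByteBuffer buffer, int seed) {
            int lfsr = (seed == 0) ? 1 : seed;        // an all-zero state would lock up
            while (buffer.remaining() >= 4) {
                buffer.putInt(lfsr);
                // Shift right; XOR in the taps whenever a 1 falls off the end.
                lfsr = (lfsr >>> 1) ^ (-(lfsr & 1) & TAPS);
            }
        }

        public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            // A different seed per write request yields a brand-new pattern
            // for every write, which is what prevents accidental dedup.
            fill(buf, (int) System.nanoTime());
        }
    }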

 

2         Miscellaneous changes

2.1      start_cmd and end_cmd parameters

You can now specify a series of commands to execute.

Simple example: start_cmd=("ls /a", "ls /b", "lockstat ….").

 

2.2      Default xfersize for file formatting.

To accommodate the default dedupunit=128k the default xfersize used for SD and FSD file formatting has also changed to 128k. See ‘fwd=format’ and ‘formatxfersize=’ in the documentation.

 

2.3      Tape testing

To support multi-threaded tape I/O Vdbench needs to know how much data is written on a tape so that it can avoid trying to read beyond the tape marks. If you have eight I/O’s queued and the first I/O hits EOT, there is no way to stop the other seven already queued I/O’s from starting. We’ve caused some crazy problems this way. Vdbench therefore required that a tape that needed to be read first had to be written in that very same Vdbench execution, which allowed Vdbench to prevent the above-mentioned seven reads from ever being issued.

Multi-threaded tape I/O is extremely dangerous because you are never guaranteed in what order the data arrives on tape or, after reading, in what order the data arrives back in memory. I never believed in using multi-threaded tape I/O, but decided to implement it anyway to get people to start using Vdbench. Now with 503 I have removed the ‘first write, then read’ requirement for single-threaded I/O, so you can now read ANY tape, real or virtual.

 

2.4      ‘wrapping’ using the range= parameter

The SD or WD ‘range=(nn,mm)’ parameter tells Vdbench to only use the disk space between nn% and mm% of its available space, for instance range=(10,20). Last week I received a request to allow for some more boundary testing by allowing a wraparound, for instance range=(90,110). This causes all disk space between 90% and 100% plus between 0% and 10% of the capacity to be used. When using seekpct=eof however, Vdbench will start at 90% but will end at 100%.

Note that when your lun size is not a multiple of your xfersize, the highest lba is truncated. Vdbench won’t handle reading or writing half a block at the end and then the other half at the beginning. Vdbench also refuses to write to block zero of a raw volume, so that it does not overwrite the volume label. This all means that this wrapping is not perfect.
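
A sketch of the wraparound arithmetic, ignoring the block-zero and truncation caveats above; the helper and its rounding are illustrative only:

    public class RangeWrap {
        // Map a random fraction of the requested window onto the lun,
        // wrapping past the end for e.g. range=(90,110).
        static long wrapLba(long lunSize, double loPct, double hiPct, double rand01) {
            long start = (long) (lunSize * loPct / 100.0);
            long span  = (long) (lunSize * (hiPct - loPct) / 100.0);
            return (start + (long) (rand01 * span)) % lunSize;
        }
        public static void main(String[] args) {
            long lun = 1000L * 1024 * 1024;
            // rand01 below 0.5 lands in the 90-100% range; above it wraps to 0-10%.
            System.out.println(wrapLba(lun, 90, 110, 0.25));
            System.out.println(wrapLba(lun, 90, 110, 0.75));
        }
    }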

 

2.5      ‘compratio’ and ‘dedupratio’ in flatfile.html 

I received a request for this change from a user who loves to put as much testing as possible into one Vdbench execution by using many of the ‘forxxx=’ parameters. His total came to 17,280 runs within one Vdbench execution! Since he uses the Vdbench flatfile parser, he missed the ability to find the compression ratio in the flatfile.

Some changes take only a few minutes. For a user who puts so much effort into using Vdbench that he set a (probable) world record, I don’t mind spending a few minutes on a small enhancement ☺