By Henk Vandenbergh-Oracle on Apr 19, 2010
On Solaris, and I expect the same with other operating systems, whenever an I/O is requested some process-level lock is set. This means that if you try to run very high IOPS, this lock can become 100% busy, causing all threads that need it to start spinning. The end result is twofold: high CPU utilization and/or lower than expected IOPS.
This is not a new problem. It was discovered several years ago, when storage subsystems became fast enough to handle 5000 IOPS and more. Since then CPUs have become much faster, and the Solaris code has been enhanced several times to reduce both the need for these locks and the time they are held. I have seen Vdbench runs where we were able to do 100k IOPS without problems.
Vdbench is written in Java, and Java runs as a single process. Vdbench therefore introduced what is called multi-JVM mode, the ability of Vdbench to split the requested workload over multiple JVMs (Java Virtual Machines).
By default Vdbench starts one JVM for each 5000 IOPS requested, with a maximum of 8, and no more than one per Storage Definition (SD). The 5000-number probably should be changed some day; it is a leftover of the initial discovery of this problem.
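The default rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Vdbench's actual code: the function name is hypothetical, and rounding up, the minimum of one JVM, and treating iorate=max (passed here as None) as "ask for the maximum" are all assumptions.

```python
import math

def default_jvm_count(requested_iops, sd_count, iops_per_jvm=5000, max_jvms=8):
    """Estimate the default JVM count: one per 5000 IOPS, at most 8,
    and no more than one per SD. Hypothetical helper, not Vdbench code."""
    if requested_iops is None:
        # Assumption: iorate=max is modeled as wanting the maximum JVM count.
        by_rate = max_jvms
    else:
        # Assumption: partial multiples of 5000 round up, with at least one JVM.
        by_rate = max(1, math.ceil(requested_iops / iops_per_jvm))
    return max(1, min(by_rate, max_jvms, sd_count))

print(default_jvm_count(20000, sd_count=4))    # limited by the IOPS rule
print(default_jvm_count(100000, sd_count=16))  # limited by the maximum of 8
print(default_jvm_count(None, sd_count=1))     # limited by the single SD
```

The last call is the interesting one: no matter how high the requested rate, a single SD caps the result at one JVM.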
So, when you ask for iorate=max with only a single SD, Vdbench starts just one JVM, and if you’re lucky enough to be running against a Solid State Device (SSD), guess what: you may run into this locking problem.
To work around this you have to override the default JVM count:
- Specify hd=localhost,jvms=nn. I suggest you request one JVM for each 50k IOPS that you expect.
- Add ‘-m nn’ as an execution parameter, for instance ‘-m4’.
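As a rough sizing aid, the "one JVM for each 50k IOPS" suggestion can be turned into a small calculation. The helper name below is hypothetical, and rounding up with a minimum of one JVM is an assumption. The sketch only prints the two override forms mentioned above and does not run Vdbench.

```python
import math

def suggested_jvms(expected_iops, iops_per_jvm=50_000):
    """Suggested JVM count: one per 50k expected IOPS (hypothetical helper).
    Assumption: round up, and never go below one JVM."""
    return max(1, math.ceil(expected_iops / iops_per_jvm))

n = suggested_jvms(180_000)
# Either form overrides the default JVM count.
print(f"hd=localhost,jvms={n}")  # parameter file form
print(f"-m {n}")                 # execution parameter form
```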
There is one exception, though: 100% sequential workloads that use the seekpct=sequential or seekpct=eof Workload Definition (WD) parameter. A sequential workload will only ever run in one single JVM. This is done to prevent a situation where, with two JVMs for instance, the workload would look like this: read block 1,1,2,2,3,3,4,4,5,5, etc. The performance numbers will of course look great, because the second read of each block will be guaranteed a cache hit, but this is not really a valid sequential workload.
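To see why, here is a toy simulation of two JVMs each reading the same blocks sequentially. It assumes the two streams run in perfect lockstep, which is an idealization (real interleaving depends on timing), and all names are hypothetical.

```python
def sequential_stream(block_count):
    """Block numbers a single JVM would read, in order."""
    return list(range(1, block_count + 1))

def interleave(streams):
    """Merge streams round-robin. Assumption: all JVMs advance in lockstep."""
    return [block for group in zip(*streams) for block in group]

def repeat_reads(blocks):
    """Count reads of a block that has already been read once."""
    seen, repeats = set(), 0
    for block in blocks:
        if block in seen:
            repeats += 1
        seen.add(block)
    return repeats

two_jvms = interleave([sequential_stream(5), sequential_stream(5)])
print(two_jvms)                # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(repeat_reads(two_jvms))  # 5
```

Half of the reads in this idealized pattern hit a block that was just read, which is what inflates the numbers.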