SSDs for Performance Engineers
By realneel on Apr 01, 2009
There has been a lot of buzz regarding SSDs lately. SSDs change the dynamics of the IO subsystem: you are no longer limited by rotational latency and vibration effects. For a performance engineer this has many implications. Since performance engineers care mostly about performance, the first thought that comes to mind is "Are we going to see a big impact in benchmarks?".
The answer is easy for IO bound benchmarks. But what about CPU bound benchmarks? Many database benchmarks are CPU limited. Does a faster disk really change anything?
So what does an SSD really give you?
- Faster IOPS
- Decreased latency per IO
SSDs have a huge random IO capability. During a recent experiment with an SSD, I got around 12,000 random IO operations per second! I have seen SSDs that can do even more. If you have ever worked with rotating disks, this is HUGE. However, performance engineers have rarely been limited by IOPS: we can always use multiple disks to meet the same IOPS requirement. Since we are talking benchmarks, and many benchmarks do not have a $/txn metric, we don't worry about cost. More importantly, many of the benchmarks (TPC-C, TPC-E, etc.) are CPU limited, so having a faster disk does not really change too many things.
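A quick back-of-the-envelope on the spindle count: the 12,000 IOPS figure is the one from my experiment above, while the ~180 random IOPS per 15k RPM disk is an assumption I'm plugging in for illustration:

```python
# How many rotating disks would it take to match one SSD's random IOPS?
import math

SSD_IOPS = 12_000   # measured in the experiment above
DISK_IOPS = 180     # assumed figure for a 15k RPM drive (illustrative)

disks_needed = math.ceil(SSD_IOPS / DISK_IOPS)
print(disks_needed)  # spindles needed to match one SSD on random IO
```

That is a lot of spindles, which is exactly why IOPS alone was never the hard limit for a benchmark team with a big enough disk budget.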
Performance engineers have long known that IO latency can be hidden by using multiple threads of execution (provided you have a scalable workload). This is a very common technique and is used in many benchmarks.
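A minimal sketch of the latency-hiding technique, using Python threads and a sleep as a stand-in for a blocking disk read; the 6 ms latency and the thread count are illustrative assumptions:

```python
# Sketch: hiding IO latency with multiple threads of execution.
# Each "transaction" just sleeps to simulate a blocking random read.
import time
from concurrent.futures import ThreadPoolExecutor

IO_LATENCY = 0.006  # assumed ~6 ms per random read on a rotating disk

def transaction(_):
    time.sleep(IO_LATENCY)  # stand-in for a blocking disk IO

start = time.time()
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(transaction, range(320)))  # 320 IOs, 32 in flight
elapsed = time.time() - start

# Serially, 320 IOs x 6 ms = 1.92 s; with 32 threads overlapping the
# waits, wall time drops toward 320/32 x 6 ms = ~0.06 s.
print(f"{elapsed:.3f}s")
```

The individual IOs are no faster; there are simply enough of them in flight that the latency stops mattering to throughput.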
True value of a SSD
SSDs have a huge advantage in terms of cost, power and density. If you factor in $/txn or Watt/txn, then SSDs clearly have a big advantage; no wonder customers are really pumped up about SSDs. SSDs are really good at random read IO. For sequential IO, you are most likely going to be limited by your bus bandwidth, and a few regular disks should be sufficient to max out the pipe. For writes, a controller with a write cache should provide the same latency for most database workloads. It is on random reads that the SSD's true value can be seen.
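To see why a few disks max out the pipe on sequential IO, here is a back-of-the-envelope sketch; both the ~300 MB/s of usable link bandwidth and the ~80 MB/s per-disk streaming rate are assumed illustrative figures, not numbers from any particular system:

```python
# Disks needed to saturate the bus on a pure sequential workload.
import math

BUS_MB_S = 300   # assumed usable bandwidth of the link (illustrative)
DISK_MB_S = 80   # assumed sequential streaming rate of one disk

disks_to_saturate = math.ceil(BUS_MB_S / DISK_MB_S)
print(disks_to_saturate)  # a handful of spindles fills the pipe
```

Once the pipe is full, a faster device buys nothing for sequential IO, which is why the interesting case is random reads.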
So where do SSDs really help?
For CPU bound workloads, the benefit of SSDs may be realized in the fact that you need fewer threads of execution to max out your CPUs. Using fewer threads of execution means there will be less contention for resources, and some workloads may benefit from that reduced contention. This improvement is application specific: scalable workloads will see less benefit than non-scalable ones.
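One way to see the "fewer threads" point is Little's Law: the concurrency you need in flight equals throughput times latency. A minimal sketch, where the target IOPS and both latencies are assumed illustrative figures:

```python
# Little's Law: threads in flight = target throughput x per-IO latency.
def threads_needed(target_iops, io_latency_s):
    return target_iops * io_latency_s

# Assumed: 50,000 IOs/sec target; 6 ms rotating disk vs 0.2 ms SSD.
print(threads_needed(50_000, 0.006))   # disk: ~300 threads in flight
print(threads_needed(50_000, 0.0002))  # SSD: ~10 threads in flight
```

An order of magnitude fewer threads for the same IO rate is where the reduced-contention benefit comes from.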
MySQL would probably benefit a lot from this reduced contention. It is probably the same effect as reducing innodb_thread_concurrency, but with greater throughput.
For IO bound or non-scalable applications (e.g., applications that hold a lock while doing IO), the decreased latency of SSDs will bring a huge improvement in performance. SSDs kind of level the playing field: most database vendors have spent years optimizing the IO codepath, but things like multiblock reads or IO sorting become moot with SSDs. Copy-on-write performs the same as in-place modification of database blocks.
SSDs as a cheaper RAM
Probably the most important benefit of using SSDs with databases is that they reduce the penalty for missing the database buffer cache. A common question customers face is: how much performance improvement will I get if I buy more memory? The answer depends on the penalty for missing the cache. A database buffer cache miss has to be satisfied by an IO. For a regular disk this is at least 6 ms; for an SSD, it is less than 1 ms! It is straightforward to calculate the average IO time for different hit rates. Depending on how many misses you have per transaction, the benefit of using SSDs can be huge. I will share some actual numbers in a later post.
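The average-access calculation is a couple of lines. The 6 ms and 1 ms miss penalties are the figures quoted above; treating a cache hit as effectively free is my simplifying assumption:

```python
# Expected IO cost per buffer-cache access, given a hit rate and
# a miss penalty. A hit is assumed to cost ~0 (simplification).
def avg_access_ms(hit_rate, miss_penalty_ms):
    return (1.0 - hit_rate) * miss_penalty_ms

for hit in (0.90, 0.99):
    print(f"hit={hit:.2f}  disk={avg_access_ms(hit, 6):.3f} ms  "
          f"ssd={avg_access_ms(hit, 1):.3f} ms")
```

Even at a 99% hit rate, a transaction that does many buffer-cache accesses pays the miss penalty often enough for the 6x gap to show up in response time.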
So in conclusion, most customers will benefit a lot from using SSDs. Performance engineers are not so lucky.