Today Sun Microsystems announced MySQL 5.4, a release that focuses on performance and scalability. For a long time it's been possible to escape the confines of a single system with MySQL, thanks to scale-out technologies like replication and sharding. But it ought to be possible to scale-up efficiently as well - to fully utilize the CPU resource on a server with a single instance.
MySQL 5.4 takes a stride in that direction. It features a number of performance and scalability fixes, including the justifiably-famous Google SMP patch along with a range of other fixes. And there's plenty more to come in future releases. For specifics about the MySQL 5.4 fixes, check out Mikael Ronstrom's blog.
So how well does MySQL 5.4 scale? To help answer the question I'm going to take a look at some performance data from one of Sun's CMT systems based on the UltraSPARC T2 chip. This chip is an interesting test case for scalability because it amounts to a large SMP server in a single piece of silicon. The UltraSPARC T2 chip features 8 cores, each with 2 integer pipelines, one floating point unit, and 8 hardware threads. So it presents a total of 64 CPUs (hardware threads) to the Operating System. Since there are only 8 cores and 16 integer pipelines, you can't actually carry out 64 concurrent operations. But the CMT architecture is designed to hide memory latency - while some threads on a core are blocked waiting for memory, another thread on the same core can be executing - so you expect the chip to achieve total throughput that's better than 8x or even 16x the throughput of a single thread.
The data in question is based on an OLTP workload derived from an industry standard benchmark. The driver ran on the system under test as well as the database, and I used enough memory to adequately cache the active data and enough disks to ensure there were no I/O bottlenecks. I applied the scaling method we've used for many years in studies of this kind. Unused CPUs were turned off at each data point (this is easily achieved on Solaris with the psradm command). That ensured that "idle" CPUs did not actually participate in the benchmark (e.g. the OS assigns device drivers across the full set of CPUs, so they routinely handle interrupts if you leave them enabled). I turned off entire cores at a time, so an 8 "CPU" result, for example, is based on 8 hardware threads in a single core with the other seven cores turned off, and the 32-way result is based on 4 cores each with 8 hardware threads. This approach allowed me to simulate a chip with less than 8 cores. Sun does in fact ship 4- and 6-core UltraSPARC T2 systems; the 32- and 48-way results reported here should correspond with those configurations. For completeness, I've also included results with only one hardware thread turned on.
Since the driver kit for this workload is not open source, I'm conscious that community members will be unable to reproduce these results. With that in mind, I'll separately blog Sysbench results on the same platform. So why not just show the Sysbench results and be done with it? The SQL used in the Sysbench oltp test is fairly simple (e.g. it's all single table queries with no joins); this test rounds out the picture by demonstrating scalability with a more complex and more varied SQL mix.
It's worth noting that any hardware threads that are turned on will get access to the entire L2 cache of the T2. So, as the number of active hardware threads increases, you might expect that contention for the L2 would impact scalability. In practice we haven't found it to be a problem.
Here is the my.cnf:
table_open_cache = 4096
innodb_log_buffer_size = 128M
innodb_log_file_size = 1024M
innodb_log_files_in_group = 3
innodb_buffer_pool_size = 6G
innodb_additional_mem_pool_size = 20M
innodb_thread_concurrency = 0
Enough background - on to the data. Here are the results presented graphically.
And here is some raw data:
Abbreviations: "vCPUS" refers to hardware threads, "Us", "Sy", "Id" are CPU utilization for User, System, and Idle respectively, "smtx" refers to mutex spins, "icsw" to involuntary context switches, and "sycl" to systems calls - the data in each of these last three columns is normalized per transaction.
Some interesting takeaways:
- The throughput increases 71% from 32 to 64 vCPUs. It's encouraging to see a significant increase in throughput beyond 32 vCPUs.
- The scaleup from 1 to 64 vCPUs is almost 30x. As I noted earlier, the UltraSPARC T2 chip does not have 64 cores - it only has 8 cores and 16 integer pipelines, so this is a good outcome.
To put these results into perspective, there are still many high volume high concurrency environments that will still require replication and sharding to scale acceptably. And while MySQL 5.4 scales better than previous releases, we're not quite seeing scalability equivalent to that of proprietary databases. It goes without saying, of course, that we'll be working hard to further improve scalability and performance in the weeks and months to come.
But the clear message today is that MySQL 5.4 single-server scalability is now good enough for many commercial transaction-based application environments. If your organization hasn't yet taken the plunge with MySQL, now is definitely the time. And you certainly won't find yourself complaining about the price-performance.