Testing the New Pool-of-Threads Scheduler in MySQL 6.0
By Tim Cook on Apr 08, 2009
I have recently been investigating a bew feature of MySQL 6.0 - the "Pool-of-Threads" scheduler. This feature is a fairly significant change to the way MySQL completes tasks given to it by database clients.
To begin with, be advised that the MySQL database is implemented as a single multi-threaded process. The conventional threading model is that there are a number of "internal" threads doing administrative work (including accepting connections from clients wanting to connect to the database), then one thread for each database connection. That thread is responsible for all communication with that database client connection, and performs the bulk of database operations on behalf of the client.
This architecture exists in other RDBMS implementations. Another common implementation is a collection of processes all cooperating via a region of shared memory, usually with semaphores or other synchronization objects located in that shared memory.
The creation and management of threads can be said to be cheap, in a relative sense - it is usually significantly cheaper to create or destroy a thread rather than a process. However these overheads do not come for free. Also, the operations involved in scheduling a thread as opposed to a process are not significantly different. A single operating system instance scheduling several thousand threads on and off the CPUs is not much less work than one scheduling several thousand processes doing the same work.
The theory behind the Pool-of-Threads scheduler is to provide an operating mode which supports a large number of clients that will be maintaining their connections to the database, but will not be sending a constant stream of requests to the database. To support this, the database will maintain a (relatively) small pool of worker threads that take a single request from a client, complete the request, return the results, then return to the pool and wait for another request, which can come from any client. The database's internal threads still exist and operate in the same manner.
In theory, this should mean less work for the operating system to schedule threads that want CPU. On the other hand, it should mean some more overhead for the database, as each worker thread needs to restore the context of a database connection prior to working on each client request.
A smaller pool of threads should also consume less memory, as each thread requires a minimum amount of memory for a thread stack, before we add what is needed to store things like a connection context, or working space to process a request.
You can read more about the different threading models in the MySQL 6.0 Reference Manual.
Testing the Theory
Mark Callaghan of Google has recently had a look at whether this theory holds true. He has published his results under "No new global mutexes! (and how to make the thread/connection pool work)". Mark has identified (via this bug he logged) that the overhead for using Pool-of-Threads seems quite large - up to 63 percent.
So, my first task is see if I get the same results. I will note here that I am using Solaris, whereas Mark was no doubt using a Linux distro. We probably have different hardware as well (although both are Intel x86).
Here is what I found when running sysbench read-only (with the sysbench clients on the same host). The "conventional" scheduler inside MySQL is known as the "Thread-per-Connection" scheduler, by the way.
This is in contrast to Mark's results - I am only seeing a loss in throughput of up to 30%.
What about the bigger picture?
These results do show there is a definite reduction in maximum throughput if you use the pool-of-threads scheduler.
I believe it is worth looking at the bigger picture however. To do this, I am going to add in two more test cases:
- sysbench read-only, with the sysbench client and MySQL database on separate hosts, via a 1 Gb network
- sysbench read-write, via a 1 Gb network
What I want to see is what sort of impact the pool-of-threads scheduler has for a workload that I expect is still the more common one - where our database server is on a dedicated host, accessed via a network.
As you can see, the impact on throughput is far less significant when the client and server are separated by a network. This is because we have introduced network latency as a component of each transaction and increased the amount of work the server and client need to do - they now need to perform ethernet driver, IP and TCP tasks.
This reduces the relative overhead - in CPU consumed and latency - introduced by pool-of-threads.
This is a reminder that if you are conducting performance tests on a system prior to implementing or modifying your architecture, you would do well to choose a test architecture and workload that is as close as possible to that you are intending to deploy. The same is true if you are are trying to extrapolate performance testing someone else has done to your own architecture.
The Converse is Also True
On the other hand, if you are a developer or performance engineer conducting testing in order to test a specific feature or code change, a micro-benchmark or simplified test is more likely to be what you need. Indeed, Mark's use of the "blackhole" storage engine is a good idea to eliminate that processing from each transaction.
In this scenario, if you fail to make the portion of the software you have modified a significant part of the work being done, you run the risk of seeing performance results that are not significantly different, which may lead you to assume your change has negligible impact.
In my next posting, I will compare the two schedulers using a different perspective.