MySQL Query Cache Sizing
By user12610824 on Mar 06, 2009
The query cache is an interesting component of MySQL. It caches result sets of queries, which has the potential to improve performance by eliminating the entire parse/optimize/execute cycle for a repeated query. This comes at a cost though, and that cost tends to increase with the size of the cache. Due to the current implementation, the cost may also increase with the number of hardware threads and with the number of active connections, due to the cache's current coarse grained locking implementation. When the query cache helps, it can help a lot. When it hurts, it can hurt a lot. If you are currently using or considering using the query cache, you should definitely test your assumptions about how it interacts with your particular workload. You should also plan to revisit the topic from time to time, since changes to your application or workload may change the interactions with the query cache.
This particular case also highlights an issue related to migrations. If you are doing a "major" migration, for example between systems with different architectures, significantly different configurations, or, as in this case, with different operating systems, be sure to keep track of any other changes that are made along with the migration. It is natural to assume that any negative impacts are due to the big changes, but many times it will turn out to be a seemingly small change that causes grief. In this case, we knew that a major performance drop due to a Linux to Solaris migration was completely unexpected, which kept us from being led too far astray from finding the real root cause.
Now back to the query cache... In most cases, the query cache should be sized in the tens of megabytes and not larger. In the particular case we investigated, the customer had been running with a 20MB query cache on the older systems, and configured the new system with a 500MB query cache. This was presumably done because the new system had substantially more memory than the older systems, and it seemed to make sense to use some of that memory to boost the size of the query cache. However, due to a number of factors related to the implementation of the query cache, this actually hurt performance by a large margin. This was due to the level of overhead that was required to maintain the large cache, which was much higher than any benefit that it could provide.
To identify the problem, the customer generated a dtrace profile of mysqld with the DTraceToolkit "/opt/DTT/hotuser" script, or one similar to it. When piped through c++filt to demangle C++ names, the output looked like this:
Truncated to top 4
libc.so.1`clear_lockbyte 361 1.2%
libc.so.1`mutex_lock_impl 511 1.7%
libc.so.1`mutex_trylock_adaptive 1127 3.8%
mysqld`Query_cache::insert_into_free_memory_sorted_list(Query_cache_block\*, Query_cache_block\*\*) 21702 73.1%
This shows that mysqld was spending 73% of its time in the Query_cache::insert_into_free_memory_sorted_list() method. As soon as this was seen, we knew that the issue was related to updating the query cache, and a quick comparison of configurations confirmed that this system had a dramatically larger query cache than the systems that were performing well. It is likely that plockstat would have shown substantial contention on the query cache structure_guard_mutex as well, but that wasn't checked. Decreasing the query cache size, by setting the query_cache_size tunable, resolved the problem instantly.
The issue here was that the customer had a moderate level of write traffic, and the current query cache implementation invalidates all result sets for a given table whenever that table is updated. As the query cache grows in size, the number of entries that must be invalidated for a given table may grow as well. In addition, the coarse locking on the cache can lead to lock contention that can kill performance, particularly on multi-core hardware.
This leads to some general recommendations:
- If you plan to use the query cache, start small, perhaps 10MB to 20MB, and monitor the impact on your query/transaction throughput and response time. If performance drops, the query cache may not be a good match for your workload.
- Consider the read/write mix of your workload, as well as the rate of data updates. Query cache invalidations due to write traffic can be quite expensive, and even moderate levels can counteract the benefit of the cache.
- Verify whether the query cache is enabled, and if so, verify its currently configured size. You can do this by running "SHOW VARIABLES LIKE '%query_cache%';". If query_cache_size (which defines the size of the cache, in bytes) is non-zero, and have_query_cache is TRUE, the cache is active. If query_cache_size is set to zero or have_query_cache is FALSE, the query cache is disabled.
- query_cache_size defaults to zero (ie disabled) on UNIX/Linux, but is non-zero in some of the example my.cnf files included with the MySQL distribution and on Windows.
- You can change query_cache_size dynamically via "SET GLOBAL query_cache_size = XXX;" or permanently by adding a "query_cache_size = XXX" statement to your my.cnf file.
- Hit rate = Qcache_hits / (Qcache_hits + Com_select)
- Insert rate = Qcache_inserts / (Qcache_hits + Com_select)
- Prune rate = Qcache_lowmem_prunes / Qcache_inserts
- This is an approximation of the number of inserts that were too large for available free memory in the cache.
- Updates to house keeping data may also result in a prune, so this is not an exact correlation.
You should try to allow all of the caches on your system to warm up before checking your cache statistics. This means allowing your application to run for at least 10 to 30 minutes, which is usually enough time for the system to reach a steady state.
When looking at hit rates, keep in mind that relatively low query cache hit rates may provide substantial benefit due to the potentially high cost of generating a result set for a query. For example, we usually try to maintain InnoDB buffer pool hit rates in the high 90% range, while hit rates below 50% in the query cache may be fine. On the other hand, if you can't get hit rates out of the single digits, you should really question the benefit of enabling the cache, since looking in the cache adds overhead to every select. In the end, query/transaction throughput and response time should be your ultimate indicator.
When looking at insert rates, keep in mind that relatively low insert rates may have a substantial negative impact on performance due to the high cost of updating the cache. I would be cautious of anything higher than single digit insert rates, and test the application with the query cache disabled (by setting query_cache_size to zero) to see if the cache is helping or hurting performance. However, keep in mind that a high prune rate might result in the need to repeatedly insert the same result set in the cache, so you need to consider that as well.
A non-zero prune rate may be an indication that you should increase the size of your query cache. However, keep in mind that the overhead of maintaining the cache is likely to increase with its size, so do this in small increments and monitor the result. If you need to dramatically increase the size of the cache to eliminate prunes, there is a good chance that your workload is not a good match for the query cache.
As you consider how your workload interacts with the query cache, keep in mind that you can use the SQL_NO_CACHE hint in a select statement if you find that a specific query has a tendency to negatively interact with the query cache.
As I said earlier, when the query cache helps, it can help a lot, but when it hurts, it can hurt a lot. It all depends on your particular workload, so it requires testing and is not something you should just blindly enable. In the end, the goal is to increase throughout and/or reduce response time, so those are the metrics you should ultimately consider. For more information about configuring and tuning the query cache, see the Query Cache section of the MySQL documentation. There is also an excellent write up in the book "High Performance MySQL" by Baron Schwartz et al, that touches on a number of issues I haven't mentioned.
I'd like to acknowledge my colleague Simon Vobes, who did a great job working with the customer to resolve this performance issue!