MySQL Query Cache Sizing

I recently helped analyze a performance issue with one of our MySQL customers, which highlighted the need to be cautious when sizing the query cache and to validate that it works well with your workload. The customer was evaluating a new 2-socket, 8-core x64 system running Solaris as a replication slave in an environment consisting of older systems running Linux. When they introduced the new slave, they found that it was unable to keep up with its replication load during peak periods, and it took hours to catch up once the traffic dropped. The older Linux systems had no trouble keeping up, so the obvious assumption was that something was wrong with the new system, with Solaris, or with ZFS. In reality, the issue turned out to be an oversized MySQL query cache, and once that was addressed, the new system performed as expected.

The query cache is an interesting component of MySQL. It caches query result sets, which has the potential to improve performance by eliminating the entire parse/optimize/execute cycle for a repeated query. This comes at a cost, though, and that cost tends to increase with the size of the cache. Because of the cache's current coarse-grained locking implementation, the cost may also increase with the number of hardware threads and the number of active connections. When the query cache helps, it can help a lot. When it hurts, it can hurt a lot. If you are currently using, or considering using, the query cache, you should definitely test your assumptions about how it interacts with your particular workload. You should also plan to revisit the topic from time to time, since changes to your application or workload may change how it interacts with the query cache.
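To make the basic behavior concrete, here is a minimal sketch, assuming the query cache is enabled and a hypothetical table t exists. The second, byte-for-byte identical SELECT can be answered straight from the cache, which shows up as an incremented Qcache_hits counter:

    SHOW GLOBAL STATUS LIKE 'Qcache_hits';
    SELECT * FROM t WHERE id = 1;            -- parsed, optimized, executed; result set cached
    SELECT * FROM t WHERE id = 1;            -- identical query text, so served from the cache
    SHOW GLOBAL STATUS LIKE 'Qcache_hits';   -- counter has increased by one

Note that only queries that repeat with identical text benefit; any difference in whitespace, case, or literals produces a cache miss.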

This particular case also highlights an issue related to migrations. If you are doing a "major" migration, for example between systems with different architectures, significantly different configurations, or, as in this case, different operating systems, be sure to keep track of any other changes that are made along with the migration. It is natural to assume that any negative impact is due to the big changes, but many times it turns out to be a seemingly small change that causes the grief. In this case, we knew that a Linux to Solaris migration by itself should not cause a major performance drop, which kept us from being led too far astray and helped us focus on the real root cause.

Now back to the query cache... In most cases, the query cache should be sized in the tens of megabytes, not larger. In the particular case we investigated, the customer had been running with a 20MB query cache on the older systems and configured the new system with a 500MB query cache. This was presumably done because the new system had substantially more memory than the older systems, and it seemed to make sense to use some of that memory to boost the size of the query cache. However, due to a number of factors related to the implementation of the query cache, this actually hurt performance by a large margin: the overhead required to maintain the large cache was far higher than any benefit it could provide.

To identify the problem, the customer generated a DTrace profile of mysqld with the DTraceToolkit "/opt/DTT/hotuser" script, or one similar to it. When piped through c++filt to demangle the C++ names, the output looked like this:

(Output truncated to the top 4 entries)

libc.so.1`clear_lockbyte 361 1.2%
libc.so.1`mutex_lock_impl 511 1.7%
libc.so.1`mutex_trylock_adaptive 1127 3.8%
mysqld`Query_cache::insert_into_free_memory_sorted_list(Query_cache_block*, Query_cache_block**) 21702 73.1%

This shows that mysqld was spending 73% of its time in the Query_cache::insert_into_free_memory_sorted_list() method. As soon as this was seen, we knew that the issue was related to updating the query cache, and a quick comparison of configurations confirmed that this system had a dramatically larger query cache than the systems that were performing well. It is likely that plockstat would have shown substantial contention on the query cache structure_guard_mutex as well, but that wasn't checked. Decreasing the query cache size, by setting the query_cache_size tunable, resolved the problem instantly.
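The immediate fix can be applied at runtime. The following is a sketch of that kind of change, not the customer's literal commands; the target size is an assumption matching the 20MB setting used on the older systems:

    SET GLOBAL query_cache_size = 20 * 1024 * 1024;   -- shrink the cache to 20MB
    SHOW VARIABLES LIKE 'query_cache_size';           -- confirm the new (rounded) value

To make the change survive a restart, also set it in my.cnf, as noted in the recommendations below.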

The issue here was that the customer had a moderate level of write traffic, and the current query cache implementation invalidates all result sets for a given table whenever that table is updated. As the query cache grows in size, the number of entries that must be invalidated for a given table may grow as well. In addition, the coarse locking on the cache can lead to lock contention that can kill performance, particularly on multi-core hardware.
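Here is a minimal sketch of that invalidation behavior, using a hypothetical orders table (the table and queries are illustrative only): any write to the table discards every cached result set that references it.

    SELECT COUNT(*) FROM orders WHERE status = 'open';      -- result set goes into the cache
    SELECT customer_id FROM orders WHERE order_id = 42;     -- a second cached entry for the same table
    UPDATE orders SET status = 'closed' WHERE order_id = 7; -- invalidates both cached entries
    SHOW GLOBAL STATUS LIKE 'Qcache_queries_in_cache';      -- entry count drops accordingly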

This leads to some general recommendations:

  1. If you plan to use the query cache, start small, perhaps 10MB to 20MB, and monitor the impact on your query/transaction throughput and response time. If performance drops, the query cache may not be a good match for your workload.
  2. Consider the read/write mix of your workload, as well as the rate of data updates. Query cache invalidations due to write traffic can be quite expensive, and even moderate levels can counteract the benefit of the cache.
  3. Verify whether the query cache is enabled, and if so, verify its currently configured size. You can do this by running "SHOW VARIABLES LIKE '%query_cache%';". If query_cache_size (which defines the size of the cache, in bytes) is non-zero and have_query_cache is YES, the cache is active. If query_cache_size is set to zero or have_query_cache is NO, the query cache is disabled.
    Note:
    • query_cache_size defaults to zero (i.e., disabled) on UNIX/Linux, but is non-zero in some of the example my.cnf files included with the MySQL distribution and on Windows.
    • You can change query_cache_size dynamically via "SET GLOBAL query_cache_size = XXX;" or permanently by adding a "query_cache_size = XXX" statement to your my.cnf file.
  4. Verify your query cache statistics via "SHOW GLOBAL STATUS;", for example (see the sketch after this list):
    • Hit rate = Qcache_hits / (Qcache_hits + Com_select)
    • Insert rate = Qcache_inserts / (Qcache_hits + Com_select)
    • Prune rate = Qcache_lowmem_prunes / Qcache_inserts
      • This approximates the fraction of inserts for which there was not enough free memory in the cache, forcing older entries to be pruned.
      • Updates to housekeeping data may also result in a prune, so this is not an exact correlation.
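The following is a sketch of pulling those counters and computing the rates defined above. The SHOW statements work on any version; the SELECT assumes the INFORMATION_SCHEMA.GLOBAL_STATUS table introduced in MySQL 5.1 (much later releases expose the same counters through performance_schema instead). On older servers, run SHOW GLOBAL STATUS and do the arithmetic by hand.

    SHOW VARIABLES LIKE '%query_cache%';     -- is the cache enabled, and how big is it?
    SHOW GLOBAL STATUS LIKE 'Qcache%';       -- hits, inserts, prunes, free blocks, etc.
    SHOW GLOBAL STATUS LIKE 'Com_select';    -- SELECTs that were not served from the cache

    -- Hit, insert, and prune rates as defined above (MySQL 5.1 and later):
    SELECT hits.VARIABLE_VALUE / (hits.VARIABLE_VALUE + sel.VARIABLE_VALUE) AS hit_rate,
           ins.VARIABLE_VALUE  / (hits.VARIABLE_VALUE + sel.VARIABLE_VALUE) AS insert_rate,
           prn.VARIABLE_VALUE  / ins.VARIABLE_VALUE                         AS prune_rate
    FROM INFORMATION_SCHEMA.GLOBAL_STATUS hits,
         INFORMATION_SCHEMA.GLOBAL_STATUS sel,
         INFORMATION_SCHEMA.GLOBAL_STATUS ins,
         INFORMATION_SCHEMA.GLOBAL_STATUS prn
    WHERE hits.VARIABLE_NAME = 'Qcache_hits'
      AND sel.VARIABLE_NAME  = 'Com_select'
      AND ins.VARIABLE_NAME  = 'Qcache_inserts'
      AND prn.VARIABLE_NAME  = 'Qcache_lowmem_prunes';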

You should try to allow all of the caches on your system to warm up before checking your cache statistics. This means allowing your application to run for at least 10 to 30 minutes, which is usually enough time for the system to reach a steady state.

When looking at hit rates, keep in mind that relatively low query cache hit rates may provide substantial benefit due to the potentially high cost of generating a result set for a query. For example, we usually try to maintain InnoDB buffer pool hit rates in the high 90% range, while hit rates below 50% in the query cache may be fine. On the other hand, if you can't get hit rates out of the single digits, you should really question the benefit of enabling the cache, since looking in the cache adds overhead to every select. In the end, query/transaction throughput and response time should be your ultimate indicator.

When looking at insert rates, keep in mind that relatively low insert rates may have a substantial negative impact on performance due to the high cost of updating the cache. I would be cautious of anything higher than single digit insert rates, and test the application with the query cache disabled (by setting query_cache_size to zero) to see if the cache is helping or hurting performance. However, keep in mind that a high prune rate might result in the need to repeatedly insert the same result set in the cache, so you need to consider that as well.

A non-zero prune rate may be an indication that you should increase the size of your query cache. However, keep in mind that the overhead of maintaining the cache is likely to increase with its size, so do this in small increments and monitor the result. If you need to dramatically increase the size of the cache to eliminate prunes, there is a good chance that your workload is not a good match for the query cache.

As you consider how your workload interacts with the query cache, keep in mind that you can use the SQL_NO_CACHE hint in a SELECT statement if you find that a specific query tends to interact badly with the query cache.
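A minimal sketch, using a hypothetical reporting query: the hint immediately follows the SELECT keyword and tells the server not to cache this statement's result set.

    SELECT SQL_NO_CACHE customer_id, SUM(total) AS revenue
    FROM orders
    GROUP BY customer_id;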

As I said earlier, when the query cache helps, it can help a lot, but when it hurts, it can hurt a lot. It all depends on your particular workload, so it requires testing and is not something you should just blindly enable. In the end, the goal is to increase throughput and/or reduce response time, so those are the metrics you should ultimately consider. For more information about configuring and tuning the query cache, see the Query Cache section of the MySQL documentation. There is also an excellent write-up in the book "High Performance MySQL" by Baron Schwartz et al. that touches on a number of issues I haven't mentioned.

I'd like to acknowledge my colleague Simon Vobes, who did a great job working with the customer to resolve this performance issue!

Comments:

David, rather than RAM allocated you might want to look at the size of the free list. If it's over 10,000 that's a sign that the query cache is probably starting to become inefficient. By 100,000 it's very likely to be an issue. For any size you can improve this with FLUSH QUERY CACHE, since that defragments the free list. Sometimes it might be worth doing that regularly with a larger size. FLUSH QUERY CACHE is a good way to quickly confirm that a too-large free list is an issue. You can expect to see a quick decrease in CPU usage if it is.
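[A sketch of the checks James describes, with his thresholds taken as rough guides rather than exact limits: inspect the free-block count, and if it is large, defragment the cache in place and watch whether CPU usage drops.]

    SHOW GLOBAL STATUS LIKE 'Qcache_free_blocks';   -- size of the free list
    FLUSH QUERY CACHE;                              -- defragments the cache without removing entries
    SHOW GLOBAL STATUS LIKE 'Qcache_free_blocks';   -- should be much smaller afterwards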

It's also worth being aware that in older installations FLUSH QUERY CACHE and other operations that remove large numbers of entries from the query cache would stall the server until they completed. The more recent ones temporarily disable the query cache instead of making everything wait.

Prior to MySQL 5.1.33, a query that used SQL_NO_CACHE would still check the query cache to see whether the result was there. From 5.1.33 on, the check is skipped, and so is the server-wide query cache mutex that all SELECTs otherwise take. This mutex is an issue to be aware of on systems with lots of cores, since it effectively serialises throughput, with the query cache becoming a potential bottleneck running on just a single core.

This is one part of the MySQL server that's showing its age and needs a new design. It's on the to do list.

James Day, MySQL Senior Support Engineer, Sun Microsystems

Posted by James Day on March 25, 2009 at 01:12 PM PDT #

@James, thanks for the great comments!

If I understand your comment about looking at the size of the free list (Qcache_free_memory), I assume you mean to check it in combination with Qcache_lowmem_prunes, which makes sense. If you see prunes but have a lot of free space in the cache, the cache is probably fragmented.

Thanks!

Posted by David Lutz on March 25, 2009 at 11:27 PM PDT #

The cache will be fragmented if the prune count is high and the number of free blocks is high too.

Posted by WB on April 20, 2009 at 12:29 AM PDT #

Could you give a general idea of the query cache size to use with 1 million, 2 million, or 5 million records in MySQL?

Posted by robin on November 03, 2010 at 07:23 PM PDT #

By size of the free list I meant Qcache_free_blocks, not Qcache_free_memory.

Robin, more than twenty megabytes is unlikely to be useful. If you're setting it larger you need to carefully consider and test whether it's really useful to be as big as it is. It's been rare for me to see people having even as much as fifty megabytes really used, even if more is allocated, and that's at the high end of real use. No harm to try larger allocations if you want to and maybe you'll be one of the uncommon cases where they do some good. The size doesn't depend on the number of records, it depends more on how often at least one record is changed in each table and how big the result set sizes are. The more frequent the changes, the smaller the useful size of the query cache. The settings that are most affected by the data size are key_buffer_size and innodb_buffer_pool_size. Getting those right is the most important tuning you can do.

If you're configuring a replication slave with MySQL 5.5 and later and you're using row-based replication your optimal setting is probably query_cache_type = 0 and query_cache_size = 0 to eliminate the significant overhead it adds to row-based replication slaves. There's no difference for query_cache_type = 0 with RBR on a replication slave in 5.1 and earlier. We didn't change it in 5.1 because the change in 5.5 blocks resizing while the server is running.
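[A sketch of the my.cnf settings James describes for a row-based replication slave on MySQL 5.5 or later; set them at startup, since with query_cache_type = 0 the cache cannot be enabled or resized while the server is running.]

    # my.cnf on a row-based replication slave, MySQL 5.5+
    [mysqld]
    query_cache_type = 0
    query_cache_size = 0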

Posted by James Day on November 03, 2010 at 09:35 PM PDT #
