MySQL InnoDB Performance Tuning For Disk-Bound Workloads On CMT Servers
By luojiac on Oct 09, 2007
MySQL is one of the world's most popular open source databases and is becoming the database of choice for many leading Web 2.0 sites. As with most database servers, the most common bottleneck MySQL encounters in enterprise environments is disk I/O. To maximize MySQL performance for disk I/O bound workloads on the Solaris operating system on CMT servers (e.g. the Sun Fire T2000 Server), configuring and tuning the MySQL server variables is critical and can make a big difference to performance, as can optimizing the Solaris file system for MySQL and configuring the storage arrays.
MySQL server's performance can be optimized using various configuration settings. The first step is to read the configuration and system variables by running the command:
mysql> show variables;
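When comparing settings between runs or servers, it can help to capture the variables in a form a script can read. The following is a minimal Python sketch, assuming the tab-separated output that the mysql client prints in batch mode (for example mysql -B -e "show variables"). The function name parse_variables and the sample text are illustrative only and are not part of MySQL.

```python
def parse_variables(batch_output):
    """Parse tab-separated `show variables` output into a dict of strings."""
    variables = {}
    for line in batch_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, value = line.partition("\t")
        if name == "Variable_name":
            continue  # skip the header row printed by the client
        variables[name] = value
    return variables


# Illustrative sample of batch output; real output has many more rows.
sample = "Variable_name\tValue\ntable_cache\t1024\nsort_buffer_size\t65536\n"
settings = parse_variables(sample)
print(settings["table_cache"])
```

Values are kept as strings because many variables are not numeric (paths, ON/OFF flags); convert individual values with int() where needed.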
Once you have done this, you can take appropriate action to configure/tune the variables for better performance. This action can be one of the following:
1. Change a value in the my.cnf configuration file, or in the MySQL startup options
2. Configure the optimum number of user connections
3. Optimize Solaris file system performance
4. Set up and configure the storage disk array
5. Make database schema changes, such as changing design of one or more tables, or adding or modifying indexes
6. Optimize the queries used by the application
This document will only concentrate on changing MySQL configuration settings and file system tuning. Storage configuration and changes at the database design and application level are not covered.
MySQL Server Variables
MySQL has many variables that can be adjusted to change its behavior or to improve performance. For I/O bound workloads, the most important are the memory-related variables. MySQL includes several storage engines, including MyISAM, InnoDB, HEAP, and Berkeley DB (BDB). Some variables apply to only one storage engine, while others are used in the SQL layer and apply to all storage engines. When using the InnoDB storage engine, which supports ACID transactions, we don’t need to configure the following memory-related variables, which saves precious memory in a disk I/O bound workload:
· key_cache_age_threshold, key_cache_block_size, key_cache_division_limit
· read_buffer_size, read_rnd_buffer_size
There are several memory-related variables that apply to all storage engines:
· join_buffer_size – A buffer used for full joins (joins that do not use indexes). When there are large joins without indexes, increase this buffer size to improve efficiency.
· sort_buffer_size – A buffer allocated by each thread that needs to sort a result set. Increasing it can speed up ORDER BY and GROUP BY queries.
· query_cache_size – The amount of memory allocated to the query cache. Set this variable to a nonzero value to enable query caching.
· query_cache_limit – The maximum size of a cached result set; larger result sets won’t be cached.
· query_cache_min_res_unit – The query cache allocates memory in blocks of at least this size. When the application has a lot of queries with small results, the default block size (4KB) may lead to memory fragmentation. So with small result sets, decreasing it to 2048 or 1024 bytes might improve performance; with large result sets, increasing it to 8192, 16384 or more may improve performance.
· query_cache_type – 0=OFF, 1=ON
The MySQL query cache stores the text of SELECT queries issued by clients together with their result sets. When an identical query arrives later, the server can return the cached result without parsing and executing the query again. This can significantly reduce the overhead of building complex result sets from disk or from the memory caches, reducing both physical and logical I/O, and can speed up applications that issue the same queries repeatedly, such as product lookups. If the output of mysql> show status; shows a high value for Qcache_hits compared to your total queries at runtime, or a low value for Qcache_free_memory, you probably need to increase query_cache_size accordingly. Otherwise, you can decrease query_cache_size to save memory for the other MySQL cache buffers. If Qcache_hits is 0 at runtime, turn the query cache off completely by setting both query_cache_type and query_cache_size to 0, since an enabled query cache adds some overhead in addition to wasting memory. If the application uses many simple SELECT queries that are not repeated, having the query cache enabled may actually impede performance by 5-10%. However, for applications with many repeated SELECT queries with large result sets, the performance increase from the query cache can be 200% or more.
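The query cache checks described above can be turned into a small calculation. This is a rough Python sketch, not a MySQL tool: the function names and the 10% free-memory threshold are assumptions chosen for illustration, and the inputs are the Qcache_hits, Com_select and Qcache_free_memory counters read from show status. It relies on the fact that Com_select counts only SELECTs that were actually executed, so the total number of SELECTs is Com_select plus Qcache_hits.

```python
def query_cache_hit_ratio(qcache_hits, com_select):
    """Fraction of SELECTs answered from the query cache."""
    total = qcache_hits + com_select
    return qcache_hits / total if total else 0.0


def query_cache_advice(qcache_hits, qcache_free_memory, query_cache_size,
                       low_free_fraction=0.1):
    """Simplified reading of the rules of thumb; the threshold is an assumption."""
    if query_cache_size == 0:
        return "query cache is disabled"
    if qcache_hits == 0:
        return "disable: set query_cache_type=0 and query_cache_size=0"
    if qcache_free_memory < low_free_fraction * query_cache_size:
        return "consider increasing query_cache_size"
    return "keep the current size, or shrink it if memory is tight"


print(query_cache_hit_ratio(300, 100))  # 300 cache hits, 100 executed SELECTs
```

Sample the counters after the workload has been running for a while; right after a restart they say little about steady-state behavior.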
· tmp_table_size – Sets the maximum amount of memory allocated to a temporary table created automatically during query execution before MySQL converts it into an on-disk MyISAM table. If many queries show the state “Copying to tmp table on disk” when you run mysql> SHOW PROCESSLIST; (or mysqladmin -i10 processlist extended-status), the temporary result sets are larger than tmp_table_size, so the MySQL threads are copying the temporary tables from RAM to disk. In such cases, increasing the value can speed up execution of large queries; otherwise, decrease the value to save memory for the MySQL I/O bound workload.
· table_cache – Size this cache to keep most tables open, since opening tables can be expensive. The optimum value for table_cache is directly related to the number of tables that need to be open simultaneously in order to perform multiple-table joins. The table_cache value should be no less than the number of concurrent connections times the largest number of tables involved in any one join. 1024 is a good value for applications with a couple of hundred tables (each connection has its own entry). Check the Open_tables status variable to see whether it is large compared to table_cache.
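The table_cache sizing rule above is simple arithmetic, sketched here in Python. The function names and the 90% "close to full" threshold are assumptions made for this example; the connection and join counts are made-up inputs.

```python
def min_table_cache(concurrent_connections, max_tables_per_join):
    """Lower bound from the rule: connections times the widest join."""
    return concurrent_connections * max_tables_per_join


def table_cache_looks_full(open_tables, table_cache, threshold=0.9):
    """True when Open_tables is close to table_cache (threshold is an assumption)."""
    return open_tables >= threshold * table_cache


# 200 concurrent connections, joins touching at most 5 tables
print(min_table_cache(200, 5))  # -> 1000
```

With these inputs the lower bound lands just under the 1024 used in the example configuration.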
MySQL InnoDB-Only Memory-Related Variables
· innodb_buffer_pool_size – Sets the amount of memory allocated to the InnoDB buffer pool, which caches both data and indexes. If the requested data is already in the buffer pool, it can be processed right away. Otherwise, it must first be read from disk into the buffer pool. It is important to set this value as high as possible so that the more efficient InnoDB buffer pool is used instead of the operating system buffer cache. For the sysbench I/O bound workload on a T2000 server with 8 GB RAM, increasing innodb_buffer_pool_size from 4G to 5G can improve performance by around 11%.
· innodb_additional_mem_pool_size - Sets the amount of memory allocated to the buffer that stores the InnoDB internal data dictionary and other internal data structures. This parameter does not affect performance much, so we set it to 20M for the sysbench OLTP I/O bound test case. Applications with more tables need more memory here.
· innodb_log_buffer_size - Sets the amount of memory allocated to the buffer that stores InnoDB write-ahead log entries. For large transactions, log entries can accumulate in the log buffer instead of being written to the log files on disk until the log buffer is flushed at transaction commit. If you see heavy log I/O in the show innodb status output at runtime, you probably need a larger innodb_log_buffer_size to save disk I/O. For workloads that don’t have long transactions, like sysbench, there is no need to spend memory on a larger log buffer; 8 MB is fine.
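Before raising innodb_buffer_pool_size, it is worth checking that all the buffers together still fit in RAM. The following Python sketch gives a rough upper bound only: the helper name estimate_mysql_memory is hypothetical, the sizes are taken from the example my.cnf in this article, the 64 threads correspond to the peak concurrency reported for the T2000, and thread stacks and other allocations are ignored. Per-thread buffers such as sort_buffer_size are allocated only when a thread needs them, so real usage is normally lower.

```python
MB = 1024 * 1024


def estimate_mysql_memory(global_buffers, per_thread_buffers, max_threads):
    """Rough upper bound: shared buffers plus per-thread buffers for every thread."""
    shared = sum(global_buffers.values())
    per_thread = sum(per_thread_buffers.values())
    return shared + max_threads * per_thread


global_buffers = {
    "innodb_buffer_pool_size": 5500 * MB,
    "innodb_additional_mem_pool_size": 20 * MB,
    "innodb_log_buffer_size": 8 * MB,
    "query_cache_size": 0,
}
per_thread_buffers = {
    "sort_buffer_size": 64 * 1024,
}

total = estimate_mysql_memory(global_buffers, per_thread_buffers, max_threads=64)
print(total // MB, "MB")  # -> 5532 MB
```

On the 8 GB T2000 this leaves headroom for the operating system and the remaining MySQL structures.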
Other MySQL InnoDB Variables Impacting I/O Performance
· innodb_flush_log_at_trx_commit - Controls when InnoDB flushes the transaction log to disk. By default it is set to 1, meaning the log is flushed to disk at every transaction commit, so modifications made by committed transactions won’t be lost in a MySQL, OS, or hardware crash. In addition, InnoDB flushes the log to disk approximately once per second in the background. For workloads with many small transactions, you can reduce log disk I/O and improve performance by setting innodb_flush_log_at_trx_commit to 0, meaning the log is not flushed at each commit and only the once-per-second background flush remains. However, the most recent transactions (roughly the last second's worth) might then be lost if MySQL crashes. In the sysbench OLTP I/O bound workload test on a T2000 server, setting innodb_flush_log_at_trx_commit=0 in the read-only test can improve performance by 4%. Alternatively, set the value to 2 to write the log to the OS cache at each commit and leave the flush to disk to the once-per-second background flush; this saves disk I/O on each commit and survives a MySQL crash, but recent transactions can still be lost in an OS crash or power failure.
· innodb_log_file_size – Sets the size of each log file in a log group. InnoDB writes to the log files in a circular fashion, so the bigger innodb_log_file_size is, the less checkpoint flush activity is needed, which reduces disk I/O but increases recovery time. If the BUFFER POOL AND MEMORY section of the show innodb status output shows heavy page writes, you will need to increase this parameter.
Configure an Optimum Number of User Threads
MySQL is a single-process, multithreaded application. One master thread, running at the highest priority, controls the server. For every client request, it creates a dedicated user thread running at normal priority in the thread pool to process the request and send the result back to the client once it is ready. In addition, there is a single user thread that waits for input from the console, and a group of utility threads running at lower priority that handle background tasks.
Currently, MySQL does not scale well with the number of concurrent user connections. On a T2000 server, in the OLTP I/O bound read-write sysbench test, MySQL scales from 2 up to 64 concurrent user threads, where it reaches peak performance. Beyond that point, adding user connections increases user-level lock contention (visible in the LCK column of prstat -mL output) and reduces MySQL performance. For applications where the number of user connections is tunable, you need to test to find the number of connections that gives peak performance.
For applications where the number of user connections is not tunable, the innodb_thread_concurrency parameter can be used to limit the number of threads working inside the InnoDB engine. Increase this value when you see many queries waiting in the queue in the show innodb status output. Setting this value to 0 disables the limit. On the T2000 server, we set it to around 2*(number of disks) in the sysbench OLTP I/O bound workload test. Testing and tuning innodb_thread_concurrency according to the kind of workload and the runtime behavior of your system can affect performance significantly.
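Finding the best number of user threads comes down to running the same test at several concurrency levels and picking the peak. Here is a small Python sketch of that bookkeeping. The function names are hypothetical, and the throughput figures in the example are placeholders for illustration, not measurements from the T2000 tests.

```python
def initial_innodb_thread_concurrency(num_disks):
    """Starting point used in the text: about 2 * (number of disks)."""
    return 2 * num_disks


def peak_thread_count(throughput_by_threads):
    """Return the thread count with the highest measured throughput.

    Ties go to the smaller thread count.
    """
    return min(throughput_by_threads,
               key=lambda threads: (-throughput_by_threads[threads], threads))


# Placeholder numbers for illustration only; they are not measurements.
example_sweep = {2: 100.0, 16: 450.0, 64: 800.0, 128: 700.0}
print(peak_thread_count(example_sweep))      # -> 64
print(initial_innodb_thread_concurrency(4))  # -> 8
```

Replace example_sweep with the transactions per second reported by your own runs, and treat the 2*(number of disks) value only as a first guess to tune from.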
Optimize File system Performance on T2000
File system performance has a big impact on system performance, particularly when running an I/O bound workload with a database much bigger than system memory. How to configure the file system for better performance depends on the workload's access pattern: random or sequential. For a sequential workload, we can increase the file system cluster size (the maxcontig parameter) so that more data is read ahead from, or written back to, the disk per operation, reducing the total number of I/O operations. For random workloads, we can reduce the file system cluster size to match the InnoDB I/O size. On the Solaris platform, maxcontig is set to 128 by default, which triggers read-ahead for the whole file system cluster (128*8 Kbytes on Solaris SPARC, 128*4 Kbytes on Solaris x86) or for the maximum physical I/O size set in the maxphys system variable. In a random workload such as the sysbench OLTP I/O bound test, this read-ahead can saturate a disk and significantly degrade performance. One way to solve this problem is to reduce the value of maxcontig. In the sysbench OLTP I/O bound tests on the T2000, changing maxcontig to 2 with the tunefs -a 2 /dev/dsk/c4t1d0s6 command improved performance by 10%-13%. The shortcoming of this solution is that it will hurt the performance of other sequential workloads on your system.
The other way to improve performance is to disable file system caching with the UFS mount option forcedirectio (mount -o remount,forcedirectio /data). Since InnoDB has its own buffer cache for data and indexes (sized by innodb_buffer_pool_size), which is more efficient than the operating system cache, we can use file system directio to avoid double buffering; directio also automatically disables read-ahead, which benefits random workloads.
On a T2000 running Solaris 10 update 1 through update 3, it is important when using file system directio to add set auto_lpg_maxszc=1 to /etc/system, as suggested by Aleksandr Guzovskiy, to reduce the maximum page size for heap/stack/mmap to 64k (the default is up to 256M). Otherwise, you may encounter a big performance drop with directio when multiple concurrent user threads are accessing the MySQL database. On the T2000 in the sysbench OLTP I/O bound tests, using directio improved performance by 14%-17%. The shortcoming of file system directio is that it will significantly hurt the performance of other applications on your system that don’t have internal caching (like the MySQL MyISAM engine, which does not have its own data buffer cache) and instead depend on file system caching for buffering.
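The cluster-size arithmetic behind the maxcontig tuning above can be checked with a few lines of Python. This is only a worked calculation: the block sizes (8 Kbytes on SPARC, 4 Kbytes on x86) are the ones quoted in this article, the 16 KB figure is the default InnoDB page size, and the function name is made up for the example. The real read-ahead is also capped by maxphys, which is not modeled here.

```python
KB = 1024
INNODB_PAGE_SIZE = 16 * KB  # default InnoDB page size


def ufs_cluster_bytes(maxcontig, block_size):
    """Size of one UFS cluster: maxcontig file system blocks."""
    return maxcontig * block_size


# Default maxcontig=128 on both platforms
for label, block_size in (("SPARC", 8 * KB), ("x86", 4 * KB)):
    default = ufs_cluster_bytes(128, block_size)
    print(label, "default cluster:", default // KB, "KB")

# maxcontig=2 on SPARC, as set with tunefs -a 2
tuned = ufs_cluster_bytes(2, 8 * KB)
print("maxcontig=2 on SPARC:", tuned // KB, "KB")
print("matches InnoDB page size:", tuned == INNODB_PAGE_SIZE)
```

With the default setting, a single 16 KB random read can pull in up to 1024 KB on SPARC, which is why shrinking the cluster to the InnoDB page size helps the random workload.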
Expected MySQL Performance On Niagara 2 UltraSPARC T2 Server
For disk I/O bound workloads, MySQL can benefit from the new features of Niagara 2 servers compared to Niagara 1 servers:
Larger L2 cache: The on-chip 4MB L2 cache on the Niagara 2 server can cache frequently accessed memory, improving MySQL performance.
Larger memory: Up to 512GB of fully buffered DIMMs, attached to the integrated memory controller with an aggregate memory bandwidth of 64 gbps, lets the Niagara 2 server buffer larger databases in memory and reduces the time the CPU spends waiting for data to arrive. By adjusting how much memory MySQL InnoDB uses, we can expect significant performance improvements.
x8 PCI Express: PCI-E integrated directly on-chip can reduce latency and so is expected to speed up MySQL disk I/O.
More papers on UltraSPARC T2 server technology, performance, etc.:
Allan Pack's Weblog: CMT Comes Of Age
Example MySQL options
Here is an example /etc/my.cnf used on a T2000 (32 x 1200 MHz, 8 GB RAM, Solaris 10 11/06) in the sysbench OLTP I/O bound test (100M rows):
[mysqld]
port = 3306
socket = /tmp/mysql.sock
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
log-error = /data/error.txt
max_connections = 3000
table_cache = 1024
max_allowed_packet = 1M
sort_buffer_size = 64K
thread_cache = 8
thread_concurrency = 32
query_cache_size = 0M
query_cache_type = 0
default-storage-engine = innodb
transaction_isolation = REPEATABLE-READ
tmp_table_size = 1M
innodb_data_file_path = ibdata1:100M:autoextend
innodb_buffer_pool_size = 5500M
innodb_additional_mem_pool_size = 20M
innodb_log_buffer_size = 8M
innodb_lock_wait_timeout = 300
innodb_max_dirty_pages_pct = 90
Example /etc/system on T2000