Friday Apr 13, 2012

New flushing algorithm in InnoDB

Note: this article was originally published on on April 13, 2012 by Inaam Rana.

In MySQL labs release April 2012 we have reworked the flushing heuristics in InnoDB. In this post I’ll give an overview of what we have changed and the various configuration variables we have introduced to fine tune the flushing algorithm. If you are interested in finding out how our new flushing algorithm fares in benchmarks you can get these details in Dimitri’s well-explained blog here.

Flushing means writing dirty pages to disk. I have explained in some detail about adaptive_flushing and types of flushing in my previous notes. Please go through these notes if you want to make sense of what follows.

The page_cleaner thread checks the state of the system every second and takes into account number of dirty pages, amount of reusable redo space, the rate at which redo is generated and the IO capacity for which the server is configured and based on these factors decide how many pages we need to flush.

In the new scheme of things the page_cleaner thread uses a single non-linear formula to calculate how many pages we need to flush to have sufficient reusable redo space. This is different from current flushing heuristic where async_water_mark is taken as point where we change our flushing algorithm. Similarly, instead of using innodb_max_dirty_pages_pct as a switch which triggers flushing we have introduced the concept of a range where the flushing to control the dirty pages percentage starts once we cross the low water mark and gets more and more aggressive as we near the high water mark.

There are four new configuration variables. Note that if your system is not experiencing any IO spikes due to checkpoints then you can probably leave all of the following as is. All the variables are global in scope and can be set dynamically.

  • innodb_adaptive_flushing_lwm: Low water mark measured in %age of total redo log size at which adaptive flushing kicks in. If currently unusable redo space is less then this value no background flushing will happen. Default value is 10 and permissible values are 0 – 70.
  • innodb_max_dirty_pages_pct_lwm: Low water mark of dirty pages in %age where preflushing to control dirty page ratio kicks in. Default value is 0 which has the special meaning of this value having no effect. Allowable values are 0 – 99.
  • innodb_max_io_capacity: InnoDB generally attempts to work within the limits of innodb_io_capacity. However, if it needs to do aggressive flushing then innodb_max_io_capacity defines the limit to which the write IOPs can be stretched. Default value is 2000 which is ten times the default value for innodb_io_capacity.
  • innodb_flushing_avg_loops: Number of iterations for which we keep the previously caculated snapshot of the flushing state. This variable is roughly a measure of how smooth you want the transition in the flushing activity to be. The higher the value the smoother will be the transition in flushing in face of rapidly changing workload. A lower value implies that the flushing algorithm is more responsive but it also means that flushing acitivity can become spiky when confronted with a quickly changing workload. Default value is 30 and permissible range is 1 – 1000.

Finally, there is a wealth of information availabe to you to monitor the impact of changing these variables. You can get a good inside view of how flushing activity is working by querying innodb_metrics table.

mysql> select name, comment from information_schema.innodb_metrics where name like ‘Buffer_flush%’;
| name | comment |
| buffer_flush_batch_scanned | Total pages scanned as part of flush batch |
| buffer_flush_batch_num_scan | Number of times buffer flush list flush is called |
| buffer_flush_batch_scanned_per_call | Pages scanned per flush batch scan |
| buffer_flush_batch_total_pages | Total pages flushed as part of flush batch |
| buffer_flush_batches | Number of flush batches |
| buffer_flush_batch_pages | Pages queued as a flush batch |
| buffer_flush_neighbor_total_pages | Total neighbors flushed as part of neighbor flush |
| buffer_flush_neighbor | Number of times neighbors flushing is invoked |
| buffer_flush_neighbor_pages | Pages queued as a neighbor batch |
| buffer_flush_n_to_flush_requested | Number of pages requested for flushing. |
| buffer_flush_avg_page_rate | Average number of pages at which flushing is happening |
| buffer_flush_lsn_avg_rate | Average redo generation rate |
| buffer_flush_pct_for_dirty | Percent of IO capacity used to avoid max dirty page limit |
| buffer_flush_pct_for_lsn | Percent of IO capacity used to avoid reusable redo space limit |
| buffer_flush_adaptive_total_pages | Total pages flushed as part of adaptive flushing |
| buffer_flush_adaptive | Number of adaptive batches |
| buffer_flush_adaptive_pages | Pages queued as an adaptive batch |
| buffer_flush_sync_total_pages | Total pages flushed as part of sync batches |
| buffer_flush_sync | Number of sync batches |
| buffer_flush_sync_pages | Pages queued as a sync batch |
| buffer_flush_background_total_pages | Total pages flushed as part of background batches |
| buffer_flush_background | Number of background batches |
| buffer_flush_background_pages | Pages queued as a background batch |
23 rows in set (0.00 sec)


This is the InnoDB team blog.


« April 2012 »