MySQL 5.6: Multi threaded purge
By Calvin Sun on Apr 06, 2011
Note: this article was originally published on http://blogs.innodb.com on April 6, 2011 by Sunny Bains.
What does purge exactly do and why is it needed? If you have ever wondered then read on. It is really a type of garbage collector. When a user issues a DML like “DELETE FROM t WHERE c = 1;”, InnoDB doesn’t remove the matching record. This is what happens under the hood:
- It marks the record as deleted by setting a bit in the control bits of the record.
- Stores the before image of the modified columns to the UNDO log
- Updates the system columns DB_TRX_ID and DB_ROLL_PTR in the clustered index record. DB_TRX_ID identifies the transaction that made the last change, and DB_ROLL_PTR points to the new UNDO log record. This UNDO log record contains the old values of DB_TRX_ID and DB_ROLL_PTR, possibly pointing to an older transaction and undo log entry.
From this you should be able to visualise that the UNDO log records related to the modified clustered record are in a disk based linked list, with the head anchored in the clustered index record. For the sake of simplicity I’ve ignored the UNDO log pages and the case where DML updates a record. This information is required by rollback and MVCC (multi version concurrency control). For MVCC we need the entry to exist in the clustered index so that we can follow a pointer back to where the “before” changes were written in the UNDO log and use that information to construct a previous version of the record. For rollback we need the before information of the record (UNDO entry) so that we can restore it when a transaction is rolled back.
Another benefit of purging separately in the background is that expensive B+Tree block merge operations, if the removal of the record were to lead to underflow, can be done asynchronously by purge and not by user transactions.
Once a transaction has been committed, the UNDO log entries and delete-marked records written by it may be needed by other transactions for building old versions of the record. When there is no transaction left that would need the data, the delete-marked records and the related undo log records can be purged. For this reason we have a dedicated (and aptly named) purge function that runs asynchronously within InnoDB. Another problem is that InnoDB clustered and secondary indexes can have multiple entries for the same key. These need to be removed (garbage collected) too by Purge (see InnoDB Change Buffering for details regarding secondary indexes).
How does purge work?
Purge clones the oldest view in the system. This view has the control information that limits what changes are visible, the ones made by other transactions. By cloning the oldest view, purge can determine that there can’t be any transactions that are active than the oldest view low limit in the system. Purge then reads the UNDO log entries in reverse starting from the oldest to the latest. It then parses these entries and removes the delete marked entries from the cluster and secondary indexes, if they are not referenced by any currently running transaction.
Problem with the old design
In versions prior to 5.5 this purge function was part of the responsibility of the InnoDB master thread. At fixed intervals it would run purge asynchronously. The problem with this approach was that the master thread was also responsible for flushing (writing to disk) of dirty pages, among other tasks. A high load on the server that dirtied a lot of pages would force the master thread to spend most of its time flushing and therefore no purging would get done and vice a versa.
Changes in 5.5
In 5.5 there is an option innodb-purge-threads=[0,1] to create a dedicated thread that purges asynchronously if there are UNDO logs that need to be removed. We also introduced another option innodb-purge-batch-size that can be used to fine tune purge operations. The batch size determines how many UNDO log pages purge will parse and process in one pass. The default setting is 20, this is the same as the hard coded value that is in previous InnoDB releases. An interesting side effect of this value is that it also determines when purge will free the UNDO log pages after processing them. It is always after 128 passes, this magic value of 128 is the same as the number of UNDO logs in the system tablespace, now that 5.5 has 128 rollback segments. By increasing the innodb-purge-batch-size the freeing of the UNDO log pages behaviour changes, it will increase the number of UNDO log pages that it removes in a batch when the limit of 128 is reached. This change was seen as necessary so that we could reduce the cost of removing the UNDO log pages for the extra 127 rollback segments that were introduced in 5.5. Prior to this change iterating over the 128 rollback segments to find the segment to truncate had become expensive.
Changes in 5.6
In 5.6 we have the same parameters as 5.5 except that innodb-purge-threads can now be between 0 and 32. This introduces true multi threaded purging. If the value is greater than 1 then InnoDB will create that many purge worker threads and a dedicated purge coordinator thread. The responsibility of the purge coordinator thread is to parse the UNDO log records and parcel out the work to the worker threads. The coordinator thread also purges records, instead of just sitting around and waiting for the worker threads to complete. The coordinator thread will divide the innodb-purge-batch-size by innodb-purge-threads and hand that out as the unit of work for each worker thread.
For single table tests like Sysbench one obvious problem is that all the worker threads tend to block/serialise on the dict_index_t::lock. This reduces their effectiveness and for such loads it is usually better to have a single dedicated purge thread unless you are using partitions. With multiple tables the purge threads come into their own and can purge records in parallel with minimal contention.