InnoDB 5.6.4 supports databases with 4k and 8k page sizes
By Calvin Sun on Dec 20, 2011
Note: this article was originally published on http://blogs.innodb.com on Dec 20, 2011 by Kevin Lewis.
In the 5.6.4 release it is now possible to create an InnoDB database with 4k or 8k page sizes in addition to the original 16k page size. Previously, it could be done by recompiling the engine with a different value for UNIV_PAGE_SIZE_SHIFT and UNIV_PAGE_SIZE. With this release, you can set –innodb-page-size=n when starting mysqld, or put innodb_page_size=n in the configuration file in the [mysqld] section where n can be 4k, 8k, 16k, or 4096, 8192, 16384.
The support of smaller page sizes may be useful for certain storage media such as SSDs. Performance results can vary depending on your data schema, record size, and read/write ratio. But this provides you more options to optimize your performance.
When this new setting is used, the page size is set for all tablespaces used by that InnoDB instance. You can query the current value with;
SHOW VARIABLES LIKE ‘innodb_page_size’;
SELECT variable_value FROM information_schema.global_status WHERE LOWER(variable_name) = ‘innodb_page_size’;
It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If you start mysqld with a page size other than the standard 16k, the error log will contain something like this;
111214 15:55:05 InnoDB: innodb-page-size has been changed from the default value 16384 to 4096.
If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start. There will be a message logged like this;
111214 16:06:51 InnoDB: Error: data file .\ibdata1 uses page size 4096,
111214 16:06:51 InnoDB: but the start-up parameter is innodb-page-size=16384
InnoDB knows the page size used in an existing tablespace created by version 5.6.4 because it stamps that page size in the header page. But this is not readable to older engines, of course. If an older engine opens a database with a page size other than 16k, it will report a corrupted page and quit.
All features in InnoDB work the same with smaller page sizes. But some limits are affected. The maximum record size is proportionately less with smaller pages. Record length limits are calculated within InnoDB based on a variety of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size. The main record is stored in a clustered index and a minimum of two records must fit into each page. So the maximum record length for 8k pages is about half that of 16k pages and the maximum record size with 4k pages is about half that of 8k pages.
In addition to record lengths, the maximum key lengths are proportionately smaller. For 16k pages, MySQL prevents key definitions from containing over 3072 bytes of data. This means that the primary key in the clustered index cannot use more than 3072 bytes of data. Secondary indexes can only contain up to 3072 bytes of data in the key also. But within a secondary index, InnoDB must store the primary key as the record pointer. So if you have a 3072 byte primary key and a 3072 byte secondary key, each entry in the secondary index contains 6144 bytes of data plus internal record overhead. The amount of internal overhead is dependent upon the column types, but this gets the secondary record close to half the page size which is its natural limit. So this limit of 3072 bytes per key is less than but close to what InnoDB would have to impose on a table by table basis. Based on that, InnoDB now reports to MySQL that any key defined for 8k page sizes must be less than 1536 bytes. Likewise, the maximum key length when InnoDB uses 4k page sizes is 768 bytes.
If you have a database schema with any large records or keys defined, you may not be able to use smaller page sizes. Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper. For example, if only 2 records fit on the page and there are 1,000,000 records, leaf pages are 20 levels deep, meaning InnoDB will need to read 20 pages to find the leaf page. If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.
There is a trick to reducing the size of records on the clustered index page;
1) Use Dynamic or Compressed row format.
2) Convert VARCHAR fields to TEXT fields. (VARBINARY can be converted to BLOB)
There are 4 ROW FORMATS in InnoDB. The first two, Redundant and Compact, which are considered the Antelope file version, store at least 768 bytes of each field in the clustered record. The Barracuda file version consists of Compact and Dynamic ROW FORMATS. These have the ability to put all of a VARCHAR, VARBINARY, TEXT or BLOB field onto a separate BLOB page for storage. So one good way to prepare a table structure to use smaller page sizes is to use Dynamic or Compressed row format.
The decision of how big a record will be in the clustered record is made during the INSERT or UPDATE. Each record is evaluated for its own length. If the record is too long to fit in half the page, InnoDB will look for the longest actual field in that record and put as much of that field off-page as possible based on the row type. Then if it is still to long, it will shorten the longest field remaining. Since this is done when the record is added to the page, different records may have different columns stored off-page when there are multiple long fields in the record.
VARCHAR/VARBINARY fields are treated like TEXT/BLOB fields if they are over 255 bytes long. If you are using Compressed or Dynamic row format and your record is too long because you have too many VARCHAR fields 255 bytes or less, you can reduce the record length by converting them to TEXT fields. Likewise, VARBINARY fields 255 bytes or less can be converted to BLOB fields to take advantage of this ability to store the whole TEXT or BLOB field on a separate page.
A file extent in InnoDB is 1 Mb independent of the page size. So an extent will hold 64 16k pages, 128 8k pages and 256 4k pages. This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time. The doublewrite buffer, which is based on the size of an extent, will also contain more pages.
If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump. Then create the new mysql instance with innodb-page-size=4k or 8k and import the data. Do not use a physical export method such as alter table … discard tablespace.
This feature makes it easier to try smaller page sizes in an InnoDB database. And with the 5.6.4 release, those smaller page sizes are fully supported by MySQL. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL. A new InnoDB instance will be created with the smaller page size. Then you can import your data and run your tests, all without recompiling InnoDB.