Create InnoDB databases with 4k and 8k page sizes without recompiling
By Calvin Sun on Jul 25, 2011
Note: this article was originally published on http://blogs.innodb.com on July 25, 2011 by Kevin Lewis.
One of the features found in the summer 2011 labs release is the ability to select the InnoDB page size without recompiling. Smaller page sizes may be useful for certain storage media such as SSDs where there is no need to minimize seek time between reads.
A new global setting called innodb-page-size can be set to 4k, 8k or 16k before creating a new MySQL instance. This sets the page size for all tablespaces used by that InnoDB instance. This can be done in my.cnf or on the mysqld command line. It is a read-only variable while the engine is running since it must be set before InnoDB starts up and creates a new system tablespace. That happens when InnoDB does not find ibdata1 in the data directory. If your system tablespace already exists using one page size and innodb-page-size is something else, the engine will not start.
A few bugs were found and fixed related to smaller page sizes in InnoDB tablespaces. Those bug fixes can also be found in the 2011 summer labs release.
The mysql-test-run can be run like this to test a smaller page size;
perl mysql-test-run.pl –mysqld=–innodb-page-size=4k –suite=innodb –force
Other features of InnoDB work the same with smaller page sizes but some limits are affected. While each of the 4 row formats; Redundant, Compact, Compressed and Dynamic are supported, the maximum record size is less. Record length limits are calculated within InnoDB based on a bunch of factors including row type, column type and length, number of columns, secondary indexes, index prefix lengths, and of course, the page size. Since the main record is stored in a clustered index, a minimum of two records must fit in each page. So the maximum record length for 8k pages is about half that of 16k pages and the max record size with 4k pages is half that of 8k pages.
If you have a database schema with any large records, you may not be able to use smaller page sizes. Even if your records do barely fit in the clustered index page, it may not be advisable to use these smaller pages because the btree will be a lot deeper. For example, if only 2 records fit on the page and there are 1,000,000 records, leaf pages are 20 levels down meaning InnoDB will need to read and search 20 pages to find the leaf page. If that were on 4k pages, then using the same table on 16k pages would give 8 records per page and the leaf pages would only be 7 levels down.
There is a trick to reducing the size of records on the clustered index page. VARCHAR fields can be converted to TEXT fields. (And VARBINARY can be converted to BLOB.) InnoDB tries to put as much of VARCHAR fields in the clustered record as it can. Normally, it will put as much as 767 bytes which is the index prefix limit on long fields. But this labs release also contains a feature to increase that limit to 3072 bytes. So if you have VARCHAR fields that use hundreds or thousands of bytes, they can make the clustered record very long. TEXT columns, like BLOB columns, are stored in BLOB pages. Only 20 bytes is needed in the record to link to that data. Long VARCHAR fields are split at 767 bytes (or 3072 bytes if innodb_large_prefix=ON) and that 20 bytes is added to the clustered record to link to the rest of those fields. If your VARCHAR(5000) field rarely ever uses more than 767 bytes, it will rarely be split up. Still, you might get better performance by reducing the depth of the btree if you define long columns as TEXT instead of VARCHAR.
A file extent in InnoDB is 1 Mb independent of the page size. So an extent will hold 64 16k pages, 128 8k pages and 256 4k pages. This means that the read ahead mechanisms will read more pages with smaller page sizes since they read a whole extent at a time. The doublewrite buffer, which is based on the size of an extent, will also contain more pages.
If you want to use smaller page sizes with existing data, export the data first with a logical export utility such as mysqldump. Then create the new mysql instance with innodb-page-size=4k or 8k and import the data. Do not use a physical export method such as alter table … discard tablespace.
This feature makes it easier to try smaller page sizes in an InnoDB database. Just export your data, move or delete the system database (ibdata1) and the log files (ib_logfile0 & ib_logfile1), set innodb-page-size to either 4k or 8k, and restart MySQL. A new InnoDB instance will be created with the smaller page size. Then you can import your data and run your tests, all without recompiling InnoDB.