Occasionally, a database schema design calls for a table with foreign keys to be partitioned. However, this will not work with HeatWave because MySQL does not support partitioning tables that have foreign keys. A quick clarification before proceeding any further: HeatWave is a fully-managed MySQL database service provided by Oracle Cloud. In this article, I propose a work-around to accomplish this in a way that is nearly transparent to the application. The limitations of the work-around are noted in the article. This article also provides a cursory introduction to MySQL table partitioning for readers who are not familiar with this feature. For those who would like to skip immediately to the work-around, please scroll down to the section “The Work-Around”.
MySQL Table Partitioning
Table partitioning is a feature in MySQL whereby data in a table is stored as multiple chunks, called partitions, in the underlying storage system as opposed to as a single big chunk of data. MySQL stores data in files called tablespaces. By default, MySQL stores a table’s complete dataset into a single tablespace, with one tablespace for each table. When implementing table partitioning, MySQL distributes the partitioned table’s data into multiple smaller tablespaces, with one tablespace file for each partition. In other words, MySQL partitioned tables are stored into multiple smaller files instead of one big file.
MySQL offers multiple strategies for how to partition tables. One strategy is to partition data by ranges of values. A simple example of a range strategy would be to partition data with a rule for the data before year 2000 into one partition, and another rule for the data from 2000 and on into another partition. Another strategy is to partition tables based on specific values. For example, a table could be partitioned by country code, having one partition rule per country code. One more strategy is to partition tables into a fixed number of partitions and hash the contents of one or more columns or keys, or hash the result of an SQL expression, for each row, combined with a modulo operation on the resulting hash value, to determine the partition that should contain the row. An overview of table partitioning can be found in the documentation here: https://dev.mysql.com/doc/refman/8.0/en/partitioning-overview.html
When seeking data with a DML command, MySQL attempts to restrict its work to only those partitions that are known to contain the data targeted by analyzing the WHERE clause of the command and matching to it those partitions that would contain the data based on the partitioning rules. In the example of a table partitioned by country code, the MySQL optimizer would know to only seek data in a specific partition if the WHERE clause restricts the possible query dataset to the specific country code associated with that partition. This implicit filtering of partitions is called partition pruning. DML commands, as well as many DDL commands, also provide special SQL syntax to explicitly target specific partitions. This is called partition selection. When MySQL is unable to prune the partitions and the partitions are not selected, then it will scan every partition of the table for a query.
Here is an example of partition pruning. MySQL will only consider the partitions that contain data for the years before 2000, assuming the table is partitioned by year.
SELECT * FROM `orders` WHERE `year` < 2000;
Here is an example of partition selection where the command explicitly indicates the specific partition to query by using the PARTITION clause:
SELECT * FROM `employees` PARTITION(`p1`);
The documentation for partition pruning can be found here: https://dev.mysql.com/doc/refman/8.0/en/partitioning-pruning.html, and the documentation for partition selection here: https://dev.mysql.com/doc/refman/8.0/en/partitioning-selection.html
Please note that while table partitioning may intuitively appear to be a good idea to increase database performance, it usually does not help performance much or at all. If improving database performance is the objective, re-evaluating whether table partitioning is the best solution for this purpose could be the first step towards implementing the correct solution. Here are some benefits and drawbacks related to table partitioning.
Possible Benefits of Table Partitioning
There may indeed be some limited performance gains with MySQL partitioning owing to updating smaller B(+) trees for the data and indexes potentially being a bit faster compared to updating single larger indexes. InnoDB fragmentation may also be better contained in smaller partitions, depending on the situation. Data partitioning shines at eliminating ranges of data that match the partition rules as the DROP and TRUNCATE commands can be used to eliminate all the data of these partitions at near immediate speed. This technique is frequently used when implementing data retention policies by simply automating the dropping of those partitions that fall off the data retention policy. Another article will cover this technique in more details.
There are database management commands that allow to operate on explicitly selected partitions. One management operation useful with partitioning is the rebuilding of partitions for the purpose of optimization and defragmentation of those partitions that are no longer being updated by the application. For example, a table partition that contains data from previous years that may seldomly if ever be updated could benefit from a rebuild operation to reclaim some storage space. This is the command to rebuild a table partition:
ALTER TABLE <table name> REBUILD PARTITION <partition name>;
A useful trick with MySQL partitioning is the exporting and importing of partition tablespaces from one database instance to another as a fast way to transfer data between instances. Another command exists to exchange a partition of a partitioned table with the dataset (or tablespace) of a non-partitioned table. This can be used as a method to archive the data of a non-partitioned table by moving it to a partitioned table. This feature is documented here: https://dev.mysql.com/doc/refman/8.4/en/partitioning-management-exchange.html.
ALTER TABLE <partitioned table> EXCHANGE PARTITION <partition name> WITH TABLE <non-partitioned table>;
Some Drawbacks of Table Partitioning
Most drawbacks to MySQL partitioning come from the restrictions and limitations that appertain to it. Here are a few that could be concerning:
- SQL Expressions used to determine which partition a row should reside in are limited to specific SQL functions and exclude user-created functions. This limitation could make it impossible to implement the desired partitioning scheme.
- MySQL provides no automation facilities for partition creation and deletion, although some of this can be implemented as custom tasks with the MySQL Scheduler.
- Partitioned tables may not serve as parent tables of other tables with foreign keys.
- Foreign keys are not allowed in partitioned tables, ie. the topic of this article.
The Work-Around
The work-around for implementing table partitioning with a foreign key is non-comprehensive and has some strict limitations. It consists of an intermediate table with an insert trigger as the mechanism to insert into the partitioned table. The limitations may make it unsuitable for some use-cases. They are:
- The foreign keys clauses ON UPDATE and ON DELETE are restricted.
- Insert statements must strictly be performed on the intermediate table, not directly on the partitioned table.
- This can be made transparent to the application using the MySQL Query Rewriter, if available.
- Update statements that would by-pass the foreign keys in the intermediate table are prohibited.
- The intermediate table must occasionally be truncated to quickly remove its accumulated data.
Database Schema Implementation
The work-around is implemented by creating an intermediate table which has the foreign key, as well as a trigger that is used to capture the incoming rows and insert them into the destination partitioned table, which does not have the foreign key. The directing idea of this work-around is that foreign keys are used to ensure the integrity of incoming data and of certain data changes, but they are not used for select queries. This enables us to separate the integrity validation from the partitioned table and perform the validation at an earlier step in the process with another table. The work-around as described only covers validating the foreign key contraints for insert statements. It would be possible to extend the work-around with additional triggers to comprehensively cover update and delete statements, but these would require quite a bit of work to be robust.
The work-around is described in details by walking through an example. The example features two primary tables in a parent-child relationship: customers and orders. The child table orders is partitioned by year and has, or rather should have, a foreign key referencing customer_id in the parent table customers. The foreign key will be declared in the intermediate table orders_ingest, described later.
Note that the table orders is partitioned by year with a primary key that combines both the date of the order and the ID of the order. Be sure to carefully consult the MySQL documentation on how to craft partition keys as there exist some requirements and some limitations. A secondary index is created on customer_id to optimize read queries that search by the customer ID. On a side note, secondary indexes on MySQL partitioned tables are local to each partition and not global across all partitions. This means that the MySQL optimizer may choose to use the customer_id index found locally in each partition that has been identified for the query, but there will not be a global customer_id index that can be used to perform the partition pruning at execution time.
CREATE TABLE `test`.`customers`
(
`customer_id` INT UNSIGNED PRIMARY KEY,
`name` VARCHAR(100) NOT NULL
) ENGINE=InnoDB;
CREATE TABLE `test`.`orders`
(
`order_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`customer_id` INT UNSIGNED NOT NULL,
`date` DATE,
`item` VARCHAR(100),
`count` INT UNSIGNED NOT NULL,
PRIMARY KEY(`date`,`order_id`),
KEY(`order_id`),
KEY(`customer_id`)
)
ENGINE=InnoDB
PARTITION BY RANGE(YEAR(`date`))
(
PARTITION `p1` VALUES LESS THAN (2015),
PARTITION `p2` VALUES LESS THAN (2020),
PARTITION `p3` VALUES LESS THAN (2025)
);
The work-around implements an intermediate orders table called orders_ingest, seen below, with the foreign key to validate the customer_id values and a trigger to insert the now validated rows into the table orders. The choice of the ON UPDATE / ON DELETE foreign key clauses in the example is made to prevent problematic behaviors. The trigger is declared with the keyword AFTER to ensure that the trigger only inserts the data after the integrity of the data has been validated with the foreign key. The trigger causes the data to be inserted into both the partitioned table and the intermediate (non-partitioned) table. It will therefore be necessary to occasionnally purge the intermediate table.
CREATE TABLE `test`.`orders_ingest`
(
`pk` INT UNSIGNED NOT NULL AUTO_INCREMENT INVISIBLE,
`customer_id` INT UNSIGNED,
`date` DATE,
`item` VARCHAR(100),
`count` INT UNSIGNED NOT NULL,
PRIMARY KEY(`pk`),
FOREIGN KEY (`customer_id`) REFERENCES `test`.`customers`(`customer_id`) ON DELETE SET NULL ON UPDATE NO ACTION
) ENGINE=InnoDB;
CREATE TRIGGER `test`.`orders_insert` AFTER INSERT ON `test`.`orders_ingest` FOR EACH ROW
INSERT INTO `test`.`orders`
(
`customer_id`,
`date`,
`item`,
`count`
)
VALUES
(
NEW.`customer_id`,
NEW.`date`,
NEW.`item`,
NEW.`count`
);
The table orders_ingest features a primary key because there are often requirements with MySQL that tables should have primary keys (ex. HeatWave, MySQL InnoDB Cluster, etc.). And using primary keys is a general good practice with MySQL.
Purging the Intermediate Data
The example uses the storage engine InnoDB for the table orders_ingest because, being fully ACID-compliant, it is the premier storage engine for MySQL, it is the only engine supported by HeatWave, and it implements foreign keys (unlike, for example, Blackhole), which is what we are after here. Unfortunately, the trigger will cause the rows to also be inserted into the intermediate table, which is almost certainly not the desired outcome. The rows which are thus persisted in the intermediate table should be purged at intervals.
MySQL features a programmable job scheduler that can be made to do this. A MySQL Scheduler event that runs at 5 minute intervals to truncate the table orders_ingest should to the job.
CREATE EVENT `test`.`purge_orders_ingest` ON SCHEDULE EVERY 5 MINUTE DO
TRUNCATE `test`.`orders_ingest`;
Configuring Access Control
The next step is to control access to the table orders to prevent the application user, called app@% in the example, from directly inserting into the table orders as this would by-pass the foreign key in the table orders_ingest and potentially compromise the integrity of the data. Application user grants could look like this:
GRANT SELECT, INSERT, UPDATE, DELETE ON `test`.`customers` TO `app`@`%`; GRANT SELECT, UPDATE (`date`, `item`, `count`), DELETE ON `test`.`orders` TO `app`@`%`; GRANT INSERT ON `test`.`orders_ingest` TO `app`@`%`;
As seen in the grants above, the application user is still able to update data in the table orders, just not the customer_id, because the table does not have a foreign key to protect the integrity.
The MySQL Query Rewriter
All the necessary mechanics of the work-around are now in place. All that is left is to make the work-around as transparent as possible to the application. This means modifying the SQL statements at the server level to insert data into the table orders_ingest instead of into the table orders. Fortunately, MySQL has a feature to help do this: The Query Rewriter. A rewriter rule can be created to modify incoming INSERT statements for the table orders. The documentation for the MySQL Query Rewriter can be found here: https://dev.mysql.com/doc/refman/8.4/en/rewriter-query-rewrite-plugin.html. Alas, the Query Rewriter is not available for HeatWave at this time, it only is for the MySQL on-premises distributions.
The MySQL Query Rewriter compares incoming statements with patterns indicated in user-defined rewriter rules. The statements are modified according to the modification indicated by the rules that match them. The performance overhead of the Query Rewriter is very small because it uses statement digests and digest hash values to match incoming statements against the patterns of the rules, which are also digested. Note that these rules are global, not per-user. However, a database user can be configured to not be subjected to the Query Rewriter by granting it the privilege SKIP_QUERY_REWRITE.
The command to install the Query Rewriter is as follows. Command specifics may vary depending on the MySQL installation.
mysql -uroot -p < /usr/share/mysql/install_rewriter.sql
Once installed, the Query Rewriter must be enabled:
SET GLOBAL rewriter_enabled = ON;
Here is a good Query Rewriter rule we can use for the example:
INSERT INTO `query_rewrite`.`rewrite_rules`(`pattern`, `replacement`) VALUES
(
'INSERT INTO `test`.`orders`(`customer_id`,`date`,`item`,`count`) VALUES(?,?,?,?)',
'INSERT INTO `test`.`orders_ingest`(`customer_id`,`date`,`item`,`count`) VALUES(?,?,?,?)'
);
Finally, the new rule(s) must be flushed for the query digests to be created and the rules to be enabled:
CALL query_rewrite.flush_rewrite_rules();
Testing
Now that the work-around is fully implemented, it can be tested. Here are a few commands to test it using the user app@%:
INSERT IGNORE INTO `test`.`customers` VALUES(1,'customer 1'); INSERT IGNORE INTO `test`.`customers` VALUES(2,'customer 2'); INSERT IGNORE INTO `test`.`customers` VALUES(3,'customer 3'); INSERT INTO `test`.`orders`(`customer_id`,`date`,`item`,`count`) VALUES(1,'2018-01-02','item 1',10); INSERT INTO `test`.`orders`(`customer_id`,`date`,`item`,`count`) VALUES(1,'2020-05-14','item 2',1); INSERT INTO `test`.`orders`(`customer_id`,`date`,`item`,`count`) VALUES(3,'2024-10-22','item 3',100); UPDATE `test`.`orders` SET `count` = 200 WHERE `customer_id` = 3; DELETE FROM `test`.`orders` WHERE `customer_id` = 1; SELECT * FROM `test`.`orders`;
The commands above should all work without errors. Note how the INSERT commands inserted into the table orders. This is because the MySQL Query Rewriter modifies these commands to use the table orders_ingest and not the table orders. If the MySQL Query Rewriter cannot be used, the commands should instead explicitely insert into the table orders_ingest.
Other commands that are definitely expected to fail:
INSERT INTO `test`.`orders`(`customer_id`,`date`,`item`,`count`) VALUES(4,'2014-11-07','item 3',1000); UPDATE `test`.`orders` SET `customer_id` = 1000 WHERE `order_id` = 3; SELECT * FROM `test`.`orders_ingest`;
Finally, it should be confirmed that the intermediate table orders_ingest has been purged by the MySQL Scheduler Event:
mysql> select * from orders_ingest; Empty set (0.01 sec)
Conclusion
Phew! When setting out to write an article on MySQL partitioning, I did not expect to cover this much ground with it. In this article, we (briefly) saw:
- MySQL Table Partitioning
- Foreign Keys
- Triggers
- MySQL Schedule Events
- MySQL Query Rewriter (unavailable in HeatWave)
When something seems impossible with MySQL, there is usually a way if we just think outside of the box. I hope that this work-around will help users of MySQL who need to partition tables with foreign keys. Please let me know of any errors or glaring omissions. This article will surely see a few revisions.
