MySQL HeatWave Best Practices Series: Schema Design

December 11, 2022 | 11 minute read
Mandy Pang
Senior Principal Product Manager
Text Size 100%:

MySQL HeatWave Best Practices Series: Schema Design

 

MySQL HeatWave is designed to transparently accelerates MySQL performance by orders of magnitude for analytics and mixed workloads without the need of ETL process. Existing MySQL applications can take advantage of the improved performance without any changes.

MySQL HeatWave uses MySQL Autopilot, a machine learning based automation, to intelligently improve the system performance and scalability in provisioning, data loading, query execution and failure handling.

MySQL HeatWave

 

This is first in a series of blogs that details best practices that would help you get the most out of MySQL HeatWave.

  1. Schema Design
  2. Data Load with MySQL Autopilot
  3. Query Processing with MySQL Autopilot
  4. Monitoring

 

Best Practices for Schema Design

1. Use normalized tables

Many of the existing MySQL customers tend to de-normalize their schemas by pre-joining the normalized table results into a universal table with a large number of columns. This is done recognizing MySQL performance limitations for joins involving large tables.

This is no longer needed as MySQL HeatWave has efficient join processing. Customers can revert to standard normalized tables.

For example, consider the case where user creates a denormalized table as join result of part and partsupp as shown below for 10GB TPC-H database:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
CREATE TABLE part_den (
    PRIMARY KEY (p_partkey , ps_suppkey)
) AS SELECT p.*, ps.* FROM
    part p,
    partsupp ps
WHERE
    p.p_partkey = ps.ps_partkey;

The biggest disadvantage is the additional storage overhead incurred to hold the denormalized table. In this example, by checking the sizes we see:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
SELECT
    TABLE_NAME AS `Table`,
    ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM
    information_schema.TABLES
WHERE
    TABLE_SCHEMA = 'tpch'
        AND TABLE_NAME IN (‘part - den’ , ‘part’, ‘partsupp’);
Table Size (MB)
part_den 2453
partsupp 1609
part 317

 

The total storage required for part and partsupp is 1609 + 317 = 1926 MB as opposed to 2453 MB for denormalized table. Since MySQL Heatwave loads the entire table into memory, it is extremely important to reduce the storage costs.

Normalized tables vs denormalized tables

With regards to query performance, theoretically joins can perform faster against denormalized tables but MySQL HeatWave’s efficient join processing makes join ‘on the fly’ query competitive (0.19s vs 0.16s for example in query below):

Same Query using denormalized table takes 0.16 sec

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
SELECT
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
FROM
    <strong>part_den</strong> pd1,
    supplier,
    nation,
    region
WHERE
    s_suppkey = ps_suppkey AND p_size = 15
        AND p_type LIKE '%BRASS'
        AND s_nationkey = n_nationkey
        AND n_regionkey = r_regionkey
        AND r_name = 'EUROPE'
        AND ps_supplycost = (SELECT
            MIN(ps_supplycost)
        FROM
            <strong>part_den</strong> pd2,
            supplier,
            nation,
            region
        WHERE
            pd1.p_partkey = pd2.ps_partkey
                AND s_suppkey = ps_suppkey
                AND s_nationkey = n_nationkey
                AND n_regionkey = r_regionkey
                AND r_name = 'EUROPE')
ORDER BY s_acctbal DESC , n_name , s_name , p_partkey
LIMIT 100;

Same Query using normalized tables takes 0.19 sec:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
SELECT
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
FROM
    <strong>part</strong>,
    supplier,
    <strong>partsupp</strong>,
    nation,
    region
WHERE
    p_partkey = ps_partkey
        AND s_suppkey = ps_suppkey
        AND p_size = 15
        AND p_type LIKE '%BRASS'
        AND s_nationkey = n_nationkey
        AND n_regionkey = r_regionkey
        AND r_name = 'EUROPE'
        AND ps_supplycost = (SELECT
            MIN(ps_supplycost)
        FROM
            <strong>partsupp</strong>,
            supplier,
            nation,
            region
        WHERE
            p_partkey = ps_partkey
                AND s_suppkey = ps_suppkey
                AND s_nationkey = n_nationkey
                AND n_regionkey = r_regionkey
                AND r_name = 'EUROPE')
ORDER BY s_acctbal DESC , n_name , s_name , p_partkey
LIMIT 100;

2. Use VARCHAR columns with appropriate sizes

Many customers tend to use a fixed size such as VARCHAR(500) uniformly for all their string columns even though many of their columns are much smaller in size. Instead, you should pick sizes appropriately.

3. Use columns of specific data type

Many customers tend to use VARCHAR even to hold more specific data such as decimals. MySQL allows users to aggregate over a string column to find sum as shown below:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
SELECT SUM(amt) FROM mysales; -- amt is defined as VARCHAR(64)

Such queries will run more efficiently if a decimal column is used and in addition, it will allow the query to be offloaed to MySQL HeatWave.

Without using the specific data type, the query cannot offload to HeatWave because aggregate function SUM on VARCHAR columns is currently not supported in HeatWave.

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
EXPLAIN SELECT SUM(amt) FROM mysales; -- amt is defined as VARCHAR(64)

<span style="font-size:11pt"><span style="line-height:107%"><span style="font-family:"Calibri",sans-serif"><span style="font-size:10.0pt"><span style="line-height:107%"><span style="color:black">{"Reason": "Aggregation functions other than COUNT, MIN, MAX, FIRST_VALUE, LAST_VALUE, NTH_VALUE do not support VARCHAR and BIT datatype inputs yet"}</span></span></span></span></span></span>

If the data type cannot be changed, then CAST function needs to be used for offloading such a query as shown below:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
EXPLAIN SELECT SUM(CAST(amt AS UNSIGNED INT)) FROM mysales;

+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | mysales | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using secondary engine RAPID |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

4. Use generated columns in tables to handle complex expressions in queries that are not yet supported in MySQL HeatWave

There could be cases where certain functions are not yet supported in MySQL HeatWave. If there are a few such references to functions or expressions involving them in queries, then they can be precomputed using GENERATED column option in table and queries can run by referencing these generated columns.

For example:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
CREATE TABLE cust_complaints(
    id INT PRIMARY KEY,
    severity_flag VARCHAR(8),
    problem VARCHAR(32),
    c_comments JSON NOT SECONDARY) SECONDARY_ENGINE=RAPID;


INSERT INTO cust_complaints VALUES
    (1, 'HIGH', 'reload fails', '{"comment":"small text small text small text"}'),
    (2, 'HIGH', 'reload fails', '{"comment":"small text small text small text"}'),
    (3, 'MEDIUM', 'takes long time', '{"comment":"long text long text long text long text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long text"}'),
    (4, 'LOW', 'incorrect trace', '{"comment":"small text small text small text"}');

Since JSON data type columns are not supported, c_comments is marked as NOT secondary. Now user cannot execute any query involving c_comments column in MySQL HeatWave.

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
ALTER TABLE cust_complaints SECONDARY_LOAD;

EXPLAIN SELECT id, severity_flag, problem FROM cust_complaints WHERE severity_flag = ('HIGH') OR LENGTH(c_comments)>128;

+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table           | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | cust_complaints | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |   100.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------+

The query does not offload as length function is on a NOT SECONDARY column (which is not loaded to HeatWave). User can get around this problem by creating a GENERATED column as shown below.

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
ALTER TABLE cust_complaints SECONDARY_ENGINE=NULL;
ALTER TABLE cust_complaints ADD (comm_len INT GENERATED ALWAYS AS (LENGTH(c_comments)) STORED);
ALTER TABLE cust_complaints SECONDARY_ENGINE=RAPID;
ALTER TABLE cust_complaints SECONDARY_LOAD;

Now query referencing the GENERATED column gets the desired results and offloads to MySQL HeatWave as shown below.

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
EXPLAIN SELECT id, severity_flag, problem FROM cust_complaints WHERE severity_flag = ('HIGH') OR comm_len>128;

+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------+
| id | select_type | table           | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                     |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------+
|  1 | SIMPLE      | cust_complaints | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    50.00 | Using where; Using secondary engine RAPID |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------+

5. For tables with no primary key columns, choose primary key appropriately

For MySQL HeatWave processing, all tables must have a primary key otherwise as shown below the table load into MySQL HeatWave fails. 

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
CREATE TABLE emp(id INT, ename VARCHAR(32))SECONDARY_ENGINE=RAPID;

ALTER TABLE emp SECONDARY_LOAD;

ERROR 3877 (HY000): Tables loaded in TRANSACTIONAL load pool must have a primary key

If you have tables with no primary key, a simple option is to introduce an auto_increment column as shown below:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
ALTER TABLE emp SECONDARY_ENGINE=NULL;
ALTER TABLE emp add (eid_auto INT AUTO_INCREMENT PRIMARY KEY);
ALTER TABLE emp SECONDARY_ENGINE=RAPID;
ALTER TABLE emp SECONDARY_LOAD;

However, before adding a new column, you should consider if any of the existing column or columns can be designated as a primary key.

6. Adhere to the default SQL Mode of MySQL 8.0

If we check the default SQL mode, we will see as follows:

Copied to Clipboard
Error: Could not Copy
Copied to Clipboard
Error: Could not Copy
<span style="font-size:11.0pt"><span style="line-height:107%"><span style="font-family:"Calibri",sans-serif">SELECT @@sql_mode;</span></span></span>

+-----------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                            
+-----------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,<strong>NO_ZERO_IN_DATE,NO_ZERO_DATE</strong>,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
+-----------------------------------------------------------------------------------------------+

Maintain these properties to get smoother migration and maximum offload with MySQL HeatWave. This including not having default 0 for date columns in your tables. Click here for more details on MySQL SQL Modes.

 

In summary

  • Use normalized tables
  • Use VARCHAR columns with appropriate sizes
  • Use columns of specific data type
  • Use generated columns in tables to handle complex expressions in queries that are not yet supported in MySQL HeatWave
  • For tables with no primary key columns, choose primary key appropriately
  • Adhere to the default SQL Mode of MySQL 8.

 

Please stay tune for the next few topics of the MySQL HeatWave best practices on data load, query performance, and monitoring.

Addition Resources

 

Mandy Pang

Senior Principal Product Manager

MySQL HeatWave Product Manager


Previous Post

Analyzing queries in OCI MySQL Database Service – Slow Query Log (part 1)

Frederic Descamps | 7 min read

Next Post


Analyzing queries in OCI MySQL Database Service – Slow Query Log (part 2)

Frederic Descamps | 6 min read
Oracle Chatbot
Disconnected