MySQL HeatWave is designed to transparently accelerates MySQL performance by orders of magnitude for analytics and mixed workloads without the need of ETL process. Existing MySQL applications can take advantage of the improved performance without any changes.
MySQL HeatWave uses MySQL Autopilot, a machine learning based automation, to intelligently improve the system performance and scalability in provisioning, data loading, query execution and failure handling.
This is first in a series of blogs that details best practices that would help you get the most out of MySQL HeatWave.
Many of the existing MySQL customers tend to de-normalize their schemas by pre-joining the normalized table results into a universal table with a large number of columns. This is done recognizing MySQL performance limitations for joins involving large tables.
This is no longer needed as MySQL HeatWave has efficient join processing. Customers can revert to standard normalized tables.
For example, consider the case where user creates a denormalized table as join result of part and partsupp as shown below for 10GB TPC-H database:
CREATE TABLE part_den (
PRIMARY KEY (p_partkey , ps_suppkey)
) AS SELECT p.*, ps.* FROM
part p,
partsupp ps
WHERE
p.p_partkey = ps.ps_partkey;
The biggest disadvantage is the additional storage overhead incurred to hold the denormalized table. In this example, by checking the sizes we see:
SELECT
TABLE_NAME AS `Table`,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM
information_schema.TABLES
WHERE
TABLE_SCHEMA = 'tpch'
AND TABLE_NAME IN (‘part - den’ , ‘part’, ‘partsupp’);
Table | Size (MB) |
---|---|
part_den | 2453 |
partsupp | 1609 |
part | 317 |
The total storage required for part and partsupp is 1609 + 317 = 1926 MB as opposed to 2453 MB for denormalized table. Since MySQL Heatwave loads the entire table into memory, it is extremely important to reduce the storage costs.
With regards to query performance, theoretically joins can perform faster against denormalized tables but MySQL HeatWave’s efficient join processing makes join ‘on the fly’ query competitive (0.19s vs 0.16s for example in query below):
Same Query using denormalized table takes 0.16 sec
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM
<strong>part_den</strong> pd1,
supplier,
nation,
region
WHERE
s_suppkey = ps_suppkey AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (SELECT
MIN(ps_supplycost)
FROM
<strong>part_den</strong> pd2,
supplier,
nation,
region
WHERE
pd1.p_partkey = pd2.ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE')
ORDER BY s_acctbal DESC , n_name , s_name , p_partkey
LIMIT 100;
Same Query using normalized tables takes 0.19 sec:
SELECT
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
FROM
<strong>part</strong>,
supplier,
<strong>partsupp</strong>,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND p_size = 15
AND p_type LIKE '%BRASS'
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE'
AND ps_supplycost = (SELECT
MIN(ps_supplycost)
FROM
<strong>partsupp</strong>,
supplier,
nation,
region
WHERE
p_partkey = ps_partkey
AND s_suppkey = ps_suppkey
AND s_nationkey = n_nationkey
AND n_regionkey = r_regionkey
AND r_name = 'EUROPE')
ORDER BY s_acctbal DESC , n_name , s_name , p_partkey
LIMIT 100;
Many customers tend to use a fixed size such as VARCHAR(500) uniformly for all their string columns even though many of their columns are much smaller in size. Instead, you should pick sizes appropriately.
Many customers tend to use VARCHAR even to hold more specific data such as decimals. MySQL allows users to aggregate over a string column to find sum as shown below:
SELECT SUM(amt) FROM mysales; -- amt is defined as VARCHAR(64)
Such queries will run more efficiently if a decimal column is used and in addition, it will allow the query to be offloaed to MySQL HeatWave.
Without using the specific data type, the query cannot offload to HeatWave because aggregate function SUM on VARCHAR columns is currently not supported in HeatWave.
EXPLAIN SELECT SUM(amt) FROM mysales; -- amt is defined as VARCHAR(64)
<span style="font-size:11pt"><span style="line-height:107%"><span style="font-family:"Calibri",sans-serif"><span style="font-size:10.0pt"><span style="line-height:107%"><span style="color:black">{"Reason": "Aggregation functions other than COUNT, MIN, MAX, FIRST_VALUE, LAST_VALUE, NTH_VALUE do not support VARCHAR and BIT datatype inputs yet"}</span></span></span></span></span></span>
If the data type cannot be changed, then CAST function needs to be used for offloading such a query as shown below:
EXPLAIN SELECT SUM(CAST(amt AS UNSIGNED INT)) FROM mysales;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| 1 | SIMPLE | mysales | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using secondary engine RAPID |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
There could be cases where certain functions are not yet supported in MySQL HeatWave. If there are a few such references to functions or expressions involving them in queries, then they can be precomputed using GENERATED column option in table and queries can run by referencing these generated columns.
For example:
CREATE TABLE cust_complaints(
id INT PRIMARY KEY,
severity_flag VARCHAR(8),
problem VARCHAR(32),
c_comments JSON NOT SECONDARY) SECONDARY_ENGINE=RAPID;
INSERT INTO cust_complaints VALUES
(1, 'HIGH', 'reload fails', '{"comment":"small text small text small text"}'),
(2, 'HIGH', 'reload fails', '{"comment":"small text small text small text"}'),
(3, 'MEDIUM', 'takes long time', '{"comment":"long text long text long text long text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long textlong text long text long text"}'),
(4, 'LOW', 'incorrect trace', '{"comment":"small text small text small text"}');
Since JSON data type columns are not supported, c_comments is marked as NOT secondary. Now user cannot execute any query involving c_comments column in MySQL HeatWave.
ALTER TABLE cust_complaints SECONDARY_LOAD;
EXPLAIN SELECT id, severity_flag, problem FROM cust_complaints WHERE severity_flag = ('HIGH') OR LENGTH(c_comments)>128;
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | cust_complaints | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------+
The query does not offload as length function is on a NOT SECONDARY column (which is not loaded to HeatWave). User can get around this problem by creating a GENERATED column as shown below.
ALTER TABLE cust_complaints SECONDARY_ENGINE=NULL;
ALTER TABLE cust_complaints ADD (comm_len INT GENERATED ALWAYS AS (LENGTH(c_comments)) STORED);
ALTER TABLE cust_complaints SECONDARY_ENGINE=RAPID;
ALTER TABLE cust_complaints SECONDARY_LOAD;
Now query referencing the GENERATED column gets the desired results and offloads to MySQL HeatWave as shown below.
EXPLAIN SELECT id, severity_flag, problem FROM cust_complaints WHERE severity_flag = ('HIGH') OR comm_len>128;
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------+
| 1 | SIMPLE | cust_complaints | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 50.00 | Using where; Using secondary engine RAPID |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------+
For MySQL HeatWave processing, all tables must have a primary key otherwise as shown below the table load into MySQL HeatWave fails.
CREATE TABLE emp(id INT, ename VARCHAR(32))SECONDARY_ENGINE=RAPID;
ALTER TABLE emp SECONDARY_LOAD;
ERROR 3877 (HY000): Tables loaded in TRANSACTIONAL load pool must have a primary key
If you have tables with no primary key, a simple option is to introduce an auto_increment column as shown below:
ALTER TABLE emp SECONDARY_ENGINE=NULL;
ALTER TABLE emp add (eid_auto INT AUTO_INCREMENT PRIMARY KEY);
ALTER TABLE emp SECONDARY_ENGINE=RAPID;
ALTER TABLE emp SECONDARY_LOAD;
However, before adding a new column, you should consider if any of the existing column or columns can be designated as a primary key.
If we check the default SQL mode, we will see as follows:
<span style="font-size:11.0pt"><span style="line-height:107%"><span style="font-family:"Calibri",sans-serif">SELECT @@sql_mode;</span></span></span>
+-----------------------------------------------------------------------------------------------+
| @@sql_mode
+-----------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,<strong>NO_ZERO_IN_DATE,NO_ZERO_DATE</strong>,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION |
+-----------------------------------------------------------------------------------------------+
Maintain these properties to get smoother migration and maximum offload with MySQL HeatWave. This including not having default 0 for date columns in your tables. Click here for more details on MySQL SQL Modes.
In summary
|
Please stay tune for the next few topics of the MySQL HeatWave best practices on data load, query performance, and monitoring.
Addition Resources
MySQL HeatWave Product Manager
Previous Post
Next Post