MySQL Enterprise Data Masking and De-Identification components were introduced in MySQL v. 8.0.33 and are available in Enterprise Edition and cloud version. Among other functionality, they provide functions that randomly select terms from user defined masking dictionaries. They are stored in a dedicated table and memory cache. So far, the schema of the table was fixed and consistency between the table and the cache was achieved by restricting modifications of masking dictionaries to special administrative functions. Since v. 8.3 the schema may be arbitrary selected, and content of modified dictionaries table may be flushed to memory either on demand or periodically.
Motivation
MySQL offers different methods and formats of replication. For example, Row Based Replication format operates on table rows, so functions called on master will not be called on slaves. When some data set (like masking dictionaries) must be cached for proper work, it must be flushed to the memory after the replication. So, there is a need for flushing mechanism.
In order to manipulate the masking dictionaries, MASKING_DICTIONARIES_ADMIN privilege is needed. Though, this privilege is not sufficient to directly modify or even read the dictionaries table, especially when it is in mysql system schema, that should be particularly secured. Consequently, this privilege is also not sufficient to do logical backups (i.e. using mysqldump tool). Locating the dictionaries in an arbitrary chosen schema allows for decoupling access to a system schema from managing and replicating masking dictionaries.
Upgrade From Legacy Data Masking Plugin
If the legacy plugin was in use, it must be uninstalled prior to installing the components and all its functions must be dropped:
UNINSTALL PLUGIN data_masking; DROP FUNCTION gen_blocklist; DROP FUNCTION gen_dictionary; DROP FUNCTION gen_dictionary_drop; DROP FUNCTION gen_dictionary_load; DROP FUNCTION gen_range; DROP FUNCTION gen_rnd_email; DROP FUNCTION gen_rnd_pan; DROP FUNCTION gen_rnd_ssn; DROP FUNCTION gen_rnd_us_phone; DROP FUNCTION mask_inner; DROP FUNCTION mask_outer; DROP FUNCTION mask_pan; DROP FUNCTION mask_pan_relaxed; DROP FUNCTION mask_ssn;
Then one may continue with installing Data Masking and De-Identification components described in the next section.
If there are some masking dictionaries stored for the legacy plugin in text files, it is possible to migrate them to the table used by the components ad described in the penultimate section.
Data Masking and De-Identification Components Installation with Custom Schema
By default, the masking components expect that masking_dictionaries table (used for storing masking dictionaries) is placed in mysql system schema. In order to use a dedicated schema from the very beginning (i.e. first launch of masking feature) follow the procedure provided below.
Create the schema (assuming it does not exist) named of your choice (say mask_db) and masking_dictionaries table storage in it:
CREATE DATABASE IF NOT EXISTS mask_db;
CREATE TABLE IF NOT EXISTS
mysql.masking_dictionaries(
Dictionary VARCHAR(256) NOT NULL,
Term VARCHAR(256) NOT NULL,
UNIQUE INDEX dictionary_term_idx (Dictionary, Term),
INDEX dictionary_idx (Dictionary)
) ENGINE = InnoDB DEFAULT CHARSET=utf8mb4;
Install the components:
INSTALL COMPONENT ‘file://component_masking_functions’;
Set appropriate value to component_masking.masking_database read-only system variable: using SET PERSIST_ONLY statement, defaults file, or pass –component_masking.masking_database system parameter to mysqld. E.g.:
Finally restart MySQL server for the read-only variable change take effect.
Migrating to Custom Schema
Dedicated schema for masking_dictionaries might be applied at some point in time, while Data Masking and De-Identification components being in use already and masking_dictionaries table placed in the default schema mysql. In that case steps described below are needed.
Create the target schema (say the chosen name is mask_db), masking_dictionaries table in that schema (following the structure of the original table) and copy the content:
CREATE TABLE mask_db.masking_dictionaries LIKE mysql.masking_dictionaries;
INSERT INTO mask_db.masking_dictionaries SELECT * FROM mysql.masking_dictionaries;
You may want to drop the original table:
Set appropriate value to component_masking.masking_database read-only system variable: using SET PERSIST_ONLY statement, defaults file, or pass –component_masking.masking_database system parameter to mysqld. E.g.:
Finally restart MySQL server for the read-only variable change take effect.
Flushing Data Masking Dictionaries
Flushing data masking dictionaries from table to the memory cache may be done on demand or periodically.
Imagine there is some source containing items suitable to mask confidential data. One may import it directly to a masking dictionaries table using SQL queries. E.g.
In order to make the new dictionary terms usable for gen_dictionary() or gen_blocklist() functions, the new dictionary terms must be flushed which may be done by calling:
(This function requires the MASKING_DICTIONARIES_ADMIN privilege.)
Another use case may be replication. Then, a periodical flush should be configured on a slave node. First, the scheduler component must be installed if not done previously:
Second, the length of the interval must be set, e.g.:
will cause flushing every minute. In order to make this setting persistent, the variable must be set in options file, SET PERSISTENT or passed as an server option.
Migration from Legacy Text File Based Dictionaries to Table
The below procedure may be used to move masking dictionaries stored in files (used by the legacy Data Masking plugin). Suppose there are two such files: de_cities.txt and gb_cities.txt and masking dictionaries schema is mask_db.
Load the files to the table:
LOAD DATA INFILE ‘gb_cities.txt’ INTO TABLE masking_dictionaries (Term) SET Dictionary = ‘gb_cities’;
Note: LOAD DATA INFILE requires secure_file_priv variable set and FILE privilege.You may also need to add clause “LINES TERMINATED BY ‘\r\n'” when working on Windows. For security reasons it is recommended to restrict loading data from files as much as possible. As the legacy Data Masking plugin used this feature you may consider increase the level of restrictions after the plugin is uninstalled.
Flush the dictionaries:
Additional Information
Explore the new MySQL Enterprise Data Masking and De-Identification Features
MySQL Enterprise Data Masking and De-Identification Components
MySQL Enterprise Data Masking and De-Identification Component Installation

