By Plampio-Oracle on Jul 10, 2015
The names of databases and tables within MySQL are known as identifiers. In the simplest case these identifiers are just strings of certain ASCII characters (the basic Latin letters, the digits 0-9, the dollar sign and the underscore). However, if an identifier is placed in quotes, it can contain any character of the full Unicode Basic Multilingual Plane (except U+0000). We say that a character is a special character if it is permitted in a quoted identifier but not in an unquoted identifier.
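The distinction between ordinary and special characters can be sketched in a few lines of Python. This only illustrates the definition given above; it is not the parser used by MEB or the MySQL server, and the function names are made up for this example.

```python
import re

# Characters the article lists as allowed in an unquoted identifier:
# basic Latin letters, digits 0-9, the dollar sign and the underscore.
_UNQUOTED_OK = re.compile(r"[0-9A-Za-z$_]")

def special_characters(identifier):
    """Return the characters that force the identifier to be quoted."""
    return [ch for ch in identifier if not _UNQUOTED_OK.fullmatch(ch)]

def needs_quoting(identifier):
    """True if the identifier contains at least one special character."""
    return bool(special_characters(identifier))

for name in ("abc", "new_table$1", "àbc", "déf"):
    print(name, needs_quoting(name), special_characters(name))
```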
MySQL Enterprise Backup (MEB) 3.12.1 introduces support for proper handling of table and database names with special characters. In MEB versions prior to 3.12.1 database and table names were represented as ASCII strings and the same string was used on the command line, internally within MEB and in filenames. This caused MEB to fail some operations when database/table names contained characters other than those that can appear in unquoted MySQL identifiers.
Starting with version 3.12.1, MEB represents database and table names as strings of Unicode characters (or code points) and three distinct encodings are employed: the command-line character encoding, the internal UTF-8 encoding and the filename encoding used by the MySQL server. Thus, the same string of Unicode characters appears in three different forms depending on where the string is used.
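To make "the same string in different forms" concrete, the Python sketch below encodes one table name in two possible command-line encodings, one of which is UTF-8. It is an illustration only. The filename encoding used by the MySQL server is deliberately left out because its mapping is not described here.

```python
name = "à123"

# Code points are the encoding-independent view of the name.
code_points = [f"U+{ord(ch):04X}" for ch in name]

# The same code points as bytes in two possible command-line encodings.
as_utf8 = name.encode("utf-8")      # bytes a UTF-8 terminal would send
as_latin1 = name.encode("latin-1")  # bytes a Latin-1 terminal would send

print(code_points)
print(as_utf8.hex(" "))
print(as_latin1.hex(" "))

# Decoding with the matching encoding recovers the same string either way.
assert as_utf8.decode("utf-8") == as_latin1.decode("latin-1") == name
```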
The command-line character encoding is defined by the operating system. On Linux and other Unix-like systems this is done by the command shell and on Windows by the Command Prompt. For special character support to work, MEB must know which character encoding is used on the command line. The user can tell MEB by specifying the name of the encoding with the --default-character-set option. If this option is not given, MEB tries to detect the command-line character encoding by examining the command program from which it was invoked.
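When the encoding has to be given explicitly, it helps to check what the terminal actually uses. One way to do so is sketched below in Python, assuming the interpreter is started from the same shell as mysqlbackup. The table that maps Python codec names to MySQL character set names is hand-written, partial and purely illustrative.

```python
import codecs
import locale

# Hypothetical, partial mapping from Python codec names to MySQL charset names.
PYTHON_TO_MYSQL = {
    "utf-8": "utf8",
    "iso8859-1": "latin1",
    "cp1252": "latin1",
    "iso8859-8": "hebrew",
}

def mysql_charset_for(encoding_name):
    """Translate an encoding name to a MySQL charset name, or None if unknown."""
    try:
        canonical = codecs.lookup(encoding_name).name
    except LookupError:
        return None
    return PYTHON_TO_MYSQL.get(canonical)

# The encoding the current locale (and usually the terminal) is configured for.
terminal_encoding = locale.getpreferredencoding(False)
print(terminal_encoding, "->", mysql_charset_for(terminal_encoding))
```

If the lookup returns None, the encoding is simply not in this small table and has to be matched against the server's character set list by hand.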
In the following example the database db contains three tables with names that contain special characters: àbc, à123 and déf. We back up the tables àbc and à123 using the Transportable Tablespaces feature (TTS) of the MySQL Server as follows:
$ mysqlbackup --include-tables='`db`\.`à.*`' --use-tts --backup-dir=/full-backup backup
MySQL Enterprise Backup version 3.12.1 Linux-3.2.0-80-generic-i686 [2015/06/22]
Copyright (c) 2003, 2015, Oracle and/or its affiliates. All Rights Reserved.
...
The tables àbc and à123 are included in this partial backup by specifying with the --include-tables option a regular expression that matches the fully-qualified table names `db`.`àbc` and `db`.`à123`.
In the example above, we have specified both the database and table name quoted with backticks. However, it is not necessary to provide these backticks because MEB forms four strings from every fully-qualified table name: DB.TABLE, `DB`.TABLE, DB.`TABLE` and `DB`.`TABLE`, and a regular expression is defined to match a table if it matches any of these four strings generated from the table name.
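The matching rule can be sketched in Python. MEB's regular-expression dialect and whether it anchors the pattern are not stated here, so the sketch assumes Python's re module with an unanchored search. The function names are hypothetical.

```python
import re

def candidate_names(db, table):
    """The four spellings of a fully-qualified table name described above."""
    return [
        f"{db}.{table}",
        f"`{db}`.{table}",
        f"{db}.`{table}`",
        f"`{db}`.`{table}`",
    ]

def table_matches(pattern, db, table):
    """A table matches if the pattern matches any of its four spellings."""
    rx = re.compile(pattern)
    return any(rx.search(name) for name in candidate_names(db, table))

for table in ("àbc", "à123", "déf"):
    # The same tables are selected with or without backticks in the pattern.
    print(table,
          table_matches(r"`db`\.`à.*`", "db", table),
          table_matches(r"db\.à.*", "db", table))
```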
Special characters can be specified with these four MEB options:
--include=REGEXP
--include-tables=REGEXP
--exclude-tables=REGEXP
--rename="OLD-NAME to NEW-NAME"
Regular expressions can contain any Unicode character in the Unicode Basic Multilingual Plane and they are matched against strings of Unicode characters. This means that special characters can be used in regular expressions in exactly the same way as the characters that are not special.
Each command program (or shell, in Unix terminology) treats certain characters in a special way. For example, the bash program (which is popular on Linux and other Unix-like systems) uses single and double quotes as quoting characters and assigns a very special meaning to backticks: anything enclosed in backticks is treated as a command line and evaluated. This can be problematic when the user invokes MEB from the bash program. Suppose that the user wishes to restore the table db.à123 from a backup and rename it to db.new_à123. The command line
$ mysqlbackup ... --include-tables="db\.à123" --rename="`à123` TO `new_à123`" copy-back
seems to do just that, but unfortunately when invoked from the bash shell it fails with the following error:
à123: command not found
The problem is that the table name à123 contains a special character and must therefore be quoted. However, bash still performs command substitution inside double quotes, so the string `à123` is interpreted as a request to run the command à123, which of course fails.
There are two ways to remedy this situation. In the first fix, we place the parameter of the --rename option in single quotes:
$ mysqlbackup ... --include-tables="db\.à123" --rename='`à123` TO `new_à123`' copy-back
This works because the bash shell treats backticks as normal characters when they appear in a string placed in single quotes.
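When the mysqlbackup command line is assembled by a script, the quoting can be delegated to a library. The Python sketch below uses shlex.quote from the standard library to produce a bash-safe form of the --rename argument. Only that one argument is built, so the printed string is not a complete command.

```python
import shlex

rename_spec = "`à123` TO `new_à123`"

# shlex.quote wraps the value in single quotes, so bash leaves the backticks alone.
quoted = shlex.quote(rename_spec)
print("--rename=" + quoted)

# Passing arguments as a list (for example to subprocess.run) bypasses the
# shell entirely, so no shell quoting is needed in that case.
argv_fragment = ["--rename=" + rename_spec]
print(argv_fragment)
```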
Another solution is provided by MEB. In addition to backtick quoting, MEB allows identifiers to be quoted with single quotes and double quotes. Therefore, the following command line (where we replaced backticks with single quotes) works:
$ mysqlbackup ... --include-tables="db\.à123" --rename="'à123' TO 'new_à123'" copy-back
MEB converts any identifiers on the command line that are quoted with single or double quotes to a standard form where the quoting is done with backticks. This is a very handy feature when MEB is invoked from a command program, such as the bash shell, that assigns special meaning to backticks.
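The conversion to the backtick form can be pictured with the following Python sketch. It is a simplified stand-in for what MEB does internally, which is not shown here: the function name is hypothetical, and identifiers that themselves contain quote characters are not handled.

```python
import re

# A single- or double-quoted run, closed by the same quote character.
_QUOTED = re.compile(r"""(['"])(.+?)\1""")

def to_backticks(spec):
    """Rewrite 'name' or "name" as `name`; backtick-quoted names stay as they are."""
    return _QUOTED.sub(lambda m: "`" + m.group(2) + "`", spec)

print(to_backticks("'à123' TO 'new_à123'"))
print(to_backticks('a123 TO "new_à123"'))
print(to_backticks("`à123` TO `new_à123`"))
```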
The Windows Command Prompt can be tricky to set up correctly for special characters. For example, single quotes do not work with the Windows Command Prompt. Therefore, some users might want to consider using the free Cygwin command program when working with special characters on Windows. Cygwin provides a command line similar to the one on Unix-like systems, which should make the command-line character encoding easy to control.
Sometimes MEB has the wrong idea of the character encoding used on the command line. Here we show two examples of what can happen if the command-line character encoding is different from what MEB thinks it is.
$ mysqlbackup --default-character-set="latin1" --backup-dir=/full-backup --include-tables='db\.a123' --rename="a123 TO 'new_à123'" copy-back
MySQL Enterprise Backup version 3.12.1 Linux-3.2.0-80-generic-i686 [2015/07/07]
Copyright (c) 2003, 2015, Oracle and/or its affiliates. All Rights Reserved.
...
150707 17:01:30 mysqlbackup: INFO: Importing table: db.a123 and renaming it to `db`.`new_à123`.
150707 17:01:31 mysqlbackup: INFO: Analyzing table: `db`.`new_à123`.
150707 17:01:31 mysqlbackup: INFO: Copy-back operation completed successfully.
150707 17:01:31 mysqlbackup: INFO: Finished copying backup files to '/sqldata/tts-5.6'
mysqlbackup completed OK!
Everything looks good in the example shown above: it seems that we have successfully restored the table db.a123 and renamed it to db.new_à123, as we intended. However, when we look at the list of tables at the server we see that the new table name is not db.new_à123, but db.new_Ã 123:
$ echo "use db; show tables;" | mysql
à123
àbc
a123
déf
new_Ã 123
This problem was caused by specifying an incorrect value for the command-line character encoding with the --default-character-set="latin1" option on the command line invoking the copy-back operation.
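The garbled name is consistent with a terminal that sent UTF-8 bytes which were then decoded as latin1. The following Python lines reproduce the effect outside of MEB. They illustrate the encoding mismatch only and say nothing about MEB's internal code.

```python
intended = "new_à123"

# A UTF-8 terminal sends à as the two bytes 0xC3 0xA0.
sent_bytes = intended.encode("utf-8")

# Decoding those bytes as Latin-1 turns each byte into its own character:
# 0xC3 becomes Ã and 0xA0 becomes a no-break space.
misread = sent_bytes.decode("latin-1")

print(repr(misread))
print([f"U+{ord(ch):04X}" for ch in misread[4:6]])
```

Because every byte value is a valid Latin-1 character, this kind of mismatch never raises an error, which is why the copy-back operation reported success.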
In the next example we show another possible outcome when MEB is given a wrong command-line character encoding. We invoke the following MEB restore operation from a command shell that uses UTF-8 character encoding.
$ mysqlbackup --default-character-set="hebrew" --backup-dir=/full-backup --include-tables='db\.a123' --rename="a123 TO 'new_à123'" --force copy-back
MySQL Enterprise Backup version 3.12.1 Linux-3.2.0-80-generic-i686 [2015/06/22]
Copyright (c) 2003, 2015, Oracle and/or its affiliates. All Rights Reserved.
...
mysqlbackup: ERROR: Failed to convert "hebrew" characters to "utf8" characters in string "a123 TO 'new_à123'".
mysqlbackup: ERROR: Unicode character conversion failed.
The conversion from "hebrew" characters to "utf8" characters fails because the character encoding used on the command line is not "hebrew", but "utf8". The conversion fails in the middle of the string because some of the UTF-8 encoded bytes are not valid "hebrew" encoded characters. The error message "Unicode character conversion failed" always means that MEB has the wrong idea of the character encoding used on the command line. To recover from this failure, the user must find out the character encoding used on the command line and specify it to MEB with the --default-character-set option.
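The same kind of failure can be reproduced in Python, assuming that MySQL's "hebrew" character set corresponds to ISO 8859-8 (Python codec iso8859_8). The helper function is hypothetical and only mimics the conversion step. It is not MEB code.

```python
arg = "a123 TO 'new_à123'"
sent_bytes = arg.encode("utf-8")  # what a UTF-8 shell passes to the program

def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err

# Wrong assumption: the first byte of à (0xC3) has no meaning in ISO 8859-8.
text, err = try_decode(sent_bytes, "iso8859_8")
if err is not None:
    print(f"cannot decode byte 0x{sent_bytes[err.start]:02x} at offset {err.start}")

# Correct assumption: the argument decodes cleanly.
text_ok, err_ok = try_decode(sent_bytes, "utf-8")
print(text_ok)
```

Unlike the latin1 case, the wrong encoding is detected here because ISO 8859-8 leaves some byte values undefined.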