
Learn MySQL with the Curriculum Team

Recent Posts

MySQL for Database Administrators

How to use Invisible Indexes in MySQL 8.0

Have you ever tried to improve the performance of a query by adding a new index to a table with a large number of rows, waited for ages while the index was built, only to discover that it didn't help as much as you'd hoped? Or perhaps you have had a hunch that an index wasn't doing much for your query performance and decided to drop it, only to have a bunch of users complain that their queries were stalling? If so, you will appreciate a nice new feature in the forthcoming MySQL 8.0 release called "invisible indexes".

Invisible indexes enable you to test the effect of removing an index on query performance without making a destructive change that must be undone should the index turn out to be required. By marking an index as invisible, you effectively "hide" it from the optimizer while the index itself remains intact and can be restored at any time. This feature makes it much easier to test the removal or addition of indexes and to perform a staged rollout of the changes.

Making an index invisible is not the same as disabling it. The index is still kept up to date as DML statements modify the table. Every index you create is visible by default. If you want to know whether a particular index is visible, you can query the Information Schema:

mysql> SELECT INDEX_NAME, COLUMN_NAME, IS_VISIBLE
    -> FROM INFORMATION_SCHEMA.STATISTICS
    -> WHERE TABLE_SCHEMA = 'world' AND TABLE_NAME = 'Country';
+------------+-------------+------------+
| INDEX_NAME | COLUMN_NAME | IS_VISIBLE |
+------------+-------------+------------+
| PRIMARY    | Code        | YES        |
+------------+-------------+------------+
1 row in set (#.## sec)

Let's say that you want to add a new index on the Continent column in the Country table of the world database, but you're not quite ready to make it generally available. You can add the index with the usual CREATE TABLE, ALTER TABLE, or CREATE INDEX statement, but use the INVISIBLE keyword to hide it from queries:

mysql> ALTER TABLE Country ADD INDEX cont_idx (Continent) INVISIBLE;
Query OK, 0 rows affected (#.## sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT INDEX_NAME, COLUMN_NAME, IS_VISIBLE
    -> FROM INFORMATION_SCHEMA.STATISTICS
    -> WHERE TABLE_SCHEMA = 'world' AND TABLE_NAME = 'Country';
+------------+-------------+------------+
| INDEX_NAME | COLUMN_NAME | IS_VISIBLE |
+------------+-------------+------------+
| cont_idx   | Continent   | NO         |
| PRIMARY    | Code        | YES        |
+------------+-------------+------------+
2 rows in set (#.## sec)

Verify that a query cannot use the index:

mysql> EXPLAIN SELECT * FROM Country WHERE Continent = 'Asia'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: Country
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
     filtered: 14.29
        Extra: Using where
1 row in set, 1 warning (#.## sec)

After some testing, you're happy with the resulting increase in performance, so you mark the index as VISIBLE:

mysql> ALTER TABLE Country ALTER INDEX cont_idx VISIBLE;
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SELECT INDEX_NAME, COLUMN_NAME, IS_VISIBLE
    -> FROM INFORMATION_SCHEMA.STATISTICS
    -> WHERE TABLE_SCHEMA = 'world' AND TABLE_NAME = 'Country';
+------------+-------------+------------+
| INDEX_NAME | COLUMN_NAME | IS_VISIBLE |
+------------+-------------+------------+
| cont_idx   | Continent   | YES        |
| PRIMARY    | Code        | YES        |
+------------+-------------+------------+
2 rows in set (#.## sec)

And now your queries can use the index:

mysql> EXPLAIN SELECT * FROM Country WHERE Continent = 'Asia'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: Country
   partitions: NULL
         type: ref
possible_keys: cont_idx
          key: cont_idx
      key_len: 1
          ref: const
         rows: 51
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (#.## sec)

Dropping and re-adding an index can be expensive for a large table, whereas making it invisible and visible again are fast, in-place operations. This makes invisible indexes a great tool for any DBA to have in their toolbox.
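MySQL 8.0 also provides a session-scoped way to try out an invisible index before exposing it to everyone: the use_invisible_indexes flag of the optimizer_switch system variable. A minimal sketch, assuming the cont_idx index above is still marked INVISIBLE:

mysql> SET SESSION optimizer_switch = 'use_invisible_indexes=on';
mysql> EXPLAIN SELECT * FROM Country WHERE Continent = 'Asia'\G

With the switch enabled, only the current session's queries can use invisible indexes, so you can benchmark the new index without changing the execution plans of any other connection.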


MySQL for Developers

Which PHP Extension Should I use for MySQL Development?

For many programming languages, it's pretty easy to figure out which Connector to use. Java has Connector/J; if you're using C# or ASP you'll opt for Connector/NET: the clue is in the name. For PHP, however, things aren't quite so straightforward. If you want to write a PHP application that communicates with a MySQL database, you have a choice of three PHP extensions: mysql, mysqli, and PDO_MySQL.

We can simplify that list right away by discounting the mysql extension. This is old and was intended for use with MySQL versions before 4.1. It does not support many of the things that you can take for granted with the other two, such as the improved MySQL authentication protocol or the ability to create prepared statements to protect against SQL injection, and it was dropped from PHP 7. In fact, you would have to jump through a couple of hoops just to get it to work with a later version of MySQL. So don't bother!

The mysql extension was replaced by mysqli (mysql "improved"). This extension is intended for use with PHP 5.0.0 and MySQL 4.1.1 and later. It fully supports the authentication protocol used in MySQL 5.0, as well as the Prepared Statements and Multiple Statements APIs. In addition, this extension supports object-oriented programming techniques, whereas the older mysql extension was strictly procedural.

PDO_MySQL is the most recent extension, where PDO stands for "PHP Data Objects". Unlike mysqli, which is specific to MySQL, PDO is a general database abstraction layer with support for MySQL. As such, it only implements features that MySQL shares with other databases. As PDO and MySQL have matured, the functionality gap between PDO_MySQL and mysqli has narrowed, and if you do find a MySQL feature that is unsupported, it is likely to be pretty arcane. PDO_MySQL is a great choice if you want to write code that is easy to port to other database platforms.

Both mysqli and PDO_MySQL are PHP extensions that are lightweight wrappers on top of a C client library: either the mysqlnd library or the libmysqlclient library. The mysqlnd library has been part of the PHP distribution since 5.3.0. It offers features such as lazy connections and query caching that are not available with libmysqlclient, so Oracle recommends sticking with mysqlnd unless you have a compelling reason not to. If you wanted to switch to libmysqlclient, you would have to recompile the extension. For a more in-depth comparison of the PHP extensions for MySQL, visit the PHP website.


MySQL Performance Tuning

MySQL Performance Tuning Course Updated for 5.7

We've been busy in the MySQL Curriculum team and are pleased to announce the recent update of the MySQL Performance Tuning course. This course has been substantially revised to take advantage of the performance enhancements available in MySQL 5.7, including new Performance Schema instrumentation and improvements in InnoDB and the Query Optimizer. Even more so than in previous versions of the course, we try to filter out what's merely "nice to know" and focus on the really important configuration settings that can make the most positive impact on your server workload. We provide more in-depth coverage and practice of using system databases like Performance Schema and sys, and graphical tools like MySQL Enterprise Monitor and MySQL Workbench, which can help you identify performance bottlenecks and be proactive about resolving them before they become a major issue for your users. We also cover the use of Oracle tools like Oracle Enterprise Manager with the MySQL plugin and SQL Developer, for those DBAs already familiar with these tools. As always, the emphasis is on "learning by doing", so you'll find a whole range of activities that help to reinforce the knowledge you gain in the lectures. Taught by expert MySQL DBAs, this course teaches you everything you need to know to get the most out of your databases. Find out more on the Oracle University website.


MySQL for Database Administrators

What are the Differences Between InnoDB and NDB, and MySQL Cluster and InnoDB Cluster?

Like any mature technology that is constantly evolving, MySQL has amassed a jargon of its own. Some of this terminology occasionally causes confusion, especially when two terms are similar but actually refer to two completely different things. This is particularly the case for the two storage engines InnoDB and NDB (which sound very alike when spoken aloud), and the two "Cluster" technologies: InnoDB Cluster and MySQL Cluster. Let's see if we can clear this confusion up.

InnoDB is a storage engine: the software component that a database uses to read, write, update, and delete data and perform other fundamental operations. InnoDB replaced MyISAM as the default storage engine for MySQL in 2010, with the release of version 5.5 (largely because of its support for transactions and foreign keys), and is the best option for most workloads.

NDB (also known as NDBCLUSTER) is another storage engine, but this one stores data predominantly in memory and independently of a MySQL Server instance. It is the storage engine that MySQL Cluster uses. NDB stands for "network database".

MySQL Cluster (or MySQL NDB Cluster) is an entirely separate product that stores data in the memory of multiple servers, none of which requires a MySQL Server instance. With its distributed "shared nothing" architecture, MySQL Cluster offers high availability and auto-sharding (partitioning) of tables horizontally across low-cost hardware to serve both read- and write-intensive workloads. MySQL Server is often used as the front end to query data stored in the cluster, but the cluster can also be queried using NoSQL techniques.

InnoDB Cluster is almost entirely unrelated to MySQL Cluster and is implemented as a set of plugins, starting with MySQL 5.7. One of these is the Group Replication plugin, which enables MySQL servers in the group to replicate data between them. If a server fails, the cluster reconfigures itself automatically. With tools like MySQL Shell (for configuring the cluster) and MySQL Router (to transparently connect client applications to the server instances in the group), InnoDB Cluster provides a convenient way for DBAs to configure a high-availability system using the familiar InnoDB storage engine.

So, in a nutshell: InnoDB and NDB are both storage engines, but InnoDB is the default storage engine for MySQL Server, whereas NDB is designed specifically for MySQL Cluster. MySQL Cluster and InnoDB Cluster are both high-availability solutions for MySQL. MySQL Cluster uses the NDB storage engine and does not require the MySQL Server software to be installed on any of the nodes within the cluster. InnoDB Cluster provides a mechanism for servers with the MySQL Server software installed to replicate data between them.
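Because InnoDB and NDB are just two entries in the server's storage engine list, you can check what a given server supports and pick an engine per table. A minimal sketch (the table names are made up, and ENGINE=NDBCLUSTER only works on a SQL node attached to a running MySQL Cluster):

mysql> SHOW ENGINES;
mysql> CREATE TABLE inno_t (id INT PRIMARY KEY) ENGINE=InnoDB;
mysql> CREATE TABLE ndb_t (id INT PRIMARY KEY) ENGINE=NDBCLUSTER;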


MySQL for Database Administrators

What Causes Replication Lag?

Replication lag occurs when the slaves (or secondaries) cannot keep up with the updates occurring on the master (or primary). Unapplied changes accumulate in the slaves' relay logs, and the version of the database on the slaves becomes increasingly different from that of the master. To work out what's causing the lag, you must determine which replication thread is getting backed up. Replication relies on three threads per master/slave connection: one is created on the master and two are created on the slave.

The Slave I/O Thread. When you issue START SLAVE on a slave server, the slave creates this thread, which connects to the master and requests a copy of the master's binary log.

The Binlog Dump Thread. When the slave connects to the master, the master uses this thread to send the slave the contents of its binary log.

The Slave SQL Thread. The slave creates this SQL (or applier) thread to read the contents of the retrieved binary log and apply its contents.

Replication lag is caused when either the I/O thread or the SQL thread cannot cope with the demands placed upon it. If the I/O thread is suffering, this means that the network connection between the master and its slaves is slow. You might want to consider enabling slave_compressed_protocol to compress network traffic, or speaking to your network administrator. If it's the SQL thread, then your problem is probably due to poorly-optimized queries that take the slave too long to apply. There may be long-running transactions or too much I/O activity. Having no primary key on the slave's tables when using the ROW or MIXED replication format is also a common cause of lag on this thread: check that both the master and slave versions of your tables have a primary key.

But how do you know which thread is affected? This depends on whether you are using binary log file names and positions, or GTIDs. If you are using binary log file names and positions, then executing SHOW MASTER STATUS and SHOW SLAVE STATUS gives you all you need to work out which thread is causing the problems:

master> SHOW MASTER STATUS;
+---------------+-----------+
| File          | Position  |...
+---------------+-----------+
| binlog.000002 | 121752008 |...
+---------------+-----------+

slave> SHOW SLAVE STATUS\G
********************* 1. row *********************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: replication
                  Master_Port: 22808
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 121409852
               Relay_Log_File: relaylog.000002
                Relay_Log_Pos: 119819329
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
          Exec_Master_Log_Pos: 120003004
              Relay_Log_Space: 121226377

You can calculate any I/O thread lag by subtracting the value of Read_Master_Log_Pos from Position:

Bytes behind master = Position - Read_Master_Log_Pos
                    = 121752008 - 121409852 = 342156 bytes

Note that this assumes that the binary log file name is the same on both the slave and the master; otherwise the calculation becomes more complex, because you must take into account the size of the binary log files.

The following calculation tells you how far the slave's SQL thread is lagging behind the master:

Bytes behind master = Position - Exec_Master_Log_Pos
                    = 121752008 - 120003004 = 1749004 bytes

Or, if the slave's SQL thread is lagging behind its I/O thread, this will be visible in the result of the following calculation:

Bytes behind master = Read_Master_Log_Pos - Exec_Master_Log_Pos
                    = 121409852 - 120003004 = 1406848 bytes

The Seconds_Behind_Master column in the output of SHOW SLAVE STATUS also gives you a rough idea of the extent of the delay. Don't rely on it too much, though, because it doesn't take into account long-running transactions or operations performed on the master that are not required on the slave. Its value is best monitored over a period.

If your replication setup uses GTIDs, then you'll use the following information to diagnose replication lag:

Executed GTIDs. Shows which transactions have been executed on the instance. Available on both master and slave. You can also get the set of executed transactions from the global gtid_executed system variable.

Retrieved GTIDs. Shows which transactions the I/O thread has received. Available only on the slave.

Purged GTIDs. Shows which transactions have been purged from the binary log after completion. You're only interested in the transactions purged by the slave in this context.

Once you have the GTIDs, you can use the GTID_SUBTRACT() function to calculate the difference between the slave's view of the world and that of the master. For example, the following query on the slave shows the GTIDs read from the binary log that have not yet been applied (SQL thread lag):

slave> SELECT GTID_SUBTRACT('96985d6f-2ebc-11e7-84df-08002715584a:5-133',
    ->                     '96985d6f-2ebc-11e7-84df-08002715584a:26-132') AS MissingGTIDs;
+-----------------------------------------------+
| MissingGTIDs                                  |
+-----------------------------------------------+
| 96985d6f-2ebc-11e7-84df-08002715584a:5-25:133 |
+-----------------------------------------------+
1 row in set (0.00 sec)
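In practice, you would pull those two GTID sets from the slave itself rather than pasting them in by hand. A sketch, assuming a MySQL 5.7 slave, where the set of received transactions is exposed in the Performance Schema and the executed set in the gtid_executed system variable:

slave> SELECT GTID_SUBTRACT(
    ->          (SELECT RECEIVED_TRANSACTION_SET
    ->             FROM performance_schema.replication_connection_status LIMIT 1),
    ->          @@GLOBAL.gtid_executed) AS UnappliedGTIDs;

An empty result means the SQL thread has applied everything the I/O thread has retrieved.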


MySQL Performance Tuning

What do the Queries and Questions Status Variables Tell You?

The two status variables Queries and Questions can sometimes cause confusion. On the face of it, both appear to count the number of times a query is executed on the server. Each variable has both session and global scope, meaning that you can get statistics for both the current connection and for all connections. You can reset the session variables by executing FLUSH STATUS, but resetting the global variables requires a server restart. All good so far.

However, the way in which these variables are incremented is not always intuitive. For one thing, almost everything you send to MySQL is a Question. Any operation you perform, including checking the value of a status variable, is a Question. Even checking the number of Questions issued by your session is, in itself, a Question:

mysql> SHOW STATUS LIKE 'Questions';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions     | 2     |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Questions';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions     | 3     |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Questions';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions     | 4     |
+---------------+-------+
1 row in set (0.00 sec)

Another thing to be aware of is that MySQL increments these counters before executing the query. This doesn't always give you a very accurate indication of how many queries are executing concurrently, because these status variables tell you when the query was issued, not when it was run. So, if you rely on these variables for insight into your server workload, you might well observe queries apparently being executed at a pretty constant rate when in actual fact they are not completing until some required resource (such as an InnoDB row-level lock) becomes available.

A better indication of your server activity at any one time is the Threads_running status variable. This represents the total number of client threads currently executing on the database server: connections the server is holding while the client waits for a reply. This is a much better indication of the load on your server than the Questions and Queries status variables.

# mysqladmin -i1 extended | grep Threads_running
| Threads_running | 33 |
| Threads_running | 22 |
| Threads_running | 28 |
| Threads_running | 1  |
| Threads_running | 21 |
| Threads_running | 17 |
| Threads_running | 2  |
| Threads_running | 26 |
| Threads_running | 25 |
| Threads_running | 30 |
| Threads_running | 27 |
| Threads_running | 23 |
| Threads_running | 1  |
| Threads_running | 28 |
...

The above output is from a server with 32 connections, and you can see that on one occasion 33 threads are running. This indicates that the server is maxed out at that time. (The 33 includes the connection that is checking the value of the Threads_running status variable, so you must subtract one from the value for a true picture of the number of active threads.)
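The key documented difference between the two counters is that Queries also counts statements executed inside stored programs, whereas Questions counts only statements sent to the server by clients. A minimal sketch to watch the counters diverge, assuming a hypothetical stored procedure named p_work that runs several statements internally:

mysql> FLUSH STATUS;
mysql> CALL p_work();  -- hypothetical procedure executing several SELECTs
mysql> SHOW SESSION STATUS WHERE Variable_name IN ('Queries', 'Questions');

After the CALL, you would expect Queries to have advanced by more than Questions, because each statement inside the procedure increments Queries but not Questions.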


MySQL for Developers

How to Populate JSON Columns

While it was always technically possible to store JSON in MySQL database columns, it was never a particularly attractive option. JSON data is just text, so any string type of sufficient length will take it quite happily. However, getting your JSON into the database using this method is one thing; getting it back out again in any useful format had to be the responsibility of your application. All that changed in MySQL 5.7.8 with the introduction of a native JSON data type and a range of useful built-in functions that made JSON a first-class citizen of MySQL. In this post, we'll have a look at some of the functions you can use to store JSON data. Later posts will cover retrieving and otherwise manipulating that data.

Imagine that we are an online retailer, selling a range of electronics equipment: anything from top-of-the-range TVs to toasters. Clearly all these items will have things in common: they will be associated with a manufacturer, for example, and have a sale price. However, not all attributes will be relevant to all product types. Toasters don't usually have an HDMI port, and TVs aren't able to deal with frozen bread. We could always create a new table for each type of product, but what if a particular manufacturer starts producing a new range of toasters with a built-in digital radio? We would need to add a new column to our toasters table to differentiate those from the others, and this would quickly become painful. This is where having a flexible, schema-less data type like JSON could come in really handy. Let's create a table called products that could store this sort of data:

mysql> CREATE TABLE `products` (
    ->   `id` MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   `name` VARCHAR(100) NOT NULL,
    ->   `manuf_id` INT UNSIGNED NOT NULL,
    ->   `category_id` INT UNSIGNED NOT NULL,
    ->   `price` DECIMAL(7,2) NOT NULL,
    ->   `attributes` JSON NOT NULL,
    ->   PRIMARY KEY(`id`)
    -> );
Query OK, 0 rows affected (#.## sec)

Now, let's add a product: the great new MP3 player from Wombat Technologies, called the Wombat ProPlayer. Here's what we know about it:

Manufacturer: Wombat Technologies (id 22 in the `manufacturers` table)
Category: MP3 players (id 331 in the `categories` table)
Price: $49.99
Attributes:
  Storage: 8 GB
  Size: 8 cm x 4 cm x 1 cm, weighs 32 g
  Supported formats: MP3, WMA, FLAC, and AVI

To store the attributes, we can just pass in valid JSON as the field value in our INSERT statement:

mysql> INSERT INTO `products` (`name`, `manuf_id`, `category_id`, `price`, `attributes`)
    -> VALUES (
    ->   'Wombat ProPlayer',
    ->   22,
    ->   331,
    ->   49.99,
    ->   '{"storage_gb": 8, "size": {"height_cm": 8, "width_cm": 4, "depth_cm": 1, "weight_g": 32},
    ->   "formats": ["MP3", "WMA", "FLAC", "AVI"]}');
Query OK, 1 row affected (#.## sec)

We can then list the contents of the products table and see our JSON attributes:

mysql> SELECT * FROM products\G
*************************** 1. row ***************************
         id: 1
       name: Wombat ProPlayer
   manuf_id: 22
category_id: 331
      price: 49.99
 attributes: {"size": {"depth_cm": 1, "weight_g": 32, "width_cm": 4, "height_cm": 8}, "formats": ["MP3", "WMA", "FLAC", "AVI"], "storage_gb": 8}
1 row in set (#.## sec)

See how the order of JSON fields in the attributes column has changed from the order in which we provided them? MySQL sorts the keys and removes any extraneous whitespace between the key/value pairs. If you specify the same key twice, MySQL retains only the first value you provide for that key. All these steps are part of MySQL's normalization process for JSON.

You must provide valid JSON, otherwise you'll get an error from MySQL. You can quickly validate any JSON document by using JSONLint, or you can pass some JSON to the JSON_VALID() function in MySQL, which returns 1 if the JSON validates correctly.

Providing JSON field values in this way can be very fiddly and error-prone, so you might want to use the MySQL JSON_OBJECT() function to lay out the JSON for you. The JSON_OBJECT() function accepts keys and their corresponding values as parameters:

mysql> INSERT INTO `products` (`name`, `manuf_id`, `category_id`, `price`, `attributes`)
    -> VALUES(
    ->   'Wombat ProPlayer',
    ->   22,
    ->   331,
    ->   49.99,
    ->   JSON_OBJECT(
    ->     "storage_gb", 8,
    ->     "size", JSON_OBJECT(
    ->       "height_cm", 8,
    ->       "width_cm", 4,
    ->       "depth_cm", 1,
    ->       "weight_g", 32
    ->     ),
    ->     "formats", JSON_ARRAY("MP3", "WMA", "FLAC", "AVI")
    ->   )
    -> );
Query OK, 1 row affected (#.## sec)

Note how we've used JSON_OBJECT() twice here: once for the main JSON attributes object, and once for the nested "size" object. We've also used another utility function, JSON_ARRAY(), to create and populate the formats field.

Another useful function is JSON_MERGE(). This takes two or more arrays and consolidates all their elements into a single array. Or, if you pass it objects instead of arrays, you get back a single object that includes the properties of all the objects you passed to it. Note that JSON_MERGE() behaves differently from JSON_OBJECT() when given multiple keys of the same name: it uses the key only once in the output, but its value becomes an array containing all the values specified for keys of that name:

mysql> SELECT JSON_MERGE(
    ->   JSON_OBJECT("a", 1, "b", 2, "c", 3),
    ->   JSON_OBJECT("c", 4, "d", 5)
    -> );
+------------------------------------------------------------------------------+
| JSON_MERGE(JSON_OBJECT("a", 1, "b", 2, "c", 3), JSON_OBJECT("c", 4, "d", 5)) |
+------------------------------------------------------------------------------+
| {"a": 1, "b": 2, "c": [3, 4], "d": 5}                                        |
+------------------------------------------------------------------------------+
1 row in set (#.## sec)

Hopefully this post has outlined some of the methods you can use to get your JSON data into MySQL. In later posts we'll look at other MySQL functions that can help you search and manipulate your JSON data.
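As a quick sketch of that validation step, JSON_VALID() returns 1 for a well-formed document and 0 otherwise (the second document below is deliberately missing its value):

mysql> SELECT JSON_VALID('{"storage_gb": 8}') AS good,
    ->        JSON_VALID('{"storage_gb": }') AS bad;
+------+------+
| good | bad  |
+------+------+
|    1 |    0 |
+------+------+
1 row in set (#.## sec)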


MySQL for Developers

Calculating Distances with MySQL's Spatial Data Extensions

I used to work for a company specializing in Geographical Information Systems, and I still get excited about using computers to interact with geospatial data. Once a field that interested only geography geeks, GIS has now become mainstream, with just about every mobile app offering some kind of location awareness. Whereas many such applications use some kind of third-party API to work with spatial data, with MySQL you can do this right in the database itself.

MySQL provides a number of spatial data types that let you create and store simple geometric shapes like points, lines, and polygons, representing features like cities, roads, and countries respectively, together with their location in the real world. Once you have created these geographical features in the database, you can use MySQL to understand the relationships between them. For example, many applications provide a "find my nearest..." capability to locate things you are interested in, like coffee shops, gas stations, or furry conventions (hey, I don't judge!).

If you want to find the distance between your current location and your point of interest, then Pythagoras has you covered. He defined a formula for calculating the distance between coordinates way back in the 6th century BC, and it looks like this:

distance = sqrt((X2 - X1)^2 + (Y2 - Y1)^2)

Which is great if you only ever want to calculate short distances, but it fails to take into account the curvature of the earth, which would really skew the results for longer ones. What you need then is the Haversine formula, first used in the 19th century to calculate distances for ocean-going vessels. Here's what the Haversine formula looks like when implemented as a Java class:

public class Haversine {
    public static final double R = 6372.8; // In kilometers

    public static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        lat1 = Math.toRadians(lat1);
        lat2 = Math.toRadians(lat2);

        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.pow(Math.sin(dLon / 2), 2) * Math.cos(lat1) * Math.cos(lat2);
        double c = 2 * Math.asin(Math.sqrt(a));
        return R * c;
    }

    public static void main(String[] args) {
        System.out.println(haversine(36.12, -86.67, 33.94, -118.40));
    }
}

The important thing to note here is the variable R, which corresponds to the approximate radius of the earth, in kilometres. If your trigonometry is as basic as mine, you will appreciate the fact that MySQL wraps all this up for you in a single function called ST_Distance_Sphere:

ST_Distance_Sphere(pt1, pt2 [, radius])

You call this function with the two points, pt1 and pt2, that you want to calculate the distance between. The radius parameter is optional; if omitted, the function uses the value 6,370,986 meters, a reasonable approximation of the radius of the earth. (I see this optional extra parameter as a great bit of future-proofing by MySQL, for when we are living on a terraformed Mars.)

Let's use ST_Distance_Sphere to calculate the spherical distance between London and San Francisco. First, we need to create MySQL variables of POINT type to store the longitude/latitude coordinates of our two cities, in decimal degrees. MySQL has a convenience function called ST_GeomFromText for creating spatial types from what is termed Well-Known Text, a basic markup language for representing simple map geometries.

mysql> SET @london = ST_GeomFromText('POINT(0.13 51.5)');
Query OK, 0 rows affected (0.00 sec)

mysql> SET @sanfrancisco = ST_GeomFromText('POINT(122.4 37.8)');
Query OK, 0 rows affected (0.00 sec)

Then, we can call ST_Distance_Sphere on those two points:

mysql> SELECT ST_Distance_Sphere(@london, @sanfrancisco);
+--------------------------------------------+
| ST_Distance_Sphere(@london, @sanfrancisco) |
+--------------------------------------------+
|                          8613637.557000635 |
+--------------------------------------------+
1 row in set (0.00 sec)

The result is in meters, so the approximate distance is 8,613 kilometers. This is just a very simple example of the sort of functionality that is available to you once you take advantage of MySQL's spatial data types. Find out more about MySQL's spatial data support in the MySQL Reference Manual.
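Just to demonstrate the optional radius parameter (and to indulge the terraformed-Mars joke), here's a sketch that reuses the same two points but passes Mars's mean radius of roughly 3,389,500 meters; the figure is an approximation, and the result is of course only meaningful for points plotted on an actual Martian globe:

mysql> SELECT ST_Distance_Sphere(@london, @sanfrancisco, 3389500);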


Learning MySQL

How to Create an Index Based on an Expression, with Generated Columns

One of the many great new features included in MySQL 5.7 is the generated column. A generated column is a column in a table whose data is calculated for you, based on an expression that you provide. To understand the benefits of using generated columns, let's consider a very simple example. Imagine that I want to find out how many new hires my organization had in the year 2000. Here's my table:

mysql> DESC employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (#.## sec)

The query is simple enough: I just use the YEAR() function to extract the year from the hire_date column. However, the use of the YEAR() function prevents the optimizer from using any index on hire_date, and the query results in a full table scan:

mysql> EXPLAIN SELECT COUNT(emp_no)
    -> FROM employees WHERE YEAR(hire_date)=2000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299733
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (#.## sec)

In a trivial query like this one, I might just choose to live with the overhead. But imagine a frequently-executed query that examines millions of records. I would want to optimize it as much as possible, and to do that I need to be able to use the year as an index. I could create a new column and populate it with the year portion of the date, but then I'd need to create a trigger on the table to keep this new column up to date when the data changes. That sounds like a lot of work to me. A much better approach is to use a generated column, which creates and maintains the data on my behalf:

mysql> ALTER TABLE employees
    -> ADD COLUMN hire_year SMALLINT
    -> GENERATED ALWAYS AS (YEAR(hire_date)) STORED;
Query OK, 300024 rows affected (#.## sec)
Records: 300024  Duplicates: 0  Warnings: 0

In the above statement, the GENERATED ALWAYS clause tells MySQL that this is a generated column, and provides the expression used to create the data in the column. The STORED keyword specifies that this data should be stored with the table and can be indexed. The default option is VIRTUAL, which does not store the column values but evaluates them each time a row is read. Generated columns of the VIRTUAL type can be used only in secondary indexes.

I can now add an index based on this new column, and the optimizer can use it to improve the performance of my query:

mysql> CREATE INDEX idx_hire_year
    -> ON employees(hire_year);
Query OK, 0 rows affected (#.## sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> DESC employees;
+------------+---------------+------+-----+---------+------------------+
| Field      | Type          | Null | Key | Default | Extra            |
+------------+---------------+------+-----+---------+------------------+
| emp_no     | int(11)       | NO   | PRI | NULL    |                  |
| birth_date | date          | NO   |     | NULL    |                  |
| first_name | varchar(14)   | NO   |     | NULL    |                  |
| last_name  | varchar(16)   | NO   |     | NULL    |                  |
| gender     | enum('M','F') | NO   |     | NULL    |                  |
| hire_date  | date          | NO   |     | NULL    |                  |
| hire_year  | smallint(6)   | YES  | MUL | NULL    | STORED GENERATED |
+------------+---------------+------+-----+---------+------------------+
7 rows in set (#.## sec)

mysql> EXPLAIN SELECT COUNT(emp_no)
    -> FROM employees WHERE hire_year=2000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ref
possible_keys: idx_hire_year
          key: idx_hire_year
      key_len: 3
          ref: const
         rows: 13
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (#.## sec)

As I hope this brief post demonstrates, generated columns are great for working with columns of data that cannot be indexed directly, including indexing fields in JSON documents. They are also useful for "centralizing" complex conditions that multiple queries rely on, ensuring that every query uses exactly the same condition. Find out more about MySQL Generated Columns in the MySQL Reference Manual.
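That closing point about JSON deserves a sketch. Reusing the products table from the JSON post earlier on this page (the column and index names below are made up), a VIRTUAL generated column can expose a JSON field to a secondary index:

mysql> ALTER TABLE products
    -> ADD COLUMN storage_gb SMALLINT
    -> GENERATED ALWAYS AS (JSON_EXTRACT(attributes, '$.storage_gb')) VIRTUAL,
    -> ADD INDEX idx_storage_gb (storage_gb);

A query such as SELECT name FROM products WHERE storage_gb = 8 can then use idx_storage_gb instead of scanning and parsing every JSON document in the table.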


MySQL for Database Administrators

MySQL for Database Administrators: New edition

The "MySQL for Database Administrators" course has been revised and released in its 4th edition. As with previous editions, it covers the core techniques that all DBAs need to know, such as server configuration, user management, security, troubleshooting, query optimization, backups, and maintaining high availability. The new edition also covers how to use exciting new features in the latest MySQL GA releases, including Group Replication and MySQL Cloud Service. This is a very hands-on course, and each lesson has extensive practical activities so that you get a chance to try out new features and techniques in the safety of a properly-configured learning environment. You can take this course in the following formats: Live-Virtual Event: Attend a live event from your own desk—no travel required. You can choose from a selection of events on the schedule to suit different timezones, from the Western coast of the Americas all the way to Australia. In-Class Event: Travel to an education center to attend an event. There are scheduled events in classrooms from Chicago to Singapore, from New York to Kuala Lumpur. To register for an event or to learn more about the MySQL curriculum, visit http://education.oracle.com/mysql.

The "MySQL for Database Administrators" course has been revised and released in its 4th edition. As with previous editions, it covers the core techniques that all DBAs need to know, such as server ...

What does the 11 mean in INT(11)?

If you create a table using a numeric type like INT or BIGINT, you might have been surprised by the numbers that appear in the table's DESCRIBE:

mysql> CREATE TABLE test(a INT, b SMALLINT, c BIGINT);
Query OK, 0 rows affected (1.04 sec)

mysql> DESCRIBE test;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| a     | int(11)     | YES  |     | NULL    |       |
| b     | smallint(6) | YES  |     | NULL    |       |
| c     | bigint(20)  | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.03 sec)

I didn't put those values 11, 6, and 20 in there. Where did they come from, and what are they?

They're the columns' "Display Width"

For an integer type, the value in parentheses is called the display width of the field. This is different from (and somewhat less intuitive than) the parenthesised value in character fields—such as VARCHAR(10)—where it's the maximum number of characters you can store in the field, and in floating-point types, where it describes the total number of digits you can store. The display width for integers... well, it doesn't seem to do much really, on the surface.

An example

Here's an example, using BIGINT because they're, well, biggest:

mysql> CREATE TABLE d1(c1 BIGINT(5), c2 BIGINT, c3 BIGINT(30));
Query OK, 0 rows affected (0.78 sec)

mysql> DESC d1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| c1    | bigint(5)  | YES  |     | NULL    |       |
| c2    | bigint(20) | YES  |     | NULL    |       |
| c3    | bigint(30) | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

The c1 and c3 columns use explicit display widths. The c2 column uses the default display width, which is just enough to contain the largest number of characters in a BIGINT (including the sign). The largest negative BIGINT is -9223372036854775808, which, if you count carefully, you'll see is 20 characters. For similar reasons, the default display width of an INT (with largest negative value -2147483648, go on, count 'em) is 11, and so on.

mysql> INSERT INTO d1 VALUES (1, 1, 1);
Query OK, 1 row affected (0.09 sec)

mysql> INSERT INTO d1 VALUES (123456, 123456, 123456);
Query OK, 1 row affected (0.07 sec)

mysql> SELECT * FROM d1;
+--------+--------+--------+
| c1     | c2     | c3     |
+--------+--------+--------+
|      1 |      1 |      1 |
| 123456 | 123456 | 123456 |
+--------+--------+--------+
2 rows in set (0.00 sec)

You can see from this that the display width has no effect on the output at all, at least when you use the mysql command-line client. We'll come back to this idea later.

So what's it for?

The numeric type overview calls this mysterious value "the maximum display width for integer types," and continues: "Display width is unrelated to the range of values a type can contain." To see an effect, you can pad the columns with zeroes:

mysql> CREATE TABLE d2(c1 BIGINT(5) ZEROFILL, c2 BIGINT ZEROFILL,
    -> c3 BIGINT(30) ZEROFILL);
Query OK, 0 rows affected (0.67 sec)

Note that ZEROFILL implicitly makes the column unsigned, so it cannot store negative numbers. You couldn't, for example, see a number like -000123 in such a column.

The ZEROFILL option pads the displayed value with zeros, as you might have guessed. It turns numbers like 123 into numbers like 000123, assuming your display width is 6. Let's see what it means in our table:

mysql> INSERT INTO d2 VALUES (1, 1, 1);
Query OK, 1 row affected (0.06 sec)

mysql> INSERT INTO d2 VALUES (123456, 123456, 123456);
Query OK, 1 row affected (0.08 sec)

mysql> SELECT * FROM d2;
+--------+----------------------+--------------------------------+
| c1     | c2                   | c3                             |
+--------+----------------------+--------------------------------+
|  00001 | 00000000000000000001 | 000000000000000000000000000001 |
| 123456 | 00000000000000123456 | 000000000000000000000000123456 |
+--------+----------------------+--------------------------------+
2 rows in set (0.00 sec)

So it should be clear that the display width does not limit the range of values you can store in the column—you still get 123456 in a column with a display width of 5—but simply affects how the values appear when they're padded.

What's the point if it doesn't work without ZEROFILL?

Ah, but it does. Well, it does if you want it to. The mysql command-line client doesn't use the display width unless the column is zero-filled, but other client applications can (and do). The display width is available to applications through the API, so they can use it to pad the values (with spaces, dashes, or whatever you like). For example, in PHP you could do something like this:

$result = $mysqli->query('SELECT c3 FROM d1');
...
$c3metadata = $result->fetch_field_direct(0); // fetch metadata from first field (c3)
$c3length = $c3metadata->length;              // get the length from it (30)
...
$row = $result->fetch_assoc();
printf("%${c3length}d\n", $row['c3']);        // prints c3 in a column 30 chars wide

This code reads the display width of the c3 column (which we know is 30, from the code above) from the column's metadata into the variable $c3length, and uses that value to provide a width to the format specifier %d (for decimal integer). The resulting expression, %${c3length}d, evaluates as %30d, which prints the value of $row['c3'] as an integer in a field 30 characters wide, right-justified and space-padded. The same sort of code exists in other languages: in Java, the java.sql.ResultSetMetaData interface provides the getColumnDisplaySize method, and Java's String.format method works similarly to the printf code above.

In short...

You can ignore the display widths if you don't use them. The defaults are enough to display any value compatible with the type, and won't cause you trouble if your application uses the display width without your knowledge. If you need to zero-pad your numbers, the display width lets you say how much padding you need, but you'll need to make sure your application deals with numbers larger than that width, either by ensuring it can handle them or by having your business logic prevent them. If you want to display space-padded numbers in plaintext reports or other fixed-width output formats, and you want to store the column's display width with the other column metadata in the database (and where else would you put it?), use the display width.
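If all you want is padding in a query result, without ZEROFILL's unsigned side effect, you can also pad at query time. A small sketch using the d1 table above and the LPAD() string function (which returns a string, not a number):

mysql> SELECT LPAD(c1, 8, '0') AS c1_padded FROM d1;
+-----------+
| c1_padded |
+-----------+
| 00000001  |
| 00123456  |
+-----------+
2 rows in set (0.00 sec)

Beware that LPAD() truncates values longer than the target length, echoing the advice above about handling numbers wider than the padding you choose.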


The "test" Database and Security

Many installations of MySQL server come with a built-in database called test. It's initially empty, and you might wonder what it's for, or even if you can delete it without any problems.

What is it for?

The test database is installed by the MySQL Server RPM as part of the mysql_install_db process, and some other package managers run that script too. If you run that script as part of a manual install of MySQL, you'll get the same effect. It creates the database by creating an empty directory called "test" in the data directory, and it creates wide-open access to the database test (and to any database with a name beginning with test_) by inserting a couple of rows into the mysql.db table that give everyone full access to create or use those databases. The configuration is designed to make it easy for new users to create a playground or sandbox database to work with, one that doesn't require asking the DBA to open up permissions every time a user wants to create a database for testing purposes.

The test security settings

These are the lines the mysql_install_db script executes (slightly modified):

INSERT INTO mysql.db VALUES ('%','test','','Y','Y','Y','Y','Y','Y','N','Y','Y','Y','Y','Y','Y','Y','Y','N','N','Y','Y');
INSERT INTO mysql.db VALUES ('%','test\_%','','Y','Y','Y','Y','Y','Y','N','Y','Y','Y','Y','Y','Y','Y','Y','N','N','Y','Y');

This is quite significant, because if you create a database called test_db or test_banana or test_anything_else_really, all those databases are wide open to any MySQL user on that server, even otherwise low-privilege users. This is particularly important if you run a MySQL server shared across many projects or customers.

Securing the Default "test" Database Configuration

It's actually a best practice to remove the test database along with the rows inserted by the lines above—they make the test database (and others beginning with test_) completely accessible to anyone with a MySQL account. When it's created during installation (as it is with the Oracle-built RPM distributions), anyone has full access to it as a sort of sandbox environment, and could in theory use it as a launching point for an attack. Run the mysql_secure_installation script to perform a number of security optimizations, including removing the rows added by the mysql_install_db process. Another way is simply to delete the two rows:

DELETE FROM mysql.db WHERE Db IN ('test', 'test\_%');
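Before and after deleting those rows, you can check exactly what is there; and because this modifies a grant table directly with DML, the server won't pick up the change until you flush the privileges. A sketch:

mysql> SELECT Host, Db, User FROM mysql.db WHERE Db LIKE 'test%';
mysql> DELETE FROM mysql.db WHERE Db IN ('test', 'test\_%');
mysql> FLUSH PRIVILEGES;  -- reload the grant tables so the change takes effect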


Keeping up with New Releases

You can keep up with the latest developments in MySQL software in a number of ways, including various blogs and other channels. However, for the most correct (if somewhat dry and factual) information, you can go directly to the source.

Major Releases

For every major release, the MySQL docs team creates and maintains a "nutshell" page containing the significant changes in that release. For the current GA release (whatever that is at the time you read this), you'll find it at this location: https://dev.mysql.com/doc/mysql/en/mysql-nutshell.html

At the moment, this redirects to the summary notes for MySQL 5.6. The notes for MySQL 5.7 are also available at that website, at the URL http://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html, and when that version eventually goes GA, it will become the page linked from the URL shown above.

Incremental Releases

For more detail on each incremental release, you can have a look at the release notes for each revision. For MySQL 5.6, the release notes are stored at the following location: http://dev.mysql.com/doc/relnotes/mysql/5.6/en/

At the time I write this, the topmost entry is a link for MySQL 5.6.15. Each linked page shows the changes in that particular version, so if you are currently running 5.6.11 and are interested in what bugs were fixed in versions since then, you can look at each subsequent release and see all the changes in glorious detail.

One really clever thing you can do with that site is use an advanced Google search to find exactly when a feature was released, along with its release notes. By using the preceding link in a "site:" directive, you can search only within those pages for an entry. For example, the following Google search shows pages within the release notes that reference the --slow-start-timeout option:

site:http://dev.mysql.com/doc/relnotes/mysql/ "--slow-start-timeout"

By running that search, you can see that the option was added in MySQL 5.6.5 and also rolled into MySQL 5.5.20.

White Papers

Also, with each major release you can usually find a white paper describing what's new in that release. For MySQL 5.6 there was a "What's New" white paper at this location: http://www.mysql.com/why-mysql/white-papers/whats-new-mysql-5-6/

You'll find other white papers at http://www.mysql.com/why-mysql/white-papers/ and you can search that page for "5.6" to see any papers dealing specifically with that version.


Building Queries Systematically

code {display:inline;font-size: 1em; margin:0px; padding:2px;} div.sidebar {width:40%; float:right; background-color:lightgrey;font-size:.7em;line-height:1.2em;padding:4px;margin:4px;} The SQL language is a bit like a toolkit for data. It consists of lots of little fiddly bits of syntax that, taken together, allow you to build complex edifices and return powerful results. For the uninitiated, the many tools can be quite confusing, and it's sometimes difficult to decide how to go about the process of building non-trivial queries, that is, queries that are more than a simple SELECT a, b FROM c; A System for Building Queries When you're building queries, you could use a system like the following:  Decide which fields contain the values you want to use in our output, and how you wish to alias those fields Values you want to see in your output Values you want to use in calculations . For example, to calculate margin on a product, you could calculate price - cost and give it the alias margin. Values you want to filter with. For example, you might only want to see products that weigh more than 2Kg or that are blue. The weight or colour columns could contain that information. Values you want to order by. For example you might want the most expensive products first, and the least last. You could use the price column in descending order to achieve that. Assuming the fields you've picked in point 1 are in multiple tables, find the connections between those tables Look for relationships between tables and identify the columns that implement those relationships. For example, The Orders table could have a CustomerID field referencing the same column in the Customers table. Sometimes the problem doesn't use relationships but rests on a different field; sometimes the query is looking for a coincidence of fact rather than a foreign key constraint. For example you might have sales representatives who live in the same state as a customer; this information is normally not used in relationships, but if your query is for organizing events where sales representatives meet customers, it's useful in that query. In such a case you would record the names of columns at either end of such a connection. Sometimes relationships require a bridge, a junction table that wasn't identified in point 1 above but is needed to connect tables you need; these are used in "many-to-many relationships". In these cases you need to record the columns in each table that connect to similar columns in other tables. Construct a join or series of joins using the fields and tables identified in point 2 above. This becomes your FROM clause. Filter using some of the fields in point 1 above. This becomes your WHERE clause. Construct an ORDER BY clause using values from point 1 above that are relevant to the desired order of the output rows. Project the result using the remainder of the fields in point 1 above. This becomes your SELECT clause. A Worked Example   Let's say you want to query the world database to find a list of countries (with their capitals) and the change in GNP, using the difference between the GNP and GNPOld columns, and that you only want to see results for countries with a population greater than 100,000,000. Using the system described above, we could do the following:  The Country.Name and City.Name columns contain the name of the country and city respectively.  The change in GNP comes from the calculation GNP - GNPOld. Both those columns are in the Country table. 
This calculation is also used to order the output, in descending order To see only countries with a population greater than 100,000,000, you need the Population field of the Country table. There is also a Population field in the City table, so you'll need to specify the table name to disambiguate. You can also represent a number like 100 million as 100e6 instead of 100000000 to make it easier to read. Because the fields come from the Country and City tables, you'll need to join them. There are two relationships between these tables: Each city is hosted within a country, and the city's CountryCode column identifies that country. Also, each country has a capital city, whose ID is contained within the country's Capital column. This latter relationship is the one to use, so the relevant columns and the condition that uses them is represented by the following FROM clause: FROM Country JOIN City ON Country.Capital = City.ID The statement should only return countries with a population greater than 100,000,000. Country.Population is the relevant column, so the WHERE clause becomes: WHERE Country.Population > 100e6  To sort the result set in reverse order of difference in GNP, you could use either the calculation, or the position in the output (it's the third column): ORDER BY GNP - GNPOld or ORDER BY 3 Finally, project the columns you wish to see by constructing the SELECT clause: SELECT Country.Name AS Country, City.Name AS Capital,        GNP - GNPOld AS `Difference in GNP`  The whole statement ends up looking like this:  mysql> SELECT Country.Name AS Country, City.Name AS Capital, -> GNP - GNPOld AS `Difference in GNP` -> FROM Country JOIN City ON Country.Capital = City.ID -> WHERE Country.Population > 100e6 -> ORDER BY 3 DESC; +--------------------+------------+-------------------+ | Country            | Capital    | Difference in GNP | +--------------------+------------+-------------------+ | United States | Washington | 399800.00 | | China | Peking | 64549.00 | | India | New Delhi | 16542.00 | | Nigeria | Abuja | 7084.00 | | Pakistan | Islamabad | 2740.00 | | Bangladesh | Dhaka | 886.00 | | Brazil | Brasília | -27369.00 | | Indonesia | Jakarta | -130020.00 | | Russian Federation | Moscow | -166381.00 | | Japan | Tokyo | -405596.00 | +--------------------+------------+-------------------+ 10 rows in set (0.00 sec) Queries with Aggregates and GROUP BY While this system might work well for many queries, it doesn't cater for situations where you have complex summaries and aggregation. For aggregation, you'd start with choosing which columns to view in the output, but this time you'd construct them as aggregate expressions. For example, you could look at the average population, or the count of distinct regions.You could also perform more complex aggregations, such as the average of GNP per head of population calculated as AVG(GNP/Population). Having chosen the values to appear in the output, you must choose how to aggregate those values. A useful way to think about this is that every aggregate query is of the form X, Y per Z. The SELECT clause contains the expressions for X and Y, as already described, and Z becomes your GROUP BY clause. Ordinarily you would also include Z in the query so you see how you are grouping, so the output becomes Z, X, Y per Z.  
As an example, consider the following, which shows a count of countries and the average population per continent:

mysql> SELECT Continent, COUNT(Name), AVG(Population)
    -> FROM Country
    -> GROUP BY Continent;
+---------------+-------------+-----------------+
| Continent     | COUNT(Name) | AVG(Population) |
+---------------+-------------+-----------------+
| Asia          |          51 |   72647562.7451 |
| Europe        |          46 |   15871186.9565 |
| North America |          37 |   13053864.8649 |
| Africa        |          58 |   13525431.0345 |
| Oceania       |          28 |    1085755.3571 |
| Antarctica    |           5 |          0.0000 |
| South America |          14 |   24698571.4286 |
+---------------+-------------+-----------------+
7 rows in set (0.00 sec)

In this case, X is the number of countries, Y is the average population, and Z is the continent. Of course, you could have more fields in the SELECT clause, and more fields in the GROUP BY clause, as you require. You would also normally alias columns to make the output more suited to your requirements.

More Complex Queries

Queries can get considerably more interesting than this. You could also add joins and other expressions to your aggregate query, as in the earlier part of this post (there's a sketch of this below). You could have more complex conditions in the WHERE clause. Similarly, you could use queries such as these in subqueries of yet more complex super-queries. Each technique becomes another tool in your toolbox, until before you know it you're writing queries across 15 tables that take two pages to write out. But that's for another day...
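Here's the sketch promised above: a join combined with aggregation, counting cities and averaging city populations per continent. The tables and columns are the standard world schema; the aliases are my own:

SELECT Country.Continent,
       COUNT(City.ID)       AS Cities,
       AVG(City.Population) AS `Average City Population`
FROM Country JOIN City ON City.CountryCode = Country.Code
GROUP BY Country.Continent;

Note that this uses the other relationship between Country and City (the city's CountryCode), because here we want every city, not just the capitals.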


MySQL Certifications

For those of you wishing to capitalize on your hard-earned MySQL skills, there are two new MySQL certifications based on MySQL version 5.6:

- Oracle Certified Professional, MySQL 5.6 Developer: This certification displays your ability as a developer of MySQL applications and solutions, and covers language-specific topics and performance issues.
- Oracle Certified Professional, MySQL 5.6 Database Administrator: This certification proves you're a competent administrator of MySQL systems, and covers backups, replication, tuning, and other administrative tasks.

Each certification requires passing a single exam, and because they're new, they're currently going through a beta process and are much cheaper than usual, a snip at €39 ($50). Even so, you still get the full certification on passing the exam; there's no difference between a certification obtained during the beta process and one obtained afterward. Because it's in beta, the exams have more questions (this is to make sure that the beta process works and all questions get a good airing), and you'll need to wait a bit longer to get your results due to the internal analysis.

I should also mention the Oracle Linux 5 and 6 System Administration exam, also currently in beta, seeing as many of our MySQL training courses use that operating system.

The MySQL certification beta process runs until 14th December. You can register online to take the exams at any Pearson Vue examination center.


Connection Failures Between Nodes in MySQL Cluster

If you're setting up MySQL Cluster on Oracle Linux or another Linux such as Red Hat, CentOS or Fedora, you might have come across some problems getting the nodes to communicate. There are two ways you might bump into problems with network connectivity: the iptables firewall, and SELinux. These security mechanisms might prevent your various nodes (management, data, and API) from communicating with each other in various ways and with various symptoms. Let's have a look at what you're likely to see.

Data nodes stuck at "starting"

The first thing you might notice is that your data nodes get stuck in the "starting" state. Running show in the management client gives something like this:

[ndbd(NDB)]     2 node(s)
id=2    @192.168.56.211  (mysql-5.6.11 ndb-7.3.2, Nodegroup: 0, Master)
id=3    @192.168.56.212  (mysql-5.6.11 ndb-7.3.2, starting, Nodegroup: 0)

...and it never moves away from starting. Looking in the management node's log (possibly named something like ndb_1_cluster.log), you might see something like this, repeated many times:

[MgmtSrvr] INFO     -- Node 3: Initial start, waiting for 2 to connect,
  nodes [ all: 2 and 3 connected: 3 no-wait:  ]

This sort of thing appears too:

[MgmtSrvr] INFO     -- Node 2: Waiting 60 sec for non partitioned start,
  nodes [ all: 2 and 3 connected: 2 missing: 3 no-wait:  ]

Just to make matters worse, the first data node might start, but the second gets stuck at this point:

[ndbd] INFO     -- Start phase 0 completed

What might confuse you is that it's obvious that both data nodes have connected to the management node, but they're not talking to each other. The reason for this is grounded in how the various nodes communicate with data nodes.

Data Node Communication

Data nodes communicate with the management node on port 1186. This port is registered with IANA, and is often open on systems that have MySQL installed. Once a data node launches and has been given a node ID, it gets a unique dynamically allocated port (an "ephemeral port") on its host, and opens a socket so other hosts can communicate with it. This port is assigned by the operating system, and as a result cannot be predicted by the data node; IANA suggests a port range of 49152-65535, but in Linux the range is from 32768 to 61000. You can get this range with the following command:

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

This is where the problem lies: firewalls routinely block incoming traffic on those high ports, so while each data node can open a connection to the management node, they can't open connections to other data nodes if those nodes are listening on sockets that are blocked by the firewall. MySQL Cluster won't declare the cluster started until all data nodes have connected (unless you use --nowait-nodes, and in general you shouldn't), so they get stuck in "starting" until they can talk to other data nodes.
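You can confirm the diagnosis by looking at which ephemeral port each data node actually bound. This is a sketch, assuming your data node process is called ndbd (substitute ndbmtd if you run the multithreaded version), run as root on each data node host:

# netstat -tlnp | grep ndbd

Any listening socket reported with a high port number is one that the other data nodes must be able to reach; if the firewall blocks it, the nodes can't talk to each other.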
iptables and ServerPort

Many Linux servers use iptables as a firewall. To open the range of ephemeral ports in the Linux iptables firewall, use something like the following:

iptables -I INPUT -p tcp --dport 32768:61000 -s 192.168.56.0/24 -j ACCEPT

This isn't ideal, because you're opening thousands of ports, so MySQL Cluster provides a way that you can be more targeted in your setup. You can assign fixed ports to data nodes by using the ServerPort option in your management node's config.ini file, like this:

[ndbd]
Hostname=datahost1
ServerPort=50501

[ndbd]
Hostname=datahost2
ServerPort=50502

Make sure to use ports that are unique to that host, and are unlikely to be used by other services. Once you've done that, open those ports in the data node hosts' firewalls. To match the example shown above, open port 50501 on datahost1 and port 50502 on datahost2. For example, the following command, when run as root on datahost1, opens its socket:

iptables -I INPUT -p tcp --dport 50501 -s 192.168.56.0/24 -j ACCEPT

Once you've verified that this works, save your firewall rules with the following command (run as root) on each data node:

service iptables save

Stuck Management Nodes

Of course, it's also possible that your management node can't listen on port 1186; this is particularly possible on systems that you've installed by copying the binary from a compressed archive, tar.gz or similar. If so, you might need to open that port too by using a similar technique. For example, if you've configured two management servers, and port 1186 is not open on one of them, you'll see something like this when running show in the management client that cannot see the other:

ERROR Message: The cluster configuration is not yet confirmed by all defined management servers. This management server is still waiting for node 2 to connect.
Could not get configuration
*  4012: Failed to get configuration
*    The cluster configuration is not yet confirmed by all defined management servers. This management server is still waiting for node 2 to connect.

If one management node has port 1186 open and the other closed, the one that can communicate with the other displays this:

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from mgmhost1)
id=2    @192.168.56.216  (mysql-5.6.11 ndb-7.3.2)

iptables and PortNumber

In this situation, the one that appears to be working is actually the one with the blocked port; it can see the other, because the other's port is available. Again, a quick blast of iptables will fix the problem:

iptables -I INPUT -p tcp --dport 1186 -s 192.168.56.0/24 -j ACCEPT

You can, of course, change the management node's port. The relevant option is PortNumber, although bear in mind that 1186 is a registered port, so changing it requires changing not only your iptables configuration, but also SELinux and any application connect strings you use.
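For completeness, here's a sketch of what that option might look like in the management node's section of config.ini; the hostname matches the earlier example, and 1187 is an arbitrary alternative port chosen for illustration:

[ndb_mgmd]
Hostname=mgmhost1
PortNumber=1187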
MySQL Server Node failing to connect

When you install MySQL from RPM (either MySQL Server or MySQL Cluster), the installer adds SELinux configuration to the machine. The default configuration allows MySQL's server process mysqld to talk to ports 3306 and 1186. These are the ports of the mysqld process and the MySQL Cluster management node respectively. However, the default SELinux configuration doesn't know anything about MySQL acting as an API node in a MySQL Cluster, so it doesn't configure any permissions that allow the mysqld process to access the data nodes' dynamic ports.

This typically manifests itself as the data node appearing to connect, but running show displays something like this:

[mysqld(API)]   1 node(s)
id=5 (not connected, accepting connect from any host)

The MySQL error log shows this:

[Warning] NDB : Tables not available after 30 seconds. Consider increasing --ndb-wait-setup value

The cluster log shows this:

[MgmtSrvr] INFO     -- Nodeid 5 allocated for API at 192.168.56.215
[MgmtSrvr] INFO     -- Node 5: mysqld --server-id=1
[MgmtSrvr] INFO     -- Node 3: Node 5 Connected
[MgmtSrvr] INFO     -- Node 4: Node 5 Connected
[MgmtSrvr] INFO     -- Node 4: Node 5: API mysql-5.6.11 ndb-7.3.2
[MgmtSrvr] INFO     -- Node 3: Node 5: API mysql-5.6.11 ndb-7.3.2
[MgmtSrvr] ALERT    -- Node 4: Node 5 Disconnected
[MgmtSrvr] INFO     -- Node 4: Communication to Node 5 closed
[MgmtSrvr] ALERT    -- Node 3: Node 5 Disconnected

...and several more such lines. Attempts to create a clustered table result in a vague error:

ERROR 157 (HY000): Could not connect to storage engine

Running show warnings in the MySQL client gives a more bizarre result:

mysql> show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1296 | Got error 4009 'Cluster Failure' from NDB. Could not acquire global schema lock |
| Warning | 1296 | Got error 4009 'Cluster Failure' from NDB                                       |
| Error   |  157 | Could not connect to storage engine                                             |
| Error   | 1499 | Too many partitions (including subpartitions) were defined                      |
+---------+------+---------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

The MySQL error log adds lines like this:

[Warning] NDB: Could not acquire global schema lock (4009)Cluster Failure

SELinux and MySQL Cluster SQL Nodes

This all looks pretty disastrous, but is easily fixed by tweaking SELinux a bit. I've written about SELinux and MySQL in the past, and the same advice in that article applies here. You could, for example, disable SELinux entirely. But, being a more intelligent reader, you're more likely to want to know how to configure it. Once you've set ServerPort as shown above, you can easily let SELinux in on that particular secret and permit the mysqld process to access the ports you've assigned by running the following command as root on the MySQL host:

semanage port -a -t mysqld_port_t -p tcp 50501-50524

This opens 24 ports starting with the two in the configuration example shown earlier. Having done this, your MySQL Server instance can talk to data nodes on their fixed ports.

Other Connections

A lot of what's said in this post also applies to other connections. If you've got a Java application that uses ClusterJPA to talk to your cluster, you'd better make sure it can see the management nodes and that SELinux isn't preventing it from accessing the data nodes. Similarly, if you've changed your management node's port from the default, ensure you've configured the connection string correctly, with that port, in all of your applications' connect strings. This includes MySQL Server applications, where you'll need to include the revised port number in the ndb-connectstring option.

You don't need to tell your applications what hosts or ports your data nodes are at. That's the joy of a centralized configuration, isn't it?


How big is a database?

I got a question a while ago that I thought was quite simple, but turned into an interesting discussion: how much hard disk space does a database take up? As it happens, there's a simple answer and a much, much more involved (yet ill-defined) answer, and which one you choose depends on what you think goes into a database and how very, very pedantic you are.

The Simple Answer

You can query the size of table and index data using the Information Schema or SHOW TABLE STATUS. The columns DATA_LENGTH and INDEX_LENGTH contain the number of bytes for the table and index data respectively. You can construct simple queries based on that data, or do a quick search on the web for more interesting queries (for example, Peter Zaitsev of Percona wrote a post on Researching your MySQL Table Sizes). Adding up the totals for all tables in a database gives you a figure that shows how big the data rows and index contents for the database are. This really is where you should stop.
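As a sketch of such a query (the database name world is just an example; substitute your own):

SELECT TABLE_NAME,
       DATA_LENGTH,
       INDEX_LENGTH,
       DATA_LENGTH + INDEX_LENGTH AS TOTAL_BYTES
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'world';

Wrap the size columns in SUM() and drop TABLE_NAME to get a single total for the whole database.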
Let's Get Slightly Pedantic: Taking the File System's Perspective

In MySQL, databases are little more than file folders, so they don't consume space themselves. The space is consumed by the data within, specifically tables and indexes. Views, stored routines, and triggers are little more than command text, and so take a trivial amount of space; for now we can ignore them.

In MySQL 5.6, table and index data are stored in individual files in the appropriate directory. So, in my /var/lib/mysql/world_innodb folder, I've got City.ibd, Country.ibd, and CountryLanguage.ibd. To check the size of the database, I can simply read the size of that folder, which gives me the approximate size of the table and index data, along with any extra space allocated. The extra space comes from the fact that the file size is typically bigger than the amount of data stored within it, because InnoDB increases the file size by 64MB (8MB in 5.5) every time it fills up. You can change this value by setting the innodb_autoextend_increment option.

In MySQL 5.1 and 5.5, the default InnoDB configuration stores all InnoDB information in a single tablespace with a filename of (typically) /var/lib/mysql/ibdata1. By default, this file contains all table data and index information for all InnoDB tables in all databases, which means you cannot simply calculate the space taken on disk by the data of any single database. Although all InnoDB data defaults to a single tablespace file, you can change this by setting innodb_file_per_table, which is enabled by default in 5.6. By doing so, you store the table and index data for each table in a file named for the table, in the appropriate database directory.

Let's Get Really Pedantic: Going Beyond the File System

The simple answer at the top really is quite simple: it only considers data, so if there is no data, there are no rows taking up space. This means that regardless of how many functions, procedures, triggers, views, tables, permissions, or anything else you've configured in your (otherwise empty) database, MySQL considers that it is empty, and therefore takes up no space. Of course, if you're quite pedantic about it, you'll know better.

What Else is in a Database?

MySQL does not allocate space per database, but per table and any indexes associated with that table. If every table in a database contains no data, then MySQL considers that the database contains no data. Queries on DATA_LENGTH and INDEX_LENGTH show information about data (as opposed to metadata), so for the purposes of such a query, a database with no data takes up no space. This might be a quite simplistic way of looking at it, so let's consider the difference between this viewpoint (a database with no data takes up no space) and another (a database with objects, even empty ones, has to exist, and that must take up some space).

As well as table data and indexes, the other information MySQL maintains about a database is its metadata. Some of this metadata is contained within:

- The file system: as a database directory, and as db.opt, .frm, and .trg files
- The mysql database: stored routines, events, privileges, and other database metadata; these are stored in MyISAM tables, each of which contains information for all databases on the server

As the metadata is distributed in multiple locations, MySQL does not have a single location that you could point to and say "this is the database," so for the purposes of explanation, let's drill into my local copy of the sakila database.

Metadata on the File System

As an administrator, I can see that the structure of my sakila database takes up space in the form of the following file system artefacts:

- The database directory inode takes up 4KB on my machine.
- The db.opt file takes up 65 bytes (for my sakila database).
- Triggers take just over 1KB each (TRG and TRN files).
- Table metadata (.frm files) is about 8KB per table.
- View metadata (.frm files) depends on the complexity of the view, but let's say 1KB each.

On my system, these add up to 336KB for my slightly-extended copy of the sakila database.

Metadata in the mysql Database

There is also row data in the mysql database relevant to sakila:

- The mysql.proc table contains 6 rows for sakila; with an average row length of 1116 bytes (from SHOW TABLE STATUS FROM mysql LIKE 'proc';), that gives approximately 6KB.
- The mysql.db table has two rows on my system (I've set up a couple of users), at ~440 bytes each.
- The mysql.tables_priv table has one row at 851 bytes (unrepresentative, but included for detail).
- I have no mysql.event, mysql.columns_priv, or mysql.procs_priv rows relevant to sakila at this time (again, unrepresentative, but included for detail).

Have We Considered Everything Yet?

This isn't even the whole story; I haven't included mysql.user records for users who only use this database, or log file entries relevant to the sakila database. You can take this as a sign that what constitutes a database is actually quite a fuzzy thing.

So, taking all of that together, you might consider that the metadata for my sakila database takes up ~344KB, but it should be apparent that it is quite a difficult task to define what is and isn't sakila metadata, and there is certainly no built-in way to arrive at this figure in a comprehensive and consistent way that would satisfy the most pedantic administrator. Best to just consider the data, right?


Circular Replication in MySQL

Replication is a hot topic in MySQL 5.6, and for good reason: there are many excellent features that make it a strong, well-supported feature, from the new Global Transaction Identifiers (GTIDs), to simplified replication configuration and automated failover using MySQL Utilities (now available in alpha as a separate download).

Circular Replication

The simplest topology consists of a master server that accepts changes, and slaves that replicate those changes from the master. A common requirement is for a network to have multiple servers that accept changes and replicate to each other. This is possible by means of circular replication, where each master is also the slave of another master, in a circular fashion. However, this configuration is prone to certain problems.

Asynchronous Replication

Firstly, you have to know a bit about how replication works. MySQL replication is asynchronous, which means each server executes operations without waiting for another server to replicate them. It does this by logging every event and subsequently transmitting these events to connected slaves. This works perfectly, without conflicts, when you have a single master that accepts changes from client applications. That master can have any number of slaves that can execute queries for applications, but don't change the contents of the database.

Circular replication enforces replication from only one other master, so MySQL avoids some sorts of conflicts, such as time-order conflicts. A time-order conflict occurs when masters can replicate directly from two or more sources, and conflicting event pairs replicate to masters in a different order. This problem cannot occur in MySQL replication.

Circular replication does, however, introduce the possibility that two or more servers can replicate concurrent changes to each other. It is therefore subject to conflicts, because it is possible that two servers update the same row at the same (or nearly the same) time and that each subsequently replicates its changes to the other server. If the replicated changes apply to the same row, then that row ends up with a different value on each server. MySQL does not perform conflict resolution in such situations, so you have to make sure your application caters for that possibility.

Conflicts

Conflicts in circular replication occur when your application allows updating the same data (rows and related rows) on different servers. If you do not take care to avoid conflicts, you risk a situation where two servers accept conflicting changes at nearly the same time, and replicate them to the other side. For example, imagine a product that costs $520:

- The "promotions" team updates it on the sales server by subtracting $50
- At the same time, the "brands" team on the management server increases its price by 20%

+--------------------------------------+--------------+-------------------+
| Operation                            | Sales server | Management server |
+--------------------------------------+--------------+-------------------+
| (0) Initial price                    | $520         | $520              |
| (1) Promotions update (subtract $50) | $470         |                   |
| (2) Brands update (increase by 20%)  |              | $624              |
| (3) Promotions update replicates     |              | $574              |
| (4) Brands update replicates         | $564         |                   |
| (5) Final price                      | $564         | $574              |
+--------------------------------------+--------------+-------------------+

After each of the changes replicates to the other server (after being performed on its local server), the price on the sales server ends up being $564, and that on the management server (for the same product) is $574.
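In SQL terms, the two conflicting changes might look like this; the products table and the id value are invented for the example:

-- On the sales server:
UPDATE products SET price = price - 50 WHERE id = 42;

-- At (nearly) the same time, on the management server:
UPDATE products SET price = price * 1.2 WHERE id = 42;

Each statement is perfectly valid on its own; the trouble only appears when each replicates to the other server, where it executes after the other change instead of before it.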
There are no errors generated, because each server executes the statements in the order it receives them (either from a connected client application or a replicating master), which, due to the nature of asynchronous replication, is not always the same order on each server.

Further Examples

The preceding example describes a simple situation that could occur when you use statement-based replication. If you use row-based replication, you are not immune either, because there are several ways in which operations can conflict. Here are some further examples of conflicts that can occur whether you use statement- or row-based replication:

- One server deletes several rows based on a WHERE clause that should include a row that has just changed on the other server; the UPDATE propagates in one direction and the DELETE in the other, resulting in a row that exists on one server but not the other.
- One server updates a set of records at the same time as the other server updates an overlapping set.
- Two servers generate an identical report at exactly 18:00 each night, but one server executes an update statement immediately before the report, and that update replicates to the other server just after it generates its report.

In short, a conflict is likely to occur at some point if you allow changes to common data on two or more servers.

Avoiding Conflicts

The simplest way to avoid conflicts in a circular replication topology is to ensure that each master server can only update rows that no other master server changes. For example:

- One master server updates rows within the sales table but not the products table, and the other updates products but not sales.
- The stock table can be updated by one master during the day, when the warehouse is packing and dispatching; another updates it overnight when deliveries arrive.
- Masters can update each order based on the order's status: one master updates orders that have not yet shipped, another updates only shipped orders. Care must be taken that the order status changes in a controlled way, to avoid conflicts. For example, only the master with write access at that time (based on order status) can change its status; other masters use the order status to decide if they have write access at that time.

If you need to improve the performance of your replicated network, you can then scale out each master to multiple slaves to allow for greater bandwidth when running read-only queries. In practice, this means that you can load-balance queries across the slaves, and dedicate the master to handling writes for its tables.

Of course, to avoid conflicts altogether, you should avoid circular replication and ensure that only one master accepts writes. But where would be the fun in that?


Converting InnoDB Tables to MySQL Cluster

If you're considering using MySQL Cluster, you might be tempted to try it out on one of your existing databases. MySQL Cluster 7.3 (available as a development release at the time of this post) supports foreign keys, so it seems sensible to try migrating an InnoDB database with foreign keys to MySQL Cluster.

What won't work

For this example, I'm using the world_innodb.sql script available from the MySQL sample databases. It's a simple database with three tables: Country, City, and CountryLanguage. Both City and CountryLanguage have foreign keys referencing the Country table's Code field. So, you might try something like this:

ALTER TABLE Country ENGINE=NDB;
ALTER TABLE City ENGINE=NDB;
ALTER TABLE CountryLanguage ENGINE=NDB;

Sadly, this won't work; InnoDB won't let you convert a table from InnoDB to another engine if the table is at either end of a foreign key, to avoid dangling constraints. If you try to convert an InnoDB table to NDB using one of the above statements, this happens:

mysql> ALTER TABLE Country ENGINE=NDB;
ERROR 1217 (23000): Cannot delete or update a parent row: a foreign key constraint fails

Now, the clever among you might be aware of the foreign_key_checks variable, which disables InnoDB's foreign key constraint checking when you turn it off. It's useful when you're importing data from a script and don't want to check constraints until the import has finished. However, you can't use it to switch off constraints when changing a table's storage engine; it's designed to be used temporarily, for data imports, so it won't allow dangling constraints. Similarly, the ndb_deferred_constraints variable can't be used for this purpose either, because it doesn't affect InnoDB's constraint protection.

So how do I do it?

There are two ways to migrate InnoDB tables with foreign keys to NDB:

1. Dump the database and edit the script so each table specifies ENGINE=NDB before re-importing the script to a new database
2. Drop the constraints, alter the tables to use the NDB engine, and recreate the constraints

Dumping the database and editing the script is a straightforward use of mysqldump and a text editor (or a quick search-and-replace, sketched below).
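For example, something like this; a sketch, assuming the source database is called world_innodb, that you've already created an empty target database called world_ndb, and that you're happy for sed to rewrite every ENGINE clause in the dump (add login options such as -u root -p as needed):

mysqldump world_innodb > world.sql
sed -i 's/ENGINE=InnoDB/ENGINE=NDB/g' world.sql
mysql world_ndb < world.sql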
Dropping and recreating the constraints is more interesting, so I'll walk through that. First, I use SHOW CREATE TABLE to see the constraints' definitions, and particularly their names:

mysql> SHOW CREATE TABLE City\G
*************************** 1. row ***************************
       Table: City
Create Table: CREATE TABLE `City` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=latin1
1 row in set (0.02 sec)

Running the same command for the CountryLanguage table gives me the following constraint (the Country table does not have any foreign keys defined):

CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)

Now I know what they're called, I can drop them:

mysql> ALTER TABLE City
    -> DROP FOREIGN KEY city_ibfk_1;
Query OK, 4079 rows affected (0.95 sec)
Records: 4079  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE CountryLanguage
    -> DROP FOREIGN KEY countryLanguage_ibfk_1;
Query OK, 984 rows affected (0.58 sec)
Records: 984  Duplicates: 0  Warnings: 0

Then I can run the ALTER TABLE statements shown at the top of this post:

mysql> ALTER TABLE Country ENGINE=NDB;
Query OK, 239 rows affected (4.15 sec)
Records: 239  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE City ENGINE=NDB;
Query OK, 4079 rows affected (4.02 sec)
Records: 4079  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE CountryLanguage ENGINE=NDB;
Query OK, 984 rows affected (4.32 sec)
Records: 984  Duplicates: 0  Warnings: 0

Finally, I can recreate the constraints based on their definitions from the output of the SHOW CREATE TABLE statements above:

mysql> ALTER TABLE City
    -> ADD CONSTRAINT `city_ibfk_1`
    -> FOREIGN KEY (`CountryCode`)
    -> REFERENCES `Country` (`Code`);
Query OK, 0 rows affected (1.46 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE CountryLanguage
    -> ADD CONSTRAINT `countryLanguage_ibfk_1`
    -> FOREIGN KEY (`CountryCode`)
    -> REFERENCES `Country` (`Code`);
Query OK, 0 rows affected (1.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now that the constraints are back, I can see if they work:

mysql> DELETE FROM Country WHERE Code='IRL';
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (Unknown error code)

All is as expected, and now my InnoDB data is working in my cluster.


Running Out of Physical Disk Space

Problem: you've a large table (or two) in a database on a partition that's running out of space, and you want to see if you can move that table to another drive.

Solution: well, several actually. No silver bullet, but several options, some with conditions and some that require preparation. Let's look at some background information first.

How MySQL Stores Data

OK, that's somewhat of an ambitious heading for an incidental paragraph or two, so to tone it back a bit, I'll summarise briefly. The data directory is where MySQL stores databases, and it's set by the datadir server option. Each database is stored in a subdirectory of the data directory. You can also save a considerable amount of space without moving data around, by using features of the various storage engines, for example by enabling compression on InnoDB tables. If you don't use InnoDB, you can consider using the ARCHIVE storage engine to compress data.

Storage engines are what MySQL uses to do the grunt work of storing and retrieving data. InnoDB is the default and is quite full-featured and robust, but some systems still use MyISAM, which is fast but not as well-featured or robust. Other storage engines are available. MyISAM and InnoDB store data in the data directory. MyISAM stores data in .MYD files and index information in .MYI files, both in the database subdirectories mentioned above. InnoDB stores data and index information in .ibd files in the database subdirectories when you have the innodb_file_per_table option enabled. This is the default in MySQL 5.6, but must be enabled in 5.5; when disabled, InnoDB stores all data and indexes in a shared tablespace in the root of the data directory.

Now you know where the data lives, it's time to look at how to move it somewhere else.

Moving the Whole Data Directory

As the datadir option controls where MySQL stores its data, one option we have is to move the contents of the current data directory to a new partition, and then change the datadir option to point to that location. For example, on my machine, the datadir option points to the /var/lib/mysql directory. As expected, it contains a directory for each of my databases, and it also contains the InnoDB system tablespace files. To increase the space available, I could add a new drive and point the datadir there. Consider the following scenario:

1. Add a new hard drive, partitioned using a suitable file system such as ext4. That file system is visible to my OS as /dev/sdf1.
2. Back everything up to provide a rollback route.
3. Ensure that the datadir option points to the /var/lib/mysql directory (or note the configured directory otherwise; we'll assume /var/lib/mysql here).
4. Mount the partition at /data (creating that directory first if it doesn't exist) by doing the following:
   # mount /dev/sdf1 /data
5. Give the mysql user ownership of the new filesystem:
   # chown mysql:mysql /data
6. Stop MySQL.
7. Copy the contents of /var/lib/mysql to /data:
   # cp -a /var/lib/mysql/* /data
8. Set the datadir option to point to /data (see the snippet below).
9. Restart MySQL.

Beware of the various Mandatory Access Control subsystems that might affect you. If you've got AppArmor installed, you'll also need to change the profile for mysqld so it can access files in the new directory. If you've got SELinux installed, you'll need to add a context mapping for MySQL.
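For step 8, the setting lives in your MySQL configuration file. A minimal sketch, assuming your system reads /etc/my.cnf:

[mysqld]
datadir=/data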
In fact, you don't even have to point MySQL to the new partition. If the mountain won't go to Mohammed... let's undo the last couple of steps. Assume you haven't set the datadir option to point to /data, and that it still points to /var/lib/mysql. Also assume MySQL is still stopped. Now the data directory contents are at the new location, but on its next boot MySQL still looks in /var/lib/mysql.

1. Unmount the new partition /dev/sdf1 (or you can use its mountpoint, which is easier to type):
   # umount /data
2. Mount /dev/sdf1 at /var/lib/mysql:
   # mount /dev/sdf1 /var/lib/mysql
3. Restart MySQL.
4. Assuming it all works, you should then modify the file /etc/fstab to automatically mount /dev/sdf1 to /var/lib/mysql on reboot, so persisting your new configuration.

But I Only Want to Move One Database

Moving a whole database to another location is achieved by using symbolic links. A simple summary (the MySQL Reference Manual has more details):

- Make the new partition as in the example above.
- Rather than move the whole data directory, this time just move a single database directory; this only works with innodb_file_per_table enabled, because otherwise the InnoDB data is in the shared tablespace and can't be moved.
- At this point you can either mount the new partition onto the database directory, or use symlinks as described in the manual.

But I Only Want to Move One Table

Ah, now we're getting interesting. The technique for doing this depends on the version of MySQL you're running (I'll talk about 5.5 and 5.6) and what storage engine the table uses.

Firstly, in MySQL 5.5 you can use symbolic links only for tables that use the MyISAM storage engine, and only on certain operating systems such as Linux and Oracle Solaris. The technique is similar to that described above, although you can also use the DATA DIRECTORY and INDEX DIRECTORY options to CREATE TABLE when creating the tables, if you want to set up the links from within MySQL rather than at the Linux command prompt. You should not use symbolic links to InnoDB tables, because InnoDB stores some information about each table in the shared tablespace, even when you use the innodb_file_per_table option.

MySQL 5.6 also allows you to copy or move InnoDB tables to another server by performing a file copy, by using the transportable tablespaces feature. In MySQL 5.6 you can also move InnoDB tables to a different location by using the DATA DIRECTORY option to CREATE TABLE (there's a sketch at the end of this post). When you do so, the tablename.ibd file is placed in a subdirectory of that location named for the table's database, and a corresponding tablename.isl file is created in the database's directory that acts as a link to the table's .ibd file. The .isl file is not an actual symbolic link, but is treated as such by MySQL.

Plenty of Space

All told, you have many options available to you regardless of which storage engine or version you use, although it won't surprise you to know that MySQL 5.6 gives you more options than previous versions. Don't forget that modern UNIX-like operating systems often have mandatory access control systems such as AppArmor, SELinux, or Extended Policy, so be sure to do your homework before moving files around.
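And here's the CREATE TABLE sketch promised above. The table and its columns are invented; it assumes MySQL 5.6 with innodb_file_per_table enabled, and a /data partition that the server (and any mandatory access control policy) allows MySQL to write to:

CREATE TABLE big_archive (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload TEXT
) ENGINE=InnoDB
  DATA DIRECTORY = '/data';

MySQL then creates the .ibd file under a subdirectory of /data named for the database, and leaves the .isl pointer file in the usual database directory.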


SELinux and MySQL

I've previously written about AppArmor and MySQL, and how to change MySQL's default file locations on systems with AppArmor enabled. Ubuntu and SUSE ship with AppArmor enabled, but some other distributions such as Oracle Linux don't, along with related distributions such as Red Hat, CentOS and Fedora. Rather, these other distributions use another mandatory access control system called SELinux.

SELinux (Security-Enhanced, if you're interested) "is a Linux feature that provides the mechanism for supporting access control security policies" according to Wikipedia. More simply, it stops things (like programs) from accessing things (like files and network ports) they shouldn't access. By "shouldn't access" I really mean "haven't been configured to access". For example, MySQL is allowed to write to its data directory in /var/lib/mysql, and read from /etc/my.cnf. It can open a socket on port 3306, but SELinux prevents it from writing to files in /home/jeremy or /sbin or anywhere else that isn't already configured as a MySQL location.

Here's some technical detail that might come in handy later. SELinux uses concepts such as types and domains. Types belong to resources such as files and ports; these are the "objects" in SELinux. Domains contain the "subjects" (processes) and object types that are associated with each other in some way, for example because they are all related to MySQL. Each executable for a service that SELinux understands, including MySQL, has its own type. When the process runs, it is placed within a domain based on its type; members of that domain (normally processes) that wish to access objects (such as files and ports) must be authorised to do so by virtue of the object having a configured type that is approved for that domain. This configuration is contained within policy files that usually ship with each distribution, but can be created using the appropriate tools.

In short, if you try changing MySQL's port to a non-standard one, or try taking backups or configuring data files or log files to anywhere but the usual locations, you'll get some odd access-denied type errors in the MySQL error log. In addition, you'll get messages in /var/log/audit/audit.log (if auditd is running, otherwise /var/log/messages or /var/log/syslog, depending on how your system is configured).

What Error do I get?

To set this demonstration up, I've installed MySQL 5.6 on an Oracle Linux 6.3 system, with SELinux enabled. When I change the datadir option to /datadir (which contains a copy of the MySQL data directory, and has the correct permissions), the service does not start. Let's look at the errors. From the MySQL error log:

130321 11:50:51 mysqld_safe Starting mysqld daemon with databases from /datadir
...
2013-03-21 11:50:52 2119 [Warning] Can't create test file /datadir/boxy.lower-test
2013-03-21 11:50:52 2119 [Warning] Can't create test file /datadir/boxy.lower-test
...
2013-03-21 11:50:52 2119 [ERROR] /usr/sbin/mysqld: Can't create/write to file '/datadir/boxy.pid' (Errcode: 13 - Permission denied)
2013-03-21 11:50:52 2119 [ERROR] Can't start server: can't create PID file: Permission denied
130321 11:50:52 mysqld_safe mysqld from pid file /datadir/boxy.pid ended

Now, I'm very sure the permissions on that folder are correct, so let's have a look in /var/log/audit/audit.log:
type=AVC msg=audit(1363866652.030:24): avc:  denied  { write } for  pid=2119 comm="mysqld" name="datadir" dev=dm-0 ino=394 scontext=unconfined_u:system_r:mysqld_t:s0 tcontext=unconfined_u:object_r:default_t:s0 tclass=dir

A similar error occurs if I try starting MySQL on port 3307, a non-default port. In the MySQL error log:

2013-03-21 12:12:09 3436 [Note] Server hostname (bind-address): '*'; port: 3307
...
2013-03-21 12:12:09 3436 [ERROR] Can't start server: Bind on TCP/IP port: Permission denied
2013-03-21 12:12:09 3436 [ERROR] Do you already have another mysqld server running on port: 3307 ?
2013-03-21 12:12:09 3436 [ERROR] Aborting

In the audit log:

type=AVC msg=audit(1363867929.432:42): avc:  denied  { name_bind } for  pid=3436 comm="mysqld" src=3307 scontext=unconfined_u:system_r:mysqld_t:s0 tcontext=system_u:object_r:port_t:s0 tclass=tcp_socket

Clearly something is going on here. The Access Vector Cache (as seen in the "avc: denied" message) is where SELinux caches permissions for the kernel, so it's definitely SELinux doing the denying.
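You can see the same story from the file system's side. The -Z option to ls shows SELinux contexts, so a quick check of the new data directory (a sketch, using the /datadir path from this example) reveals the type that appeared in the AVC message:

# ls -dZ /datadir

The tcontext in the denial above, default_t, is the directory's type, and it isn't one that the mysqld_t domain is allowed to write to.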
Just Stop It!

I'm going to start with the hammer and work my way down to the scalpel. Here's the hammer:

[root@boxy ~]# setenforce 0
[root@boxy ~]# getenforce
Permissive

The setenforce 0 command switches off SELinux enforcing until the next reboot, and getenforce shows you the current status. To stop SELinux from enforcing on any reboot, you'll need to change a configuration file:

[root@boxy ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Change that enforcing to permissive (or disabled) and you're good to go. The difference:

- enforcing blocks operations that SELinux does not allow
- permissive does not block the operations, but logs them (to /var/log/audit/audit.log)
- disabled switches off SELinux entirely, to the extent that you cannot use setenforce until you change it and reboot

For example, on a machine with SELinux set to permissive, I can do the following:

[root@boxy ~]# setenforce 1
[root@boxy ~]# getenforce
Enforcing

But if it's disabled, this happens:

[root@boxy ~]# setenforce 0
setenforce: SELinux is disabled
[root@boxy ~]# setenforce 1
setenforce: SELinux is disabled

That's the hammer. So, to return to the example that generated the error, I can use the hammer:

[root@boxy ~]# setenforce 0
[root@boxy ~]# service mysql start --datadir=/datadir
Starting MySQL. SUCCESS!
[root@boxy ~]# service mysql stop
Shutting down MySQL.. SUCCESS!

If you're happy with that, you could then edit the configuration file to disable SELinux on next reboot, and thanks for reading. See you next time.

I'm intrigued. How do I configure it?

Obviously, there's a lot more to SELinux than disabling it, and a responsible admin (that's you, right?) wants to know how to use it rather than disable it. I'm not going to get into too much detail here. We can, however, look at how you can assign SELinux types to objects such as ports and files, so that members of the mysqld_t domain (specifically the mysqld_safe process, launched when you run service mysql start) can access those objects. So here's the scalpel.

First, let's configure SELinux to enable MySQL's use of port 3307:

[root@boxy ~]# semanage port -a -t mysqld_port_t -p tcp 3307

You'll need to install the policycoreutils-python package to use the semanage utility. The semanage utility changes various SELinux settings. In this case, it adds (-a) a type (-t mysqld_port_t) to the port mappings for port 3307, using TCP as its protocol (-p tcp). When MySQL (through the mysqld_safe process) tries to access that port, SELinux recognises that the port has a type that is approved for such access by the policy.

We can also allow MySQL to use the /datadir directory:

[root@boxy ~]# semanage fcontext -a -t mysqld_db_t "/datadir(/.*)?"
[root@boxy ~]# restorecon -Rv /datadir
restorecon reset /datadir context unconfined_u:object_r:default_t:s0->unconfined_u:object_r:mysqld_db_t:s0
restorecon reset /datadir/mysql.sock context system_u:object_r:mysqld_var_run_t:s0->system_u:object_r:mysqld_db_t:s0

In this example, semanage is adding the type mysqld_db_t to the file context map (fcontext) for anything in the /datadir directory and its subdirectories ("/datadir(/.*)?", a regular expression). File mappings such as this are contained in the file /etc/selinux/targeted/contexts/files/file_contexts.local; that file must subsequently be read in order to set the appropriate type on the files themselves. That's done by the restorecon utility, and at system reboot. If you want to change a file context immediately, but don't need it to survive a reboot, there's a chcon utility that performs that task.

The same principles and statements apply if you wish to use other ports or directories. There are similar types that apply to different kinds of files; I used mysqld_db_t above for database directories, but the standard SELinux policy for MySQL also includes:

- mysqld_etc_t for configuration files such as /etc/my.cnf
- mysqld_log_t for logfiles such as those named /var/log/mysql*
- Types for the PID file, tmp files, service startup files in the /etc/init.d directory, and the various executables you're likely to use

As you can see, you can get quite fine-grained as you wield your configuration scalpel. Personally, I've had mixed results using things like mysqld_log_t for custom logfile locations, but I got around it initially by using mysqld_db_t (as for data files), and subsequently by using a custom-made policy file.

Conclusion

This post is already long enough, so I won't get into the deeper topics in SELinux, such as the ability to compile your own policy files and configure new policies for services that SELinux doesn't yet know about. At this stage, you know how to add an SELinux type to an object such as a port or a file so that MySQL can access that object, even if it's not used by default. You also know how to disable SELinux in a couple of ways, but you're not going to do that now, are you? You've got a perfectly good scalpel, after all. Why use a hammer?


NoSQL with MySQL's Memcached API

One of our training courses has a section covering MySQL's Memcached API, and how it works. In the discussion, there's a line that goes like this:

"A key is similar to a primary key in a table, and a value is similar to a second column in the same table"

For someone well versed in database tables but not so much in key-value stores, that sentence might take a bit of grasping. So, let's break it down.

An Example Key/Value Store

Imagine the table kvstore has a column key and a column value. Also imagine that we've set up the Memcached plugin in MySQL and configured it to use that table and those columns as its store. I won't get into that bit for now, but trust me, it's not that hard. You might be familiar with statements like this:

REPLACE INTO kvstore (key, value) VALUES ('X', 'Y');
SELECT value FROM kvstore WHERE key='X';

Now imagine you want to be able to get at your values with a simple syntax like this:

set X Y ...
get X ...

...or if you're using PHP (and have a MySQL memcached connection called $m), you might want something like this:

$Y = $m->get($X);
$m->set($X, $Y, ...);

Note the simplicity of the latter statements, and the absence of the words "kvstore", "key", or "value"; once you've configured the Memcached plugin, you don't have to worry about those. Remember, it's a key/value store, so we just worry about the keys and the values and let the configuration take care of everything else. The memcached API merely provides you with a different way of getting at your InnoDB data.

Although I've greatly simplified the memcached protocol examples above, you get the idea. Simply put, you can read and write values without having to construct entire SQL statements. Not only is this faster for you, the programmer, but it's also faster for MySQL. Now, SQL is a much more expressive language, and lets you do things like aggregation and range-based processing, but at a cost: the SQL parser, statement normalizer, and optimizer have to do their jobs regardless of whether it's a simple SELECT or a complex aggregated join with subqueries, whereas the memcached API just does one thing per statement, so you bypass all the extra CPU work that comes before the storage engine kicks in. So, why bother?

An Example Use Case

MySQL's NoSQL feature is comparable with other key-value caches. These are often used on the web when you want to serve pieces of the page quickly, but the page as a whole differs between users. Imagine you've got 100,000 users of a web page, and each one wants to see a different page when they're logged in. You want to see your login name and your current karma/kudos/rep at the top right, and I want to see my relevant information on mine. The navigation div at the top and the common links on the left will be the same for both of us, but my list of "tags" (or subsections or whatever other personal navigation shortcuts) will belong to me and yours to you. Then the body content is going to be the same (per article) but different (per page), so my front page will show articles 41, 42, 45, and yours will show articles 42, 43, 44, depending on our options. And the same goes for each of the other 100,000 users who visit the site and want their own personal content.
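Before looking at how that changes the page logic, here's a sketch of the basic read-through pattern in PHP, using the standard Memcached extension; the key name and the render_article() helper are invented for the example, and the host and port are assumptions for a local memcached listener:

<?php
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);  // the memcached API's default port

$key = 'article-42';
$html = $m->get($key);
if ($html === false) {
    // Cache miss: generate the fragment and store it for next time
    $html = render_article(42);     // hypothetical HTML-generating function
    $m->set($key, $html, 300);      // expire after 300 seconds
}
echo $html;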
Now the page generation logic, rather than reading the entire navigation code and articles and navigation shortcuts and so on for each page, becomes a matter of asking the cache for the HTML values contained in the keys "user12345-header" and "common-nav" and "common-list" and "user12345-navshortcuts" and "article-41" and "article-42" and so on. So the page processing becomes a case of asking the cache for content rather than generating the content each time. This provides massive performance improvements over generating each page's content per page view, although you need to change your logic to catch cache misses and generate the HTML code to put into the cache accordingly, and to handle session/cache timeouts.

MySQL's Advantage

In MySQL this is even better than many other key-value stores, because some cache misses are automatically taken care of; with InnoDB as the backing store, you've got two levels of "cache hit". The first is when the page is already in the buffer pool, so you're getting the key's value from RAM (which is about as fast as existing key-value caches); but if the page isn't in the buffer pool, InnoDB automatically reads it from the table. This isn't anything new; it's how InnoDB already works with table data. With other key-value stores, if the value isn't already in RAM, you just get a cache miss and your code has to generate the data. With InnoDB, you've got this effective second-level cache that stores stuff on disk when it's not in RAM.

Plus you get the benefit of other parts of the application being able to use the power of the SQL API (remember those SELECTs and subqueries and joins and aggregation and all the other fun stuff that key-value stores don't have?) to run reports on all your cache data. That's why they call it "Not only SQL". And you get this without having to do any ETL transfers or writing clever persistence logic for your cache. You get that bit for free.


AppArmor and MySQL

MySQL accesses files in various places on the file system, and usually this isn't something to worry about. For example, in a standard MySQL 5.5 installation on Ubuntu, the data goes in /var/lib/mysql, and the socket is a file in /var/run/mysqld. It puts configuration files in /etc, logs and binaries in various locations, and it even needs to access some operating system files such as /etc/hosts.allow. This is all very well until you start trying to be clever and get MySQL to access other parts of the file system. After all, you can configure the location of data, log files, socket, and so on, so why shouldn't you use those settings to optimize your system? Unfortunately, on many modern Linux distributions, it's not always that easy.

Take Ubuntu, for example. Ubuntu comes with something called AppArmor, a kernel-integrated application security system that controls how applications can access the file system. This goes above and beyond normal permissions, and as a result can sometimes be a bit confusing.

TL;DR

First, here's the quick version of this post: if you want to relocate the data directory in MySQL (in this example, to the /data directory), and AppArmor is not letting you, add the following two lines to the bottom of /etc/apparmor.d/local/usr.sbin.mysqld:

/data/ r,
/data/** rwk,

...and then reload the AppArmor profiles:

# service apparmor reload

The Demonstration

To demonstrate this in a bit more detail, I've done the following:

- Installed a stock MySQL 5.5 from the Ubuntu 12.04 repository
- Created the /data directory, owned by the mysql user and group
- Copied my data directory to /data with cp -a

Now, when I start MySQL with the new data directory, I get the following:

[root@boxy ~]# mysqld_safe --datadir=/data
130130 21:31:51 mysqld_safe Logging to syslog.
130130 21:31:51 mysqld_safe Starting mysqld daemon with databases from /data
130130 21:31:51 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

...and it dies. As it's logging to syslog, let's look there:

Jan 30 21:31:51 boxy mysqld: 130130 21:31:51 InnoDB: Completed initialization of buffer pool
Jan 30 21:31:51 boxy mysqld: 130130 21:31:51  InnoDB: Operating system error number 13 in a file operation.
Jan 30 21:31:51 boxy kernel: [81703.213926] type=1400 audit(1359581511.909:36): apparmor="DENIED" operation="open" parent=16198 profile="/usr/sbin/mysqld" name="/data/ibdata1" pid=16538 comm="mysqld" requested_mask="rw" denied_mask="rw" fsuid=116 ouid=116
Jan 30 21:31:51 boxy mysqld: InnoDB: The error means mysqld does not have the access rights to
Jan 30 21:31:51 boxy mysqld: InnoDB: the directory.

The final two lines say mysqld doesn't have access to the directory, even though I've changed the ownership (both user and group) to mysql. If you haven't come across AppArmor before, this is about where you start to get confused. However, that big "DENIED" is a bit of a giveaway, and it's associated with apparmor, so let's have a look at AppArmor's status:

[root@boxy ~]# aa-status
apparmor module is loaded.
18 profiles are loaded.
18 profiles are in enforce mode.
   /sbin/dhclient
...
   /usr/sbin/cupsd
   /usr/sbin/mysqld
   /usr/sbin/tcpdump
0 profiles are in complain mode.
2 processes have profiles defined.
...

There's a profile loaded for the mysqld process, which could be what's blocking it from accessing /data.
There are a couple of quick and dirty ways to get past this. You could, for example, disable AppArmor; it's a service, so you could uninstall it, or stop it with the special teardown command to unload all profiles. You could even delete the offending profile if you want rid of it.

A less extreme option is to use the apparmor-utils package, which contains the utilities aa-complain and aa-enforce. These allow you to work with existing profiles without removing them or stopping AppArmor entirely:

[root@boxy ~]# aa-complain /usr/sbin/mysqld
Setting /usr/sbin/mysqld to complain mode.

As you can probably guess, complain mode simply whines when a process accesses something on the file system that it shouldn't, whereas enforce mode is what stops such access.

[root@boxy ~]# aa-status
apparmor module is loaded.
18 profiles are loaded.
17 profiles are in enforce mode.
   /sbin/dhclient
   ...
   /usr/sbin/tcpdump
1 profiles are in complain mode.
   /usr/sbin/mysqld
2 processes have profiles defined.
...

Now that the profile is in complain mode, we can check to see if MySQL starts:

[root@boxy ~]# mysqld_safe --datadir=/data
130130 21:34:16 mysqld_safe Logging to syslog.
130130 21:34:16 mysqld_safe Starting mysqld daemon with databases from /data

Now that we know AppArmor is the reason MySQL won't start, we can switch back to enforce mode before going through the proper configuration:

[root@boxy ~]# aa-enforce /usr/sbin/mysqld
Setting /usr/sbin/mysqld to enforce mode.

Time to look under the covers.

Under the Covers: AppArmor's Policy Files

When you install MySQL on Ubuntu, it places an AppArmor policy file in /etc/apparmor.d/usr.sbin.mysqld. Another policy file gets placed in /etc/apparmor.d/local/usr.sbin.mysqld, which is initially empty (aside from comments) but exists to let you add non-standard policies, such as those specific to this machine. In practice, you could add such policies to either file, but for now I'll put them in the local file. There's also a cached policy file, which is a binary compiled version of the policy. We can happily ignore that; it's automatically generated from the policy text files.

Here are some of the contents of /etc/apparmor.d/usr.sbin.mysqld:

# vim:syntax=apparmor
# Last Modified: Tue Jun 19 17:37:30 2007
#include <tunables/global>

/usr/sbin/mysqld {
  #include <abstractions/base>
...
  /var/log/mysql.err rw,
  /var/lib/mysql/ r,
  /var/lib/mysql/** rwk,
  /var/log/mysql/ r,
  /var/log/mysql/* rw,
...
  # Site-specific additions and overrides. See local/README for details.
  #include <local/usr.sbin.mysqld>
}

In the middle are the file system policies. Looking at the settings for the existing data directory /var/lib/mysql, you can see that the profile gives read (r) access to the directory itself, and read, write, and lock (rwk) access to its contents recursively (controlled by the **). Conveniently, it also #includes the contents of the local file.

Editing the Policy

To give MySQL the necessary access to the /data directory, I edit the included local file so it looks like the following:

[root@boxy ~]# cat /etc/apparmor.d/local/usr.sbin.mysqld
# Site-specific additions and overrides for usr.sbin.mysqld.
# For more details, please see /etc/apparmor.d/local/README.
/data/ r,
/data/** rwk,

As you can see, I haven't been particularly creative; I've just taken the policy that applies to the standard data directory /var/lib/mysql and copied it into this file, mapping the same settings onto the new /data directory.
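If you're scripting this change rather than opening an editor (say, for a configuration-management tool), you can append the rules non-interactively. A small sketch, not part of the original walkthrough, using the same paths as above:

# Append the new rules to the local policy file; note >> rather than >,
# which would overwrite any site-specific rules already present
[root@boxy ~]# cat <<'EOF' >> /etc/apparmor.d/local/usr.sbin.mysqld
/data/ r,
/data/** rwk,
EOF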
Also, although I've put this in the local version of the policy file, I could just as easily have modified the main policy file. Next, reload the AppArmor profiles:

[root@boxy ~]# service apparmor reload
 * Reloading AppArmor profiles
Skipping profile in /etc/apparmor.d/disable: usr.bin.firefox
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd

No errors about my new profile settings; the default configuration disables some AppArmor policies, but nothing I need to be concerned with. Finally, the moment of truth: can we start MySQL?

[root@boxy ~]# mysqld_safe --datadir=/data
130130 21:38:42 mysqld_safe Logging to syslog.
130130 21:38:42 mysqld_safe Starting mysqld daemon with databases from /data

Success!

Conclusion

It's worth pointing out that this technique applies whenever you want to change where MySQL puts anything on the file system. Although the use case described here is a common first reason to bump up against AppArmor's security policies, the data directory is not the only thing you might want to move. Logs, the UNIX socket, and even configuration files are subject to the controls placed in the AppArmor policy. This also includes any files you access with anything that uses the FILE privilege, such as SELECT ... INTO OUTFILE, LOAD DATA INFILE, or the LOAD_FILE() function. While you can secure this sort of access with MySQL's secure_file_priv option, AppArmor provides a further layer of security that prevents even currently unknown exploits from accessing parts of the file system that they really, really shouldn't.
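To make that last point concrete, suppose you want SELECT ... INTO OUTFILE to write only under a dedicated export directory. The path /var/mysql-files below is my own choice for the example, not a standard location; adjust it to suit. The two layers would look something like this. In the server option file:

# [mysqld] section of the server option file (e.g. /etc/mysql/my.cnf)
[mysqld]
secure_file_priv = /var/mysql-files

And in /etc/apparmor.d/local/usr.sbin.mysqld:

# Hypothetical export directory; MySQL only needs read/write here, not lock
/var/mysql-files/ r,
/var/mysql-files/** rw,

With both in place, the server refuses FILE-privilege operations outside /var/mysql-files, and even if some exploit sidesteps that check, AppArmor still denies the access at the kernel level. Remember to reload the profiles after editing, as shown above.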


Performing an Unattended Uninstall of MySQL in Windows

The MySQL Installer is an easy way to install and maintain your MySQL installations on Microsoft Windows. It comes not only with the database server and documentation, but also with the underrated MySQL Workbench, and various connectors and support utilities such as MySQL for Excel. Modifying the installed products or uninstalling them is also fairly straightforward:

Start -> All Programs -> MySQL -> MySQL Installer -> MySQL Installer

This launches the Maintenance window of the Installer GUI, and gives you the options:

- Add/Modify Products and Features
- Check for Updates
- Remove MySQL Products

I had a question from someone who wanted to automate a process in an environment where they tear down and reinstall MySQL to test deployments. The GUI-based way to change or remove MySQL is manual and requires human intervention; they wanted to automate as much as possible, and thereby install and uninstall MySQL repeatedly and automatically, from a script run at the command prompt. This is quite easy to script if you've got the MSI file you used to install MySQL, as documented here. I thought I'd go a bit further into it and explore the mechanism a little more.

Control Panel and the Registry

Before we get into the how of unattended uninstallations: the Windows Control Panel also lets you uninstall from the Programs and Features applet (which used to be called Add/Remove Programs). For MySQL, this offers similar options to the Installer. This mechanism (which isn't specific to MySQL) relies on a centralised repository of change/uninstall commands. After a bit of investigation, I found the location in the registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{26FFE68D-7BD4-472A-9AB8-3517B6BF9E51}

In that location, the UninstallString key contains the following text:

MsiExec.exe /I{26FFE68D-7BD4-472A-9AB8-3517B6BF9E51}

You might recognise part of that statement from the article Automating MySQL Installation linked to above. Running that MsiExec.exe command launches the uninstall GUI, just as if you'd clicked it from Control Panel.

Note: The Registry is a dangerous place to play around in, even for experienced people, so be exceptionally careful when working in the registry and when using MsiExec. The GUID (the value between the braces) is correct on my machine, but differs from version to version. You can find the correct value by searching HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\ in the Windows Registry for "MySQL Server" and finding the GUID that the Windows Installer uses for the program.

Tweaking MsiExec.exe

Now, to automate it, several MsiExec.exe options help:

- /X bypasses the front screen and goes straight to the uninstall process
- /quiet disables the GUI entirely and automates the uninstall (if possible, which it is in MySQL's case)
- /norestart disables the post-uninstall reboot that is needed in some cases
- /L enables logging, with various options (the following example uses /L*V to log in verbose fashion)

Performing an Unattended Uninstall

Bringing all of this together, in a command prompt started with the "Run as Administrator" option, type the following (all on one line):

MsiExec.exe /x {26FFE68D-7BD4-472A-9AB8-3517B6BF9E51} /quiet /norestart /L*V "C:\MySQL-uninstall.log"

This performs a quiet (no-GUI, i.e. unattended) uninstall, without a reboot, logging in verbose format to the file C:\MySQL-uninstall.log. Follow a similar technique to automate the uninstallation of other components that the MySQL Installer installs, or indeed of any well-behaved Windows application.
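Since the GUID varies from machine to machine, a fully automated script shouldn't hard-code it. One way to look it up from the command line, as a sketch rather than part of the procedure above, is the reg query command:

reg query "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall" /s /f "MySQL Server" /d

This searches the uninstall keys recursively for entries whose data contains "MySQL Server"; the key name printed above each match is the GUID to pass to MsiExec.exe /x. Verify the output format on your own system before parsing it in a script.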


Automating Backups

If you need to automate backups, you might wonder about the different techniques available to you. With regards to scheduling backups using built-in features of MySQL, you have two main options:

- Run mysqldump (or mysqlbackup if you have an Enterprise licence) from an operating system scheduler, for example in Linux using "cron" or in Windows using the "Task Scheduler". This is the most commonly used option.
- Alternatively, use the Event Scheduler to perform a series of SELECT ... INTO OUTFILE ... commands, one for each table you need to back up. This is a less commonly used option, but you might still find it useful.

Scheduling mysqldump with cron

mysqldump is a client program, so you run it from a shell script, or at a terminal, rather than inside a MySQL statement. The following command backs up the sakila database using settings stored in the file /home/bk/bk.cnf:

mysqldump --defaults-file=/home/bk/bk.cnf sakila > /home/bk/backup.sql

Typically, you would place login information into an options file to avoid specifying a password on the command line. Bear in mind that the options file stores the password in plain text, so you should ensure it is secured against unauthorised access. An options file such as the one used in the preceding command could look like the following, assuming that the user bk exists with the password as shown:

[mysqldump]
user=bk
password=bkpwd123

To schedule a backup using mysqldump, use an operating system scheduler such as cron in Linux. Without getting into too much detail, cron uses a configuration file (called a crontab) that allows you to specify a recurring schedule based on minutes, hours, day (of the month or of the week), and month. The following line (placed in the system's crontab) runs a backup every Sunday at 3:15am:

#m  h  dom mon dow user command
15  3  *   *   7   bk   mysqldump --defaults-file=/home/bk/bk.cnf sakila > /home/bk/backup.sql

For a complete discussion of cron and the crontab, your operating system's manpages should have an entry visible with the man crontab command. Wikipedia has a good overview too.

The Event Scheduler

The Event Scheduler runs from within MySQL, so you can use it to schedule any valid MySQL statement that you can use within a query or stored routine. In particular, you can use the Event Scheduler to schedule a statement such as SELECT ... INTO OUTFILE ..., but you cannot invoke a command-line program such as mysqldump. For example, you could schedule a regular dump of the sakila.payment table with the following sequence:

DELIMITER //
CREATE EVENT payment_backup
ON SCHEDULE EVERY 1 WEEK STARTS '2013-01-20 03:15:00'
DO BEGIN
  SELECT * INTO OUTFILE '/home/bkuser/payment.tsv'
  FROM sakila.payment;
END//
DELIMITER ;

The preceding statement creates a scheduled event that runs weekly, starting at 3:15am on 20th January 2013, a time that is still in the future as I write this. If you create an event starting some time in the past, it is immediately discarded with a warning, so bear that in mind if you're copying and pasting from this post. The Event Scheduler is enabled with the global event_scheduler variable. This variable is set to OFF by default, so to enable it you must set it to ON.
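A quick sketch of checking and enabling the scheduler at runtime:

mysql> SHOW VARIABLES LIKE 'event_scheduler';
mysql> SET GLOBAL event_scheduler = ON;

Note that a runtime SET GLOBAL does not survive a server restart; to make the setting permanent, add event_scheduler=ON to the [mysqld] section of your server's option file as well.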
Comparing the Techniques

Backups created with mysqldump are files containing SQL statements to recreate the database and its data; one backup typically consists of a single SQL file that can be executed as a whole to recreate the database(s) along with all data. On the other hand, the files created by SELECT ... INTO OUTFILE ... statements contain tab-separated data values by default, and are usually one file per table. Although you have the flexibility of dumping data from multiple tables in a single query, the resulting file cannot really serve as a backup, because restoring from such a file would be impractical.

The files are of very different formats, so you would choose your automated backup method based not only on how the schedule and syntax work, but also on how best to restore files created using the two techniques, and on whether you wish to use the files in some other way. For example, many common spreadsheet programs can read TSV files, so you could use files created with SELECT ... INTO OUTFILE ... for analysis as well as for backups. Also bear in mind the need to build a recovery strategy around the file formats you have; there's no point going to all the effort of taking backups if you can't recover from them.

There are several other alternatives you can use to back up your databases, including MySQL Enterprise Backup (with similar scheduling considerations to those of mysqldump mentioned earlier in this post). You can also use file-system snapshots or raw file copies, neither of which makes use of mysqldump or the Event Scheduler, and each of which has its own interesting features and limitations.
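To make the recovery side concrete, here is roughly how you would restore each format, using the file names from the examples above and assuming the bk account has the necessary privileges:

# Restore a mysqldump backup by replaying the SQL file through the mysql client
mysql -u bk -p sakila < /home/bk/backup.sql

-- Restore a tab-separated dump into its (already existing, empty) table;
-- LOAD DATA INFILE's default field and line terminators match those of
-- SELECT ... INTO OUTFILE, so no extra options are needed here
LOAD DATA INFILE '/home/bkuser/payment.tsv' INTO TABLE sakila.payment;

The asymmetry is the point of the comparison: the SQL dump recreates table structure and data together, while the TSV file restores data only into a table whose structure you must already have.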


Excluding a Table From Backup

Let's say you have a database that stores not only current transactional data, but also historic data that's unchanging. In particular, you have a large table containing hundreds of gigabytes worth of last year's data, and it won't change. Having backed it up already, you don't need to back it up every time. Is there any way to exclude this table from a backup?

For InnoDB tables with innodb-file-per-table enabled (the default as of MySQL 5.6), MySQL Enterprise Backup supports this, although in inverse form: rather than excluding tables, you choose which innodb-file-per-table tables to include in addition to those stored in the system tablespace. So, to exclude a specific table, you provide a regular expression to the --include option that matches every table except the one you want to exclude. For example, in my sakila database, I know that the payment table is the only one beginning with "p", so if I want to exclude it, I can do this:

# mysqlbackup --datadir=/var/lib/mysql --backup_dir=/backups \
>        --include 'sakila\.[^p].*' backup-and-apply-log

The following link contains detailed information on the --include option:

http://dev.mysql.com/doc/mysql-enterprise-backup/3.7/en/only-innodb-usage.html

You can also specify filters for non-InnoDB tables using options such as --databases and --databases-list-file. The following link covers a variety of different partial backup techniques:

http://dev.mysql.com/doc/mysql-enterprise-backup/3.7/en/partial.html
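Because a slightly wrong pattern silently changes what gets backed up, it's worth sanity-checking which tables a pattern matches before running the backup. This sketch tests the pattern against the data dictionary using the server's REGEXP operator; bear in mind that the regular-expression dialect mysqlbackup uses for --include is not guaranteed to be identical to the server's, so treat this as a rough check rather than a proof:

mysql> SELECT CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS matched
    ->   FROM INFORMATION_SCHEMA.TABLES
    ->  WHERE TABLE_SCHEMA = 'sakila'
    ->    AND CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) REGEXP '^sakila\\.[^p]';

In this example, payment should be the only sakila table missing from the result.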

