Thursday Oct 24, 2013

Keeping up with New Releases

You can keep up with the latest developments in MySQL software in a number of ways, including various blogs and other channels. However, for the most correct (if somewhat dry and factual) information, you can go directly to the source. 

Major Releases 

For every major release, the MySQL docs team creates and maintains a "nutshell" page containing the significant changes in that release. For the current GA release (whatever that is) you'll find it at this location:

 At the moment, this redirects to the summary notes for MySQL 5.6. The notes for MySQL 5.7 are also available at that website, at the URL, and when eventually that version goes GA, it will become the currently linked notes from the URL shown above.

Incremental Releases 

For more detail on each incremental release, you can have a look at the release notes for each revision. For MySQL 5.6, the release notes are stored at the following location:

At the time I write this, the topmost entry is a link for MySQL 5.6.15. Each linked page shows the changes in that particular version, so if you are currently running 5.6.11 and are interested in what bugs were fixed in versions since then, you can look at each subsequent release and see all changes in glorious detail.

One really clever thing you can do with that site is do an advanced Google search to find exactly when a feature was released, and find out its release notes. By using the preceding link in a "site:" directive in Google, you can search only within those pages for an entry. For example, the following Google search shows pages within the release notes that reference the --slow-start-timeout option:

    site: "--slow-start-timeout"

By running that search, you can see that the option was added in MySQL 5.6.5 and also rolled into MySQL 5.5.20. 

 White Papers

Also, with each major release you can usually find a white paper describing what's new in that release. In MySQL 5.6 there was a "What's new" whitepaper at this location:

You'll find other white papers at:

Search the page for "5.6" to see any papers dealing specificallly with that version.

Thursday Oct 17, 2013

Building Queries Systematically

The SQL language is a bit like a toolkit for data. It consists of lots of little fiddly bits of syntax that, taken together, allow you to build complex edifices and return powerful results. For the uninitiated, the many tools can be quite confusing, and it's sometimes difficult to decide how to go about the process of building non-trivial queries, that is, queries that are more than a simple SELECT a, b FROM c;

A System for Building Queries

When you're building queries, you could use a system like the following: 

  1. Decide which fields contain the values you want to use in our output, and how you wish to alias those fields
    1. Values you want to see in your output
    2. Values you want to use in calculations . For example, to calculate margin on a product, you could calculate price - cost and give it the alias margin.
    3. Values you want to filter with. For example, you might only want to see products that weigh more than 2Kg or that are blue. The weight or colour columns could contain that information.
    4. Values you want to order by. For example you might want the most expensive products first, and the least last. You could use the price column in descending order to achieve that.
  2. Assuming the fields you've picked in point 1 are in multiple tables, find the connections between those tables
    1. Look for relationships between tables and identify the columns that implement those relationships. For example, The Orders table could have a CustomerID field referencing the same column in the Customers table.
    2. Sometimes the problem doesn't use relationships but rests on a different field; sometimes the query is looking for a coincidence of fact rather than a foreign key constraint. For example you might have sales representatives who live in the same state as a customer; this information is normally not used in relationships, but if your query is for organizing events where sales representatives meet customers, it's useful in that query. In such a case you would record the names of columns at either end of such a connection.
    3. Sometimes relationships require a bridge, a junction table that wasn't identified in point 1 above but is needed to connect tables you need; these are used in "many-to-many relationships". In these cases you need to record the columns in each table that connect to similar columns in other tables.
  3. Construct a join or series of joins using the fields and tables identified in point 2 above. This becomes your FROM clause.
  4. Filter using some of the fields in point 1 above. This becomes your WHERE clause.
  5. Construct an ORDER BY clause using values from point 1 above that are relevant to the desired order of the output rows.
  6. Project the result using the remainder of the fields in point 1 above. This becomes your SELECT clause.

A Worked Example 

 Let's say you want to query the world database to find a list of countries (with their capitals) and the change in GNP, using the difference between the GNP and GNPOld columns, and that you only want to see results for countries with a population greater than 100,000,000.

Using the system described above, we could do the following: 

  • The Country.Name and City.Name columns contain the name of the country and city respectively. 
  • The change in GNP comes from the calculation GNP - GNPOld. Both those columns are in the Country table. This calculation is also used to order the output, in descending order
  • To see only countries with a population greater than 100,000,000, you need the Population field of the Country table. There is also a Population field in the City table, so you'll need to specify the table name to disambiguate. You can also represent a number like 100 million as 100e6 instead of 100000000 to make it easier to read.
  • Because the fields come from the Country and City tables, you'll need to join them. There are two relationships between these tables: Each city is hosted within a country, and the city's CountryCode column identifies that country. Also, each country has a capital city, whose ID is contained within the country's Capital column. This latter relationship is the one to use, so the relevant columns and the condition that uses them is represented by the following FROM clause: 

FROM Country JOIN City ON Country.Capital = City.ID

  • The statement should only return countries with a population greater than 100,000,000. Country.Population is the relevant column, so the WHERE clause becomes: 
WHERE Country.Population > 100e6

  •  To sort the result set in reverse order of difference in GNP, you could use either the calculation, or the position in the output (it's the third column):


  • Finally, project the columns you wish to see by constructing the SELECT clause:
SELECT Country.Name AS Country, City.Name AS Capital, 
       GNP - GNPOld AS `Difference in GNP`

 The whole statement ends up looking like this: 

mysql> SELECT Country.Name AS Country, City.Name AS Capital, 
    ->      GNP - GNPOld AS `Difference in GNP` 
    -> FROM Country JOIN City ON Country.Capital = City.ID 
    -> WHERE Country.Population > 100e6
    -> ORDER BY 3 DESC;
| Country            | Capital    | Difference in GNP |
| United States      | Washington |         399800.00 |
| China              | Peking     |          64549.00 |
| India              | New Delhi  |          16542.00 |
| Nigeria            | Abuja      |           7084.00 |
| Pakistan           | Islamabad  |           2740.00 |
| Bangladesh         | Dhaka      |            886.00 |
| Brazil             | Brasília   |         -27369.00 |
| Indonesia          | Jakarta    |        -130020.00 |
| Russian Federation | Moscow     |        -166381.00 |
| Japan              | Tokyo      |        -405596.00 |
10 rows in set (0.00 sec)

Queries with Aggregates and GROUP BY

While this system might work well for many queries, it doesn't cater for situations where you have complex summaries and aggregation. For aggregation, you'd start with choosing which columns to view in the output, but this time you'd construct them as aggregate expressions.

For example, you could look at the average population, or the count of distinct regions.You could also perform more complex aggregations, such as the average of GNP per head of population calculated as AVG(GNP/Population).

Having chosen the values to appear in the output, you must choose how to aggregate those values. A useful way to think about this is that every aggregate query is of the form X, Y per Z. The SELECT clause contains the expressions for X and Y, as already described, and Z becomes your GROUP BY clause. Ordinarily you would also include Z in the query so you see how you are grouping, so the output becomes Z, X, Y per Z.

 As an example, consider the following, which shows a count of  countries and the average population per continent:

 mysql> SELECT Continent, COUNT(Name), AVG(Population)
     -> FROM Country
     -> GROUP BY Continent;
| Continent     | COUNT(Name) | AVG(Population) |
| Asia          |          51 |   72647562.7451 |
| Europe        |          46 |   15871186.9565 |
| North America |          37 |   13053864.8649 |
| Africa        |          58 |   13525431.0345 |
| Oceania       |          28 |    1085755.3571 |
| Antarctica    |           5 |          0.0000 |
| South America |          14 |   24698571.4286 |
7 rows in set (0.00 sec)

In this case, X is the number of countries, Y is the average population, and Z is the continent. Of course, you could have more fields in the SELECT clause, and  more fields in the GROUP BY clause as you require. You would also normally alias columns to make the output more suited to your requirements.

More Complex Queries 

Queries can get considerably more interesting than this. You could also add joins and other expressions to your aggregate query, as in the earlier part of this post. You could have more complex conditions in the WHERE clause. Similarly, you could use queries such as these in subqueries of yet more complex super-queries. Each technique becomes another tool in your toolbox, until before you know it you're writing queries across 15 tables that take two pages to write out. But that's for another day...

Thursday Oct 10, 2013

MySQL Certifications

For those of you wishing to capitalize on your hard-earned MySQL skills, there are two new MySQL certifications based on MySQL version 5.6:

Each certification requires passing a single exam, and because they're new, they're currently going through a beta process and are much cheaper than usual, a snip at €39 ($50). Even so, you still get the full certification on passing the exam; there's no difference between a certification obtained during the beta process and afterward. Because it's in beta, the exams have more questions—this is to make sure that the beta process works and all questions get a good airing—and you'll need to wait a bit longer to get your results due to the internal analysis.

I should also mention the Oracle Linux 5 and 6 System Administration exam, also currently in beta, seeing as many of our MySQL training courses use that operating system.

The MySQL certification beta process runs until 14th December. You can register online to take the exams at any Pearson Vue examination center. 


Jeremy Smyth writes MySQL training courses, and likes exploring interesting questions that come up from novices and experts alike.



« October 2013 »