Wednesday Mar 18, 2009

How MySQL tests server binaries before a release

What happens before the binaries of a fresh MySQL version are published on the web?

You may have noticed that the date on the release notes is not the same as the date the downloads become available. Sometimes there is a two-week gap, sometimes more. Many people in the community have asked what goes on in the meantime.

The answer is a lot of hard work. The code is built for all the operating systems supported by MySQL, and tested on each platform (1). During this process, portability problems, test case glitches, and other issues not caught by the normal daily build and test runs are fixed.

This task involves QA engineers, Build engineers, and the Maintenance team, with help and cooperation from the Services, Development, and Community teams.

I asked our Build and QA teams to describe what happens between the date a release is "branched off" from the Bazaar tree and the date it is available on the downloads page. This is the list of what goes on. It is impressive that the regression test suite, which looks huge and intimidating to the casual user, is just a small part of the torture tests the server goes through.

BTW, this is just the tip of the iceberg. QA is a continuous process, not just a set of tests at the end. Sun/MySQL uses a continuous build and test process triggered on code check-in, across different products and branches, for a total of 574 build and test runs a day (more about that in a separate article). The release process adds some more testing on top of that.

Thanks, MySQL engineers!

The following text was provided by Kent Boortz, Senior Production Engineer in the Build Team.

Tests run during the build process

During the build, all packages were tested using the regression test suites. These tests mainly cover SQL features, from basic queries to replication and partitioning, but also the “mysql” command line client, the “mysqldump” tool, and other client tools.

The server can be run in different “modes” (SQL standards and protocols). Because of this, the same test suite is run several times with different combinations of these modes and protocols. As there are time limits, not all combinations are run, only what is believed to be a fair sample capable of catching regressions. The test runs for each package are:

  1. The main suite, except replication tests, was run against the bundled debug-enabled server
  2. The main suite was run against the server in default protocol mode
  3. The main suite was run against the server using prepared statements protocol mode
  4. The main suite was run against the embedded server library
  5. The 'funcs_1' suite was run against the server using prepared statements protocol mode
  6. The 'stress' suite was run against the server in default protocol mode
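The combinations above map roughly onto mysql-test-run.pl invocations like the following sketch (option names as in MySQL 5.1; exact flags vary between versions). The commands are only collected and printed here, since a full MySQL source tree is assumed in a real run:

```shell
#!/bin/sh
# Sketch of the six per-package test runs listed above, as
# mysql-test-run.pl command lines. Printed rather than executed,
# since a MySQL source tree is assumed in a real run.
run() { echo "perl mysql-test-run.pl --force $*"; }

{
  run --skip-rpl                      # 1. main suite, no replication (debug server)
  run                                 # 2. main suite, default protocol
  run --ps-protocol                   # 3. main suite, prepared statements protocol
  run --embedded-server               # 4. main suite, embedded server library
  run --suite=funcs_1 --ps-protocol   # 5. funcs_1 suite, prepared statements
  run --suite=stress                  # 6. stress suite, default protocol
} > mtr_cmds.txt
cat mtr_cmds.txt
```
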

Some suites were not run against all packages, or were disabled:

  1. The 'jp' suite was run against the server in normal mode, but not against the Windows server (the fact that the suite was not run there is itself a bug; there is no good reason to skip it)
  2. The 'funcs_2' test suite was not run, disabled because of Bug#20447
  3. The 'nist' tests were run using both the normal and the prepared statements protocol, in all builds except the RPM builds

Tests run after the release binaries are built

Some package tests are run after the packages are built, not in direct connection to the build process.

  1. The package names were verified
  2. RPM packages were checked for unwanted dependencies; for example, the IA-64 RPMs, built using the Intel icc compiler, were verified not to require any icc runtime libraries at install time
  3. A simple link test was done against the client and embedded libraries, to try to catch missing symbols, missing headers, or defects in the output of the “mysql_config” script, which defines how the libraries are used
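A minimal version of such a link test might look like this sketch: compile a tiny client program with the flags reported by mysql_config, to catch missing headers, missing symbols, or broken mysql_config output. The compile command is only printed here, since an installed MySQL is assumed in a real run:

```shell
#!/bin/sh
# Write a minimal client program that references a real client symbol,
# then show the compile command built from mysql_config output.
cat > linktest.c <<'EOF'
#include <mysql.h>

int main(void)
{
    MYSQL *conn = mysql_init(NULL);   /* references a real client symbol */
    if (conn != NULL)
        mysql_close(conn);
    return 0;
}
EOF
echo 'cc linktest.c $(mysql_config --cflags) $(mysql_config --libs) -o linktest'
```

If this fails to compile or link, either the headers, the libraries, or the mysql_config script shipped in the package is broken.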

Install Test

The binaries are installed and a basic smoke test is performed on all supported platforms to catch any install-related issues. Testing on some platforms was done using automated scripts, while on others it was done manually.

System Test Suite

This is a concurrency/longevity test and can be configured to run with different storage engines, numbers of concurrent users, and durations.

This test suite contains a number of insert/update/delete and select tests using stored procedures and triggers. The test also has an aspect of integration testing, since it uses events, partitions, etc. together and covers scenarios involving multiple storage engines, such as testing MyISAM and InnoDB together, e.g. a trigger on an InnoDB table that writes into a MyISAM table.

We tested the scenarios that contain inserts, updates, deletes, and selects using stored procedures and triggers for InnoDB and MyISAM tables, separately. Each of these tests was run for a period of 24 hours with 50 concurrent users.


Replication System Test

This is also a concurrency/longevity test and executes insert/update/delete scenarios using stored procedures and triggers, but with replication. We tested 3 different scenarios, one for each replication type (RBR, SBR, and MBR). Each of these tests was run with 100 concurrent users for 6 hours, with InnoDB tables.

High Concurrency

This is concurrency/longevity/stress testing and executes an OLTP scenario on one table. The concurrency and the time period can be configured. We ran this test on the Linux platform, with 2000 concurrent users, over a period of 8 hours, using an InnoDB table.


Single User Performance Test

This is a single-user performance benchmark that tests various scenarios, providing one specific angle on performance. With this test we catch performance results and regressions per operation (sometimes an "operation" is one SQL query, but often it is a block of statements/queries). We ran this test using prepared statements, with MyISAM and InnoDB tables separately. Each test was run 3 times and the average time was computed for each operation. The resulting performance numbers were compared with previous versions of the MySQL server to identify regressions.
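The per-operation comparison can be sketched as follows. The timings are made-up illustration data, and the 5% threshold is an assumption for the sketch, not the team's actual criterion:

```shell
#!/bin/sh
# Average three timed runs of each operation and flag the operation
# when the average is more than 5% slower than the previous release.
cat > times.txt <<'EOF'
# operation   run1   run2   run3   previous_avg
insert_pk     1.10   1.08   1.12   1.05
select_range  2.40   2.45   2.35   2.60
update_index  3.90   4.10   4.00   3.50
EOF
awk '!/^#/ {
    avg = ($2 + $3 + $4) / 3
    if (avg > $5 * 1.05)
        printf "REGRESSION %s avg=%.2f prev=%.2f\n", $1, avg, $5
}' times.txt > report.txt
cat report.txt
```
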

DBT-2 Benchmark (TPC-C)

This toolkit implements the TPC-C benchmark standard, measuring performance for an OLTP scenario in new-order transactions per minute (NOTPM). The tests are completely configurable. We ran the tests for CPU-bound analysis with 16, 40, and 80 concurrent users, with InnoDB tables, and then compared the performance with both the 5.1.30 and 5.0 versions on the SUSE platform to detect performance regressions.

Upgrade/Downgrade testing

Executes upgrade/downgrade scenarios of the MySQL server, checking that objects can be created/altered/dropped and that data can be inserted/updated/deleted/selected between the previous and current versions of MySQL. Objects include permissions, tables, views, triggers, stored procedures, events, and partitions, with a large number of datatypes. Both live and dump/restore scenarios are tested.

We tested minor version upgrades from previous versions of 5.1 to 5.1.31, and also major version upgrades from 5.0.x to 5.1.31.
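The dump/restore path can be sketched like this (commands are echoed, not executed, since two installed servers are assumed; the socket paths are hypothetical): dump everything from the old server, load it into the new one, then run mysql_upgrade to fix up the system tables.

```shell
#!/bin/sh
# Sketch of a dump/restore upgrade from a 5.0 server to a 5.1 server.
# Socket paths are hypothetical placeholders.
old=/tmp/mysql50.sock
new=/tmp/mysql51.sock
{
  echo "mysqldump --all-databases --routines --triggers --socket=$old > dump.sql"
  echo "mysql --socket=$new < dump.sql"
  echo "mysql_upgrade --socket=$new"
} > upgrade_cmds.txt
cat upgrade_cmds.txt
```
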


Sysbench

We use the Sysbench tests to measure database server performance (OLTP benchmark) and compare it against the performance of previous MySQL server versions.

We ran the OLTP_RO, OLTP_RW, and also the atomic queries using InnoDB and MyISAM tables, for 4, 16, 64, 128, and 256 threads.
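The read-only/read-write matrix over those thread counts can be sketched as a loop over sysbench invocations (sysbench 0.4 option names; commands are echoed, not executed, since a running server is assumed):

```shell
#!/bin/sh
# Sketch of the sysbench OLTP matrix: read-only and read-write runs
# over the same set of thread counts.
for threads in 4 16 64 128 256; do
    for readonly in on off; do   # on = OLTP_RO, off = OLTP_RW
        echo "sysbench --test=oltp --oltp-read-only=$readonly --num-threads=$threads run"
    done
done > sysbench_cmds.txt
wc -l < sysbench_cmds.txt
```
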

Large dataset testing

This test suite runs insert/update/delete/select scenarios using indexes against a very large database, with up to 1 billion rows in the tables. We try to uncover areas where the query analyzer or optimizer is not using the right approach. Large datasets provide clearer results on whether internal query analysis and optimization is being done correctly.

Tests Using Random Query Generator

Additionally, replication tests were run in the 5.1 replication team tree using the Random Query Generator tool. These tests were run on MyISAM tables, with all replication modes, with a "simple" workload (DML only) and a "complex" workload (DML, DDL, implicit commits, and other statements interesting for replication). A few bugs were found and fixed during this process.

How we deal with bugs during the release phase

As we use a "release often" process for enterprise releases, a bug found during the release process might not cause the process to stop, fix that bug, and rebuild everything.

If the bug is considered serious enough, it will; but in some cases a bug is filed and the correction is targeted for the next or a later "maintenance" release (or whatever they are named).

Over time we will get better and better at test automation, preventing these kinds of bugs from sneaking in. But at this time they might, which means a maintenance release could bring a regression for an unlucky user hit by that specific bug.

Having said that, it is rare that we find a new bug during the release process that is ignored, but it can happen, as a side effect of the "release often" process.

(1) Not all tests are run on all platforms. Some performance tests are run only on the most popular ones, to compare performance across versions.


Giuseppe Maxia

