Tuesday Apr 21, 2009

Expanding Google's InnoDB Synchronization Improvements to Solaris

There is much excitement today at the launch of MySQL 5.4, so I will relate my story about a project I contributed to this new version.

When we started looking at performance improvements for MySQL, we were interested in "low hanging fruit", or fixes and changes that could reap measurable benefits for users in the short term.

An obvious candidate at that time was the now well-known Google SMP patch. I had seen Mark Callaghan present on this at the MySQL User Conference in 2008, and was interested to investigate.

I was pretty new to InnoDB at that time, and was soon to discover that InnoDB was possibly experiencing poor scalability around its mutexes and read-write locks because InnoDB had a private implementation of adaptive mutexes and read-write locks, and this was probably not the best implementation on all or even most platforms MySQL is available on.

Now InnoDB's "private" mutexes and rw-locks were a good way to get spin-locks on all platforms, which may be a win in many cases, but as the Google team had demonstrated, it could be improved on. Indeed, I knew that adaptive spin-locks are available on Solaris, and they offer an extra advantage - if the holder of a lock is found to be off CPU, we don't bother spinning, but instead put the thread wanting the lock straight to sleep.

So, I decided to undertake a couple of performance studies of InnoDB's locking, being:

  1. Apply the Google SMP patch to MySQL 5.1 and test
  2. Modify InnoDB in 5.1 to use POSIX mutexes and RW-locks and test

The second step turned out to be quite complicated. I could not even change all of InnoDB's RW-locks to POSIX ones, as the InnoDB sychronization objects offer functionality not available via POSIX. It also meant we would be diverging more significantly from the InnoDB in 5.1, so this option - although looking promising - was shelved.

This left the Google SMP patch. It also looked promising. It was a less dramatic change, and offered scaling benefits in all the testing I did.

There was one last snag though - the mutex and RW-lock improvments in the Google SMP patch would only be applied if you were building on x86/x64 with GCC 4.1 or later, as they relied on GCC's atomic built-ins.

You can consider that we have a two-dimensional matrix of platforms that MySQL supports, being a compiler, then an Operating System. To make a feature portable across this matrix, you need to find a portable API, write code that is portable, or write code that uses a choice of different portable API's depending on what is available.

Now we definitely wanted to get a similar benefit for InnoDB on SPARC, and not necessarily just with GCC. In any case, GCC did not offer all of the built-in atomics for SPARC at the time. Happily, there are atomic functions available in Solaris that fit the job fine. MySQL 5.4 uses the functions if you build on Solaris without a version of GCC that supports built-in atomics.

Just so you understand though, here is (a simplified version of) what happens when you build MySQL 5.4 on your chosen platform with your chosen compiler:

  • IF (compiler has GCC built-in atomics)
    use GCC built-in atomics
  • ELSE IF (OS has atomic functions)
    use atomic functions
  • ELSE
    use traditional InnoDB synchronization objects, based on pthread_mutex\*.


As Neel points out in his blog, it was an exercise we learnt something from, even if we did develop functionality that will not be used. The important thing is we know we have improved the performance of MySQL, by extending the Google SMP improvements to all Solaris users, regardless of chosen compiler.

Friday Sep 26, 2008

Tamp - a Lightweight Multi-Threaded Compression Utility

UPDATE: Tamp has been ported to Linux, and is now at version 2.5

Packages for Solaris (x86 and SPARC), and a source tarball are available below.

Back Then

Many years ago (more than I care to remember), I saw an opportunity to improve the performance of a database backup. This was before the time of Oracle on-line backup, so the best choice at that time was to:

  1. shut down the database
  2. export to disk
  3. start up the database
  4. back up the export to tape

The obvious thing to improve here is the time between steps 1 and 3. We had a multi-CPU system running this database, so it occurred to me that perhaps compressing the export may speed things up.

I say "may" because it is important to remember that if the compression utility has lower throughput than the output of the database export (i.e. raw output; excluding any I/O operations to save that data) we may just end up with a different bottleneck, and not run any faster; perhaps even slower.

As it happens, this era also pre-dated gzip and other newer compression utilities. So, using the venerable old "compress", it actually was slower. It did save some disk space, because Oracle export files are eminently compressible.

So, I went off looking for a better compression utility. I was now more interested in something that was fast. It needed to not be the bottleneck in the whole process.

What I found did the trick - It reduced the export time by 20-30%, and saved some disk space as well. The reason why it saved time was that it was able to compress at least as fast as Oracle's "exp" utility was able to produce data to compress, and it eliminated some of the I/O - the real bottleneck.

More Recently

I came across a similar situation more recently - I was again doing "cold" database restores and wanted to speed them up. It was a little more challenging this time, as the restore was already parallel at the file level, and there were more files than CPUs involved (72). In the end, I could not speed up my 8-odd minute restore of ~180GB, unless I already had the source files in memory (via the filesystem cache). That would only work in some cases, and is unlikely to work in the "real world", where you would not normally want this much spare memory to be available to the filesystem.

Anyway, it took my restore down to about 3 minutes in cases where all my compressed backup files were in memory - this was because it had now eliminated all read I/O from the set of arrays holding my backup. This meant I had eliminated all competing I/O's from the set of arrays where I was re-writing the database files.

Multi-Threaded Lightweight Compression

I could not even remember the name of the utility I used years ago, but I knew already that I would need something better. The computers of 2008 have multiple cores, and often multiple hardware threads per core. All of the current included-in-the-distro compression utilities (well, almost all utilities) for Unix are still single-threaded - a very effective way to limit throughput on a multi-CPU system.

Now, there are a some multi-threaded compression utilities available, if not widely available:

  • PBZIP2 is a parallel implementation of BZIP2. You can find out more here
  • PIGZ is a parallel implementation of GZIP, although it turns out it is not possible to decompress a GZIP stream with more than one thread. PIGZ is available here.

Here is a chart showing some utilities I have tested on a 64-way Sun T5220. The place to be on this chart is toward the bottom right-hand corner.

Here is a table with some of the numbers from that chart:

Utility Reduction (%) Elapsed (s)
tamp 66.18 0.31
pigz --fast 71.18 1.04
pbzip2 --fast 77.17 4.17
gzip --fast 71.10 16.13
gzip 75.73 40.29
compress 61.61 18.21

To answer your question - yes, tamp really is 50-plus-times faster than "gzip --fast".


The utility I have developed is called tamp. As the name suggests, it does not aim to provide the best compression (although it is better than compress, and sometimes beats "gzip --fast").

It is however a proper parallel implementation of an already fast compression algorithm.

If you wish to use it, feel free to download it. I will be blogging in the near future on a different performance test I conducted using tamp.

Compression Algorithm

Tamp makes use of the compression algorithm from Quick LZ version 1.40. I have tested a couple of other algorithms, and the code in tamp.c can be easily modified to use a different algorithm. You can get QuickLZ from here (you will need to download source yourself if you want to build tamp).

Update, Jan 2012 - changed the downloads to .zip files, as it seems blogs.oracle.com interprets a download of a file ending in .gz as a request to compress the file via gzip before sending it. That confuses most people.



Tim Cook's Weblog The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.


« July 2016