MySQL 5.1 Memory Allocator Bake-Off
By Tim Cook on Dec 17, 2008
After getting SysBench running properly with a scalable memory allocator (see my last post), I can now return to what I was originally testing: which memory allocator is best for the 5.1 server (mysqld).
This stems from my study of some patches that Google has released. You can read about the work Google has been doing here.
I decided I wanted to test a number of configurations based on the MySQL community source, 5.1.28-rc, namely:
- The baseline - no Google SMP patch, default memory allocator (5.1.28-rc)
- With Google SMP patch, mem0pool enabled, no custom malloc (pool)
- With Google SMP patch, mem0pool enabled, linked with mtmalloc (pool-mtmalloc)
- With Google SMP patch, mem0pool disabled, linked with tcmalloc (TCMalloc)
- With Google SMP patch, mem0pool disabled, linked with umem (umem)
- With Google SMP patch, mem0pool disabled, linked with mtmalloc (mtmalloc)
Here are some definitions, by the way:
| Term | Definition |
|---|---|
| mem0pool | InnoDB's internal "memory pools" feature, found in mem0pool.c (NOTE: even if this is enabled, other parts of the server will not use this memory allocator; they will use whatever allocator is linked with mysqld) |
| tcmalloc | The "libtcmalloc_minimal.so.0.0.0" library built from google-perftools-0.99.2 |
| Hoard | The Hoard memory allocator, version 3.7.1 |
| umem | The libumem library (included with Solaris) |
| mtmalloc | The mtmalloc library (included with Solaris) |
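To illustrate why an allocator that funnels every request through one shared structure can limit scaling, here is a minimal sketch of a fixed-size pool guarded by a single mutex. It is not InnoDB's mem0pool, which is more elaborate and is documented in mem0pool.c; the only property borrowed, as an assumption, is that all allocations and frees pass through one pool-wide lock. All names (`simple_pool` and its functions) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: a simplified fixed-size pool, NOT
   InnoDB's mem0pool. The one property borrowed is that every
   allocation and free goes through a single pool-wide mutex. */

typedef struct pool_block {
    struct pool_block *next;     /* valid only while the block is free */
} pool_block;

typedef struct {
    pthread_mutex_t lock;        /* the single lock all threads contend on */
    pool_block *free_list;       /* LIFO list of free blocks */
    unsigned char *arena;        /* one malloc'd region carved into blocks */
} simple_pool;

/* Returns 0 on success, -1 on failure. */
int simple_pool_init(simple_pool *p, size_t block_size, size_t nblocks) {
    const size_t align = _Alignof(max_align_t);
    assert(nblocks > 0);
    if (block_size < sizeof(pool_block))
        block_size = sizeof(pool_block);
    block_size = (block_size + align - 1) / align * align;  /* keep blocks aligned */

    p->arena = malloc(block_size * nblocks);
    if (p->arena == NULL)
        return -1;
    p->free_list = NULL;
    for (size_t i = 0; i < nblocks; i++) {       /* push every block onto the list */
        pool_block *b = (pool_block *)(p->arena + i * block_size);
        b->next = p->free_list;
        p->free_list = b;
    }
    if (pthread_mutex_init(&p->lock, NULL) != 0) {
        free(p->arena);
        return -1;
    }
    return 0;
}

void *simple_pool_alloc(simple_pool *p) {
    pthread_mutex_lock(&p->lock);                /* every thread serializes here */
    pool_block *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    pthread_mutex_unlock(&p->lock);
    return b;                                    /* NULL once the pool is exhausted */
}

void simple_pool_free(simple_pool *p, void *ptr) {
    pool_block *b = ptr;
    pthread_mutex_lock(&p->lock);
    b->next = p->free_list;
    p->free_list = b;
    pthread_mutex_unlock(&p->lock);
}

void simple_pool_destroy(simple_pool *p) {
    pthread_mutex_destroy(&p->lock);
    free(p->arena);
}
```

However fast the critical section is, threads allocating at the same moment queue on `lock`. Allocators such as tcmalloc (thread caches) and umem (per-CPU caches) are designed to avoid relying on a single shared lock like this one.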
My test setup was a 16-CPU Intel system running Solaris Nevada build 100. I used only the x86 platform, as I was not able to build tcmalloc on SPARC. I also chose to put the database in TMPFS and to use an InnoDB buffer pool smaller than the database. This was to ensure that we would be CPU-bound if possible, rather than slowed by I/O.
Where I had to build a package myself (mtmalloc and umem ship with Solaris, so they needed no build), I used GCC 4.3.1. The exception was Hoard, which seemed to prefer the Sun Studio 11 C compiler over Sun Studio 12 or GCC.
Each test was a 10-minute SysBench OLTP read-write run. Each series of runs at different thread counts was preceded by a database rebuild and a 20-minute warmup. Here are my throughput results for 1-32 SysBench threads, in transactions per second:
These results show that while the Google SMP changes are a benefit, disabling InnoDB's mem0pool does not seem to provide any further benefit in my configuration. They also show that TCMalloc is not a good allocator for this workload on this platform, and that Hoard is particularly bad, with significant negative scaling above 16 threads.
The remaining configurations are pretty similar, with mtmalloc and umem a little ahead at higher thread counts.
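Throughput of mysqld mixes allocator behavior with everything else the server does. To look at allocator scaling on its own, one option is a small multi-threaded malloc/free loop linked against each allocator in turn. This is a hypothetical sketch, not the test used here; `run_alloc_bench` and its parameters are illustrative names, and the link flags in the comment are the usual library names, which are worth checking on your own system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical allocator micro-benchmark (not the author's test).
   Build the same file against different allocators to compare them,
   e.g. with -lmtmalloc or -lumem on Solaris, or -ltcmalloc_minimal. */

#define SLOTS 64
#define MAX_THREADS 256

static void *bench_worker(void *argp) {
    long iterations = *(const long *)argp;
    void *slots[SLOTS] = {0};
    for (long i = 0; i < iterations; i++) {
        size_t k = (size_t)(i % SLOTS);
        free(slots[k]);                                /* free(NULL) is a no-op */
        slots[k] = malloc(16 + (size_t)(i % 8) * 16);  /* sizes of 16..128 bytes */
    }
    for (size_t k = 0; k < SLOTS; k++)
        free(slots[k]);
    return NULL;
}

/* Runs nthreads workers concurrently; returns elapsed wall-clock seconds,
   or -1.0 if nthreads is out of range or a thread could not be created. */
double run_alloc_bench(int nthreads, long iterations_per_thread) {
    pthread_t tids[MAX_THREADS];
    struct timespec t0, t1;
    int started = 0;

    assert(iterations_per_thread >= 0);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, bench_worker,
                          &iterations_per_thread) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `run_alloc_bench(n, iterations)` for n = 1, 2, 4, 8, 16 and so on, once per allocator build, gives a rough scaling curve for the allocator alone. It says nothing definitive about how mysqld itself will behave.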
Before I get a ton of comments and e-mails, I would like to point out that I did some verification of my TCMalloc builds, as the results surprised me. I verified that TCMalloc was using its supplied assembly-language atomic routines, and I built it both with optimization (-O3) and without.
I also discovered that TCMalloc was emitting this diagnostic when mysqld was starting up:
src/tcmalloc.cc:151] uname failed assuming no TLS support (errno=0)
I rectified this with a change in tcmalloc.cc, and called this configuration "TCMalloc -O3, TLS". It is shown against the other two configurations below.
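The change made to tcmalloc.cc is not shown. One plausible cause of the message, and this is an assumption, is the return-value check: the errno=0 in the diagnostic suggests that uname() did not actually fail. POSIX only guarantees a non-negative return value on success (and -1 on failure), and the Solaris man page says the same, so a test written as `!= 0` can misread a successful Solaris call as a failure. The sketch below contrasts the two checks; the function names are hypothetical and this is not tcmalloc's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* POSIX (and the Solaris man page) only promise that uname() returns a
   non-negative value on success and -1 on failure; Linux happens to
   return 0. */

/* Fragile check: a successful call that returns a positive value is
   treated as a failure, while errno is still 0. */
bool uname_failed_fragile(int rc) { return rc != 0; }

/* Portable check: only a negative return (-1) indicates failure. */
bool uname_failed_portable(int rc) { return rc < 0; }

bool uname_ok(void) {
    struct utsname buf;
    errno = 0;
    int rc = uname(&buf);
    assert(rc >= -1);                 /* success is >= 0, failure is exactly -1 */
    if (uname_failed_portable(rc)) {
        fprintf(stderr, "uname failed (errno=%d)\n", errno);
        return false;
    }
    return true;
}
```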
I often like to look at the CPU cost of different configurations. This helps to show headroom, and whether differences in throughput may be due to less efficient code or to something else. The chart below lists what I found; note that this is system-wide CPU (user + system) utilization, and I was running my SysBench client on the same system.
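The reasoning about headroom and efficiency can be made concrete by converting utilization into CPU time per transaction. This is a minimal sketch with hypothetical helper names; it assumes utilization is expressed as a fraction of all CPUs, and, as noted above, that figure also includes the SysBench client, so the per-transaction cost is for client and server together.

```c
#include <assert.h>

/* Hypothetical helpers for turning system-wide CPU utilization into a
   per-transaction cost. Utilization is a fraction (0.0-1.0) of all CPUs,
   user + system, and includes everything else running on the box, such
   as the SysBench client. */

/* CPU milliseconds consumed per transaction. */
double cpu_ms_per_txn(double util_fraction, int ncpus, double tps) {
    assert(ncpus > 0 && tps > 0.0);
    double busy_cpus = util_fraction * ncpus;   /* CPU-seconds used per second */
    return busy_cpus / tps * 1000.0;
}

/* Idle CPUs left over: a simple measure of headroom. */
double idle_cpus(double util_fraction, int ncpus) {
    return (1.0 - util_fraction) * ncpus;
}
```

With made-up numbers, 16 CPUs at 50% utilization serving 1000 transactions per second gives `cpu_ms_per_txn(0.5, 16, 1000.0)` of about 8 ms per transaction and `idle_cpus(0.5, 16)` of 8 CPUs. Two configurations with similar throughput but different per-transaction cost would point toward less efficient code in the costlier one.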
Lastly, I made one other comparison: how much each memory allocator affected the virtual size of mysqld. I did not expect much difference, as the most significant consumer, the InnoDB buffer pool, should dominate with large, long-lived allocations. This was indeed the case, and memory consumption grew little after the initial start-up of mysqld. The only allocator that caused any noticeable change was mtmalloc, which for some reason made the heap grow by 35 MB following a 5-minute run (from an initial 1430 MB, an increase of about 2.4%).
- Sun Developer Network - A Comparison of Memory Allocators in Multiprocessors
- The Hoard Memory Allocator
- TCMalloc: Thread-Caching Malloc
- google-perftools - the home of TCMalloc
- MySQL InnoDB Performance Tuning for the Solaris 10 OS - includes a recommendation for mtmalloc
- MySQL scalability on Linux with sysbench
- My previous blog post on improving SysBench with a scalable memory allocator, which also discusses mtmalloc and umem
- (a version of) the source for mem0pool.c, which documents what InnoDB's memory pools do