After getting sysbench running properly with a scalable memory
allocator (see my last post), I can now return to what I originally
set out to test: which memory allocator is best for the 5.1 server
(mysqld). This stems from my study of some patches that Google has
released. You can read about the work Google has been doing here.
I decided to test a number of configurations based on the MySQL community source, 5.1.28-rc, namely:
Here are some definitions, by the way:
  mem0pool   InnoDB's internal "memory pools" feature, found in
             mem0pool.c. (Note: even when this is enabled, other parts
             of the server will not use this allocator - they use
             whatever allocator is linked with mysqld.)
  tcmalloc   The libtcmalloc_minimal.so.0.0.0 built from
             google-perftools-0.99.2.
  Hoard      The Hoard memory allocator, version 3.7.1.
  umem       The libumem library (included with Solaris).
  mtmalloc   The mtmalloc library (included with Solaris).
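One common way to substitute an allocator without rebuilding mysqld is
to preload it at start-up. A minimal sketch - the library paths and
mysqld invocation here are illustrative, not my exact commands:

```shell
# Swap allocators under mysqld via runtime preloading (Solaris).
# Paths are examples only -- adjust to where each library is installed.

# tcmalloc, built from google-perftools:
LD_PRELOAD=/usr/local/lib/libtcmalloc_minimal.so.0.0.0 \
    ./bin/mysqld_safe --defaults-file=/etc/my.cnf &

# mtmalloc and umem ship with Solaris:
LD_PRELOAD=/usr/lib/libmtmalloc.so \
    ./bin/mysqld_safe --defaults-file=/etc/my.cnf &

LD_PRELOAD=/usr/lib/libumem.so \
    ./bin/mysqld_safe --defaults-file=/etc/my.cnf &
```

The mem0pool setting, by contrast, is internal to InnoDB and is
controlled in the source, not by preloading.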
My test setup was a 16-CPU Intel system, running Solaris Nevada build
100. I chose to use only an x86 platform, as I was not able to build
tcmalloc on SPARC. I also chose to run with the database in TMPFS,
and with an InnoDB buffer pool smaller than the database. This
was to ensure that we would be CPU-bound if possible, rather than
slowed by I/O.
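A hypothetical setup along these lines - the paths and sizes are
illustrative only (Solaris mounts /tmp on tmpfs by default):

```shell
# Hypothetical layout: database on tmpfs, buffer pool smaller than the data.
mkdir -p /tmp/sbtest-data            # /tmp is tmpfs on Solaris

cat >> /etc/my.cnf <<'EOF'
[mysqld]
datadir = /tmp/sbtest-data
innodb_buffer_pool_size = 1G         # deliberately smaller than the data set
EOF
```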
Where a package needed to be built (mtmalloc and umem ship with
Solaris), I used GCC 4.3.1, except for Hoard, which seemed to prefer
the Sun Studio 11 C compiler (over Sun Studio 12 or GCC).
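For google-perftools, the build is the usual autoconf sequence; a
sketch, with the compiler and flags I mention below made explicit
(exact paths are illustrative):

```shell
# Build tcmalloc from google-perftools 0.99.2 with GCC and -O3.
gtar xzf google-perftools-0.99.2.tar.gz
cd google-perftools-0.99.2
CC=gcc CXX=g++ CFLAGS="-O3" CXXFLAGS="-O3" ./configure
make
# the allocator library lands in .libs/libtcmalloc_minimal.so.0.0.0
```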
My test was a sysbench OLTP read-write run, of 10 minutes. Each
series of runs at different thread counts is preceded by a database
re-build and 20 minute warmup. Here are my throughput results for
1-32 SysBench threads, in transactions per second:
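For reference, a run like the one described above can be sketched as
follows - sysbench 0.4-era options; the credentials, request limits
and exact thread steps are illustrative:

```shell
# OLTP read-write, 10 minutes per thread count, 1-32 threads.
# A database rebuild ("prepare") and warmup precede each series.
for t in 1 2 4 8 16 32; do
    sysbench --test=oltp --oltp-read-only=off \
             --max-time=600 --max-requests=0 \
             --num-threads=$t \
             --mysql-user=sbtest --mysql-db=sbtest \
             run
done
```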
These results show that while the Google SMP changes are a benefit,
disabling InnoDB's mem0pool does not seem to provide any further
benefit for my configuration. My results also show that TCMalloc is
not a good allocator for this workload on this platform, and Hoard is
particularly bad, with significant negative scaling as the thread
count rises. The remaining configurations are pretty similar, with
mtmalloc and umem a little ahead at higher thread counts.
Before I get a ton of comments and e-mails, I would like to point out
that I did some verification of my TCMalloc builds, as the results I
got surprised me. I verified that it was using the supplied assembler
for atomic routines, and that I had built it with optimization (-O3).
I also discovered that TCMalloc was emitting this diagnostic when
mysqld was starting up:
src/tcmalloc.cc:151] uname failed assuming no TLS support (errno=0)
I rectified this with a change in tcmalloc.cc, and called this
configuration "TCMalloc -O3, TLS". It is shown against the other two
TCMalloc configurations in the results.
I often like to have a look at what the CPU cost of different
configurations is. This helps to demonstrate headroom, and whether
different throughput results may be due to less efficient code or
something else. The chart below shows what I found. Note that this is
system-wide CPU utilization (user + system), and I was running my
SysBench client on the same system.
Lastly, I did one other comparison, which was to measure how much
each memory allocator affected the virtual size of mysqld. I did not
expect much difference, as the most significant consumer - the InnoDB
buffer pool - should dominate with large, long-lived allocations.
This was indeed the case, and memory consumption grew little after the
initial start-up of mysqld. The only allocator that caused any
noticeable change was mtmalloc, which for some reason made the heap
grow by 35 MB over a 5-minute run (it was originally 1430 MB).
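Measurements like these are easy to take on Solaris with the standard
process tools; a sketch (assuming a single mysqld process is running):

```shell
# Track mysqld's virtual size on Solaris.
PID=$(pgrep -x mysqld)

# pmap -x ends with a summary line giving the total mapped Kb:
pmap -x $PID | tail -1

# or just the total virtual size, in Kb, via ps:
ps -o vsz= -p $PID
```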