MySQL Cluster and NUMA
By LinuxJedi on Jul 12, 2010
One problem we are starting to see quite often with MySQL Cluster is related to the current generation of Xeon processors. This post outlines the problem and how to avoid it.
Intel's Nehalem-based Xeons (and some older AMD CPUs) use a technology called NUMA (Non-Uniform Memory Access). This basically gives each CPU its own local bank of memory, rather than having every CPU access all of the memory at the same cost. For many applications this means much faster memory access. You should be able to see whether NUMA is on by looking for it in dmesg.
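As a complement to reading dmesg, here is a minimal Python sketch that counts the NUMA nodes the Linux kernel exposes under sysfs. It assumes the usual /sys/devices/system/node layout with one nodeN directory per node; the function name count_numa_nodes is just an illustrative choice, not part of any MySQL Cluster tooling.

```python
import os
import re


def count_numa_nodes(sysfs_node_dir="/sys/devices/system/node"):
    """Count nodeN directories under the given sysfs path.

    Returns 0 when the directory does not exist (for example on a
    kernel built without NUMA support).
    """
    if not os.path.isdir(sysfs_node_dir):
        return 0
    # Only entries named node<digits> are NUMA nodes; the directory
    # also holds other entries that must be ignored.
    return sum(
        1 for name in os.listdir(sysfs_node_dir)
        if re.fullmatch(r"node\d+", name)
    )
```

Calling count_numa_nodes() with no arguments inspects the live system. A result greater than 1 suggests the kernel is treating the machine as NUMA; a result of 0 or 1 suggests it is not.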
So why is this a bad thing?
MySQL Cluster data nodes typically require a large portion of the memory, which very often means that one CPU needs to access memory belonging to another CPU. This is generally quite slow; on a busy cluster we have seen this access take 100ms - 500ms! MySQL Cluster is real-time and is not a happy bunny when things stop it from being real-time. As a result, watchdog timeouts are typically very frequent on NUMA-based systems.
So, how can this be improved?
For starters, NUMA is easy to turn off: simply add the kernel boot option numa=off. We have also observed that later Linux kernels (around 2.6.30) have improved the scheduler for NUMA and appear to be friendlier to MySQL Cluster. But I would personally recommend turning it off even with newer kernels.
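After rebooting, it can be worth confirming that the option actually reached the kernel. The sketch below is one way to do that in Python, assuming the running kernel's command line is readable from /proc/cmdline (standard on Linux); the helper names are hypothetical.

```python
def numa_disabled_in_cmdline(cmdline):
    """Return True if the kernel command line contains the numa=off option."""
    # Options are whitespace-separated, so compare whole tokens rather
    # than searching for a substring.
    return "numa=off" in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a data node host, numa_disabled_in_cmdline(read_cmdline()) should return True once the boot option has taken effect.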
What else can cause problems?
We do not yet have as much data on this, but it is believed that dynamic CPU clocking can cause similar issues. If the data node is not busy, the CPU is clocked down, which then causes timing issues for the cluster. I would recommend setting the CPU to full performance settings where possible.
Edit: Mat Keep has confirmed in the comments that dynamic CPU clocking certainly causes performance issues.
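To see which CPUs may still be clocked down, the following Python sketch reads the frequency governor of each CPU from sysfs. It assumes the Linux cpufreq interface is present at /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor, and it treats the "performance" governor as the "full performance" setting, which is my interpretation rather than something stated above. BIOS-level power settings are not visible this way.

```python
import glob
import os


def governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU name (e.g. 'cpu0') to its current cpufreq governor."""
    result = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <cpu_root>/cpuN/cpufreq/scaling_governor
        cpu = path.split(os.sep)[-3]
        with open(path) as f:
            result[cpu] = f.read().strip()
    return result


def clocked_down_candidates(govs):
    """Return the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in govs.items() if gov != "performance")
```

An empty list from clocked_down_candidates(governors()) means either that every CPU reports the performance governor or that the cpufreq interface is not exposed on this machine.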
Hyper-threading can also be a killer. If you have a 4-core CPU with hyper-threading, it shows up as 8 cores, but since these are not full cores, setting MaxNoOfExecutionThreads=8 can cause a lot of contention. In most cases you do not need to turn hyper-threading off, but do not try to give the CPUs more workload than they can handle.
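To tell logical CPUs from real cores before choosing a thread count, here is a rough Python sketch that parses the text of /proc/cpuinfo. It assumes the x86 Linux format, where each logical CPU is a blank-line-separated block with "processor", "physical id" and "core id" fields; some virtual machines and other architectures omit the last two, in which case the sketch falls back to the logical count. The function name count_cores is illustrative only.

```python
def count_cores(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Hyper-threaded siblings share the same (socket, core) pair.
        if "physical id" in fields and "core id" in fields:
            physical.add((fields["physical id"], fields["core id"]))
    # Fall back to the logical count when core topology is not reported.
    return logical, (len(physical) or logical)
```

For the 4-core hyper-threaded CPU described above, passing in the contents of /proc/cpuinfo would be expected to give 8 logical CPUs and 4 physical cores, and it is the second number that reflects how much real work the machine can take on.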