MySQL Scalability on Nehalem systems
By realneel on Apr 14, 2009
MySQL Scalability on Nehalem systems (Sun Fire X4270)
Today Sun announced multiple servers based on Intel's Nehalem processor. I had early access to a Sun Fire X4270 server (2 socket) for a couple of weeks, and I used the opportunity to test some of the latest MySQL performance and scalability enhancements. For anyone unfamiliar with this system, it is a 2-socket 2U server that supports up to 144GB of memory. With hyperthreading turned on, the operating system sees 16 CPUs.
Before I share my findings, let's get clear on the terminology. Socket refers to the physical sockets on the motherboard. CPU refers to the number of processors seen by the operating system. Core refers to the physical processing unit. A Nehalem socket has 4 cores. Thread refers to the hyperthreading threads. One Nehalem core has 2 threads. Using this terminology, the Sun Fire X4270 has 16 CPUs (2 sockets, 4 cores per socket, 2 threads per core).
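The CPU count follows directly from the terminology above. Here is a minimal Python sketch of that arithmetic; the `Topology` class and its field names are made up for illustration and are not part of any Solaris or MySQL tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical helper describing a server's processor layout."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def cores(self) -> int:
        # Physical processing units across all sockets.
        return self.sockets * self.cores_per_socket

    @property
    def cpus(self) -> int:
        # Processors as seen by the operating system.
        return self.cores * self.threads_per_core


# Sun Fire X4270 as described above, with and without hyperthreading.
x4270_ht_on = Topology(sockets=2, cores_per_socket=4, threads_per_core=2)
x4270_ht_off = Topology(sockets=2, cores_per_socket=4, threads_per_core=1)

print(x4270_ht_on.cpus)   # 16
print(x4270_ht_off.cpus)  # 8
```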
I used the ever popular Sysbench benchmark. I used an internal build based on MySQL 5.1 running on OpenSolaris. Since the goal of this experiment was to showcase MySQL (and InnoDB) scalability, as well as the X4270 system, I used a cached workload. You should be able to see similar speedups for regular applications, provided there are no IO bottlenecks and no known MySQL scalability issues are being exercised. The X4270 supports 16 x 2.5" disk drives (SATA, SAS or SSD), so IO should not be a problem for most workloads. I used the tunings mentioned in my earlier blog Maximizing Sysbench OLTP performance for MySQL.
Nehalem incorporates Hyperthreading technology. Hyperthreading allows a core to run an additional software thread along with the original thread. Since very few chip resources are dedicated to the second thread, you cannot expect to see a 2x boost in performance.
There are two ways you can disable hyperthreading on Solaris.
- BIOS setting. During bootup, enter setup and disable Hyperthreading.
- Turn off the 2nd thread of each core. In Solaris you can use the psrinfo -pv command to identify which CPUs correspond to the same core, and then use the psradm -f command to turn them off.
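The second approach can be sketched in Python as follows. The core-to-CPU mapping below is a hypothetical example, not measured output; on a real system you would read the actual grouping from psrinfo -pv. The sketch only builds the psradm -f command line and does not execute it.

```python
def second_thread_cpus(core_to_cpus):
    """Return the CPU IDs of every thread after the first in each core."""
    offline = []
    for cpu_ids in core_to_cpus.values():
        # Keep the lowest-numbered CPU of each core online.
        offline.extend(sorted(cpu_ids)[1:])
    return sorted(offline)


def psradm_offline_command(cpu_ids):
    """Build the argument list for `psradm -f`, which takes CPUs offline."""
    return ["psradm", "-f"] + [str(cpu) for cpu in cpu_ids]


# ASSUMPTION: core i owns CPUs i and i + 8. Check psrinfo -pv on your
# own system; the real numbering may differ.
example_mapping = {core: [core, core + 8] for core in range(8)}

cmd = psradm_offline_command(second_thread_cpus(example_mapping))
print(" ".join(cmd))  # psradm -f 8 9 10 11 12 13 14 15
```

To bring the CPUs back online afterwards, psradm -n takes the same list of processor IDs.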
As you can see from the above tests, Hyperthreading gives you a 30-35% boost in performance.
There are two ways to study system scaling.
- One method is to study performance increase as we add more sockets to a system. This is useful for customers who want to evaluate the performance benefits of upgrading from a 1 socket system to a 2 socket system.
- The other way is to see how a system behaves as we increase load. Scaling this way also showcases how the operating system schedules threads onto cores. In the case of Nehalem, the second thread of a core is not really a full-blown core, so you will not see twice the horsepower of a single thread.
Scaling from one Nehalem socket to two Nehalem sockets
To study system scaling across sockets, we typically fully populate each core/socket before moving on to the next core/socket. For example, with the X4270, we use the following configurations:
- 1 core using both threads (2 CPUs)
- 2 cores using both threads in each core (4 CPUs)
- 4 cores using all threads (1 full socket) (8 CPUs)
- 2 full sockets (16 CPUs)
By fully allocating CPUs per core one socket at a time, we are basically showing what would happen if you only had the number of CPUs shown. This approach shows the best scalability and is also the most realistic approach.
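The fill order described above can be sketched in Python as follows. The function names are hypothetical, and the (socket, core, thread) tuples are logical positions rather than the CPU IDs Solaris would report.

```python
def cpus_in_fill_order(sockets, cores_per_socket, threads_per_core):
    """List (socket, core, thread) positions, filling each core and
    each socket completely before moving on to the next one."""
    return [
        (s, c, t)
        for s in range(sockets)
        for c in range(cores_per_socket)
        for t in range(threads_per_core)
    ]


def enabled_cpus(cores_used, sockets=2, cores_per_socket=4, threads_per_core=2):
    """Return the positions left online when only `cores_used` cores run."""
    order = cpus_in_fill_order(sockets, cores_per_socket, threads_per_core)
    return order[: cores_used * threads_per_core]


# The four configurations listed above: 1, 2, 4 and 8 cores.
for cores_used in (1, 2, 4, 8):
    online = enabled_cpus(cores_used)
    sockets_touched = sorted({s for s, _, _ in online})
    print(cores_used, "cores ->", len(online), "CPUs on sockets", sockets_touched)
```

With the X4270 defaults this gives 2, 4, 8 and 16 CPUs, and the second socket is only touched in the last configuration.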
As you can see from above, going from 1 socket to 2 sockets, we see an 87% improvement in the ReadOnly test and a 77% improvement in the ReadWrite test.
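To put the 87% and 77% figures in context, they can be converted into a speedup and a fraction of the ideal 2x you would get from perfect scaling across two sockets. The sketch below uses only the two percentages quoted above; the function name is made up for illustration.

```python
def scaling_efficiency(improvement, factor=2):
    """Fraction of ideal scaling achieved when resources grow by `factor`.

    `improvement` is the relative gain, e.g. 0.87 for an 87% improvement.
    """
    speedup = 1.0 + improvement
    return speedup / factor


for improvement in (0.87, 0.77):
    speedup = 1.0 + improvement
    efficiency = scaling_efficiency(improvement)
    print(f"{improvement:.0%} gain -> {speedup:.2f}x speedup, "
          f"{efficiency:.1%} of ideal 2x")
```

An 87% improvement works out to a 1.87x speedup, or roughly 93.5% of ideal; a 77% improvement is a 1.77x speedup, or roughly 88.5% of ideal.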
Scaling with increasing load
For this experiment, we enable hyperthreading and increase the load on the MySQL server. For Sysbench, increasing load means that we increase the number of connections to the database. Since MySQL uses a thread per connection, Solaris must be able to spread these threads out in an optimal manner. For example, if we have 8 threads, Solaris should ideally use one thread per core. This means that for 8 threads, we should see performance close to what we achieved with Hyperthreading off.
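A load sweep of this kind could be scripted roughly as shown below. This is a sketch only: it assumes the sysbench 0.4-style OLTP options (--test=oltp, --oltp-read-only, --num-threads), the list of thread counts is hypothetical, and the connection and tuning options actually used for these runs are not reproduced here. The code builds the command lines without executing them.

```python
def sysbench_command(num_threads, read_only):
    """Build a sysbench 0.4-style OLTP command line (not executed here)."""
    return [
        "sysbench",
        "--test=oltp",
        f"--oltp-read-only={'on' if read_only else 'off'}",
        # Each sysbench thread opens one connection, and MySQL serves
        # each connection with one server thread.
        f"--num-threads={num_threads}",
        "run",
    ]


# ASSUMPTION: illustrative thread counts, not the ones used in the tests.
THREAD_COUNTS = [1, 2, 4, 8, 16, 32]

for read_only in (True, False):
    for n in THREAD_COUNTS:
        print(" ".join(sysbench_command(n, read_only)))
```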
As you can see, we got 4139 transactions per second at 8 threads for the read-only test and 3268 transactions per second for the read-write test. This is around 90% of what we get when we have hyperthreading disabled. Solaris does a great job of scheduling threads in an optimal way.