Tuesday Apr 14, 2009

MySQL Scalability on Nehalem systems

MySQL Scalability on Nehalem systems (Sun Fire X4270)

Today Sun announced multiple servers based on Intel's Nehalem processor. I had early access to a Sun Fire X4270 server (2 socket) for a couple of weeks, and I used this opportunity to test some of the latest MySQL performance and scalability enhancements. For those unfamiliar with this system, it is a 2-socket 2U server that supports up to 144GB of memory. With hyperthreading turned on, the operating system sees 16 CPUs.

Before I share my findings, let's get clear on the terminology. Socket refers to a physical socket on the motherboard. CPU refers to a processor as seen by the operating system. Core refers to a physical processing unit. A Nehalem socket has 4 cores. Thread refers to a hyperthreading (hardware) thread. One Nehalem core has 2 threads. Using this terminology, the Sun Fire X4270 has 16 CPUs (2 sockets, 4 cores per socket, 2 threads per core).
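
The CPU count follows directly from these definitions. Here is a small Python sketch of the arithmetic; the function name `logical_cpus` and its parameter names are purely illustrative and do not come from any Solaris or MySQL tool.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of CPUs the operating system sees, using the terminology above."""
    return sockets * cores_per_socket * threads_per_core


# Sun Fire X4270 with hyperthreading on: 2 sockets, 4 cores each, 2 threads per core.
x4270_ht_on = logical_cpus(sockets=2, cores_per_socket=4, threads_per_core=2)

# Same system with hyperthreading off: only one thread per core is visible.
x4270_ht_off = logical_cpus(sockets=2, cores_per_socket=4, threads_per_core=1)

print(x4270_ht_on, x4270_ht_off)  # 16 8
```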

I used the ever popular Sysbench benchmark with an internal build based on MySQL 5.1, running on OpenSolaris. Since the goal of this experiment was to showcase MySQL (and InnoDB) scalability, as well as the X4270 system, I used a cached workload. You should be able to see similar speedups for regular applications, provided there are no IO bottlenecks and no known MySQL scalability issues are being exercised. The X4270 supports 16x2.5" disk drives (SATA, SAS or SSD), so IO should not be a problem for most workloads. I used the tunings mentioned in my earlier blog Maximizing Sysbench OLTP performance for MySQL.

Nehalem Hyperthreading

Nehalem incorporates Hyperthreading technology. Hyperthreading allows a core to run an additional software thread along with the original thread. Since there are very few chip resources dedicated to the second thread, you cannot expect to see a 2x boost in performance.

There are two ways you can disable hyperthreading on Solaris.

  1. BIOS setting. During bootup, enter setup and disable Hyperthreading.
  2. Turn off the 2nd thread of each core. In Solaris you can use the psrinfo -pv command to identify which CPUs correspond to the same core, and then use the psradm -f command to turn them off.
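
As a rough illustration of the second method, the Python sketch below builds (but does not run) a psradm -f command line that takes the second hardware thread of each core offline. The core-to-CPU mapping in the example is hypothetical; on a real system it has to be read from the psrinfo -pv output, because CPU numbering differs between machines.

```python
from typing import Dict, List


def offline_second_threads(core_to_cpus: Dict[int, List[int]]) -> str:
    """Build a psradm command that offlines every CPU except the first in each core."""
    to_offline: List[int] = []
    for core in sorted(core_to_cpus):
        cpus = sorted(core_to_cpus[core])
        to_offline.extend(cpus[1:])  # keep the first CPU of each core online
    return "psradm -f " + " ".join(str(cpu) for cpu in to_offline)


# Hypothetical mapping, for illustration only: core id -> CPU ids sharing that core.
example_mapping = {0: [0, 8], 1: [1, 9], 2: [2, 10], 3: [3, 11]}

print(offline_second_threads(example_mapping))  # psradm -f 8 9 10 11
```

The command is only printed, so it can be reviewed before being run by hand with the appropriate privileges.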
Our tests indicated that both methods give very similar performance. The numbers below are TPS, or transactions per second. A higher number indicates better performance.

Experiment           Sockets  CPUs seen by Solaris  Read-only TPS  Read-write TPS
Hyperthreading ON    2        16                    6310           4652
Hyperthreading OFF   2        8                     4648           3584
Improvement                                         35%            29.7%

As you can see from the above tests, Hyperthreading gives you a 30-35% boost in performance.
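
For readers who want to check the percentages, here is a minimal Python sketch of the calculation using the TPS values from the table above. The helper name `percent_gain` is my own choice for this example.

```python
def percent_gain(baseline_tps: float, improved_tps: float) -> float:
    """Percentage improvement of improved_tps over baseline_tps."""
    return (improved_tps / baseline_tps - 1.0) * 100.0


# TPS values from the table above: Hyperthreading OFF is the baseline, ON is the improved run.
read_only_gain = percent_gain(baseline_tps=4648, improved_tps=6310)
read_write_gain = percent_gain(baseline_tps=3584, improved_tps=4652)

print(f"read-only: {read_only_gain:.1f}%  read-write: {read_write_gain:.1f}%")
```

This works out to roughly 36% for read-only and 30% for read-write, in line with the rounded figures in the table.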

System Scaling

There are two ways to study system scaling.

  1. One method is to study performance increase as we add more sockets to a system. This is useful for customers who want to evaluate the performance benefits of upgrading from a 1 socket system to a 2 socket system.
  2. The other way is to see how a system behaves as we increase load. Scaling this way also showcases how the operating system schedules threads onto cores. In the case of Nehalem, the second thread of a core is not really a full-blown core, so you will not see twice the horsepower of a single thread.
Let's look at both of them in detail.

Scaling from one Nehalem socket to two Nehalem sockets

To study system scaling across sockets, we typically fully populate each core/socket before moving on to the next core/socket. For example, with the X4270 we use:

1 core using both threads (2 CPUs)
2 cores using both threads in each core (4 CPUs)
4 cores using all threads (1 full socket) (8 CPUs)
2 full sockets (16 CPUs)

By fully allocating CPUs per core one socket at a time, we are basically showing what would happen if you only had the number of CPUs shown. This approach shows the best scalability and is also the most realistic approach.

Sockets     CPUs seen by Solaris  Read-only TPS  Read-write TPS
1 Socket    8                     3364           2616
2 Sockets   16                    6310           4652
Speedup                           1.87x          1.77x

As you can see from above, going from 1 socket to 2 sockets gives an 87% improvement in the read-only test and a 77% improvement in the read-write test.
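
The speedup figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the helper names `speedup` and `scaling_efficiency` are assumptions made for this example, and scaling efficiency (speedup divided by the increase in sockets) is an extra metric not reported in the table.

```python
def speedup(tps_before: float, tps_after: float) -> float:
    """How many times faster the larger configuration is."""
    return tps_after / tps_before


def scaling_efficiency(tps_before: float, tps_after: float, resource_ratio: float) -> float:
    """Speedup divided by the growth in resources (1.0 would be perfect linear scaling)."""
    return speedup(tps_before, tps_after) / resource_ratio


# TPS values from the table above: 1 socket (8 CPUs) vs 2 sockets (16 CPUs).
read_only_speedup = speedup(3364, 6310)
read_write_speedup = speedup(2616, 4652)

# Going from 1 socket to 2 sockets doubles the hardware, so resource_ratio is 2.
read_only_eff = scaling_efficiency(3364, 6310, resource_ratio=2)
read_write_eff = scaling_efficiency(2616, 4652, resource_ratio=2)

print(f"read-only:  {read_only_speedup:.2f}x speedup, {read_only_eff:.0%} efficiency")
print(f"read-write: {read_write_speedup:.2f}x speedup, {read_write_eff:.0%} efficiency")
```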

Scaling with increasing load

For this experiment, we enable hyperthreading and increase the load on the MySQL server. For Sysbench, increasing load means increasing the number of connections to the database. Since MySQL uses a thread per connection, Solaris must be able to spread these threads out in an optimal manner. For example, if we have 8 threads, Solaris should ideally run one thread per core. This means that for 8 threads, we should see performance close to what we achieved with Hyperthreading off.
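
To make the "one thread per core" idea concrete, here is a toy Python sketch of an ideal spread placement. It is not the Solaris scheduler's actual algorithm; the function name `spread_placement` and the round-robin policy are assumptions used only to illustrate the desired outcome on an 8-core system.

```python
from typing import List


def spread_placement(num_threads: int, num_cores: int = 8) -> List[int]:
    """Return the number of software threads on each core when spread round-robin."""
    per_core = [0] * num_cores
    for t in range(num_threads):
        per_core[t % num_cores] += 1  # fill every core once before doubling up
    return per_core


print(spread_placement(8))   # each of the 8 cores runs exactly one thread
print(spread_placement(16))  # each core runs two threads, one per hardware thread
```

With 8 threads no core has to share its resources between two hardware threads, which is why the result should approach the Hyperthreading OFF numbers.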

Sysbench scaling with increasing load

Threads  Read-only TPS  Read-write TPS
1        569.06         452.12
2        1412.23        1066.06
4        2636.04        2080.23
8        4139.46        3268.45
16       6310.89        4652.63
32       6152.06        4286.59
48       6015.61        3944.35

As you can see, we got 4139 transactions per second at 8 threads for the read-only test and 3268 transactions per second for the read-write test. This is around 90% of what we get when we have hyperthreading disabled. Solaris does a great job of scheduling threads in an optimal way.
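
The comparison with the Hyperthreading OFF run can be checked with a short Python sketch. The data is copied from the tables in this post; the variable names are my own.

```python
# Thread count -> (read-only TPS, read-write TPS), copied from the load-scaling table.
load_results = {
    1: (569.06, 452.12),
    2: (1412.23, 1066.06),
    4: (2636.04, 2080.23),
    8: (4139.46, 3268.45),
    16: (6310.89, 4652.63),
    32: (6152.06, 4286.59),
    48: (6015.61, 3944.35),
}

# Hyperthreading OFF results (8 CPUs) from the earlier table: (read-only, read-write).
ht_off = (4648.0, 3584.0)

# Fraction of the Hyperthreading OFF throughput reached with 8 threads and Hyperthreading ON.
read_only_ratio = load_results[8][0] / ht_off[0]
read_write_ratio = load_results[8][1] / ht_off[1]

# Thread count at which read-only throughput peaks.
peak_threads = max(load_results, key=lambda n: load_results[n][0])

print(f"read-only: {read_only_ratio:.0%}, read-write: {read_write_ratio:.0%}")
print(f"read-only throughput peaks at {peak_threads} threads")
```

Throughput peaks at 16 threads, which matches the 16 CPUs visible to Solaris, and declines slightly as more connections are added.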

Conclusion

As you can see from the above benchmarks, many MySQL workloads will see very good scaling from one socket to two sockets. You can get a 30-35% boost in performance by using Hyperthreading. When hyperthreading is enabled, Solaris does a great job of scheduling threads in an optimal way.
About

realneel
