Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

A new era

In the past few years, I published many articles using Oracle as a database server. As a former Sybase system administrator and former Informix employee, it was obviously not a matter of personal choice. It was just because the large majority of Sun's customers running databases were also Oracle customers.

This summer, in our 26 Sun Solution Centers worldwide, I observed a shift. Yes, we were still seeing older solutions based on DB2, Oracle, Sybase or Informix being evaluated on new Sun hardware. But every customer project manager, every partner, every software engineer working on a new information system design asked us : Can we architect this solution with MySQL ?

In many cases, if you dared to reply YES to this question, the next interrogation would be about the scalability of the MySQL engine.

This is why I decided to write this article.



Goals

Please find below my initial goals :

  1. Reach a high throughput of SQL queries on a 256-way Sun SPARC Enterprise T5440

  2. Do it 21st century style i.e. with MySQL and ZFS , not 20th century style i.e with OraSybInf... and VxFS

  3. Do it with minimal tuning i.e as close as possible as out-of-the-box



This article is describing how I achieved this goals. It has two main parts : a short description of the technologies used, then a showing of the results obtained.





Sun SPARC Enterprise T5440 server
The T5440 server is the first quad-socket server proposing 256 hardware threads in just four rack units. Each socket host a UltraSPARC T2 Plus processor which propose eight cores and 64 simultaneous threads into a single piece of silicon. While a lot of customers are interested in the capacity of this system to be divided into 128 two-way domains, this article explores the database capacity of a single 256-way Solaris 10 domain.




The Zettabyte file system
Announced in 2004 and introduced part of OpenSolaris build 27 in November 2005, ZFS is the one-and-only 128-bit file system. It includes many innovative features like a copy-on-write transactional model, snapshots and clones, dynamic striping and variable block sizes. Since July 2006, ZFS is also a key part of the Solaris operating system . A key difference between UFS and ZFS is the usage of the ARC [Adaptive Replacement Cache] instead of the traditional virtual memory page cache. To obtain the performance level shown in this article, we only had to tune the size of the ARC cache and turn off atime management on the file systems to optimize ZIL I/O latency. The default ZFS recordsize is commonly changed for database workload. For this study, we kept the default value of 128k.






MySQL 5.1
The MySQL database server is the leading Open Source database for Web 2.0 environment. MySQL was introduced in May 1995 and has never stopped to be enriched with features. The 5.1 release is an important milestone as it introduces support for partitioning, event scheduling, XML functions and row based replication. While Sun is actively working on implementing a single instance highly scalable storage engine, this article is showing how one can reach a very high level of SQL query throughput using MySQL 5.1.29 64-bit on a 256-way server.







SLAMD and iGenOLTP
The SLAMD Distributed Load Generation Engine (SLAMD) is a Java-based application designed for stress testing and performance analysis of network-based applications. It was originally developed by Sun Microsystems, Inc., but it has been released as an open source application under the Sun Public License, which is an OSI-approved open source license. The main site for obtaining information about SLAMD is http://www.slamd.com/. It is also available as a java.net project.

iGenOLTP is a multi-processed and multi-threaded database benchmark. As a custom Java class for SLAMD, it is a lightweight workload composed of four select statements, one insert and one delete. It produces a 90% read/10% write workload simulating a global order system. For this exercise, we are using a maximum of 24 milllion customers and 240 million orders in the databases. The database is divided “sharded” in as many pieces as the number of MySQL instances on the system. [See this great article on database sharding]. For example, for 24 database instances, database 1 store customers 1 to 1 million, database 2 store customers 1 milion to 2 million and so on. The Java threads simulating the workload are aware of the database partitioning scheme and simulate the traffic accordingly.

This approach can be called “Application partitioning” as opposed to “Database partitioning”. Because it is based on a shared-nothing architecture, it it natively more scalable than a shared-everything approach (as in Oracle RAC).




Java Platform Standard Edition 7

Initially released in 1995, the programming language Java started a revolution in computer languages because of the concept of Java Virtual Machine causing instant portability across computer architectures. While the 1.7 JVM is still in beta release, it is the base of my iGenOltpMysql Java class performing the workload shown in this article. The key enhancement of the JVM 1.6 was the introduction of native Dtrace probes. The 1.7 JDK is an update packed with performance related enhancements including an improved Adaptive Compiler, optimized Rapid Memory Allocation , finely tuned garbage collector algorithms and finally a lighter thread synchronization capability causing better scalability. For this article we used the JDK7 build 38.


Software and Hardware summary

This study is using Solaris 10 Update 6 (released October 31st,2008), Java 1.7 build 38 (released Otober 23rd,2008), SLAMD 1.8.2, iGenOLTP v4.2 for MySQL and MySQL 5.1.29. The hardware tested is a T5440 with 4xUltraSPARC T2 Plus 1.2Ghz and 64 GB of RAM . A Sun Blade 8000 with 10 blades each with 2xAMD Opteron 8220 2.8Ghz and 8GB RAM is used as a client system. Finally a Sun ST6140 storage array [with 10x146GB 15k RPM drives] is configured in RAID-1 [2 HS], with two physical volumes and connected to the T54440 with two 4GB/s controllers.





Scaling vertically first

This is a matter of methodology. The first step is to determine the peak throughput of a single instance of MySQL with iGenOLTP using InnoDB then use approximately 75% of this throughput as the basis for the horizontal scalability test. ZFS and MySQL current best practices guided the choice of all the tunables used. [available upon request] The test is done in stabilized load with each simulation thread executing 10 transactions per second. Please find below the throughput and response time scalability curves :






Note that the peak throughput is 725 transactions per second which corresponds to 4350 SQL statements per second. We are caching the entire 1 Gbyte database. The only I/Os happening are due to the delete/insert statements, the MySQL log and the ZFS Intent Log. We will be using 75% of the peak workload simulation as the base workload per instance for the horizontal scalability exercise. Why 75% ? Our preliminary tests showed that it the was the best compromise to reach maximum multi-instance throughput.



Scaling horizontally

The next step was to increase the number of instances while increasing proportionally the database size (number of customer ids). We will have the same 600 TPS workload requested on each instance but querying a different range within the global data set. The beauty of the setup is that we do not have to reinstall the MySQL binaries multiple times : we could just use soft links. The main thing to do was to configure 32 ZFS file systems on our ZFS pool and then to create & load the databases. This was easily automated with ksh scripts. Finally, we had to customize the Java workload to query all the database instances accurately...

Here are the results :






As you can see, we were able to reach a peak of more than 79,000 SQL queries per second on a single 4 RU server. The transaction throughput is still increasing after 28 instances but this is the sweet spot for this benchmark on the T5440 as guided by the transactions average response time. At 28 instances, we observed less than 30ms average response time. However, for 32 instances, response times jumped to an average of 95ms.



The main trick to achieve horizontal scalability: Optimize thread scheduling

Solaris is using the timeshare class as the default scheduling class. The scheduler needs to always make sure that the thread priorities are adequately balanced. For this test, we are running thousand of threads running this workload and can get critical CPU User Time back by avoiding unnecessary work by the scheduler. To achieve this, we are running the MySQL engines and Java processes in the Fixed Priority class. This is achieved easily using the Solaris priocntl command.


Conclusion
As I mentioned in introduction, an architecture shift is happening. Database sharding and application partitioning are the foundation of future information systems as pioneered by companies like Facebook [see this interesting blog entry]. This article prove that Sun Microsystems servers with CoolThread technology are an exceptional foundation for this change. And they will also considerably lower your Total Cost of Ownership as illustrated in this customer success story.



A very special thank you to the following experts who helped in the process or reviewed this article : Huon Sok, Allan Packer, Phil Morris, Mark Mulligan, Linda Kateley, Kevin Figiel and Patrick Cyril.

See you next time in the wonderful world of benchmarking....



<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"> </script> <script type="text/javascript"> _uacct="UA-917120-1"; urchinTracker(); </script>
Comments:

Does that mean that single MySQL instance cannot scale very well with added cores/threads?

Posted by Mike on November 11, 2008 at 10:25 AM PST #

Any application has scalability limits. It is part of the job of any computer engineers to know the sweet spot of the applications he/she is responsible for. And as I explained in the article, it does not matter when your startegy is to shard the database. The only thing that will change is the number of instances required to achieve the peak throughput.
btw; we are working on a scalable replacement of InnoDB. Stay tuned....

Posted by MrBenchmark on November 11, 2008 at 02:05 PM PST #

Sun engineers have always been able to tune Solaris for complex database workloads.
Now it seems that they will switch the role and tune a database to match their hardware.

"we are working on a scalable replacement of InnoDB. Stay tuned...."

Now that makes a lot of sense.

Posted by McD on November 11, 2008 at 03:10 PM PST #

Can you go back to the Oracle market where BS articles on performance are appreciated?

Posted by guest on November 12, 2008 at 12:12 PM PST #

Questions of why a single instance of MySQL doesn't scale well on the hardware aside, looking at the 32 separate shared nothing instances, what was keeping it from scaling better than 24X?

Posted by rick jones on November 13, 2008 at 01:16 PM PST #

<a href="http://www.byreplica.com/gucci-canvas-medium-shoulder-bag-154982-beige-mauve-handbags-106.html">Gucci Canvas Medium Shoulder Bag 154982 Beige Mauve</a>
<a href="http://www.byreplica.com/gucci-canvas-medium-shoulder-bag-154982-beige-white-handbags-107.html">Gucci Canvas Medium Shoulder Bag 154982 Beige White</a>
<a href="http://www.byreplica.com/gucci-canvas-large-shoppers-tote-purse-bag-132898-handbags-98.html">Gucci Canvas Large Shoppers Tote Purse Bag 132898</a>

<a href="http://www.byreplica.com/gucci-canvas-medium-shoulder-bag-154982-beige-mauve-handbags-106.html">Gucci Canvas Medium Shoulder Bag 154982 Beige Mauve</a>
<a href="http://www.byreplica.com/gucci-canvas-medium-shoulder-bag-154982-beige-white-handbags-107.html">Gucci Canvas Medium Shoulder Bag 154982 Beige White</a>

Posted by byreplica on March 16, 2010 at 12:32 AM PDT #

Questions of why a single instance of MySQL doesn't scale well on the hardware aside, looking at the 32 separate shared nothing instances, what was keeping it from scaling better than 24X?

Posted by louisvuitton handbags on May 27, 2010 at 02:21 PM PDT #

Sun engineers have always been able to tune Solaris for complex database workloads.
Now it seems that they will switch the role and tune a database to match their hardware.

"we are working on a scalable replacement of InnoDB. Stay tuned...."

Posted by wholesale chanel on September 02, 2010 at 08:59 AM PDT #

Any application has scalability limits. It is part of the job of any computer engineers to know the sweet spot of the applications he/she is responsible for.

Yes I also think so.

Posted by Laptop battery on September 16, 2010 at 07:48 PM PDT #

yes, i think i will like this I mentioned in introduction, an architecture shift is happening.

Posted by ed hardy on September 27, 2010 at 02:12 AM PDT #

Merry Christmas, here is a surprise to you in the upcoming Christmas! To give you a mysterious Christmas gift!

Posted by vuitton bags on October 07, 2010 at 01:29 PM PDT #

Questions of why a single instance of MySQL doesn't scale well on the hardware aside

Posted by China Florist on November 30, 2010 at 01:19 PM PST #

Greetings I recently finished reading through your blog and I'm very impressed. I do have a couple questions for you personally however. Do you think you're thinking about doing a follow-up posting about this? Will you be going to keep bringing up-to-date as well?

Posted by Software Download on December 03, 2010 at 02:26 AM PST #

Elect Bob and bring back Hani.

Posted by Egitim on December 09, 2010 at 10:35 PM PST #

i think i will like this I mentioned in introduction, an architecture shift is happening.

Posted by Wholesale Gucci on April 06, 2011 at 08:46 PM PDT #

congratulations

Posted by geciktirici on April 07, 2011 at 05:10 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed
About

mrbenchmark

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
News
Blogroll
deepdive

No bookmarks in folder