MySQL scalability: diagnostic for beginners
By Thierry Manfe on Nov 16, 2009
This past week I have been working with two Web 2.0 startups - members of Sun Startup Essentials, so they get the support for free - that are looking at improving the response time of their web sites.
We've been working on understanding how their servers behave today and what could prevent their application from scaling, which led me to show how to run a quick diagnostic on an existing system and to restate some key fundamentals of scalability.
Let's start with the fundamentals.
Any piece of software is nothing more than a set of instructions that need two things: computing units to execute on, and fast access to data.
Computing units are available in the form of hardware threads located on a processor. The number of threads a server makes available to the application depends on the number of processors in the server, the number of cores per processor, and the number of threads per core. The only thing we have to keep in mind is that the industry has moved heavily to processors that provide many hardware threads (up to 64 on a CMT chip), so taking advantage of multiple threads is definitely the right way to scale as far as computing power is concerned.
As for fast access to data, the rule of thumb I have always used is that good performance can only be achieved if the data is located in physical memory - in RAM. As soon as disks are involved, the penalty is an order of magnitude - if not more.
How do I take advantage of multiple threads with MySQL or memcached? The good news is that both support multithreading, which means they both have the ability to create many software threads that are mapped to HW threads and execute many instruction streams in parallel (with memcached you may have to recompile the code according to this page).
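As a quick illustration, recent memcached builds let you pick the number of worker threads with the -t option (the thread count and cache size below are arbitrary examples, not recommendations):

```
# Start memcached as a daemon with 64 MB of cache and 4 worker threads
memcached -d -m 64 -t 4
```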
Now how do I check what's really happening on my system?
First, let's check how many HW threads are available on my server. On Solaris or OpenSolaris I run psrinfo:
0 on-line since 11/15/2009 17:09:45
1 on-line since 11/15/2009 17:09:49
The command reports that I have 2 HW threads online (I have a dual-core x86 processor in my laptop).
Then I can check how many software threads each application is actually using. To do so I run the prstat command. In the following example, prstat 5 tells me that MySQL is using 10 software threads. The number is displayed in the right-hand column, NLWP. Note that with 10 software threads MySQL cannot take advantage of a server that has 64 HW threads, but for my 2-core laptop this is already more than needed.
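A prstat listing typically looks like the line below (the PID, sizes, and times are illustrative, not real measurements; the NLWP value after the slash in the last column is the one to read):

```
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  1234 mysql     180M   95M sleep   59    0   0:00:12 0.1% mysqld/10
```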
Next, are the software threads leveraging the HW-threads? Are they keeping the CPU busy? The fastest way to check if the CPU is loaded is to run vmstat 5:
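A vmstat 5 report looks roughly like this (the figures are illustrative of an idle system; the cpu columns on the far right - us, sy, id - are the ones of interest here):

```
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd -- -- --   in   sy   cs us sy id
 0 0 0 1471304 534892 12  50  0  0  0  0  0  0  0  0  0  412  301  289  1  1 98
```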
In this example my CPU spends 98 to 99% of its time idle, and 0 or 1% executing code from an application - column us - or from the system - column sy. If MySQL were running on all cylinders, I would see most of the load reported in column us, and possibly a few percentage points in column sy. In my case MySQL is not receiving any requests.
But there might be other cases where I see slow or hanging requests together with a CPU that is only partially loaded. If my CPU is not fully loaded, and if I have enough software threads, why do I see slow requests? Well, in that case it's likely that requests are competing for resources - such as tables - that cannot be shared. This is a situation that impedes scalability: only a few requests are executed at a time, keeping only a few threads busy. Though outside the scope of this article, note that if you are using MyISAM, moving to InnoDB can help you address table-lock issues and hence improve the scalability of your application.
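For instance, here is one way to spot the tables still on MyISAM and convert one of them (the database and table names are placeholders; always test such a conversion on a copy first):

```
# List the user tables still using the MyISAM engine
mysql -e "SELECT table_schema, table_name FROM information_schema.tables \
          WHERE engine = 'MyISAM' \
          AND table_schema NOT IN ('mysql', 'information_schema');"

# Convert one table to InnoDB
mysql mydb -e "ALTER TABLE mytable ENGINE = InnoDB;"
```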
So with psrinfo, prstat, and vmstat I can get a quick overview of how my server is doing in terms of CPU power and load.
Now, how can I check if I am running out of physical memory?
The diagnostic above assumes that the limiting factor is the computing power and not the access to data, not the amount of physical memory. But how can I be sure I have enough RAM and what happens in the face of a shortage?
When running out of RAM, a system starts to page. It tries to free some space in RAM by transferring some of its content - organized as pages - to disk. The OS kernel scans the RAM to locate the least recently used pages and puts them on disk, with the hope that they won't be needed anytime soon (in which case they would have to be brought back into RAM).
A peak in paging activity is nothing to worry about. Cleaning up the RAM a little does not mean I am facing a shortage. It's when my system keeps on paging that I know I do not have enough RAM. Once again vmstat 5 helps me monitor the rate at which the kernel scans the RAM looking for pages to move out: the sr column reports the scan rate. If it never goes back to zero, it's time to add more RAM to my server.
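On a memory-starved system the same vmstat 5 report would show a persistently non-zero sr column, along with page-out activity in po and fr (again, the figures are illustrative):

```
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd -- -- --   in   sy   cs us sy id
 0 0 0  104304  12892 95 210  0 480 512  0 655  0  0  0  0  812  640  510  5 12 83
```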