By Tim Cook on May 31, 2007
I have recently been engaged with a customer that is evaluating MySQL, in particular with its Memory (formerly known as Heap) engine, which stores all database data and metadata in RAM.
I was asked to assist with diagnosing if/whether/where statements were taking longer than 5 milliseconds to complete. Now, this was being observed from the viewpoint of the "client" - the client was a synthetic benchmark run as a Java program. It could be run either on a separate system or on the same system as the MySQL database, and a small number of transactions would be measured as taking longer than 5ms.
Now, there is more than one way to skin this cat, and it turns out that Jenny Chen has had a go at putting static probes into MySQL. For my (and the customer's) purposes however, we wanted to skip the step of re-compiling with our own probes, and just use what we can observe via use of the PID provider.
How Do We Do This?
Well, it is not trivial. However as it turns out, I have seen a bit of the MySQL code. I also had someone pretty senior from MySQL next to me, who helped confirm what was happening, as I used some "fishing" bits of DTrace to watch a mysqld thread call functions as we ran the "mysql" client and entered simple statements.
This allowed me to narrow down on a couple of vio_\*() routines, and to build some pseudo-code to describe the call flow around reception of requests from a client, processing of the statement, then return of a result to the client.
It is not as simple as looking for the entry and return of a single function, because I wanted to capture the full time from when the mysqld thread returns from a read(2) indicating a request has arrived from a client through to the point where the same thread has completed a write(2) to send the response back. This is the broadest definition of "response time measure at the server".
The result of all of our measurements showed that there were no statements from the synthetic benchmark that took longer than 5 ms to complete. Here is an example of the output of my DTrace capture (a histogram of microseconds):
bash-3.00# ./request.d -p `pgrep mysqld` \^C value ------------- Distribution ------------- count < 0 | 0 0 |@@@@@@@@@@@@@@@@@@@@@ 10691 500 |@@@@@@@@@@@@@@@@@ 8677 1000 |@ 680 1500 | 31 2000 | 0
Feel free to use the DTrace script for your own purposes. It should work on MySQL 5.0 and 5.1.
The Upshot - Observability in the JVM
There is a nagging question remaining - why was the Java client seeing some requests run longer than 5 ms?.
There are three possible answers I can think of:
- There is latency involved in transmitting the requests and responses between the client and server (e.g. network packet processing).
- The thread inside the JVM was being pre-empted (thrown off the CPU) between taking its time measurements.
- The measurements (taken using System.nanoTime()) are not reliable.