Constant factors: constantly foiling you

There are many different kinds of bugs with many different causes. Sometimes, it's just some silly typo or type mismatch; other times, it's an algorithmic issue or a correctness error; and occasionally, it appears to be something much stranger. This is a story about that last kind of bug, where finding the location of the problem is relatively straightforward, but figuring out how to fix it, or even why it's happening, isn't so easy.

We first meet our hero, yours truly, as he debugs an efficiency problem in a particular program for what is now the third day in a row. This program is actually just a small part of an otherwise large and complex system that Ksplice uses for its internal operations, at the heart of which lies a large collection of data, stored in a carefully constructed and extensively optimized MySQL database.

But this had not always been the case. In the past, much of the data was stored in a large collection of JSON-encoded archives. Seamlessly moving all that data to MySQL was a project that I had been involved in, and this was, in a sense, the final stage of the transition. We still had the JSON files, but having moved to a much shinier and spiffier database, it seemed like a good idea to port all the programs that used the data over to it as well.

Now, this particular program was designed to make five queries to the MySQL database, all totally identical except for the range they selected on a time column that all the rows contained. Each query covered a different 48-hour period, distributed at various points throughout the last two months (relative to when the program is run).

These queries seemed simple enough, but mysteriously, the queries for older time periods, which returned fewer rows, took considerably longer to run!

Eventually, I was able to reduce this to a consistently reproducible example:

mysql> SELECT COUNT(*) FROM events
        WHERE time > '2010-04-25' AND time < '2010-04-27';

+----------+
| COUNT(*) |
+----------+
|  1196740 |
+----------+

1 row in set (12.13 sec)

mysql> SELECT COUNT(*) FROM events
        WHERE time > '2010-02-24' AND time < '2010-02-26';

+----------+
| COUNT(*) |
+----------+
|   740034 |
+----------+

1 row in set (61.92 sec)

As you can see, the more recent range contains about 1.2 million rows and took just over 12 seconds to run. The older range returned just over half as many rows, but took almost five times as long. This really doesn't make any sense: after all, the queries are totally identical except for the ranges we are looking across, and those ranges are exactly the same size. There really shouldn't be such a huge disparity in the running times.

Let's look at how MySQL is performing these queries:

mysql> EXPLAIN EXTENDED SELECT COUNT(*) FROM events
                                 WHERE time > '2010-04-25' AND time < '2010-04-27';

+----+-------------+--------+-------+---------------+-------------+---------+------+---------+----------+--------------------------+
| id | select_type | table  | type  | possible_keys | key         | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+--------+-------+---------------+-------------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | events | range | events_time   | events_time | 8       | NULL | 2234568 |   100.00 | Using where; Using index |
+----+-------------+--------+-------+---------------+-------------+---------+------+---------+----------+--------------------------+

1 row in set, 1 warning (0.04 sec)

mysql> EXPLAIN EXTENDED SELECT COUNT(*) FROM events
                                 WHERE time > '2010-02-24' AND time < '2010-02-26';

+----+-------------+--------+-------+---------------+-------------+---------+------+---------+----------+--------------------------+
| id | select_type | table  | type  | possible_keys | key         | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+--------+-------+---------------+-------------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | events | range | events_time   | events_time | 8       | NULL | 1202716 |   100.00 | Using where; Using index |
+----+-------------+--------+-------+---------------+-------------+---------+------+---------+----------+--------------------------+

1 row in set, 1 warning (0.04 sec)

Besides confirming that we are using an index on the time column, there isn't too much here. I tried using MySQL's built-in profiler by running SET PROFILING = 1 and then, after rerunning the second query, running

mysql> SHOW PROFILE FOR QUERY 1;

+--------------------+-----------+
| Status             | Duration  |
+--------------------+-----------+
| starting           |  0.000116 |
| Opening tables     |  0.000018 |
| System lock        |  0.000006 |
| Table lock         |  0.000013 |
| init               |  0.000032 |
| optimizing         |  0.000016 |
| statistics         |  0.047627 |
| preparing          |  0.000038 |
| executing          |  0.000007 |
| Sending data       | 62.852915 |
| end                |  0.000035 |
| query end          |  0.000007 |
| freeing items      |  0.000047 |
| logging slow query |  0.000005 |
| cleaning up        |  0.000005 |
+--------------------+-----------+

15 rows in set (0.03 sec)

Huh. Pretty much all of the time is spent "sending data". What the heck is going on?

[Figure: a fragmented database. For the newer events, we can just get them all in one go, since they're already sorted by time.]

The answer turns out to be fragmentation. Remember when I said that we had migrated to this database from an older flat-file based one? Well, those files weren't read and inserted into the database in the chronological order of their contents, but rather in an order determined by the contents themselves. So the old data we inserted ended up fragmented across the database relative to the time column. Newer data, on the other hand, was inserted as it came in, so it was already pretty much ordered by time.
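
One way to see this directly is to compare how widely each range's rows are spread out in insertion order. Here's a sketch of such a check; the auto-increment id column is my own assumption for illustration (any column that reflects insertion order would do):

mysql> -- If id_spread is much larger than COUNT(*), the range's rows are
mysql> -- scattered across the table in insertion order, i.e. fragmented.
mysql> SELECT MIN(id), MAX(id),
              MAX(id) - MIN(id) AS id_spread,
              COUNT(*)
         FROM events
        WHERE time > '2010-02-24' AND time < '2010-02-26';

For the recent, naturally-ordered range, id_spread should come out close to the row count; for the bulk-imported range it should be vastly larger.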

MySQL stores its indices in B-trees, so it can locate any given row (or range of rows) in logarithmic time. But unfortunately, it can't just pluck an arbitrary row off the disk immediately. MySQL can only read data off the disk at the granularity of a page, which means that for a row living in a page that hasn't yet been read into memory, we first have to do a disk seek to load the right page.
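
For a ballpark sense of how many rows share a page, we can check the table's average row length. (I'm assuming InnoDB with its default 16KB pages here.)

mysql> SHOW TABLE STATUS LIKE 'events'\G

Dividing 16384 by the reported Avg_row_length tells us roughly how many rows fit in one page. A query over well-clustered rows pays at most one seek per page full of rows; a query over scattered rows can pay a seek for nearly every single row.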

Since the older rows are fragmented across the disk, reading them requires far more of those expensive disk seeks. So even though we're getting fewer total rows and the query is locating them in logarithmic time, the constant factor ends up being very large due to all the disk seeking.
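
The natural fix is to physically rewrite the table in time order. A minimal sketch, assuming a MyISAM table (or one we can afford to lock while it rebuilds):

mysql> -- Rewrites the table with its rows sorted by the time column.
mysql> ALTER TABLE events ORDER BY time;

InnoDB ignores ALTER TABLE ... ORDER BY, since it always clusters rows by the primary key; there the analogous move is to make time the leading column of the primary key, or at least to rebuild the table after a bulk import (for example with OPTIMIZE TABLE events;) so that its pages end up contiguous on disk.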

What are your favorite bitten-by-a-constant-factor stories?

~leonidg
