Introduction to the Innodb IO subsystem

Introduction to the Innodb IO subsystem

Basics

When a client connects to MySQL, it creates a thread to handle it. This thread executes SQL queries and interacts with the storage engine (for simplicity, lets call them user threads). Innodb uses a four (4) additional threads to implement asynchronous io1. Although Innodb has an option innodb_file_io_threads to control the number of IO handler threads, it has no effect on how many IO handler threads are actually created. These IO handler threads wait and process events in a loop. Each IO handler thread processes different kinds of events. (Insert buffer writes, log writes, datafile writes, and read-ahead or prefetch) Let us now see how different types of IO are handled by Innodb.

Reads

Using a simple sysbench read-only test, we notice that all user threads issue reads (pread(2)) to the data files. We can verify this using a simple dtrace script
shell>dtrace -qn io:::start'/args[0]->b_flags & B_READ/{@[tid]=count()}tick-5s{exit(0)}'
      709               39
      711               42
      710               43
      712               51
As you can see above, all 4 threads are issuing reads.

Since innodb_thread_concurrency limits how many threads are executing inside Innodb, it acts as a throttle on the number of reads that can be issued in parallel.

Optionally you can use dtrace to look at the callstacks for reads and verify that the reads originate while reading pages of the Innodb datafiles.

Writes

Looking at writes, we find that bulk of the IO (pwrite(2)) is initiated by the Innodb IO handler threads. We also noticed a weird (atleast for me) behavior. We found that some reads were triggering fdsync() and thus causing writes to disk!. As noted above, since reads are done by all user threads, fdsync() was being issued by many threads.

So why are reads triggering fdsync() ?

In case the Innodb buffer gets full, it needs to evict data from the buffer pool to make space for the new record that is being read. The eviction function (buf_flush_free_margin()) also flushes possible buffered writes from the double write memory buffer. This flush is done using fdsync(). You can try increasing the buffer pool size incase the fdsync() bothers you. It is also possible to turn off the Innodb double write buffer, but it is generally not recommended (except for ZFS) as you might lose data in case of a power outage.

The 100 ios per second rule

The Innodb master thread has the responsibility (among many others) of flushing out modified buffer pool pages. Once a second, the innodb master thread will flush up to a maximum of 100 pages. This is a hard coded limit and is too low! (search for buf_flush_batch(BUF_FLUSH_LIST, 100,..) in srv/srv0srv.c). Reads do not suffer from this limitation.

So what can we do to improve Innodb's IO behavior?

Having only one thread for writes is not great for performance. The 100 ios per second rule is seriously flawed. Luckily the MySQL community (Google and Percona in particular) have fixes that change this behavior. Check out Mark Callaghan's blog for more details

So in conclusion,

  1. Reads are issued by all user threads
  2. Writes are only issued by innodb threads (one per kind of write)
  3. Buffer pool writes are currently limited to a max of 100 writes per second.
  4. It is possible for reads to cause writes
  5. Google (Mark Callaghan) has some really cool patches to fix most of the issues mentioned above. I have not tried the patch yet, but will soon.

1Rant: Most modern operating systems support AIO natively, I cannot understand why Innodb wants to implement its own)
Comments:

Post a Comment:
Comments are closed for this entry.
About

realneel

Search

Archives
« April 2015
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
  
       
Today