InnoDB now supports native AIO on Linux
By Calvin Sun on Apr 14, 2010
Note: this article was originally published on http://blogs.innodb.com on April 14, 2010 by Inaam Rana.
With the exception of Windows, InnoDB has used
‘simulated AIO’ on all platforms to perform certain IO operations.
The IO requests that have been performed in a ‘simulated AIO’ way are
the write requests and the readahead requests for the datafile pages.
Let us first look at what ‘simulated AIO’ means in this context.
We call it ‘simulated AIO’ because it appears
asynchronous from the context of a query thread, but from the OS
perspective the IO calls are still synchronous. The query thread simply
queues the request in an array and then returns to its normal work.
One of the IO helper threads, which are background threads, then takes
the request from the queue and issues a synchronous IO call
(pread/pwrite), meaning it blocks on the IO call. Once it returns from
the pread/pwrite call, the helper thread calls the IO completion
routine on the block in question. For a read, this includes merging
any buffered operations; for a write, the block is marked as ‘clean’
and removed from the flush_list. Some other bookkeeping also happens
in the IO completion routine.
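The flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not InnoDB's code: the names `io_helper` and `simulated_aio_demo`, the 16 KB page size, and the use of a temporary file are all assumptions made for the example, and the "completion routine" is reduced to appending to a list.

```python
import os
import queue
import tempfile
import threading

PAGE_SIZE = 16 * 1024  # assumed page size, for illustration only


def io_helper(requests, completed):
    """Background helper: take one request at a time and block on pread."""
    while True:
        req = requests.get()
        if req is None:  # sentinel: no more work
            break
        fd, offset = req
        data = os.pread(fd, PAGE_SIZE, offset)  # synchronous: blocks here
        completed.append((offset, len(data)))   # stand-in for IO completion


def simulated_aio_demo(n_pages=4):
    """Enqueue n_pages read requests and let one helper service them."""
    requests = queue.Queue()
    completed = []
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * PAGE_SIZE * n_pages)
        f.flush()
        helper = threading.Thread(target=io_helper, args=(requests, completed))
        helper.start()
        # The "query thread" only enqueues requests and moves on.
        for page_no in range(n_pages):
            requests.put((f.fileno(), page_no * PAGE_SIZE))
        requests.put(None)
        helper.join()
    return completed
```

From the caller's side the reads look asynchronous, yet the single helper services them strictly one after another, so at most one request per helper thread is ever inside the kernel.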
How does native AIO change the design of the InnoDB IO
subsystem? Now the query thread, instead of enqueueing the IO request,
actually dispatches the request to the kernel and returns to its normal
work. The IO helper thread, instead of picking up enqueued requests,
waits on the IO wait events for any completed IO requests. As soon as it
is notified by the kernel that a certain request has completed, it
calls the IO completion routine on that request and then goes back to
waiting on the IO wait events. In this new design the IO requesting
thread becomes a kind of dispatcher, while the background IO thread
takes on the role of a collector.
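The dispatcher/collector split can be modelled as follows. Python's standard library has no binding for Linux native AIO, so this sketch uses a thread pool as a stand-in for the kernel and a queue as a stand-in for its completion events; it shows the division of roles only and is not native AIO itself. All names (`dispatcher_collector_demo`, `kernel_parallelism`) and the 16 KB page size are assumptions for the example.

```python
import os
import queue
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024  # assumed page size, for illustration only


def dispatcher_collector_demo(n_pages=8, kernel_parallelism=4):
    """Dispatch all reads at once; a collector thread handles completions."""
    completions = queue.Queue()  # stand-in for kernel completion events
    completed = []

    def collector():
        # Waits for completion events and runs the "completion routine".
        for _ in range(n_pages):
            offset, data = completions.get()
            completed.append((offset, len(data)))

    def do_read(fd, offset):
        # Runs inside the pool, which plays the role of the kernel here.
        completions.put((offset, os.pread(fd, PAGE_SIZE, offset)))

    with tempfile.TemporaryFile() as f, \
            ThreadPoolExecutor(kernel_parallelism) as kernel:
        f.write(b"x" * PAGE_SIZE * n_pages)
        f.flush()
        collector_thread = threading.Thread(target=collector)
        collector_thread.start()
        # The dispatcher hands every request over immediately and moves on.
        for page_no in range(n_pages):
            kernel.submit(do_read, f.fileno(), page_no * PAGE_SIZE)
        collector_thread.join()
    return sorted(completed)  # completion order is not guaranteed
```

The requesting thread never waits for an individual read, and the collector never issues one; completions may arrive in any order, which is why the result is sorted before being returned.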
What will this buy us? The answer is simple:
scalability. For example, consider a system which is heavily IO bound.
In InnoDB one IO helper thread works on a maximum of 256 IO requests at
one time. Assume that the heavy workload results in the queue being
filled up. In simulated AIO the IO helper thread will go through these
requests one by one, making a synchronous call for each request. This
means serialisation: the request that is serviced last has to wait
for the other 255 requests before it gets a chance. What this implies is
that with simulated AIO there can be at most ‘n’ IO requests in
parallel inside the kernel, where ‘n’ is the total number of IO helper
threads (this is not entirely true because query threads are also
allowed to issue synchronous requests, but I’ll gloss over that
detail for now). With native AIO, all 256 requests are dispatched
to the kernel, and if the underlying OS can service more requests in
parallel then we’ll take advantage of that.
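To put rough numbers on the serialisation argument, here is a back-of-the-envelope model. It is deliberately idealised: it assumes every request takes the same service time and ignores coalescing, seek patterns, and queueing effects. The function name and the figures used below (5 ms per request, 32 requests in flight) are assumptions for illustration, not measurements.

```python
import math


def worst_case_wait_ms(queued, in_flight, service_ms):
    """Time until the last of `queued` requests completes.

    Assumes at most `in_flight` requests are serviced at once and that
    each request takes `service_ms` milliseconds.
    """
    return math.ceil(queued / in_flight) * service_ms
```

With one helper thread working through a full queue of 256 requests at an assumed 5 ms each, `worst_case_wait_ms(256, 1, 5)` gives 1280 ms for the last request. If the storage underneath could service 32 requests in parallel, `worst_case_wait_ms(256, 32, 5)` gives 40 ms.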
The idea of coalescing contiguous requests is now
offloaded to the kernel/IO scheduler. This means that the IO
scheduler you are using and the properties of your RAID/disk controller
may now have more effect on overall IO performance. This is also
true because many more IO requests will now be inside the kernel than
before. Though we have not run tests to specifically certify any
particular IO scheduler, the conventional wisdom has been that for
database engine workloads the no-op or deadline scheduler would perhaps
give optimal performance. I have heard that lately a lot of improvements
have gone into cfq as well. It is for you to try and, as always, YMMV.
And we look forward to hearing your story.
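If you want to check which scheduler a device is currently using before experimenting, Linux exposes it in sysfs at /sys/block/<device>/queue/scheduler, with the active scheduler shown in square brackets. The helper below is a small sketch for reading it; the function names and the default device name `sda` are assumptions for the example.

```python
def active_scheduler(sysfs_line):
    """Return the bracketed (active) scheduler name, or None if absent."""
    for name in sysfs_line.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_scheduler(device="sda"):
    """Read the active IO scheduler for a block device from sysfs."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

For a line such as `noop deadline [cfq]`, `active_scheduler` returns `cfq`. Which schedulers are listed depends on the kernel in use.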