By Tim Cook on Mar 05, 2014
One of the perennial performance or sizing questions for a workload is how much I/O it generates. This can be broken down several ways:
- Network versus disk
- Read versus write
- Random versus sequential (for the disk component)
Many of these metrics have always been easy to quantify: there are standard operating system tools to measure disk I/O, network packets and bytes, and so on. The question of how much of the I/O is random versus sequential is much harder to answer, but it can be an important one if your storage is a potential limiter of performance and that storage has a significant amount of "conventional" disk - i.e. spindles of rotating rust.
Sequential I/O on conventional disks can generally be served at a higher throughput because conventional disks can avoid almost all rotational delay and seek time penalties. Random I/O will always incur these penalties, at a varying rate.
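To see why this matters, here is a back-of-the-envelope comparison of random versus sequential throughput on a spinning disk. All of the figures below are assumed, illustrative values (5 ms average seek, 7200 rpm, 150 MB/s media rate, 8 KB I/Os), not measurements from any particular system:

```python
# Rough model: each random I/O pays seek + rotational latency + transfer time,
# while sequential I/O amortizes positioning away and approaches the media rate.
AVG_SEEK_S = 0.005          # assumed average seek time: 5 ms
ROT_LATENCY_S = 0.00417     # assumed rotational latency: half a rev at 7200 rpm
MEDIA_RATE_BPS = 150e6      # assumed sustained media transfer rate: 150 MB/s
IO_SIZE_B = 8192            # an 8 KB I/O

random_io_time = AVG_SEEK_S + ROT_LATENCY_S + IO_SIZE_B / MEDIA_RATE_BPS
random_iops = 1.0 / random_io_time
random_throughput = random_iops * IO_SIZE_B          # bytes/s

sequential_throughput = MEDIA_RATE_BPS               # best case: stream at media rate

print(f"random:     {random_throughput / 1e6:.2f} MB/s (~{random_iops:.0f} IOPS)")
print(f"sequential: {sequential_throughput / 1e6:.0f} MB/s")
print(f"ratio:      ~{sequential_throughput / random_throughput:.0f}x")
```

Under these assumptions the same disk delivers well over a hundred times more throughput for sequential 8 KB I/O than for random 8 KB I/O, which is why knowing the mix is so valuable.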
So if you need to decompose your workload to understand how much random versus sequential I/O there is, what do you do? You may be able to ask the application developer how the application behaves: what it stores, and how and when it needs to fetch data into memory or write or update it on disk. That information (or access to the developer) is not always available, though.
What if we could get inside the application and measure the I/O as it happens? That is a task for DTrace. I have developed a script that tallies up all I/O for all filesystems and can tell you how much of the I/O is sequential (a read or write at a location in a file is followed by a read or write at the next location in that file) or random.
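The actual DTrace script is not reproduced here, but the classification rule it applies can be sketched in a few lines of Python (the helper and names below are hypothetical, purely to illustrate the rule): an I/O is "sequential" if it starts at the byte offset where the previous I/O on the same file left off, otherwise it is "random".

```python
# Sketch of the sequential-vs-random classification rule, not the real script.
def classify_io(next_offset, ops):
    """ops: iterable of (path, offset, size) tuples, in arrival order.
    next_offset: dict mapping path -> expected next byte offset.
    Yields (path, "sequential" | "random") per op."""
    for path, offset, size in ops:
        if next_offset.get(path) == offset:
            yield path, "sequential"
        else:
            yield path, "random"
        next_offset[path] = offset + size   # remember where this I/O ended

ops = [
    ("/u/a", 0, 8192),      # first I/O on a file: no history, counted as random
    ("/u/a", 8192, 8192),   # continues where the last one ended: sequential
    ("/u/a", 65536, 8192),  # jumps ahead in the file: random
]
labels = [kind for _, kind in classify_io({}, ops)]
print(labels)   # ['random', 'sequential', 'random']
```

Tracking one expected offset per file is enough for this rule; the real script keeps equivalent per-file state inside DTrace aggregations.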
Here is some example output from the script, where I first ran a Sequential filebench test:
Sample interval: 120 seconds   End time: 2011 Mar 23 10:42:20
FYI: Page I/O on VREG vnodes after read/write:
    fop_read   fop_getpage    2
/u              rd/s      wr/s     rdKB/s     wrKB/s     acc/look: 0
  sequential 2616.20   0.62501    2676408   0.718235      readdir: 0
  random     2.65836   0.20834   2713.623   0.641998    geta/seta: 0
  page       0.00833         0   0.033334          0        cr/rm: 0
  TOTAL      2618.86   0.83334    2679122   1.360233        other: 2.86669
Then I ran a random filebench test:
Sample interval: 60 seconds   End time: 2011 Mar 22 12:09:28
FYI: Page I/O on VREG vnodes after read/write:
    fop_write  fop_putpage    1
    fop_read   fop_getpage    5
    fop_write  fop_getpage   18
/u              rd/s      wr/s     rdKB/s     wrKB/s     acc/look: 2.28871
  sequential 157.036   161.446   1255.552   1118.578      readdir: 0
  random     20113.5   17119.0   160860.2   136919.0    geta/seta: 0
  page             0   4.27672          0   17.10687        cr/rm: 0.05012
  TOTAL      20270.6   17284.7   162115.7   138054.7        other: 5.26237
As you can see, the script outputs a table breaking down reads and writes by operations per second and by KB/s, classified as "sequential", "random" or "page". There are also totals, plus statistics on a few other VFS operations.
What does "page" mean? This script uses Solaris' virtual filesystem interface to measure filesystem I/O. Through this interface we can see conventional read and write operations, but there are also "page" operations that do not necessarily map to a read or write issued by an application. Because of this uncertainty I decided to list the I/O via these page operations separately. They will generally be small for an application that uses conventional I/O calls, but may be large if an application is using the mmap() interface, for example.
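For readers unfamiliar with mmap(): a memory-mapped file is accessed via page faults rather than read()/write() system calls, which is exactly the kind of traffic that shows up in the "page" row instead of the read/write rows. Here is a minimal, generic Python illustration (not tied to the script or to Solaris specifically):

```python
# Access a file through mmap(): the bytes arrive via page-in operations,
# not via an application-level read() call.
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello, mapped world")
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
        data = m[:5]        # no read() here: the kernel faults the page in
finally:
    os.close(fd)
    os.remove(path)
print(data)   # b'hello'
```

An application doing most of its I/O this way would show little conventional read/write activity while the "page" row carries the real traffic.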
For the purposes of this blog entry I have simplified the output of the script. It normally outputs a table for every filesystem that has I/O during its run. The script could easily be modified to select a desired filesystem. It could also be modified to select I/O related to a specific user, process, etc.