Wednesday Mar 05, 2014

Random or Sequential I/O? How to find out with DTrace

One of the perennial performance or sizing questions for a workload is how much I/O it generates. This can be broken down several ways:

  • Network versus disk
  • Read versus write
  • Random versus sequential (for the disk component)

Many of the metrics have always been easy to quantify. There are standard operating system tools to measure disk I/O, network packets and bytes, etc. The question of how much random versus sequential I/O is much harder to answer, but it can be an important one if your storage is a potential limiter of performance and that storage has a significant amount of "conventional" disk - i.e. spindles of rotating rust.

Sequential I/O on conventional disks can generally be served at a higher throughput because conventional disks can avoid almost all rotational delay and seek time penalties. Random I/O will always incur these penalties, at a varying rate.

So if you need to decompose your workload enough to understand how much Random versus Sequential I/O there is, what do you do? You may be able to discuss it with the application developer to get their take on how the application behaves, what it stores, and how and when it needs to fetch data into memory or write or update it on disk. This information (or access to the developer) is not always available though.

What about if I can get inside the application and measure the I/O as it happens? That is a task for DTrace. I have developed a script that tallies up all I/O for all filesystems and can tell you how much of the I/O is Sequential (a read or write at a location in a file is then followed by a read or write at the next location for that file) or Random.

Here is some example output from the script, where I first ran a Sequential filebench test:

Sample interval: 120 seconds
End time: 2011 Mar 23 10:42:20

FYI: Page I/O on VREG vnodes after read/write:
  fop_read      fop_getpage      2

/u                         rd/s    wr/s   rdKB/s   wrKB/s   acc/look:       0
             sequential 2616.20 0.62501  2676408 0.718235    readdir:       0
                 random 2.65836 0.20834 2713.623 0.641998  geta/seta:       0
                   page 0.00833       0 0.033334        0      cr/rm:       0
                  TOTAL 2618.86 0.83334  2679122 1.360233      other: 2.86669

then I ran a random filebench test:

Sample interval: 60 seconds
End time: 2011 Mar 22 12:09:28

FYI: Page I/O on VREG vnodes after read/write:
  fop_write      fop_putpage      1
  fop_read       fop_getpage      5
  fop_write      fop_getpage     18

/u                         rd/s    wr/s   rdKB/s   wrKB/s   acc/look: 2.28871
             sequential 157.036 161.446 1255.552 1118.578    readdir:       0
                 random 20113.5 17119.0 160860.2 136919.0  geta/seta:       0
                   page       0 4.27672        0 17.10687      cr/rm: 0.05012
                  TOTAL 20270.6 17284.7 162115.7 138054.7      other: 5.26237

As you can see the script outputs a table breaking down reads and writes by operations and by KB/s, but also by "sequential", "random" and "page". There are also some totals, plus some statistics on a few other VFS operations.

What does "page" mean? Well, this script uses Solaris' virtual filesystem interface to measure filesystem I/O. Through this interface we can see conventional read and write operations, however there are also "page" operations that can happen which do not necessarily map to a read or write from an application. Because of this uncertainty I decided to list the I/O via these page operations separately. They will generally be small for an application that uses conventional I/O calls, but may be large if an application for is using the mmap() interface for example.

For the purposes of this blog entry I have simplified the output of the script. It normally outputs a table for every filesystem that has I/O during its run. The script could easily be modified to select a desired filesystem. It could also be modified to select I/O related to a specific user, process, etc.

References

About

Tim Cook's Weblog The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
« March 2015
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    
       
Today