Understanding iostat

1Iostat has been around for years and until Dtrace came along and allowed us to look more deeply into the kernel was the tool for analysing how the io subsystem was working in Solaris. However interpreting the output has proved in the past to cause problems.

First if you are looking at latency issues it is vital that you use the smallest time quantum to iostat you can, which as of Solaris 10 is 1 second. Here is a sample of some output produced from “iostat -x 1”:

                  extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       5.0 1026.5    1.6 1024.5  0.0 25.6   24.8   0  23 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 


The first thing to draw your attention to is the Column “%b” which the manual tells you is:

%b percent of time the disk is busy (transactions in progress)

So in this example the disk was “busy”, ie had at least one transaction (command) in progress for 23% of the time period. Ie 0.23 seconds as the time period was 1 second.

Now look at the “actv” column. Again the manual says:

actv average number of transactions actively being serviced (removed from the queue but not yet completed)
This is the number of I/O operations accepted, but not yet serviced, by the device.
In this example the average number of transactions outstanding for this time quantum was 25.6. Now here is the bit that is so often missed. Since we know that all the transactions actually took place within 0.23 seconds and were not evenly spread across the full second the average queue depth when busy was 100/23 \* 25.6 or 111.3. Thanks to dtrace and this D script you can see the actual IO pattern2:

Even having done the maths iostat smooths out peaks in the IO pattern and thus under reports the peak number of transactions as 103.0 when the true value is 200.
The same is true for the bandwidth. The iostat above comes reports 1031.5 transactions a second (r/s + w/s) again though this does not take into account that all those IO requests happened in 0.23 seconds. So the true figure for the device would be 1031.5 \* 100/23 which is 4485 transations/sec.
If we up the load on the disk a bit then you can conclude more from the iostat:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       5.0 2155.7    1.6 2153.7 30.2 93.3   57.1  29  45 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0 3989.1    0.0 3989.1 44.6 157.2   50.6  41  83 
                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 

Since the %w column is non zero, and from the manual %w is:

%w percent of time there are transactions waiting for service (queue non-empty)

This is telling us that the device's active queue was full. So on the third line of the above output the devices queue was full for 0.41 seconds. Since the queue depth is quite easy to find out3 and in this case was 256, you can deduce that the queue depth for that 0.41 seconds was 256. Thus the average for the 0.59 seconds left was (157.2-(0.41\*256))/0.59 which is 88.5. The graph of the dtrace results tells a different story:


These examples demonstrate what can happen if your application dumps a large number of transactions onto a storage device while the through put will be fine and if you look at iostat data things can appear ok if the granularity of the samples is not close to your requirement for latency any problem can be hidden by the statistical nature of iostat.

1Apologies to those who saw a draft of this on my blog briefly.

2The application creating the IO attempts to keep 200 transations in the disk at all the time. It is interesting to see that it fails as it does not get notification of the completion of the IO until all or nearly all the outstanding transactions have completed.

3This command will do it for all the devices on your system:

echo '\*sd_state::walk softstate | ::print -d -at "struct sd_lun" un_throttle' | pfexec mdb -k

however be warned the throttle is dynamic so dtrace gives the real answer.

Comments:

Hi Chris,

I can't see the graphs.

Posted by Przemyslaw Bak on September 09, 2009 at 08:37 AM BST #

Sorry this got published while it was still a draft and the graphs were not up. Should be good now.

Posted by Chris Gerhard on September 09, 2009 at 10:15 AM BST #

Much better now :-)

Posted by Przemyslaw Bak (przemol) on September 09, 2009 at 01:23 PM BST #

Chris

your write up is very good and makes it clear where iostat's manual page perhaps does not...

Should we have a RFE to incorporate some of your information here incorporated into it man iostat?

thx for taking the time to do this.

Posted by Paul Humphreys on September 14, 2009 at 01:55 AM BST #

dtrace sd_cmds.d comments are wrong... You've got # instead of /\* \*/ minor tweak to get it going...

Posted by Lyle Merdan on September 14, 2009 at 01:08 PM BST #

Thanks Lyle, that will teach me to add the copyright and not test it.

I've fixed it.

Posted by Chris Gerhard on September 14, 2009 at 02:27 PM BST #

Post a Comment:
Comments are closed for this entry.
About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com

Search

Archives
« April 2014
MonTueWedThuFriSatSun
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today