By user12625760 on Sep 09, 2009
1Iostat has been around for years and until Dtrace came along and allowed us to look more deeply into the kernel was the tool for analysing how the io subsystem was working in Solaris. However interpreting the output has proved in the past to cause problems.
First if you are looking at latency issues it is vital that you use the smallest time quantum to iostat you can, which as of Solaris 10 is 1 second. Here is a sample of some output produced from “iostat -x 1”:
extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 5.0 1026.5 1.6 1024.5 0.0 25.6 24.8 0 23 extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
The first thing to draw your attention to is the Column “%b” which the manual tells you is:
%b percent of time the disk is busy (transactions in progress)
So in this example the disk was “busy”, ie had at least one transaction (command) in progress for 23% of the time period. Ie 0.23 seconds as the time period was 1 second.
Now look at the “actv” column. Again the manual says:
actv average number of transactions actively being serviced (removed from the queue but not yet completed)
This is the number of I/O operations accepted, but not yet serviced, by the device.
In this example the average number of transactions outstanding for this time quantum was 25.6. Now here is the bit that is so often missed. Since we know that all the transactions actually took place within 0.23 seconds and were not evenly spread across the full second the average queue depth when busy was 100/23 \* 25.6 or 111.3. Thanks to dtrace and this D script you can see the actual IO pattern2:
Even having done the maths iostat smooths out peaks in the IO pattern and thus under reports the peak number of transactions as 103.0 when the true value is 200.
The same is true for the bandwidth. The iostat above comes reports 1031.5 transactions a second (r/s + w/s) again though this does not take into account that all those IO requests happened in 0.23 seconds. So the true figure for the device would be 1031.5 \* 100/23 which is 4485 transations/sec.
If we up the load on the disk a bit then you can conclude more from the iostat:
device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 5.0 2155.7 1.6 2153.7 30.2 93.3 57.1 29 45 extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 0.0 3989.1 0.0 3989.1 44.6 157.2 50.6 41 83 extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b sd3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
Since the %w column is non zero, and from the manual %w is:
%w percent of time there are transactions waiting for service (queue non-empty)
This is telling us that the device's active queue was full. So on the third line of the above output the devices queue was full for 0.41 seconds. Since the queue depth is quite easy to find out3 and in this case was 256, you can deduce that the queue depth for that 0.41 seconds was 256. Thus the average for the 0.59 seconds left was (157.2-(0.41\*256))/0.59 which is 88.5. The graph of the dtrace results tells a different story:
These examples demonstrate what can happen if your application dumps a large number of transactions onto a storage device while the through put will be fine and if you look at iostat data things can appear ok if the granularity of the samples is not close to your requirement for latency any problem can be hidden by the statistical nature of iostat.
1Apologies to those who saw a draft of this on my blog briefly.
2The application creating the IO attempts to keep 200 transations in the disk at all the time. It is interesting to see that it fails as it does not get notification of the completion of the IO until all or nearly all the outstanding transactions have completed.
3This command will do it for all the devices on your system:
echo '\*sd_state::walk softstate | ::print -d -at "struct sd_lun" un_throttle' | pfexec mdb -k
however be warned the throttle is dynamic so dtrace gives the real answer.