Where can I/O queue up in sd/ssd

Using the OpenSolaris source at opensolaris.org, you can read sd.c, which builds both the sd and ssd SCSI/Fibre Channel target drivers.

The driver keeps a count of the I/Os it has accepted in the un->un_ncmds_in_driver element of the un structure, which is the soft state for that disk LUN. It keeps the number of I/Os it has passed down to the HBA for transport in the un->un_ncmds_in_transport variable.

New I/Os passed into sd/ssd go straight down to the HBA for transport, except when un_ncmds_in_transport has reached the un_throttle for this LUN, or during error processing, when we retry commands sequentially until one succeeds.

When we have to queue the bufs, we sort them onto a linked list anchored at un->un_waitq_headp and un->un_waitq_tailp.

But there is another place where I/O can be queued before it reaches the waitq and throttles: the xbuf layer. un->un_xbuf_attr points to the xbuf control block, which is of type __ddi_xbuf_attr. This has an unsorted waitq anchored at xa_headp and xa_tailp. By default the xbuf layer only allows 256 commands through to the normal sd waitq and throttles. The rest are queued in the xbuf layer, and un_ncmds_in_driver is still incremented for them. So when a large number of commands is dropped on an sd LUN, un_ncmds_in_driver can be very large while the waitq holds fewer than 256 commands; the rest are lurking on the xbuf's queue.

These xbuf-queued I/Os are invisible to iostat's wait column. Maybe they shouldn't be?

Comments:

I think those I/Os in the xbuf queue should be counted by the kstat. The xbuf queue is an implementation detail, but the I/Os are certainly waiting, so they should be accounted for.

Posted by Chris Gerhard on March 16, 2006 at 04:58 AM GMT+00:00 #

My understanding was that once I/Os get backed up into the xbuf layer (or at least out of each LUN's queue), this caused a spinlock or similar. As a result, %sys jumps when there are more than 256 I/Os queued. Does this sound right?

This area tends to be a recurring topic on servers with 100+ LUNs with 2-4 paths to each LUN. My understanding is that by tuning sd_max_throttle to a small number (< 10) you ensure that those 256 I/O requests are distributed among multiple LUNs. However, if you have hundreds of LUNs, it would seem to make sense that being able to send far more than 256 I/O requests to the sd/ssd layer would help maximize throughput.

Posted by Mike Gerdts on March 16, 2006 at 11:34 AM GMT+00:00 #

Hello Mike,
You're pretty close, but luckily the kernel keeps all these queues and data structures per LUN.

So prior to Solaris 9 the only waitq was in the scsi_disk structure. Once you exceeded that LUN's throttle, the waitq could grow without bound if you poured enough I/O onto the LUN. That queue was managed under a mutex, which massively increased the apparent disk service and wait times. As you point out, it also burnt %sys if the list was really, really long. In Solaris 9 and 10 (and later) the sorting is only done on the sd waitq. The per-LUN xbuf queue is not sorted, so updates to it are quick. By limiting the sd_lun waitq to (256 - throttle), we bound the maximum insertion time.

sd_max_throttle applies to each LUN separately, so if you set it to 10, then only 10 commands per LUN can go to the transport in parallel. You can load many more, but they will be queued first in the per-LUN sd_lun sorted waitq and then on the per-LUN xbuf unsorted queue.

Solaris scales to support massive I/O workloads involving thousands of disk LUNs, all active with dozens of commands at the same time. The usual reason for setting sd_max_throttle is to protect the storage device from overwork. This matters especially if you have parallel paths and a storage frame with a finite number of buffers for incoming commands. If you max out those buffers, the frame drops commands, and you get SCSI timeouts and other unwanted effects.

thanks
tim

Posted by tim uglow on March 16, 2006 at 01:33 PM GMT+00:00 #
