NCQ performance analysis

NCQ was enabled in OpenSolaris back in snv_47, but it was only recently (build snv_63) that multiple concurrent I/Os were enabled, thereby letting us actually take advantage of NCQ. I've been waiting many months for this to happen, so I was curious what performance impact NCQ would really have.

The short and sweet of it: NCQ is great for multi-threaded random I/O and horrible for multi-streamed sequential I/O.

Below are some side-by-side comparisons (you may have to widen your browser a bit) of NCQ disabled vs. enabled for two main workloads: random reads and sequential reads. For the random read workload I have several configs (1 disk, 2 disks, 4 disks, etc.), and I also experimented with a different number of threads/streams for each workload. filebench was used to generate the workloads; more info on filebench is here.
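
For a flavor of what such a workload looks like, here is a rough sketch in filebench's workload language of a multi-threaded random read run. This is only an illustration, not the exact profile linked below; the path, file size, iosize, and thread count are placeholders:

# Illustrative sketch of a multi-threaded random read workload
# (placeholder names and sizes, not the exact profile used in this post)
set $dir=/pool/fb
set $nthreads=8
set $iosize=8k
set $filesize=10g

define file name=bigfile,path=$dir,size=$filesize,prealloc
define process name=randread,instances=1
{
  thread name=randreader,memsize=10m,instances=$nthreads
  {
    flowop read name=rand-read,filename=bigfile,iosize=$iosize,random
  }
}
run 60

The sequential read workload is the same idea without the random attribute and with a varying number of streams instead of threads doing random I/O.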

I tested the 2007-03-16 non-debug bits of Solaris on a thumper. The disks were 500GB Hitachis with AJ0A firmware. The filebench profile I used for the random read workload is here, and the one for the sequential read workload is here. All tests used ZFS with checksumming off.


Random Read Results

(charts comparing NCQ disabled vs. enabled: 1 disk, 2 disk RAID-0, 4 disk RAID-0, 16 disk RAID-0, 32 disk RAID-0)


Sequential Read Results

(chart comparing NCQ disabled vs. enabled: 46 disk RAID-0)


Conclusions

Clearly, NCQ gets you better IOPS (and hence better overall bandwidth) when the workload is purely random I/O and you have more than one thread. We see the exact opposite result when the workload is purely sequential I/O and you have multiple streams.

For the multi-streamed sequential read workload, we were curious where the problem was. Using DTrace, we found ldi_strategy() was taking ~3x longer with NCQ enabled than with it disabled. Somewhat interesting, but we still didn't know if the problem was in SATA, marvell, the disk firmware, or the disk hardware itself. So we DTrace'd the sd, sata, and marvell88sx modules (as well as the ldi_strategy() and bdev_strategy() routines) to see where time was being spent. We found that mv_rw_dma_start() was taking ~20x longer. Ah ha! So we know the problem is either in the marvell driver or, more likely, in the disk (firmware).
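
To give a flavor of that kind of measurement, here is a minimal D script along those lines (a sketch, not the exact scripts we ran; it assumes the fbt entry/return probes for mv_rw_dma_start() in marvell88sx are available on your bits):

#!/usr/sbin/dtrace -s

/* Time each call to mv_rw_dma_start() via the fbt provider. */
fbt:marvell88sx:mv_rw_dma_start:entry
{
        self->ts = timestamp;
}

fbt:marvell88sx:mv_rw_dma_start:return
/self->ts/
{
        /* Nanosecond latency distribution, printed when the script exits. */
        @lat["mv_rw_dma_start (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

The same pattern works for ldi_strategy(), bdev_strategy(), or any other kernel routine you want to compare with NCQ on vs. off.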

My take on this is that NCQ is an immature technology, and the disk vendors have some work to do (please note, I only got to test this one version of Hitachi drives). I'd love to test out some other SATA disks... you know, when I have free time.

Oh yeah, NCQ is currently enabled by default in OpenSolaris. If you would like to turn it off, set this in /etc/system and reboot:

set sata:sata_func_enable = 0x5

And as of snv_74, you can leave NCQ enabled but effectively disable it by setting the number of concurrent I/Os to 1, by putting the following in /etc/system and rebooting:

set sata:sata_max_queue_depth = 0x1
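
If you want to confirm the setting took effect after the reboot, one option (just a sketch; it assumes you can run mdb against the live kernel) is to read the variable back out of the sata module:

echo 'sata`sata_max_queue_depth/D' | mdb -k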

We're running a complete perf run to see whether we should have NCQ enabled or disabled by default.

Comments:

I think the current state of the NCQ code is that we bundle up to N requests (32) together and issue them to the firmware, and only when all 32 are done do we trigger the completion interrupt. This fits well with 20x longer I/Os. One reason this does not work well with ZFS is that ZFS relies on the I/O completion interrupt to keep a pipeline of work running. If we could get NCQ to generate more interrupts, that may help ZFS.

Posted by Roch on May 29, 2007 at 06:09 AM PDT #

Yikes! I hope that's not true, that's just destined for stalling. Please file a bug.

Posted by eric kustarz on May 29, 2007 at 06:41 AM PDT #

It is not true that the Solaris drivers bundle up 32 requests and then issue them. The commands are issued as they are delivered by the layers above and the interrupts are delivered immediately upon I/O completion.

Posted by guest on May 29, 2007 at 12:23 PM PDT #
