vq_max_pending

As part of the I/O scheduling, ZFS has a field called 'zfs_vdev_max_pending'. This limits the maximum number of I/Os we can send down per leaf vdev. This is NOT the maximum per filesystem or per pool. Currently the default is 35. This is a good number for today's disk drives; however, it is not a good number for storage arrays that are really comprised of many disks but exported to ZFS as a single device.
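If you just want to see what your system is currently using, you can read the variable with mdb without opening the kernel read-write - something along these lines (the output format mirrors the session further down in this post):

# echo zfs_vdev_max_pending/E | mdb -k
zfs_vdev_max_pending:
zfs_vdev_max_pending:           35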

This limit is a really good thing when you have a heavy I/O load as described in Bill's "ZFS vs. The Benchmark" blog.

But if you've created, say, a 2-device mirrored pool - where each device is really a 10-disk storage array - and you think that ZFS just isn't doing enough I/O for you, here's a script to see if that's true:

#!/usr/sbin/dtrace -s

vdev_queue_io_to_issue:return
/arg1 != NULL/
{
        @c["issued I/O"] = count();
}

vdev_queue_io_to_issue:return
/arg1 == NULL/
{
        @c["didn't issue I/O"] = count();
}

vdev_queue_io_to_issue:entry
{
        @avgers["avg pending I/Os"] = avg(args[0]->vq_pending_tree.avl_numnodes);
        @lquant["quant pending I/Os"] = quantize(args[0]->vq_pending_tree.avl_numnodes);
        @c["total times tried to issue I/O"] = count();
}

vdev_queue_io_to_issue:entry
/args[0]->vq_pending_tree.avl_numnodes > 349/
{
        @avgers["avg pending I/Os > 349"] = avg(args[0]->vq_pending_tree.avl_numnodes);
        @quant["quant pending I/Os > 349"] = lquantize(args[0]->vq_pending_tree.avl_numnodes, 33, 1000, 1);
        @c["total times tried to issue I/O where > 349"] = count();
}

/* bail after 5 minutes */
tick-300sec
{
        exit(0);
} 
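A note on reading the script: vdev_queue_io_to_issue() returns a pointer to the zio it decided to issue, so a non-NULL return value means an I/O went out, and a NULL return value means nothing was issued that time around (for example, because the vdev already has vq_max_pending I/Os outstanding). To run it, save it to a file (I'll call it max-io.d here, but the name doesn't matter), make it executable, and let it run for the 5 minutes:

heavy# chmod +x max-io.d
heavy# ./max-io.d

Then compare the "avg pending I/Os" and the quantize output against your vq_max_pending setting.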

If you see the "avg pending I/Os" hitting your vq_max_pending limit, then raising the limit would be a good thing. The way to do that used to be per vdev, but now there is a single global variable that changes all vdevs:

heavy# mdb -kw
Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba fctl nca lofs zfs random nfs cpc fcip logindmux ptm sppp ipc ]
> zfs_vdev_max_pending/E
zfs_vdev_max_pending:
zfs_vdev_max_pending:           35              
> zfs_vdev_max_pending/W 0t70
zfs_vdev_max_pending:           0x23            =       0x46
> zfs_vdev_max_pending/E
zfs_vdev_max_pending:
zfs_vdev_max_pending:           70              
>

The above will change the max # of pending requests to 70, instead of 35.
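Also note that a change made via 'mdb -kw' only lasts until the next reboot. If you want the new value to persist, you should be able to set the same tunable in /etc/system and reboot:

set zfs:zfs_vdev_max_pending = 70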

That said, having people tune variables is never desirable, and we'd like 'vq_max_pending' (among others) to be set dynamically; see: 6457709 vdev_knob values should be determined dynamically.

Comments:

I'm a dtrace neophyte, so I hope you'll pardon me if this is a stupid question, but when I try to run this script, I get this error: dtrace: failed to compile script /tmp/max-io-dtrace: line 17: operator . cannot be applied to type "sigqueue_t *"; must be applied to a struct or union

Posted by Jason Larke on August 08, 2006 at 04:17 AM PDT #

Nope, it's an excellent question... I imagine you're running U2? There's a bug in the CTF data that breaks this script on U2, and there's no workaround.

Posted by eric kustarz on August 08, 2006 at 04:30 AM PDT #

Hi, I run a mail server on ZFS now (it was on UFS for a long time). 5.5TB, a concat of 9 2-way mirrors of DAS (iSCSI and SCSI) arrays of approx. 700GB each (these are raid0). Sol 10 U2 x64. I think I have the problem described here. The performance is less than on the 6 UFS filesystems before (and now there are 3 more arrays!). My writes are waiting 15 secs to complete, and reads are at approx. 20 MB/s only. I played as you suggested above; I almost doubled the read bandwidth, however the writes are still way delayed. I read that reads are preferred in ZFS, but the queue system is stuck now. Any hint? Workaround? Thank you, Ivan

Posted by Ivan Debnar on September 06, 2006 at 03:10 AM PDT #

Just to add more info: iostat reports 0% wait on all drives and avg 40% busy on all of them, avg svc time <20ms. The system is 70% idle, 15% user, 15% kernel.

Posted by Ivan Debnar on September 06, 2006 at 03:21 AM PDT #

One more question.
I have
    ffffffff936b2a80 HEALTHY   -            root
    ffffffff936b2000 HEALTHY   -              mirror
    ffffffff936b2540 HEALTHY   -                /dev/dsk/c6t3d0s0
    ffffffff936b3ac0 HEALTHY   -                /dev/dsk/c8t22260001552EFE2Cd0s0

... etc ...
Should the 'vq_max_pending' be changed for the "mirror" device as well, or only for the components? And what about the 'root' component?
Thanks for the enlightenment.

Posted by Ivan Debnar on September 06, 2006 at 07:42 AM PDT #

It should matter only to the leaf vdevs - so you don't have to worry about modifying the "mirror" vdev or "root" vdev.
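(In case anyone is wondering where a listing like the one above comes from: you can dump a pool's vdev tree from mdb with something like '::spa -v' under 'mdb -k' - the leaf vdevs are the entries with actual device paths, and those are the only ones whose 'vq_max_pending' matters.)

heavy# mdb -k
> ::spa -v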

The other questions have been taken to zfs-discuss - thanks, as that's the best way to get questions answered!

Posted by eric kustarz on September 07, 2006 at 04:56 AM PDT #
