By paulie on Jan 14, 2014
The latest release of the ZFS Storage Appliance software, 2013.1.1.1, introduces 1M block sizes for shares. This is a deferred update that can only be enabled under Maintenance → System. Once applied, you can edit individual filesystems or LUNs within 'Shares' and set the database record size to 1M.
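If you prefer the appliance CLI to the BUI, the change looks roughly like this (the project name 'myproject' and share name 'myshare' are hypothetical, and 'recordsize' is my understanding of the CLI name for the database record size property):

zfssa:> shares
zfssa:shares> select myproject
zfssa:shares myproject> select myshare
zfssa:shares myproject/myshare> set recordsize=1M
zfssa:shares myproject/myshare> commit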
This new feature may require additional tuning on all connected servers to fully realize its performance gains, since most operating systems do not use a 1M transfer size by default. The mismatch is easy to spot in Analytics by breaking down your protocol of choice by IO size. As an example, let's look at a Fibre Channel workload generated by an Oracle Linux 6.5 server:
The IO size is sitting at 501K, a very strange number that's eerily close to 512K. That's no coincidence: the Linux block layer has historically defaulted max_sectors_kb to 512, so the host splits 1M requests into 512K (and smaller) pieces before they ever reach the array. Why is this a problem? Well, take a look at our backend disks:
Our disk IO size (block size) is heavily fragmented! This causes our overall throughput to nosedive.
2GB/s is okay, but we could do better if our buffer size were 1M on the host side.
Fixing the problem
Bump the maximum physical transfer size to 1M on each host:

Solaris
# echo 'set maxphys=1048576' >> /etc/system

Oracle Linux 6.5 (previous releases do not support 1M sizes for multipath)
# for q in /sys/block/dm-*/queue/max_sectors_kb; do echo 1024 > "$q"; done

Windows (QLogic; requires the qlfc utility)
C:\> qlfc -tsize /fc /set 1024
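One caveat for the Oracle Linux step: values written to /sys are lost on reboot. A minimal sketch of a udev rule that reapplies the setting to device-mapper devices as they appear (the rule file name is my own choosing; verify the interaction with your multipath configuration before relying on it):

# /etc/udev/rules.d/99-1m-transfers.rules
SUBSYSTEM=="block", KERNEL=="dm-*", ACTION=="add|change", ATTR{queue/max_sectors_kb}="1024"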
For NFS clients, request 1M read and write sizes at mount time:

Solaris
# mount -F nfs -o rsize=1048576,wsize=1048576 target:/export/share /mnt/share

Linux
# mount -t nfs -o rsize=1048576,wsize=1048576 target:/export/share /mnt/share
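To make those options persistent on Linux, the matching /etc/fstab entry would look like this (server name and paths carried over from the example above); after mounting, nfsstat -m shows the rsize/wsize values actually negotiated with the server:

target:/export/share  /mnt/share  nfs  rsize=1048576,wsize=1048576  0 0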
After re-running the same FC workload with the transfer size correctly set to 1M, I can see that the IO size is now where it should be.
This has a drastic impact on the block sizes being allocated on the backend disks:
And an even more drastic impact on the overall throughput:
A very small tweak resulted in a 5X performance gain (2.1GB/s to 10.9GB/s)! Until 1M is the default for all physical I/O requests, expect to make some configuration changes on your underlying OSes.
Test configuration

- 1 x Oracle ZS3-4 controller
  - 2013.1.1.1 firmware
  - 1TB DRAM
  - 4 x 16G Fibre Channel HBAs
  - 4 x SAS2 HBAs
  - 4 x disk trays (24 x 4TB 7200RPM disks each)
- 4 x Oracle X4170 M2 servers
  - Oracle Linux 6.5 (3.8.x kernel)
  - 16G DRAM
  - 1 x 16G Fibre Channel HBA
Each Oracle Linux server ran the following vdbench profile against 4 LUNs:
sd=sd1,lun=/dev/mapper/mpatha,size=1g,openflags=o_direct,threads=128
sd=sd2,lun=/dev/mapper/mpathb,size=1g,openflags=o_direct,threads=128
sd=sd3,lun=/dev/mapper/mpathc,size=1g,openflags=o_direct,threads=128
sd=sd4,lun=/dev/mapper/mpathd,size=1g,openflags=o_direct,threads=128
wd=wd1,sd=sd*,xfersize=1m,readpct=70,seekpct=0
rd=run1,wd=wd1,iorate=max,elapsed=999h,interval=1

This is a 70% read / 30% write sequential workload.
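To reproduce the run, save the profile to a parameter file and launch it with vdbench's -f flag; the install path and file name below are my own assumptions. Note that openflags=o_direct bypasses the page cache, so the 1M xfersize reaches the block layer intact (subject to the max_sectors_kb splitting described above).

# /opt/vdbench/vdbench -f 1m_seq.parm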