
Paulie's world in a blog

  • January 14, 2014

Configuring Applications / OS's for 1MB ZFS Block Sizes

Guest Author

The latest release of the ZFS Storage Appliance software, 2013.1.1.1, introduces 1MB block sizes for shares. This is a deferred update that can only be enabled under Maintenance → System. Once the update is applied, you can edit individual filesystems or LUNs under 'Shares' to enable 1MB support (the database record size setting).

[Image: 1m_enable.png]
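
For reference, the same record size can also be set from the appliance CLI. The sketch below is only an illustration, with 'zfssa' as a placeholder hostname and 'proj1'/'fs1' as hypothetical project and filesystem names:

zfssa:> shares
zfssa:shares> select proj1
zfssa:shares proj1> select fs1
zfssa:shares proj1/fs1> set recordsize=1M
zfssa:shares proj1/fs1> commit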

This new feature may require additional tuning on connected servers before you see significant performance gains. Most operating systems do not use a 1MB transfer size by default. The mismatch is easy to spot in Analytics by breaking down your protocol of interest by IO size. As an example, let's look at a fibre channel workload generated by an Oracle Linux 6.5 server:

Example


[Image: 1m_fcbad.png]

The IO size is sitting at 501K, a strange number that's eerily close to 512K. That's no coincidence: the Linux default per-request limit (max_sectors_kb) is 512KB, so larger I/Os are split before they ever reach the appliance. Why is this a problem? Take a look at our backend disks:

[Image: 1m_diskiobad.png]

Our disk IO size (block size) is heavily fragmented! This causes our overall throughput to nosedive.

[Image: 1m_throughputbad.png]

2GB/s is okay, but we could do better if the transfer size were 1MB on the host side.
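
Before changing anything, it's worth confirming where the cap comes from on the host side. A quick check on an Oracle Linux initiator (device names below are examples): the per-request limit lives in sysfs, and iostat's avgrq-sz column, reported in 512-byte sectors, should sit near 2048 once full 1MB requests are going out.

# cat /sys/block/dm-*/queue/max_sectors_kb
# iostat -x 5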

Fixing the problem


Fibre Channel

Solaris
# echo 'set maxphys=1048576' >> /etc/system
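
The /etc/system change takes effect at the next reboot; the value currently in use can be checked with the kernel debugger, for example:

# echo 'maxphys/D' | mdb -k
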
Oracle Linux 6.5, UEK3 kernel (previous releases do not support 1MB transfer sizes for multipath)
# for p in /sys/block/dm-*/queue/max_sectors_kb; do echo 1024 > "$p"; done
or create a permanent udev rule (note the rule itself is a single line, wrapped here for readability):
# vi /etc/udev/rules.d/99-zfssa.rules
ACTION=="add", SYSFS{vendor}=="SUN", SYSFS{model}=="*ZFS*",
ENV{ID_FS_USAGE}!="filesystem", ENV{ID_PATH}=="*-fc-*",
RUN+="sh -c 'echo 1024 > /sys$DEVPATH/queue/max_sectors_kb'"
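
The rule fires on the next add event for matching devices; one way to apply it to LUNs that are already present (using the stock udev tools on Oracle Linux 6.5) and confirm the new limit:

# udevadm control --reload-rules
# udevadm trigger --action=add --subsystem-match=block
# cat /sys/block/dm-*/queue/max_sectors_kb
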
Windows
QLogic [qlfc]
C:\> qlfcx64.exe -tsize /fc /set 1024
Emulex [HBAnyware]
set ExtTransferSize = 1

Please see MOS Note 1640013.1 for iSCSI and NFS configuration.
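
The note covers the details; as one example, on Linux NFS clients the transfer size is governed by the rsize/wsize mount options, roughly along these lines (export path and mount point below are placeholders):

# mount -t nfs -o rsize=1048576,wsize=1048576 zfssa:/export/fs1 /mnt/fs1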

Results


After re-running the same FC workload with the correctly set 1MB transfer size, I can see the IO size is now where it should be.

[Image: 1m_fcgood.png]

This has a drastic impact on the block sizes being allocated on the backend disks:

[Image: 1m_diskiogood.png]

And an even more drastic impact on the overall throughput:

[Image: 1m_throughputgood.png]

A very small tweak resulted in a 5X performance gain (2.1GB/s to 10.9GB/s)! Until 1MB is the default for all physical I/O requests, expect to make some configuration changes on your underlying OS's.

System Configuration


Storage

  • 1 x Oracle ZS3-4 Controller
  • 2013.1.1.1 firmware
  • 1TB DRAM
  • 4 x 16G Fibre Channel HBAs
  • 4 x SAS2 HBAs
  • 4 x Disk Trays (24 4TB 7200RPM disks each)

Servers
  • 4 x Oracle x4170 M2 servers
  • Oracle Linux 6.5 (3.8.x kernel)
  • 16G DRAM
  • 1 x 16G Fibre Channel HBA

Workload


Each Oracle Linux server ran the following vdbench profile against 4 LUNs:

sd=sd1,lun=/dev/mapper/mpatha,size=1g,openflags=o_direct,threads=128
sd=sd2,lun=/dev/mapper/mpathb,size=1g,openflags=o_direct,threads=128
sd=sd3,lun=/dev/mapper/mpathc,size=1g,openflags=o_direct,threads=128
sd=sd4,lun=/dev/mapper/mpathd,size=1g,openflags=o_direct,threads=128
wd=wd1,sd=sd*,xfersize=1m,readpct=70,seekpct=0
rd=run1,wd=wd1,iorate=max,elapsed=999h,interval=1

This is a 70% read / 30% write sequential workload.
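
For reference, a profile like this is saved to a parameter file and launched with the vdbench wrapper script, along these lines (file and directory names are assumed):

# ./vdbench -f zfssa_1m.parm -o out_1m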


Comments (1)
  • Carlos Azevedo Wednesday, January 15, 2014

    Great! I can confirm that it's common to notice the problem.

    But knowing how to fix it is another story.

    I know of parallel systems that aren't (?) delivering more than 2GB/s.

    With what you've presented, it's even possible to beat them.

    Regards.

