Tuesday Jan 14, 2014

Configuring Applications / OSes for 1MB ZFS Block Sizes

The latest release of the ZFS Storage Appliance software, 2013.1.1.1, introduces 1MB block sizes for shares. This is a deferred update that can only be enabled under Maintenance → System. Once the update is applied, you can edit individual filesystems or LUNs within 'Shares' to enable 1MB support (the database record size).

1m_enable.png
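
If you prefer the appliance CLI over the BUI, the same property can be set per share. The project and share names below are placeholders, and the exact prompts and property syntax can vary slightly by release, so treat this as a rough sketch and lean on the CLI's built-in help:

zfssa:> shares
zfssa:shares> select myproject
zfssa:shares myproject> select myshare
zfssa:shares myproject/myshare> set recordsize=1M
zfssa:shares myproject/myshare> commit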

This new feature may require additional tuning on connected servers before you see the full performance benefit. Most operating systems do not use a 1MB transfer size by default, which is easy to spot in Analytics by breaking down your protocol of choice by IO size. As an example, let's look at a fibre channel workload generated by an Oracle Linux 6.5 server:

Example


1m_fcbad.png

The IO size is sitting at 501K, a very strange number that's eerily close to 512K. Why is this a problem? Well, take a look at our backend disks:

1m_diskiobad.png

Our disk IO size (block size) is heavily fragmented! This causes our overall throughput to nosedive.

1m_throughputbad.png

2GB/s is okay, but we could do better if the transfer size were 1MB on the host side.
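
You can also sanity-check the transfer size from the host itself, before and after tuning. Assuming the sysstat package is installed and the multipath devices show up as dm-0, dm-1, and so on (the device names here are only placeholders), iostat reports the average request size in 512-byte sectors, so a true 1MB transfer shows up as an avgrq-sz of roughly 2048:

# iostat -xm dm-0 dm-1 5

At the 501K transfers shown above, avgrq-sz hovers around 1000 instead.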

Fixing the problem


Fibre Channel

Solaris
# echo 'set maxphys=1048576' >> /etc/system
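
A reboot is required before an /etc/system change takes effect. Afterwards, one quick way to confirm the running value is to read the maxphys kernel variable with mdb:

# echo 'maxphys/D' | mdb -k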

Oracle Linux 6.5 with the UEK3 kernel (earlier releases do not support 1MB transfer sizes for multipath devices)
# for f in /sys/block/dm-*/queue/max_sectors_kb; do echo 1024 > "$f"; done

or create a permanent udev rule:

# vi /etc/udev/rules.d/99-zfssa.rules

ACTION=="add", SYSFS{vendor}=="SUN", SYSFS{model}=="*ZFS*", 
ENV{ID_FS_USAGE}!="filesystem", ENV{ID_PATH}=="*-fc-*", 
RUN+="sh -c 'echo 1024 > /sys$DEVPATH/queue/max_sectors_kb'"

Windows

QLogic [qlfc]
C:\> qlfcx64.exe -tsize /fc /set 1024

Emulex [HBAnyware]
set ExtTransferSize = 1

Please see MOS Note 1640013.1 for iSCSI and NFS configuration.

Results


After re-running the same FC workload with the transfer size correctly set to 1MB, the IO size is now where it should be.

1m_fcgood.png

This has a drastic impact on the block sizes being allocated on the backend disks:

1m_diskiogood.png

And an even more drastic impact on the overall throughput:

1m_throughputgood.png

A very small tweak resulted in a roughly 5X performance gain (2.1GB/s to 10.9GB/s)! Until 1MB is the default for all physical I/O requests, expect to make some configuration changes on your underlying OSes.

System Configuration


Storage

  • 1 x Oracle ZS3-4 Controller
  • 2013.1.1.1 firmware
  • 1TB DRAM
  • 4 x 16G Fibre Channel HBAs
  • 4 x SAS2 HBAs
  • 4 x Disk Trays (24 4TB 7200RPM disks each)

Servers

  • 4 x Oracle x4170 M2 servers
  • Oracle Linux 6.5 (3.8.x kernel)
  • 16G DRAM
  • 1 x 16G Fibre Channel HBA

Workload


Each Oracle Linux server ran the following vdbench profile against 4 LUNs:

sd=sd1,lun=/dev/mapper/mpatha,size=1g,openflags=o_direct,threads=128
sd=sd2,lun=/dev/mapper/mpathb,size=1g,openflags=o_direct,threads=128
sd=sd3,lun=/dev/mapper/mpathc,size=1g,openflags=o_direct,threads=128
sd=sd4,lun=/dev/mapper/mpathd,size=1g,openflags=o_direct,threads=128

wd=wd1,sd=sd*,xfersize=1m,readpct=70,seekpct=0
rd=run1,wd=wd1,iorate=max,elapsed=999h,interval=1

This is a 70% read / 30% write sequential workload.
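
For reference, each server's profile was saved to a parameter file and launched with the vdbench wrapper script; the install path and file name below are just placeholders:

# /opt/vdbench/vdbench -f zfssa_1m.parm

The openflags=o_direct setting bypasses the Linux page cache, so each 1MB xfersize request is issued directly to the multipath device and is then split (or not) according to max_sectors_kb.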

About

Hiya, my name is Paul Johnson and I'm a software engineer working on the Oracle ZFS Storage Appliance.
