Wednesday Feb 25, 2015

Deploying MySQL over Fibre Channel / iSCSI using the Oracle ZFS Storage Appliance

This blog post serves as a follow-up to Deploying MySQL over NFS using the Oracle ZFS Storage Appliance. The benefits remain the same, so this discussion will focus entirely on configuring your Oracle ZFS Storage Appliance and database server.

Configuring the Oracle ZFS Storage Appliance

Depending on which block protocol you plan to use, set up target and initiator groups as described in the Oracle ZFS Storage Appliance documentation for Fibre Channel or iSCSI.

These instructions assume a target group called 'mysql-tgt' and an initiator group called 'mysql-init'.

1. From the ZFS controller’s CLI, create a project called ‘mysql’:

        zfs:> shares project mysql

2. Set logbias to latency to leverage write flash capabilities:

        zfs:shares mysql (uncommitted)> set logbias=latency
                               logbias = latency (uncommitted)

3. Commit the settings:

        zfs:shares mysql (uncommitted)> commit

4. Select the 'mysql' project:

        zfs:> shares select mysql

5. Create a LUN called 'innodb-data' to hold data files:

        zfs:shares mysql> lun innodb-data

6. Set the volume block record size to 16K to match InnoDB's default page size:

        zfs:shares mysql/innodb-data (uncommitted)> set volblocksize=16K
                               volblocksize = 16K (uncommitted)

7. Set the volume size to a value large enough to accommodate your database:

        zfs:shares mysql/innodb-data (uncommitted)> set volsize=1T
                               volsize = 1T (uncommitted)

8. Set the initiator and target groups:

        zfs:shares mysql/innodb-data (uncommitted)> set initiatorgroup=mysql-init
                             initiatorgroup = mysql-init (uncommitted)
        zfs:shares mysql/innodb-data (uncommitted)> set targetgroup=mysql-tgt
                                targetgroup = mysql-tgt (uncommitted)
        zfs:shares mysql/innodb-data (uncommitted)> commit

9. Create a LUN called ‘innodb-log’ to hold the redo logs:

        zfs:shares mysql> lun innodb-log

10. Set the volume block record size to 128K (redo log writes are larger and sequential):

        zfs:shares mysql/innodb-log (uncommitted)> set volblocksize=128K
                               volblocksize = 128K (uncommitted)

11. Set the volume size to a value large enough to accommodate your redo logs:

        zfs:shares mysql/innodb-log (uncommitted)> set volsize=256G
                               volsize = 256G (uncommitted)
        zfs:shares mysql/innodb-log (uncommitted)> commit
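
At this point both LUNs exist in the 'mysql' project. As a quick sanity check, listing the project from the CLI should show 'innodb-data' and 'innodb-log' with the volsize and volblocksize values set above (ls is the standard appliance CLI listing command; output omitted here):

        zfs:> shares select mysql
        zfs:shares mysql> ls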

Configuring your database server

1. A directory structure should be created to contain the MySQL database:

# mkdir -p /mysql/san/innodb-data
# mkdir -p /mysql/san/innodb-log
# chown -R mysql:mysql /mysql/san

2. If using iSCSI, log in to the ZFS Storage Appliance targets:

# iscsiadm -m discovery -t sendtargets -p zfssa-ip-address
# iscsiadm -m node -p zfssa-ip-address --login
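
To confirm the sessions were established before moving on, list them with iscsiadm; one entry per discovered target portal should appear:

# iscsiadm -m session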

3. The multipath configuration held in /etc/multipath.conf should contain the following:

defaults {
    find_multipaths yes
    user_friendly_names yes
}
devices {
    device {
        vendor "SUN"
        product "ZFS Storage 7430"
        getuid_callout "/lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/%n"
        prio alua
        hardware_handler "1 alua"
        path_grouping_policy group_by_prio
        failback immediate
        no_path_retry 600
        rr_min_io 100
        path_checker tur
        rr_weight uniform
        features "0"
    }
}

4. Discover the LUNs with multipath. This may require reloading or restarting the multipathd service so it picks up the new configuration.
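
On Oracle Linux 6, for example, this can be done with the multipathd init script (assuming the standard device-mapper-multipath packaging):

# service multipathd reload

Once the configuration is loaded, the LUNs should appear in the multipath topology: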

# multipath -ll
mpatha (3600144f0fa2f948b0000537cdb970008) dm-2 SUN,ZFS Storage 7430
size=1.0T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `-0:0:0:0 sda 8:0   active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `-0:0:1:0 sdc 8:32  active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `-1:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `-1:0:1:0 sdg 8:96  active  ready running
mpathb (3600144f0fa2f948b0000537cdbc10009) dm-3 SUN,ZFS Storage 7430
size=256G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `-0:0:0:1 sdb 8:16  active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `-0:0:1:1 sdd 8:48  active ready running
|-+- policy='round-robin 0' prio=1 status=enabled
| `-1:0:0:1 sdf 8:80  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `-1:0:1:1 sdh 8:112 active ready running

5. Partition each LUN with fdisk using 32 sectors per track, starting each partition at a multiple of 256 sectors so it aligns with the underlying LUN block size. This is documented more extensively in Aligning Partitions to Maximize Performance, page 31.

# fdisk -u -S 32 /dev/dm-2
# fdisk -u -S 32 /dev/dm-3
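
To verify the alignment, list each partition table in sector units and check that the start sector of every partition is a multiple of 256:

# fdisk -lu /dev/dm-2
# fdisk -lu /dev/dm-3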

6. Build an XFS filesystem* on the first partition of each LUN, using the full multipath device path:

# mkfs.xfs /dev/mapper/mpathap1
# mkfs.xfs /dev/mapper/mpathbp1
* XFS is preferred on Linux. ZFS is preferred on Solaris.
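
A quick way to confirm each filesystem was created is to query the devices with blkid, which should report TYPE="xfs" for both:

# blkid /dev/mapper/mpathap1
# blkid /dev/mapper/mpathbp1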

7. Mount each LUN either from a shell or automatically from /etc/fstab:

# mount -t xfs -o noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k \
  /dev/mapper/mpathap1 /mysql/san/innodb-data
# mount -t xfs -o noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k \
  /dev/mapper/mpathbp1 /mysql/san/innodb-log
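
To mount them automatically at boot instead, the equivalent /etc/fstab entries would look something like the following (for iSCSI LUNs, add the _netdev option so the mounts wait for the network):

/dev/mapper/mpathap1  /mysql/san/innodb-data  xfs  noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k  0 0
/dev/mapper/mpathbp1  /mysql/san/innodb-log   xfs  noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k  0 0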

That's it. Refer to the previous blog post to understand how to set up the MySQL database once these filesystems are mounted; a minimal sketch of the relevant my.cnf settings is included below. In a future blog post, I'll cover how to back up your MySQL database using snapshots and cloning.
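
As a rough sketch only (the previous post covers the full MySQL setup), the my.cnf settings that tie InnoDB to these mount points would look something like this; the option names are standard MySQL/InnoDB settings and the values simply match the LUNs created above:

[mysqld]
datadir                   = /mysql/san/innodb-data
innodb_data_home_dir      = /mysql/san/innodb-data
innodb_log_group_home_dir = /mysql/san/innodb-log
innodb_page_size          = 16384     # matches the 16K volblocksize on the data LUN
innodb_flush_method       = O_DIRECT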

Tuesday Jan 14, 2014

Configuration Applications / OS's for 1MB ZFS Block Sizes

The latest release of the ZFS Storage Appliance software, 2013.1.1.1, introduces 1MB block sizes for shares. This is a deferred update that must first be applied under Maintenance → System. Once applied, you can edit individual filesystems or LUNs within 'Shares' to enable 1MB support (the database record size).

[Image: 1m_enable.png]
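
The same change can also be made from the CLI. As a sketch, for a filesystem share named 'myshare' in a project named 'myproject' (placeholder names; LUNs take a volblocksize property instead of recordsize):

zfs:> shares select myproject
zfs:shares myproject> select myshare
zfs:shares myproject/myshare> set recordsize=1M
                               recordsize = 1M (uncommitted)
zfs:shares myproject/myshare> commit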

This new feature may require additional tuning on connected servers to fully realize the performance gains, since most operating systems do not use a 1MB transfer size by default. This is easy to spot in Analytics by breaking down the protocol in question by I/O size. As an example, let's look at a Fibre Channel workload generated by an Oracle Linux 6.5 server:

Example


[Image: 1m_fcbad.png]

The IO size is sitting at 501K, a very strange number that's eerily close to 512K. Why is this a problem? Well, take a look at our backend disks:

[Image: 1m_diskiobad.png]

Our disk IO size (block size) is heavily fragmented! This causes our overall throughput to nosedive.

[Image: 1m_throughputbad.png]

2GB/s is okay, but we could do better if the transfer size were 1MB on the host side.

Fixing the problem


Fibre Channel

Solaris
# echo 'set maxphys=1048576' > /etc/system

Oracle Linux 6.5 uek3 kernel (previous releases do not support 1MB sizes for multipath)
# for p in /sys/block/dm-*/queue/max_sectors_kb; do echo 1024 > $p; done

or create a permanent udev rule (the rule must be kept on a single line):

# vi /etc/udev/rules.d/99-zfssa.rules

ACTION=="add", SYSFS{vendor}=="SUN", SYSFS{model}=="*ZFS*", ENV{ID_FS_USAGE}!="filesystem", ENV{ID_PATH}=="*-fc-*", RUN+="sh -c 'echo 1024 > /sys$DEVPATH/queue/max_sectors_kb'"
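
To confirm the setting took effect (after the echo above, or after a reboot with the udev rule in place), read the values back; each multipath device should report 1024:

# cat /sys/block/dm-*/queue/max_sectors_kb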

Windows

QLogic [qlfc]
C:\> qlfcx64.exe -tsize /fc /set 1024

Emulex [HBAnyware]
set ExtTransferSize = 1

Please see MOS Note 1640013.1 for the corresponding iSCSI and NFS configuration.

Results


After re-running the same FC workload with the transfer size correctly set to 1MB, the I/O size is now where it should be.

[Image: 1m_fcgood.png]

This has a drastic impact on the block sizes being allocated on the backend disks:

[Image: 1m_diskiogood.png]

And an even more drastic impact on the overall throughput:

[Image: 1m_throughputgood.png]

A very small tweak resulted in a 5X performance gain (2.1GB/s to 10.9GB/s)! Until 1MB is the default for all physical I/O requests, expect to make some configuration changes on your underlying OSes.

System Configuration


Storage

  • 1 x Oracle ZS3-4 Controller
  • 2013.1.1.1 firmware
  • 1TB DRAM
  • 4 x 16G Fibre Channel HBAs
  • 4 x SAS2 HBAs
  • 4 x Disk Trays (24 4TB 7200RPM disks each)

Servers

  • 4 x Oracle x4170 M2 servers
  • Oracle Linux 6.5 (3.8.x kernel)
  • 16G DRAM
  • 1 x 16G Fibre Channel HBA

Workload


Each Oracle Linux server ran the following vdbench profile against 4 LUNs:

sd=sd1,lun=/dev/mapper/mpatha,size=1g,openflags=o_direct,threads=128
sd=sd2,lun=/dev/mapper/mpathb,size=1g,openflags=o_direct,threads=128
sd=sd3,lun=/dev/mapper/mpathc,size=1g,openflags=o_direct,threads=128
sd=sd4,lun=/dev/mapper/mpathd,size=1g,openflags=o_direct,threads=128

wd=wd1,sd=sd*,xfersize=1m,readpct=70,seekpct=0
rd=run1,wd=wd1,iorate=max,elapsed=999h,interval=1
This is a 70% read / 30% write sequential workload.
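
For reference, a profile like the one above is started by pointing vdbench at the parameter file; the file name and output directory below are only examples:

# ./vdbench -f zfssa_1m.parm -o /tmp/vdbench_out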

About

Hiya, my name is Paul Johnson and I'm a software engineer working on the Oracle ZFS Storage Appliance.
