Tuning tips for ZFS with IBM DB2
By Chang Shu on Jul 30, 2010
1. Use the latest Solaris version
Use the latest Solaris 10 release available and apply the most recent kernel patch your environment allows. ZFS performance is being improved continually, so it is highly recommended to stay current. At the time of writing, the latest release is Solaris 10 10/09. When possible, apply Solaris 10 patch 141444-09 (for SPARC) or 141445-09 (for x86/x64), or later, on Solaris releases prior to Solaris 10 10/09.
2. Tune the record size
The general rule is to set the record size equal to the DB2 page size for the file system that holds the DB2 data files. For DB2 log files, a distinct file system or (preferably) a distinct pool with the default record size of 128k is appropriate. For temporary data files, it is also better to use a distinct file system with the default record size of 128k.
When the database page size is less than the OS page size (8k on SPARC and 4k on x64), set the record size to the OS page size.
Separating table data, indexes, and temporary data helps databases with high I/O activity and improves overall performance.
For DSS workloads, the main I/O activity is sequential reads of large tables, which are better served by large I/Os with a 128k record size.
Please note that the modification of the record size property (and other properties) in a ZFS file system only affects the files created after the change. To apply the change for existing data files, you need to first change the property of the file system, and then copy the data files into it or repopulate the data.
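The rules above can be sketched with the `zfs` command. This is a sketch only: the pool and dataset names ('dbpool', 'db2data', 'db2temp') and the 8k page size are illustrative, not from the article.

```shell
# Data files: match the record size to the DB2 page size (8k here).
zfs create -o recordsize=8k dbpool/db2data
# Temporary data files: keep the default 128k record size.
zfs create dbpool/db2temp
# On an existing file system, changing the property only affects files
# created afterwards, so the data must be copied in again afterwards.
zfs set recordsize=8k dbpool/db2data
```

You can verify the setting with `zfs get recordsize dbpool/db2data`.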
3. Keep sufficient free space in zpool
ZFS performance can degrade once pool utilization goes over 80% of capacity, so keeping more than 20% free space is suggested for a DB2 database. You can set a quota on the pool's top-level file system to guarantee that 20% free space is available at all times.
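As a sketch, assuming a 1 TB pool named 'dbpool' (the name and size are illustrative), a quota on the pool's top-level file system enforces the 20% headroom:

```shell
# Cap usage at ~80% of a 1 TB pool so 20% always stays free.
zfs set quota=800G dbpool
# Check utilization; keep the CAP column below 80%.
zpool list dbpool
```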
4. Separate zpool for log files
It is advised to use a dedicated zpool for DB2 log files. Using a dedicated LUN on dedicated disks for log files allows ZFS to handle the latency-sensitive log activity with priority and provides better observability. This especially helps database workloads with high transaction rates.
The log device need not be large. Typically a small LUN, or a slice of a LUN carved from a storage array, can be used as the separate log device.
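A minimal sketch of such a dedicated log pool; the device name c2t1d0 and the pool name are placeholders for whatever LUN your array presents:

```shell
# Dedicated pool on its own LUN for the latency-sensitive DB2 logs.
zpool create logpool c2t1d0
# The default 128k record size is appropriate for the log file system.
zfs create logpool/db2logs
```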
5. Limit the ARC size
ZFS uses the ARC (Adaptive Replacement Cache) to cache data from all pools. It dynamically shrinks or grows based on memory pressure. If your DB2 application's memory requirement is significantly large and well defined, then it is advised to warn ZFS not to grow its own cache too much. This limits the need for ZFS to actually shrink its cache in periods of peak memory demand.
To estimate the upper limit for the ZFS ARC, start from the physical memory, subtract all known large memory consumers (DB2 instance memory, database memory including buffer pools, heaps, etc.), then declare about 80% of the remaining memory estimate as the value for zfs_arc_max.
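The estimate above can be sketched with shell arithmetic. The memory figures (64 GB physical, 28 GB for DB2) are purely illustrative:

```shell
PHYS_MEM_GB=64                                  # total physical memory
DB2_MEM_GB=28                                   # DB2 instance + database memory
REMAINING_GB=$((PHYS_MEM_GB - DB2_MEM_GB))      # memory left over for everything else
ARC_MAX_GB=$((REMAINING_GB * 80 / 100))         # ~80% of the remainder
ARC_MAX_BYTES=$((ARC_MAX_GB * 1024 * 1024 * 1024))
echo "set zfs:zfs_arc_max = ${ARC_MAX_BYTES}"   # candidate line for /etc/system
```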
For example, to cap the ARC size at 30GB, set the zfs:zfs_arc_max parameter in the /etc/system file; the value can be given in hex or in decimal, so the two lines below are equivalent:
set zfs:zfs_arc_max = 0x780000000
set zfs:zfs_arc_max = 32212254720
You can use the kstat command to verify the size of the ZFS caches.
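For example, using the standard arcstats kstat module (requires a Solaris system):

```shell
# Current ARC size in bytes.
kstat -p zfs:0:arcstats:size
# Configured maximum; reflects zfs_arc_max after a reboot.
kstat -p zfs:0:arcstats:c_max
```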
6. Control what is cached in ZFS
The Solaris 10 10/09 release provides two new ZFS file system properties, 'primarycache' and 'secondarycache', that allow you to control what is cached in the primary cache (ARC) and the secondary cache (L2ARC). The possible values for both properties are: all, none, and metadata. If 'all' is chosen, both user data and metadata are cached. If 'none' is chosen, neither user data nor metadata is cached. If 'metadata' is chosen, only metadata is cached. The default is 'all'.
If you want to cache the data at the DB2 database level, you can set primarycache=metadata so the ARC is used for metadata only. This is somewhat like unbuffered I/O.
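As a sketch, reusing an illustrative 'dbpool/db2data' file system name:

```shell
# Keep only metadata in the ARC; the DB2 buffer pools cache the user data.
zfs set primarycache=metadata dbpool/db2data
# Confirm the settings.
zfs get primarycache,secondarycache dbpool/db2data
```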
Check out this thread for discussion regarding ZFS and UFS Direct I/O.
7. Adjust the ZFS I/O queue size
ZFS controls the I/O queue depth for a given LUN with the zfs_vdev_max_pending parameter.
When there are many LUNs in a ZFS pool, and each LUN may contain many physical disks, the ZFS queue can become a limiting factor on reads. The ZFS I/O queue size is set to 35 by default, meaning up to 35 concurrent I/Os per LUN; when a pool contains many LUNs, this can cause contention on the storage channel and lead to inflated service times. To relieve the contention and reduce service time, you can adjust the ZFS I/O queue size (down to 10 in the following example) dynamically by:
echo zfs_vdev_max_pending/W0t10 | mdb -kw
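The mdb write takes effect immediately but is lost at reboot. To make the change persistent, the equivalent /etc/system entry can be used (a sketch, with the same value of 10 as above):

```
set zfs:zfs_vdev_max_pending = 10
```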
For more information, please check out:
ZFS Best Practices Guide: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
ZFS Evil Tuning Guide