Improving Directory (DSEE) Import Rate Through ZFS Caching
By wahmed on Jan 26, 2010
As we all know, importing data into the directory database is the first step in building a directory service. Importing is an equally important step in recovering from a directory disaster, such as inadvertent corruption of the database due to hardware failure or a buggy application. In that scenario, a nightly binary database backup or an archived ldif could save the day for you. Furthermore, if your directory has a large number of entries (tens of millions), the import process can be time-consuming. It is therefore very important to fine-tune the import process in order to reduce initialization and recovery time.
Most import tuning recommendations have focused on the write capabilities of the disk subsystem. Undeniably, that is the most important ingredient of the import process. However, the input to the import process is an ldif file, which is used to initialize and (re)build the directory database. As demonstrated by our recent performance testing effort, the location of the ldif file is also very important. I'll concentrate mainly on ZFS in this post, as time and again it has proven to be the ideal filesystem for the Directory. Note that in some cases even a small gain in the import rate can save hours, especially if your ldif file has tens of millions of entries.
Generally speaking, there are a few gotchas to keep in mind for the import process. The first is to ensure that you have separate partitions for your database, logs and transaction logs (this is true for any filesystem). For ZFS this translates into separate pools. Similarly, it is recommended to place the ldif file on a pool that is not being used for any other purpose during the import. This maximizes the read I/O for that pool without having to share it with any other process. In ZFS, the Adaptive Replacement Cache (ARC) plays an important role in the import process, as seen in the table below. ZFS caches can be controlled via the primarycache and secondarycache properties, which are set via the zfs set command. This excellent blog explains these caches in detail. To understand and prove the effectiveness of these caches, we ran a few import tests on a SunFire X4150 system with ldif files of 3 million and 10 million entries. The ldif files were generated from the telco.template via make-ldif. Details about the hardware, OS and ZFS configuration, along with other useful commands, are listed in the Appendix.
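To make the layout concrete, here is a sketch of the pool setup and cache settings described above. The device names mirror the configuration in the Appendix; on your system the devices (and whether you have a flash log/cache device at all) will differ.

```shell
# Sketch only -- device names are examples taken from the test system.
# Separate pools: one for the directory database, one dedicated to the ldif file.
zpool create db c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 log c2t0d0
zpool create ldifpool c0t7d0 cache c2t3d0

# Cache behavior is controlled per dataset:
#   primarycache   -> ARC (DRAM);    values: all | metadata | none
#   secondarycache -> L2ARC (flash); values: all | metadata | none
zfs set primarycache=all ldifpool
zfs set secondarycache=all ldifpool

# Verify the settings:
zfs get primarycache,secondarycache ldifpool
```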
| Dataset | primarycache (6 GB) | secondarycache | Time taken (sec) | Import Rate (entries/sec) |
The table shows the results of various combinations of primarycache and secondarycache on the ldifpool only. The db pool, where the directory database is created, always had primarycache and secondarycache set to all. The astute reader will observe from the Appendix that the ZFS Intent Log (ZIL) is actually configured on flash memory. This did not skew our results, since our measurements concern the ldifpool (where the ldif file resides).
So going back to the table: as expected, the primarycache (ARC in DRAM) is the key catalyst in read performance. Disabling it causes a catastrophic drop in the import rate, primarily because prefetching is also disabled and many more reads have to go directly to disk. The charts below (data obtained via iostat -xc) depict this very clearly: the disks are a lot busier reading when primarycache is set to none for the 3 million entry ldif import.
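For reference, the import rate column in the table is simply the entry count divided by the elapsed import time. A quick back-of-the-envelope check (the entry count and elapsed time below are made-up example values, not results from the table):

```shell
# Hypothetical example: derive an import rate (entries/sec) from an
# entry count and an elapsed import time in seconds.
entries=3000000      # example: a 3 million entry ldif file
seconds=1200         # example: 20 minutes of import time
awk -v e="$entries" -v s="$seconds" 'BEGIN { printf "%.0f entries/sec\n", e / s }'
# prints "2500 entries/sec"
```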
So far, I have concentrated on the primarycache (ARC). What about the secondarycache (L2ARC)? Typically the secondarycache is utilized optimally when backed by a flash memory device. We did have a flash memory device (Sun Flash F20) added to the ldifpool; however, our reads were sequential, and by design the L2ARC does not cache sequential data. So for this particular use case the secondarycache did not come into play, as is evident from the results in the table. Had we limited the size of the ARC to 1 GB or less, the prefetches might have "spilled" over to the L2ARC, and the L2ARC would have contributed more.
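If you want to see for yourself whether the L2ARC is absorbing any data during an import, the ARC kernel statistics can be watched directly. This is a diagnostic sketch for Solaris 10; the counters come from the zfs:0:arcstats kstat:

```shell
# Watch ARC size and L2ARC size/hits while the import runs.
# Sketch only -- run on the system under test; press Ctrl-C to stop.
while true; do
    kstat -p zfs:0:arcstats:size \
             zfs:0:arcstats:l2_size \
             zfs:0:arcstats:l2_hits
    sleep 5
done
```

If l2_size stays flat while the ldif file is being read, the L2ARC is not caching the (sequential) import reads, which is what we observed.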
Finally, a disclaimer: since the intent of this exercise is to show the effect of ZFS caches, the import rate results in the table are for comparison only and are not a benchmark. I would also like to thank my colleagues who helped me with this blog: Brad Diggs, Pedro Vazquez, Ludovic Poitou, Arnaud Lacour, Mark Craig, Fabio Pistolesi, Nick Wooler and Jerome Arnou.
Appendix

zm1 # uname -a
SunOS zm1 5.10 Generic_141445-09 i86pc i386 i86pc

zm1 # cat /etc/release
Solaris 10 10/09 s10x_u8wos_08a X86
Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
Assembled 16 September 2009

zm1 # cat /etc/system | grep -i zfs
* Limit ZFS ARC to 6 GB
set zfs:zfs_arc_max = 0x180000000
set zfs:zfs_mdcomp_disable = 1
set zfs:zfs_nocacheflush = 1

zm1 # zfs set primarycache=all ldifpool
zm1 # zfs set secondarycache=all ldifpool

zm1 # echo "::memstat" | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     189405               739    2%
ZFS File Data               52657               205    1%
Anon                       184176               719    2%
Exec and libs                4624                18    0%
Page cache                   7575                29    0%
Free (cachelist)             3068                11    0%
Free (freelist)           7944877             31034   95%
Total                     8386382             32759
Physical                  8177488             31943

NOTE: The system had three ZFS pools. The "db" pool stored the directory database and was striped across 6 SATA disks, with the ZIL on flash memory. The "ldifpool" pool was where the ldif file, transaction logs and access logs were located. Since the transaction and access logs are not used during the import process, that pool was entirely dedicated to the ldif file.
zm1 # zfs get all ldifpool | grep cache
ldifpool  primarycache    none  local
ldifpool  secondarycache  none  local

zm1 # zpool list
NAME       SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
db         816G  2.25G   814G    0%  ONLINE  -
ldifpool   136G  93.0G  43.0G   68%  ONLINE  -
rpool      136G  75.6G  60.4G   55%  ONLINE  -

zm1 # zpool status -v
  pool: db
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        db          ONLINE       0     0     0
          c0t1d0    ONLINE       0     0     0
          c0t2d0    ONLINE       0     0     0
          c0t3d0    ONLINE       0     0     0
          c0t4d0    ONLINE       0     0     0
          c0t5d0    ONLINE       0     0     0
          c0t6d0    ONLINE       0     0     0
        logs
          c2t0d0    ONLINE       0     0     0
errors: No known data errors

  pool: ldifpool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        ldifpool    ONLINE       0     0     0
          c0t7d0    ONLINE       0     0     0
        cache
          c2t3d0    ONLINE       0     0     0
errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c0t0d0s0  ONLINE       0     0     0
errors: No known data errors

ds@dsee1$ du -h telco_*
  48G  telco_10M.ldif
  14G  telco_3M.ldif

ds@dsee1$ grep cache dse.ldif | grep size
nsslapd-dn-cachememsize: 104857600
nsslapd-dbcachesize: 104857600
nsslapd-import-cachesize: 2147483648
nsslapd-cachesize: -1
nsslapd-cachememsize: 1073741824