Following up on my post about the OS 8.7 release, let's review current best practices.
All Flash Pool Best Practices
The choice between an All Flash Pool and a Hybrid Storage Pool
is up to you, but all-flash storage naturally appeals to use cases where a well-identified subset of high-access, high-value data is present. We have found a good balance using two trays of flash devices per ZS5-2 cluster. You may of course use more trays for capacity, but for smaller configurations, two or three trays is what delivers optimal $/IOPS. With two trays, one can configure the pool with no single point of failure (NSPF)
by assigning half the SSDs from each tray to each head of the storage cluster. It might surprise some, but we also found that log devices are beneficial even in the case of an all-flash pool. Log devices hold short-lived ZIL
blocks, and keeping those separate from long-lived data blocks achieves the highest level of performance.
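On the appliance itself this layout is selected through the pool configuration wizard, but as a rough sketch of the idea, here is what an NSPF-style all-flash pool with dedicated log devices could look like on a generic ZFS system. The pool and device names are hypothetical, purely for illustration:

```shell
# Hedged sketch (hypothetical device names, not appliance syntax):
# each data vdev is mirrored across the two trays, so losing an entire
# tray leaves every mirror with one surviving side (no single point of
# failure), and a mirrored log vdev keeps short-lived ZIL blocks off
# the data devices.
zpool create flashpool \
  mirror tray1-ssd0 tray2-ssd0 \
  mirror tray1-ssd1 tray2-ssd1 \
  log mirror tray1-log0 tray2-log0

# Verify the resulting layout.
zpool status flashpool
```

On a ZFSSA cluster, the equivalent step is configuring one such pool per head, each built from half the SSDs of every tray.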
We also found that logbias = throughput
works best with an All Flash Pool (AFP). For AFP, logbias = throughput
gives you low-latency synchronous writes and prevents the ZIL from causing too much write inflation. With logbias = throughput, why use slogs at all? The reason for slogs in AFPs is twofold. First, logbias is a performance hint to ZFS, and not all workloads fit its prerequisites. Given that logbias = throughput
is now advocated for AFP in general, we expect a greater variety of workloads to hit the associated code paths, and some workloads simply benefit outright from the presence of log devices. Second, even for workloads that truly are handled in "throughput mode", we've changed the ZIL so that the short-lived ZIL tracking blocks are now allocated from the log devices. This change allows better space management on each data device and improves ZIL recovery during cluster failover events.
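On the appliance, this setting corresponds to the share's synchronous write bias property. As a sketch of the equivalent on a generic ZFS system (the pool and filesystem names are hypothetical):

```shell
# Hedged sketch (hypothetical dataset name): set the logbias hint on the
# filesystem holding the workload, then confirm the property took effect.
zfs set logbias=throughput flashpool/dbfiles
zfs get logbias flashpool/dbfiles
```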
New Oracle DB and RMAN Best Practices
With all these pieces in place, it's time to review our Oracle DB best practices (BP). The BP is what we recommend when setting up a new DB on ZFSSA storage when there is no special knowledge or data to support an alternative configuration. It is designed to deliver good performance on a variety of workloads and reduce the likelihood of problems. Historically, we advocated setting up DB data files with matching DB blocksize and ZFS recordsize, along with logbias = throughput.
Matching blocksizes prevented read-modify-writes from hitting HDDs, and logbias = throughput allowed the log devices to better handle the important redo log workload, since they were relieved from handling throughput-mode DB writer writes.
But today, we have extra-large memory configurations for the primary cache and very large, effective L2ARC devices that greatly mitigate the read-modify-write (RMW) penalty. Moreover, today's log devices are much cheaper and much more capable (throughput-wise) than in the past. They can handle the most demanding DB workloads (redo log + DB data) without being taxed.
For those reasons, the Oracle DB best practices are now revised. In the absence of data suggesting an alternative, we advocate a 32K ZFS recordsize to hold regular DB datafile blocks. The larger record reduces metadata management (4X fewer indirect blocks compared to 8K) and improves the on-disk layout of DB data blocks for faster DB restore operations. We also suggest LZ4 compression, which provides a good compromise between space savings and CPU compression cost.
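Put together, the revised defaults could be sketched as follows on a generic ZFS system; on the appliance, the same three properties are set on the share. The dataset name is hypothetical:

```shell
# Hedged sketch of the revised defaults (hypothetical dataset name):
# 32K records for regular DB datafiles, LZ4 compression, and the
# throughput log bias discussed above, all set at creation time.
zfs create -o recordsize=32k \
           -o compression=lz4 \
           -o logbias=throughput \
           flashpool/oradata
```

Setting the properties at creation time ensures every datafile written into the share inherits them; changing recordsize later only affects newly written blocks.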
Read more about the excellent work on DB and RMAN best practices done by my colleague Greg Drobish: