I first wrote about ZFS and Hybrid Storage Pools in an older blog entry, "Open Storage - The (R)Evolution". Now I would like to dive a bit deeper into this great feature.
ZFS is not just a filesystem. It is actually a hybrid of filesystem and volume manager, and the combination of these two functions is the main source of the flexibility of ZFS. Being hybrid means that ZFS manages storage differently than traditional solutions. Traditionally, you have a 1:1 mapping of filesystems to disk partitions, or alternatively a 1:1 mapping of filesystems to logical volumes, each of which is made up of one or more disks. In ZFS, all disks participate in one storage pool. Each filesystem can use all disk drives in the pool, and since the filesystem is not mapped to a volume, all space is shared! Space can be reserved for a filesystem, so that no other filesystem can fill up the whole pool, and space reservations can be changed at will. Growing or shrinking a filesystem isn't just painless, it is irrelevant!
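To make the shared-space idea concrete, here is a minimal toy model in Python. It is only a sketch of the accounting described above, not how ZFS implements it; the class names, the filesystem names and the sizes are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Filesystem:
    name: str
    used: int = 0          # GB currently written
    reservation: int = 0   # GB guaranteed to this filesystem


@dataclass
class Pool:
    capacity: int                                   # GB across all disks in the pool
    filesystems: dict = field(default_factory=dict)

    def create(self, name, reservation=0):
        self.filesystems[name] = Filesystem(name, reservation=reservation)

    def available(self, name):
        """Space the named filesystem could still write, in GB."""
        # Each other filesystem claims the larger of what it uses and what it reserved.
        claimed_by_others = sum(
            max(fs.used, fs.reservation)
            for fs in self.filesystems.values()
            if fs.name != name
        )
        return self.capacity - claimed_by_others - self.filesystems[name].used

    def write(self, name, gb):
        if gb > self.available(name):
            raise OSError(f"{name}: pool out of space")
        self.filesystems[name].used += gb


pool = Pool(capacity=1000)          # hypothetical 1000 GB pool
pool.create("home")                 # no reservation
pool.create("db", reservation=300)  # 300 GB guaranteed

pool.write("home", 600)
print(pool.available("home"))  # 100: the 300 GB reserved for db is off limits
print(pool.available("db"))    # 400: its reservation plus the unclaimed rest
```

Note that neither filesystem has a size of its own. Changing the reservation on "db" immediately changes what "home" may use, without resizing anything.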
The definition of hybrid storage within ZFS goes even further! A storage pool can have more than just logical volumes or partitions. You can split the pool into three different areas:
- ZIL - ZFS Intent Log
- Read Cache Pool (L2ARC)
- High Capacity Pool
By using a different type of device for each of these areas, you can tremendously increase the performance of your filesystem.
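As a rough sketch of what such a layout looks like in practice, the Python helper below only builds the argument list for a `zpool create` call with separate `log` and `cache` devices. It does not run anything. The helper function, the pool name "tank" and all device names are hypothetical.

```python
import shlex


def zpool_create_argv(pool, data_disks, log_devices=(), cache_devices=()):
    """Build the argument list for creating a hybrid pool (nothing is executed)."""
    if not data_disks:
        raise ValueError("a pool needs at least one data disk")
    argv = ["zpool", "create", pool, *data_disks]
    if log_devices:
        argv += ["log", *log_devices]      # separate intent log device (slog)
    if cache_devices:
        argv += ["cache", *cache_devices]  # L2ARC read cache device
    return argv


argv = zpool_create_argv(
    "tank",
    data_disks=["c1t0d0", "c1t1d0", "c1t2d0"],  # hypothetical high-capacity disks
    log_devices=["c2t0d0"],                      # hypothetical write-optimized flash
    cache_devices=["c3t0d0"],                    # hypothetical read-optimized flash
)
print(shlex.join(argv))
```

The three areas from the list above show up as three groups of devices in one command: the plain disks, the devices after `log`, and the devices after `cache`.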
ZFS Intent Log (ZIL)
The ZIL logs all system calls that change the file system as transaction records. The transaction records contain sufficient information to replay them in the event of a system crash.
ZIL performance is critical for the performance of synchronous writes. A common application that issues synchronous writes is a database. This means that all of these writes run at the speed of the ZIL.
With a separate log device (the "slog" in ZFS jargon), synchronous writes can be written quickly and acknowledged to the client before the data is written to the storage pool. The slog is used only for small transactions, while large transactions use the main storage pool, since it's tough to beat the raw throughput of large numbers of disks. A flash-based log device is ideally suited for a ZFS slog. Using such a device with ZFS can reduce the latency of small transactions to a range of 100-200µs.
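To get a feeling for what these latencies mean, here is a small back-of-the-envelope sketch in Python. The 100-200µs figures come from the text above. The 5 ms disk latency and the 32 KB limit for "small" writes are assumptions chosen purely for illustration, and both function names are made up.

```python
def sync_writes_per_second(latency_us):
    """Upper bound for one thread that waits for each write to be acknowledged."""
    return 1_000_000 / latency_us


def log_target(write_bytes, small_write_limit=32 * 1024):
    """Pick where a synchronous write is logged.

    small_write_limit is an illustrative assumption, not a value from this article.
    """
    return "slog" if write_bytes <= small_write_limit else "main pool"


for label, latency_us in [
    ("flash slog at 100 us", 100),
    ("flash slog at 200 us", 200),
    ("disk at an assumed 5 ms", 5000),
]:
    print(f"{label}: {sync_writes_per_second(latency_us):,.0f} writes/s")

print(log_target(8 * 1024))         # small transaction: slog
print(log_target(4 * 1024 * 1024))  # large transaction: main pool
```

Under these assumptions, a single database thread that waits for every commit goes from a few hundred to several thousand synchronous writes per second.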
Read Cache Pool
How much of the data on your traditional storage systems is active data? 5%? 10%? Wouldn't it be nice to have low-latency, non-volatile storage that delivers this data quickly and without additional I/O on your traditional storage (disks)? Is your RAM too small to hold all hot read data, or is 256GB of RAM too expensive?
That is exactly where the read cache pool has its role.
Like most other filesystems, ZFS caches data in RAM; its first-level cache is the ARC (Adaptive Replacement Cache). The drawback is that RAM is volatile and very expensive. Volatile means that after each reboot you rely on your traditional storage for a certain time, until the cache has been rebuilt for optimal performance.
The ZFS team has now also implemented an L2ARC, which can use whatever device you choose to improve your read performance!
The level 2 ARC (L2ARC) is a cache layer between main memory and the disk. It uses dedicated storage devices to hold cached data, which are populated using large, infrequent writes. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include short-stroked disks, solid state disks, and other media with substantially lower read latency than disk.
Imagine a 10TB file system with a 1TB SSD L2ARC! Screaming fast!
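How much faster? A rough estimate is the expected latency of a random read that tries the ARC first, then the L2ARC, then the disks. The Python sketch below computes exactly that. The function is hypothetical, and every hit rate and device latency in it is an assumed number for illustration, not a measurement.

```python
def average_read_latency_us(arc_hit, l2arc_hit,
                            ram_us=0.1, flash_us=100.0, disk_us=8000.0):
    """Expected latency of one random read through ARC, then L2ARC, then disk.

    arc_hit:   fraction of all reads served from RAM
    l2arc_hit: fraction of the ARC misses served from flash
    The default latencies are illustrative assumptions.
    """
    arc_miss = 1.0 - arc_hit
    return (arc_hit * ram_us
            + arc_miss * l2arc_hit * flash_us
            + arc_miss * (1.0 - l2arc_hit) * disk_us)


# Assume half of all reads hit RAM, and flash catches 90% of the rest.
without_l2arc = average_read_latency_us(arc_hit=0.5, l2arc_hit=0.0)
with_l2arc = average_read_latency_us(arc_hit=0.5, l2arc_hit=0.9)

print(f"without L2ARC: {without_l2arc:.0f} us")
print(f"with L2ARC:    {with_l2arc:.0f} us")
```

The point of the model is that the disk term dominates: every read the L2ARC absorbs removes a multi-millisecond seek from the average.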
High Capacity Pool
The high capacity pool now just takes care of the mass storage. You can basically go with low-performance, high-capacity disks, as most of your I/O operations are handled by the L2ARC and the ZIL.
Old-Fashioned Storage vs. the New Fashion
The following pictures illustrate the historic view of filesystems and storage versus the current view and implementation:
By combining the use of flash as an intent-log to reduce write latency with flash as a cache to reduce read latency, we create a system that performs far better and consumes less power than a traditional system at similar cost. It's now possible to construct systems with a precise mix of write-optimized flash, flash for caching, DRAM, and cheap disks designed specifically to achieve the right balance of cost and performance for any given workload with data automatically handled by the appropriate level of the hierarchy. Most generally, this new flash tier can be thought of as a radical form of hierarchical storage management (HSM) without the need for explicit management.
ZFS allows Flash to join DRAM and commodity disks to form a hybrid pool automatically used by ZFS to achieve the best price, performance and energy efficiency conceivable. Adding Flash will be like adding DRAM - once it's in, there's no new administration, just new capability.
And do you know what? All of those features are part of our recently announced Sun Storage 7000 Unified Storage Systems!