ZFS and the Hybrid Storage Concept

In an earlier blog entry, "Open Storage - The (R)Evolution", I briefly touched on ZFS and Hybrid Storage Pools. Now I would like to dive a bit deeper into this great feature.

Hybrid Storage

ZFS is not just a filesystem. It is actually a hybrid filesystem and volume manager. These two functions are the main source of the flexibility of ZFS. Being hybrid means that ZFS manages storage differently than traditional solutions. Traditionally, you have a 1:1 mapping of filesystems to disk partitions, or alternatively a 1:1 mapping of filesystems to logical volumes, each of which is made out of one or more disks. In ZFS, all disks participate in one storage pool. Each filesystem can use all disk drives in a pool, and since the filesystem is not mapped to a volume, all space is shared! Space can be reserved, so that a single filesystem cannot fill up the whole pool, and space reservations can be changed at will. Growing or shrinking a filesystem isn't just painless, it is irrelevant!
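As a brief sketch of what pooled storage looks like in practice (the pool and device names below are hypothetical placeholders, not from the original post):

```shell
# One pool from four disks; every filesystem in it shares all the space
zpool create tank mirror c1t0d0 c1t1d0 mirror c2t0d0 c2t1d0

# Filesystems are created inside the pool, not on partitions or volumes
zfs create tank/home
zfs create tank/db

# Space can be capped or guaranteed per filesystem, and changed at will
zfs set quota=100g tank/home
zfs set reservation=200g tank/db
```

Because no filesystem is tied to a fixed-size volume, "resizing" is just a property change.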

[Image: zfs_hybrid_storage_model.png]

The definition of hybrid storage within ZFS goes even further! A storage pool can have more than just logical volumes or partitions. You can split the pool into three different areas:
  1. ZIL - ZFS Intent Log
  2. Read / Write Cache Pool
  3. High Capacity Pool
By using different devices for each of the areas above, you can tremendously increase the performance of your filesystem.

ZFS Intent Log (ZIL)

All file system related system calls are logged as transaction records by the ZIL. The transaction records contain sufficient information to replay them back in the event of a system crash.

ZIL performance is critical for synchronous writes. A common application that issues synchronous writes is a database, which means that all of these writes run at the speed of the ZIL.

Synchronous writes can be quickly written to and acknowledged by the "slog" (separate log, in ZFS jargon) before the data is committed to the main storage pool. The slog is used only for small transactions, while large transactions use the main storage pool directly, since it's tough to beat the raw throughput of large numbers of disks. A flash-based log device is ideally suited for a ZFS slog. Using such a device with ZFS can reduce the latency of small transactions to a range of 100-200µs.
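To illustrate (a hedged sketch; the pool name tank and the device names are placeholders I've chosen, not from the original post), attaching a flash slog to an existing pool is a single command:

```shell
# Attach a flash device as a dedicated intent log (slog);
# device names are hypothetical placeholders
zpool add tank log c4t0d0

# Or mirror the slog across two flash devices for redundancy:
# zpool add tank log mirror c4t0d0 c4t1d0
```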

Read Cache Pool

How much of the data on your traditional storage systems is active data? 5%? 10%? Wouldn't it be nice to have low-latency persistent storage that delivers the information in time and without additional I/O on your traditional storage (disks)? Is your RAM not sufficient to hold all hot read data, or is 256GB of RAM too expensive?

That is exactly where the read cache pool has its role.

ZFS, like most other filesystems, uses an L1ARC (Adaptive Replacement Cache) that resides in RAM. The drawback is that RAM is neither persistent nor cheap. Not persistent means that after each reboot you rely on your traditional storage for a certain time until the cache has been rebuilt for optimal performance.

The ZFS team has now also implemented an L2ARC that can use dedicated devices to improve your read performance!

The level 2 ARC (L2ARC) is a cache layer in-between main memory and the disk. It uses dedicated storage devices to hold cached data, which are populated using large infrequent writes. The main role of this cache is to boost the performance of random read workloads. The intended L2ARC devices include short-stroked disks, solid state disks, and other media with substantially faster read latency than disk.

Imagine a 10TB file system with a 1TB SSD L2ARC! Screaming fast!
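A sketch of how that would be set up, assuming an existing pool named tank and a hypothetical SSD device:

```shell
# Add an SSD as an L2ARC cache device (device name is a placeholder)
zpool add tank cache c5t0d0

# Watch per-device I/O every 5 seconds, including reads served from the cache
zpool iostat -v tank 5
```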

High Capacity Pool

The high capacity pool simply takes care of mass storage. You can basically go with low-performing, high-capacity disks, as most of your IOPS are handled by the L2ARC and ZIL.
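Putting the three areas together, here is a minimal sketch (all device names hypothetical) of creating a complete hybrid pool in one command: inexpensive high-capacity disks for bulk storage, mirrored write-optimized flash as the slog, and read-optimized flash as the L2ARC:

```shell
# raidz group: inexpensive high-capacity disks
# log:   mirrored write-optimized flash (ZIL/slog)
# cache: read-optimized flash (L2ARC)
zpool create tank \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    log mirror c2t0d0 c2t1d0 \
    cache c3t0d0
```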

Old fashioned Storage vs the new Fashion

The following pictures illustrate the historic view of filesystems and storage versus the current view and implementation:

Old Model
[Image: Old Model]

New Model
[Image: ZFS Model]


By combining the use of flash as an intent-log to reduce write latency with flash as a cache to reduce read latency, we create a system that performs far better and consumes less power than a traditional system at similar cost. It's now possible to construct systems with a precise mix of write-optimized flash, flash for caching, DRAM, and cheap disks designed specifically to achieve the right balance of cost and performance for any given workload with data automatically handled by the appropriate level of the hierarchy. Most generally, this new flash tier can be thought of as a radical form of hierarchical storage management (HSM) without the need for explicit management.

ZFS allows Flash to join DRAM and commodity disks to form a hybrid pool automatically used by ZFS to achieve the best price, performance and energy efficiency conceivable. Adding Flash will be like adding DRAM - once it's in, there's no new administration, just new capability.

And do you know what? All of those features are part of our recently announced Sun Storage 7000 Unified Storage Systems!


Unfortunately, this storage is still "old fashioned" with regard to data archiving and tiered capacity. You've basically added "tier zero" but have neglected tier 3. When is SAMFS coming to Amber Road?

Posted by Charles Soto on November 13, 2008 at 08:53 AM CET #

Hi Charles,

I don't agree with you. ZFS with all its features (128-bit file system, end-to-end data integrity, Hybrid Storage Pools, snapshots, DTrace Analytics) is "new fashioned". You are speaking about one particular feature that isn't in ZFS yet. As ZFS is still a new file system, we couldn't implement every feature right from the beginning. Features are prioritized by customers and open source developers. We see high demand for encryption and de-duplication, so the short-term focus will most likely be on these two features. Archiving is definitely also on the list. Let's see how and when it finds its way into ZFS.


Posted by Anatol Studler on November 13, 2008 at 10:11 AM CET #

Oh, I don't disagree that ZFS brings tremendous performance and data integrity benefits to tier 1 and tier 0 (where most of the money goes in storage). This is all great - get "DMX" like performance from commodity hardware at commodity pricing. It just currently has a big glaring hole with respect to tier 3, and to a lesser extent, tier 2 (which is basically moved down to tier 1 because tier 0 makes formerly tier 2 stuff behave more like tier 1 - whew!).

Anyway, a good temporary solution could involve NFS-based, file-level (e.g. not "zfs send") replication to a SAMFS archive server. I don't know...

Posted by Charles Soto on November 13, 2008 at 10:41 AM CET #
