So, what makes ZFS so cool? (Part I: high level overview)
By Karoly Vegh-Oracle on May 16, 2012
I have the privilege of doing Solaris 11 tech updates and demos at customer sites. It always amazes me how much they are amazed by ZFS. Don't get me wrong, ZFS is really cool. But it isn't exactly new technology; it has been around for a while now: the first implementations date back to 2003, and it has been included in Solaris 10 since Update 2 in 2006. Everyone has heard about it being awesome, but every now and then I get asked for the details: so tell me, what really makes ZFS so cool?
Let me tell you about it.
First and foremost: what were Sun's motivations for going off and implementing a whole new data management technology? Let's see.
- They'd had enough of the storage capacity limitations of existing filesystems
- They were fed up with the complexity and static nature of existing data management
- They considered data loss due to silent data corruption unacceptable
- They could not bear the thought of partially written IOs endangering data consistency
- They wanted consistent rollback functionality to any previous state
- They believed tools external to the filesystem were an unreliable, unintegrated way to provide data services
- They wanted performance by combining hybrid storage elements with transparent caching and fetching within the pool.
- To remove capacity limitations, they made ZFS a 128-bit filesystem. This makes ZFS capable of addressing 256 quadrillion zettabytes. With this address space you could practically store all the digital data ever generated on Earth. That is, you will probably never have to create another zpool because the existing one cannot address more storage.
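The arithmetic behind that figure checks out exactly if you use binary units, where a zettabyte is 2^70 bytes and a "quadrillion" is 2^50 (roughly 1.13 × 10^15):

```python
# A 128-bit filesystem can address 2^128 bytes.
max_bytes = 2 ** 128
quadrillion = 2 ** 50   # binary "quadrillion", ~1.13e15
zettabyte = 2 ** 70     # binary zettabyte (zebibyte)

print(max_bytes // (quadrillion * zettabyte))  # prints 256
```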
- To reduce administration complexity, they did the following:
- They moved RAID functionality from external products (SVM, VxVM, hardware RAID controllers) into the zpool itself
- They eliminated semi-static logical volume management completely, and defined filesystems not with a static size, but as simple, hierarchical management points in the pool.
- That is, there is no need to grow or shrink volumes, for they do not exist, and no need to grow or shrink filesystems, for they grow and shrink dynamically in the zpool as the amount of data within them changes.
- Also, all you need are two commands, zpool and zfs, with their intuitive subcommands (create/list/destroy/get/...) to manage your data structures.
- To avoid data corruption: they understood that you can't avoid physical bit rot, that is, silent data corruption on the disks, so they decided to checksum every single block written into the pool. These checksums are verified at read time, and corrupted blocks are self-healed from the redundant copies (mirror, raidz).
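The idea can be sketched in a few lines of Python. This is a toy model of mine, not ZFS code: one logical block mirrored on two "disks", a SHA-256 checksum kept with the block pointer, and healing on read:

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class MirroredBlock:
    """Toy model: one logical block stored as two mirrored copies."""
    def __init__(self, data: bytes):
        self.copies = [data, data]      # mirror: two physical copies
        self.cksum = checksum(data)     # kept with the block pointer

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(copy) == self.cksum:
                # found a good copy: heal the mirror from it
                self.copies = [copy, copy]
                return copy
        raise IOError("all copies corrupt: unrecoverable")

blk = MirroredBlock(b"important data")
blk.copies[0] = b"bit-rotted junk"       # silent corruption on disk 0
assert blk.read() == b"important data"   # detected and self-healed on read
assert blk.copies[0] == b"important data"
```

The point is that the checksum lives with the pointer, not with the data, so a disk cannot silently corrupt both at once.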
- To keep partially successful writes from ruining data consistency, they implemented ZFS as a transactional filesystem. That is, either a write completes entirely, or not at all. Also, changing data happens in a copy-on-write way: the relevant blocks are read, and instead of modifying the ones being changed in place, the blocks with changed content are written to an unused area, leaving the originals untouched. Both the original and the changed state exist at the end of the modification write; then the original blocks are marked as free space (metadata released).
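A minimal sketch of that copy-on-write flow (again a toy model of mine, not the actual ZFS on-disk logic): new data goes to a fresh block, and a single pointer flip makes the new state live, so a crash at any point leaves either the old or the new state intact:

```python
class CowStore:
    """Toy copy-on-write store: 'live' is one root pointer that is
    flipped only after the new blocks are safely in place."""
    def __init__(self, data: bytes):
        self.blocks = {0: data}   # block id -> contents
        self.live = 0             # pointer to the current state
        self.next_id = 1

    def write(self, new_data: bytes):
        # 1. write changed data to an unused block; original untouched
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = new_data
        # 2. flip the pointer: a crash before this line simply
        #    leaves the old consistent state in place
        old = self.live
        self.live = new_id
        # 3. only now is the old block reclaimed as free space
        del self.blocks[old]

    def read(self) -> bytes:
        return self.blocks[self.live]

s = CowStore(b"v1")
s.write(b"v2")
assert s.read() == b"v2"
```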
- To be able to roll back to previous states, they snapshot the metadata and, after a CoW modification, simply don't throw the old copy away. See? Snapshots, done by not releasing metadata and blocks. It is sometimes easier to snapshot than not to :)
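In the same toy spirit (my illustration, not ZFS internals): a snapshot is just remembering the current root pointer, which keeps the old blocks from being freed by later copy-on-write updates:

```python
class SnapStore:
    """Toy copy-on-write store where a snapshot is a retained root pointer."""
    def __init__(self, data: bytes):
        self.blocks = {0: data}   # block id -> contents
        self.live = 0             # pointer to the current state
        self.snapshots = {}       # snapshot name -> root pointer
        self.next_id = 1

    def snapshot(self, name: str):
        # O(1): just keep the current root pointer around
        self.snapshots[name] = self.live

    def write(self, new_data: bytes):
        new_id = self.next_id
        self.next_id += 1
        self.blocks[new_id] = new_data
        old = self.live
        self.live = new_id
        # free the old block only if no snapshot still references it
        if old not in self.snapshots.values():
            del self.blocks[old]

    def read(self, snap=None) -> bytes:
        root = self.live if snap is None else self.snapshots[snap]
        return self.blocks[root]

s = SnapStore(b"monday")
s.snapshot("mon")
s.write(b"tuesday")
assert s.read() == b"tuesday"
assert s.read("mon") == b"monday"   # the old state is still fully readable
```

Taking the snapshot costs nothing up front; space is only consumed as the live data diverges from it.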
- They implemented ZFS-internal data services like encryption, deduplication, compression, snapshots, cloning...
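Deduplication, for instance, falls out naturally once every block is checksummed: blocks with the same checksum can be stored once and reference-counted. A toy sketch of the idea (my simplification, not the ZFS implementation):

```python
import hashlib

class DedupStore:
    """Toy block store: identical blocks are stored only once."""
    def __init__(self):
        self.blocks = {}    # checksum -> data (physical storage)
        self.refs = {}      # checksum -> reference count

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data         # first copy: actually stored
        self.refs[key] = self.refs.get(key, 0) + 1
        return key                          # the caller keeps this pointer

    def get(self, key: str) -> bytes:
        return self.blocks[key]

d = DedupStore()
k1 = d.put(b"same payload")
k2 = d.put(b"same payload")   # second write costs no extra block
assert k1 == k2
assert len(d.blocks) == 1 and d.refs[k1] == 2
```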
- To achieve both read and write performance, they implemented a hierarchical, configurable cache mechanism, using main memory, a second-level cache (even on SSDs) and disks, with automatic tiering between them.
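As a rough illustration of that hierarchy (again a toy model of mine, not the real ARC/L2ARC code): hot blocks live in a small RAM cache, blocks evicted from RAM spill to a larger SSD cache, and only a miss in both goes to the slow disks:

```python
from collections import OrderedDict

class HybridCache:
    """Toy two-level read cache in front of a slow backing store."""
    def __init__(self, disk, ram_size=2, ssd_size=4):
        self.disk = disk                    # dict standing in for slow disks
        self.ram = OrderedDict()            # small, fast (LRU order)
        self.ssd = OrderedDict()            # larger, slower (LRU order)
        self.ram_size, self.ssd_size = ram_size, ssd_size

    def read(self, key):
        if key in self.ram:                 # level 1 hit: main memory
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:                 # level 2 hit: SSD cache
            value = self.ssd.pop(key)
        else:                               # miss everywhere: go to disk
            value = self.disk[key]
        self._cache_in_ram(key, value)
        return value

    def _cache_in_ram(self, key, value):
        self.ram[key] = value
        if len(self.ram) > self.ram_size:   # RAM full: demote oldest to SSD
            old_key, old_val = self.ram.popitem(last=False)
            self.ssd[old_key] = old_val
            if len(self.ssd) > self.ssd_size:
                self.ssd.popitem(last=False)  # falls out of the cache entirely

disk = {n: f"block-{n}" for n in range(10)}
cache = HybridCache(disk)
for n in (0, 1, 2, 0):                      # 0 is read twice, so it is "hot"
    cache.read(n)
assert 0 in cache.ram and 1 in cache.ssd    # hot block in RAM, cold one demoted
```

The same principle lets frequently read data be served from memory, recently evicted data from SSD, and everything else from spinning disks.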
So if next time anyone asks you about why ZFS is cool, tell them:
- amazing storage addressing capability
- built-in RAID, no need for an LVM, dynamic hierarchical filesystems
- no data corruption, since everything is checksummed and self-healed
- transactional and copy-on-write behavior
- snapshots as a natural capability
- built-in data services
- hybrid storage pool performance
...and then of course we haven't yet talked about replication, shadow migration, shares, hierarchical filesystems, delegation, cache policies, dynamic property settings, dynamic striping and autoexpand, online version upgrades, etc. Should you want me to write about those in a "Part II: a deeper dive" post, let me know in the comments.
Although, who knows. Remember working with 180 KB "large" floppies? Now my phone has a storage capacity 200,000 times larger than that. My current phone could cover the data storage needs of a smaller country back in the '80s. According to my quick exponential estimation, if data keeps growing at this speed, then in around 100 years we will need something beyond 128-bit filesystems :)