Cluster File System with ZFS: Introduction and Configuration
Customers evaluating the Cluster File System with ZFS tend to ask the same questions:
- Will this scale? (e.g. to 4 nodes, 40 nodes, 400 nodes, 4000 nodes?)
- Is it safe for geographic diversity? (e.g. 3 datacenters, located in different cities or continents?)
- What is the performance implication when you add a node?
- Can multiple nodes drop dead and the application continue to run without losing data?
Let's take these one at a time, starting with the overall picture.
Is there a picture or theory of operation explaining the working model of Cluster File System with ZFS?
Here is a diagram that explains, at a high level, the working model of the Cluster File System, also called the Proxy File System (PxFS), with ZFS.

From the diagram above, you can see that all data flow and transactions go through one machine. PxFS is a layer that sits above the real file system: the PxFS client talks to the PxFS server layer on the primary node (whose state is checkpointed to the secondary nodes), and the PxFS server layer in turn talks to the ZFS file system, which talks to the physical storage. Note that this is a very high-level architecture with many more components involved; the block labeled "Cluster Communication Subsystem" exists within each node of the cluster.
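To make the data path concrete, here is a minimal, purely illustrative Python sketch of the model described above. This is not Solaris Cluster code; the class names, path, and checkpoint mechanism are hypothetical stand-ins. It shows only the shape of the flow: every client operation funnels through the PxFS server on the primary, which checkpoints state to the secondaries before handing the operation to the underlying file system.

```python
# Illustrative model of the PxFS request path -- NOT Solaris Cluster source.
# Every operation funnels through the primary, which checkpoints state to
# the secondaries before applying it to the underlying (ZFS) file system.

class PxfsServer:
    """Toy PxFS server: the primary checkpoints, then applies the operation."""

    def __init__(self, secondaries):
        self.secondaries = secondaries   # PxfsServer replicas on other nodes
        self.state = {}                  # checkpointed metadata (toy stand-in)

    def write(self, path, data):
        # 1. Checkpoint the state change to every secondary first, so a
        #    secondary can take over as primary without losing the operation.
        for secondary in self.secondaries:
            secondary.apply_checkpoint(path, data)
        # 2. Only then hand the operation to the local file system (ZFS here).
        self.state[path] = data
        print(f"primary: wrote {path} via the underlying ZFS file system")

    def apply_checkpoint(self, path, data):
        self.state[path] = data          # secondary mirrors the primary state


class PxfsClient:
    """Toy PxFS client on a cluster node: forwards everything to the primary."""

    def __init__(self, primary):
        self.primary = primary

    def write(self, path, data):
        self.primary.write(path, data)   # all I/O goes through one machine


secondary = PxfsServer(secondaries=[])
primary = PxfsServer(secondaries=[secondary])
PxfsClient(primary).write("/global/app/config", b"...")
```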
Will this scale? (e.g. to 4 nodes, 40 nodes, 400 nodes, 4000 nodes?)
See the URL https://docs.oracle.com/cd/E69294_01/html/E69310/bacfbadd.html#CLCONbacbbbbb
Excerpts from the link above:
Supported Configurations
Depending on your platform, Oracle Solaris Cluster software supports the following configurations:
- SPARC: Oracle Solaris Cluster software supports from one to 16 cluster nodes in a cluster. Different hardware configurations impose additional limits on the maximum number of nodes that you can configure in a cluster composed of SPARC-based systems. See Oracle Solaris Cluster Topologies in Concepts for Oracle Solaris Cluster 4.4 for the supported configurations.
- x86: Oracle Solaris Cluster software supports from one to eight cluster nodes in a cluster. Different hardware configurations impose additional limits on the maximum number of nodes that you can configure in a cluster composed of x86-based systems. See Oracle Solaris Cluster Topologies in Concepts for Oracle Solaris Cluster 4.4 for the supported configurations.
As you can observe from the diagram in the answer to #1, all client requests go through the PxFS server on the primary node: each additional client node adds its read/write operations to the same PxFS primary. Scalability is therefore limited by the primary server's bandwidth.
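A back-of-the-envelope model makes that bound visible. The bandwidth figure below is an assumption chosen for illustration, not a measured value:

```python
# Back-of-the-envelope scaling model: the primary's bandwidth is the ceiling.
# The 1200 MB/s figure is an illustrative assumption, not a measurement.

PRIMARY_BANDWIDTH_MBS = 1200   # assumed usable I/O bandwidth of the primary

for clients in (4, 40, 400):
    per_client = PRIMARY_BANDWIDTH_MBS / clients
    print(f"{clients:4d} client nodes -> at most {per_client:7.1f} MB/s each")

#    4 client nodes -> at most   300.0 MB/s each
#   40 client nodes -> at most    30.0 MB/s each
#  400 client nodes -> at most     3.0 MB/s each

# Adding nodes divides the same primary bandwidth among more clients;
# it does not add aggregate throughput the way a distributed file system would.
```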
Is it safe for geographic diversity? (e.g. 3 datacenters, located in different cities or continents?)
Yes. This feature can be used in campus clusters, where the nodes of one cluster are located in different rooms, and with the Solaris Cluster Disaster Recovery Framework, which provides multi-cluster and multi-site capability.
A cluster file system only exists within a cluster instance. Depending on the distance, nodes of one cluster instance can be spread between two locations. See https://docs.oracle.com/cd/E69294_01/html/E69325/z40001987941.html#scrolltoc
Replication of ZFS-based cluster file systems is supported between clusters located in different geographies, with service availability managed by the Disaster Recovery Framework. See https://docs.oracle.com/cd/E69294_01/html/E69466/index.html for more information.
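For intuition about the mechanism, incremental ZFS replication between sites is built on zfs send and zfs receive. The sketch below uses hypothetical dataset and host names, and in practice the Disaster Recovery Framework drives the replication rather than a hand-rolled script; it only illustrates the underlying idea:

```python
# Conceptual sketch of ZFS snapshot replication between two sites.
# Dataset and host names are hypothetical; the Solaris Cluster Disaster
# Recovery Framework manages this kind of replication for you.

import subprocess
import time

DATASET = "gpool/appdata"   # hypothetical dataset on the primary site
REMOTE = "drsite-node1"     # hypothetical node at the DR site

def replicate(prev_snap):
    """Snapshot DATASET and ship it to REMOTE; incremental after the first run."""
    snap = f"{DATASET}@rep-{int(time.time())}"
    subprocess.run(["zfs", "snapshot", snap], check=True)

    # Full stream on the first pass, incremental (-i) afterwards.
    send_cmd = ["zfs", "send"] + (["-i", prev_snap] if prev_snap else []) + [snap]
    recv_cmd = ["ssh", REMOTE, "zfs", "receive", "-F", DATASET]

    sender = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    subprocess.run(recv_cmd, stdin=sender.stdout, check=True)
    sender.stdout.close()
    sender.wait()
    return snap

latest = replicate(None)     # first pass: full stream
latest = replicate(latest)   # later passes: incremental streams
```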
What is the performance implication when you add a node?
Adding a node has no inherent impact on the cluster file system; the effect depends on the role the new node takes and on the activity on the cluster file system at the time.
If the new node joins the cluster as a secondary, all the existing PxFS state for the file systems is checkpointed to it, which temporarily affects performance. If the cluster already has the desired number of secondaries, the new node is added as a spare instead, and no existing state is checkpointed from the PxFS primary to it (see the sketch below).
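The following toy Python model captures the secondary-versus-spare distinction. It is not Solaris Cluster code; the DESIRED_SECONDARIES knob and all names are illustrative stand-ins for what the framework manages through device-group settings:

```python
# Toy model of what happens when a node joins -- NOT Solaris Cluster code.
# DESIRED_SECONDARIES stands in for the framework's device-group setting.

DESIRED_SECONDARIES = 1

class Cluster:
    def __init__(self):
        self.primary_state = {"/global/fs1": "...pxfs metadata..."}
        self.secondaries = []   # nodes holding checkpointed state
        self.spares = []        # nodes holding nothing (yet)

    def node_joins(self, node):
        if len(self.secondaries) < DESIRED_SECONDARIES:
            # Full checkpoint of existing PxFS state -> costs performance.
            checkpoint = dict(self.primary_state)
            self.secondaries.append((node, checkpoint))
            print(f"{node}: joined as SECONDARY (state checkpointed to it)")
        else:
            # Enough secondaries already -> join as a spare, no checkpoint.
            self.spares.append(node)
            print(f"{node}: joined as SPARE (no checkpoint traffic)")

c = Cluster()
c.node_joins("node2")   # becomes the secondary; checkpoint traffic occurs
c.node_joins("node3")   # desired count reached; becomes a spare
```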
Can multiple nodes drop dead and the application continue to run without losing data?
Yes. File system access continues across a node failure, and file and directory operations that complete during the failure preserve the semantics of a local ZFS file system.
Because the primary checkpoints every state change on a globally mounted cluster file system to its secondaries, the latest state is preserved after a failover operation. A failure of multiple nodes behaves like a single node death repeated: as long as a node remains that can act as the PxFS primary, the application continues to run without losing data.
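Continuing the toy model from above, failover amounts to promoting a checkpointed secondary, which is why the latest acknowledged state survives. Again, this is a sketch, not Solaris Cluster code:

```python
# Toy failover model -- NOT Solaris Cluster code.  Builds on the idea that
# the primary checkpoints state to its secondaries before acknowledging I/O.

class Node:
    def __init__(self, name, state=None):
        self.name = name
        self.state = state or {}

def fail_over(secondaries, old_state):
    """Promote the first surviving secondary; its checkpointed state already
    matches everything the old primary acknowledged, so nothing is lost."""
    if not secondaries:
        raise RuntimeError("no surviving secondary: file system unavailable")
    new_primary = secondaries.pop(0)
    assert new_primary.state == old_state   # checkpointing kept them in sync
    return new_primary

primary = Node("node1", {"/global/fs1/file": "v2"})
secondaries = [Node("node2", dict(primary.state)),
               Node("node3", dict(primary.state))]

# node1 and node3 "drop dead"; node2 takes over with the latest state.
secondaries = [n for n in secondaries if n.name != "node3"]
primary = fail_over(secondaries, primary.state)
print(f"{primary.name} is now primary; state preserved: {primary.state}")
```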
I am trying to determine the use cases for clustering with ZFS… (i.e. is it a “Cloud Solution” where we could lose a datacenter and still have applications continue to run in the remaining datacenters?)
Some of the use cases are:
1) Live migration with the Solaris Cluster HA-LDOM data service, for which global access is a requirement.
2) Shared file system access for multi-tier applications such as SAP and Oracle E-Business Suite.
3) Shared file systems for multiple instances of the same application deployed in the cluster.
4) ZFS snapshot replication for cluster file systems with ZFS, managed by the Solaris Cluster Disaster Recovery Framework.
As you can observe from the diagram, PxFS is a layer above the underlying file system. If you need multiple nodes to access the same file system and performance is not the main requirement, the Solaris Cluster File System is the way to go. If the application has heavy read/write requirements, it is better to use highly available local file systems instead.