Why is SUNW.nfs required to configure HA-NFS over ZFS in Solaris Cluster?


The typical way of configuring a highly available NFS file system in a Solaris Cluster environment is by using the HA-NFS agent (SUNW.nfs). The SUNW.nfs agent simply shares the file systems that have to be exported.

With support for ZFS as a failover file system in SUNW.HAStoragePlus, there are two possible ways to configure a highly available NFS file system with ZFS as the underlying file system:

  1. By enabling the ZFS sharenfs property (i.e., sharenfs=on) on the file systems of the zpool, without using SUNW.nfs.
  2. By disabling the ZFS sharenfs property (i.e., sharenfs=off) on the file systems of the zpool and letting SUNW.nfs do the actual share.

Of these two approaches, HA-NFS works correctly only when the SUNW.nfs agent is used (i.e., option 2). This blog explains the rationale behind requiring SUNW.nfs to configure a highly available NFS file system with ZFS in a cluster environment.
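For illustration, the two approaches look roughly as follows on the command line. This is only a sketch; the pool name tank, the file system tank/export, the path /tank/admin, and the resource name nfs-rs are hypothetical.

Option 1: ZFS itself does the sharing.

    # zfs set sharenfs=on tank/export

Option 2: ZFS sharing is turned off, and the SUNW.nfs resource shares the file system through its dfstab file.

    # zfs set sharenfs=off tank/export
    # cat /tank/admin/SUNW.nfs/dfstab.nfs-rs
    share -F nfs -o rw /tank/export

Here /tank/admin stands for the directory given by the Pathprefix property of the resource group (again, just an example path). Only option 2 keeps the NFS state with the agent, as explained below.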

Lock reclaiming by clients (NFSv2/NFSv3)

The statd(1M) daemon keeps track of the clients and processes holding locks on the server. The server uses this information to allow clients to reclaim their locks after an NFS server reboot or failover.

When a file system is shared by setting the ZFS sharenfs property to on, without using SUNW.nfs, the lock monitoring information is kept under /var/statmon, which is on a local file system and specific to one host. In case of a failover, the stored information is therefore not available on the node to which the service fails over, and the server is unable to ask the clients to reclaim their locks.

The SUNW.nfs agent overcomes this problem by keeping the monitor information in stable storage on multi-ported disks, accessible from all cluster nodes.

State information of clients (NFSv4)

NFSv4 is a stateful protocol in which nfsd(1M) keeps track of client state, such as opened and locked files, in stable storage.

When a file system is shared by setting the ZFS sharenfs property to on, the stable storage is under /var/nfs, which is not accessible from all nodes of the cluster. In a server failover scenario, the clients' reclaim requests will then fail, which might result in client applications exiting (unless the client applications catch the SIGLOST signal).

The SUNW.nfs agent overcomes this problem by keeping the state information in stable storage that is shared among the cluster nodes, which allows the server to have clients reclaim their state.


The pictorial difference is shown below. 

[Figure: HA-NFS without SUNW.nfs]

[Figure: HA-NFS using SUNW.nfs]

To put it more precisely, the ZFS sharenfs property is not meant to work in a Solaris Cluster environment, and hence using the SUNW.nfs agent is a must for HA-NFS over ZFS.

P.S.:
The stable storage where SUNW.nfs keeps its information is on the highly available ZFS file system (under the directory given by the Pathprefix property used with the SUNW.nfs resource).
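To make this concrete, here is a sketch of how the administrative files line up when Pathprefix points to a directory on the failover zpool. The pool name tank, the path /tank/admin, and the resource name nfs-rs are hypothetical, and the exact layout under SUNW.nfs is maintained by the agent itself.

    /tank/admin                          value of Pathprefix, on the HA ZFS file system
    /tank/admin/SUNW.nfs/dfstab.nfs-rs   share commands for the SUNW.nfs resource nfs-rs
    /tank/admin/SUNW.nfs/...             monitor/state information kept by the agent

Because this directory fails over together with the zpool, the node that takes over the service sees the same state and can ask the clients to reclaim their locks.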

Venkateswarlu Tella (Venku)
Solaris Cluster Engineering

Comments:

Hi Venkateswarlu

Do you have some configuration guides on how to get this working?

They are very hard to find. Any help would be appreciated.

Thanks

Wayne

Posted by Wayne Dovey on April 15, 2008 at 08:19 PM PDT #

Hi Wayne,
The following document explains how to configure HA-NFS on a ZFS file system:
http://docs.sun.com/app/docs/doc/819-2979/cjabafhj?a=view
Let me know if you are looking for any specific information or facing any problems while configuring.
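In case it helps, here is a rough outline of the steps. All names below (the pool tank, the Pathprefix directory /tank/admin, the logical hostname nfs-lh, and the resource and group names) are only placeholders; please follow the document above for the exact procedure.

    # clresourcetype register SUNW.HAStoragePlus
    # clresourcetype register SUNW.nfs
    # clresourcegroup create -p Pathprefix=/tank/admin nfs-rg
    # clreslogicalhostname create -g nfs-rg -h nfs-lh nfs-lh-rs
    # clresource create -g nfs-rg -t SUNW.HAStoragePlus -p Zpools=tank hasp-rs
    (add the share commands to /tank/admin/SUNW.nfs/dfstab.nfs-rs, e.g. share -F nfs -o rw /tank/export)
    # clresource create -g nfs-rg -t SUNW.nfs -p Resource_dependencies=hasp-rs nfs-rs
    # clresourcegroup online -M nfs-rg

Remember to keep sharenfs=off on the ZFS file systems so that SUNW.nfs does the sharing.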

-Venku

Posted by guest on April 15, 2008 at 08:36 PM PDT #

I'm wondering what would be required if there weren't shared storage and AVS were mirroring the local ZFS pool from a hot system to a standby. Would it be possible to forgo this complexity by simply making /var/statmon and /var/nfs symlinks to directories on the mirrored ZFS pool?

Posted by Maurice Volaski on July 14, 2008 at 04:38 AM PDT #

It should work as long as AVS syncs the latest data to the standby node at the point when the primary node goes down.

Posted by venku on July 14, 2008 at 05:19 PM PDT #

Hi,

I was doing some tests with an HA-NFS cluster and NFS clients, and on one Solaris 10 U5 client I got the following NFS error:

NFS compound failed for server ha-nfs: error 5 (RPC: Timed out)
NFS compound failed for server ha-nfs: error 5 (RPC: Timed out)
NFS compound failed for server ha-nfs: error 5 (RPC: Timed out)
NFS compound failed for server ha-nfs: error 5 (RPC: Timed out)
NFS compound failed for server ha-nfs: error 5 (RPC: Timed out)
NFS compound failed for server ha-nfs: error 5 (RPC: Timed out)
nfs mount: mount: /nfs-filesystem: Connection timed out

I got this error after an NFS failover, and I cannot mount the NFS file system again. This problem occurred only on this server; the others were OK. This server, like the others, was running tars, finds, and cp's over NFS (stress tests).

After looking for some information on the internet, I found that the problem may be related to the RPC services for NFS, maybe lockd and nfsd. I rebooted the cluster nodes, rebooted the complete cluster and the NFS client, but nothing worked. I also stopped, restarted, offlined, and onlined the NFS resource group, but again it didn't work for only this client.

This is an NFS resource group working over a ZFS file system with the ZFS sharenfs attribute disabled.

Any comment on how to resolve this is appreciated...

Posted by Manuel on January 06, 2009 at 07:02 AM PST #

When I try to configure following the guide mentioned above (http://docs.sun.com/app/docs/doc/819-2979/cjabafhj?a=view), I get stuck at the point of creating the HAStoragePlus resource, due to an invalid global device (error log: Invalid global device path /dev/did/dsk/d1s6 detected). Please advise how I can tackle this issue before moving on.

Thanks a lot.

Posted by Ken on February 08, 2009 at 11:10 AM PST #

The HAStoragePlus resource requires global device paths to work.
It looks like you are planning to configure DID disk d1, and hence the corresponding global device path is /dev/global/dsk/d1s6.
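For example (the resource and group names below are just placeholders), you can check how the DID device maps to the underlying disks and then use the /dev/global path when creating the resource:

    # cldevice list -v d1
    # clresource create -g nfs-rg -t SUNW.HAStoragePlus \
        -p GlobalDevicePaths=/dev/global/dsk/d1s6 hasp-rs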

Posted by Venku on February 09, 2009 at 06:53 PM PST #

So, can I mount a shared folder from another Solaris system as a valid global device?

Posted by Ken on February 09, 2009 at 07:02 PM PST #

Thank you, Venku, for this informative document. However, let me explain my issue here. I tried to use SUNW.nfs for ZFS storage in a two-node Sun Cluster 3.2.

If I share /zpool/abc using SUNW.nfs (/zpool/SUNW.nfs/dfstab..), then subdirectories under "abc" don't get NFS-shared automatically, meaning the client can only mount server:/abc but cannot mount server:/abc/def or subdirectories under "def". This means I have to edit the dfstab for each new directory under "def".

This will be a big issue for me because there will be too many directories under "def", and I don't want to edit the dfstab all the time.
This is simplified by ZFS's sharenfs=on. If I disable SUNW.nfs and enable ZFS sharenfs on the root ZFS file system, then all subdirectories get shared automatically, maintaining the hierarchy.

I have explained this on the forums:
http://forums.sun.com/thread.jspa?threadID=5401531&tstart=0

By the way, note that the subdirectories under "def" are all ZFS file systems.

Please reply in the thread or forum so I will know when there is an update.

Thanks again.

Posted by upendra on August 10, 2009 at 12:35 AM PDT #

I understand why option 2 is needed for HA of NFS. But what if I need the flexibility and easy configuration of option 1? The HA provided by option 2 is not higher than that of option 1. So is it OK, and still supported, if I don't use the HA-NFS agent and configure option 1?
Or is there an easy way to configure the HA-NFS agent with the flexibility of ZFS sharenfs (inheritance, ...?) so I can use option 2?

Cheers...

Posted by pb on September 20, 2009 at 06:35 PM PDT #

Right now there is no easy option in the HA-NFS agent to share all of the ZFS file systems in a pool. It has to be done manually so that applications behave properly after a failover.
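For example, with a layout like the one described in the thread, each ZFS file system needs its own entry in the dfstab file of the SUNW.nfs resource (the file system names below are hypothetical):

    share -F nfs -o rw /zpool/abc
    share -F nfs -o rw /zpool/abc/def
    share -F nfs -o rw /zpool/abc/def/project1

Every time a new ZFS file system is created under the pool, a corresponding share entry has to be added.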

Posted by Venku on September 21, 2009 at 11:47 PM PDT #
