We hear more and more about data-hungry workloads and the ravenous appetite of AI training. A number of storage solutions can help: we’ve seen a rise in interest in the Lustre file system, GPFS, and other mainstream cluster file systems. However, don’t overlook NFS for these workloads: pNFS offers a native Linux kernel mechanism to achieve very high data throughput.
Unlike traditional NFS, which fetches data from a single storage server, pNFS enables clients to read and write data across multiple storage servers simultaneously. This parallel access to storage servers dramatically boosts data throughput, particularly when paired with modern, high-speed networks.
This blog will introduce the fundamentals of pNFS and provide a step-by-step guide to setting up a pNFS environment with the block layout type. By following these steps, you’ll establish a functional test environment in which to explore the pNFS components, gain a better understanding of its parallel file system capabilities, and get hands-on experience experimenting with its features.
How Does pNFS Work?
pNFS achieves its impressive performance by decoupling the roles of the Metadata Server (MDS) and the Data Servers (DS). The MDS acts as the “conductor,” managing metadata and coordinating the flow of information, while the Data Servers are the instruments in the orchestra: the actual data is transferred directly between the DSes and the clients. This division of labor ensures efficient, high-speed data access.
Though pNFS has been around since 2010, as introduced in RFC 5661, its adoption has accelerated in recent years. This is largely driven by the growing demand for highly parallel data access in AI and HPC applications.
Beyond NFS: Flexibility at Its Core
One of the most surprising and powerful aspects of pNFS is its flexibility. The data servers don’t even need to use NFS as their backend protocol. Instead, pNFS extends the NFS protocol to allow clients to retrieve a map of where file data resides and the protocols used to access it. This map makes it possible to support backend protocols such as SCSI, iSCSI, or NVMe, in addition to NFS.
pNFS Compatibility Matrix:
As the Linux kernel NFS server only supports pNFS Block Layout, we’ll focus on that configuration in this blog.
Setting Up pNFS with an iSCSI block device in OCI
In this section, we explain how to set up a pNFS environment using an iSCSI block device in OCI, with VM instances running Oracle Linux 9 and UEK7 (kernel 5.15.0-306.177.4.el9uek.x86_64).
Because OCI Block Storage offers iSCSI endpoints, we don’t have to set up a separate iSCSI target to enable this. If you have an iSCSI target available, these instructions will also apply.
- Create VM instances for the MDS, DS, and pNFS client
For instructions on how to create VM instances in OCI, please refer to this guide: https://docs.oracle.com/en-us/iaas/Content/Compute/tutorials/first-linux-instance/overview.htm.
- Provision an OCI SCSI block device for the DS
The DS must have a SCSI block device attached with the iSCSI volume attachment type so that the MDS and the pNFS client can access this block device over the network.
For instructions on how to provision an OCI SCSI block device, please refer to this guide: https://docs.oracle.com/en-us/iaas/Content/Block/Tasks/creatingavolume.htm.
For instructions on how to attach the SCSI block device with iSCSI as the volume attachment type to the VM instance, please refer to this guide: https://docs.oracle.com/en-us/iaas/Content/Block/Tasks/attachingavolume.htm#top.
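If you prefer the OCI CLI to the console, the volume can also be created and attached from any host with the CLI configured. This is only a rough sketch; the availability domain, size, and OCIDs below are placeholders that you must replace with values from your own tenancy:
$ oci bv volume create --availability-domain "<AD-name>" --compartment-id <compartment-OCID> --display-name DS-share-volume --size-in-gbs 100
$ oci compute volume-attachment attach --type iscsi --instance-id <DS-instance-OCID> --volume-id <volume-OCID>
After the attachment completes, the OCI console displays the iSCSI commands to run on the DS to connect the new volume.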
Setting up the DS
-
Install ‘targetcli’ if it’s not already installed on the DS.
[DS]# yum install targetcli
- Identify the iSCSI block device used for the pNFS share
For this blog post, we assume the iSCSI storage device name is /dev/sdb for simplicity. In a real deployment, prefer a persistent identifier such as the device UUID, since SCSI device names are not stable across reboots.
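One way to confirm which device is the OCI iSCSI attachment and to find a persistent name for it is shown below; /dev/sdb is just the example name used in this post, so substitute your own device:
[DS]# lsblk -o NAME,SIZE,TYPE,VENDOR,MODEL
[DS]# ls -l /dev/disk/by-id/ | grep sdb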
- Collect the iSCSI initiator IQN (iSCSI Qualified Name) of the nodes that will access this iSCSI block device.
In this case, they are the IQNs of the MDS and the pNFS client. On the MDS:
[MDS]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1988-12.com.oracle:b658c4fb9f0
On the pNFS client:
[pNFS_client]# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1988-12.com.oracle:1e974222bac
On the DS, save the IQNs of the MDS and the pNFS client in the environment variables MDS_IQN and PNFS_CLIENT_IQN:
[DS]# typeset -x MDS_IQN=iqn.1988-12.com.oracle:b658c4fb9f0
[DS]# typeset -x PNFS_CLIENT_IQN=iqn.1988-12.com.oracle:1e974222bac
- Use ‘targetcli’ to configure the iSCSI block device to allow access from the MDS and the pNFS client.
Create a block backstore named ‘DS-share’ backed by /dev/sdb:
[DS]# targetcli /backstores/block create DS-share /dev/sdb
Create an iSCSI target and capture its IQN in the shell variable ‘iqn’:
[DS]# iqn=`targetcli /iscsi create |grep iqn | awk '{ print $3 }' |sed 's/\.$//'`
Configure the ACL to allow the MDS and pNFS client to access this iSCSI block device
[DS]# targetcli /iscsi/$iqn/tpg1/acls create $MDS_IQN
[DS]# targetcli /iscsi/$iqn/tpg1/acls create $PNFS_CLIENT_IQN
Create an iSCSI LUN for the iSCSI block device ‘DS-share’
[DS]# targetcli /iscsi/$iqn/tpg1/luns create /backstores/block/DS-share
Save the iSCSI configuration in the default location /etc/target/saveconfig.json
[DS]# targetcli saveconfig
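Before moving on, it can be useful to review the resulting configuration. ‘targetcli ls’ prints the backstore, target, ACLs, and LUN as a tree so you can confirm everything created above is in place:
[DS]# targetcli ls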
- Start the target.service on the DS
This service is needed for the MDS and the pNFS client to discover and connect to the iSCSI block device ‘DS-share’.
[DS]# systemctl enable target.service --now
- Configure firewall on the DS to allow iSCSI port 3260 traffic
[DS]# firewall-cmd --permanent --add-port=3260/tcp
[DS]# firewall-cmd --reload
Setting up the MDS
- Install the required iSCSI packages iscsi-initiator-utils and iscsi-initiator-utils-iscsiuio
[MDS]# yum install iscsi\* -y
- Start up iscsid.service
This is the Open-iSCSI daemon that implements the control path of the iSCSI protocol, plus some management facilities. See iscsid(8) for more details.
[MDS]# systemctl enable iscsid.service --now
- Discover and connect to the iSCSI block device on the DS
[MDS]# typeset -x DS_IPADDR=ip_address_of_the_DS
[MDS]# dsiqn=`iscsiadm -m discovery -t st -p $DS_IPADDR | awk '{ print $2 }'`
[MDS]# iscsiadm -m node -T $dsiqn --login
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260] successful.
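A quick way to confirm the session is established is to list the active iSCSI sessions; the target name shown should match the IQN generated on the DS:
[MDS]# iscsiadm -m session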
- Find the device name of the newly connected iSCSI block device
[MDS]# lsblk -S |grep DS-share
sdh 9:0:0:0 disk LIO-ORG DS-share 4.0 8420af3e-5ab2-4001-86a8-171a0e6915ee iscsi
In this example, it is /dev/sdh.
- Create an XFS file system on the iSCSI block device /dev/sdh
Currently, XFS is the only file system with the support required for use with pNFS.
[MDS]# mkfs.xfs /dev/sdh
- Set up /etc/fstab to mount the new XFS file system and export the DS-share
Create a directory to mount the XFS file system:
[MDS]# mkdir -p /local/DS-share
Get the UUID of the iSCSI block device:
[MDS]# ls -l /dev/disk/by-uuid |grep sdh | awk '{ print $9 }'
7000dbe6-f8bb-4411-9cdc-9256387bd6c8
Add an entry in /etc/fstab using the device UUID:
UUID=7000dbe6-f8bb-4411-9cdc-9256387bd6c8 /local/DS-share xfs _netdev 0 0
NOTE: the ‘_netdev’ option tells the system that the filesystem requires network connectivity before it can be mounted.
Mount the XFS file system:
[MDS]# mount /local/DS-share
Create an entry for the DS-share with ‘pnfs’ option in /etc/exports:
/local/DS-share *(no_subtree_check,pnfs,no_root_squash,rw,sec=sys,insecure)
NOTE: The MDS is managing an XFS file system on the LUN, but is giving clients direct access to the block devices where the files reside via pNFS. This avoids the page cache on the NFS server and lowers its interrupt rate. File data goes from the LUN directly to the pNFS client.
- Start NFS service
[MDS]# systemctl enable nfs-server.service --now
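With nfs-server running, ‘exportfs -v’ should list /local/DS-share along with the options from /etc/exports, including ‘pnfs’. If you change /etc/exports later, re-export without restarting the service:
[MDS]# exportfs -v
[MDS]# exportfs -ra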
- Configure firewall on the MDS to allow NFS service
[MDS]# firewall-cmd --permanent --add-service=nfs
[MDS]# firewall-cmd --reload
Setting up the pNFS client
- Log in and connect to the iSCSI block device on the DS
As on the MDS, this requires the iscsi-initiator-utils packages to be installed and iscsid.service to be running on the client.
[PNFS-CLIENT]# typeset -x DS_IPADDR=ip_address_of_the_DS
[PNFS-CLIENT]# dsiqn=`iscsiadm -m discovery -t st -p $DS_IPADDR | awk '{ print $2 }'`
[PNFS-CLIENT]# iscsiadm -m node -T $dsiqn --login
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260] successful.
- Mount the DS-share exported by the MDS with version NFSv4.1 or newer:
[PNFS-CLIENT]# mkdir /tmp/mnt
[PNFS-CLIENT]# mount -o vers=4.1 MDS_IP_ADDR:/local/DS-share /tmp/mnt
NOTE: ‘nfsstat -m’ shows that the DS-share was mounted with ‘vers=4.1’; pNFS is an extension of the NFSv4.1 protocol.
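For example (look for vers=4.1 in the Flags line; the remaining options will vary with your client):
[PNFS-CLIENT]# nfsstat -m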
Verify the NFS client is using the SCSI layout type (the SCSI layout is a variant of the pNFS block layout, so this confirms block-based pNFS is in use):
[PNFS-CLIENT]# /sbin/mountstats | grep pnfs
nfsv4: bm0=0xfdffbfff,bm1=0x40fdbe3e,bm2=0x803,acl=0x3,sessions,pnfs=LAYOUT_SCSI,lease_time=90,lease_expired=0
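As a final sanity check, you can push some I/O through the mount and watch the layout operation counters climb. On a block-layout mount you would expect layoutget and layoutcommit to increase while the NFS write counter barely moves, because the file data travels directly between the client and the LUN rather than through the MDS. This is only a rough smoke test, not a benchmark, and it assumes your nfsstat supports list output (-l):
[PNFS-CLIENT]# dd if=/dev/zero of=/tmp/mnt/testfile bs=1M count=1024 oflag=direct
[PNFS-CLIENT]# nfsstat -c -l | grep -iE 'layoutget|layoutcommit|write'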