We hear more and more about data-hungry workloads and the ravenous appetite of AI training. A number of storage solutions can help with these workloads, and we’ve seen rising interest in the Lustre file system, GPFS, and other mainstream cluster file systems. However, don’t overlook NFS: pNFS offers a native Linux kernel mechanism for achieving very high data throughput.

Unlike traditional NFS, which fetches data from a single storage server, pNFS enables clients to read and write data across multiple storage servers simultaneously. This parallel access to storage servers dramatically boosts data throughput, particularly when paired with modern, high-speed networks.

This blog introduces the fundamentals of pNFS and provides a step-by-step guide to setting up a pNFS environment with the block layout type. By following these steps, you’ll establish a functional test environment where you can explore the pNFS components, gain hands-on experience, and experiment with its parallel file system capabilities.

How Does pNFS Work?

pNFS achieves its impressive performance by decoupling the roles of the Metadata Server (MDS) and the Data Servers (DSes). The MDS acts as the “conductor” of the orchestra, managing metadata and coordinating the flow of information, while the DSes are the instruments: the actual data is transferred directly between the DSes and the clients. This division of labor ensures efficient, high-speed data access.

Though pNFS has been around since 2010, as introduced in RFC 5661, its adoption has accelerated in recent years. This is largely driven by the growing demand for highly parallel data access in AI and HPC applications.

Beyond NFS: Flexibility at Its Core

One of the most surprising and powerful aspects of pNFS is its flexibility. The data servers don’t even need to use NFS as their backend protocol. Instead, pNFS extends the NFS protocol to allow clients to retrieve a map of where file data resides and the protocols used to access it. This map makes it possible to support backend protocols such as SCSI, iSCSI, or NVMe, in addition to NFS.
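
As a quick optional check, you can see which pNFS layout drivers ship with your client kernel by listing the NFS layout modules. The command below is a sketch based on our OL9 UEK7 test kernel, where the block, file, and flexfiles layout drivers are all present; module names and paths may differ on other kernels, and the SCSI layout used later in this blog is handled by the block layout driver (blocklayoutdriver).

    [pNFS_client]# find /lib/modules/$(uname -r) \( -name 'blocklayoutdriver*' -o -name 'nfs_layout_*' \)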

pNFS Compatibility Matrix:

    I/O protocol         pNFS Layout Type   In-Kernel MDS           Vendor MDS
    iSCSI, SCSI, NVMe    Block              Yes, with Linux nfsd    Linux kernel NFS server
    NFSv3                Flexfiles          No                      Hammerspace NFS server
    NFSv4                File               No                      NetApp NFS server

As the Linux kernel NFS server supports only the pNFS block layout, we’ll focus on that configuration in this blog.
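
If you want to confirm that your server kernel can act as a block-layout MDS, one way (assuming your distribution installs the kernel config under /boot, as Oracle Linux does) is to check the nfsd layout options; you would expect to see CONFIG_NFSD_SCSILAYOUT=y, and typically CONFIG_NFSD_BLOCKLAYOUT=y as well:

    [MDS]# grep -E 'NFSD_(BLOCK|SCSI)LAYOUT' /boot/config-$(uname -r)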

Setting Up pNFS with an iSCSI block device in OCI

In this section, we explain how to set up a pNFS environment in OCI using an iSCSI block device, with VM instances running OL9 UEK7 (kernel 5.15.0-306.177.4.el9uek.x86_64).

Because OCI Block Storage offers iSCSI endpoints, we don’t have to set up a separate iSCSI target to enable this. If you have an iSCSI target available, these instructions will also apply.

  1. Create VM instances for the MDS, DS and pNFS client

    For instructions on how to create VM instances in OCI, please refer to this guide: https://docs.oracle.com/en-us/iaas/Content/Compute/tutorials/first-linux-instance/overview.htm.

  2. Provision OCI SCSI block device for the DS

    The DS must have a SCSI block device attached with the iSCSI volume attachment type, which is required for the MDS and the pNFS client to access this block device over the network. (The attach commands provided by the OCI Console are sketched after this list.)

    For instructions on how to provision an OCI SCSI block device, please refer to this guide: https://docs.oracle.com/en-us/iaas/Content/Block/Tasks/creatingavolume.htm.

    For instructions on how to attach the SCSI block device with iSCSI as the volume attachment type to the VM instance, please refer to this guide: https://docs.oracle.com/en-us/iaas/Content/Block/Tasks/attachingavolume.htm#top.
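
When you attach the volume with the iSCSI attachment type, the OCI Console displays the exact iscsiadm commands to run on the DS instance, with your volume’s IQN and portal IP filled in. They generally follow the pattern below, where <volume_IQN> and <volume_IP> are placeholders; always use the commands shown in the Console for your attachment:

    [DS]# iscsiadm -m node -o new -T <volume_IQN> -p <volume_IP>:3260
    [DS]# iscsiadm -m node -o update -T <volume_IQN> -n node.startup -v automatic
    [DS]# iscsiadm -m node -T <volume_IQN> -p <volume_IP>:3260 -l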

Setting up the DS

  1. Install ‘targetcli’ if it’s not already installed on the DS.

    [DS]# yum install targetcli
  2. Identify the iSCSI block device used for the pNFS share

    For this blog post, we assume the iSCSI storage device name is /dev/sdb for simplicity. In a real usage scenario, prefer the block device UUID, since SCSI device names are not stable across reboots. (One way to confirm which attached device is the iSCSI volume is shown in the sanity checks at the end of this section.)

  3. Collect the iSCSI initiator IQN (iSCSI Qualified Name) of the nodes that will access this iSCSI block device.

    In this case, they are the IQN’s of the MDS and the pNFS client. On the MDS:

    [MDS]# cat /etc/iscsi/initiatorname.iscsi
    InitiatorName=iqn.1988-12.com.oracle:b658c4fb9f0

    On the pNFS client:

    [pNFS_client]# cat /etc/iscsi/initiatorname.iscsi
    InitiatorName=iqn.1988-12.com.oracle:1e974222bac

    On the DS, save the IQNs of the MDS and the pNFS client in the environment variables MDS_IQN and PNFS_CLIENT_IQN:

    [DS]# typeset -x MDS_IQN=iqn.1988-12.com.oracle:b658c4fb9f0
    [DS]# typeset -x PNFS_CLIENT_IQN=iqn.1988-12.com.oracle:1e974222bac
  4. Use ‘targetcli’ to configure the iSCSI block device to allow access from the MDS and the pNFS client.

    Create a block device named ‘DS-share’ using /dev/sdb

    [DS]# targetcli /backstores/block create DS-share /dev/sdb

    Create an iSCSI target for the block device; the command prints an auto-generated IQN, which we capture in the shell variable ‘iqn’

    [DS]# iqn=`targetcli /iscsi create |grep iqn | awk '{ print $3 }' |sed 's/\.$//'`

    Configure the ACL to allow the MDS and pNFS client to access this iSCSI block device

    [DS]# targetcli /iscsi/$iqn/tpg1/acls create $MDS_IQN
    [DS]# targetcli /iscsi/$iqn/tpg1/acls create $PNFS_CLIENT_IQN

    Create an iSCSI LUN for the iSCSI block device ‘DS-share’

    [DS]# targetcli /iscsi/$iqn/tpg1/luns create /backstores/block/DS-share

    Save the iSCSI configuration in the default location /etc/target/saveconfig.json

    [DS]# targetcli saveconfig
  5. Start target.service on the DS

    This service is needed for the MDS and the pNFS client to discover and connect to the iSCSI block device ‘DS-share’.

    [DS]# systemctl enable target.service --now
  6. Configure the firewall on the DS to allow iSCSI traffic on port 3260

    [DS]# firewall-cmd --permanent --add-port=3260/tcp
    [DS]# firewall-cmd --reload
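
At this point the DS configuration is complete. The following optional sanity checks are one way to confirm everything is in place; the exact output will differ on your system. ‘lsblk -o NAME,SIZE,TRAN’ helps confirm which attached device is the iSCSI volume backing DS-share, ‘targetcli ls’ shows the backstore, target, ACLs, and LUN created above, and ‘ss’ confirms the target is listening on port 3260.

    [DS]# lsblk -o NAME,SIZE,TRAN
    [DS]# targetcli ls
    [DS]# ss -ltn | grep 3260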

Setting up the MDS

  1. Install the required iSCSI packages iscsi-initiator-utils and iscsi-initiator-utils-iscsiuio

    [MDS]# yum install iscsi\* -y
  2. Start up iscsid.service

    This is the Open-iSCSI daemon that implements the control path of the iSCSI protocol, plus some management facilities. See iscsid(8) for more details.

    [MDS]# systemctl enable iscsid.service --now
  3. Discover and connect to the iSCSI block device on the DS

    [MDS]# typeset -x DS_IPADDR=ip_address_of_the_DS
    [MDS]# dsiqn=`iscsiadm -m discovery -t st -p $DS_IPADDR | awk '{ print $2 }'`
    [MDS]# iscsiadm -m node -T $dsiqn --login
    Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260]
    Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260] successful.
  4. Find the device name of the newly connected iSCSI block device

    [MDS]# lsblk -S |grep DS-share
    sdh  9:0:0:0    disk LIO-ORG  DS-share 4.0  8420af3e-5ab2-4001-86a8-171a0e6915ee iscsi

    In this example, it is /dev/sdh.

  5. Create an XFS file system on the iSCSI block device /dev/sdh

    Currently, XFS is the only file system with the support required for pNFS block layout exports.

    [MDS]# mkfs.xfs /dev/sdh
  6. Set up /etc/fstab to mount the new XFS file system and export the DS-share

    Create a directory to mount the XFS file system:

    [MDS]# mkdir -p /local/DS-share

    Get the UUID of the iSCSI block device:

    [MDS]# ls -l /dev/disk/by-uuid |grep sdh | awk '{ print $9 }'
    7000dbe6-f8bb-4411-9cdc-9256387bd6c8

    Add an entry in /etc/fstab using the device UUID:

    UUID=7000dbe6-f8bb-4411-9cdc-9256387bd6c8 /local/DS-share xfs _netdev  0  0

    NOTE: the ‘_netdev’ option tells the system that the filesystem requires network connectivity before it can be mounted.

    Mount the XFS file system:

    [MDS]# mount /local/DS-share

    Create an entry for the DS-share with ‘pnfs’ option in /etc/exports:

    /local/DS-share *(no_subtree_check,pnfs,no_root_squash,rw,sec=sys,insecure)

    NOTE: The MDS is managing an XFS file system on the LUN, but is giving clients direct access to the block devices where the files reside via pNFS. This avoids the page cache on the NFS server and lowers its interrupt rate. File data goes from the LUN directly to the pNFS client.

  7. Start the NFS service (a quick way to verify the export is sketched after this list)

    [MDS]# systemctl enable nfs-server.service --now
  8. Configure the firewall on the MDS to allow the NFS service

    [MDS]# firewall-cmd --permanent --add-service=nfs
    [MDS]# firewall-cmd --reload
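
With the NFS service running, a quick sanity check on the MDS is sketched below: ‘exportfs -v’ should list /local/DS-share with the options from /etc/exports (depending on the nfs-utils version, the ‘pnfs’ option may be shown explicitly), and /proc/fs/nfsd/versions should show that NFS version 4.1 or later is enabled, since pNFS requires it.

    [MDS]# exportfs -v
    [MDS]# cat /proc/fs/nfsd/versions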

Setting up the pNFS client

  1. Discover and log in to the iSCSI block device on the DS

    [PNFS-CLIENT]# typeset -x DS_IPADDR=ip_address_of_the_DS
    [PNFS-CLIENT]# dsiqn=`iscsiadm -m discovery -t st -p $DS_IPADDR | awk '{ print $2 }'`
    [PNFS-CLIENT]# iscsiadm -m node -T $dsiqn --login
    Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260]
    Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.instance-20250410-1509.x8664:sn.1859518feeed, portal: xxx.xxx.xxx.xxx,3260] successful.
    [PNFS-CLIENT]#
  2. Mount the DS-share exported by the MDS with version NFSv4.1 or newer:

    [PNFS-CLIENT]# mkdir /tmp/mnt
    [PNFS-CLIENT]# mount -o vers=4.1 MDS_IP_ADDR:/local/DS-share /tmp/mnt

    NOTE: ‘nfsstat -m’ shows that the DS-share was mounted with ‘vers=4.1’, since pNFS is an extension of the NFSv4.1 protocol.

    Verify that the NFS client is using the SCSI layout type (a simple end-to-end check is sketched at the end of this section):

    [PNFS-CLIENT]# /sbin/mountstats | grep pnfs
          nfsv4: bm0=0xfdffbfff,bm1=0x40fdbe3e,bm2=0x803,acl=0x3,sessions,pnfs=LAYOUT_SCSI,lease_time=90,lease_expired=0
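
As a final, optional end-to-end check, the sketch below writes a test file through the pNFS mount with direct I/O and then looks at the per-operation counters reported by mountstats; after the write, the LAYOUTGET (and related layout) counters should be non-zero, indicating that the client obtained layouts and performed the I/O through pNFS. The test file name is arbitrary, and the mountstats output format may vary slightly between nfs-utils versions.

    [PNFS-CLIENT]# dd if=/dev/zero of=/tmp/mnt/pnfs-test bs=1M count=256 oflag=direct
    [PNFS-CLIENT]# /sbin/mountstats /tmp/mnt | grep -i -A 1 LAYOUT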