What’s new for NFS in Unbreakable Enterprise Kernel Release 7?

August 2, 2022 | 6 minute read

UEK7 is based on the upstream long-term stable Linux kernel v5.15, and introduces many new features compared to the previous release, UEK6, which was based on upstream stable version v5.4.

In this article, we look at what has been improved in the UEK R7 NFS client & server implementations.

NFS re-exporting

Support for re-exporting an NFS filesystem, i.e. exporting an NFS mount via NFS, is much improved, with a focus on increased performance.

A primary use for this is to deploy a Linux NFS server as a local caching server for a remote NFS server. The NFS filesystem being re-exported may itself be mounted from any NFS server; it does not need to come from a Linux NFS server, nor does it need to use the same NFS version. For example, it is possible to re-export an NFSv4.2 pNFS filesystem via NFSv3.
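As a sketch, a re-export involves an ordinary NFS mount plus an export entry for it. The hostnames and paths below are illustrative; note that a re-exported NFS mount requires an explicit fsid, since the NFS server cannot derive one from an underlying block device:

```shell
# On the caching server: mount the remote NFS filesystem to be re-exported
mount -t nfs -o vers=4.2 origin.example.com:/export /srv/reexport

# Add an export entry for the mounted NFS filesystem; the fsid option
# is required when re-exporting
echo '/srv/reexport *(rw,no_subtree_check,fsid=1000)' >> /etc/exports

# Apply the updated exports table
exportfs -ra
```

Clients can then mount the re-export from the caching server as usual, using any NFS version the caching server supports.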

Support for accessing user namespace extended attributes (xattrs) over NFSv4.2

Many Linux filesystems support extended attributes, or xattrs, which allow small amounts of named arbitrary metadata to be associated with a file, separate from the file content.

Extended attributes consist of a name and a value. The name must begin with one of four namespace prefixes, which form part of the xattr name: user, trusted, security, or system.

In UEK6 and earlier, xattrs on a file in a filesystem could only be set or queried on the computer where that filesystem resided; they could not be accessed via NFS. UEK7 adds support for interacting with user namespace xattrs via NFSv4. This support is part of the NFSv4.2 protocol, and so is only available with NFS filesystems mounted as NFSv4.2.

The xattrs may be set and queried with setfattr/getfattr. Only xattrs in the user namespace may be accessed via NFS.

# -n: specify the xattr name; must include the "user" namespace prefix
# -v: specify the xattr value
$ setfattr -n user.testname -v testvalue /mnt/xattr-test-file
$ getfattr -d /mnt/xattr-test-file
getfattr: Removing leading '/' from absolute path names
# file: mnt/xattr-test-file
user.testname="testvalue"

Eager NFS writes

Normally, an NFS client does not immediately send an application's write() data to the NFS server: the Linux kernel sends the corresponding NFS WRITE requests asynchronously, in the background. The write() system call's return status therefore cannot reflect any error that the NFS server may later return.

This provides better performance, but does have some drawbacks, for example an NFS client application may not see some errors immediately, e.g. ENOSPC.

A new writes= mount option has been added. When set as writes=eager, the kernel sends an NFS WRITE request immediately, but the system call return still does not wait for the server reply; the application will only see any error during a later system call, e.g. write(), close(). This is useful in the NFS re-exporting case (above), allowing the caching NFS server to immediately pass write requests to the remote NFS server.

When set as writes=wait, the WRITE request is sent immediately, and the kernel waits for the NFS server's reply before the application write() system call returns. This ensures the application sees any error immediately, at some cost to performance, although less than a fully synchronous write.

The default setting is writes=lazy, which keeps the existing behaviour.

Note that this option does not relate to when the NFS server commits the written data to stable storage. The NFS WRITE requests will still be unstable, regardless of the setting of the writes= mount option.
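For example, the three settings can be selected at mount time like so (the server name and export path here are illustrative):

```shell
# Send WRITE requests immediately, but don't wait for the server's reply
mount -t nfs -o writes=eager server.example.com:/export /mnt

# Send WRITE requests immediately and wait for the server's reply
# before the application's write() returns
mount -t nfs -o writes=wait server.example.com:/export /mnt

# Default behaviour: WRITEs are sent asynchronously in the background
mount -t nfs -o writes=lazy server.example.com:/export /mnt
```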

Allow cached access when NFS server is down

The Linux NFS client caches information it receives from an NFS server, including both attributes (e.g. permissions) and data. To increase consistency, the client will at various times send a GETATTR request to the NFS server, to ensure its locally cached information is still valid. If the NFS server goes down, those GETATTR cache revalidation requests will not receive a reply, and the client will not use the cached data.

There is now a new mount option, softreval: when set, if a server goes down and GETATTR cache revalidation requests go unanswered, the NFS client will continue to use the attributes and data it has locally cached for this filesystem. This only affects operations that can be handled entirely from the local cache; any operation that requires uncached information will continue to behave according to the existing hard, soft & softerr mount options, as before.
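For example (the server name and export path are illustrative; softreval is shown here combined with the existing soft option):

```shell
# Continue to serve locally cached attributes and data if the server
# stops answering GETATTR revalidation requests
mount -t nfs -o soft,softreval server.example.com:/export /mnt
```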

Connection sharing between multiple NFS server network interfaces

UEK6 added the nconnect mount option, for NFSv4.1 and later mounts over TCP, which enabled an NFS client to set up multiple TCP connections to the same NFS server, over a single NFS server TCP address. This improves total throughput, particularly with bonded networks.

UEK7 adds the ability to establish connections to multiple network interfaces on the same NFS server. A new max_connect mount option controls the number of RPC transports, each to a different server address, that the NFS client may associate with a single NFS server. The default max_connect value is 1, and up to 16 connections are allowed.

The NFSv4 client automatically identifies “trunking”, i.e. when separate NFS server TCP addresses belong to the same NFS server.
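A sketch of how the two options combine (the server name and values here are illustrative):

```shell
# Open 4 TCP connections to each NFS server address (nconnect), and
# allow up to 8 transports in total across the server's trunked
# addresses (max_connect)
mount -t nfs -o vers=4.1,nconnect=4,max_connect=8 \
    server.example.com:/export /mnt
```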

Support for fcntl(F_SETLEASE)

Applications may specify a lease on a file, in order to be notified, and given some time to take action, when another process attempts to open() or truncate() the file (although the attempt will always be allowed, after a short period).

The use of fcntl(F_SETLEASE) is now supported over NFSv4, in the case where the NFSv4 client already has a delegation on the file in question.



Performance improvements

  • Work has continued to further improve the performance of NFS & RPC over RDMA transports.

NFS clients

  • UEK6 introduced the nconnect mount option, which enables an NFS client to set up multiple TCP connections to the same NFS server, improving total throughput. This was only available for NFSv4.1 and above. UEK7 adds support for this mount option to NFSv4.0 mounts.

Various readdir enhancements…

  • Add support for 1MB READDIR RPC calls, caching the entire contents of that 1MB call. For NFS server filesystems that use ordered readdir cookie schemes (e.g. XFS), it optimises searching for cookies in the client’s page cache.

  • Improve scalability when dealing with very large directories by turning off caching when those directories are changing.

  • Improved concurrency for readdir, e.g. within the same directory.

  • When an application uses statx(AT_STATX_DONT_SYNC), the NFS client doesn’t send a GETATTR to the NFS server, if it can use locally cached attributes. This can adversely affect the heuristics used to decide whether to switch between READDIR and READDIRPLUS calls. In this case, we now identify use of statx() in this way so as to allow continued use of READDIRPLUS.

NFS servers

  • We now grant a READ delegation to an NFSv4 client that has opened a file for writing, as long as it is the only client that has that file open for writing.


  • New sysfs interfaces: to display more information about the various RPC transport connections used by an NFS client, to allow a user to offline an RPC transport that may no longer point to a valid server, and to allow a user to change the server IP address used by the NFS client. A forthcoming article will give more detail on how, and when, to use these new (and those already existing) interfaces.

  • The source address and destination port have been added to the sunrpc sysfs info files.

  • NFS server statistics are now available per-export, as well as globally:

$ cat /proc/fs/nfsd/export_stats
# Version 1.1
# Path Client Start-time
# Stats
/test    localhost    92
  fh_stale: 0
  io_read: 9
  io_write: 1

  • Continuing to improve debugging and diagnosability, a large number of ftrace events have been added. Work continues on having a subset of these events optionally enabled during normal production, to aid fixing problems on first occurrence without adversely impacting performance.


NFS clients

  • NFSv3 support has been added for the timeo= and retrans= mount options, to control RPC transport behaviour; these were previously only used for NFSv4 mounts.

  • Allow certain RPC calls, such as the NULL ping, to time out. Normally, RPC calls for NFSv4 will wait forever, as long as the underlying TCP connection remains established. This can cause problems with a server that is not responding at the RPC level (but is still responding with TCP ACKs). This is now relaxed: calls like the RPC NULL ping may time out, without blocking other client actions.
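The timeo= and retrans= options mentioned above can be set at mount time; as a sketch (the server name and values are illustrative, and timeo is expressed in tenths of a second):

```shell
# NFSv3 mount with a 15-second RPC timeout and 3 retransmissions
# before a major timeout is reported
mount -t nfs -o vers=3,timeo=150,retrans=3 server.example.com:/export /mnt
```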


In this article we’ve looked at the changes and new features relating to NFS & RPC, for both clients and servers, which are now available in the latest Unbreakable Enterprise Kernel Release 7.

Calum Mackay
