Oracle Linux Kernel developer Lance Hartmann contributes this blog post on using SPDK, the Storage Performance Development Kit.
Slated to arrive soon in the developer yum channel via ULN, the Storage Performance Development Kit (SPDK) is an open-source project providing user space tools and libraries for writing high-performance, scalable storage applications built largely, but not solely, around a user space NVMe driver. Harnessing the power of multi-core CPUs and the multi-queue architecture of NVMe, SPDK applications can easily achieve the maximum bandwidth that an NVMe drive supports and enjoy low latency by polling for I/O completions instead of using interrupts.
Zero-copy I/O is managed by allocating the data buffers and the I/O queues from hugepages, whose physical pages are always pinned. A single thread per NVMe queue, which both dispatches I/Os and checks for completions, enables a lockless I/O path.
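The single-threaded submit-and-poll pattern can be illustrated with a toy model. This is only a sketch of the idea, not SPDK code: `FakeQueuePair`, `submit`, `process_completions` and `run_io` are hypothetical names invented for this example, and the "device" completes commands instantly in submission order.

```python
from collections import deque


class FakeQueuePair:
    """Toy stand-in for one NVMe submission/completion queue pair (hypothetical)."""

    def __init__(self, depth):
        self.depth = depth          # maximum number of commands in flight
        self.inflight = deque()     # commands the pretend device is working on

    def submit(self, lba, callback):
        """Queue one command; return False if the queue is already full."""
        if len(self.inflight) >= self.depth:
            return False
        self.inflight.append((lba, callback))
        return True

    def process_completions(self, max_completions):
        """Poll: run the callback of up to max_completions finished commands."""
        done = 0
        while self.inflight and done < max_completions:
            lba, callback = self.inflight.popleft()
            callback(lba)
            done += 1
        return done


def run_io(qpair, lbas):
    """One thread owns the queue pair: it submits and polls, so no lock is needed."""
    completed = []
    pending = deque(lbas)
    while pending or qpair.inflight:
        # Dispatch as many commands as the queue depth allows.
        while pending and qpair.submit(pending[0], completed.append):
            pending.popleft()
        # Check for completions by polling instead of waiting for an interrupt.
        qpair.process_completions(max_completions=4)
    return completed
```

Because only the owning thread ever touches the queue pair, neither `submit` nor `process_completions` needs any synchronization, which is the property the lockless I/O path relies on.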
For those NVMe controllers designated for use by the SPDK, the default Linux kernel nvme driver is unbound from them and replaced with a binding to either the uio_pci_generic or the vfio-pci kernel driver. Using the SPDK API, applications then gain access, via mmap(), to the NVMe controller's register set, enabling them to perform admin actions and trigger I/Os.
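For readers curious about what the rebinding involves, the following is a minimal sketch using the standard Linux sysfs PCI files (`driver/unbind`, `driver_override` and `drivers_probe`). The function name `rebind_pci_device` and the `sysfs_root` parameter are assumptions made for this example, the latter so the function can be pointed at a scratch directory instead of the real `/sys`. It is not how the SPDK itself performs the step, and running it against the real `/sys` requires root and takes the device away from the kernel.

```python
from pathlib import Path


def rebind_pci_device(bdf, new_driver, sysfs_root="/sys"):
    """Detach a PCI device from its current driver and bind it to new_driver.

    bdf is the PCI address, e.g. "0000:01:00.0" (a made-up address).
    Returns the list of (path, value) writes that were performed.
    """
    root = Path(sysfs_root)
    device = root / "bus" / "pci" / "devices" / bdf
    writes = []

    # 1. Unbind from whatever driver currently owns the device (e.g. nvme).
    unbind = device / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(bdf)
        writes.append((unbind, bdf))

    # 2. Tell the PCI core which driver may claim this device next.
    override = device / "driver_override"
    override.write_text(new_driver)
    writes.append((override, new_driver))

    # 3. Ask the kernel to re-probe the device so the override takes effect.
    probe = root / "bus" / "pci" / "drivers_probe"
    probe.write_text(bdf)
    writes.append((probe, bdf))

    return writes
```

A call such as `rebind_pci_device("0000:01:00.0", "vfio-pci")` would hand that (hypothetical) controller to vfio-pci, provided the vfio-pci module is already loaded.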
In addition to providing I/O to locally (PCIe) attached NVMe drives, the SPDK also ships with an NVMe over Fabrics target application. The RDMA transport for InfiniBand and RoCE has been supported in the SPDK for a while, and the TCP transport was just recently added. A set of patches for supporting the Fibre Channel (FC) transport has been proposed. The target may be configured either statically with a configuration file or dynamically via RPC calls issued by SPDK Python scripts.
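The SPDK's RPC interface speaks JSON-RPC 2.0, so a dynamic configuration call boils down to a small JSON document sent to the running target. The sketch below only builds such a request; `build_request` is a helper invented for this example, and the method name and parameters in the usage line are placeholders rather than real SPDK RPC names, which vary between releases.

```python
import itertools
import json

# Each JSON-RPC request carries an id so the reply can be matched to it.
_request_ids = itertools.count(1)


def build_request(method, params=None):
    """Serialize one JSON-RPC 2.0 request for the given method (placeholder names)."""
    request = {"jsonrpc": "2.0", "method": method, "id": next(_request_ids)}
    if params:
        request["params"] = params
    return json.dumps(request)


# Usage with a placeholder method name, not an actual SPDK RPC:
payload = build_request("example_method", {"name": "example0"})
```

In practice the Python scripts shipped with the SPDK take care of building, sending and decoding these messages, so hand-rolling requests like this is mainly useful for understanding what goes over the wire.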
The growth and active development of the SPDK has yielded additional functionality. A user space block layer also exists and provides a highly modular architecture enabling the development of "bdevs", which may be used alone or stacked atop one another to build complex I/O pipelines. Existing bdev modules today include NVMe, RAM disk, Linux AIO, RAID (level-0/striping), iSCSI and more, all of which may be configured as targets for the SPDK's target applications. Common features of the block layer include mechanisms for enumerating SPDK block devices and exposing their supported I/O operations, queueing I/Os when the underlying device's queue is full, support for hotplug remove notification, obtaining I/O statistics which may be used for quality-of-service (QoS) throttling, timeout and reset handling, and more.
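To make the stacking idea concrete, here is a toy model in which a striping device sits on top of two RAM disks and exposes the same read/write interface as its members. The classes `RamDisk` and `Stripe` are hypothetical and bear no relation to the real bdev API; the stripe unit is assumed to be a single block for simplicity.

```python
class RamDisk:
    """Toy block device backed by memory (hypothetical, not the SPDK bdev API)."""

    def __init__(self, num_blocks, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks

    def write(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[lba] = bytes(data)

    def read(self, lba):
        return self.blocks[lba]


class Stripe:
    """RAID-0 style device stacked on other devices, one block per stripe unit."""

    def __init__(self, members):
        self.members = members
        self.block_size = members[0].block_size
        # Usable capacity is limited by the smallest member.
        self.num_blocks = min(m.num_blocks for m in members) * len(members)

    def _locate(self, lba):
        """Map a logical block to (member device, block on that member)."""
        if not 0 <= lba < self.num_blocks:
            raise IndexError("lba out of range")
        return self.members[lba % len(self.members)], lba // len(self.members)

    def write(self, lba, data):
        member, offset = self._locate(lba)
        member.write(offset, data)

    def read(self, lba):
        member, offset = self._locate(lba)
        return member.read(offset)
```

Since `Stripe` offers the same interface as `RamDisk`, a stripe can itself be a member of another stacked device, which is the property that lets modules be layered into longer pipelines.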
Traditionally, the SPDK has relied on portions of the Data Plane Development Kit (DPDK) to provide lower level functionality, which is referred to as the run time environment. This includes things like thread and co-process management, memory management, virtual to physical address translation, lockless data structures like rings, and PCI enumeration and mmap()'d I/O. Over time it was realized that a number of consumers of the SPDK already had much of this functionality in place and, moreover, that their implementations were highly tailored to their types of workloads. Hence, an abstraction layer was created in the SPDK enabling consumers to employ their own run time environment if preferred over the default DPDK.
The SPDK is currently in use in a number of production environments around the world, though to date it has yet to be made available as packages. Instead, consumers of the SPDK have been downloading the source and building the SPDK from scratch to enable integration with their applications. Coming soon, the SPDK rpm packages "spdk", "spdk-tools" and "spdk-devel" will make their debut for Oracle Linux. The aim is to offer users the ability to experiment with some example SPDK applications and to provide the include files and libraries needed to build their own SPDK applications, saving them the need to locate, download and build the SPDK themselves. Both static libraries and their shared equivalents are available, though note that ABI versioning is not yet in place but is planned for a future release.