The introduction of Persistent Memory (PMEM) marks the beginning of a revolution in the computing industry. There has always been a separation between system memory, where the contents are ephemeral and byte addressable, and storage, where the data is persistent and block oriented. Persistent Memory (such as Intel Optane DC Persistent Memory) blurs the line between storage and memory by being both byte addressable and persistent.
This new class of Non-Volatile Memory is fast enough to operate alongside conventional (volatile) DRAM in a DIMM (Dual In-Line Memory Module) form factor. Integrating into systems in DIMM slots means that Persistent Memory is able to play a vastly different role than conventional block-oriented storage such as Disk or SSD.
This article is the second in a series covering Persistent Memory and how Oracle is using this technology to revolutionize database systems. Exadata is the first system on the market that is specifically designed to take advantage of Persistent Memory and accelerate the performance of Oracle Databases, while providing the full redundancy and data protection that users require for their mission-critical systems.
If you haven't read our Persistent Memory Primer article, be sure to learn some of the fundamentals outlined in that article here: https://blogs.oracle.com/database/persistent-memory-primer
There are 3 major changes in Exadata X8M that are driving the latest performance advancements of the platform. These changes are highlighted in RED below.
First, the internal fabric has been changed from 40Gbps (Gigabits per second) InfiniBand to a 100Gbps RDMA over Converged Ethernet (RoCE) fabric. The RDMA (Remote Direct Memory Access) capabilities of this network fabric are critical for getting the most out of Persistent Memory. Second, Persistent Memory in Exadata resides in the storage servers, which means it is fully redundant and the amount of Persistent Memory grows as the system scales. Third, the Linux KVM (Kernel-based Virtual Machine) hypervisor inside Exadata reduces the overhead of virtualization and allows the system to be configured with the largest supported memory configuration of 1.5TB per database server. Taken in combination, these new features keep Exadata far ahead of the competition for database performance and price/performance. For more information on Exadata X8M, please see here: https://www.oracle.com/engineered-systems/exadata/.
The latest performance advances of Exadata rely on the combination of RoCE with Persistent Memory. While each of these technologies provides benefits alone, the combination of them was required to make the next leap in performance of the Exadata platform.
The performance of Persistent Memory is measured in microseconds and even nanoseconds, which means the other portions of the I/O stack account for a much larger share of the total response time. RDMA across the Converged Ethernet fabric allows Exadata to make the most of the performance of Persistent Memory. The best way to illustrate this is to look at what would happen if Persistent Memory were introduced into Exadata without RoCE.
Persistent Memory should deliver internal response times in the range of 6 µsec (microseconds) for processing 8K blocks of data. This response time is getting into the range where time spent in the existing I/O path is becoming a significant slice of the overall time. While Exadata has, for many years, been delivering faster I/O response times than other storage solutions, this presents an opportunity for Oracle's development team to make some big performance improvements.
We would normally expect a read latency of about 6 microseconds (6,000 nanoseconds) when accessing Persistent Memory from within the Exadata Storage Server. However, this low latency would be overshadowed by the layers of software, context switches, and network protocol overhead. In the diagram below, we see an older version of Oracle Database (prior to 19c) running on Exadata X8M. The database issues I/O requests to the Exadata storage as normal, but data is cached in Persistent Memory:
The Exadata Storage Software will cache hot blocks regardless of the database version being used. Older database versions (prior to 19c) will use the conventional Exadata I/O path. However, Oracle Database 19c (and later versions) will use RDMA to access data directly in Persistent Memory rather than using the conventional I/O path. This level of integration between the database and storage is only possible due to the tight software/hardware integration of Exadata. This new feature is known as the Exadata X8M Persistent Memory Data Accelerator.
Oracle Exadata X8M uses Persistent Memory (internally) to achieve dramatically higher I/O rates as well as industry-leading low-latency storage access. The Oracle Exadata Storage Software is fully integrated with Persistent Memory and addresses the I/O path overheads discussed above, allowing Exadata X8M to take full advantage of the performance of Persistent Memory. Exadata X8M is able to achieve less than 19 µsec (microseconds) storage latency and 16 million IOPS (Input/Output Operations Per Second) within a single rack. Customers do not need to configure, tune or even choose Persistent Memory in Exadata X8M. Persistent Memory is automatically included in all Exadata systems and there is nothing to configure or administer.
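To put the quoted figures in perspective, the short Python sketch below does two pieces of arithmetic using only numbers from this article: the share of a 19 µsec end-to-end read that is spent in the 6 µsec Persistent Memory access itself, and the number of I/Os that must be in flight to sustain 16 million IOPS at that latency (Little's Law). It assumes, purely for illustration, that the peak latency and peak IOPS figures apply at the same time, which the article does not claim. The function and constant names are made up for this example.

```python
# Figures quoted in this article.
PMEM_INTERNAL_LATENCY_US = 6.0   # internal 8K read from Persistent Memory
END_TO_END_LATENCY_US = 19.0     # "less than 19 µsec" storage latency, used as an upper bound
RACK_IOPS = 16_000_000           # IOPS within a single rack


def media_share(media_us: float, total_us: float) -> float:
    """Fraction of the end-to-end latency spent in the memory media itself."""
    return media_us / total_us


def outstanding_ios(iops: float, latency_us: float) -> float:
    """Little's Law: average I/Os in flight = arrival rate x time in system."""
    return iops * latency_us / 1_000_000


share = media_share(PMEM_INTERNAL_LATENCY_US, END_TO_END_LATENCY_US)
non_media_us = END_TO_END_LATENCY_US - PMEM_INTERNAL_LATENCY_US
in_flight = outstanding_ios(RACK_IOPS, END_TO_END_LATENCY_US)

print(f"Media share of end-to-end latency: {share:.1%}")      # 31.6%
print(f"Budget left for network and software: {non_media_us} µsec")
print(f"I/Os in flight at 16M IOPS and 19 µsec: {in_flight:.0f}")  # 304
```

Even with RDMA, roughly two thirds of the 19 µsec budget lies outside the memory access, which is why the rest of the I/O path matters so much at these speeds.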
The RoCE network inside Exadata X8M enables RDMA access, over a Converged Ethernet fabric, to data residing in Persistent Memory. This combination of technologies allows Exadata to achieve near-memory speeds with storage that is fully redundant and fully protected from failures.
Persistent Memory in Exadata also accelerates commit processing in Oracle Databases. Commit processing in any database represents a performance bottleneck, especially for OLTP systems. If commits are slow, all users and transactions across the database are slowed down. Persistent Memory in combination with RDMA is used to accelerate log writes in Exadata X8M, which in turn improves the performance of commits.
Exadata X8M delivers up to 8X faster log writes than the previous generation of Exadata, which was already the fastest on the market. We can see the dramatic effects of PMEM in action by simply looking at the database performance metrics.
The combination of RoCE and PMEM results in dramatically faster response times and much higher IOPS than other storage solutions. In this first release of PMEM on Exadata, the performance work is focused on the area of biggest benefit: the database operations that generate the most IOPS. Single block reads are the highest-IOPS event in any Oracle database. We see this as "cell single block physical read" in Exadata, which equates to "db file sequential read" on non-Exadata systems. The following AWR (Automatic Workload Repository) report screenshot shows this in action:
The Exadata Storage Software will cache the most frequently read blocks of data into Persistent Memory. Those blocks will be accessed by pre-19c databases via the pre-existing Exadata I/O path, whereas 19c databases and later will use Remote Direct Memory Access (RDMA) to access those blocks. Persistent Memory is currently used as a write-through cache, so data is always persisted to Flash and Disk.
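The write-through behavior described above can be sketched in a few lines of Python. This is a toy model, not the Exadata Storage Software: a dictionary stands in for Flash and Disk, an `OrderedDict` stands in for Persistent Memory, and the least-recently-used eviction policy is an assumption made only to keep the example small. The class name `WriteThroughCache` is hypothetical.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy write-through cache: every write is persisted to the backing store."""

    def __init__(self, backing_store: dict, capacity: int):
        self.backing = backing_store   # stands in for Flash/Disk
        self.capacity = capacity       # maximum number of cached blocks
        self.cache = OrderedDict()     # stands in for PMEM, kept in LRU order

    def write(self, block_id, data):
        # Write-through: the backing store is always updated.
        self.backing[block_id] = data
        # Keep an already-cached copy coherent with the backing store.
        if block_id in self.cache:
            self.cache[block_id] = data
            self.cache.move_to_end(block_id)

    def read(self, block_id):
        if block_id in self.cache:              # cache hit
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.backing[block_id]           # cache miss: go to backing store
        self.cache[block_id] = data             # admit the block on read
        while len(self.cache) > self.capacity:  # evict least recently used
            self.cache.popitem(last=False)
        return data


store = {}
pmem = WriteThroughCache(store, capacity=2)
pmem.write("block-1", b"v1")
first_read = pmem.read("block-1")    # miss, served from the backing store
pmem.write("block-1", b"v2")         # updates both the store and the cached copy
second_read = pmem.read("block-1")   # hit, served from the cache
```

Because every write reaches the backing store, losing or evicting a cached block never loses data; the cache only ever holds copies.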
Exadata also uses Persistent Memory to accelerate commit processing, which is one of the most performance sensitive operations of any database, regardless of the workload. The Oracle Database has to externalize commit records into redo logs and ensure those records are persisted to storage before returning control to the application, so speeding up commit processing delivers performance increases to the entire database. There are 2 primary Oracle Database performance metrics related to commit processing as follows:
These events can be viewed in AWR reports to see the benefits of RoCE and PMEM in Exadata X8M, which delivers up to 8X faster commit processing than the previous X8 release.
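When reading wait events such as "cell single block physical read" in an AWR report, it helps to know how the average wait is derived: the cumulative wait count and cumulative wait time are sampled at two snapshots, and the difference in time is divided by the difference in count. The Python sketch below shows that calculation. The field names mirror the `TOTAL_WAITS` and `TIME_WAITED_MICRO` columns of `V$SYSTEM_EVENT`; the `EventSnapshot` type, the `avg_wait_us` helper, and the sample numbers are placeholders invented for this example, not measurements from any system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventSnapshot:
    """Cumulative counters for one wait event at one point in time."""
    total_waits: int         # cumulative number of waits
    time_waited_micro: int   # cumulative time waited, in microseconds


def avg_wait_us(begin: EventSnapshot, end: EventSnapshot) -> Optional[float]:
    """Average wait in microseconds over the interval, or None if no waits occurred."""
    waits = end.total_waits - begin.total_waits
    if waits <= 0:
        return None
    return (end.time_waited_micro - begin.time_waited_micro) / waits


# Placeholder values chosen only to make the arithmetic easy to follow.
begin = EventSnapshot(total_waits=1_000_000, time_waited_micro=250_000_000)
end = EventSnapshot(total_waits=3_000_000, time_waited_micro=288_000_000)

average = avg_wait_us(begin, end)   # 38,000,000 µsec over 2,000,000 waits
print(f"Average wait over the interval: {average} µsec")
```

The same calculation applies to any wait event, so it can be used to compare a commit-related event before and after a platform change.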
It is important to note that Exadata does not suffer from the data integrity issues outlined in MOS Note# 2608116.1. The Exadata Storage Software is designed to address the data integrity challenges presented by the behavior of Persistent Memory. It uses Persistent Memory in AppDirect mode with devdax (Device Direct Access) and directly manages how data is written into Persistent Memory to ensure data integrity. The Exadata Storage Software also mirrors all data writes across storage cells to protect against data loss in the event of a failure. Oracle recommends triple mirroring, known as High Redundancy in Exadata, to provide the best protection and to maintain redundancy even during maintenance.
Exadata combines an RDMA over Converged Ethernet (RoCE) network with Persistent Memory in the Exadata storage layer to provide unprecedented performance with the data integrity and availability that customers have come to expect from the Exadata platform.