Oracle Exadata Database Machine (Exadata) is the engineered platform for running Oracle Database. Published articles on extreme scalability and dramatic price/performance and on the Exadata X10M platform are excellent backgrounders on Exadata's scalability and performance. The significant advancements in software capability delivered with the recently released Exadata System Software 24ai underscore Oracle's investment in optimizations and new features that benefit all workloads. The intelligent data architecture of Exadata Exascale combines the extreme performance of Exadata smart software with the cost and elasticity benefits of modern clouds. Exascale's cloud data architecture is also available in Exadata on-premises, extending the competitive advantages of the platform to all its deployment models. The converged Oracle Database 23ai running on the Exadata platform brings in all the latest database features, including the native AI Vector Search capability.
Exadata is also engineered to be the most cost-effective and most highly available platform for running Oracle Databases. Because it can consolidate diverse database workloads, and because it is available in enterprise data centers, Oracle Cloud Infrastructure (OCI), and multi-cloud environments, Exadata helps entire organizations benefit from database and application performance with a limited and sustainable data center footprint. Coupled with increased operational efficiency, reduced administration, and lower cost, this makes the platform a flagship product in its class.
Exadata is composed of high-performance database servers and scale-out intelligent storage servers. The internal interconnect fabric used for data flow is hardware-enabled Remote Direct Memory Access (RDMA) carried over Converged Ethernet. RDMA over Converged Ethernet is abbreviated as RoCE and pronounced "rocky". Exadata storage servers implement leading-edge storage caching technologies for very low-latency data access. The functional path of database I/O in Exadata combines the scalability and bandwidth of Ethernet, the speed of RDMA, and the availability of data in cache to deliver latency and IOPS metrics that are multiples better than competing platforms. The superior overall performance of the integrated Oracle Database and Exadata platform reflects this engineered architecture.
This blog initiates the Exadata Technology Series. Each article is intended to highlight one critical technology component within the Exadata platform and explain the inner workings of the platform capabilities that surround that component. The series is written to be easy to read while maintaining a high-level engineering flavor, with supporting functional and inter-module flow diagrams that are easy to interpret. Each article in the series will supplement prior articles.
This blog recaps the chosen functional component, the Exadata Smart Flash Cache. Future articles in the series will address hardware, administration, control, enhancements, and refinements that accelerate performance.
Since the genesis of the Exadata platform, the innovation velocity of the platform has been noteworthy. The graphic in Figure 1 informally captures the sharply ascending software and hardware capability of the Exadata platform with every successive product generation since the year 2008, when the platform was introduced. Each platform generation has adopted the best of hardware, overlaid it with the best innovations in software and operational advancements, and taken huge strides in performance. Each successive product in the Exadata family has sustained industry-leading performance numbers when running database workloads.
Smart Flash Cache in the Exadata storage servers, called "smart" because it is "database-aware", was introduced with the V2 Exadata generation. It has played a critical role in accelerating Oracle database performance. In Figure 1, the V2 generation is not explicitly called out, and its delivery was before the X2 milestone. The moniker "database-aware" calls out that caching-related decisions in the storage servers are made using database directives and data tags.
A concise summary of prominent I/O operations within the Exadata platform will help you understand the nuances of Exadata Smart Flash Cache operation. The Exadata System Software runs on both the database servers and the storage servers, which reflects the tight integration of system and software. The Exadata database server is responsible for retrieving and managing structured and unstructured data in the database. The Exadata storage server's role is to store and process Oracle Database data, and to help improve database performance and availability.
In the unique processing architecture of Exadata, the storage servers share responsibility for a storage or retrieval I/O operation with the database servers through an intelligent operational mechanism called Smart Scan. Smart Scan moves I/O-intensive tasks from the database server to the storage servers. When the database server uses the compute capability of the storage servers to complete a task, this is referred to as an "offload". Smart Scan uses the compute cores of the many available storage servers to parallelize and execute the offloaded tasks. The Smart Scan processing capability on the storage servers includes handling operations such as table scans, row filtering, column projection, smart aggregation, and storage indexes. The offloading interaction between the database servers and storage servers is shown in Figure 2.
Offloading to the storage server brings efficiencies in multiple forms. The offloaded processing at the storage servers also ensures that the database server receives from each storage server only the rows that match the query predicate criteria, a minimal and relevant set. The database server then receives and aggregates all the minimal post-processed data sets. This unique Smart Scan capability makes superior and intelligent use of all available Exadata infrastructure resources.
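The offload described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Exadata code: the `Row` type, the function names, and the sample data are all hypothetical. Each "storage server" filters and projects its own rows, so the "database server" only merges the matching results, whereas the conventional path ships every row before filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical table layout, used only for this illustration.
    order_id: int
    region: str
    amount: float


def storage_side_scan(rows, predicate, columns):
    """Offloaded work: filter rows and project columns on the storage server."""
    return [tuple(getattr(r, c) for c in columns) for r in rows if predicate(r)]


def smart_scan(storage_servers, predicate, columns):
    """Database server merges the already-filtered results of each storage server."""
    merged = []
    for rows in storage_servers:
        merged.extend(storage_side_scan(rows, predicate, columns))
    return merged


def conventional_scan(storage_servers, predicate, columns):
    """Non-offloaded path: ship every row, then filter on the database server."""
    shipped = [r for rows in storage_servers for r in rows]
    result = [tuple(getattr(r, c) for c in columns) for r in shipped if predicate(r)]
    return result, len(shipped)


def is_emea(row):
    return row.region == "EMEA"


# Three storage servers, each holding a slice of the table.
servers = [
    [Row(1, "EMEA", 120.0), Row(2, "APAC", 80.0)],
    [Row(3, "EMEA", 45.5), Row(4, "AMER", 300.0)],
    [Row(5, "APAC", 10.0), Row(6, "EMEA", 75.0)],
]

offloaded = smart_scan(servers, is_emea, ("order_id", "amount"))
baseline, rows_shipped = conventional_scan(servers, is_emea, ("order_id", "amount"))

print(f"rows returned with offload: {len(offloaded)}")
print(f"rows shipped without offload: {rows_shipped}")
```

Both paths return the same answer. The difference is how much data crosses the fabric to get there.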
To summarize Smart Scan:
The ability to offload the query and process predicates in the storage servers is a game-changing differentiator for the Exadata Platform.
A side-by-side comparison of the functional flow of the Intelligent Smart Scan and that in a DIY system is presented in Figure 3. The intelligently managed use of the Exadata storage servers with Smart Scan highlights the architectural and I/O superiority of the Exadata platform.
Exadata efficiencies are presented in the section above and highlighted on the left side of Figure 3.
In DIY systems, denoted as non-Exadata Database Platforms, presented in the right section of Figure 3:
The Exadata platform architecture, with its distributed processing, is superior.
The added efficiency of Smart Scan, and the vastly reduced traffic in the connecting fabric during data retrieval, are unique and prominent Exadata platform differentiators.
For database I/O Smart Scan processing, what else in the Exadata intelligent storage servers does the Exadata System Software exploit?
Exadata uses scale-out, intelligent storage servers, available in three configurations – High Capacity (HC), Extreme Flash (EF), or Extended Storage (XT). With the latest platform in the product family, the Exadata X10M:
A typical Exadata on-premises deployment involves multiple storage servers, working as partners. A minimal Exadata on-premises deployment involves two database servers and three storage servers. When an Exadata deployment is configured for high redundancy, data copies are mirrored to two additional storage servers. When a deployment is configured for normal redundancy, the data is mirrored to one additional storage server. Normal redundancy protects against a single disk failure or the failure of one storage server. High redundancy protects against two simultaneous partner disk failures in two distinct partner storage servers.
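The copy counts above (two copies for normal redundancy, three for high redundancy) can be checked with a small sketch. The round-robin placement used here is an assumption made purely for illustration and is not how Exadata or Oracle ASM actually chooses partner servers. The function names are hypothetical as well.

```python
from itertools import combinations

# Copies per redundancy level as described above: the primary plus its mirrors.
COPIES = {"normal": 2, "high": 3}


def place_copies(block_id, num_servers, redundancy):
    """Return the distinct storage servers holding a block.

    Assumption: simple round-robin placement, for illustration only.
    """
    copies = COPIES[redundancy]
    if num_servers < copies:
        raise ValueError("not enough storage servers for this redundancy level")
    first = block_id % num_servers
    return [(first + i) % num_servers for i in range(copies)]


def survives(block_id, num_servers, redundancy, failed_servers):
    """True if at least one copy sits on a server that has not failed."""
    placement = place_copies(block_id, num_servers, redundancy)
    return any(server not in failed_servers for server in placement)


# Minimal deployment from the text: three storage servers.
normal_ok = all(
    survives(block, 3, "normal", {failed})
    for block in range(6)
    for failed in range(3)
)
high_ok = all(
    survives(block, 3, "high", set(pair))
    for block in range(6)
    for pair in combinations(range(3), 2)
)

print(f"normal redundancy survives any single server failure: {normal_ok}")
print(f"high redundancy survives any two server failures: {high_ok}")
```

Under the same assumed placement, normal redundancy does not survive losing both servers that hold a given block, which is the case that high redundancy is meant to cover.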
Larger Exadata deployments can have many more database servers and storage servers within a rack and span multiple racks.
The Exadata Database Server interacts with a database component called the Buffer Cache during I/O queries. Database server Buffer Cache is a memory area in the System Global Area (SGA) of the database instance. It stores copies of data blocks that are read from data files. When a database user issues a query, the database server checks the Buffer Cache on the database server for the required data blocks. If the data block is found in the Buffer Cache, the database reads the data directly from the Buffer Cache and completes the I/O. If the data is not found in the Buffer Cache, the data is then read from the storage server cache or the persistence layer.
Each Exadata HC storage server platform implements an Exadata RDMA Memory (XRMEM) cache layer, a Smart Flash Cache layer, and a persistence layer composed of disks. The Exadata EF storage platform varies a bit. It is all flash, supports XRMEM and Smart Flash Cache, and does not have disks.
XRMEM is a DDR5 DRAM-based memory cache on the Exadata storage servers that the database accesses remotely through Remote Direct Memory Access (RDMA). XRMEM distinguishes itself by being RDMA enabled, another Exadata differentiator. The database uses RDMA to access data cached in the remote XRMEM, bypassing the network and I/O stack on the database and storage servers and providing the superior performance of DRAM. The hottest data, as identified by the Oracle database and usage history, is automatically cached in XRMEM.
Smart Flash Cache is used with XRMEM to automatically cache and service database I/O requests for frequently accessed data. XRMEM holds the hottest data, almost exclusively data that suits OLTP queries, along with some analytics caching. XRMEM and Smart Flash Cache are sized and engineered to balance performance, capacity, and cost.
The XRMEM, Smart Flash Cache, and persistent storage (disk) layers within an Exadata storage server are shown in Figure 4. The hot data is cached in XRMEM, the warm data is in Smart Flash Cache, and the inactive data is in persistent storage. Note that in EF Storage Servers, the persistent storage layer is also composed of flash drives.
An Exadata OLTP read I/O operation is most performant when the read happens via RDMA from XRMEM. When Exadata System Software determines that a block of data is likely to be frequently used in database I/O, it promotes that data into XRMEM so that a subsequent query for the block is serviced from XRMEM. Figure 5 shows the Smart Scan flow and the RDMA-based retrieval of XRMEM cache data for OLTP queries.
Furthermore, XRMEM in a storage server is a shared storage tier that is available to all database nodes and storage nodes. With the storage servers hosting multiple databases, the available cache is used by all configured database instances.
This architecture and implementation that supports many databases, resident on many database servers, with many storage servers, each hosting a cache hierarchy, is a key Exadata differentiator!
An intelligent, economical, and optimal approach is to cache only the most frequently accessed and hottest data in XRMEM and have the warm tier of data in persistent Smart Flash Cache. In such an implementation, blocks that are characterized as being warm are hosted by the Smart Flash Cache.
Because flash memory is persistent, Smart Flash Cache also gains the operational advantage of data persistence. Recovery from a disruption in the data center is quicker when the data in a cache does not have to be refreshed, and Exadata excels at this. Because the data in Smart Flash Cache persists across reboots, cached data is not lost after a power outage, system crash, or restart. This eliminates the need to re-read that data from slower disk storage upon recovery, which would otherwise significantly delay system and application recovery from the outage.
XRMEM and Smart Flash Cache automatically and intelligently cache frequently accessed and high-value data. Each database I/O contains tags to indicate why the I/O is being issued and whether the data relevant to that I/O ought to be cached. This cache-likelihood information is internally combined with a variety of data and behavior metrics, such as data access frequency, to determine whether to cache the data. If the answer is "yes", the data is cached.
This collaborative working of Exadata with the Oracle Database is a marquee differentiator of the Exadata platform.
It is worth noting that, while external factors can cause platform disruptions, Exadata is built to be very resilient to system outages thanks to its design conformance with the appropriate tiers of Oracle Maximum Availability Architecture.
When the needed data is not in the XRMEM or the Smart Flash Cache, data is read from disks. Disk reads are intended for infrequently accessed data. When a disk-hosted data block has been requested by a database I/O, and the request is a user I/O and is not an I/O that is part of a recovery or backup operation, the data block on disk then becomes a candidate for promotion to cache, if the determination is that this once-read data will soon be frequently accessed.
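The caching decision described in this section, where the reason tag on each I/O is combined with access frequency, can be sketched as follows. The `IoReason` values, the `CacheAdvisor` class, and the threshold of two accesses are hypothetical and chosen only to make the idea concrete. The real decision logic uses more metrics than this sketch models.

```python
from collections import Counter
from enum import Enum


class IoReason(Enum):
    # Hypothetical tag values standing in for the I/O tags described above.
    USER_READ = "user_read"
    BACKUP = "backup"
    RECOVERY = "recovery"


class CacheAdvisor:
    """Decide whether a block read from disk is a candidate for promotion."""

    def __init__(self, hot_threshold=2):
        # Assumption: a block counts as "soon to be frequently accessed"
        # once user I/O has touched it hot_threshold times.
        self.hot_threshold = hot_threshold
        self.access_counts = Counter()

    def should_cache(self, block_id, reason):
        # Backup and recovery I/O never makes a block a caching candidate.
        if reason is not IoReason.USER_READ:
            return False
        self.access_counts[block_id] += 1
        return self.access_counts[block_id] >= self.hot_threshold


advisor = CacheAdvisor()
decisions = [
    advisor.should_cache(7, IoReason.BACKUP),     # tagged as backup: skip
    advisor.should_cache(7, IoReason.USER_READ),  # first user read: not yet hot
    advisor.should_cache(7, IoReason.USER_READ),  # second user read: promote
]
print(decisions)
```

Keeping backup and recovery reads out of the decision prevents a large sequential backup from evicting data that applications are actively using.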
The ability to distinguish user I/Os from backup and recovery I/Os, and to process them differently, is an Exadata differentiator.
An X10M Exadata Storage Server has 1.5 TB of DRAM, of which 1.25 TB is allocated to the XRMEM Data Accelerator and the remaining 256 GB is allocated for storage server software and operating system use. Databases vary in size, and multiple database instances may coexist. When usage exceeds the XRMEM cache size, Smart Flash Cache is used efficiently so that system performance is sustained.
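As a quick check of the DRAM split quoted above, 1.5 TB in total with 1.25 TB for XRMEM leaves 0.25 TB, which is 256 GB when 1 TB is taken as 1024 GB. The binary unit convention is an assumption of this sketch.

```python
# Capacity figures quoted above for one X10M storage server.
TOTAL_DRAM_TB = 1.5
XRMEM_TB = 1.25
GB_PER_TB = 1024  # assumption: binary units (1 TB = 1024 GB)

remaining_gb = (TOTAL_DRAM_TB - XRMEM_TB) * GB_PER_TB
xrmem_share = XRMEM_TB / TOTAL_DRAM_TB

print(f"DRAM left for software and OS: {remaining_gb:.0f} GB")
print(f"Share of DRAM used as XRMEM cache: {xrmem_share:.1%}")
```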
The most active data regions in the flash cache are automatically replicated into the XRMEM cache. An XRMEM cache miss is serviced by Smart Flash Cache. In such an instance, the data in Smart Flash Cache is also promoted to XRMEM. If there is a Smart Flash Cache miss, or when there are data updates in Smart Flash Cache from any processing operation, the updated data in Smart Flash Cache is automatically promoted to XRMEM. Figure 6 illustrates the data flow where an XRMEM cache miss is followed by data retrieval from Smart Flash Cache.
The coordinated working of the XRMEM cache and the Smart Flash Cache is transparent to the database user. This coordinated interaction makes the active data available at the most performant cache layer. Keeping the most active data in cache is what sustains Exadata's extreme I/O performance across workloads.
When data is cached in neither the XRMEM cache nor Smart Flash Cache, an I/O request encounters a miss in both caches. The data is then retrieved from capacity-optimized flash or disk. Depending on its projected future usage pattern, this data is then a candidate for promotion to cache, where it can service future requests.
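The read path through the three layers can be summarized with a minimal sketch. It assumes unbounded dictionaries for each tier and always promotes on a miss, whereas the text above makes promotion from disk conditional on projected usage. Cache sizing, eviction, and RDMA are left out, and the `StorageServer` class and its method names are hypothetical.

```python
class StorageServer:
    """Toy model of the XRMEM -> Smart Flash Cache -> disk read path."""

    def __init__(self, disk_blocks):
        self.xrmem = {}                # hottest data (DRAM)
        self.flash = {}                # warm data (Smart Flash Cache)
        self.disk = dict(disk_blocks)  # persistent storage

    def read(self, block_id, promote_from_disk=True):
        """Return (data, tier) where tier names the layer that served the read."""
        if block_id in self.xrmem:
            return self.xrmem[block_id], "xrmem"
        if block_id in self.flash:
            # XRMEM miss serviced by flash; the block is also promoted to XRMEM.
            data = self.flash[block_id]
            self.xrmem[block_id] = data
            return data, "flash"
        # Miss in both caches: read from disk, then optionally promote.
        data = self.disk[block_id]
        if promote_from_disk:
            self.flash[block_id] = data
        return data, "disk"


server = StorageServer({1: b"row data"})
tiers_hit = [server.read(1)[1] for _ in range(3)]
print(tiers_hit)
```

Three consecutive reads of the same block walk it up the hierarchy: the first is served from disk, the second from flash, and the third from XRMEM.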
Smart Flash Cache operates in Write-back mode. This allows write operations to be stored in flash before being written to disk, which improves application write performance. When a write to flash completes, the I/O is considered operationally complete, so an acknowledgment can be sent immediately. The write to disk then happens asynchronously. Write-intensive applications benefit from Write-back caching by taking advantage of the low latencies provided by flash. The amount of disk I/O is also reduced because the cache absorbs multiple writes to the same block before a composite write to disk is performed.
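A minimal sketch of write-back behavior, under simplifying assumptions, is shown below. The asynchronous flush is modeled as an explicit `flush()` call, mirroring across storage servers is ignored, and the `WriteBackCache` class is hypothetical. It illustrates two points from the paragraph above: the write is acknowledged once it is in flash, and repeated writes to one block collapse into a single disk write.

```python
class WriteBackCache:
    """Toy write-back cache: acknowledge on flash write, flush to disk later."""

    def __init__(self):
        self.flash = {}       # block_id -> latest data held in flash
        self.dirty = set()    # blocks written to flash but not yet on disk
        self.disk = {}
        self.disk_writes = 0

    def write(self, block_id, data):
        # The write completes as soon as the data is in flash.
        self.flash[block_id] = data
        self.dirty.add(block_id)
        return "ack"

    def flush(self):
        # Stand-in for the asynchronous destage to disk.
        for block_id in sorted(self.dirty):
            self.disk[block_id] = self.flash[block_id]
            self.disk_writes += 1
        self.dirty.clear()


cache = WriteBackCache()
acks = [cache.write(42, version) for version in (b"v1", b"v2", b"v3")]
on_disk_before_flush = dict(cache.disk)
cache.flush()

print(f"acknowledged writes: {len(acks)}")
print(f"disk writes after flush: {cache.disk_writes}")
```

Three acknowledged writes to block 42 result in one disk write carrying the latest version, which is the absorption effect the paragraph describes.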
Data is mirrored across available storage servers. As pointed out above, write operations are sent to at least two storage servers when normal redundancy is used or three storage servers when high redundancy is used. The Write-back mode of operation is maintained with the mirrored writes.
The Exadata Smart Flash Cache provides a significant performance boost to Oracle Database users. By utilizing high-speed flash storage and by being managed by smart Exadata System Software, Smart Flash Cache dramatically accelerates data access for frequently queried data. This cache sits between the database and disk storage, and hosts the most active data, enabling faster retrieval of the most needed data blocks. As a result, applications experience faster query responses, improved transaction throughput, and better overall system performance. The Smart Flash Cache automatically caches the most relevant data, optimizing workload performance without requiring manual intervention. Smart Flash Cache also coordinates with XRMEM cache to provide harmonized, high-performance data access.
The Exadata Smart Flash Cache enables Exadata to be a powerful solution for environments with high-demand, data-intensive applications.
Shankar is a Product Manager for Oracle Exadata and has been working with Exadata on-prem since April 2024. Shankar has a strong technical background and experience with Data Networking (Cisco), Data Storage and Virtualization (EMC), Cloud IaaS and PaaS (IBM), and more recently with Algorithms (CyberAtomics) and Data Streaming Technologies and Ad Processing Pipelines (Nielsen, Roku).
Alex Blyth is a Product Manager for Oracle Exadata with over 25 years of IT experience mainly focused on Oracle Database, Engineered Systems, manageability tools such as Enterprise Manager and most recently Cloud. Prior to joining the product management team, Alex was a member of the Australia/New Zealand Oracle Presales community and before that a customer of Oracle's at a Financial Services organisation.