Exadata Resource Management: Consolidation Meets High Performance

June 26, 2024 | 12 minute read
Maruti Sharma
Senior Principal Product Manager

Oracle Exadata Database Machine (Exadata) is an engineered system that integrates hardware and software with advanced database-aware algorithms to run Oracle Databases optimally for all workloads. Since its inception, Exadata has increased hardware capacity in every facet of its architecture, scaling to thousands of cores, terabytes of memory, and petabytes of storage. This enables customers to run large-scale databases or consolidate workloads into a single Exadata system. Exadata X10M extends these resources, supporting 192 CPU cores per database server, up to 3 TB of memory per database server, and up to 4.2 PB of raw capacity per rack. You can read more about Exadata X10M here.

By design, Oracle Database (12.1 and later) has multitenancy (container and pluggable databases) built into the database engine, providing added consolidation opportunities. Oracle Database offers fully integrated management of system resources for multiple container databases (database instances) on a server. The Exadata platform uniquely manages resources at both the compute and I/O tiers. I/O Resource Manager (IORM) and Database Resource Manager (DBRM) are two complementary methods to efficiently manage system resources within a database (intra-database resource management) and across multiple databases (inter-database resource management).

Exadata is packed with resources to run databases efficiently, but what about the resource distribution for each database on such a powerful machine? How can one run production, development, and mixed (OLTP and Analytics) workloads on the same infrastructure? How does Exadata X10M help control and prioritize resources for demanding database workloads?

This post will explore Oracle Database and Exadata features that help consolidate and prioritize multiple databases while maintaining the required performance for every database workload.

Consolidate Databases

Business considerations and technological innovations are the primary drivers for consolidating databases. On the business side, the main factors are cost reduction and simplicity. On the technology side, they are hardware resources and software optimizations. The Exadata platform, especially Exadata X10M, combined with Oracle Database's multitenant architecture, simplifies consolidating Oracle Databases because it builds upon and strengthens both sets of drivers. Exadata X10M includes a wide range of innovations specifically designed for consolidation: scale-out, high-performance database servers; scale-out, intelligent storage servers with a leading-edge storage cache using Exadata RDMA Memory (XRMEM); and a cloud-scale Remote Direct Memory Access over Converged Ethernet (RoCE) internal fabric that connects database and storage servers. This scale-out design makes Exadata well suited to consolidating multiple workloads.

Database consolidation on Exadata requires understanding the workloads and planning appropriately. Considerations include mixed workloads (OLTP and analytics running on the same infrastructure), noisy neighbor problems, and blast radius (the scope of a failure's impact). Exadata provides unique resource management capabilities to consolidate databases successfully using IORM alongside DBRM, collectively known as Exadata Resource Management.

Exadata Resource Management prevents noisy neighbors by managing the various resources on the platform so that each database receives the resources it requires (CPU, memory, storage, and so on). It automatically prioritizes work through the entire stack of system resources. When databases are consolidated on the same infrastructure, the blast radius (scope of a failure) becomes wider. Exadata, by design, is an end-to-end redundant system, which resolves some blast radius problems. Isolating databases on different physical machines, virtual machines, or database containers (in a multitenant environment) and applying resource management are a few ways to minimize the scope of failure (reduce the blast radius). Let us take a closer look at how resource management can benefit a consolidated environment.

Managed Resources

Various resource knobs can be controlled when running databases on any given platform. Some of these resource knobs are highly dynamic, and others are more static. The resource knobs are CPU, SGA memory, PGA memory, Processes, Storage, and I/O. When configuring a database, some boundaries are set for memory, processes, and storage, while CPU and I/O are dynamic. When a database is operational, the dynamic resources (CPU and I/O) dictate the database's efficiency. Exadata Resource Management covers both resource areas to run the databases efficiently in a consolidated environment.

[Figure: resources]

Resource Management

Instance Caging is the primary lever for controlling how CPU resources are assigned to Oracle Databases. Introduced in Oracle Database 11.2, it limits the CPU that each database instance can use: Oracle Database Resource Manager, in conjunction with the CPU_COUNT parameter, caps the amount of CPU consumed by the instance. The CPU_MIN_COUNT parameter, although available for every database type (non-container, container, and pluggable database), is only relevant for a pluggable database, where it can guarantee CPU resources to that PDB. Oracle Databases run efficiently when CPU and I/O can be prioritized for each database. The "Exadata Resource Management" section below describes how to manage CPU and I/O resources efficiently.

 

                      Instance Caging                Share-Based CPU Resource Management
Database Type         CPU_MIN_COUNT     CPU_COUNT    Share     Limit
Non-Multitenant       Not Applicable    Yes          -         -
Container Database    Not Applicable*   Yes          -         -
Pluggable Database    Yes               Yes          Yes       Yes

*CPU_MIN_COUNT for a Container Database is part of OS CPU Resource Management, which will be available starting with Oracle Database 23ai. Another blog post covering this feature is forthcoming.
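As a rough planning aid for Instance Caging, the sketch below adds up the CPU_COUNT values of the instances on one database server and compares the total with the server's CPU capacity. The instance names and values are hypothetical, and treating capacity as 192 (the X10M core count mentioned earlier) is an assumption; it is a model for reasoning about oversubscription, not an Oracle tool.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str       # hypothetical instance name
    cpu_count: int  # value of the CPU_COUNT parameter for this instance


def caging_summary(instances, server_cpus):
    """Return (sum of CPU_COUNT, oversubscription ratio) for one server."""
    total = sum(inst.cpu_count for inst in instances)
    return total, total / server_cpus


# Assumed layout: three caged instances on one 192-CPU database server.
instances = [Instance("PROD", 96), Instance("TEST", 64), Instance("DEV", 64)]
total, ratio = caging_summary(instances, server_cpus=192)
print(f"sum(CPU_COUNT)={total}, oversubscription ratio={ratio:.2f}")
```

A ratio at or below 1.0 means the cages partition the server; a ratio above 1.0 means the instances are oversubscribed and can contend for CPU when they are busy at the same time.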

Exadata Resource Management

Database Resource Management (DBRM) is available on all Oracle Database platforms. The main goal of DBRM is to give the Oracle Database server the ability to perform finer-grained CPU scheduling than general-purpose operating system scheduling allows. On Exadata, I/O Resource Management (IORM) allows for the efficient management of resources in the storage tier. DBRM settings can also drive IORM, as described in the section titled "IORM Can Inherit DBRM Ratios." IORM plans are specified on the storage servers, while DBRM plans are specified in the database instances on the database servers. Although IORM and DBRM are specified at different tiers, they are interdependent and complementary. Exadata architecture supports multiple database instances with shared storage. Since storage is shared among databases, IORM manages inter-database (across all databases) plans, while DBRM constitutes intra-database (within the database) plans.

IORM

Exadata is an engineered platform that integrates computing and storage. Since the initial release of Exadata system software, IORM has been an Exadata-only feature that manages Exadata's storage server resources. Over time, IORM has been optimized to support multi-tenant and cloud architectures. Exadata in a consolidated environment hosts different workloads, including online transaction processing, analytics, and mixed workloads. For example, a large query on a production analytics database can impact the performance of the mission-critical transactional processing workload. In a scenario like this, IORM ensures that the mission-critical database has a higher priority on the shared storage resources based on user-defined policies.

When production and non-production databases run on the same infrastructure, IORM can prioritize the workload running on production databases. IORM resource plans are created on storage servers, and I/O is managed by following specific rules. IORM manages I/O resources within a container database, using the I/O resource ratios from CPU settings in the DBRM resource plan. When there is contention for I/O resources, IORM schedules I/O by immediately issuing higher-priority I/O requests while queuing lower-priority requests. I/O requests are fulfilled immediately for workloads that do not exceed their resource allocation.

IORM Plans

IORM manages resources using plans unless we use the preferred "auto" setting. With auto, each database's I/O share equals its CPU_COUNT, or the sum of CPU_COUNT across its instances in a RAC configuration. The most commonly used plan types are the database plan (dbplan) and the cluster plan (clusterplan). Database plans dictate inter-database resource management, enabling administrators to manage resources across multiple databases. Database plans also have directives to manage resource allocations for specific databases.
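The auto rule described above can be sketched in a few lines: each database's I/O share is the sum of CPU_COUNT over its instances, and its relative weight is that share divided by the total. The database names and CPU_COUNT values are made up for illustration, and the sketch ignores any bounds IORM may apply to share values.

```python
def auto_io_shares(cpu_counts_by_db):
    """Map each database to an I/O share: the sum of CPU_COUNT over its instances."""
    return {db: sum(counts) for db, counts in cpu_counts_by_db.items()}


# Hypothetical databases; each list holds CPU_COUNT per RAC instance.
cpu_counts = {"SALES": [16, 16], "HR": [8], "DEV": [4, 4]}

shares = auto_io_shares(cpu_counts)
total_shares = sum(shares.values())
for db, share in shares.items():
    print(f"{db}: share={share}, relative weight={share / total_shares:.1%}")
```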

Cluster plans dictate resource management at the cluster level. Introduced in Exadata System Software release 21.2.0, they have a broader scope than a dbplan: cluster resource management allocates resources across multiple Oracle Grid Infrastructure clusters and manages the I/O resource usage within each cluster. The clusterplan IORM directives specified for a cluster apply to all databases resident in that cluster.

Shares and Limits

I/O resource management for a specific database is expressed with shares and limits; the example below gives each database a share and also caps it with a limit. A share can be between 1 (lowest) and 32 (highest), denoting the degree of priority for that database relative to other databases. Shares are the lower bound, while limits are the upper bound. The example below specifies that Database #1 is guaranteed about 16.7% of the resources, while Database #2 is guaranteed about 83.3%. Share-based resource allocation is the recommended method for interdatabase plans (dbplan) and is the only option for cluster plans.

  • Database #1 Share = 1, Limit = 30%
  • Database #2 Share = 5, Limit = 90%
  • Total shares = 6 (1 + 5)
  • Database #1 gets about 16.7% guaranteed (1/6)
  • Database #2 gets about 83.3% guaranteed (5/6)
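The share arithmetic in the list above is simply each share divided by the sum of all shares. A minimal sketch, using exact fractions so the rounding is explicit (the function name is illustrative, and the 1 to 32 range check mirrors the range stated above):

```python
from fractions import Fraction


def share_guarantees(shares):
    """Return each database's guaranteed fraction: its share over the total shares."""
    for name, share in shares.items():
        if not 1 <= share <= 32:
            raise ValueError(f"share for {name} must be between 1 and 32")
    total = sum(shares.values())
    return {name: Fraction(share, total) for name, share in shares.items()}


guarantees = share_guarantees({"Database #1": 1, "Database #2": 5})
for name, fraction in guarantees.items():
    print(f"{name}: {fraction} = {float(fraction):.1%}")
```

Adding a third database with its own share lowers every existing guarantee, because the denominator grows; shares express relative priority, not fixed percentages.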

Below are two sample IORM plans. The first is a basic plan for several databases running on-premises. The second is for a more consolidated environment, like the cloud, where we assign different profiles to each database.

To specify these plans, the following CellCLI commands can be run. The trailing hyphen continues a command onto the next line.

CellCLI> ALTER IORMPLAN -
         dbplan=((name=HR01, share=1, limit=30, role=primary), -
                 (name=FIN01, share=5, limit=90, flashCacheSize=1G))

CellCLI> ALTER IORMPLAN -
         dbplan=((name=gold, share=4, limit=100, type=profile), -
                 (name=silver, share=2, limit=60, type=profile), -
                 (name=bronze, share=1, limit=20, type=profile))

The database parameter db_performance_profile specifies the corresponding IORM profile for the database.

SQL> alter system set db_performance_profile=silver scope=spfile;

Interdatabase plans like these are how IORM manages I/O resources across databases.
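When a plan has many directives, it can be easier to keep them as data and render the command text from that. The sketch below is one hypothetical way to do so: `build_dbplan` is an illustrative helper, not an Oracle utility, it only produces a string in the same shape as the profile plan above, and it does not connect to a storage server.

```python
def build_dbplan(directives):
    """Render a list of directive dicts as ALTER IORMPLAN dbplan command text."""
    rendered = []
    for directive in directives:
        attrs = ", ".join(f"{key}={value}" for key, value in directive.items())
        rendered.append(f"({attrs})")
    return "ALTER IORMPLAN dbplan=(" + ", ".join(rendered) + ")"


# The three profiles from the sample plan above.
profiles = [
    {"name": "gold", "share": 4, "limit": 100, "type": "profile"},
    {"name": "silver", "share": 2, "limit": 60, "type": "profile"},
    {"name": "bronze", "share": 1, "limit": 20, "type": "profile"},
]

command = build_dbplan(profiles)
# Guard against the easiest mistake in hand-written plans: unbalanced parentheses.
assert command.count("(") == command.count(")")
print(command)
```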

IORM Advanced Controls

For an Oracle database running on a non-Exadata platform, there is no built-in storage management mechanism like IORM. Instead, an Operating System or Virtual Machine functionality is required to manage the database's storage I/O. Another option is for each database to run on its dedicated storage. In most cases, managing storage I/O this way will result in resource wastage. General-purpose storage is not able to distinguish between critical and non-critical database I/O, such as writing to datafiles versus redo logs, and cannot distinguish the context under which I/O is happening, such as redo log writes that contain commit markers.

XRMEM Cache Controls

I/O Resource Management (IORM) can provide predictable performance by guaranteeing space in the Exadata RDMA Memory Cache (XRMEM cache) for the most critical databases on a system. In a consolidated environment, multiple databases share the underlying storage, and the XRMEM cache becomes a crucial resource that requires proper management. IORM can prioritize space for critical databases over non-production or lower-priority databases.

xrmemCacheMin: Guaranteed XRMEM cache for the specified database.

xrmemCacheLimit: The soft maximum limit of the XRMEM cache for the specified database. A database can exceed this limit if the XRMEM cache is not full.

xrmemCacheSize: The hard maximum amount of XRMEM cache that is available to the specified database. This size cannot be exceeded at any time.

Flash Cache Controls

In addition to shares, IORM can control how Flash Cache is distributed among CDBs. Within this control, the IORM plan can specify a minimum Flash Cache per CDB, a soft limit of Flash Cache per CDB, and a hard Flash Cache size.

flashCacheMin: Guaranteed Flash Cache for the specified database.

flashCacheLimit: The soft maximum limit of Flash Cache for the specified database. A database can exceed this limit if the Flash Cache is not full.

flashCacheSize: The hard maximum amount of Flash Cache available to the specified database. This cannot be exceeded at any time.
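The difference between the minimum, the soft limit, and the hard size is easiest to see as a decision rule. The sketch below is a simplified model of the semantics described above, applicable to either cache; the class, function, and field names are illustrative, the units are arbitrary, and real cache management (eviction, competition between databases) is more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachePolicy:
    cache_min: int = 0                 # guaranteed space (flashCacheMin / xrmemCacheMin)
    cache_limit: Optional[int] = None  # soft maximum (flashCacheLimit / xrmemCacheLimit)
    cache_size: Optional[int] = None   # hard maximum (flashCacheSize / xrmemCacheSize)


def may_grow(policy, used, request, cache_full):
    """Return True if a database using `used` units may cache `request` more."""
    new_total = used + request
    if policy.cache_size is not None and new_total > policy.cache_size:
        return False           # the hard maximum can never be exceeded
    if new_total <= policy.cache_min:
        return True            # still inside the guaranteed space
    if policy.cache_limit is not None and new_total > policy.cache_limit:
        return not cache_full  # soft maximum: allowed only while the cache has room
    return True                # simplification: no further checks modelled


soft = CachePolicy(cache_min=10, cache_limit=50)
hard = CachePolicy(cache_size=50)
print(may_grow(soft, used=45, request=10, cache_full=False))  # over soft limit, cache has room
print(may_grow(soft, used=45, request=10, cache_full=True))   # over soft limit, cache is full
print(may_grow(hard, used=45, request=10, cache_full=False))  # over hard size
```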

Monitoring IORM

The following methods can monitor IORM:

  1. AWR reports (Top Databases by IO requests)
  2. Enterprise Manager

[Figure: awr]

[Figure: em_iorm]

Intra-Database Resource Manager

Database Resource Manager (DBRM) is a feature of Oracle Database Enterprise Edition on all Oracle platforms (including Exadata) but only includes integration with IORM on Exadata. DBRM enables us to manage resources for multiple workloads within a database (intra-database). Database workloads are composed of user sessions, which run jobs with different priorities. DBRM enables us to classify these sessions based on session attributes and then allocate resources to these groups that optimally use the platform.

DBRM includes Resource Consumer Groups, Resource Plans, and Resource Plan Directives. Each of these components is stored in the database's data dictionary.

A Resource Consumer Group is a group of sessions based on resource requirements. Resource Consumer Groups provide a way to group sessions that comprise a particular workload. For example, suppose your data warehouse has three types of workloads: critical queries, regular queries, and ETL (extraction, transformation, and loading). In that case, you can create a consumer group for each type of workload. Once Consumer Groups are created, user sessions within the database can be mapped to the Consumer Groups.
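To make the idea of mapping sessions to Consumer Groups concrete, here is a toy classifier for the data warehouse example above. It is only a model of rule-based mapping: the attribute names, values, and group names are assumptions chosen for illustration, and it does not call any Oracle API.

```python
# Ordered rules: (session attribute, expected value, consumer group). First match wins.
RULES = [
    ("service_name", "CRITICAL_SVC", "CRITICAL_QUERIES"),
    ("oracle_user", "ETL_USER", "ETL"),
    ("service_name", "REPORTING_SVC", "REGULAR_QUERIES"),
]


def map_session(session, rules=RULES, default="OTHER_GROUPS"):
    """Return the consumer group for a session described by a dict of attributes."""
    for attribute, value, group in rules:
        if session.get(attribute) == value:
            return group
    return default  # sessions that match no rule fall into the catch-all group


print(map_session({"oracle_user": "ETL_USER", "service_name": "BATCH_SVC"}))
print(map_session({"oracle_user": "ANALYST", "service_name": "CRITICAL_SVC"}))
print(map_session({"oracle_user": "GUEST"}))
```

Rule order matters in this sketch: a session that matches several rules is placed by the first one, so the most specific or most important rules should come first.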

A Resource Plan is a set of directives for allocating resources to Resource Consumer Groups. A CDB (Container Database) resource plan comprises CDB Resource Plan directives for a multi-tenant environment.

Each resource plan directive contains different operational attributes that describe how the resources should be managed for the consumer groups or the pluggable databases.

In the following DBRM Resource Plan, there are directives to allocate CPU (70%, 15%, 15%) to each Consumer Group ("OLTP," "DSS," "OTHER_GROUPS"). Consumer Groups can use the available capacity when the system CPU is not 100% utilized.

[Figure: consumer_group]

What can one do with DBRM?

  1. Guarantee a minimum amount of CPU for a specific consumer group or PDB regardless of the system's load.
  2. Cap the maximum CPU usage for a consumer group or PDB.
  3. Limit the degree of parallelism (DOP) of any operation performed by members of a consumer group.
  4. Prioritize parallel statements from a critical application ahead of those from a low-priority user.
  5. Limit the amount of PGA memory used by each session belonging to a consumer group.
  6. Terminate a session or call, or switch it to a lower-priority consumer group, when it consumes more than a specified amount of CPU, I/O, or elapsed time.
  7. Limit the amount of time that a session can be idle.
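The first capability, combined with the earlier note that Consumer Groups can use spare capacity when the CPU is not fully utilized, can be modelled with a small proportional-sharing routine. This is a simplified sketch under stated assumptions (positive allocations, a single allocation level, demands expressed as percentages of the server), not a description of DBRM's actual scheduler. It uses the 70/15/15 plan described above.

```python
def distribute_cpu(allocations, demand, capacity=100.0):
    """Split CPU among groups by allocation weight, passing unused share to busy groups.

    allocations: group -> plan percentage (assumed positive)
    demand:      group -> percentage of CPU the group would use if unconstrained
    """
    granted = {group: 0.0 for group in allocations}
    active = {group for group in allocations if demand.get(group, 0.0) > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        weight = sum(allocations[g] for g in active)
        # Groups whose demand fits inside their weighted slice are fully satisfied.
        satisfied = {g for g in active
                     if demand[g] <= remaining * allocations[g] / weight}
        if not satisfied:
            # Everyone left wants more than their slice: split what remains by weight.
            for g in active:
                granted[g] = remaining * allocations[g] / weight
            break
        for g in satisfied:
            granted[g] = demand[g]
            remaining -= demand[g]
        active -= satisfied
    return granted


plan = {"OLTP": 70, "DSS": 15, "OTHER_GROUPS": 15}
light_oltp = distribute_cpu(plan, {"OLTP": 30, "DSS": 80, "OTHER_GROUPS": 10})
all_busy = distribute_cpu(plan, {"OLTP": 100, "DSS": 100, "OTHER_GROUPS": 100})
print(light_oltp)
print(all_busy)
```

In the first call OLTP needs less than its 70%, so DSS can use the slack; in the second call every group is saturated and each is held to its plan percentage.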

IORM Can Inherit DBRM Ratios

In a consolidated environment, a noisy neighbor can consume not only excessive CPU but also excessive I/O on a system. It's essential to control both, and using a single setting works best. DBRM can allocate CPU (limits and minimums) within the database, which determines the resources allocated to that database. IORM can be configured to inherit the same resource ratios used for DBRM. For example, if a Consumer Group is allocated 25% of compute resources (CPU) on a system, that Consumer Group should also be allocated 25% of I/O resources. This means a single set of controls governs resources in the system, making the system more straightforward to manage. Setting the IORM objective to auto is the simplest way to get good workload management. Some scenarios require advanced controls for specific workloads. For example, a mission-critical, high-transaction database may need more space in the Exadata RDMA Memory or Flash Cache. In such cases, IORM can allocate specific resources to specific workloads, as described in the "IORM Advanced Controls" section above.
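One way to picture how the two tiers combine under contention is to multiply a database's share-based fraction by the Consumer Group's percentage within that database. The sketch below does exactly that with the sample values used earlier in this post; it is a simplified mental model with illustrative names, not a statement of how IORM schedules individual requests.

```python
def effective_io_fraction(db_shares, db, group_pct, group):
    """Database's fraction of I/O (from shares) times the group's fraction within it."""
    db_fraction = db_shares[db] / sum(db_shares.values())
    return db_fraction * group_pct[group] / 100


db_shares = {"HR01": 1, "FIN01": 5}                       # inter-database shares
group_pct = {"OLTP": 70, "DSS": 15, "OTHER_GROUPS": 15}   # ratios inherited from DBRM

fraction = effective_io_fraction(db_shares, "FIN01", group_pct, "OLTP")
print(f"FIN01 / OLTP under full contention: {fraction:.1%}")
```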

Conclusion

Oracle Database Resource Manager (DBRM) provides the most flexible method for managing system resources across databases and within databases using Consumer Groups. Resources can be efficiently allocated and managed at any level required to meet business needs. Exadata I/O Resource Manager (IORM) complements DBRM and provides a set of integrated controls that govern the use of system resources.

Resource Management is designed to prevent the "Noisy Neighbor" problem in multi-user and multi-database (consolidation) environments. Exadata Resource Management also guards against overusing resources even in dedicated environments, providing improved system stability and availability. This post demonstrates that managing resources becomes more important as the number of consolidated workloads on a system grows.

Exadata X10M is the most suitable platform to consolidate Oracle databases. When workloads with different priorities (production and non-production) run on the same infrastructure, Exadata Resource Management (IORM and DBRM) can efficiently prioritize and distribute the resources. Exadata IORM is an extension of DBRM, and both should be configured in tandem. IORM settings may be used for advanced configurations, such as when allocating Flash Cache or XRMEM Cache to a particular workload. In most cases, Oracle recommends setting the IORM objective to "auto" and allowing IORM to inherit the resource management settings for the workloads.

Maruti Sharma is a Senior Principal Product Manager for mission-critical database systems at Oracle with over 25 years of software experience focused on relational databases, Big Data, NoSQL data stores, server programming, and microservices. Prior to joining Oracle, Maruti was a Chief Architect and Associate Technical Fellow at The Boeing Company, where he was responsible for managing everything data related.
