Contributing Author: Ashish Agrawal, Director, Product Management, Oracle
Oracle Exadata is a high-performance system for hosting the Oracle Database and delivers the highest levels of database performance available. Oracle Exadata Database Machine consists of database servers, Oracle Exadata Storage Servers, an InfiniBand fabric for storage networking and all the other components required to host an Oracle Database. It delivers outstanding I/O and SQL processing performance for online transaction processing (OLTP), data warehousing (DW) and consolidation of mixed workloads. It delivers high performance due to many unique features such as ability to provides database aware storage with the ability to offload database processing from the database server to storage and accelerated Oracle Database processing by speeding I/O operations using Flash Cache.
Oracle Enterprise Manager provides the tools to effectively and efficiently manage your Oracle Exadata Database Machine. Oracle Enterprise Manager provides Oracle Exadata monitoring, patching and provisioning, Oracle Exadata virtualization support, compliance management. With a consolidated view of the Oracle Exadata, Oracle Enterprise Manager provides a consolidated view of all the hardware components and their physical location with indications of status. Oracle Enterprise Manager also provides a software view of the databases residing on it and their resource consumption on compute node and Oracle Exadata Storage Cell.
There are additional challenges in Oracle Exadata management.
Optimize Resource Usage
Which resources will the database machine be bottlenecked based on current load growth from on-boarded databases? Which Oracle Exadata systems have available capacity headroom to support consolidation of additional databases?
Maximize SQL performance
For databases migrated to Oracle Exadata, which SQLs are performing better or worse than expected? Which of the poor SQLs regressed due to bad plans vs. excessive waits? Is the application performance suffering due to excessive non-CPU or non-I/O waits?
Which databases are reporting the most storage-related problems? How do I drilldown from the application tier to Oracle Exadata to perform RCA on my application?
Oracle Management Cloud is a suite of next-generation integrated monitoring, management, and analytics cloud services. Data is automatically analyzed and correlated across all Oracle Management Cloud services, and the resulting insights are made instantly available via intuitive dashboards, Real time diagnostics, capacity planning, operational forecasting, and business analytics. Oracle Management Cloud’s unified data platform helps customers improve IT stability, prevent application outages, improve capacity planning, troubleshoot issues, increase DevOps agility and harden security across their entire application and infrastructure portfolio. Oracle Management Cloud enables Oracle Exadata customers to maximize their investments in Oracle Exadata by leveraging machine learning against the full breadth of the operational data set to maximize performance, optimize resources and troubleshoot operational issues rapidly.
So, let us see how Oracle Management Cloud helps to overcome the above discussed challenges.
Oracle Exadata Analytics application in Oracle Management Cloud IT Analytics pinpoints Oracle Exadata systems expected to run out of resources in the short & long term. It Identifies the top growing databases across the fleet, identifies the key resource (CPU, Memory, I/O or Storage) expected to run out, expected lead time to acquire additional capacity and uncovers underutilized capacity.
Oracle Exadata Analytics provides unified view of inventory, availability, performance and errors. It Identifies database Nodes that are CPU or Memory bound, Storage cell performance issues, Storage cells approaching max performance capacity and Storage cell outliers. Additional it can analyze and detect if Flash Cache utilization is underutilized and not used as expected.
Oracle Exadata Analytics provides inventory and capacity across enterprise. It includes fleet view of Oracle Exadata systems across the enterprise by system type (Quarter, Half Rack), number of Oracle Exadata databases by database version. Further it provides drilldown to single Oracle Exadata to view hardware and software components, compute, storage, networking components. It also shows the configured capacity vs. utilized capacity of CPU, Memory, I/O and Storage.
Oracle Exadata Analytics using regression models provides forecast into capacity growth and compares it with the available headroom. It shows single view of resource utilization across entire database fleet that includes compute (CPU & Memory), storage (ASM, Disk). Oracle Exadata Analytics identifies total lead time to expand capacity using machine learning based forecast, further it projects resource growth and compares it with configured capacity headroom. It uses seasonality to identify growth patterns, e.g. weekly peaks vs. daily peaks. It classifies each Oracle Exadata systems based on available lead times 30, 60, 90, 180 days of capacity. It has the ability to alert users on critical systems that are expected to hit capacity headroom in 30 days or less.
One can set alert rule conditions and specify preference for receiving notifications when alerts are triggered. By default, the following alert rules are available for use:
Oracle Exadata IOPS Capacity Headroom Rule
Oracle Exadata Storage Capacity Headroom Rule
Host CPU Capacity Headroom Rule
Host Memory Capacity Headroom Rule
One can receive these alerts as early warnings that inform about capacity utilization of host resource types such as Oracle Exadata IOPS, Oracle Exadata storage, host CPU and host memory. One can then take remedial measures to manage capacity, such as move the workload to another host, add more capacity, or analyze the reason for increased resource usage and resolve the issue.
Oracle Management Cloud also provides out of the box Oracle Exadata Dashboard which analyzes Storage Cells and shows Storage Cells that are approaching IOPS & MBPS Limits.
One can look at Small Read I/O per sec for Disk type = FlashDisk or Hard Disk and compare with Maximum Disk IOPS for Flash or Hard Disk to identify Storage Cells approaching max performance capacity for IOPS and Small Read Throughput for Disk type = FlashDisk or Hard Disk and compare with Maximum Flash Disk MBPS for Flash or Physical Disk.
For maximizing database performance on Oracle Exadata, Database Performance Analytics automatically provides insights into the database performance by analyzing database performance degradation, databases with varying workload performance, database inefficiency and top SQL statements. There are some unique database and SQL performance challenges that are difficult to solve using conventional tools and methodology primarily resulting from lack of long term data and means to get insights into the data. identify databases that are not taking full advantage of CPU or I/O due to poor application design or contention, database workloads that are degrading with growth or anomalous workloads that are showing degradation and identify unstable workloads that are showing a high degree of variability in performance. identify SQLs whose response time is getting worse slowly over time, identify SQLs and associated databases contributing to high resource usage (CPU or I/O) and identify application SQLs that are a high degree of variability. Database Performance Analytics and SQL Analytics application in IT Analytics helps to solve these above challenges.
The Database Performance Analytics application analyzes database performance based
on long term performance data and provides insights into database performance. it Analyzes performance degradation by response time, identifies inefficient databases by wait time, analyze performance of high variant SQLs and databases, identifies databases which are increasingly inefficient. It identifies Top SQLs across enterprise wide databases.
SQL Analytics is long term SQL warehouse which stores SQL performance data.
SQL Analytics application analyzes SQL performance problems for enterprise-wide applications across a fleet of databases, provides trends and key insights to SQL performance issues thereby helping you to be proactive in avoiding future database performance problems.
Oracle Exadata Database Administrator can find SQLs resulting in poor end-user experience because these SQLs are degrading in response time, SQLs that are resulting in varying end user performance because their response times are varying, SQLs that are inefficient and that represent the best tuning opportunity, so that the application can perform even better and SQLs that have multiple execution plans. An Oracle Exadata Database Administrator can also find CPU and IO Intensive SQLs as well. SQL Analytics categorizes SQLs across databases and applications, which need attention. The categories are Degrading SQLs: SQLs with more than 20% increase in SQL response time, based on linear regression. The value of the SQL response time is derived from the total elapsed time divided by the total number of executions for the SQL. Variant SQLs: SQLs with a relative variability of more than 1.66. Relative variability of an SQL is measured by the standard deviation of the SQL response time divided by the average of the SQL response time. Those SQLs that have a relative variability of more than three are identified as SQLs with highly variant performance. A Relative Variability of a SQL close to Zero indicates stable Response Times, while greater than 1.66 indicates higher variability in Response Times. A Relative Variability greater than three indicates a very high degree of variability in Response Times. Inefficient SQLs: SQLs with inefficiency of more than 20%. Inefficiency percentage of an SQL is derived from the inefficient wait time (wait time other than I/O, CPU, or idle wait time events) divided by the total database time. SQLs with Plan Changes: These SQLs use multiple execution plans. Typically, SQL with multiple execution plans may be a source of SQL performance issues leading to varying, unexpected or poor application performance.
All the above category of SQLs needs an Oracle Exadata Database Administrators attention. SQL Analytics provides this analysis out of the box and helps them quickly narrow down the problem to specific database and SQL. Tuning these SQLs proactively will result in better application performance.
Out of the box, SQL Analytics provides answer to the question which SQLs are consuming most CPU or IO across the entire Oracle Exadata estate. This helps the Oracle Exadata Database Administrator to quickly narrow down where is the most CPU or I/O resources consumed and if needed proactively tune those SQLs or control the resource consumption by using Oracle Database Resource Manager or I/O Resource Management (IORM).
Out of the box, SQL Analytics provides fine-grained performance information and insights about an individual SQL. It provides SQL Text, the database name and host the SQL where the SQL is being executed. Further, it provides Performance Summary, which includes Average Response Time, % change in Average Response Time, Executions Per Hour, Variability and Inefficiency. The Execution Plan Insights shows number of execution plans the SQL has used, the best and worst performing plan, the execution plan which has consumed the most CPU and IO. The Performance Trend by various measure section includes Average Response Time, Executions Per Hour, Active Sessions, I/O Time, CPU Time and Another Wait Time. SQL Analytics provides Activity breakdown of Active Sessions by I/O Wait, CPU Time, and Another Wait. It provides by Response Time Distribution and Response Time Breakdown of the SQL. By getting access to find grained SQL Performance statistics of an individual SQL, Oracle Exadata Database Administrator can find out where the SQL went bad, if there was an execution plan change, which execution plan is good, which execution plan is bad.
One of the most common issues in IT operation is related to application performance, so when apps are slow, fingers often point at the database and DBAs. DBAs should be able to answer if the problem is somewhere else or the problem is in the database. The reality is databases are complex and full of critical information, so database monitoring and troubleshooting needs to become a priority. DBAs should work with the rest of the IT department to provide complete visibility over database related entities to make sure they are not a bottleneck and help the team to faster troubleshoot issues. When it comes to troubleshooting, logs are one of the most important assets but unfortunately, logs spread across a variety of sources. When an issue happens, DBAs need to access different logs across many sources which is not easy and sometimes not possible because of lack of privileges.
To provide answers/solution to issues mentioned above, DBAs can use Oracle Log Analytics as part of Oracle Management Cloud to collect, aggregate and store logs from across all databases (single instance, RAC, ASM, Oracle Exadata). Log Analytics can monitor each separate entity to automatically collect all database related logs through out-of-the-box log sources. There are different capabilities to collect/access logs in Log Analytics such as monitoring log files through Cloud Agent or extract data from Oracle Database through the same agent. Also, users can send data through syslog or API to Oracle Management Cloud. In addition, Log Analytics can automatically parse all events through rich built in parsers. So, collecting and parsing data require almost zero effort. Log Analytics solution supports all the component of Oracle Exadata out-of-the-box including: Database Alert logs, Database Trace logs, Database Incident, Listener log file, Database Audit logs, Database Audit XML logs, ASM alert log, ASM Trace log, OS message log file, OS Secure log file & etc.
Once all the necessary data is in Log Analytics, system administrators can use the different search capabilities (query language or visual builder) to easily search and slice & dice through their data to troubleshoot, get insight, find unknowns and eventually find root-cause of issues. databases can easily get answers to questions such as; how many times a database has restarted? How many login failures happened with last 24 hours? How many instance shutdowns and crashes happened within the last 7 days? Or monitor the hourly trend of logs to identify the change over the past 24 hours. The example below shows how easily users can search for a specific “OS Process ID” from all their “Oracle Database Instance” & “Oracle Database Listener” logs within the last 24 hours.
Oracle Exadata Database administrators can easily build different custom dashboards to monitor their Oracle Exadata health overview. They can edit the content of the dashboards at any time. They can create dashboards which contain both log related data and metric related data from the monitoring services in one place. They can have all the necessary information in one place to correlate events together to get the context around the issue in troubleshooting scenarios. As an example, the user is monitoring “ORA-600” and “ORA-7445” errors at the dashboard level to get visibility over number of individual occurrences as shown below. This dashboard is also visualizing the error distribution over time to be able to easily drill-down into any specific one while troubleshooting.
Database Health Overview Dashboard Through Logs
Dashboards and alerts are typically the first two things that administrators and product owners check to make sure the system is healthy and there is no issue or outages. Users can easily create these kind of dashboards and visualize what is important to them based on their own use cases. Here is the link to how create a dashboard in Oracle Management Cloud: https://docs.oracle.com/en/cloud/paas/management-cloud/logcs/creating-custom-dashboards.html.
Oracle Exadata Database administrators can also take advantage of smart alerts in Log Analytics. They can create complex saved search queries and create alert to get notified about issues in real-time. They can even create anomaly alerts based on their data and use cases to eliminate noises based on calculated baseline for each search. In the example below, users can easily create an alert for authentication failures based on “Linux Secure Logs” since “Authentication Failure” is a known error category in Oracle Management Cloud Log Analytics.
Log Analytics Service is not only able to collect logs from any source/entity; but also, it is aware of associations between those entities; for example, databases, servers, middleware servers etc. In addition, further associations can be defined to customize the topology view of an environment or application. Database Administrators can easily see all the components made their applications and use the topology flow to filter and drill-down into specific entities and its associations for troubleshooting and root-cause analysis. For instance, users can use Log Analytics to quickly find if there are any errors in their database instances as shown in the diagram below. As shown in the entity flow diagram below, there is a database instance which is not normal (colored red), users can select the entity or combination of entities to drill-down into relevant logs and search for any specific error i.e. “ORA*” from the events.
Topology Aware Log Exploration
One of the unique capabilities of Log Analytics is all logs automatically get classified into commonly known/used error categories when it comes to Oracle products like Oracle Database, Oracle Exadata, Weblogic Server. All the relevant logs get associated with labels based on out-of-the-box defined conditions. There are many labels built in the product like “Data Corruption” instead of “ORA-0227”, “Connection Error” instead of “ORA-03106” or “Deadlock” instead of “ORA-00060’.
As mentioned, for Oracle products like databases, Oracle Exadata, Web Logic Servers, etc. All error categorizes, and their trends will be shown automatically if there is any in data coming to Oracle Management Cloud. As shown in the screenshot below, the system automatically finds all the error category, priority, count and trend of each errors. Users not only get visibility over these out-of-the-box, but also can use them to drill-down into relevant data for troubleshooting and root-cause analysis use cases. Users can create their own labels based on simple/complex conditions to enrich their dataset and expand the error category labels for different data types.
Out-of-the-box Error Categories
These error categories (labels) help system Administrators to be able to search faster and easier in troubleshooting and root-cause analysis use cases. For instance, users can search for all “Memory Error” events across log sources in their environment by just searching for label = “memory error” as shown in the first screenshot below or search for label = deadlock in the second screenshot below.
Search for Error Categories Across Different Data Types
Search for Error Categories Across Different Data Types
Using Oracle Log Analytics Machine Learning “Cluster” capability, users can reduce millions of log events into a smaller set of patterns based on common signatures, rapidly troubleshoot problems by quickly identifying outliers and potential Issues and show trend of each clustered group; correlate clustered events that show similar trends. Sometimes, there are unknown issues and DBAs have a hard time to find out where is the best place to start the investigation and how to narrow down the scope of the search to only relevant data. Using Machine Learning Clustering empowers users to automatically find potential issues and outliers (events that have only happened once). These are all out-of-the-box insight provided to users automatically specially when there is no indication or obvious information around the issue/problem for faster troubleshooting and get to the root-cause.
All the above helps the Oracle Exadata Database administrator rapidly troubleshoot Oracle Exadata problems.
While most enterprise IT systems provide a reactive approach to system monitoring, capacity utilization, workload performance and issue resolution, Oracle Management Cloud delivers an automated, proactive and end to end integrated management solution to customers.
Oracle Management Cloud for Oracle Exadata provides complete management solution for Oracle Exadata deployments–Oracle Exadata, Oracle Exadata Cloud Machine and Oracle Exadata Cloud Service. Oracle Management Cloud for Oracle Exadata enables customers to get complete Oracle Exadata visibility, streamline the capacity planning process, automate database performance issue identification while helping to proactively troubleshoot issues. It comes with low management effort, costs than comparable solutions and administrators get best prebuilt functionality for Oracle Exadata, Oracle Database and applications. One can conclude that Oracle Management Cloud provides comprehensive solutions for Oracle Exadata to enable Oracle Exadata administrators and DBAs to maximize performance, optimize capacity and rapidly troubleshoot problems in Oracle Exadata.
Join us for an Oracle webcast where our experts will provide you with the tips and tricks you need to get 360–degree insight into Oracle Exadata's performance.