We’ve noticed several performance tuning sessions being presented and blogged about recently, so we thought we’d add to the collective by going straight to the source for the best database performance tuning techniques: the Exadata Engineering team (they do this for a living!).
And there’s no one more qualified to give performance tuning tips than our own Cecilia Grant, presenter of the popular (yet blink-and-you've-missed-it) "Exadata Performance Diagnostics" theater talk from Oracle OpenWorld SF last year. Cecilia is one of our resident performance tuning experts here at Oracle, and we've asked her to give us ten tips for database performance tuning.
Here we go:
When measuring performance, identify a key application metric that reflects how the application is performing, or that is a good proxy for the end-user experience. This metric should be external to the database – for example, orders processed per second, or the elapsed time of a batch job. Having an application-level metric allows you to take an objective measurement of whether performance is improving where it matters.
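As a minimal sketch of such a metric (the batch job and all names here are hypothetical stand-ins, not part of any Oracle tooling), you might time a batch run and derive orders processed per second:

```python
import time

def orders_per_second(process_batch, n_orders):
    """Time one batch run and derive the key application metric:
    orders processed per second (external to the database)."""
    start = time.perf_counter()
    process_batch(n_orders)  # hypothetical batch job
    elapsed = time.perf_counter() - start
    return n_orders / elapsed

# Stand-in batch job that just burns a little CPU per order.
def fake_batch(n):
    total = 0
    for _ in range(n):
        total += sum(range(100))
    return total

rate = orders_per_second(fake_batch, 10_000)
print(f"key metric: {rate:,.0f} orders/sec")
```

The point is that the number comes from outside the database: you would compute the same metric before and after each tuning change.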
Based on the key metric, define the success criteria. This allows you to measure progress, and also lets you know when you can stop. It is sometimes tempting to keep making changes, but if you've already achieved your goal and further changes will not make a material impact, then stop.
There should only be one key metric - if you have too many metrics, some may improve, while others may regress, so you won't be able to cleanly evaluate performance changes. If you must have multiple measurements, then measure them in such a way that they are completely independent of each other and can be evaluated separately.
It is critical to have a clear definition of the performance problem: specifically, what is slow, and how slow is it?
As part of defining the performance problem, also understand the scope – is it limited to a set of queries, or a set of users? Or is it more widespread, affecting all users in the database instance, or perhaps even multiple databases? By understanding the scope of the problem, you can then use diagnostic data that matches the scope of the problem. Using statistics for the entire system may not be relevant if the problems are only limited to a few users.
Similarly, any solution should also match the scope of the problem. If the problem is limited to a few queries or a few users, then the solution should be focused on only those users or queries – do not, for instance, change an init.ora parameter that can potentially impact all users of the system negatively.
First of all, this assumes that the performance problems are reproducible to begin with, or at least reproducible within a known tolerance range. If the problems are not reproducible, then you won't be able to measure the effects of any changes that are made. So go back and structure a workload or a test case in such a way that the results are reproducible. Once you've ensured that you have a reproducible performance issue, then only change one thing at a time, to help you identify whether or not the change helped.
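The reproducibility check above can be sketched as a toy harness (the workload, run count, and tolerance here are assumed placeholders): run the workload several times and verify that elapsed times stay within a relative tolerance of the median before trusting any before/after comparison.

```python
import statistics
import time

def is_reproducible(run_workload, runs=5, tolerance=0.5):
    """Run the workload several times and check that the spread of
    elapsed times (relative to the median) stays within tolerance,
    so later before/after comparisons are meaningful."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        run_workload()
        times.append(time.perf_counter() - start)
    median = statistics.median(times)
    spread = (max(times) - min(times)) / median
    return spread <= tolerance, times

# Stand-in workload; replace with your real test case.
stable, run_times = is_reproducible(lambda: sum(range(200_000)))
print("reproducible within tolerance:", stable)
```

Only once the workload passes a check like this does it make sense to change one thing at a time and measure again.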
There may be times when you need to change multiple things in the interest of time (especially when you want to reduce required downtime or outages). In that case, change things that do not impact the same area and that can be evaluated separately. For example, you can implement a SQL profile that affects a single SQL statement alongside a mid-tier change that controls connection pooling behavior.
Remember, performance tuning is an iterative process. People tend to get impatient and want to see immediate improvements. But realize that some changes you make may help, while others may not. In some cases, there may be a bottleneck right behind the one you're fixing. This doesn't mean the fix doesn't help – it simply means further improvements are needed. Analyze the performance statistics provided by the database, OS and Exadata to help determine where the bottlenecks are, and the areas to improve. The key application metric is simply an overall measure to help you gauge performance, while the individual statistics are there to help focus the performance tuning efforts.
This is also why defining the success criteria up front, as stated in the first tip, is important. Continuing to make changes after the goal has been met risks causing a regression.
The Oracle database provides a large number of performance statistics and performance reports. The AWR report is the most commonly used performance diagnostic tool - and there's actually a whole family of AWR reports (global, single instance, compare period, global compare period, AWR SQL report, PDB report). The AWR report tends to be useful when problems are widespread.
If the problems are affecting a subset of users or queries, the AWR SQL report, the SQL Monitor report, and even the SQL Details report tend to be extremely useful to look at.
The ASH report can be used in either case – it has active session history for the entire instance/database, and it can also be filtered based on specific users, SQL statements, or other dimensions of the session.
The Exadata sections available in the AWR report show the statistics as maintained and collected by the storage cells, which means that they show activity on the storage cells, regardless of instance or database. So the scope of the Exadata statistics would typically be larger than the scope of the AWR report(s), and as such, they are not included in the AWR PDB report or the AWR SQL report.
The performance reports summarize performance data that is readily available in the database, through dynamic performance views, also known as v$ views. However, there are different types of measurements and statistics exposed by the database views:
a. measured/counted - this includes most of what is exposed in AWR, including the Exadata information.
b. derived (metrics) - these statistics are derived or calculated based on the measured/counted statistics to come up with per second or per transaction rates, and are typically exposed in v$*metric* views (such as v$sysmetric, v$sysmetric_history, etc).
c. sampled - such as ASH. Active sessions are sampled at regular intervals and can be used to determine how time in the database is spent. ASH is an extremely useful utility, however, because it is sampled, you need to use it wisely. Specifically, beware of "Bad ASH Math" as has been presented extensively by John Beresniewicz and Graham Wood.
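To illustrate the sampling caveat, here is a toy simulation (not the actual ASH mechanism, and the session intervals are invented) of estimating DB time by counting once-per-second samples of active sessions. Note how the short 0.4-second session contributes no samples at all – exactly the kind of distortion "Bad ASH Math" warns about:

```python
# Sessions active over known (start, end) intervals, in seconds.
active_intervals = [(0.0, 7.3), (2.0, 2.4), (5.0, 12.1)]
actual_db_time = sum(end - start for start, end in active_intervals)

SAMPLE_INTERVAL = 1.0  # sample active sessions once per second
sample_count = 0
t = 0.5                # fixed sample times: 0.5, 1.5, ..., 12.5
while t < 13.0:
    sample_count += sum(1 for start, end in active_intervals
                        if start <= t < end)
    t += SAMPLE_INTERVAL

# Each sample of an active session is counted as SAMPLE_INTERVAL
# seconds of DB time.
estimated_db_time = sample_count * SAMPLE_INTERVAL
print(f"actual: {actual_db_time:.1f}s, estimated: {estimated_db_time:.1f}s")
# The 0.4s session between t=2.0 and t=2.4 falls between samples,
# so it is invisible to the estimate (estimated 14.0s vs actual 14.8s).
```

Sampled data is great for finding where time is spent in aggregate, but it cannot be used to count events or to see activity shorter than the sample interval.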
On Exadata, there are additional data sources as well, such as:
a. ExaWatcher - this contains granular data (every 5 seconds), and includes both OS statistics and cellsrv statistics
b. cell metric history - this contains both cumulative and derived (per second, per request) statistics, as measured by the cell software
Use DB time. Even though the Exadata statistics are exposed in the AWR report, confirm that you're actually facing an I/O issue first, as reflected by the database wait events, before diving into the Exadata statistics.
Continue practicing good database design principles. We see more and more cases of applications that ignore good database design principles, as most of the time, users are able to get away with it due to the smart features that Exadata provides. However, you will get more from your system if you follow good database design principles. For example, just because it is running on Exadata, it doesn't mean everything should be a smart scan. Review your schema to see if indexes or partitioning are appropriate based on the access patterns of the SQL statements.
Also, having better storage on Exadata is not going to help address issues that aren't storage related. For example, poor connection management, or repeated parsing are application-level issues that will still occur even with a superior storage solution. Similarly, row by row processing (or as coined by Tom Kyte, "Row by row = slow by slow"), has tremendous overhead compared to set processing, and is not necessarily going to be improved by a better storage solution.
A badly written SQL statement will execute poorly - regardless of the environment that it's running on. The cost-based optimizer can only attempt to find the optimal execution plan of the SQL as it is written. Running a badly written SQL statement on Exadata may give you a boost in terms of offloaded processing, or faster I/O, but it will still not perform as well as a well-written SQL statement with a good execution plan.
For example, a SQL statement that accesses a table multiple times when a single pass would suffice results in unnecessary extra processing. Another example would be applications that execute SQL statements without filter predicates and instead rely on the application to filter the data. In such a case, it is better to reduce the data set up front in the database, rather than relying on the application to filter it – relying on the application typically results in the database processing unnecessarily large data sets.
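The predicate point can be illustrated with a small language-agnostic sketch (a toy in-memory "table"; all names are hypothetical). Both approaches return the same answer, but filtering at the source moves far fewer rows to the application:

```python
# Toy data set standing in for a table of orders.
rows = [{"id": i, "region": "EU" if i % 4 == 0 else "US"}
        for i in range(1000)]

def query_with_predicate(table, region):
    """Filter at the source: only matching rows are 'sent' to the app."""
    return [r for r in table if r["region"] == region]

def query_without_predicate(table):
    """No predicate: every row is 'sent'; the app must filter."""
    return list(table)

pushed = query_with_predicate(rows, "EU")
shipped = query_without_predicate(rows)
app_filtered = [r for r in shipped if r["region"] == "EU"]

# Same result either way...
assert pushed == app_filtered
# ...but a quarter of the rows move when the filter is applied up front.
rows_moved_pushdown = len(pushed)    # 250
rows_moved_app_side = len(shipped)   # 1000
```

On a real system the gap is larger still, since a predicate in the SQL also lets the database use indexes, partition pruning, or (on Exadata) smart scan offload.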
Oftentimes, we see users make changes based on gut feel, making assumptions on where the issues are, without looking at the multitude of performance data available. Or, users choose to ignore the available data, especially if it doesn't support what they believe is occurring. Instead, use the available data to formulate a hypothesis on why the performance problem is occurring, and then make your tuning recommendations based on that.
Keep up with the new features that the database provides, and use them wisely rather than doing things the way they've always been done. For example, with Oracle Database 19c's real-time statistics and automatic indexing, the database can create indexes and keep statistics current using machine learning, reducing the reliance on manual intervention such as locking statistics or using stored outlines.
Similarly, if you have an opportunity to use a best-in-class solution from a vendor, such as Exadata for the Oracle Database, use it! The system has been designed by experts; it removes a lot of the guesswork and reduces mundane tasks, allowing you to focus on higher-value projects and innovation.
So there you have it – we hope these 10 tips help. If you have questions or comments, Cecilia will be joining Gavin and Cris at the Exadata Office Hours chat in March, so make sure you register for it; a link to the recording will also be posted there in case you miss it.
Gavin is a product manager in Oracle’s Exadata team, with a focus on software and hardware roadmap. Prior to the Exadata team, Gavin was a founding member of the team responsible for launching the industry’s first on-premises public cloud technology - Oracle Cloud at Customer. In his 15 years in Oracle Product Management, Presales and Consulting roles, Gavin has developed a robust understanding of all things Oracle, helping customers architect and implement a variety of infrastructure and application technologies.