In this blog post I will discuss best practices I shared with a DevOps team responsible for several Exadata systems at an Oracle customer. These systems provide business-critical services to a large number of users and one of the key objectives for the DevOps team was to be able to rapidly identify issues and prevent impact on users. At their disposal was the typical log data you would encounter in such environments including the OS system logs, DB alert and trace logs, and ExaWatcher logs to get Exadata utilization data.
The ExaWatcher tool is part of the Exadata software stack and is responsible for collecting performance data from the individual Exadata nodes. The collected logs contain a wide range of system metrics:
In a recent post in the Oracle Cloud Customer Connect forum, I provided a custom parser / source for ExaWatcher logs and showed step-by-step instructions on how to get the logs from the Exadata nodes into OCI Logging Analytics. As an example, that forum post analyzes the Vmstat.ExaWatcher logs, but the techniques described there and in previous forum posts can be used to process all the other ExaWatcher logfile types as well.
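To make the CPU metric concrete: standard Linux `vmstat` output ends with the CPU columns `us sy id wa st`, and utilization can be approximated as 100 minus the idle percentage. The sketch below is only an illustration under that assumption. The sample line is made up, the actual Vmstat.ExaWatcher files may carry timestamps or header lines that the custom parser takes care of, and `parse_vmstat_line` and `cpu_utilization` are hypothetical helpers, not the parser from the forum post.

```python
# Column order of standard Linux vmstat output (an assumption; ExaWatcher files may differ).
VMSTAT_COLUMNS = [
    "r", "b", "swpd", "free", "buff", "cache", "si", "so",
    "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st",
]


def parse_vmstat_line(line):
    """Map one vmstat data line to a dict of integer fields."""
    values = line.split()
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError(
            f"expected {len(VMSTAT_COLUMNS)} fields, got {len(values)}"
        )
    return dict(zip(VMSTAT_COLUMNS, (int(v) for v in values)))


def cpu_utilization(sample):
    """Approximate CPU utilization in percent as 100 minus idle time."""
    return 100 - sample["id"]


# Made-up sample line, for illustration only.
sample = parse_vmstat_line(
    "3 0 0 812345 10240 204800 0 0 12 48 1500 2300 62 9 27 2 0"
)
print(cpu_utilization(sample))
```

In Logging Analytics the equivalent field extraction is done by the parser, so a calculation like this would normally live in the log query rather than in a script.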
From the ingested data we were able to create dashboards like this one:
Together with the system logs and various DB logs, the DevOps team has all logs available in one place, which is ideal for analyzing and troubleshooting Exadata issues or, even better, proactively avoiding them.
Now, what if the DevOps team wants to be notified when CPU utilization grows beyond a certain threshold value?
I will use this request as an example in this blog post and show step by step how to achieve it. The same approach works for all kinds of other log queries, not just for Exadata utilization metrics.
The overall steps to achieve this are the following:
In the Metrics Explorer view, select Alarm Definitions on the left; the Create Alarm wizard will open. Give the alarm a name, then select the metric namespace, the metric, and the interval that should be used to check for the alarm condition:
Set the trigger rule to define when you want to be notified:
In Notifications, a Topic describes the notification channel to be used. Select Create Topic if you don't have one yet:
You can choose from various destinations for the notification, such as PagerDuty, Slack, or email.
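The wizard fields map roughly onto an alarm definition whose trigger rule is a Monitoring Query Language (MQL) expression of the form `metric[interval].statistic() operator threshold`. The following is a minimal sketch of such a definition as plain data. The namespace, metric name, threshold, and topic placeholder are all assumptions made for illustration, and the field names follow the OCI Monitoring alarm model as I understand it, so check them against the API reference before relying on them.

```python
import json


def build_alarm_query(metric, interval, statistic, operator, threshold):
    """Compose an MQL-style trigger rule, e.g. 'CpuUtilization[5m].mean() > 80'."""
    return f"{metric}[{interval}].{statistic}() {operator} {threshold}"


# All names and values below are hypothetical placeholders.
alarm_definition = {
    "displayName": "exadata-cpu-utilization-high",
    "namespace": "exadata_utilization",  # hypothetical metric namespace
    "query": build_alarm_query("CpuUtilization", "5m", "mean", ">", 80),
    "severity": "CRITICAL",
    "pendingDuration": "PT5M",  # ISO 8601 duration: 5 minutes
    "destinations": ["<notification-topic-ocid>"],  # placeholder, not a real OCID
    "isEnabled": True,
}

print(json.dumps(alarm_definition, indent=2))
```

Keeping the query in a small helper makes it easy to reuse the same pattern for other metrics or thresholds.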
Save the Alarm Definition and you are all set. Review the status and history of Monitoring Alarms in Alarm Definitions:
With that, we have finished all steps to set up monitoring for Exadata Utilization as requested.
As mentioned above, the steps described here can be used for any other Logging Analytics query you want to be notified about, e.g. certain ORA-XXXXX error messages in the DB logs or OS errors in /var/log/messages.
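As a rough illustration of what such a query counts, the sketch below tallies ORA error codes in a handful of log lines. It assumes the usual `ORA-` prefix followed by five digits; the sample lines are made up, and `count_ora_errors` is a hypothetical helper. In practice Logging Analytics does this matching in the log query itself.

```python
import re
from collections import Counter

# Assumes ORA error codes of the form ORA- followed by five digits.
ORA_ERROR = re.compile(r"\bORA-\d{5}\b")


def count_ora_errors(lines):
    """Count occurrences of each ORA error code across the given log lines."""
    return Counter(code for line in lines for code in ORA_ERROR.findall(line))


# Made-up sample lines, for illustration only.
sample_lines = [
    "ORA-00600: internal error code",
    "Errors in file /path/to/trace.trc:",
    "ORA-01555: snapshot too old",
    "ORA-00600: internal error code",
]

counts = count_ora_errors(sample_lines)
print(counts.most_common())
```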
To learn more about Logging Analytics Cloud Service, visit our product page where you can find links to our Free Trial, related products and additional information.