Announcing enhanced alarms with custom messages and dashboard

June 18, 2024 | 5 minute read
Satyendra Kuntal
Principal Product Manager

Oracle Cloud Infrastructure (OCI) Monitoring is happy to announce enhancements to alarm definition and alarm visualization capabilities. You can now customize the alarm body, summary, and notification subject to include the specific information you want, making it easier and faster to understand the context that triggered the alarm. You also get an enhanced single view of all alarms that consolidates and summarizes their information for quick decision making.

Previously, admins, ITOps engineers, and DevOps engineers, who are the first responders when an alarm fires, couldn't create actionable instructions in alarm messages because the alarm body could include only static values. They were unable to add dynamic variables that are replaced with real-time dimension values before the notification is sent.

Additionally, there were no fields for defining the alarm subject or summary. The alarm summary was static, with the following format: Alarm <Dimension key> is in <alarm status> state; because the resources with dimensions listed below meet the trigger rule <query>, with a trigger delay of <Trigger delay minutes>. The alarm subject line also had a static format: <Type> | <Severity> |<Name of the alarm>|Timestamp. For example, "OK_TO_FIRING | Critical | CPU Utilization Alert|2023-10-26T13:48:00Z".

Moreover, key pieces of information, such as the affected resources (or dimensions), were not easily visible unless the engineer scrolled through the notification to find the affected metric stream. This delayed responses or required engineers with deeper technical knowledge of how notification messages work.

We heard your need for notifications that are more human-readable, intuitive, and actionable. So, we're now enabling you to configure the alarm body, summary, and subject line with values such as affected resources and metric values when you define an alarm. With these enhancements, you can not only view real-time metric values in the alarm body, summary, and notification subject when an alarm fires, but also easily understand the context across the full list of alarms.

Customize alarm body, summary and notification subject

We're providing the ability to construct your own alarm message with an alarm body, subject, and summary. You can now include context-specific information such as the metric name, the resource name, the metric value that triggered the alarm, and the alarm condition. The values of these variables are populated dynamically in the alarm content, conveying a meaningful, actionable message that makes it obvious which metrics or severity a particular alarm is being triggered for.

As part of this enhancement, we're adding two new fields to the alarm definition flow: notification subject and alarm summary. You can also add variables that are part of the alarm notification JSON to the existing alarm body field and to the new fields.

To customize the alarm content, you can use a list of dynamic variables. When the alarm message is sent with a notification, each variable is replaced with the actual value drawn from the corresponding alarm notification payload. For example, if the alarm body is entered as "<severity> alarm triggered because threshold got breached due to <metric values> at <timestamp>", where each placeholder is the corresponding variable, it's rendered as "CRITICAL alarm triggered because threshold got breached due to [CpuUtilization[1m].mean():92] at 2023-08-15T19:51:00". The variables are supported on all notification channels, such as email, Slack, and Functions.
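The substitution step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OCI's implementation: the payload field names (severity, metricValues, timestamp) are hypothetical stand-ins for the notification payload, and the $name placeholders are Python's string.Template syntax, chosen only for this sketch. Refer to the technical documentation for the variable syntax OCI actually accepts.

```python
from string import Template

# Hypothetical stand-in for fields in the alarm notification payload;
# the real OCI payload field names may differ.
payload = {
    "severity": "CRITICAL",
    "metricValues": "[CpuUtilization[1m].mean():92]",
    "timestamp": "2023-08-15T19:51:00",
}

# $-placeholders are Python string.Template syntax, used here only for illustration.
body_template = Template(
    "$severity alarm triggered because threshold got breached "
    "due to $metricValues at $timestamp"
)


def render(template: Template, values: dict) -> str:
    """Replace each placeholder with the matching payload value.

    safe_substitute leaves unknown placeholders in place instead of raising
    KeyError, so a missing field does not block the message from rendering.
    """
    return template.safe_substitute(values)


print(render(body_template, payload))
```

Running it prints the same rendered body as the example in the text.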

Notable details

This update also has the following features:

Group notification option: When the option to send a single grouped notification for all firing metric streams is enabled, only the first dimension can be added. For split notifications, each notification carries only one metric stream, so supporting a single variable works without issue.

Character limit: The alarm body is limited to 1,000 characters. The summary is limited to 500 characters for quick review, and the notification subject is limited to 160 characters. OCIDs aren't supported in the subject line, because they take up 100 or more characters.

Metric value: We return only the first metric value of each subquery. For example, if the query is CpuUtilization.mean() > 75 || MemoryUtilization.mean() > 48, CpuUtilization is 76, and MemoryUtilization is 38, then we return [76, 38].
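The character limits above lend themselves to a client-side pre-check before you save an alarm definition. The following Python sketch is hypothetical: the function name is illustrative, it counts characters with len() (Unicode code points), which may not match how the service counts, and it flags OCIDs in the subject with a simple heuristic that looks for the "ocid1." prefix OCIDs begin with.

```python
from typing import List

# Limits as stated in the announcement.
MAX_LENGTHS = {"body": 1000, "summary": 500, "subject": 160}


def check_alarm_text(body: str, summary: str, subject: str) -> List[str]:
    """Return a list of problems; an empty list means the text passes this check."""
    problems = []
    for field, text in (("body", body), ("summary", summary), ("subject", subject)):
        limit = MAX_LENGTHS[field]
        if len(text) > limit:
            problems.append(f"{field} is {len(text)} characters; limit is {limit}")
    # Heuristic: OCIDs start with "ocid1." and are not supported in the subject line.
    if "ocid1." in subject:
        problems.append("subject contains an OCID")
    return problems


print(check_alarm_text("CPU is high", "Check the VM", "CRITICAL | CPU Utilization Alert"))
print(check_alarm_text("x" * 1001, "ok", "Alert for ocid1.instance.oc1..example"))
```

The first call passes and returns an empty list; the second reports both the oversized body and the OCID in the subject.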
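The metric-value rule can be illustrated with the same example query. In this hypothetical Python sketch, each subquery's results are given as a list of values (their ordering is an assumption), the OR condition is checked against each subquery's first value for simplicity, and the reported metric values keep only the first value of every subquery, whether or not that subquery breached its threshold. The Subquery type and evaluate function are illustrative names, not part of any OCI SDK.

```python
from typing import List, NamedTuple, Tuple


class Subquery(NamedTuple):
    expression: str      # for example "CpuUtilization.mean()"
    threshold: float     # the value after ">" in the query
    values: List[float]  # values returned for this subquery (assumed ordering)


def evaluate(subqueries: List[Subquery]) -> Tuple[bool, List[float]]:
    """Evaluate an OR (||) of "> threshold" conditions.

    Returns (fired, metric_values), where metric_values holds only the
    first value of each subquery, in query order.
    """
    first_values = [q.values[0] for q in subqueries if q.values]
    fired = any(bool(q.values) and q.values[0] > q.threshold for q in subqueries)
    return fired, first_values


query = [
    Subquery("CpuUtilization.mean()", 75, [76]),
    Subquery("MemoryUtilization.mean()", 48, [38]),
]
fired, metric_values = evaluate(query)
print(fired, metric_values)  # True [76, 38]
```

MemoryUtilization (38) does not breach its threshold of 48, yet its first value still appears in the reported list, matching the [76, 38] result described above.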

Alarm definition with ability to add dynamic variables in alarm body, summary and notification subject

Alarms status dashboard

We understand the pain of viewing a list of firing alarms that lacks context and requires you to open each individual alarm to understand what happened. We're now providing an alarms status dashboard with a holistic view that gives you quick, consolidated, summarized information. The Alarm Status view is enriched with brief alarm summaries, the resource types that triggered alarms, and other relevant details, so you get meaningful, succinct information for making actionable decisions from the summarized view. It's a single pane of glass for all firing alarms.

The alarms status dashboard creates an alarm triage view that lists alarms across services. It shows a summarized count of alarms by severity along with alarm details such as the alarm name, alarm subject and summary, count of impacted metric streams, and severity. You can customize the dashboard by selecting the columns you want to see in the list. You can also filter alarms by criteria such as metric namespace, alarm severity, resource group, and the relative time window in which the alarm was triggered, such as the last hour.
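To make the triage view concrete, the following Python sketch applies the same kind of filtering and severity counting to an in-memory list of firing alarms. It is purely illustrative: the FiringAlarm record, its fields, and the sample data are hypothetical, and the code does not call any OCI API. The console dashboard does this work for you.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class FiringAlarm:
    # Hypothetical record; the dashboard's real data model may differ.
    name: str
    severity: str
    namespace: str
    triggered_at: datetime
    impacted_streams: int


def filter_alarms(
    alarms: List[FiringAlarm],
    now: datetime,
    window: timedelta,
    severity: Optional[str] = None,
    namespace: Optional[str] = None,
) -> List[FiringAlarm]:
    """Keep alarms triggered within `window` of `now` that match the optional filters."""
    cutoff = now - window
    return [
        a for a in alarms
        if a.triggered_at >= cutoff
        and (severity is None or a.severity == severity)
        and (namespace is None or a.namespace == namespace)
    ]


def count_by_severity(alarms: List[FiringAlarm]) -> Counter:
    """Summarized count of alarms per severity."""
    return Counter(a.severity for a in alarms)


# Hypothetical sample data.
now = datetime(2024, 6, 18, 12, 0, tzinfo=timezone.utc)
alarms = [
    FiringAlarm("cpu-high", "CRITICAL", "oci_computeagent", now - timedelta(minutes=30), 3),
    FiringAlarm("memory-high", "WARNING", "oci_computeagent", now - timedelta(minutes=10), 1),
    FiringAlarm("disk-latency", "CRITICAL", "oci_blockstore", now - timedelta(hours=3), 2),
]

last_hour = filter_alarms(alarms, now, timedelta(hours=1))
print([a.name for a in last_hour])         # ['cpu-high', 'memory-high']
print(dict(count_by_severity(last_hour)))  # {'CRITICAL': 1, 'WARNING': 1}
```

Filtering first and counting second mirrors how the severity summary reflects whatever filters are active in the view.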

Enhanced alarms status view with consolidated and summarized alarm information to better understand alarm context

Getting Started

The enhanced ability to customize alarm message content and view a summarized list of alarms is available in all commercial regions. These features don't introduce breaking changes to your existing alarm definitions that use static content. For more details, refer to the technical documentation.

We welcome you to sign up for the Oracle Cloud Free Trial or sign in to your account to try Oracle Cloud Infrastructure Monitoring on your infrastructure. We'd love to hear from you. If you have any questions or feedback, contact us.

