Host Observability with Stack Monitoring

September 29, 2022 | 6 minute read
Aaron Rimel
Principal Product Manager, Observability and Management

We are pleased to announce the general availability of host monitoring with Stack Monitoring, supporting both OCI (Oracle Cloud Infrastructure) Compute instances and on-premises host servers. Stack Monitoring gives DevOps engineers the tools to quickly identify and alert on problems such as a filesystem running out of disk space or high CPU utilization. Operations teams can use the Enterprise Summary view to review availability status across their fleet of OCI Compute and on-premises hosts. Stack Monitoring focuses on discovery and monitoring of applications and application stack technologies, such as Oracle Database and E-Business Suite. With the addition of host monitoring, Stack Monitoring provides greater visibility into the health of the entire application stack, helping teams proactively identify issues such as host resource starvation that may lead to application failure.

Getting started with host monitoring

Full monitoring enables the collection of the host's multi-dimensional metrics. After configuring the management agent, run the OCI Command Line Interface (CLI) discovery-job create command to discover the OCI Compute instance or on-premises host as a new resource of type 'host'. Once the discovery job completes, full monitoring of the host begins.

If you have resources such as an Oracle Database or WebLogic Server running on the host, discover these next using the resource discovery workflows. Once these resources have been discovered, run the CLI resource associate command to create associations between the resources and their host.

Host homepages provide quick access to the data that matters

Stack Monitoring provides a specially curated homepage with quick access to key performance indicators. This homepage includes details on the health and status of the host. Quickly identify performance issues such as high memory utilization using the out-of-the-box Charts. Easily review filesystem utilization across all mount points using Tables. Review any open alarms and click a link to view an alarm in a new tab in greater detail. Properties provide quick access to critical information such as the OS version, a summary of open alarms by severity, and the date and time of the host's last status change.

Figure 1: Homepages provide an overview of the status and performance of a host

Review the overall health of a host with homepage Charts

The charts within Stack Monitoring have been carefully chosen and organized to provide quick access to critical KPIs (Key Performance Indicators) of a host. Quickly assess whether the host is running out of CPU or memory, or whether filesystems are filling up.

Figure 2: Charts provide quick access to key performance indicators

When monitoring Linux memory, it is imperative to monitor Swap Utilization. High swap usage often indicates memory pressure and can result in slower performance, because physical memory is much faster than swap. Use Stack Monitoring’s charts to view current Swap Utilization or to look back over a period of time and review past performance. The charts make it easy to see whether swap utilization is slowly increasing, a trend that is difficult to spot with point-in-time commands such as free.

Figure 3: Easily view Swap Utilization over time
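
To make the Swap Utilization figure concrete, here is a minimal Python sketch of how such a percentage can be derived from the SwapTotal and SwapFree fields that Linux reports in /proc/meminfo. It is an illustration only, not a description of how Stack Monitoring collects the metric; the function name and the sample values are hypothetical.

```python
def swap_utilization_percent(meminfo_text: str) -> float:
    """Return swap utilization (%) from /proc/meminfo-style text.

    Hypothetical helper for illustration; not part of Stack Monitoring.
    """
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # /proc/meminfo reports kB
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return 100.0 * (total - free) / total


# Made-up sample values; on a Linux host, read /proc/meminfo instead.
sample = """MemTotal:       16000000 kB
SwapTotal:       4000000 kB
SwapFree:        3000000 kB
"""
print(f"Swap utilization: {swap_utilization_percent(sample):.1f}%")
```

A single reading like this only shows the current value, which is why a chart of the metric over time is better suited to spotting a slow upward trend.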

Visibility into all dimensions of the metrics with the homepage Tables

Stack Monitoring Tables display all dimensions and values of a given metric. This is helpful, for example, when reviewing Filesystem Usage in GB for all filesystems on a host without needing to log on to the host. Disk activity is an indicator of how busy a host is. Disk Activity Summary reports the number of read, write, and total operations per second for all disks on a host.

Figure 4: Tables display all dimensions and values of metrics
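
As a rough illustration of what a per-disk activity table contains, the Python sketch below turns two samples of cumulative read and write counters into reads, writes, and total operations per second, with one row per disk (the dimension). The DiskCounters type, the function name, and the sample numbers are all hypothetical; this is not how Stack Monitoring itself computes Disk Activity Summary.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """Hypothetical sample of cumulative I/O counters for one disk."""
    reads: int   # completed read operations since boot
    writes: int  # completed write operations since boot


def activity_summary(before, after, interval_seconds):
    """Return per-disk read, write, and total operations per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    summary = {}
    for disk, start in before.items():
        end = after[disk]
        reads = (end.reads - start.reads) / interval_seconds
        writes = (end.writes - start.writes) / interval_seconds
        summary[disk] = {
            "reads_per_sec": reads,
            "writes_per_sec": writes,
            "total_per_sec": reads + writes,
        }
    return summary


# Made-up counter samples taken 60 seconds apart.
before = {"sda": DiskCounters(1000, 500), "sdb": DiskCounters(200, 50)}
after = {"sda": DiskCounters(1600, 800), "sdb": DiskCounters(260, 50)}
for disk, row in activity_summary(before, after, 60).items():
    print(disk, row)
```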

Understanding what is running on the host

Using the Related Resources page, easily identify whether a resource such as an Oracle Database, WebLogic Server, or concurrent manager runs on a host. This visibility is important when triaging a performance issue because it shows exactly what is running on the host. Additionally, the Related Resources page allows for easy navigation up and down the application stack.

Figure 6: Identify what resources are running on a host

Clicking a resource’s Name on this page navigates to the resource’s homepage. For example, when investigating high memory usage on a host, this page shows that the host is used by a WebLogic Server. Use the links to quickly navigate to the WebLogic Server's homepage and investigate metrics such as Memory Usage to help identify any potential issues. Conversely, if your JVM is running slowly, use the WebLogic Server's Related Resources page to identify the host on which the WebLogic Server is running. Then easily navigate to the host's homepage and review its key performance indicators.

Putting the metrics to work with alarms

Monitoring a host is more complex than simply setting a threshold on the amount of RAM used. As such, monitoring must be capable of measuring more than a single metric dimension. Stack Monitoring provides a large number of rich, multi-dimensional metrics that are collected as soon as the host has been promoted. These multi-dimensional metrics allow a single metric to have its data categorized and individual alarms created on each dimension. For example, with a single metric such as Filesystem Utilization, it may be prudent to set a warning threshold of 90% and a critical threshold of 95% for a filesystem that is not expected to grow. However, the criticality of the root filesystem may warrant a 70% warning threshold and an 85% critical threshold. These metric dimensions provide the flexibility to see metric data at a finer granularity and to set thresholds that meet business requirements. Other examples of metrics provided by Stack Monitoring include CPU Utilization, Disk Activity, and Network Activity.

Figure 7: Critical filesystem alarm notification
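
The per-dimension threshold idea above can be sketched in a few lines of Python. This is only an illustration of the logic using the example percentages from this post; it is not the OCI alarm definition syntax, and the names DEFAULT_THRESHOLDS, OVERRIDES, and classify, along with the /u01 mount point, are hypothetical.

```python
# (warning %, critical %) applied to any filesystem without an override.
DEFAULT_THRESHOLDS = (90.0, 95.0)

# Per-dimension overrides keyed by mount point; root gets stricter limits.
OVERRIDES = {"/": (70.0, 85.0)}


def classify(mount_point: str, utilization_percent: float) -> str:
    """Return OK, WARNING, or CRITICAL for one filesystem dimension.

    Assumes a threshold is breached when utilization is greater than
    or equal to it; a real alarm may use a different comparison.
    """
    warning, critical = OVERRIDES.get(mount_point, DEFAULT_THRESHOLDS)
    if utilization_percent >= critical:
        return "CRITICAL"
    if utilization_percent >= warning:
        return "WARNING"
    return "OK"


# The same 72% reading is a warning on root but fine on a static filesystem.
for mount_point in ("/", "/u01"):
    print(mount_point, classify(mount_point, 72.0))
```

Keeping the overrides in one mapping means a single metric can carry different severities per mount point, which is the flexibility that metric dimensions provide.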

 

With the addition of host monitoring, Stack Monitoring provides the necessary top-down visibility into an application stack. Using Stack Monitoring, monitor everything from the host to the EBS application running on it and everything in between. The multi-dimensional host metrics provide peace of mind that your application is running as expected. Because Stack Monitoring provides curated charts and tables out of the box, the operations team can focus on business-critical applications rather than on building monitoring.

Get started today!

Getting Started

 

Resources

Command Line Interface (CLI)

Updating Application Topology

Resource Discovery

Host Metric Reference

Recent Blogs

OCI Stack Monitoring

Support for Multitenant Databases
