Organizations are moving VMware estates to Oracle Cloud VMware Solution for better control and security, on-demand scale, and a cost-saving, subscription-based model. But after the workloads have moved to a VMware software-defined data center (SDDC) on Oracle Cloud Infrastructure (OCI), VMware administrators might wonder what options they have to monitor the ESXi hosts that are part of the VMware Solution SDDC.
OCI covers this area as part of the Oracle Cloud VMware Solution provisioning process by providing Infrastructure Health Monitoring for all the hosts at no extra cost. The oci_compute_infrastructure_health metric is automatically created when the ESXi hosts boot, and the OCI Monitoring service manages all the alarms.

How does this monitoring work?
While an ESXi host is active, infrastructure health metrics for CPU, memory, and disk usage are sent at regular intervals to the OCI Monitoring service. If an alarm is raised on any of these infrastructure components, an automated email notification is sent to the tenancy administrator (by default) so that they can act on it. We recommend creating user-defined notifications for all infrastructure metrics and ensuring that the notifications are sent to multiple Oracle Cloud VMware Solution VMware admins.

You can see the same error in the Oracle Cloud Console: open the navigation menu, select Compute, then Instances, and select the instance name.

Generating custom metrics, alarms, and notifications
VMware Solution administrators can also create custom queries for oci_compute_infrastructure_health metrics to generate alarms and notifications as required using the OCI Observability services.
At the time of this post, the OCI Notifications service can deliver notifications through email, SMS, functions, PagerDuty, Slack, and custom URLs. OCI Monitoring also plugs into existing Grafana dashboards using the OCI Grafana plugin.
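As a rough illustration of a custom query against the oci_compute_infrastructure_health metrics, the sketch below assembles a Monitoring Query Language (MQL) expression as a plain string. The helper name build_mql, the metric name health_status, the resourceId dimension, and the example OCID are assumptions made for illustration; check the metric and dimension names available in your tenancy before relying on them.

```python
from typing import Mapping, Optional

# Namespace named in this post; the metric names inside it are assumptions here.
NAMESPACE = "oci_compute_infrastructure_health"


def build_mql(
    metric: str,
    interval: str = "1m",
    statistic: str = "max",
    dimensions: Optional[Mapping[str, str]] = None,
    threshold: Optional[str] = None,
) -> str:
    """Build an MQL expression of the form
    metric[interval]{dimension = "value"}.statistic() <threshold>.

    This is a hypothetical helper, not part of any OCI SDK.
    """
    query = f"{metric}[{interval}]"
    if dimensions:
        # Sort the keys so the generated query text is deterministic.
        filters = ", ".join(
            f'{name} = "{value}"' for name, value in sorted(dimensions.items())
        )
        query += "{" + filters + "}"
    query += f".{statistic}()"
    if threshold:
        # An alarm query adds a trigger rule such as "> 0".
        query += f" {threshold}"
    return query


# Example: fire when one host (placeholder OCID) reports any health issue.
alarm_query = build_mql(
    "health_status",  # assumed metric name
    interval="5m",
    dimensions={"resourceId": "ocid1.instance.oc1..example"},
    threshold="> 0",
)
print(alarm_query)
```

The resulting string can be pasted into the alarm definition's advanced query editor in the Console, together with the namespace above, or passed as the query when calling the Monitoring API from an SDK. Keeping the query construction in one small function makes it easier to define the same alarm for every ESXi host in the SDDC.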
Responding to an alert
At OCI, we understand that hardware can go bad. To protect our customers’ investment in Oracle Cloud VMware Solution, we provide our customers with a zero-cost host replacement capability.

As a part of this feature, an ESXi host is created under the SDDC. When the host is active, customer VMware administrators can ingest the host into their vCenter, NSX, and vSAN configurations using the solution playbook, Add an ESXi host to Oracle Cloud VMware Solution.
When the new host is added to the cluster in vCenter and moved into production, the vCenter admin must remove the faulty host from vCenter, NSX, and vSAN and mark the host replacement operation as complete within the time period displayed in the Console. If this process isn't completed in time, the tenancy is billed for both the replacement and the faulty ESXi host. Only hosts that the monitoring system detects as faulty are eligible for host replacement. For assistance, raise a service request with OCI Support.
Next steps
Enhance the observability of your Oracle Cloud VMware Solution infrastructure further using the Oracle Cloud Infrastructure Monitoring and Notifications services. If you have any questions or suggestions, contact the Oracle Cloud VMware Solution team.
