Contributing Author: Debu Panda, Senior Director, Product Management, Oracle
Organizations worldwide have rapidly adopted public cloud over the last couple of years. Several have adopted multiple public cloud vendors, such as Amazon Web Services (AWS) and Oracle Cloud Infrastructure (OCI). Although organizations are moving to the cloud, many of their mission-critical applications still run on-premises or in a private cloud environment.
As organizations embark on their journey to the cloud, they still face several challenges in managing a diverse multi-cloud or hybrid IT environment. When I talk to customers, I hear about a wide range of issues across different regions and industries. You can group these challenges into several key problem areas:
Lack of unified visibility due to disparate management tools,
Availability and performance issues take time to diagnose,
Growing application and/or VM sprawl,
Difficulty in managing and optimizing IT resource usage,
Ongoing compliance drift of critical applications and systems,
Difficulty securing the hybrid cloud.
In this blog, I will discuss how you can use Oracle Management Cloud to address these areas, so you can get a unified view into the health and performance of your applications and infrastructure in a multi-cloud and/or hybrid cloud environment. I will cover how the machine learning capabilities in Oracle Management Cloud improve time-to-value, reduce your cost of ownership, and improve your mean-time-to-repair. Let’s start off by looking at the key use cases that apply to the development and IT operations (DevOps) model many organizations are employing today.
The primary goal of a DevOps/ITOps organization is to ensure that it meets its service level agreements and to help improve the coordination, efficiency, and speed of deploying new services. At the same time, DevOps needs to help reduce the overall cost of IT operations. To meet these objectives, as DevOps/ITOps personnel, you need a management solution that fulfills all these requirements and provides you with:
Complete visibility of application and infrastructure resources deployed on-premises and public cloud,
Ability to monitor end-user, application, database, and infrastructure performance,
Rapid diagnostics capabilities to detect application performance degradation and system errors,
Automated remediation of application failures,
IT resource capacity planning based on demand, with the flexibility to automatically spin resources up or down when needed,
Cost control: it goes without saying that your management solution should help you lower your operational expenses.
The following graphic shows the key capabilities of Oracle Management Cloud that DevOps/ITOps personnel can leverage to manage hybrid cloud environments.
The power of Oracle Management Cloud is in its ability to give you a holistic view of your entire IT estate, whether it’s on-premises, in the cloud, or in multiple clouds. The solution provides a unified view into system availability and application performance across all tiers of your cloud and on-premises resources, all from a single management console. Let’s take a deeper look into how Oracle Management Cloud provides this proactive monitoring.
Oracle Management Cloud allows you to collect all types of telemetry for your cloud or on-premises applications. This includes server metrics such as CPU and I/O utilization for your compute nodes and underlying entities such as databases, logs for your compute and application infrastructure, end-user metrics for your web-based or mobile applications, and deep-dive application performance metrics. The solution allows you to bring in metrics from other monitoring systems such as Oracle Enterprise Manager, Microsoft SCOM, and VMware vCenter, and from public clouds such as Oracle Cloud Infrastructure, AWS, and Microsoft Azure, to name a few. You can also bring your business and operational metrics into Oracle Management Cloud by using our REST API. Oracle Management Cloud’s machine learning and data explorer tools can also be used to gain insight into your application performance based on your metrics and log data.
The following figure shows types of data collected and stored in Oracle Management Cloud’s unified big data platform.
Oracle Management Cloud provides a unified monitoring console for all your compute resources (public cloud and on-premises), PaaS services, on-premises applications, database infrastructure, application servers, and networking devices. For the complete list of entity types supported by Oracle Management Cloud, see the Oracle Management Cloud documentation.
As you can see from the figure below, we monitor the availability and performance of all entities, which are distributed across on-premises, Oracle Cloud, and Amazon Web Services. This view centralizes entities running on-premises or in public cloud infrastructure.
Oracle Management Cloud reduces your cost of operations by leveraging machine learning, and it helps you proactively detect problems in your application and infrastructure before the business is impacted. It can also automatically remediate many infrastructure problems by using run-book automation. We will discuss this in more detail later in the article.
Oracle Management Cloud provides innovative ways to identify hot spots within your application and infrastructure by arranging your resource entities in a topology viewer. Notably, the entity map visualization helps you manage at scale and highlights problematic entities, so you can quickly troubleshoot issues and perform root-cause analyses by different dimensions such as location, entity type, cost center, and resource tier.
Using system and log data, we can apply machine learning to search, analyze, and correlate millions of log files to help identify any potential security problems or performance anomalies that might occur. Oracle Management Cloud’s machine learning algorithms do that heavy lifting for you. You don’t have to spend costly, labor-intensive hours searching for that “needle in the haystack”; the solution takes care of finding, analyzing, and reporting issues for you.
Oracle Management Cloud ingests your system and log data into the solution so that we can apply machine learning to it. When you apply machine learning to the data, you can start to see that it’s highly patterned. If you look at the logs, most of the time they contain a lot of the same patterns, which means we can derive insights into what the data is telling you.
Watch the video below to see how Betacom, a European solution provider, is using Oracle Management Cloud with their clients to help resolve issues faster.
Recently, I was working with a large retail company whose workload is distributed between Amazon Web Services, Oracle Cloud Infrastructure, and their on-premises data center. They wanted centralized visibility into all their compute infrastructure. What was interesting in this case was that I helped them build a custom dashboard within Oracle Management Cloud, similar to the screenshot below, that used different types of data (metrics, logs, etc.). All the data and log files were collected from their cloud and on-premises infrastructure and brought under one management console and view in Oracle Management Cloud.
Let me describe the dashboard a little more. The dashboard illustrated above brings together some key metrics. For example, the customer wanted compute resource information such as host availability, status, and open alerts in their IaaS infrastructure on the dashboard. The dashboard was being used by senior management to help identify issues by specific cloud service provider.
Besides alerting based on static thresholds, Oracle Management Cloud leverages machine learning algorithms to detect anomalies and can send early warning alerts to you. This helps you be proactive and catch problems early, before things get out of hand and cause significant damage. We will learn how to create alert rules later in the article.
The dashboard also included compute host distribution by cloud service provider and compute hosts by CPU vs. memory consumption. Having one consolidated view of their multi-cloud estate helped the customer make critical business decisions about their IT. Another important discovery was that it helped them optimize the resources they had and lower costs by switching off any VMs that were not being used.
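As a rough illustration of the "switch off unused VMs" idea, the following Python sketch flags hosts whose average CPU utilization stays very low. The host names, sample values, and the 5 percent cutoff are all made up for this example; this is not how Oracle Management Cloud computes utilization, only a minimal sketch of the reasoning.

```python
from statistics import mean

def idle_vms(cpu_samples, max_avg_cpu=5.0):
    """Return the names of VMs whose average CPU % is at or below max_avg_cpu.

    cpu_samples maps a VM name to a list of CPU utilization percentages.
    The 5% default cutoff is an arbitrary assumption for this sketch.
    """
    return sorted(
        name
        for name, samples in cpu_samples.items()
        if samples and mean(samples) <= max_avg_cpu
    )

# Hypothetical utilization samples from three hosts in different environments.
samples = {
    "aws-web-01": [42, 55, 38],
    "oci-batch-07": [1, 2, 1],
    "onprem-test-03": [0, 0, 3],
}

for name in idle_vms(samples):
    print(f"{name} looks idle; consider switching it off")
```

In practice you would look at a longer time window and at memory and network activity as well before shutting anything down.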
Cloud VMs are generally accessed through keys, which are highly secure. In an ideal environment, there shouldn’t be any failed login attempts. But if there are, Oracle Management Cloud can automatically report these attempts to you, since they can indicate a potential hacking attempt.
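To show what spotting failed logins in log data can look like, here is a minimal Python sketch that counts OpenSSH "Failed password" messages per source address. The sample log lines are invented for illustration, and this is only a sketch of the general technique, not the way Oracle Management Cloud parses logs.

```python
import re
from collections import Counter

# Matches the standard OpenSSH "Failed password" message in an auth log.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)

def failed_logins_by_source(lines):
    """Count failed SSH login attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts

# Invented sample lines using documentation-only IP ranges.
sample_log = [
    "Mar  3 10:15:01 web01 sshd[2211]: Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2",
    "Mar  3 10:15:04 web01 sshd[2211]: Failed password for root from 203.0.113.7 port 52150 ssh2",
    "Mar  3 10:16:10 web01 sshd[2290]: Accepted publickey for deploy from 198.51.100.4 port 40022 ssh2",
    "Mar  3 10:17:45 web01 sshd[2301]: Failed password for root from 192.0.2.55 port 33812 ssh2",
]

for ip, count in failed_logins_by_source(sample_log).most_common():
    print(f"{ip}: {count} failed attempt(s)")
```

On a host that only allows key-based access, any non-zero count here is worth a closer look.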
Oracle Management Cloud supports native monitoring of your system resources and cloud native applications by using agents. The following architecture diagram shows different components and agents that can be optionally deployed to collect telemetry and log data. These agents can be used to run remediation jobs that help you to streamline your IT operations. For more information on these components, review the Oracle Management Cloud documentation.
Oracle Management Cloud provides a unified platform by allowing you to ingest operational metrics and logs from a variety of different sources. These sources include, but are not limited to:
Cloud agents running on compute hosts, monitoring the host and underlying applications, databases, and container management solutions such as Kubernetes and Docker Swarm,
Application Performance Monitoring agents running on your application servers and containers that bring in end-user and application metrics,
Native collection from public clouds such as Oracle Cloud, AWS (CloudWatch), and Microsoft Azure by using their APIs,
Monitoring solutions such as Oracle Enterprise Manager, Microsoft SCOM, and VMware vCenter,
Custom collection of metrics and logs using the REST API.
Let’s dive deeper into how Oracle Management Cloud provides native monitoring of third-party cloud providers by looking at Amazon Web Services as an example.
You can start monitoring resources in Amazon Web Services (AWS) or Microsoft Azure by adding a Cloud Discovery profile, as shown below.
Oracle Management Cloud executes AWS functions to monitor AWS entities. The AWS user must have the permissions required for discovery and monitoring of AWS services, as described in the Oracle Management Cloud documentation.
As depicted in the following figure, the Oracle Management Cloud discovery UI allows you to select the regions and services you want to monitor. Alternatively, you can use Oracle Management Cloud’s command line utility or REST API to add a Cloud Discovery profile.
As you can see above, you can filter the entities that you want to monitor by region or by service. You have to provide your credentials and access key. Note that you must update these credentials if they change in the future.
After you add the Cloud Discovery profile, Oracle Management Cloud will discover all the selected services in AWS and will start to monitor your resources, as shown in the screenshot below.
Oracle Management Cloud will automatically discover any new entities that have been added.
Oracle Management Cloud also imports all the tags that you defined in your cloud service provider, and you can use the same tags to filter entities or to create a dynamic group.
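Conceptually, a tag-based dynamic group is just a filter over entities and their imported tags. The following Python sketch illustrates that idea with a hypothetical Entity class and made-up entity names and tags; it does not use or represent the Oracle Management Cloud API.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A monitored resource with tags imported from its cloud provider.

    This class is a stand-in invented for this sketch.
    """
    name: str
    provider: str
    tags: dict = field(default_factory=dict)

def dynamic_group(entities, **required_tags):
    """Return the entities whose tags match every required key/value pair."""
    return [
        entity
        for entity in entities
        if all(entity.tags.get(key) == value for key, value in required_tags.items())
    ]

# Made-up entities spread across two clouds and an on-premises data center.
inventory = [
    Entity("orders-db", "AWS", {"env": "prod", "team": "retail"}),
    Entity("orders-web", "OCI", {"env": "prod", "team": "retail"}),
    Entity("reports-db", "AWS", {"env": "test", "team": "finance"}),
    Entity("legacy-erp", "on-premises", {"env": "prod", "team": "finance"}),
]

for entity in dynamic_group(inventory, env="prod", team="retail"):
    print(f"{entity.name} ({entity.provider})")
```

Because membership is computed from tags rather than a fixed list, a newly discovered entity with matching tags would join the group without any manual step.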
You can use the entity card in context to get more information about a specific entity. The entity card shows some key performance metrics for the entity, its membership information, and any tags associated with it. It also helps you navigate to relevant information, such as the monitoring metrics or log data that you need to diagnose issues.
Let’s use the entity card to navigate to the home page for an RDS instance. As you can see, Oracle Management Cloud provides all the key performance metrics for this RDS instance. You can use Oracle Management Cloud’s machine learning capabilities to find anomalies such as high CPU usage and high Write IOPS. From the screenshot below, you can see that these two metrics seem to be correlated.
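One simple way to check whether two metrics move together is the Pearson correlation coefficient. The Python sketch below computes it for invented CPU and Write IOPS samples; it illustrates the statistical idea only and makes no claim about the algorithms Oracle Management Cloud uses internally.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)

# Invented samples taken at the same timestamps for one database instance.
cpu_percent = [22, 25, 24, 61, 88, 90, 35, 23]
write_iops = [110, 120, 115, 420, 800, 830, 200, 112]

r = pearson(cpu_percent, write_iops)
print(f"correlation between CPU and Write IOPS: {r:.2f}")
```

A value close to 1 means the two metrics rise and fall together, which suggests (but does not prove) that a write-heavy workload is driving the CPU usage.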
Oracle Management Cloud enables you to create availability alerts automatically. For other metrics, you can create an alert rule based on either a static threshold or an anomaly detected through automatic baselining. In the figure below, you can see that I set an anomaly alert for the AWS RDS instance. Oracle Management Cloud will send an alert once the condition has been met. This is a great way to become more proactive at preventing mission-critical services from being disrupted.
Oracle Management Cloud allows you to send a notification via either email or a push notification to the Oracle Management Cloud Mobile App. It also provides integration with third-party ticketing systems such as ServiceNow and PagerDuty.
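The difference between a static threshold and a baseline-driven anomaly alert can be sketched in a few lines of Python. The example below uses a simple trailing-window mean and standard deviation as the baseline; that choice, the window size, the 3-sigma cutoff, and the sample values are all assumptions for illustration, not a description of Oracle Management Cloud’s baselining algorithm.

```python
from statistics import mean, stdev

def static_alerts(values, threshold):
    """Indexes of samples that exceed a fixed threshold."""
    return [i for i, value in enumerate(values) if value > threshold]

def baseline_alerts(values, window=5, sigmas=3.0):
    """Indexes of samples that deviate from the trailing-window baseline
    by more than `sigmas` standard deviations."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline, spread = mean(history), stdev(history)
        if spread > 0 and abs(values[i] - baseline) > sigmas * spread:
            alerts.append(i)
    return alerts

# Invented CPU samples: normally around 30%, with one jump to 58%.
cpu_percent = [31, 33, 30, 32, 31, 34, 32, 58, 33, 31]

print("static threshold (80%):", static_alerts(cpu_percent, 80))
print("baseline anomaly:", baseline_alerts(cpu_percent))
```

The jump to 58% never crosses the 80% static threshold, yet it is far outside what this instance normally does, so only the baseline rule flags it. That is the kind of early warning an anomaly alert is meant to give.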
Oracle Management Cloud now offers Oracle Orchestration Service, which lets you automate the scheduling and tracking of workflows and jobs that execute scripts on hosts or invoke web services. It’s ideal for those looking to schedule recurring maintenance tasks across their on-premises and cloud infrastructure, as well as for developers looking to integrate a scheduler into their application development processes.
We give you the ability to orchestrate and schedule workflows that can be executed at any given time, whether you want to run a configuration check or spin up or shut down a certain service or resource. You can tie this orchestration into your standard alerts to automatically control your application. You can even take care of common problems with its auto-remediation capabilities. The following figure shows an auto-remediation job that starts a new instance.
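The pattern of tying alerts to remediation jobs can be pictured as a lookup from alert type to runbook. The Python sketch below is purely illustrative: the alert types, runbook functions, and entity names are hypothetical, and the runbooks only return a description of the action instead of calling any real orchestration service.

```python
def restart_service(alert):
    """Hypothetical runbook: describe restarting the failed service."""
    return f"restart service on {alert['entity']}"

def start_new_instance(alert):
    """Hypothetical runbook: describe adding capacity with a new instance."""
    return f"start a new instance to relieve {alert['entity']}"

# Map each alert type to the runbook that should handle it (assumed names).
RUNBOOKS = {
    "service_down": restart_service,
    "capacity_exhausted": start_new_instance,
}

def remediate(alert):
    """Run the matching runbook, or fall back to notifying a person."""
    runbook = RUNBOOKS.get(alert["type"])
    if runbook is None:
        return f"no runbook for '{alert['type']}'; notify an operator"
    return runbook(alert)

print(remediate({"type": "capacity_exhausted", "entity": "web-tier"}))
print(remediate({"type": "disk_full", "entity": "db01"}))
```

Keeping an explicit fallback for unknown alert types matters: automation should handle the common, well-understood problems and hand everything else to a person.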
You can install Oracle Management Cloud agents on your compute nodes for deep monitoring of the host and underlying applications and databases. In the following figure, we see a VM running in AWS EC2.
Oracle Management Cloud allows you to view all logs for your compute node and associated entities, providing real-time insight and rapid isolation of infrastructure issues.
Additionally, you can leverage Oracle Management Cloud’s unique machine learning capabilities, such as log clustering, to link and diagnose issues faster. The following figure shows how Oracle Management Cloud’s log clustering capabilities allow you to find the “needle in the haystack” by reducing a large number of log entries down to just a few patterns.
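The intuition behind log clustering is that most log lines differ only in their variable parts. The Python sketch below masks IP addresses, hexadecimal values, and numbers, then groups lines by the resulting signature. It is a deliberately simple, rule-based stand-in with invented log lines, not Oracle Management Cloud’s machine learning implementation.

```python
import re
from collections import Counter

# Ordered masking rules: replace the variable parts of a line with placeholders.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def signature(line):
    """Reduce a log line to its pattern by masking variable tokens."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line

def cluster(lines):
    """Group log lines by signature and count the lines in each group."""
    return Counter(signature(line) for line in lines)

# Invented log lines for illustration.
logs = [
    "Connection from 10.0.0.5 port 51234 closed",
    "Connection from 10.0.0.9 port 40112 closed",
    "Request 1842 took 35 ms",
    "Request 1843 took 1207 ms",
    "Disk /dev/sda1 usage at 91 percent",
]

for pattern, count in cluster(logs).most_common():
    print(f"{count} x {pattern}")
```

Here five lines collapse into three patterns. On millions of lines the reduction is far more dramatic, and the rare patterns are usually the ones worth investigating first.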
IT organizations are challenged to gain unified visibility into the health and performance of applications deployed in hybrid cloud environments. In this article, I talked about how Oracle Management Cloud provides unified visibility into your hybrid cloud infrastructure so that you can quickly understand what’s going on and troubleshoot issues faster. I covered how Oracle Management Cloud enables you to monitor your workload running in a multi-cloud scenario that included AWS.
About the Author:
Debu Panda has authored numerous books and articles on cloud, enterprise Java, and databases. He is the lead author of EJB 3 in Action (Manning Publications) and Middleware Management (Packt).