Machine learning is showing up everywhere. The near-ubiquitous help from ‘Siri’, ‘Alexa’, or ‘Hey Google’ lets us make queries with our voice, for example. Machine learning is driving innovation in every industry, including healthcare, where it helped identify 52% [1] of breast cancer cases up to a year before they were actually diagnosed. And automotive forecasts predict fully autonomous vehicles on the road in 2022. This raises so many questions, from how we will spend our newly available time to how we will own the vehicles of the future.
What's on your IT management machine learning wish list? Do any of these make your list of priorities?
Monitor my apps without all the false positives
Most IT managers are familiar with alert fatigue. How many of you have created email filters to remove alerts from your inbox? Cutting that noise is a laudable goal for machine learning, which is designed to examine metric (time series) data and establish a dynamic, normalized baseline. Alerts are then raised on anomalies relative to this baseline, rather than on static thresholds.
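To make the idea of a dynamic baseline concrete, here is a minimal sketch in Python. It is not Oracle Management Cloud's algorithm; the function name `find_anomalies`, the window size, the threshold, and the sample CPU values are all assumptions chosen for illustration. Each point is compared against the mean and standard deviation of the readings just before it, instead of against a fixed limit.

```python
from statistics import mean, stdev

def find_anomalies(values, window=12, threshold=3.0):
    """Return indices of points that deviate sharply from a rolling baseline.

    Illustrative only: the baseline is the mean of the previous `window`
    samples, and a point is flagged when it lies more than `threshold`
    standard deviations away from that mean.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all is a deviation.
            if values[i] != baseline:
                anomalies.append(i)
            continue
        if abs(values[i] - baseline) > threshold * spread:
            anomalies.append(i)
    return anomalies

# Hypothetical CPU utilization samples (percent), one per interval.
cpu = [41, 43, 42, 44, 40, 42, 43, 41, 42, 44, 43, 42, 43, 95, 42, 41]
print(find_anomalies(cpu))  # -> [13], the spike to 95
```

A static threshold of, say, 90% would catch the same spike here, but it would also fire constantly on a server that normally runs at 92%. Because the baseline moves with the data, only departures from recent behavior are flagged.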
Troubleshoot ten times faster
Unplanned downtime causes havoc for virtually every company. No matter how much you prepare, problems will occur, and the challenge keeps growing as environments become more dynamic and complex, with applications running across multiple clouds and on-premises systems. Problem-solving can tie up your best people in war rooms for hours, each with their own tools, trying to identify anomalies and correlate issues. Topology-aware log exploration, by contrast, makes it easy to troubleshoot problems. You can use machine learning techniques like cluster to find outliers or trends. You can use the link feature to understand latency issues across the components, steps, or tiers of a given transaction. Using link along with cluster, for example, finds the outliers among all the steps or components that together produce unusually long latencies, and often points at the steps or components causing the slowdown.
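The general idea of grouping records by transaction step and then looking for outliers can be sketched in a few lines of Python. This does not reproduce the product's cluster or link features; the record layout, the `slow_steps` helper, and the "three times the median" rule are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log-derived records: (transaction_id, step, duration_ms).
records = [
    ("t1", "web", 20), ("t1", "app", 35), ("t1", "db", 50),
    ("t2", "web", 22), ("t2", "app", 33), ("t2", "db", 48),
    ("t3", "web", 21), ("t3", "app", 36), ("t3", "db", 410),
    ("t4", "web", 19), ("t4", "app", 34), ("t4", "db", 52),
]

def slow_steps(records, factor=3.0):
    """Flag (transaction, step, duration) entries far slower than usual.

    Durations are grouped by step, the median of each group is taken as
    the typical latency, and any entry exceeding `factor` times that
    median is reported.
    """
    by_step = defaultdict(list)
    for _, step, ms in records:
        by_step[step].append(ms)
    typical = {step: median(durations) for step, durations in by_step.items()}
    return [(txn, step, ms) for txn, step, ms in records
            if ms > factor * typical[step]]

print(slow_steps(records))  # -> [('t3', 'db', 410)]
```

The output points at the database step of one transaction, which is the kind of answer a war room would otherwise spend hours assembling from separate tools.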
Maximize utilization and capacity of my resources, both cloud and on-premises
This is an age-old challenge, often requiring valuable team members to go from system to system, manually looking for utilization information. Typically this means seeing only a slice of time and spending many hours pulling an analysis together. Many in IT are taking advantage of the ability to scale up with the cloud, which fixes half of the problem, and would appreciate saving costs by scaling down when capacity isn't needed. Machine learning is well suited to taking massive amounts of usage data and predicting utilization trends or identifying the largest users of resources and the fastest-growing applications. This can be done in real time with zero data collection effort, and resources can then be scaled up or down automatically as needed to save IT expense, whether on-premises or in the cloud.
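As a rough illustration of trend prediction, the sketch below fits a straight line to daily utilization samples and estimates how many days remain until a limit is reached. A plain least-squares line stands in for whatever forecasting models a real product uses; the function names, the 90% limit, and the sample data are assumptions.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; needs >= 2 samples.

    Returns (slope, intercept).
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def days_until(values, limit):
    """Days after the last sample until the trend line reaches `limit`.

    Returns None when usage is flat or shrinking, and 0.0 when the trend
    line is already at or past the limit.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope  # sample index where the line hits the limit
    return max(0.0, crossing - (len(values) - 1))

# Hypothetical daily disk utilization (percent).
disk_pct = [50, 52, 54, 56, 58, 60]
print(days_until(disk_pct, 90))  # -> 15.0
```

An estimate like this is what lets capacity be added, or released, ahead of need rather than after an alert.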
Analyze all my data, not just some of it
This is a common problem with the previous generation of IT tools. It’s expensive and/or difficult to store and query large amounts of data, especially when you’re keeping it on-premises and have to purchase and install the storage and compute systems yourself. The challenge is even harder given the proliferation and growth of the data being generated. So much compute and storage capacity is needed that it’s been prohibitive in the past. You get a short window of time, or pieces of application or infrastructure data, and piece the information together manually. These problems are well suited to big data and machine learning-based solutions, where the more data, the better, and massive scaling up and down in the cloud makes it all possible.
Give me complete visibility without needing many tools
When applications were relatively stable and monolithic, it was common to have one set of tools to monitor and manage your applications, another for the mid-tier, and more for your database, networks, and storage. With the advent of the cloud, you can just about double that for each cloud vendor. Then there is the rapid pace of change with mobile and online applications, and new, faster delivery models like containers and serverless that change the game. The silo-specific tools of the past just don’t provide the visibility you want into an application stack or across the DevOps spectrum. You want something that can dynamically and even autonomously adjust as your applications and environments change, something built using machine learning.
Photo credit: Quora and Deep Networks
After delivering IT operations management tooling for decades, and listening to tens of thousands of users and executives, Oracle designed a new system to do just that. Based on the demands of IT professionals, some of the design tenets include:
A new chapter in Automation: Autonomous Management
Oracle Management Cloud, with embedded automation capabilities, enables IT operations to reduce risks and increase DevOps agility. An integral part of Oracle’s adaptive, intelligent security and systems management portfolio, Oracle Management Cloud provides customers with a complete and integrated solution for managing and securing hybrid IT environments. Customers and partners around the world are already accelerating performance with Oracle Management Cloud, including Accenture, Amis, Astute Business Solutions, Betacom, Bias, Compasso, Fors, FutureRobot, Infosys, Kapstone, Mythics, OneGlobe, Promata, and Tech Democracy.
Image: Oracle Management Cloud is a Unified SaaS Platform
Oracle Management Cloud leverages a unified data platform to ingest the full breadth of operational data shown on the left, both structured metrics and unstructured log data. That data ranges from global threat feeds and access identity, to user experience metrics for both real and synthetic users (including transaction latency, browser, and user device), to server-side data from web apps and database transaction metrics, down to platform metrics and logs from containers, virtual machines, and OSes, as well as compute and storage data. Once this is unified into a single dataset in the cloud, a high-powered big data platform can run machine learning algorithms over all of it. Unlike other options where only part of the dataset is available, the unified dataset enables more complete insights with less human intervention required.
There are other advantages to having the entire dataset. Oracle Management Cloud data can be viewed from any number of points of view, for example monitoring top-level key applications in real time. You can also view the performance of each component of the full IT stack, or look across the whole estate of infrastructure or databases, over anything from a point in time to multiple months for capacity planning or cloud utilization. Arguably the most valuable capability is using all the data, along with security feeds, to monitor in real time and over time, looking for threat vectors systematically and rapidly. Finally, with these rapid insights, you also have the option of using automation to trigger prevention steps before issues impact customers, or to apply corrective actions automatically. So, for example, with all this data unified together, you can analyze security vulnerabilities in real time, check compliance, and even automate enforcement of compliance rules based on your needs.
It’s a SaaS product, which means you use it with no maintenance required. You are up and running nearly instantly, and it’s designed for IT operations and DevOps, with pre-built ML capabilities that answer IT questions. You don’t have to develop your inner data scientist to use it.
Comprehensive, Intelligent Management Platform
One aspect of a unified solution is the opportunity to design for IT workflows, such as monitoring apps, databases, or infrastructure and then troubleshooting when you see an issue. With out-of-the-box machine learning algorithms designed specifically for IT, the system dynamically establishes what normal looks like and finds statistically significant changes in user performance metrics. Then, within the context of the issue identified, OMC can present the relevant log records, filtered specifically to the issue and area you've been monitoring, with no delay, no human intervention, and no switching to another system. That's the power of unified systems management.
Even though IT organizations have been investing in automation for several decades, benefits haven’t kept up with the proliferation of the tools that claim to bring yet new levels of automation. One major challenge is too many tools. Now they are delivered in the cloud too. The issue really is the complexity that results when you have to make decisions based on different tools that weren’t designed to work together. Oracle Management Cloud provides a unified solution that not only brings your data together, it also provides solutions across the entire spectrum of IT operations management as depicted in the diagram. Furthermore, these tools work across heterogeneous environments and hybrid cloud so you can get end-to-end visibility and comprehensive insights about your IT environment as well as automate response actions.
Zero-Effort Operational Insights
At its core, Oracle Management Cloud is built to answer IT questions, and support IT workflows.
Yes, it's built on a big data platform and it applies machine learning to the entire data set. As a user of the system, you don’t need to worry about data collection, aggregation, and normalization, or about the associated big data analytics and machine learning algorithms. The system continuously examines the environment and constantly refines thresholds. You will not need to define thresholds for your alerts; they are set for you automatically. You will be presented with IT-ready dashboards that answer your most common questions. To accomplish this, Oracle Management Cloud is prebuilt to employ key machine learning techniques out of the box:
Anomaly detection: flags unusual resource usage and non-standard application behavior
Clustering: groups alerts or data on related symptoms
Correlation: discovers dependencies, allowing stack trace views
Forecasting: forecasts outages before they happen; enables easier capacity and resource planning
Automated Preventative and Corrective Actions
There are siloed orchestration solutions available, but with insights in hand, a complete solution builds in the capability to orchestrate responses and remediation. What if your system identified an application that would reach maximum utilization in the next day? With an automated response, you can spin up additional resources. What if the system detected a security risk? The system can be set to automatically turn off permissions for specific users, for example, without requiring a separate orchestration tool.
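The two scenarios above boil down to mapping an insight to a response. The Python sketch below shows that mapping in its simplest form; the insight fields, the action names, and the 85% limit are hypothetical and do not correspond to any Oracle Management Cloud API. It only plans actions and leaves execution to whatever tooling is in place.

```python
def plan_actions(insights, utilization_limit=0.85):
    """Turn a list of insight dicts into (action, target) pairs.

    Illustrative rules only:
    - a forecast at or above `utilization_limit` leads to a scale-up
    - a detected security risk leads to revoking the user's permissions
    Insights that match no rule are ignored.
    """
    actions = []
    for item in insights:
        kind = item.get("kind")
        if kind == "forecast" and item["predicted_utilization"] >= utilization_limit:
            actions.append(("scale_up", item["target"]))
        elif kind == "security_risk":
            actions.append(("revoke_permissions", item["target"]))
    return actions

# Hypothetical insights produced by earlier analysis steps.
insights = [
    {"kind": "forecast", "target": "orders-app", "predicted_utilization": 0.97},
    {"kind": "forecast", "target": "reports-app", "predicted_utilization": 0.40},
    {"kind": "security_risk", "target": "user-1042"},
]
print(plan_actions(insights))
# -> [('scale_up', 'orders-app'), ('revoke_permissions', 'user-1042')]
```

Keeping the decision step separate from execution makes the rules easy to review before anything is changed in a live environment.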
What are others saying?
The results customers are seeing run the gamut from saving costs to reducing downtime to making what previously seemed impossible just work. For example, one company is using the platform to manage robots that interact with humans.
Photo credit: FutureRobot. Attendees at this year’s Winter Olympics have some fun with FutureRobot’s FURo-D.
“To ensure a seamless interaction, we must collect and analyze massive amounts of human-robot interaction data in order to train our robots to respond appropriately, refining with social AI,” said Youngki Hwang, chief technology officer, FutureRobot. “We use Oracle Management Cloud to collect that data as well, noting factors such as the number of people approaching a robot and interacting with it, users’ emotional data based on facial recognition technology, the conversations people have based on voice recognition technology, and statistics on the content people access.” FutureRobot is also using Oracle Management Cloud to collect operating data on each of its robots, such as location, sensors, CPU and memory use, battery level, and network status, alerting the company to any problems.
“Compasso is managing the infrastructure supporting over a dozen e-commerce environments with Oracle Management Cloud,” said Ernie Molinaro, Divisional President, US, UK, and Canada, Compasso. “Leveraging its autonomous management capabilities, including unified data collection, machine learning analytics, and forecasting, Oracle Management Cloud has given Compasso a 50 percent increase in productivity and has enabled us to automatically identify and resolve anomalies that point to potential areas of concern.”
“We are supporting our customers in their transformation to DevOps and faster time-to-market,” said Lucas Jellema, Chief Technology Officer of Amis. “Oracle Management Cloud’s autonomous management capabilities help us avoid costly outages and proactively resolve issues so that we can maximize developer resources while achieving better performance.”
Are You Ready To Try It?
If an autonomous management cloud sounds like something you’ve been wishing for in your IT environment, it is easy (and free) to give it a try. The Easystart kit is designed to give you hands-on experience within minutes. You can start with a simple use case for troubleshooting your database environments and then expand to many more use cases.
Oracle Management Cloud is available for trial at oracle.com/easystart
[1] Forbes, 9.30.16, “What Are The Top Use Cases For Machine Learning and AI”