But first, let’s go back to the first Olympic games in modern times, held in Athens in April of 1896. This photograph is from the men’s 100m final. There’s only one runner in the 4-point stance, crouched down with hands on the ground, right behind the start line. That was Tom Burke, and he won—even though he was actually more of a distance runner.
Today every sprinter uses that starting stance. But back then it was new information and only a few athletes were exploiting that data.
Exploiting data is what we’re going to talk about today. But instead of the 4-point stance and gold medals, we’ll be discussing machine learning, data lakes, and how they can help you exploit data about your business, your customers, your partners, and anything else you need to get that competitive edge. Essentially, we’re here today to ask two questions: what extra information do you need to gain an edge like Tom Burke’s? And how can you make use of it?
Machine learning is trendy right now, but why should it matter to you? Research from the McKinsey Global Institute shows that proactive machine learning adopters make more profit than their less proactive peers.
As responsible data users, we can’t prove causation from this one diagram alone. But there’s enough material out there that says it’s more than just a simple correlation.
We think the cloud could be the best place for your machine learning workloads. We’ll get into this more later.
Here’s one example of the value that machine learning has brought to an organization. The UK’s National Health Service offers health care to all residents of the UK and holders of a valid European insurance card. Its Business Services Authority set up a Data Analytics Learning Lab with the goal of getting more value out of its existing data with machine learning. The long-term goal was to deliver ongoing savings of one billion pounds per year by:
For a small team, they accomplished a lot. In just a few months, they found confirmed annual savings of well over £561 million, with additional savings waiting to be confirmed and implemented.
A few machine learning best practices emerge from this project.
But machine learning is about much more than healthcare fraud and improving patient outcomes, as important as those may be. Take a look at these machine learning business use cases.
Machine learning can help you:
And these are just a few examples. Organizations in all industries are putting machine learning to use.
So let’s say you’re sold, and you want to start exploiting your data by using it for machine learning. What else do you need?
The answer is access to all of your data, and lots of it. Data lakes are a great place to store, manage, process, and analyze your data. People often mistake data lakes for just a place to store data, but they’re more than that.
Data lakes were originally built on premises with racks of dedicated hardware, and that has some advantages:
But having a data lake in the cloud offers some different advantages:
While Oracle offers you both on-premises and cloud data lake solutions, we also offer you a third option that we call Cloud at Customer. This is a cloud service where the hardware sits in your data center. Here are the advantages:
With Cloud at Customer, Oracle owns and manages the hardware, but you consume the services just like you would in the public cloud. In many ways, you’re getting the best of both worlds.
To summarize, the trend has been to go from only using relational database technology, to adding big data technology, to adding specialized big data services in the cloud.
All of these technologies are important depending on the problem you’re trying to solve, and we offer all of them. But the trend we’re seeing is that the first generation of big data technologies, like Hadoop, is giving way to more modern Spark services in the cloud.
Here’s an example of how you can take advantage of these specialized Spark services in the cloud.
In the cloud, object storage becomes the persistent storage repository for the data in your data lake. What is object storage? Object storage is a very simple system for storing any kind of data file with scalability and redundancy. You only pay for the amount of data that you have stored, and you can add or remove data whenever you want. In addition, object storage is very low-cost storage.
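To make the object storage model concrete, here is a minimal in-memory sketch: a flat namespace of keys mapped to arbitrary bytes, with a bill that depends only on what is currently stored. The `ObjectStore` class, its method names, the object keys, and the price per GB are all hypothetical placeholders for illustration. They are not Oracle's API or pricing.

```python
class ObjectStore:
    """Toy stand-in for a cloud object store: flat keys mapped to bytes."""

    def __init__(self, price_per_gb_month):
        # Hypothetical price; real pricing varies by provider and tier.
        self.price_per_gb_month = price_per_gb_month
        self._objects = {}

    def put(self, key, data):
        # Any kind of file can be stored, as long as it is a sequence of bytes.
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def stored_bytes(self):
        return sum(len(data) for data in self._objects.values())

    def monthly_cost(self):
        # You pay only for what is stored right now (1 GB taken as 1e9 bytes).
        return self.stored_bytes() / 1e9 * self.price_per_gb_month


store = ObjectStore(price_per_gb_month=0.02)   # assumed price
store.put("raw/events.csv", b"x" * 5_000_000)  # a 5 MB data file
store.put("raw/notes.txt", b"hello")
cost_before = store.monthly_cost()

store.delete("raw/events.csv")                 # remove data whenever you want
cost_after = store.monthly_cost()
```

After the delete, `cost_after` is lower than `cost_before` because the bill tracks only the bytes still held in the store.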
Then you spin up Spark clusters tailored to the specific processing work. One cluster can be for real-time processing, and a second can be a data lab for your data scientists and analysts. Another could be for batch jobs. Each cluster can be configured with the needed processing resources and local storage and each can be scaled up or down as needed. The node storage can be either disk or solid state. When you’re not using the cluster, you can turn it off so that you’re not paying for it. That’s the beauty of a cloud-based data lake.
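The cost effect of turning a cluster off can be sketched with a little arithmetic. The example below is an illustration only: the `Cluster` class, the node counts, and the hourly rate are assumptions made up for this sketch, not a real cloud API or real pricing.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of a cluster that is billed only while it is running."""
    name: str
    nodes: int
    rate_per_node_hour: float  # hypothetical price
    running: bool = False
    cost: float = 0.0

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def scale(self, nodes):
        # Scale the cluster up or down; later hours are billed at the new size.
        self.nodes = nodes

    def run_for(self, hours):
        # Advance time; cost accrues only while the cluster is running.
        if self.running:
            self.cost += self.nodes * self.rate_per_node_hour * hours


# A batch cluster that runs 2 hours a day and is switched off afterwards.
batch = Cluster("batch", nodes=10, rate_per_node_hour=0.5)
batch.start()
batch.run_for(2)
batch.stop()
batch.run_for(22)  # idle for the rest of the day, so nothing is billed

# The same cluster left on around the clock.
always_on = Cluster("always-on", nodes=10, rate_per_node_hour=0.5)
always_on.start()
always_on.run_for(24)
```

Under these assumed numbers, the batch cluster costs 10.0 for the day and the always-on cluster costs 120.0.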
So that’s great. But now let’s look at how you can start using machine learning in the cloud to fulfill your data science goals. The solution pattern below is a simplified depiction of a data lab for data science, and it shows the different services that are used together.
First, data is uploaded to cloud storage (object storage). The data engineer or data scientist can do this with open source tools, Oracle’s free Big Data Connectors, or with the free Oracle Software Appliance that makes object storage look like a disk drive to your other systems.
Apache Spark then reads the stored data for machine learning. Both the raw data and the data generated in Spark can be accessed for data visualization. Oracle provides both open source and value-added machine learning capabilities that run on Spark. With a cloud-based data lab, you can start small with just a few CPUs and quickly prove value for a business case.
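The shape of that workflow (read stored data, fit a model, produce generated data for visualization) can be sketched at a very small scale. The example below swaps Spark for the Python standard library and fits a simple least-squares line. The CSV contents, the column names `ad_spend` and `sales`, and the helper functions are invented for illustration.

```python
import csv
import io

# Stand-in for a file read from object storage; the values are made up.
RAW_CSV = """ad_spend,sales
1,3
2,5
3,7
4,9
"""


def load_rows(text):
    """Parse the raw CSV text into (x, y) pairs of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["ad_spend"]), float(row["sales"])) for row in reader]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


points = load_rows(RAW_CSV)               # raw data
slope, intercept = fit_line(points)       # the "model"
predictions = [(x, slope * x + intercept) for x, _ in points]  # generated data
```

Both `points` and `predictions` could then be handed to a visualization tool, which mirrors the raw and generated data mentioned above.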
If you or someone in your organization would like to try this out for yourself, Oracle offers free cloud credits that you can use to run these cloud services. And we also offer step-by-step guides to walk you through the process and show you how to use some of the features.
In this article, we’ve talked a lot about how to exploit your data. But here are the three key ideas:
Together, these are the keys to successfully exploiting your data. With them, you have the power to use your data to find a competitive edge. Don't forget to download your free ebook, "Demystifying Machine Learning." Or, if you're ready, get started today with a guided trial for building your own data lake and trying out machine learning techniques.
This article features content and writing by Wes Prichard and Peter Jeffcock.