The rewards of machine learning can be compelling, and they may make you want to get started right now. Before you start your own project, however, you'll want to consider the challenges machine learning brings.
This article isn’t meant to scare you away. Rather, it’s meant to ensure you’re prepared and have thought carefully about what you’ll need before you get started.
We spoke with Brian MacDonald, Data Scientist on Oracle’s Information Management Platform Team, about the pitfalls he’s seen and what companies can do to avoid them.
These machine learning challenges include the skills gap, managing the data, and operationalizing it.
The biggest difficulty, of course, is the skills gap that comes with using machine learning in a big data environment. Some people think that big data makes life beautiful and that getting started will be easy.
The biggest challenge you’re going to face is finding the right people. There is a big demand for people who are skilled in machine learning and a small pool to choose from. But as we described in our article about machine learning success, having executive support is key to this. If you have executive support, you’re also going to have the funding to find and recruit those valuable people.
Here’s something to think about. If the cost of skilled data scientists is a serious concern for you, then you probably don’t have a big enough business problem to make machine learning worth doing.
Let’s say a skilled data scientist costs your company $300,000 to $400,000 a year (including all benefits and incentives). If that person can’t help you solve a problem that’s worth at least $1 million a year, then you probably don’t need that person. Right?
On the other hand, if you truly believe this person (or team of people) can help you solve a problem in the tens of millions, then what are you waiting for?
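The rule of thumb above can be written down as a quick back-of-the-envelope check. This is a minimal sketch, not a budgeting method: the function names are hypothetical, and the default ratio of 2.5 is our own assumption, derived from the figures above ($1 million of problem value against a $400,000 fully loaded cost).

```python
def value_to_cost_ratio(annual_problem_value: float, annual_cost: float) -> float:
    """Return how many dollars of problem value each dollar of cost targets."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return annual_problem_value / annual_cost


def worth_hiring(annual_problem_value: float, annual_cost: float,
                 min_ratio: float = 2.5) -> bool:
    """Apply the rule of thumb: the problem should be worth several times the hire.

    min_ratio is an assumption taken from the article's example
    ($1,000,000 / $400,000); adjust it to your own threshold.
    """
    return value_to_cost_ratio(annual_problem_value, annual_cost) >= min_ratio


if __name__ == "__main__":
    # A $500,000 problem does not justify a $400,000 hire under this rule.
    print(worth_hiring(500_000, 400_000))
    # A problem in the tens of millions clearly does.
    print(worth_hiring(20_000_000, 400_000))
```

The point is less the exact threshold than the habit of putting a dollar value on the business problem before you start recruiting.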
It is difficult to find people. But if it’s truly important to your company, you can find them.
Here’s another issue to think about: the tools and software. While there are of course tools that will help, you’ll rarely find exactly the machine learning tools you need, ready to go right out of the box. You’ll have to think about the tooling you’re going to use.
Will you use Python, R, SQL, or TensorFlow? If you use those, how will they work with your data lake? And how will you handle the setup and configuration challenges they can create? Think through the details before you get started and ensure you have enough funding.
Machine learning is a messy process. And just having a big data platform doesn’t automatically mean it will be easier. In fact, it might make it messier, because you’ll have more data. That data enables you to do more, but it also means more data prep that has to be done.
You’ll have to think holistically about how you’re going to approach the problem. Here are some questions to think through:
If you don’t already have a good BI or analytics practice, and if you’re not already using data in all the ways you can think of, then jumping over to machine learning is really going to be a challenge. Already having data-driven decision making is absolutely critical. If you don’t have that, we recommend putting it in place before you get started with machine learning.
If you do decide to start, here are some other considerations. Think about them carefully before you get started:
In the machine learning world, innovation is coming quickly, which means rapid change. What’s good today may not be so good tomorrow, and you can’t always rely on the software because it’s a more volatile space. You may run into more issues with version differences and conflicts.
The Sheer Volume of Data
With machine learning, you’ll have to deal with data: lots and lots of different kinds of data. Deciding whether to use all of it, which processes to apply, and whether to sample can all be a challenge, especially when you’re getting deeper into your data and dealing with data movement.
Ensure you’re up to facing that challenge, and have your plan in place.
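If you do decide to sample rather than use everything, one common option is reservoir sampling, which draws a fixed-size uniform random sample in a single pass without loading the full dataset into memory. The sketch below is only an illustration under our own assumptions: the function name is hypothetical, and the rows could come from any iterable, such as a file or a query cursor.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream of unknown length."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # Sample 5 rows from a stream of one million without materializing it.
    print(reservoir_sample(range(1_000_000), k=5, seed=42))
```

Whether a uniform sample is appropriate depends on your problem. Rare events such as equipment failures may call for a different sampling strategy.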
What’s the biggest issue most data scientists face? It’s operationalizing the data.
Let’s say you’ve built a model and it can predict factors that lead to churn. How do you get that model out to the people who can affect those numbers? How can you get it to the CRM or mobile app?
If you have a model that predicts equipment failure, how can you get it to the operator in time to prevent that failure? There are many challenges with taking a model and making it actionable. And it’s probably the biggest technical challenge that exists for data scientists these days.
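To make the churn example a little more concrete, here is a minimal sketch of one small step in that hand-off: turning model scores into a file that a downstream system could import. Everything specific here is an assumption for illustration, including the function name, the column names, the threshold, and the idea that your CRM accepts a CSV import. A real integration would depend on the systems you actually run.

```python
import csv
import io
from typing import Dict, TextIO


def export_high_risk(scores: Dict[str, float], threshold: float, out: TextIO) -> int:
    """Write customers whose churn score meets the threshold, highest risk first.

    Returns the number of customers written (excluding the header row).
    """
    writer = csv.writer(out)
    writer.writerow(["customer_id", "churn_score"])
    flagged = sorted(
        ((cid, score) for cid, score in scores.items() if score >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    for cid, score in flagged:
        writer.writerow([cid, f"{score:.2f}"])
    return len(flagged)


if __name__ == "__main__":
    # Hypothetical scores produced by a churn model.
    scores = {"cust-001": 0.91, "cust-002": 0.15, "cust-003": 0.74}
    buffer = io.StringIO()
    count = export_high_risk(scores, threshold=0.7, out=buffer)
    print(f"{count} customers flagged")
    print(buffer.getvalue())
```

Even a simple export like this forces the useful questions: who receives the list, how often it refreshes, and what action they are expected to take.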
You can build the most beautiful models in the world. But will your C-suite truly care if they’re not actually making an impact on the company’s bottom line? You might think your part of the bargain is just to make the data available. But it’s not. You have to make sure your data is actually going to be used. Gaining executive support is hugely helpful for this.
So machine learning isn’t really easy. But it can accomplish big things. To inspire you and remind you of what’s possible, we’re sharing a real-life customer example and their machine learning project.
This company is one of the largest providers of wireless voice and data communications services in the United States.
In order to accomplish this, the company purchased a wide variety of Oracle big data products including Oracle Golden Gate for Big Data, which is part of Oracle Data Integration Platform Cloud.
Addressing the skills gap, managing the data, and operationalizing it are challenges that need to be dealt with, but they can be. And the results can be incredible. Read our tips on success with machine learning for more information.
And if you'd like to try building a data lake and using machine learning on the data, Oracle offers a free trial. Register today to see what you can do.