It’s no surprise that tech startups depend on data science. At the Cloud Native Computing Foundation’s KubeCon conference on November 18, 2020, many presenters revealed complicated pipelines created to manage machine learning at companies like Snapchat and Shell. Increasingly, there’s talk of hiring data engineers and using MLOps (machine learning operations, a set of practices modeled on DevOps) to keep those insights humming in the cloud.
Data science doesn’t have to be this complex, however. As demonstrated by several talks at Oracle’s Make Machine Learning Work for You event, also held on November 18th, it’s easier than ever to discover valuable insights in your data. Here are five road-tested techniques to get started with powerful platforms and open source tools.
The easiest place to start with ML is in the operational data you have right now, says Greg Pavlik, senior vice president of AI for Oracle Cloud Infrastructure: “Think about business problems that you’d like to address. Is your existing data hiding some insight that might help?” One success story he shared in his talk "Machine Learning for Executives" is Forth Smart, which serves millions of banking customers in rural Thailand with self-service kiosks. Forth Smart wanted to understand customer behavior on its kiosk network, which handles more than 2 million transactions a day.
“The basic building block technology Forth Smart is using is the Oracle Autonomous Data Warehouse. They have gotten great results for their analytics workloads: Response time to queries was shortened from three hours to minutes, and they don’t have a DBA on staff.” But beyond that, they used machine learning algorithms to predict how upselling offers would fare. The result? A doubling of the ad conversion rate.
“Your existing data, what you use to run your business today, is the most important data you have,” says Pavlik. “It’s almost certain that it could do more for you than it’s currently doing. And machine learning may be the best route to unlock that value.”
The beauty of data science lies in its infinite variety. Instead of going big, try laser focusing on the unique aspects of your business or specialty. That’s what the UK’s National Institute for Health Research did when their Cochlear Research Team teamed up with DSP, a machine learning consultancy. The goal was to analyze the process of fine-tuning cochlear implants in order to reduce the number of in-person visits.
“We looked at statistical importance and the relationships between each of those 22 electrodes and how they related to each other,” says Philip Brown, CTO for DSP. “We could look at a case based on certain electrodes and then predict what the other electrode setting would be over periods of time.”
Watch the video "Improving Patient Experience with Machine Learning"
Data science is an early exemplar of the cloud native world — Hadoop, for example, predates Kubernetes by nearly a decade, and libraries written in Python have become standard. That means that many of the free Python, R, and Java-based tools for machine learning are not only established, they’re a heavenly match for powerful cloud platforms such as Oracle Cloud Infrastructure. Today, you can pair the cutting-edge open source at your fingertips with stellar performance in the cloud.
“Our collaborative workspace is called projects. This is where you can organize your work around a specific business question,” says Elena Sunshine, product manager for data science, ML, and AI at OCI. “Within a project, we have fully managed compute, storage, and networking with a JupyterLab interface.” That development environment lets data scientists build and train models using Python and open source machine learning and deep learning libraries. Using conda environments, Oracle Cloud Infrastructure Data Science provides packages of over 300 of the latest and most popular open source tools, including favorites such as TensorFlow, Keras, PyTorch, and scikit-learn, as well as packages for specific use cases like natural language processing, time series, and computer vision.
Watch the video "Accelerate Data Science for Competitive Advantage"
“You don't need to think about managing this underlying infrastructure, upgrading it, provisioning it, and maintaining it over time. We do that for you. And we support a variety of CPU and GPU shapes, allowing you to customize the amount of resources that you have for your workload,” says Sunshine. “And best part is you pay only for the compute and storage that you consume on demand. There's no standing infrastructure to maintain or pay for, there's no overhead on top of what you use, you only pay for what you use on demand.”
As machine learning has become more pervasive and strategic for enterprises, automation has improved — and amplified fears that data scientists will be replaced. There is good news on both fronts, says Pavlik: “There have been a number of advances that make it easier for less experienced staff to deliver real value with machine learning. One that you should watch for is called AutoML, which automates common repetitive steps like algorithm selection, tuning, and aspects of data preparation.” Does this mean data scientists will be replaced by AutoML? Not likely, he says: “For experienced data scientists, this frees them up to perform higher value, more complex tasks. For the less experienced, it enables them to deliver very good results, relying on automation to make the kind of choices that normally require an expert.”
Indeed, Pavlik notes that the individual at Forth Smart who built their first model (using Oracle Machine Learning, not AutoML, he clarifies) was not, at the time, a data scientist. However, “in the long term it’s clear that all organizations need to build up that data science capability and that means hiring or training data scientists. But to go after low hanging fruit, to do a proof of concept on your initial data, that’s quite possible with your existing staff.” The key point? Your existing team may be able to show the first successes with machine learning.
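To make the idea concrete, here is a minimal sketch of two of the steps Pavlik says AutoML automates: algorithm selection and tuning. It is not Oracle's AutoML or any product API. The function names (make_data, fit_knn, cv_mse, auto_select), the candidate models, and the synthetic data are all assumptions chosen for illustration, and the code uses only the Python standard library. Each candidate is scored with five-fold cross-validation, and the one with the lowest error is kept.

```python
import random
import statistics


def make_data(n=120, seed=0):
    """Synthetic data for illustration only: y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(k):
    """Candidate family: k-nearest-neighbor regression; k is the tuned setting."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean([y for _, y in nearest])

        return predict
    return fit


def cv_mse(fit, xs, ys, folds=5):
    """Mean squared error estimated with simple k-fold cross-validation."""
    rows = list(zip(xs, ys))
    errors = []
    for f in range(folds):
        train = [r for i, r in enumerate(rows) if i % folds != f]
        test = [r for i, r in enumerate(rows) if i % folds == f]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.extend((model(x) - y) ** 2 for x, y in test)
    return statistics.fmean(errors)


def auto_select(xs, ys):
    """Score every candidate (algorithm plus setting) and return the best one."""
    candidates = {"mean": fit_mean, "linear": fit_linear}
    for k in (1, 3, 5, 9):
        candidates[f"knn(k={k})"] = fit_knn(k)
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = auto_select(*make_data())
    print(best, round(scores[best], 3))
```

A real AutoML service searches a far larger space and also handles parts of data preparation, but the loop is the same in spirit: try candidates, score them on held-out data, and keep the winner.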
Data science starts, not surprisingly, with data. That’s why it’s critical to make sure it’s collected and maintained appropriately. “For people who are looking at undertaking machine learning on data, especially clinical data, security and compliance are obviously going to be key,” says Brown. “That was embedded in the Oracle platform.”
Another great insight the DSP CTO had is how the democratization of technology changes the game: “I can go into my Oracle Cloud portal. I can spin up a data science service within minutes, I can provide my data scientist access. He can then work on that, knowing that the data is secure and simple.”
For any data project today, people want results in hours, not months, Brown says: “If we have to go, it's going to take us six months to build a platform. We're going to have to think about data for another two months and the security of that. The model will just not work like that anymore. So the fact that we can give a data scientist — not a computer scientist — the platform within an hour, the R code is being written, the data is being manipulated — that is a key benefit.”
Try AutoML, one of many features in the new Oracle Cloud Infrastructure Data Science.