
Oracle AI & Data Science Blog
Learn AI, ML, and data science best practices

  • May 27, 2020

How AI Can Take The Drudgery Out Of Tuning Machine-Learning Models

This post previously appeared on Forbes Oracle BrandVoice

Cloud computing has turned artificial intelligence and machine learning into tools that almost any company can use to find practical answers to difficult business problems. As companies scramble to get a competitive edge using AI and machine learning, they’re learning an important lesson: these aren’t “one strategy fits all” technologies.

“A lot of companies want to apply machine-learning techniques on large amounts of data, but often the projects don’t work in the real world as well as they’d like,” says Nipun Agarwal, vice president of research at Oracle Labs.

That’s in part because many products that incorporate machine-learning capabilities use predetermined algorithms and ways of handling data. In reality, each organization’s project data has specific characteristics that might not fit the machine-learning software’s predetermined configuration.

“Each company will have different thresholds for their data and their results,” Agarwal says. “One size does not fit all. That’s the problem we are solving with AutoML.”

AutoML takes over the often labor-intensive job of choosing and tuning machine-learning models. While there is no substitute for skillful problem definition and data preparation in the machine-learning process, AutoML handles many of the repetitive tasks, reducing the need to understand algorithm parameters and shortening the compute time needed to produce better models.

Machine Learning Isn’t Magic

Let’s back up a step to look at how AutoML fits into the larger machine-learning picture. Machine learning is a type of artificial intelligence: A model is created and trained on a set of previously gathered data, often with known outcomes. Then, when presented with new data with unknown outcomes, the model can be used to make predictions about that data. For example, a model might be used to segment customers, spot anomalies, or forecast sales.
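To make the train-then-predict cycle concrete, here is a minimal sketch in plain Python. It assumes a very simple technique (a nearest-centroid classifier) and made-up customer numbers; the function names `train` and `predict` and the data are illustrative only and do not come from any Oracle product.

```python
from statistics import mean

def train(rows, labels):
    """Learn one centroid (per-feature mean) for each known outcome."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in grouped.items()}

def predict(model, row):
    """Return the label whose centroid is closest to the new row."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Previously gathered data with known outcomes (hypothetical numbers):
# [monthly_spend, visits_per_month] -> customer segment
history = [[20, 1], [25, 2], [30, 1], [200, 12], [220, 15], [180, 10]]
segments = ["occasional", "occasional", "occasional", "loyal", "loyal", "loyal"]

model = train(history, segments)

# New data with an unknown outcome
new_customer = [190, 11]
prediction = predict(model, new_customer)
print(prediction)
```

Real projects would use far richer algorithms, but the shape is the same: fit on data with known outcomes, then apply the fitted model to data without them.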

But machine learning isn’t magic. Getting accurate results depends on a data scientist who can study the input data, understand the desired output to solve a business problem, and then choose from dozens of mathematical algorithms, tune those algorithms’ parameters (called hyperparameters), and evaluate the resulting models.

What if the results aren’t sufficiently accurate? The data scientist may adjust the algorithm’s tuning parameters again and again until the machine-learning model produces the desired results, Agarwal explains. If the results aren’t acceptable, the data scientist might even start the process again, using an entirely different ML algorithm to see if it can better model the training data.

That’s where AutoML comes in. AutoML uses machine learning to choose and optimize the machine-learning pipeline, a technique called metalearning.

4 Tough Machine Learning Problems

Here are four big reasons why it can be challenging today to choose and tune a machine-learning algorithm, and why technology like AutoML can prove useful.

  1. There are many well-known algorithms for machine learning, and it’s not always obvious which algorithm will work best for building real-value prediction, anomaly detection, or classification models for a particular data set. In complex, real-world situations, a data scientist may need weeks or months to choose the right algorithm and refine the model created using that algorithm.
  2. A business problem’s data set might have dozens, hundreds, or even thousands of variables, or predictors, that a model can consider, so it’s not easy to tell which of those data points are significant for making a decision. This process of selecting the most relevant information to include in a data model is called “feature selection.”
  3. There might be too much data, and it’s difficult to know which subset of data to use for training a machine-learning model. In some cases, training against some data variables or predictors can increase training time while actually reducing the accuracy of the ML model. “It’s not easy to achieve significant size reduction without affecting accuracy,” Agarwal says. But with care, it can be done.
  4. Lastly, when tuning how the chosen algorithm works, the process called “hyperparameter tuning” involves lots of trial and error. Complex ML algorithms can have over a dozen configurable parameters, and each of these parameters can have a large impact on model performance.
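The first and fourth problems (choosing an algorithm and tuning its hyperparameters) can be sketched as a search over candidates scored on held-out data. This is a toy illustration under stated assumptions, not how Oracle’s AutoML works: the two candidate algorithms (a majority-class baseline and k-nearest neighbors), the values of `k`, and the data are all made up for the example.

```python
from collections import Counter

def knn_predict(train_rows, train_labels, row, k):
    """Vote among the k training rows closest to `row`."""
    nearest = sorted(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], row)),
    )[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

def majority_predict(train_rows, train_labels, row):
    """Baseline: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predict_fn, train_set, valid_set):
    """Share of validation rows that the candidate labels correctly."""
    train_rows, train_labels = train_set
    valid_rows, valid_labels = valid_set
    hits = sum(predict_fn(train_rows, train_labels, row) == label
               for row, label in zip(valid_rows, valid_labels))
    return hits / len(valid_rows)

# Hypothetical training and validation data
train_set = ([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [9, 9]],
             ["a", "a", "a", "b", "b", "b", "b"])
valid_set = ([[2, 2], [1, 0], [7, 9], [9, 7]], ["a", "a", "b", "b"])

# Candidates: one baseline algorithm plus k-NN at several values of the
# hyperparameter k
candidates = {"majority baseline": majority_predict}
for k in (1, 3, 7):
    candidates[f"k-NN, k={k}"] = (
        lambda rows, labels, row, k=k: knn_predict(rows, labels, row, k)
    )

scores = {name: accuracy(fn, train_set, valid_set)
          for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Even in this tiny case the hyperparameter matters: with `k=7` every training row votes, so k-NN collapses into the majority baseline. With dozens of algorithms and a dozen or more parameters each, an exhaustive search like this one quickly becomes impractical, which is the gap AutoML aims to fill.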

One of the benefits of AutoML, explains Agarwal, is that it can very quickly make a well-educated guess to select a suitable ML algorithm and effective initial hyperparameters. AutoML can then train the chosen algorithms with those parameters, test the resulting accuracy, make tiny adjustments, and test the results again. AutoML can also automate the creation of small, accurate subsets of data to use for those iterative refinements, yielding excellent results in a fraction of the time.
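The guess, adjust, and retest loop on a reduced data set can be sketched as follows. This is a deliberately tiny stand-in, assuming a one-parameter threshold rule as the "model", a simple hill climb as the adjustment strategy, and every tenth row as the subset; none of these choices is taken from Oracle’s implementation.

```python
def accuracy(threshold, rows):
    """Fraction of (value, label) rows a threshold rule classifies correctly."""
    return sum((value >= threshold) == label for value, label in rows) / len(rows)

def refine(threshold, rows, step):
    """Nudge the threshold by +/- step for as long as the score improves."""
    best = accuracy(threshold, rows)
    improved = True
    while improved:
        improved = False
        for candidate in (threshold - step, threshold + step):
            score = accuracy(candidate, rows)
            if score > best:
                threshold, best, improved = candidate, score, True
                break
    return threshold, best

# Hypothetical data: the true cutoff (370) is unknown to the tuner
full_data = [(value, value >= 370) for value in range(1000)]
subset = full_data[::10]  # every 10th row: 100 rows instead of 1,000

# Start from an initial guess of 500 and refine on the subset only
tuned, subset_score = refine(500, subset, step=10)

# Check the tuned parameter once against the full data
full_score = accuracy(tuned, full_data)
print(tuned, subset_score, full_score)
```

Each refinement step scores only 100 rows, yet the parameter it settles on also fits the full 1,000 rows. That only works because the subset is representative, which is why building a small but accurate subset is the hard part.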

“Instead of having to test a set of parameters against 10 billion rows of training data, AutoML can test against .01% of that data, without compromising model accuracy,” Agarwal says. “That’s going to be 100 to 1,000 times faster in training time, even on the same dataset.”
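As a quick check of the scale in that quote, the arithmetic below reads “.01%” as 0.01 percent, or one part in 10,000. The variable names are illustrative.

```python
total_rows = 10_000_000_000          # "10 billion rows" from the quote
sample_rows = total_rows // 10_000   # 0.01% is one part in 10,000
row_ratio = total_rows // sample_rows

print(sample_rows)  # 1000000
print(row_ratio)    # 10000
```

So each test runs against about one million rows rather than 10 billion. The quoted 100 to 1,000 times speedup refers to training time rather than row count, so it need not match the row ratio.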

Again, machine learning isn’t a one-size-fits-all strategy. Some companies will only use ML that’s embedded inside applications, like Oracle ERP Cloud and Oracle HCM Cloud, which use ML to help people make better finance and HR decisions. But teams that manage their own data science and apply machine learning to large datasets need the right tools to quickly choose, build, and deploy machine-learning models that deliver results.

AutoML Tools for Data Scientists

Analysts and data scientists can tap AutoML as just one of the capabilities within the new Oracle Cloud Infrastructure Data Science. The goal of Oracle’s cloud service is to make it easier for data science teams to collaboratively build, train, and deploy machine-learning models so that more projects succeed.

Oracle Cloud Infrastructure Data Science is one of seven services that make up the company’s Oracle Cloud Data Science Platform, focused on improving the effectiveness of data science teams. The use of automation tools like AutoML is one example of how it does that.

One big part of helping teams of analysts and data scientists succeed is helping them use the valuable data they have in their Oracle Databases. Here, too, AutoML functionality will be available in Oracle Database and Oracle Autonomous Database through Oracle Machine Learning for Python. Python is a programming language often used by data scientists solving AI problems.

“Oracle Machine Learning for Python offers scalable machine learning using Oracle Database as a high-performance compute engine,” says Mark Hornick, senior director of data science and machine learning product management at Oracle. “AutoML within Oracle Machine Learning for Python will automate algorithm and feature selection, as well as hyperparameter tuning for the in-database classification and regression algorithms.”

Building upon Oracle Machine Learning for Python, AutoML UI will be a new code-free user interface integrated with Oracle Machine Learning Notebooks on Oracle Autonomous Database that streamlines the model development, deployment, and monitoring process.

These tools increase data scientist productivity and also open machine learning to nonexpert ML users, Hornick says.

Plus, because it saves computing time, using AutoML to automate algorithm selection also cuts costs. “In cloud environments, where the cost of compute time can be directly quantified, AutoML offers clear advantages over manually choosing and tuning algorithms,” Hornick says.

To learn more about data science and ML for your business, visit the Oracle Data Science page, and follow us on Twitter.
