By Mike Faden
If you’ve ever received a customized offer while web surfing, or had a credit card unexpectedly declined because you were traveling, you may have been on the receiving end of machine learning. Today, machine learning is already used in a wide range of applications—customer segmentation and fraud detection are two of them—and it’s in the process of being integrated into many more.
Machine learning is based on algorithms that can learn from data without rules-based programming. One reason it’s become so valuable is that companies are dealing with vast and rapidly growing volumes of data. Some estimates suggest that the amount of data globally doubles every two years.
“Trying to find the value in that data, getting insights into the data, can be very challenging,” says Aali Masood, senior director of big data marketing at Oracle. “This is the appeal of machine learning—you can let the algorithms figure it out.” Machine learning may find things that humans would miss; furthermore, the more data that is fed to the algorithms, the better they get at identifying trends and patterns.
Common Machine Learning Techniques
Machine learning isn’t a single approach—it’s a collection of different techniques used for solving different problems, says Peter Jeffcock, director of big data product marketing at Oracle. Multiple techniques may be combined in some cases. Some of the best-known are:
- Regression is used to predict numbers, such as optimum pricing, Jeffcock says. For example, it might be used to predict home prices based on the number of rooms and the size of the yard. Other uses include estimating customer lifetime value. Interestingly, Jeffcock notes, even though machine learning is one of today’s hottest buzzwords, not all the techniques are new. Regression concepts can be traced back to British statistician Sir Francis Galton, Charles Darwin’s half-cousin, as early as 1875.
- Classification is a “supervised” learning technique in which the system learns to identify members of a known class based on their characteristics. Jeffcock’s analogy: If you tell a system that a specific fruit is a banana, it can start to get an idea of the characteristics that make something a banana. Then it will be able to recognize other bananas. Uses include identifying likely high-value customers—or customers likely to “churn”—and fraud detection.
- Clustering, in contrast, is an “unsupervised” learning technique in which the system recognizes similarities between items and groups them as members of an inferred class—even though it doesn’t know what the class is. To return to the banana analogy: A system might recognize that thin, curved, and yellow items are all members of the same class—even though it hasn’t been told that the class is called bananas. Similarly, it might cluster together round, hard, green things even though it doesn’t know they are apples. Common uses include customer segmentation and credit risk analysis, Jeffcock says.
- Anomaly detection is used to look for differences rather than similarities. As its name suggests, it can be used to find things that are unusual: “A needle in the haystack or a black sheep,” Jeffcock says. It may find anomalies that might otherwise be hard to spot in masses of data. Jeffcock cites the example of a healthcare company that found one dentist billing for 85 fillings per hour, which works out to less than a minute per filling and pointed to potential fraud worth investigating.
- Association rules are used to identify like-minded people for purposes such as market basket analysis. Companies use them to make offers: they are often the technique behind the “you might be also interested in…” suggestions on ecommerce websites, Jeffcock says. The technique is used to determine rules, such as that people who buy spaghetti, meatballs, and tomato sauce also tend to buy Parmesan cheese, so if you put the first three items in your online shopping cart you may also be offered the fourth. It can also be used for root cause analysis and to find “harbingers of failure”: consumers whose tastes are so far out of the mainstream that if they like a product, it can actually indicate that the product is likely to fail.
- Time series analysis can be used to analyze data over time. Uses include forecasting household energy consumption based on past usage, or predicting seasonally adjusted employment data.
- Neural networks are built from artificial neurons loosely modeled on the neurons in the human brain, and are applied to problems in which they learn from examples, somewhat as humans do. For example, a neural network might be trained to recognize handwritten digits by being shown examples. Neural networks can become large and multilayered to tackle complex problems.
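To make three of these techniques more concrete, here is a minimal Python sketch of regression, anomaly detection, and association rules, using only the standard library. It is an illustration under stated assumptions, not how Oracle or any company mentioned here implements them. All of the data is made up, the function names (`fit_line`, `flag_anomalies`, `rule_confidence`) are hypothetical, and the anomaly check uses one common choice, a median-based "modified z-score" with a conventional cutoff of 3.5, out of many possible methods.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_anomalies(values, threshold=3.5):
    """Anomaly detection: return values whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), so one extreme
    value cannot hide itself by inflating the average.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; anything else stands out.
        return [v for v in values if v != med]
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def rule_confidence(baskets, antecedent, consequent):
    """Association rules: of the baskets holding every antecedent item,
    the fraction that also hold every consequent item."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent <= b)
    return both / len(with_antecedent)


# Hypothetical data: number of rooms vs. sale price (thousands of dollars).
rooms = [2, 3, 4, 5]
prices = [150, 200, 250, 300]
slope, intercept = fit_line(rooms, prices)
predicted_price = slope * 6 + intercept  # estimate for a six-room home

# Hypothetical data: fillings billed per hour by eight dentists.
fillings_per_hour = [3, 4, 2, 5, 3, 4, 85, 3]
suspicious = flag_anomalies(fillings_per_hour)

# Hypothetical shopping baskets, each represented as a set of items.
baskets = [
    {"spaghetti", "meatballs", "tomato sauce", "Parmesan"},
    {"spaghetti", "meatballs", "tomato sauce", "Parmesan"},
    {"spaghetti", "meatballs", "tomato sauce"},
    {"spaghetti", "bread"},
    {"milk", "Parmesan"},
]
confidence = rule_confidence(
    baskets, {"spaghetti", "meatballs", "tomato sauce"}, {"Parmesan"}
)

print(predicted_price, suspicious, confidence)
```

With this toy data, the fitted line extrapolates a price for a six-room home, only the dentist billing 85 fillings per hour is flagged, and two of the three baskets containing spaghetti, meatballs, and tomato sauce also contain Parmesan. Production systems would typically rely on a dedicated library and far more data, but the underlying ideas are the same.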
Companies are applying machine learning in many ways to achieve business benefits, Jeffcock explains. Britain’s National Health Service, for example, has used Oracle’s machine learning to identify hundreds of millions of pounds in savings. Sports-ticketing company StubHub uses machine learning to detect potentially fraudulent transactions.
Organized fraudsters continually look for new ways to evade detection—so it’s important that fraud-detection models can quickly adapt. “What works today may not work tomorrow,” Jeffcock says.
Mike Faden is a principal at Content Marketing Partners.