Overview of Traditional Machine Learning Techniques

January 24, 2019 | 5 minute read

While not a day goes by without machine learning, deep learning, and artificial intelligence being mentioned in the news, these fields have been around for decades. If you look past the self-driving cars and digital assistants, you’ll discover that most of what’s being applied today is traditional.

In this context, traditional means the techniques that we have been using for years and that are often the foundation for more cutting-edge machine learning. Below, we will go over four techniques that are considered traditional machine learning.

Clustering

Clustering is a technique that is used to find natural groupings in data based on similarities like behavior and demographics. This may sound like just clever groupings done with an analytics tool or using SQL, but it goes way beyond that. 

There are many clustering algorithms, but k-means may be the most common. K-means is a so-called centroid model, meaning each cluster is represented by a center point, and the algorithm tries to find the best coordinates for the centroids using an iterative approach. Basically, the coordinates are your attributes, so the distance is the difference between the values of a data point and those of its centroid, a distance that we want to be as small as possible. Since we are just grouping the data and not trying to predict anything, this is called unsupervised learning.
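To make the iterative approach concrete, here is a minimal sketch of k-means in plain Python. The customer attributes (orders per year, kilograms per order) and the choice of the first k points as starting centroids are assumptions made for illustration; in practice you would typically scale the attributes first and use a library implementation.

```python
import math

def kmeans(points, k, iterations=100):
    """Group points (tuples of numbers) into k clusters."""
    # Assumption: start from the first k points (simple and deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (orders per year, kilograms per order)
customers = [(2, 5.0), (12, 0.5), (1, 6.0), (10, 0.4), (4, 4.5), (11, 0.6)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

With this toy data, the infrequent bulk buyers end up in one cluster and the frequent small-order buyers in the other.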

To illustrate this point, let’s say that you own a company that sells coffee beans online. Since all of the data is being collected, you know where your customers live, what beans they buy, when they make purchases, and how often they make them. Through clustering, you may discover a segment of customers who make their purchases on a regular basis (once a quarter, every half year, or once a year), buy the same beans in larger quantities, and live in an area primarily made up of homeowners. You may also discover another segment of customers who make frequent purchases with different beans every time in smaller quantities, and live in an area primarily made up of young professionals. Now, I’ll call the first segment “consistent” customers since they know what they like, and the second segment “adventurous” customers since they are more adventurous with their coffee. 

Based on these machine learning insights, we can make a variety of decisions. For example, we can stop sending offerings to the “consistent” customers about new coffee beans because they are not interested in changing and instead offer them a subscription, along with a discount if they sign up for regular deliveries and pay in advance. For the “adventurous” customers, we can build offerings around new beans, perhaps letting them be the first to get a new bean before it is available to everyone else, and thereby increase their loyalty.

One challenge with clustering or segmentation is that in order to decide what segment a customer belongs to, they need to have been customers for a while so we can observe their buying behaviour first. This means that for new customers we do not know what segment they belong to until they have been with us for a while. Fortunately, we can apply a machine learning technique called classification to predict the segment.

Classification

In the above scenario, we would take all the customers that we have mapped into segments and remove the data about buying behavior that comes from previous purchases, since that data does not exist for a new customer. We will then use the segment as our target variable and apply a machine learning algorithm to the data to learn the pattern that can be used to predict the segment. There are many algorithms that can be used for this, but at a high level they behave rather similarly.

Before training begins, we divide the data into two parts: training and test. The training data is used for learning, and the test data is used for verifying what was learned. The algorithm iterates through the training data to identify the patterns or rules that can explain the outcome. It compares the outcome predicted by those patterns or rules against the known outcome, with the goal of minimizing the difference, or error, between the two. Once it cannot get a smaller error, it stops and produces the model. Then, we use the model on the test data. If it performs just as well, we can say that the model is capable of generalizing and will most likely perform well on new, unseen data. If it doesn’t perform well, we need to go back and adjust the settings of the algorithm, add more data, or create additional attributes for the data. Once we are satisfied with our model, we can start applying it to our new customers.
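The train-and-test workflow above can be sketched in a few lines of plain Python. The classifier here is a nearest-centroid model, chosen only because it is short; it is one of many possible algorithms and not necessarily the one you would use in practice. The customer attributes (age, share of homeowners in the customer's area) and all function names are hypothetical.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and cut them into a training part and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_nearest_centroid(rows):
    """Learn one average point per segment from (features, segment) rows."""
    grouped = {}
    for features, segment in rows:
        grouped.setdefault(segment, []).append(features)
    return {
        segment: tuple(sum(values) / len(members) for values in zip(*members))
        for segment, members in grouped.items()
    }

def predict(model, features):
    """Return the segment whose average point is closest to the features."""
    return min(model, key=lambda segment: math.dist(features, model[segment]))

def accuracy(model, rows):
    """Share of rows where the predicted segment matches the known segment."""
    return sum(predict(model, f) == s for f, s in rows) / len(rows)

# Hypothetical labeled customers: ((age, homeowner share in area), segment)
customers = (
    [((50 + i, 0.8), "consistent") for i in range(8)]
    + [((24 + i, 0.3), "adventurous") for i in range(8)]
)

train, test = train_test_split(customers)
model = fit_nearest_centroid(train)
train_accuracy = accuracy(model, train)
test_accuracy = accuracy(model, test)
print(train_accuracy, test_accuracy)

# Apply the model to a new customer with no purchase history yet.
new_customer_segment = predict(model, (27, 0.25))
print(new_customer_segment)
```

If the test accuracy is close to the training accuracy, that is a sign the model generalizes. The toy data here is deliberately easy to separate, so real data will rarely look this clean.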

Regression

Regression is also prediction, but instead of predicting a class we are predicting a number. Using our case, we could use this technique for predicting our customers’ lifetime value, the best discount for a customer, or demand.

It works the same way as classification in that we divide the data into two parts and use the first for learning and the second for verifying what was learned. The end goal is also the same: we want to minimize the difference between the predicted value and the actual value. Since we are predicting a continuous number, we will almost always get some difference, and so we use confidence intervals to show the reliability of the estimate (for example, that there is a 95% confidence that the number is in a specific interval).
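As a minimal sketch of this, the following fits a straight line with ordinary least squares on training data and measures the average error on test data. The numbers (orders per year against yearly spend) are made up for illustration, and a single-attribute line is the simplest possible regression model; confidence intervals are left out to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one attribute: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(slope, intercept, xs, ys):
    """Average absolute difference between predicted and actual values."""
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Hypothetical data: orders per year -> yearly spend
train_x = [1, 2, 4, 6, 8, 10]
train_y = [32, 58, 115, 168, 232, 285]
test_x = [3, 12]
test_y = [88, 345]

slope, intercept = fit_line(train_x, train_y)
test_error = mean_absolute_error(slope, intercept, test_x, test_y)
print(slope, intercept, test_error)
```

The test error is never exactly zero here, which is the point made above: with a continuous target we judge the model by how small the difference is, not by whether it vanishes.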

Market Basket Analysis

Market Basket Analysis is a technique within the detection area that is used to better understand what products customers are buying together. It uses association rule mining to find associations or relationships among data items.

The output is often a number of association rules. For example: 

Milk => Bread [support = 2%, confidence = 60%]

This rule tells us that people who buy milk tend to also buy bread, while the support and confidence measures tell us that 2% of all our baskets contain both milk and bread and that 60% of the baskets that have milk also have bread.
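The two measures are simple ratios, which a short sketch can make explicit. The ten baskets below are invented for illustration, so the resulting numbers differ from the 2% and 60% in the rule above.

```python
def support(baskets, items):
    """Share of all baskets that contain every one of the given items."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Share of the baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical baskets: milk is in 5 of 10, milk and bread together in 3 of 10.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk"},
    {"milk", "eggs"},
    {"bread"},
    {"eggs"},
    {"coffee"},
    {"coffee", "eggs"},
    {"bread", "coffee"},
]

rule_support = support(baskets, {"milk", "bread"})
rule_confidence = confidence(baskets, {"milk"}, {"bread"})
print(rule_support, rule_confidence)
```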

Association rule mining is done in two steps:

  1. Find all frequent item sets, that is, sets of products that are frequently bought together.

  2. Use those item sets to generate rules that describe the different combinations that exist and how often they occur.
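The two steps can be sketched as follows. This is a brute-force illustration limited to pairs of products, with invented baskets, thresholds, and function names; dedicated algorithms such as Apriori avoid checking every candidate combination.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_size=2):
    """Step 1: item sets (up to max_size items) found in enough baskets."""
    items = sorted(set().union(*baskets))
    frequent = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            share = sum(set(candidate) <= b for b in baskets) / len(baskets)
            if share >= min_support:
                frequent[frozenset(candidate)] = share
    return frequent

def generate_rules(frequent, min_confidence):
    """Step 2: turn each frequent pair into 'A => B' rules that are confident enough."""
    found = []
    for itemset, share in frequent.items():
        if len(itemset) != 2:
            continue
        for antecedent in itemset:
            (consequent,) = itemset - {antecedent}
            # Confidence = support of the pair / support of the antecedent.
            conf = share / frequent[frozenset([antecedent])]
            if conf >= min_confidence:
                found.append((antecedent, consequent, share, conf))
    return found

# Hypothetical baskets from the coffee shop
baskets = [
    {"espresso", "filters"},
    {"espresso", "filters", "grinder"},
    {"espresso", "grinder"},
    {"decaf", "filters"},
    {"espresso", "filters"},
    {"decaf"},
]

frequent = frequent_itemsets(baskets, min_support=0.3)
found = generate_rules(frequent, min_confidence=0.7)
for antecedent, consequent, share, conf in found:
    print(f"{antecedent} => {consequent} [support = {share:.0%}, confidence = {conf:.0%}]")
```

Raising or lowering the two thresholds controls how many rules come out, which is usually how the "most interesting" rules are narrowed down.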

We can then use the most interesting rules for understanding what products drive the sales of others. For instance, using the earlier coffee bean scenario, if we wanted to promote a specific coffee bean, we could use association rule mining to see what other products were bought together with the bean. Then, we can specifically target the segment of customers who bought those other products but not the specific bean we want to promote.

Association rule mining can also be used to create the “other customers also bought …” offers we often see in online shops.

These four techniques just scratch the surface of what you can do with traditional machine learning. As you can see, machine learning can be used for so much more than recognizing cats and dogs in pictures, and there are many cases in which it can provide immediate business value.

Mats Stellwall

Mats works at Oracle as a Data Scientist in the EMEA Analytics & Big Data SWAT Team, where he supports customers across EMEA in using Oracle solutions within the data science domain. He has been working with analytics and information-centric systems since the mid-nineties in various roles.

