From detecting skin cancer to sorting corn cobs to predicting when equipment will need maintenance, machine learning has granted computer systems entirely new abilities. Algorithms are the methods used to extract patterns from data so that computers can make predictions and draw inferences. It is worth understanding how machine learning really works under the hood, so let's walk through a few examples and use them as an excuse to talk about the process of getting answers from your data using machine learning.
Here are the top 10 machine learning algorithms that everyone involved in Data Science, Machine Learning, and AI should know about.
Before we go further, it is worth explaining the taxonomy. Machine learning algorithms are divided into three broad categories: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning is the task of inferring a function from training data. The training data consists of a set of observations together with their outcomes (labels). Supervised learning is used when labeled data sets are available for training, e.g. a set of medical images of human cells or organs that are labeled as malignant or benign.
Supervised learning can be further subdivided into:
Regression analysis is used to predict numerical values. The top regression algorithms are:
Linear regression models the relationship between observations and outcomes with a straight line. The line is fitted by minimizing an error measure such as the (root) mean squared error between predicted and actual outcomes, for example with gradient descent. The method also provides insight into which factors have a greater influence on the outcome: for example, the color of an automobile may not be strongly correlated with its chances of breaking down, while its make and model may be much more strongly correlated.
Polynomial regression is a form of regression analysis in which the relationship between the observation and the outcome is modeled as an nth-degree polynomial. It is suited to observations that follow a curve or a series of humps rather than a straight line, and the fitted curve is more reliable when it is built on a large number of observations.
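To make the fitting step concrete, here is a minimal sketch of simple linear regression trained with gradient descent on the mean squared error (minimizing the MSE also minimizes its square root, the RMSE). The helper names `fit_line` and `rmse`, the learning rate, the epoch count, and the toy data are all illustrative assumptions, not taken from any particular library.

```python
import math

def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error (MSE)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = mean(residual^2) with respect to w and b
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def rmse(xs, ys, w, b):
    """Root mean squared error: the square root of the MSE, in the units of y."""
    return math.sqrt(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3), round(rmse(xs, ys, w, b), 6))  # 2.0 1.0 0.0
```

In practice the slope `w` is what carries the "influence" insight from the paragraph above: with several inputs, each input gets its own weight, and larger-magnitude weights on comparably scaled inputs suggest a stronger relationship with the outcome.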
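As a rough sketch of the idea, a degree-2 polynomial can be fitted by ordinary least squares, because the model is still linear in its coefficients. The version below solves the normal equations with Gaussian elimination in plain Python; `fit_polynomial`, `predict`, and the "single hump" data are hypothetical names and values chosen for illustration. With NumPy available, `numpy.polyfit(x, y, deg)` does a comparable least-squares fit.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y ~ c0 + c1*x + ... + c_d*x^d via the normal equations."""
    m = degree + 1
    # Normal equations A c = rhs, with A[i][j] = sum(x^(i+j)) and rhs[i] = sum(y * x^i)
    A = [[float(sum(x ** (i + j) for x in xs)) for j in range(m)] for i in range(m)]
    rhs = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (rhs[i] - s) / A[i][i]
    return coeffs

def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

# Hypothetical observations following a single hump: y = 1 + 2x - 0.5x^2
xs = [-2, -1, 0, 1, 2, 3, 4]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])  # [1.0, 2.0, -0.5]
```

Solving the normal equations directly is fine for a toy example, but it becomes numerically fragile as the degree grows, which is one reason library routines use more stable least-squares solvers.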
Classification analysis is a series of techniques used to predict categorical values, i.e. to assign data points to categories, e.g. spam vs non-spam email, or red vs blue vs green objects. The top classification algorithms are:
Logistic regression has a misleading name: although it suggests regression, it is in fact a classification technique. It estimates the probability of a binary (1 or 0) response, e.g. malignant or benign. It can also be generalized to predict more than two categories, e.g. whether an object is an animal, a human, or a car.
K nearest neighbors (KNN) is a classification technique in which an object is classified by a majority vote of its neighbors: the observation is assigned to the class that is most common among its K nearest neighbors. Suppose you are trying to classify the image of a flower as either a sunflower or a rose. If K is 3, then at least 2 of the 3 nearest pre-classified neighbors must belong to the same flower class for the test image to be assigned that class. Nearness is measured across each feature (dimension) used for classification, for example, how close the color of the test image is to the color of the pre-classified flower images. The best choice of K generally depends on the data: a larger K reduces the effect of noise on the classification but makes the boundaries between classes less distinct.
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences. Decision trees aim to create a model that predicts an outcome by learning simple decision rules from the training data.
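A minimal sketch of binary logistic regression helps show why it is a classifier: a linear score `w*x + b` is squashed by the sigmoid into a probability, and the parameters are fitted by gradient descent on the log-loss. The single feature, the learning rate, the epoch count, and the data are hypothetical; the feature is assumed to be already standardized (centered on 0), which tends to make gradient descent converge faster. The multi-category generalization mentioned above is not shown.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w*x + b) by gradient descent on the average log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For log-loss, d/dw = mean((p - y) * x) and d/db = mean(p - y)
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        w -= lr * sum(err * x for err, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical standardized feature (e.g., a cell-size score); 1 = malignant, 0 = benign
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(xs, labels)
for x in (-1.2, 0.0, 1.2):
    # Probability of the "malignant" class; predict 1 when it is at least 0.5
    print(x, round(sigmoid(w * x + b), 3))
```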
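The voting rule can be sketched in a few lines. This assumes Euclidean distance over all feature dimensions and Python 3.8+ for `math.dist`; `knn_classify` and the (yellowness, redness) color features are hypothetical stand-ins for whatever features a real flower classifier would extract from images.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    train: list of (feature_tuple, label) pairs; nearness is the Euclidean
    distance computed across all feature dimensions.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical color features per flower image: (yellowness, redness), each in 0..1
train = [
    ((0.90, 0.20), "sunflower"), ((0.85, 0.25), "sunflower"), ((0.95, 0.15), "sunflower"),
    ((0.10, 0.90), "rose"), ((0.20, 0.85), "rose"), ((0.15, 0.95), "rose"),
]
print(knn_classify(train, (0.80, 0.30)))  # sunflower (all 3 neighbors agree)
print(knn_classify(train, (0.50, 0.50)))  # sunflower (2 of the 3 neighbors)
```

With two classes, an odd K such as 3 avoids tied votes. Features on very different scales would normally be rescaled first so that one dimension does not dominate the distance.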
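One common way to learn those decision rules is greedy, CART-style splitting: at each node, pick the feature and threshold that make the resulting groups as pure as possible, measured here by Gini impurity. The source does not specify a splitting criterion, so Gini is an assumption, as are the helper names and the hypothetical equipment data (machine age in years, vibration level).

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) rule with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively learn if/else rules; a leaf holds the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the learned rules from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

rows = [(1, 0.2), (2, 0.3), (3, 0.25), (8, 0.3), (9, 0.8), (2, 0.9), (7, 0.85), (10, 0.9)]
labels = ["ok"] * 4 + ["needs_service"] * 4
tree = build_tree(rows, labels)
print(tree)                      # (1, 0.3, 'ok', 'needs_service')
print(predict(tree, (4, 0.95)))  # needs_service
```

The learned tree reads as a single rule, "if vibration <= 0.3 then ok, else needs_service", which illustrates why decision trees are valued for being easy to interpret.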
Unsupervised learning is a set of algorithms used to draw inferences from data sets consisting of input data without labeled outcomes. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. Popular unsupervised learning algorithms include:
K-means clustering aims to partition observations into K clusters. For instance, items in a supermarket could be clustered into groups, with butter, cheese, and milk forming a group of dairy products. The k-means algorithm does not necessarily find the optimal configuration, because the result depends on the initial cluster centers, so it is usually run multiple times from different starting points to reduce this effect.
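The sketch below uses Lloyd's algorithm, the standard k-means iteration: it alternates between assigning each point to its nearest center and moving each center to the mean of its points. It also applies the "run it multiple times" advice by restarting from random centers and keeping the run with the lowest inertia (the total squared distance from points to their centers). The function names, the restart count, and the 2-D points are hypothetical choices for illustration.

```python
import math
import random

def kmeans_once(points, k, rng, iters=100):
    """One run of Lloyd's algorithm from k randomly chosen starting centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty)
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, inertia

def kmeans(points, k, restarts=30, seed=0):
    """Run k-means several times and keep the result with the lowest inertia."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(restarts)), key=lambda r: r[1])

# Hypothetical 2-D points forming three loose groups
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5), (1, 8), (1.5, 9), (2, 8.5)]
centers, inertia = kmeans(points, k=3)
print(sorted(tuple(round(v, 2) for v in c) for c in centers), round(inertia, 2))
```

A single run that happens to start with two centers in the same group can settle on a poor split; keeping the best of several runs makes that outcome much less likely, though still not impossible.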
Principal Component Analysis
Principal component analysis (PCA) is a technique for feature extraction when faced with too many features or variables. Say you want to predict the GDP of the United States: you have many variables to consider, such as inflation, stock data for index funds as well as individual stocks, interest rates, ISM, jobless claims, the unemployment rate, and the list goes on. Working with too many variables is problematic for machine learning because of the risk of overfitting, the lack of suitable data for each variable, and the varying degree to which each variable correlates with the outcome. PCA addresses this by transforming the original variables into a smaller set of new, uncorrelated variables called principal components. The first principal component has the largest possible variance, accounting for as much of the variability in the data as possible, and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components.
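A minimal sketch can make "largest variance" and "orthogonal" concrete. The principal components are the eigenvectors of the data's covariance matrix, ordered by eigenvalue (the variance along each direction). Here they are found with power iteration plus deflation, a simple stand-in chosen so the example needs only the standard library; libraries typically use an SVD or eigendecomposition instead. The function names and the two hypothetical, strongly correlated variables are illustrative assumptions.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def principal_components(rows, k, iters=200, seed=0):
    """Top-k principal components via power iteration with deflation.

    Returns a list of (unit direction, variance along it), largest variance first.
    """
    rng = random.Random(seed)
    cov = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]  # random start vector
        for _ in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        components.append((v, variance))
        # Deflation: remove this direction so the next component is orthogonal to it
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components

# Hypothetical pair of strongly correlated variables (think of two overlapping indicators)
rows = [(-2, -1.9), (-1, -1.2), (0, 0.2), (1, 0.8), (2, 2.1)]
(pc1, var1), (pc2, var2) = principal_components(rows, k=2)
print([round(x, 3) for x in pc1], round(var1 / (var1 + var2), 4))
```

In this toy case the first component points roughly along the diagonal and captures well over 99% of the total variance, so the two correlated variables could be replaced by a single component with little loss of information.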
Reinforcement learning is different from both supervised and unsupervised learning. The goal in supervised learning is to find the best label based on a history of labeled data, and the goal in unsupervised learning is to find logical groupings of the data in the absence of outcomes or labels. In reinforcement learning, the goal is to reward good behavior, similar to rewarding a pet for good behavior in order to reinforce it.
Reinforcement learning solves the difficult problem of correlating immediate actions with the delayed outcomes they create. Like humans, reinforcement learning algorithms sometimes have to contend with delayed gratification before they see the outcomes of their actions or past decisions, for example, the reward for winning a game of chess, or maximizing the points won over many moves in a game of Go, as AlphaGo does.
Top reinforcement learning algorithms include Q-Learning, State–Action–Reward–State–Action (SARSA), Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG). Explaining these algorithms gets fairly involved and is worthy of its own dedicated blog post in the future.
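Without going into the full theory, a tiny tabular Q-Learning sketch shows how a delayed reward is credited back to earlier actions. The environment is a hypothetical five-state corridor that pays a reward only when the agent reaches the far end, and the hyperparameters (`alpha`, `gamma`, `epsilon`, episode count) are illustrative assumptions rather than recommended values.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Toy environment: the only reward (1.0) comes when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the best-known action, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward the reward plus the discounted value of the best next action
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

Even though no reward is paid until the last move, the discounted updates propagate value backward, so Q(0, +1) should approach 0.9³ = 0.729 and the learned policy moves right from every state.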
Oracle offers complete data science and machine learning frameworks and algorithms in its data science platform, and also embeds them in its SaaS applications and database.