A recommender system, or recommendation system, seeks to predict the "rating" or "preference" a user would give to an item. Based on what it knows about the user, the system recommends items it expects the user to be interested in, which is especially valuable when the catalogue of items is very large. Recommender systems are a useful alternative to search algorithms since they help users discover items they might not have found otherwise.
Some of the examples of recommendation engines are:
Recommendations can be:
Suppose we have 5 users and 3 items (movies), and each user has rated some of the movies. A recommender system predicts the ratings for the empty cells.
The steps involve gathering known ratings and then extrapolating the unknown ratings from the known ones.
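To make the setup concrete, here is a minimal sketch of such a utility matrix with 5 users and 3 movies. The user names, movie names, rating values, and the `unknown_cells` helper are all hypothetical; `None` marks an empty cell that the recommender would need to predict.

```python
# Hypothetical utility matrix: rows are users, columns are movies.
# None marks a rating the system has not observed yet.
ratings = {
    "user1": {"movieA": 5, "movieB": None, "movieC": 1},
    "user2": {"movieA": 4, "movieB": 2, "movieC": None},
    "user3": {"movieA": None, "movieB": 5, "movieC": 4},
    "user4": {"movieA": 1, "movieB": None, "movieC": 5},
    "user5": {"movieA": None, "movieB": 4, "movieC": None},
}


def unknown_cells(matrix):
    """Return the (user, item) pairs that still need a predicted rating."""
    return [
        (user, item)
        for user, row in matrix.items()
        for item, value in row.items()
        if value is None
    ]


print(unknown_cells(ratings))
```

Every approach described later is, in effect, a different way of filling in the cells this helper returns.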
There are two methods of collecting ratings:
Implicit ratings: Learn ratings from other user actions. For example, a purchase implies a high rating, but we cannot learn low ratings this way. Examples of implicit data collection include the following:
When building a model from a user's behaviour, a distinction is often made between explicit and implicit forms of data collection.
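As a rough sketch of implicit collection, the snippet below turns a log of user actions into implicit ratings. The event log, the action names, and the choice to treat only purchases as a positive signal are assumptions made for illustration.

```python
# Hypothetical event log: (user, item, action). The action names and the
# choice of which actions count as positive signals are assumptions.
events = [
    ("alice", "book1", "purchase"),
    ("alice", "book2", "view"),
    ("bob", "book1", "view"),
    ("bob", "book3", "purchase"),
]

POSITIVE_ACTIONS = {"purchase"}


def implicit_ratings(log):
    """Map each (user, item) with a positive action to an implicit rating of 1.

    Pairs without a positive action are left out rather than scored 0:
    the absence of a purchase does not tell us the user dislikes the item.
    """
    return {
        (user, item): 1
        for user, item, action in log
        if action in POSITIVE_ACTIONS
    }


print(implicit_ratings(events))
```

Leaving non-purchases out, instead of scoring them as low, mirrors the point that implicit data cannot reveal low ratings.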
Content-based filtering approaches utilise a series of discrete characteristics of an item in order to recommend additional items with similar properties. For example,
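As a rough illustration of this idea, the sketch below describes each movie by a few binary genre features and ranks the other movies by cosine similarity to one the user liked. The feature set, the movie names, and the helper names are hypothetical; real systems typically use much richer item profiles.

```python
import math

# Hypothetical item profiles: each movie is described by binary genre
# features in the order (action, comedy, romance).
item_features = {
    "movieA": [1, 0, 0],
    "movieB": [1, 1, 0],
    "movieC": [0, 0, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_items(liked_item, features):
    """Rank the other items by feature similarity to an item the user liked."""
    target = features[liked_item]
    scores = {
        item: cosine(target, vec)
        for item, vec in features.items()
        if item != liked_item
    }
    return sorted(scores, key=scores.get, reverse=True)


print(most_similar_items("movieA", item_features))
```

Note that no other user's ratings are involved: the ranking depends only on the item features.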
Collaborative filtering approaches build a model from a user's past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items that the user may have an interest in. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. Collaborative systems locate peer users / items with a rating history similar to the current user or item and generate recommendations using this neighbourhood.
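The neighbourhood idea can be sketched as follows for the user-user case: find the users most similar to the target user and average their ratings for the item, weighted by similarity. The ratings, the user and item names, the use of plain cosine similarity, and the default `k=2` are assumptions for illustration, not a prescribed implementation.

```python
import math

# Hypothetical ratings: user -> {item: rating}. Missing items are unrated.
ratings = {
    "ann": {"i1": 5, "i2": 4, "i3": 1},
    "ben": {"i1": 4, "i2": 5, "i3": 2, "i4": 5},
    "cat": {"i1": 1, "i2": 2, "i3": 5, "i4": 1},
}


def cosine_on_common(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, item, data, k=2):
    """Similarity-weighted average of the k nearest neighbours' ratings for item."""
    neighbours = [
        (cosine_on_common(data[user], data[other]), data[other][item])
        for other in data
        if other != user and item in data[other]
    ]
    top = sorted(neighbours, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0:
        return None
    return sum(sim * rating for sim, rating in top) / total_sim


print(predict("ann", "i4", ratings))
```

In this toy data `ann` rates items much like `ben` and unlike `cat`, so the prediction for `i4` is pulled towards `ben`'s rating.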
Many algorithms have been used to measure user similarity or item similarity in recommender systems. Examples include the k-nearest neighbour (k-NN) approach and the Pearson correlation (centred cosine similarity). A particular type of collaborative filtering algorithm uses matrix factorization, a low-rank matrix approximation technique.
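The "centred cosine" description of the Pearson correlation can be made concrete with a short sketch: subtract each user's mean rating, then take the cosine of the centred vectors. The rating lists below are hypothetical, and for simplicity the function assumes both users have rated the same items.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists (co-rated items only)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centre each vector on its own mean, then take the cosine of the result.
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm = math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy))
    return dot / norm if norm else 0.0


# Hypothetical ratings by three users on the same four items.
ann = [5, 4, 1, 2]
ben = [4, 5, 2, 1]  # similar tastes to ann
cat = [1, 2, 5, 4]  # opposite tastes to ann

print(pearson(ann, ben))  # positive correlation
print(pearson(ann, cat))  # negative correlation
```

Centring matters because it lets opposite tastes produce a negative similarity, whereas plain cosine on positive ratings is never negative.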
There are two models of collaborative filtering:
Item-to-item collaborative filtering (people who buy item “i” also buy item “j”) is an algorithm popularised by Amazon’s recommender system.
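Item-item collaborative filtering can use the same similarity metrics and prediction functions as the user-user model. In practice, item-item collaborative filtering often works better than user-user collaborative filtering. Neighbourhood-based methods like these are easy to explain to others, but they tend to be less accurate and do not scale well to very large datasets.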
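A minimal sketch of the item-item variant is shown below. It reuses cosine similarity, but compares items (by the users who rated both) and predicts from the target user's own ratings of similar items. The data, the names, and the choice to use all rated items as neighbours are assumptions for illustration; this is not Amazon's actual algorithm.

```python
import math

# Hypothetical ratings: item -> {user: rating}.
item_ratings = {
    "i1": {"u1": 5, "u2": 4, "u3": 1},
    "i2": {"u1": 4, "u2": 5, "u3": 2, "u4": 5},
    "i3": {"u1": 1, "u2": 2, "u3": 5, "u4": 1},
}


def item_similarity(a, b):
    """Cosine similarity between two items over the users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_item_item(user, item, data):
    """Weight the user's own ratings of other items by similarity to the target item."""
    num = 0.0
    den = 0.0
    for other, raters in data.items():
        if other == item or user not in raters:
            continue
        sim = item_similarity(data[item], raters)
        num += sim * raters[user]
        den += sim
    return num / den if den else None


print(predict_item_item("u4", "i1", item_ratings))
```

Here `i1` is rated much like `i2` and unlike `i3`, so the prediction for `u4` leans towards the rating `u4` gave `i2`.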
We must consider the following biases:
A demographic recommender provides recommendations based on a demographic profile of the user. Recommended products can be produced for different demographic niches, by combining the ratings of users in those niches.
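One simple way to combine ratings within a niche is to average them per item, as in the sketch below. The age-band niches, the users, the items, and the helper names are hypothetical, and averaging is only one possible way of combining ratings.

```python
from collections import defaultdict

# Hypothetical data: each user's demographic niche and their ratings.
user_niche = {"u1": "18-25", "u2": "18-25", "u3": "40-60"}
ratings = [
    ("u1", "gameA", 5),
    ("u2", "gameA", 4),
    ("u2", "bookB", 2),
    ("u3", "bookB", 5),
]


def niche_averages(niches, rating_rows):
    """Average rating of each item within each demographic niche."""
    totals = defaultdict(lambda: [0.0, 0])  # (niche, item) -> [sum, count]
    for user, item, value in rating_rows:
        key = (niches[user], item)
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def recommend_for_niche(niche, niches, rating_rows):
    """Items rated within this niche, highest average rating first."""
    averages = niche_averages(niches, rating_rows)
    scored = {item: avg for (n, item), avg in averages.items() if n == niche}
    return sorted(scored, key=scored.get, reverse=True)


print(recommend_for_niche("18-25", user_niche, ratings))
```

A new user who falls in the "18-25" niche could be shown this list before they have rated anything.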
A knowledge-based recommender suggests products based on inferences about a user’s needs and preferences. This knowledge will sometimes contain explicit functional knowledge about how certain product features meet user needs.
Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model.
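The first option, making the two predictions separately and then combining them, could look like the weighted blend sketched here. The function name, the `weight` parameter, and the fallback behaviour when one prediction is missing are assumptions for illustration.

```python
def hybrid_score(content_score, collaborative_score, weight=0.5):
    """Blend two predicted ratings; weight is the share given to the content-based one.

    If one component has no prediction (None), fall back to the other,
    so that an item or user with no rating history can still be scored.
    """
    if content_score is None:
        return collaborative_score
    if collaborative_score is None:
        return content_score
    return weight * content_score + (1 - weight) * collaborative_score


print(hybrid_score(4.0, 3.0))        # equal blend of both predictions
print(hybrid_score(4.0, None))       # collaborative prediction unavailable
print(hybrid_score(4.0, 3.0, 0.25))  # lean towards the collaborative prediction
```

In practice the weight would be tuned on held-out ratings rather than fixed by hand.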
|Approach|Advantages|Disadvantages|
|---|---|---|
|Collaborative filtering|Works on any item.|Cold start problem: New items have no ratings, new users have no history.|
||No feature selection is needed for complex systems like images, movies, and music.|Need enough users in a system to find a match. These systems often require a large amount of existing data on a user in order to make accurate recommendations.|
||It does not rely on machine-analysable content and is therefore capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself.|Sparsity: If the matrix is sparse, it is hard to find users who rate the same items. For example, the number of items sold on major e-commerce sites is extremely large. The most active users would only have rated a small subset of the overall database. Thus, even the most popular items have very few ratings.|
|||First rater: Cannot recommend an unrated item, such as a new item.|
|||Popularity bias: This system tends to recommend popular items.|
|||Scalability: In many of the environments in which these systems make recommendations, there are millions of users and products. Thus, a large amount of computation power is often necessary to calculate recommendations.|
|Content-based filtering|No need for other users' data.|Finding appropriate features is hard in complex systems like images, movies, and music.|
||Able to make recommendations for users with unique tastes.|Overspecialisation.|
||Able to recommend new and unpopular items (no first rater problem).|Never recommends outside the user's content profile.|
||We know why a user is being recommended that item.|People may have varied interests.|
|||Unable to exploit quality judgements of other users.|
|||Cold start problem for new users: there is no user profile for a new user with few ratings.|
|Knowledge-based||Knowledge engineering bottleneck.|
|Hybrid|Provides more accurate recommendations than pure approaches.||
||Used to overcome some of the common problems in recommender systems such as cold start and the sparsity problem.||
Evaluation is important in assessing the effectiveness of recommendation algorithms. Commonly used accuracy metrics are the mean squared error (MSE) and the root mean squared error (RMSE); an alternative is a "top-k" evaluation, which scores only the k highest-ranked predictions. Information retrieval metrics such as precision, recall, and discounted cumulative gain (DCG) are also useful for assessing the quality of a recommendation method. More recently, diversity, novelty, and coverage have also come to be considered important aspects of evaluation.
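Two of these metrics are easy to sketch: RMSE over held-out ratings, and precision at k as one common form of top-k evaluation. The function names and the sample values are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and known ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k recommended items that are actually relevant."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / k


# Hypothetical held-out ratings and a hypothetical ranked recommendation list.
print(rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0]))
print(precision_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=2))
```

RMSE rewards getting every rating roughly right, while precision at k only cares whether the few items actually shown to the user are good ones.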