
Blogs about Deep Learning, Machine Learning, AI, NLP, Security, Oracle Traffic Director, Oracle iPlanet WebServer

Recommendation Systems

A recommender system (or recommendation system) seeks to predict the "rating" or "preference" a user would give to an item. Based on what it knows about the user, the system recommends items the user may be interested in, which is especially valuable when the catalogue of items is very large. Recommender systems are a useful complement to search algorithms, since they help users discover items they might not have found otherwise.

Some examples of recommendation engines:

  • Amazon and Netflix recommend products based on previous behaviour.
  • The mobile application Peapod uses a recommendation engine to let users fill their shopping basket based on previous orders.

Recommendations can be:

  • Editorially hand-curated
    • For example: staff picks, home pages of websites.
  • Simple aggregates
    • Depend on the aggregated activity of other users, not on the individual user.
    • For example: "top", "most popular", or "most recent" YouTube videos.
  • Tailored to individual users like Amazon, Netflix recommendation systems.

Suppose we have 5 users and 3 items/movies, some of which the users have rated. The recommender system predicts the ratings of the empty cells.

[Image: user-item rating matrix]

The steps are: gather the known ratings, then extrapolate the unknown ratings from the known ones.
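The setup can be sketched with a toy user-item matrix (hypothetical ratings; the NaN entries are the empty cells a recommender must fill in):

```python
import numpy as np

# 5 users x 3 movies; np.nan marks the ratings we must predict (hypothetical data)
R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 2.0, 2.0],
    [1.0, np.nan, 5.0],
    [np.nan, 4.0, 4.0],
])

known = ~np.isnan(R)   # mask of observed ratings
print(known.sum(), "known ratings,", np.isnan(R).sum(), "to predict")
```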

There are two methods of collecting ratings:

  • Explicit ratings: Simply ask users to rate items. The data we get this way is excellent, but it doesn't scale: the resulting matrix is sparse because most people don't rate. Examples of explicit data collection include the following:
    • Asking a user to rate an item on a sliding scale.
    • Asking a user to search.
    • Asking a user to rank a collection of items from favourite to least favourite.
    • Presenting two items to a user and asking him/her to choose the better one of them.
    • Asking a user to create a list of items that he/she likes.

  • Implicit ratings: Learn ratings from other user actions. For example, a purchase implies a high rating, but we can't learn low ratings this way. Examples of implicit data collection include the following:

    • Observing the items that a user views in an online store.
    • Analysing item/user viewing times.
    • Keeping a record of the items that a user purchases online.
    • Obtaining a list of items that a user has listened to or watched on his/her computer.
    • Analysing the user's social network and discovering similar likes and dislikes.
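One common way to use such implicit signals (a sketch with assumed action weights, not from the original post) is to aggregate a user's actions on an item into a single confidence score:

```python
# Hypothetical mapping from implicit actions to confidence weights;
# a purchase is a strong positive signal, a brief view a weak one.
ACTION_WEIGHT = {"view": 1.0, "long_view": 2.0, "add_to_cart": 3.0, "purchase": 5.0}

def implicit_score(actions):
    """Aggregate a user's actions on one item into a single confidence score."""
    return sum(ACTION_WEIGHT.get(a, 0.0) for a in actions)

print(implicit_score(["view", "view", "purchase"]))  # two views plus a purchase
```

Note that such a score only expresses positive confidence; as the text says, we cannot learn low ratings from it.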

When building a model from a user's behaviour, a distinction is often made between explicit and implicit forms of data collection.

Different Kinds Of Recommendation Systems

Content Based filtering (personality-based approach)

Content-based filtering approaches utilise a series of discrete characteristics of an item in order to recommend additional items with similar properties. For example,

  • Recommend items to customer “c” similar to previous items rated highly by “c”.
  • Recommend movies with the same actors, director, or genre.
  • Recommend articles with similar content.
  • Recommend people with many common friends.
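A minimal sketch of content-based scoring, assuming hand-made genre features (all names and values here are hypothetical): build a profile from the items the user rated highly, then rank unseen items by cosine similarity to that profile.

```python
import numpy as np

# Hypothetical item features (rows: movies; columns: action, comedy, drama)
items = np.array([
    [1.0, 0.0, 0.0],   # movie 0: action
    [1.0, 0.0, 1.0],   # movie 1: action/drama
    [0.0, 1.0, 0.0],   # movie 2: comedy
])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Customer "c" rated movie 0 highly: the profile is the mean of liked item vectors
profile = items[[0]].mean(axis=0)

# Score unseen items by similarity to the profile
scores = {i: cosine(profile, items[i]) for i in (1, 2)}
best = max(scores, key=scores.get)
print(best)   # movie 1, which shares the action feature
```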

Collaborative filtering

Collaborative filtering approaches build a model from a user's past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items that the user may have an interest in. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. Collaborative systems locate peer users / items with a rating history similar to the current user or item and generate recommendations using this neighbourhood.

Many algorithms have been used to measure user or item similarity in recommender systems, for example the k-nearest neighbours (k-NN) approach and the Pearson correlation (centred cosine similarity). A particular type of collaborative filtering algorithm uses matrix factorization, a low-rank matrix approximation technique.
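The Pearson correlation can be computed as a centred cosine: subtract each user's mean rating over the co-rated items, then take the cosine of the centred vectors. A sketch with hypothetical rating vectors (NaN marks unrated items):

```python
import numpy as np

def centered_cosine(u, v):
    """Pearson correlation as centred cosine: mean-centre each vector
    over the co-rated items, then take the cosine of the results."""
    mask = ~(np.isnan(u) | np.isnan(v))        # co-rated items only
    a, b = u[mask], v[mask]
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

u = np.array([5.0, 3.0, np.nan, 1.0])
v = np.array([4.0, 2.0, 5.0, np.nan])
print(round(centered_cosine(u, v), 3))
```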

There are two models of collaborative filtering:

User-to-user similarity model

[Image: user-user collaborative filtering]
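A minimal sketch of user-user prediction with hypothetical ratings: predict a user's missing rating as a similarity-weighted average of other users' ratings for that item, using the centred-cosine (Pearson) similarity between rating rows.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items; nan = unrated
R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, 2.0, 5.0],
    [1.0, 5.0, 1.0],
])

def sim(u, v):
    """Centred cosine (Pearson) over co-rated items."""
    mask = ~(np.isnan(u) | np.isnan(v))
    a, b = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    d = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if d == 0 else float(a @ b / d)

def predict(R, user, item):
    """Similarity-weighted average over users who rated `item`."""
    num = den = 0.0
    for other in range(R.shape[0]):
        if other == user or np.isnan(R[other, item]):
            continue
        s = sim(R[user], R[other])
        num += s * R[other, item]
        den += abs(s)
    return num / den if den else np.nan

print(round(predict(R, 0, 2), 2))
```

Item-to-item filtering uses the same machinery with the roles of rows and columns swapped (i.e. applied to the transposed matrix).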

Item-to-item similarity model

Item-to-item collaborative filtering (people who buy item “i” also buy item “j”) is an algorithm popularised by Amazon’s recommender system.

[Image: item-item collaborative filtering]

The item-item model can use the same similarity metrics and prediction functions as the user-user model, and in practice item-to-item collaborative filtering works much better than user-user collaborative filtering. Collaborative filtering is easy to explain to others, but it is not always as accurate and does not scale well.

We must consider the following biases:

  • Normalisation bias
  • User bias (a grouchiness score, or the customer's mood that day)
  • Item bias (some items have much higher average ratings)
  • Time bias (movies can become more popular with time, since only people who like those movies will watch and rate them)
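One common way to account for user and item bias (a baseline-estimate sketch, not from the original post) is to model a rating as the global mean plus a per-user offset plus a per-item offset:

```python
import numpy as np

# Hypothetical ratings; nan = unrated
R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 2.0, 2.0],
])

mu = np.nanmean(R)                      # global average rating
b_user = np.nanmean(R, axis=1) - mu     # grouchy vs. generous users
b_item = np.nanmean(R, axis=0) - mu     # widely liked vs. disliked items

def baseline(u, i):
    """Baseline estimate: global mean + user offset + item offset."""
    return mu + b_user[u] + b_item[i]

print(round(baseline(0, 2), 2))   # predicted rating for an empty cell
```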

Demographic Recommendation System

A demographic recommender provides recommendations based on a demographic profile of the user. Recommended products can be produced for different demographic niches, by combining the ratings of users in those niches.

Knowledge-based Recommendation System

A knowledge-based recommender suggests products based on inferences about a user’s needs and preferences. This knowledge will sometimes contain explicit functional knowledge about how certain product features meet user needs.

Hybrid Recommender Systems

Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model.
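The first option, combining separately computed predictions, can be sketched with hypothetical scores and a blending weight:

```python
# Hypothetical per-item scores from two separate recommenders
content_scores = {"movie_a": 0.9, "movie_b": 0.2, "movie_c": 0.5}
collab_scores  = {"movie_a": 0.3, "movie_b": 0.8, "movie_c": 0.6}

def hybrid(alpha=0.5):
    """Weighted blend: larger alpha favours the content-based signal."""
    return {m: alpha * content_scores[m] + (1 - alpha) * collab_scores[m]
            for m in content_scores}

ranked = sorted(hybrid(0.5).items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])   # the top blended recommendation
```

In practice alpha can be tuned on held-out data, or shifted toward the collaborative signal as more ratings arrive for a user.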

Comparison of Recommendation Systems

Collaborative Filtering

  • Pros:
    • Works on any item.
    • No feature selection is needed for complex items such as images, movies, and music.
    • Does not rely on machine-analysable content, so it can accurately recommend complex items such as movies without requiring an "understanding" of the item itself.
  • Cons:
    • Cold start: new items have no ratings, and new users have no history. These systems often require a large amount of existing data on a user in order to make accurate recommendations, and need enough users in the system to find a match.
    • Sparsity: if the matrix is sparse, it is hard to find users who rated the same items. The number of items sold on major e-commerce sites is extremely large, the most active users have rated only a small subset of the catalogue, and so even the most popular items have very few ratings.
    • First rater: cannot recommend an unrated or new item.
    • Popularity bias: tends to recommend popular items.
    • Scalability: with millions of users and products, a large amount of computing power is often necessary to calculate recommendations.

Content Based

  • Pros:
    • No need for other users' data.
    • Able to make recommendations for users with unique tastes.
    • Able to recommend new and unpopular items (no first-rater problem).
    • We know why a user is being recommended an item.
  • Cons:
    • Finding appropriate features is hard for complex items such as images, movies, and music.
    • Overspecialisation: never recommends outside the user's content profile, even though people may have varied interests.
    • Unable to exploit the quality judgements of other users.
    • Cold start for new users, since there is no user profile yet.

Knowledge-based

  • Cons:
    • Knowledge engineering bottleneck.

Hybrid

  • Pros:
    • Provides more accurate recommendations than pure approaches.
    • Overcomes common problems in recommender systems such as cold start and sparsity.

Evaluation

Evaluation is important for assessing the effectiveness of recommendation algorithms. Commonly used metrics are the mean squared error (MSE) and root mean squared error (RMSE), or ranking measures such as precision at top-k. Information retrieval metrics such as precision, recall, and DCG are also useful for assessing the quality of a recommendation method. More recently, diversity, novelty, and coverage have also come to be considered important aspects of evaluation.
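RMSE and precision at top-k can be computed as follows (hypothetical ratings and recommendation lists):

```python
import numpy as np

# Hypothetical held-out ratings and a model's predictions
y_true = np.array([4.0, 3.0, 5.0, 2.0])
y_pred = np.array([3.5, 3.0, 4.0, 3.0])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations the user actually liked."""
    return len(set(recommended[:k]) & set(relevant)) / k

print(round(float(rmse), 3),
      round(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, 3), 3))
```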


