Overview of Recommender Algorithms (Part 1)
Choosing the right recommendation algorithm for a recommender system matters a great deal. Many algorithms are available, and finding the one that best fits the problem at hand is difficult. Each algorithm has its own strengths and limitations, so we should evaluate them before deciding; in practice we will probably need to test several to find the one that works best for our users, and understanding what these algorithms do and having an intuitive picture of how they work helps a lot.
The recommendation algorithm is usually implemented in the recommendation model, which takes data such as user preferences and item descriptions and uses it to predict which items a given group of users may be interested in.
There are four main families of recommendation algorithms (Tables 1-4):
- Collaborative filtering algorithms (Collaborative Filtering)
- Content-based filtering algorithms (Content-based Filtering)
- Hybrid recommendation algorithms
- Popularity-based recommendation algorithms
In addition, there are many advanced or non-traditional approaches (see Table 5).
This article is the first in the series. It introduces the main classes of recommendation algorithms in tabular form, covering a description of each algorithm, its typical inputs, its common variants, and its pros and cons. In the second and third articles of the series we describe the individual algorithms in more detail so that we can understand more deeply how they work. Some of the content of this article is based on a 2014 tutorial on recommendation algorithms, "Recommender Problem Revisited", by Xavier Amatriain.
Table 1: Overview of collaborative filtering recommendation algorithms
Table 2: Overview of content-based filtering recommendation algorithms
Table 3: Overview of hybrid-method recommendation algorithms
Table 4: Overview of popularity-based recommendation algorithms
Table 5: Overview of advanced or "non-traditional" recommendation algorithms
Part 1 original: Overview of Recommender Algorithms – Part 1
Overview of Recommender Algorithms (Part 2)
This article is the second in the series, which presents a cheat sheet of the main classes of recommendation algorithms. In this article we look at collaborative filtering in more detail and discuss its strengths and weaknesses so that we can understand more deeply how it works.
The collaborative filtering (CF) algorithm looks for patterns in user behavior and builds user-specific recommendations from them. It relies on data about how users interact with the system, such as the ratings users give to the books they read, to determine how much a user likes an item. The key idea is that if two users rate an item similarly, they are likely to rate a new item similarly as well. Note that the algorithm does not need any item information (descriptions, metadata, etc.) or user information (interests, demographics, etc.). Collaborative filtering algorithms fall into two categories: neighborhood-based and model-based. The former (also called memory-based collaborative filtering) uses the user-item ratings directly to predict ratings for new items. Model-based algorithms instead learn a predictive model from the ratings and then predict ratings for new items from that model; the general idea is to use machine learning to find patterns in the data and model the interactions between users and items.
Neighborhood-based collaborative filtering focuses either on the relationships between items (item-based collaborative filtering) or on the relationships between users (user-based collaborative filtering).
User-based collaborative filtering finds users with similar tastes in items and recommends the items those similar users liked.
Item-based collaborative filtering takes an item a user liked and recommends similar items, where similarity is based on the items' co-occurrence in usage data, for example "users who bought item X also bought item Y".
Before looking at item-based collaborative filtering, let's first walk through a user-based collaborative filtering example.
Suppose some of our users have expressed preferences for certain books; the more they like a book, the higher the rating they give it (from 1 to 5). We can represent these preferences in a matrix, with rows corresponding to users and columns to books.
Figure 1: User book preferences. All ratings range from 1 to 5, with 5 being the highest (most liked). The first user (row 1) gave the first book (column 1) a rating of 4; an empty cell means the user has not rated that book.
In the user-based collaborative filtering algorithm, the first thing to do is compute the similarity between users based on their book preferences. Let's look at the problem from the perspective of a single user, taking the first row of Figure 1 as an example. We typically represent each user as a vector (or array) containing the user's preferences for the items. Comparing users with any of a number of similarity metrics is then quite straightforward; in this example we use cosine similarity. Comparing the first user with the other five shows how similar the first user is to each of them (Figure 2). As with most similarity metrics, the higher the value, the more similar the vectors are. Here the first user shares two rated books with two of the other users and so is most similar to them, is less similar to the two users with only one book in common, and has a similarity of zero to the last user, with whom no books are shared.
Figure 2: Similarity of the first user to the other users. The cosine similarity between users can be plotted on a single dimension.
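As a minimal sketch of this step, the cosine similarity between two user rating vectors can be computed with a few lines of NumPy. The rating matrix below is illustrative and does not reproduce the exact values in Figure 1; unrated books are stored as 0.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two rating vectors (unrated items are 0)."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm else 0.0

# Toy user-item rating matrix: rows are users, columns are books.
ratings = np.array([
    [4, 0, 0, 5, 1, 0],
    [5, 5, 4, 0, 0, 0],
    [0, 0, 0, 2, 4, 5],
    [3, 3, 0, 0, 4, 0],
], dtype=float)

# Similarity of the first user (row 0) to every other user.
for other in range(1, len(ratings)):
    print(other, round(cosine_similarity(ratings[0], ratings[other]), 2))
```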
More generally, we can compute the similarity of every user to every other user and display it in a similarity matrix (Figure 3). This is a symmetric matrix, which gives it some useful properties for mathematical operations. The background color of each cell indicates how similar two users are: the darker the red, the higher the similarity.
Figure 3: Similarity matrix between users. Each similarity is based on the cosine similarity between the users' book ratings.
We are now ready to generate recommendations with user-based collaborative filtering. For a given user, this means finding the most similar users and recommending the items those users liked, weighted by how similar they are to the target user. Let's take the first user and generate some recommendations. First we find the n users most similar to this user, remove the books the user has already rated, weight the books rated by the most similar users by their similarity, and sum the results. Here we take n = 2, i.e. the two users most similar to the first user, who turn out to be users 2 and 3 (Figure 4). Since the first user has already rated books 1 and 5, the recommendations are book 3 (score 4.5) and book 4 (score 3).
Figure 4: Generating recommendations for one user. We weight the books read by the two most similar users and then recommend the books the target user has not yet rated.
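A hedged sketch of the whole neighborhood step, assuming a toy rating matrix like the one above (the values and helper names are illustrative, not the exact figures from the example): it finds the n most similar users with scikit-learn's pairwise cosine similarity, weights their ratings by similarity, and drops the books the target user has already rated.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy rating matrix: rows are users, columns are books, 0 = not rated.
ratings = np.array([
    [4, 0, 0, 5, 1, 0],
    [5, 5, 4, 0, 0, 0],
    [0, 0, 0, 2, 4, 5],
    [3, 3, 0, 0, 4, 0],
], dtype=float)

def user_based_recommend(ratings, user_idx, n_neighbors=2, top_k=2):
    """Rank unseen books by the similarity-weighted ratings of the
    n most similar users."""
    sims = cosine_similarity(ratings)[user_idx]
    sims[user_idx] = 0.0                               # exclude the user themselves
    neighbors = np.argsort(sims)[::-1][:n_neighbors]   # most similar users
    weights = sims[neighbors]
    scores = weights @ ratings[neighbors] / weights.sum()
    scores[ratings[user_idx] > 0] = 0.0                # drop already-rated books
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(int(i), round(float(scores[i]), 2)) for i in ranked if scores[i] > 0]

print(user_based_recommend(ratings, user_idx=0))
```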
Now that we have a deeper understanding of user-based collaborative filtering, let's look at an item-based collaborative filtering example, using the same set of users (Figure 1).
In item-based collaborative filtering, just as in the user-based version, the first thing to do is compute a similarity matrix. This time, however, we look at the similarity between items rather than between users. As before, we represent each book as a vector (or array) of the ratings given by the users who rated it, and compare books with the cosine similarity function, which tells us how similar one book is to the others. Because the same group of users gave them similar ratings, the first book (column 1) is most similar to the fifth book (column 5) (Figure 5). Next comes the third book, which shares two raters with the first; the fourth and second books each share only one rater with it, and the last book, with no raters in common, has a similarity of zero.
Figure 5: Comparison of the first book with the other books. Books are represented by the ratings of the users who read them and compared with the cosine similarity metric (0 to 1); the higher the value, the more similar the two books are.
We can also display how similar the books are to one another in a similarity matrix (Figure 6). Again, the background color indicates the degree of similarity between two books: the darker the red, the higher the similarity.
Figure 6: Similarity matrix of the books.
Now that we know how similar each book is to the others, we can generate recommendations for users. In item-based collaborative filtering, we recommend the items most similar to the items the user has previously rated highly. In our example, the first user is recommended the third book, followed by the sixth (Figure 7). As before, we take only the two books most similar to each book the user has already rated.
Figure 7: Generating recommendations for one user. We take the list of books the user has previously rated, find the two books most similar to each of them, and then recommend the books the user has not yet rated.
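For comparison, here is a hedged sketch of the item-based version under the same assumptions: the similarity matrix is computed over the columns (books) instead of the rows (users), and each book the user has rated contributes a similarity-weighted score to its nearest neighbors.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same illustrative rating matrix: rows are users, columns are books.
ratings = np.array([
    [4, 0, 0, 5, 1, 0],
    [5, 5, 4, 0, 0, 0],
    [0, 0, 0, 2, 4, 5],
    [3, 3, 0, 0, 4, 0],
], dtype=float)

def item_based_recommend(ratings, user_idx, n_similar=2, top_k=2):
    """For every book the user rated, add similarity-weighted scores to its
    n most similar books, then rank the books the user has not rated."""
    item_sims = cosine_similarity(ratings.T)   # book-to-book similarity matrix
    np.fill_diagonal(item_sims, 0.0)           # a book is not its own neighbor
    user_ratings = ratings[user_idx]
    scores = np.zeros(ratings.shape[1])
    for book, rating in enumerate(user_ratings):
        if rating == 0:
            continue
        neighbors = np.argsort(item_sims[book])[::-1][:n_similar]
        scores[neighbors] += rating * item_sims[book][neighbors]
    scores[user_ratings > 0] = 0.0             # drop already-rated books
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(int(i), round(float(scores[i]), 2)) for i in ranked if scores[i] > 0]

print(item_based_recommend(ratings, user_idx=0))
```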
From the description above, user-based and item-based collaborative filtering sound very similar, so it is genuinely interesting that they produce different results. Even in the small example above, the two methods yield different recommendations for the same user despite identical input. Both forms of collaborative filtering are worth considering when building a recommender: although they look almost identical when described to a layperson, they can in fact produce very different recommendations and therefore a completely different experience for the user.
Neighborhood methods enjoy considerable popularity because of their simplicity and efficiency, and because the recommendations they produce are accurate and personalized. However, since they compute similarities between users or between items, they run into scalability limits as the number of users or items grows. In the worst case, O(m*n) computations are needed; in practice the situation is somewhat better, at roughly O(m+n), partly because the sparsity of the data can be exploited. But while sparsity helps the implementation scale, it also challenges neighborhood-based approaches, because each user has rated only a handful of the items. For example, there are millions of articles in Mendeley, and a user may have read only a few hundred of them; the probability that two users who have each read 100 articles have any in common is only 0.0002 (in a catalogue of 50 million articles).
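In practice the rating matrix is therefore stored in a sparse format; a minimal sketch with SciPy and scikit-learn (the rating triples below are made up), where the similarity computation only touches the non-zero entries:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative ratings stored as (user, book, rating) triples instead of a
# dense matrix, which would be almost entirely zeros at real-world scale.
users = np.array([0, 0, 1, 1, 2, 3])
books = np.array([0, 3, 0, 1, 5, 1])
stars = np.array([4.0, 5.0, 5.0, 5.0, 5.0, 3.0])

ratings = csr_matrix((stars, (users, books)), shape=(4, 6))

# cosine_similarity accepts sparse input directly; only the non-zero entries
# contribute, which is what makes neighborhood methods workable at scale.
print(cosine_similarity(ratings).round(2))
```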
Model-based collaborative filtering overcomes the limitations of the neighborhood-based approach. Unlike neighborhood methods, which use the user-item ratings directly to predict new items, model-based methods use the ratings to learn a predictive model and then predict new items from that model. The general idea is to use machine learning algorithms to find patterns in the data and to model the interactions between users and items. Model-based collaborative filtering is generally considered the more advanced way to build a collaborative filtering system. Many different algorithms can be used to build the predictive model, such as Bayesian networks, clustering, classification, regression, matrix factorization, and restricted Boltzmann machines, some of which played a key role in winning the Netflix Prize. Netflix ran the competition from 2006 to 2009, offering a $1 million prize to the team that could improve the accuracy of its recommender system by more than 10%. The winning solution was an ensemble of more than 100 different algorithmic models, and matrix factorization and restricted Boltzmann machines were eventually adopted in the production system.
Matrix factorization methods (such as singular value decomposition and SVD++) map items and users into the same latent space, which represents the underlying interactions between users and items (Figure 8). The rationale behind matrix factorization is that latent features explain how users rate items; given the latent representations of a user and an item, we can predict how much the user will like an item they have not yet rated.
Figure 8: Illustration of the matrix factorization algorithm. The user preference matrix can be decomposed into a user-topic matrix multiplied by an item-topic matrix.
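A minimal sketch of the idea, factoring a toy rating matrix with plain NumPy stochastic gradient descent (a real system would use a library implementation of SVD++, ALS, or similar; the matrix, learning rate, and factor count below are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative rating matrix; 0 means "not rated".
R = np.array([
    [4, 0, 0, 5, 1, 0],
    [5, 5, 4, 0, 0, 0],
    [0, 0, 0, 2, 4, 5],
    [3, 3, 0, 0, 4, 0],
], dtype=float)

n_users, n_items = R.shape
k = 2                                            # number of latent factors
P = 0.1 * rng.standard_normal((n_users, k))      # user-factor matrix
Q = 0.1 * rng.standard_normal((n_items, k))      # item-factor matrix
lr, reg = 0.01, 0.05                             # learning rate, regularization

observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u, i] > 0]

for _ in range(500):                             # SGD over the observed ratings
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# Predicted rating for a book the first user has not rated (column index 2).
print(round(float(P[0] @ Q[2]), 2))
```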
Table 1 lists the key advantages and disadvantages of neighborhood-based and model-based collaborative filtering. Because collaborative filtering relies only on usage data, it can generate good recommendations without much domain-specific engineering, but it also has its limitations. For example, CF tends to recommend popular items, which makes it hard to serve users with unusual tastes (the items they are interested in may not have much usage data); this is the popularity-bias problem, and it is usually addressed with content-based filtering. Another important limitation of CF is the so-called cold-start problem: the system cannot make recommendations for users with no or very little activity (new users), nor can it recommend items with no or very little usage (new items). The new-user cold-start problem can be addressed with popularity-based and hybrid algorithms, and the new-item problem with content-based filtering or multi-armed bandit algorithms (exploration vs. exploitation). We will discuss some of these algorithms in the following articles.
In this article we introduced the three basic flavors of collaborative filtering. The differences between item-based collaborative filtering, user-based collaborative filtering, and matrix factorization are subtle and often hard to explain briefly, but understanding them helps us choose the best algorithm for our recommender system. In the next article we will continue with other popular recommender-system algorithms.
Part 2 original: Overview of Recommender Algorithms – Part 2
Overview of Recommender Algorithms (Part 3)
This article is the third in the series. The first article introduced the main classes of recommendation algorithms in table form, and the second introduced the different types of collaborative filtering, highlighting some of their subtle differences. In this article we cover content-based filtering algorithms in more detail and discuss their pros and cons to better understand how they work.
Content-based filtering algorithms recommend items that are similar to the items the user has liked before. Unlike collaborative filtering, however, the similarity is derived from item content such as title, year, and description rather than from how people use the items. For example, if a user liked the first and second parts of The Lord of the Rings, the recommender can use the title keywords to recommend the third part. Content-based filtering assumes that each item has enough descriptive information to be represented as a feature vector (e.g., title, year, description), and these feature vectors are used to build a model of the user's preferences. Various information-retrieval techniques (such as TF-IDF) and machine learning techniques (such as naive Bayes, support vector machines, and decision trees) can be used to learn the user model, and recommendations are then generated from that model.
Suppose some of our users have expressed preferences for certain books; the more they like a book, the higher the rating they give it (from 1 to 5). We can represent these preferences in a matrix, with rows corresponding to users and columns to books.
Figure 1: User book preferences. All ratings range from 1 to 5, with 5 being the highest (most liked). The first user (row 1) gave the first book (column 1) a rating of 4; an empty cell means the user has not rated that book.
In content-based filtering, the first thing to do is compute the similarity between books based on their content. In this example we use only the keywords from the book titles (Figure 2), purely to keep things simple; in practice we could use many more attributes.
Figure 2: Titles of the books the users have rated.
First, we usually remove stop words (grammatical words and other overly common words) from the content, and then represent each title as a vector (or array) indicating which words appear in it (Figure 3). This is called a vector-space representation.
Figure 3: Vocabulary built from the titles. If a word appears in a title, we mark it with a 1; otherwise the cell is left empty.
With the books in this form, we can compare them directly using a variety of similarity metrics; here we again use cosine similarity. Comparing the first book with the other five shows how similar the first book is to each of them (Figure 4). As with most similarity metrics, the higher the value, the more similar the vectors are. The first book is most similar to the three books with which it shares two words ("recommendation" and "systems"); the shorter the titles, the higher the similarity, which makes sense, since shorter titles leave fewer non-shared words. Since it shares no words with the remaining two books, the first book has zero similarity to them.
Figure 4: Similarity of the first book to the other books. The cosine similarity between two books (0 to 1) can be plotted on a single dimension.
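A hedged sketch of this vector-space step with scikit-learn (the titles are placeholders, not the actual books in the figures): CountVectorizer removes English stop words and builds binary term vectors, and cosine similarity then compares the first title with the rest.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder titles standing in for the six books in the example.
titles = [
    "Building a Recommendation System",
    "Recommendation Systems in Practice",
    "Practical Machine Learning Systems",
    "Machine Learning Basics",
    "Introduction to Statistics",
    "A History of Libraries",
]

# Binary bag-of-words over the titles, with common English stop words removed.
vectorizer = CountVectorizer(stop_words="english", binary=True)
vectors = vectorizer.fit_transform(titles)

# Similarity of the first title to every title (including itself, which is 1.0).
print(cosine_similarity(vectors[0], vectors).round(2))
```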
We can also show how similar the books are to one another in a similarity matrix (Figure 5). The background color of each cell indicates how similar two books are: the darker the red, the higher the similarity.
Figure 5: Similarity matrix between books. Each similarity is the cosine similarity between the books' vector representations.
Now that we know how similar each book is to the others, we can generate recommendations for users. As in item-based collaborative filtering, described in the previous article, the system recommends the books with the highest similarity to the books the user has previously rated. The difference is that here the similarity is based on the books' content, specifically the titles, rather than on usage data. In this example the system would recommend the sixth book to the first user, followed by the fourth book (Figure 6). Again, we take only the two books most similar to each book the user has rated before.
Figure 6: Generating recommendations for one user. We take the books the user has previously rated, find the two books most similar to each of them, and then recommend the books the user has not yet rated.
Content-based algorithms address some of the limitations of collaborative filtering; in particular, they overcome the popularity bias and the new-item cold-start problem discussed in the collaborative filtering section. However, it is worth noting that purely content-based recommenders usually do not perform as well as those based on usage data (such as collaborative filtering). Content-based filtering can also over-specialize: the system may recommend too many items of the same kind (for example, recommending every Lord of the Rings film) rather than items of a different kind that the user would still enjoy. Finally, because content-based algorithms rely only on the words in the items' metadata (such as the title, description, or year), they tend to recommend more of the same content and limit the user's ability to discover anything beyond those words. A summary of the pros and cons of content-based filtering is given in Table 2.
Part 3 original: Overview of Recommender Algorithms – Part 3
Overview of Recommender Algorithms (Part 4)
This article is the fourth in the series. The first article introduced the main classes of recommendation algorithms in table form, the second introduced the different types of collaborative filtering and highlighted some subtle differences, and the third covered content-based filtering in detail. This article discusses hybrid recommender systems built on top of those algorithms, and also shows how popularity can be used to address some of the limitations of collaborative filtering and content-based filtering.
Hybrid algorithms combine the content features of users and items with usage data, so they can take advantage of both kinds of data. A hybrid recommender that combines algorithms A and B tries to use the strengths of A to fix the weaknesses of B. For example, collaborative filtering suffers from the new-item problem: it cannot recommend items that no user has yet rated or used. This does not limit content-based algorithms, because their predictions are based on content (attributes). By combining collaborative filtering with content-based filtering, a hybrid recommender can overcome some of the limitations of each individual algorithm, such as the cold-start and popularity-bias problems. Table 1 lists a number of ways to combine two or more basic recommendation techniques into a new hybrid system.
Table 1: Ways of combining two or more basic recommendation algorithms to create a new hybrid algorithm.
Suppose some of our users have expressed preferences for certain books; the more they like a book, the higher the rating they give it (1-5). We can represent these preferences in a matrix, with rows corresponding to users and columns to books.
Figure 1: User book preferences. All ratings range from 1 to 5, with 5 being the highest (most liked). The first user (row 1) gave the first book (column 1) a rating of 4; an empty cell means the user has not rated that book.
In the second part of this series we walked through two examples of computing recommendations with item-based and user-based collaborative filtering, and in the third part we showed how to generate recommendations with content-based filtering. Now we combine these three algorithms to produce a completely new, hybrid set of recommendations. We will use the weighted method (Table 1) to combine the results of the individual techniques: the three algorithms are combined with different weights, reflecting how important each one is considered to be, to produce a new set of recommendations.
Let's take the first user as an example and generate some recommendations. First we generate results separately with the user-based and item-based collaborative filtering algorithms from the second article and the content-based filtering algorithm from the third. It is worth noting that even in this small example the three methods give slightly different recommendations for the same user, despite identical input.
Figure 2: Recommendations for one user, generated with the user-based collaborative filtering, item-based collaborative filtering, and content-based filtering algorithms.
Next we use a weighted hybrid to generate recommendations for this user, with weights of 40% for the user-based collaborative filtering algorithm, 30% for the item-based collaborative filtering algorithm, and 30% for the content-based filtering algorithm (Figure 3). In this example the hybrid system recommends all three books the user has never seen before, whereas each individual algorithm would have recommended only two of them.
Figure 3: Recommendations for one user generated with the weighted hybrid recommender, using the weights given above.
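A minimal sketch of the weighted combination. The per-algorithm score dictionaries below are illustrative stand-ins for the outputs of the three recommenders; in a real system they would come from the user-based, item-based, and content-based steps, normalized to a common scale.

```python
# Illustrative candidate scores from three recommenders for one user
# (book id -> predicted score); not the exact numbers from the figures.
user_cf = {"book3": 4.5, "book4": 3.0}
item_cf = {"book3": 4.0, "book6": 3.5}
content = {"book6": 4.2, "book4": 2.8}

# Weights as in the example: 40% / 30% / 30%.
weighted_sources = [(user_cf, 0.4), (item_cf, 0.3), (content, 0.3)]

blended = {}
for scores, weight in weighted_sources:
    for book, score in scores.items():
        blended[book] = blended.get(book, 0.0) + weight * score

# Final hybrid ranking, highest blended score first.
for book, score in sorted(blended.items(), key=lambda kv: kv[1], reverse=True):
    print(book, round(score, 2))
```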
Although hybrid algorithms address some of the main challenges and limitations of the CF and CB algorithms (see Figure 3), it takes considerable effort to balance the different algorithms in the system. Another way to combine individual recommenders is an ensemble approach, in which we learn a function for combining the results of the different algorithms. Note that ensembles typically combine not only different algorithms but also different variants and models built with the same algorithm. For example, the winning Netflix Prize solution included more than 100 models from more than ten algorithms (popularity, neighborhood methods, matrix factorization, restricted Boltzmann machines, regression, and so on), combined using gradient boosted decision trees (GBDT).
In addition, popularity-based algorithms are an excellent remedy for the new-user cold-start problem. They rank items by some popularity metric, such as most downloaded or most purchased, and recommend the most popular items to new users. Although crude, this approach is very effective when the popularity metric is chosen well, and it often provides a good baseline for other algorithms. Popularity can also be used to bootstrap a recommender system until there is enough activity and usage data to switch to algorithms better tailored to the user's interests, such as collaborative filtering and content-based filtering. Popularity models can likewise be folded into a hybrid algorithm to solve the new-user cold-start problem.
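A popularity baseline can be as simple as counting interactions per item; a minimal sketch (the interaction log below is made up):

```python
from collections import Counter

# Illustrative interaction log: one entry per download or purchase event.
events = ["book3", "book1", "book3", "book5", "book3", "book1", "book2"]

popularity = Counter(events)

# Recommend the most popular items to a brand-new user (cold start).
print(popularity.most_common(3))
```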
Part 4 original: Overview of Recommender Algorithms – Part 4
Overview of Recommender Algorithms (Part 5)
This article is the fifth in the recommendation algorithm series. The first article introduced the main classes of recommendation algorithms in table form, the second introduced the different types of collaborative filtering and highlighted some subtle differences, the third covered content-based filtering in detail, and the fourth explained hybrid recommenders and popularity-based algorithms. In this article we take a quick look at some of the more advanced recommendation algorithms available, and then review how even the basic algorithms produce different recommendations, to round off the series.
Besides the more traditional recommender-system algorithms covered so far (popularity, collaborative filtering, content-based filtering, and hybrid algorithms), there are many other algorithms that can be used to enhance a recommender system, including:
- Deep learning algorithms
- Social recommendation
- Machine-learning-based ranking (learning to rank)
- Multi-armed bandit algorithms (exploration/exploitation)
- Context-aware recommendation (tensor factorization and factorization machines)
These more advanced, non-traditional algorithms can push the quality of an existing recommender system to the next level, but they are also harder to understand and less well supported by off-the-shelf recommendation tools. In practice we always weigh the cost of implementing an advanced algorithm against the gain over the basic algorithms. In our experience, the basic algorithms can power excellent products for a long time.
In this series we set out to introduce the common recommendation-model algorithms: user-based collaborative filtering, item-based collaborative filtering, content-based filtering, and hybrid algorithms. We used one example to show how these four different algorithms produce different recommendations for the same user even when applied to exactly the same input data (Figure 1). This effect persists when the algorithms are applied to large real-world datasets, so when deciding which algorithm to use you need to weigh their pros and cons as well as how they actually perform.
Figure 1: Four recommender-system algorithms applied to the same data produce different results. On the left is a matrix of user preferences over a set of items, together with the names of the items that can be recommended. In the middle we show how the four algorithms generate recommendations for the first user (the user in the first row of the preference matrix); as the similarity matrices show, the algorithms define similarity differently. On the right are the items produced by each algorithm, ordered from top to bottom in the order in which the four algorithms were introduced.
In practice, you will not go far wrong by starting with a collaborative filtering algorithm in the recommendation model. Collaborative filtering generally outperforms the other approaches, but it suffers from cold-start problems with new users and new items, so content-based algorithms are often used to complement it. If time allows, a hybrid algorithm can combine the advantages of collaborative filtering and content-based filtering. Putting these basic algorithms together is certainly a good idea, and often better than reaching for advanced algorithms.
One last point is worth remembering: the recommendation model is only one of the five components of a recommender system. Like every component, it is important to set it up and build it correctly, but the choice of datasets, the processing and post-processing, the online modules, and the user interface matter just as much. As we have stressed repeatedly, the algorithm is only one part of a recommender system, and your decisions should take the whole product into account.