Recommendation Algorithms: A Summary

Source: Internet
Author: User


Recommendation algorithms have many application scenarios and considerable commercial value, so they are well worth studying. There are many kinds of recommendation algorithms, but the most widely used are those in the collaborative filtering category. This article summarizes collaborative filtering recommendation algorithms; later articles will cover the principles of some typical collaborative filtering algorithms.

1. Overview of Recommendation Algorithms

Recommendation algorithms have a long history; the need for them existed before machine learning became popular. Broadly speaking, they fall into the following 5 types:

1) Content-based recommendation: This type generally relies on natural language processing (NLP). It mines TF-IDF feature vectors from text to infer the user's preferences and then makes recommendations. This type of algorithm can discover a user's unique niche preferences and is also easy to explain. Because it requires a background in NLP, this article does not go into it; it will be discussed later together with NLP.

2) Collaborative filtering recommendation: The rest of this article is devoted to this type. Collaborative filtering is the most mainstream category of recommendation algorithm and is widely applied in industry. Its advantage is that it does not require much domain-specific knowledge and can produce good recommendations using statistics-based machine learning algorithms. Its greatest advantage is that it is easy to implement in engineering terms and easy to apply in products. At present, most recommendation algorithms in practical use are collaborative filtering algorithms.

3) Hybrid recommendation: This is similar to ensemble learning in machine learning. It draws on the strengths of several recommendation algorithms and combines them to obtain a better one, in the spirit of the saying that three cobblers together can match Zhuge Liang. For example, one can build several recommendation models and use voting to decide the final recommendation. In theory, hybrid recommendation is no worse than any single recommendation algorithm, but it increases algorithmic complexity. In practice it is not as widely used as single collaborative filtering algorithms, such as binary-classification recommenders based on logistic regression.

4) Rule-based recommendation: These algorithms are common, for example recommending the items with the most user clicks or the most user views. They are popularity-style methods and are not mainstream in the current big data era.

5) Recommendation based on demographic information: This is the simplest type. It uses only the users' basic information to find how users relate to one another and then recommends accordingly. It is now rarely used in large systems.

2. Overview of Collaborative Filtering Recommendation

Collaborative filtering is the most classic type of recommendation algorithm. It consists of online collaboration and offline filtering. Online collaboration uses online data to find items the user may like, while offline filtering removes data that is not worth recommending, such as items with low predicted ratings, or items with high predicted ratings that the user has already purchased.

The collaborative filtering model generally involves data for m items and m users. Ratings exist only between some users and some items; the rest are blank. We want to use the existing sparse data to predict the blank ratings between users and items, and recommend the highest-rated items to the user.

In general, collaborative filtering recommendation falls into three types. The first is user-based collaborative filtering, the second is item-based collaborative filtering, and the third is model-based collaborative filtering.

User-based collaborative filtering mainly considers the similarity between users. Once we find the items that similar users like and predict the target user's ratings for those items, we can recommend the several highest-rated items to the user. Item-based collaborative filtering is similar, except that we look for the similarity between items instead. Once we know the target user's ratings for certain items, we can predict ratings for highly similar items and recommend the several similar items with the highest predicted ratings. For example, if you buy a machine learning book online, the site will immediately recommend a batch of machine learning and big data books to you; this is clearly the item-based collaborative filtering idea.
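The user-based idea can be illustrated with a minimal sketch. The toy ratings, the choice of cosine similarity over co-rated items, and the function names `cosine_similarity` and `predict_rating` are all assumptions made for illustration; real systems often use other similarity measures and neighbor selection rules.

```python
from math import sqrt

# Toy data (hypothetical): user -> {item: rating}. Missing items are the blanks.
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob":   {"book_a": 4, "book_b": 3, "book_c": 5, "book_d": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

# Alice has not rated book_d; estimate it from the users who have.
predicted = predict_rating(ratings, "alice", "book_d")
```

Because Alice's ratings are closer to Bob's than to Carol's, the estimate is pulled toward Bob's rating of book_d. An item-based variant would apply the same similarity function to item columns instead of user rows.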

We can briefly compare the two. User-based collaborative filtering needs to find the relationships between users online, so its computational complexity is certainly higher than that of item-based collaborative filtering. However, it can help users discover new categories of items that come as a surprise. In item-based collaborative filtering, the similarity between items is assumed not to change for some time, so it can easily be computed offline and its accuracy is generally acceptable. In terms of diversity, however, it rarely gives users a pleasant surprise. In general, item-based collaborative filtering is certainly the mainstream for small recommender systems. For a large recommender system, user-based collaborative filtering can be considered, and so can the third type, model-based collaborative filtering.

Model-based collaborative filtering is currently the most mainstream type of collaborative filtering, and many machine learning algorithms find a use here. The rest of this article focuses on model-based collaborative filtering.

3. Model-based collaborative filtering

Model-based collaborative filtering is the most mainstream type of collaborative filtering, and its algorithms could fill a book. Here we mainly classify and summarize the underlying ideas. The problem is as follows: given data for m items and m users, ratings exist only between some users and some items, and the rest are blank. We want to use the existing sparse data to predict the blank ratings between users and items, and recommend the highest-rated items to the user.

To model this problem with machine learning, the mainstream methods can be divided into: association algorithms, clustering algorithms, classification algorithms, regression algorithms, matrix factorization, neural networks, graph models, and latent semantic models. We introduce them one by one below.

3.1 Using association algorithms to do collaborative filtering

In general, we run frequent itemset mining over the items purchased by all users and find the frequent n-itemsets or sequences of related items that satisfy the support threshold. If a user has bought some of the items in a frequent n-itemset or sequence, we can recommend the other items in that itemset or sequence to the user according to some scoring criterion, which can include support, confidence, and lift.
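As a small illustration of the three scoring criteria, the sketch below computes support, confidence, and lift for one candidate rule on a hypothetical list of purchase baskets. It does not implement any frequent itemset mining algorithm; the basket contents and function names are assumptions for illustration only.

```python
# Hypothetical purchase baskets, one set of items per user.
transactions = [
    {"ml_book", "stats_book"},
    {"ml_book", "stats_book", "python_book"},
    {"stats_book", "python_book"},
    {"ml_book", "python_book"},
    {"ml_book", "stats_book", "python_book"},
]

def support(itemset, transactions):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Score the rule "bought ml_book -> recommend stats_book".
rule_support = support({"ml_book", "stats_book"}, transactions)          # 3 of 5 baskets
rule_confidence = confidence({"ml_book"}, {"stats_book"}, transactions)  # about 0.75
rule_lift = lift({"ml_book"}, {"stats_book"}, transactions)              # below 1 here
```

A lift below 1, as in this toy data, indicates that buying the first book makes the second slightly less likely than its baseline, so a recommender ranking by lift would not favor this rule.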

The commonly used association recommendation algorithms are Apriori, FP Tree, and PrefixSpan. If you are unfamiliar with these algorithms, you can refer to my other articles:

Summary of the principles of the Apriori algorithm

Summary of the principles of the FP Tree algorithm

Summary of the principles of the PrefixSpan algorithm

3.2 Using clustering algorithms to do collaborative filtering

Using clustering for collaborative filtering is somewhat similar to the user-based and item-based collaborative filtering described earlier. We can cluster either users or items according to some distance measure. With user clustering, users are divided into groups according to the distance measure, and the items rated highly within a group are recommended to the target users in that group. With item clustering, items similar to the ones the user rated highly are recommended to the user.
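The user-clustering variant might look like the following sketch, which groups users with a bare-bones K-Means on their rating vectors. It assumes a small dense toy matrix (real rating data is sparse and would need imputation or a distance defined on co-rated items), squared Euclidean distance, and a deterministic initialization from the first k users; none of these choices come from the original text.

```python
# Hypothetical dense rating vectors: one row per user, one column per item.
user_names = ["u1", "u2", "u3", "u4"]
user_vectors = [
    [5, 4, 1, 1],
    [1, 1, 5, 4],
    [4, 5, 1, 2],
    [2, 1, 4, 5],
]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain K-Means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, point in enumerate(points):
            assignment[idx] = min(
                range(k), key=lambda c: squared_distance(point, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

assignment, centroids = kmeans(user_vectors, k=2)
groups = dict(zip(user_names, assignment))
```

Each centroid is the average rating vector of its group, so the items with the highest centroid values are natural candidates to recommend to members of that group who have not rated them yet.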

Commonly used clustering recommendation algorithms include K-Means, BIRCH, DBSCAN, and spectral clustering. If you are not familiar with these algorithms, you can refer to my other articles:

Principles of the K-Means clustering algorithm

Principles of the BIRCH clustering algorithm

The DBSCAN density clustering algorithm

Summary of the principles of spectral clustering

3.3 Using classification algorithms to do collaborative filtering

If we divide ratings into segments, the problem becomes a classification problem. The most direct approach is to set a rating threshold: a rating above the threshold means recommend, and a rating below it means do not recommend, which turns the problem into binary classification. Although there are many algorithms for classification, the most widely used one here is logistic regression. Why logistic regression rather than the seemingly more sophisticated support vector machine? Because logistic regression is more interpretable: for each item we get a definite probability of whether to recommend it, and we can tune the model through feature engineering on the data. Collaborative filtering with logistic regression is already very mature at BAT (Baidu, Alibaba, Tencent) and other large companies.
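The thresholding step and the probability output can be sketched as follows. The two-dimensional feature vectors, the threshold of 4, and the hand-written gradient descent are all hypothetical stand-ins; a production system would use engineered features and a library implementation.

```python
from math import exp

THRESHOLD = 4  # hypothetical cut-off: ratings >= 4 are labeled "recommend"

# Each sample: (hypothetical feature vector for a user-item pair, observed rating).
samples = [
    ([1.0, 0.9], 5),
    ([0.8, 0.7], 4),
    ([0.9, 0.8], 5),
    ([0.2, 0.1], 1),
    ([0.3, 0.2], 2),
    ([0.1, 0.3], 2),
]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(samples, lr=0.5, epochs=2000):
    """Logistic regression fitted by stochastic gradient descent on log loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, rating in samples:
            label = 1.0 if rating >= THRESHOLD else 0.0  # rating -> binary class
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def recommend_probability(weights, bias, x):
    """Probability that the user-item pair described by `x` should be recommended."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

weights, bias = train(samples)
p_likely = recommend_probability(weights, bias, [0.9, 0.9])
p_unlikely = recommend_probability(weights, bias, [0.1, 0.1])
```

The learned weights show how much each feature pushes the decision, which is the interpretability argument made above.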

The common classification recommendation algorithms are logistic regression and naive Bayes, both of which are highly interpretable. If you are unfamiliar with these algorithms, you can refer to my other articles:

Summary of the principles of logistic regression

Summary of the principles of the naive Bayes algorithm

3.4 Using regression algorithms to do collaborative filtering

Using regression for collaborative filtering seems more natural than using classification. A rating can be a continuous value rather than a discrete one, and a regression model gives us the target user's predicted rating for an item.

The commonly used regression recommendation algorithms are ridge regression, regression trees, and support vector regression. If you are unfamiliar with these algorithms, you can refer to my other articles:

Summary of the principles of linear regression

Principles of the decision tree algorithm (part 2)

Principles of support vector machines (5): linear support vector regression

3.5 Using matrix factorization to do collaborative filtering

Collaborative filtering by matrix factorization is a widely used method at present. Traditional singular value decomposition (SVD) requires a dense matrix with no missing entries, whereas our user-item rating matrix is a typical sparse matrix, so applying traditional SVD directly to collaborative filtering is complicated.

At present, the mainstream matrix factorization recommendation algorithms are variants of SVD, such as FunkSVD, BiasSVD, and SVD++. The biggest difference between these algorithms and traditional SVD is that the matrix no longer has to be decomposed into the form $U \Sigma V^T$; instead it is written as the product $P^T Q$ of two low-rank matrices. I will cover matrix factorization recommendation algorithms specifically in a later article.
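To make the two-low-rank-matrices idea concrete, here is a minimal FunkSVD-style sketch that fits user factors P and item factors Q by stochastic gradient descent on the observed ratings only. The toy rating triples, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the small positive random initialization are assumptions chosen for illustration, not values from the original article.

```python
import random

# Observed (user, item, rating) triples; every other cell of the matrix is blank.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]

def funk_svd(observed, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn k-dimensional user factors P and item factors Q with SGD."""
    rng = random.Random(seed)
    P = [[rng.random() * 0.5 for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() * 0.5 for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            # Prediction error on one observed rating.
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

P, Q = funk_svd(observed, n_users=5, n_items=4)
mae = sum(abs(r - predict(P, Q, u, i)) for u, i, r in observed) / len(observed)
blank_estimate = predict(P, Q, 1, 1)  # user 1 never rated item 1
```

Because the loss only sums over observed triples, the blanks never need to be filled in beforehand, which is what separates this family from traditional SVD. BiasSVD and SVD++ extend the same loop with bias terms and implicit feedback respectively.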

3.6 Using neural networks to do collaborative filtering

Using neural networks, and even deep learning, for collaborative filtering should be a future trend. At present, the more mainstream approach uses a two-layer neural network, the restricted Boltzmann machine (RBM), as the recommendation algorithm. RBM performed very well in the Netflix algorithm competition. Using deep neural networks for collaborative filtering should work even better, and commercial use of deep learning for collaborative filtering by large companies should be a future trend. I will devote a later article to RBM.

3.7 Using graph models to do collaborative filtering

In graph-model collaborative filtering, the similarity between users is considered within a graph model. The commonly used algorithms are the SimRank family and Markov model algorithms. The basic idea of the SimRank family is that two objects referenced by similar objects are themselves similar, an idea somewhat like the well-known PageRank. Markov model algorithms are based on Markov chains; their basic idea is to use transitivity to find similarities that ordinary distance measures have difficulty detecting. I will cover the SimRank family specifically in a later article.
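The SimRank idea can be sketched with the basic iterative form on a small user-item graph. The graph, the decay factor of 0.8, the fixed iteration count, and the choice to treat purchase edges as undirected neighbor lists are assumptions for illustration; the SimRank variants used in practice differ in these details.

```python
# Hypothetical purchase graph, stored as undirected neighbor lists.
neighbors = {
    "u1": ["book_a", "book_b"],
    "u2": ["book_a", "book_b"],
    "u3": ["book_c"],
    "book_a": ["u1", "u2"],
    "book_b": ["u1", "u2"],
    "book_c": ["u3"],
}

def simrank(neighbors, decay=0.8, iterations=10):
    """Basic SimRank: two nodes are similar if their neighbors are similar."""
    nodes = list(neighbors)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0  # every node is fully similar to itself
                    continue
                na, nb = neighbors[a], neighbors[b]
                if not na or not nb:
                    new_sim[a][b] = 0.0
                    continue
                # Average similarity over all pairs of neighbors, scaled by decay.
                total = sum(sim[x][y] for x in na for y in nb)
                new_sim[a][b] = decay * total / (len(na) * len(nb))
        sim = new_sim
    return sim

sim = simrank(neighbors)
```

Here book_a and book_b end up similar because the same users bought both, while book_c shares no buyers with them and keeps a similarity of zero. This all-pairs loop is quadratic in the number of nodes per iteration, so it only suits small graphs.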

3.8 Using latent semantic models to do collaborative filtering

Latent semantic models are mainly based on NLP. They perform semantic analysis of user behavior to produce rating recommendations. The main methods are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA); these will be covered specifically when I discuss NLP.

4. Some new directions for collaborative filtering

Of course, recommendation algorithms keep evolving, and even the currently most popular approach, recommendation based on logistic regression, faces being replaced. Which algorithms might replace traditional collaborative filtering approaches such as logistic regression? Here is my understanding:

a) Ensemble learning methods and hybrid recommendation: This is closely related to hybrid recommendation. As ensemble learning has matured, it also performs well in recommendation. One algorithm that may replace logistic regression is GBDT. GBDT currently performs well in many algorithm competitions, and industrial-grade parallel implementations are available as libraries.

b) Matrix factorization methods: Matrix factorization has long been favored because the method is simple. Matrix factorization methods that are currently becoming popular include factorization machines and tensor factorization.

c) Deep learning methods: The two-layer RBM network already gives very good recommendation results, and with the rise of deep learning and multilayer neural networks, the future of recommendation algorithms may well belong to deep learning. The hottest topics at present are recommendation algorithms based on CNNs and RNNs.

5. Summary of Collaborative Filtering

As a classic type of recommendation algorithm, collaborative filtering is widely used in industry. It has many advantages: the models are general, it does not require much domain expertise about the data, it is simple to implement in engineering terms, and the results are good. These are the reasons for its popularity.

Of course, collaborative filtering also has some problems that are hard to avoid, such as the troublesome "cold start" problem: when we have no data at all for a new user, it is hard to recommend items to that user. It also does not account for differences in context, such as the user's current situation and mood. Nor can it capture niche preferences, which is something content-based recommendation is better at.
