Article description: Research progress in personalized recommendation systems.
Last month I wrote an article on product recommendation (see "I know the product recommended"). It covered a lot of ground and drew mostly on work experience. This week I read a few related papers, learned a great deal, and want to share the substance.
The following is excerpted from "Research progress of personalized recommendation system", published in the January 2009 issue of "Progress of Natural science"; the authors are Liu Jianguo, Zhou, and Wang Binghong.
I have omitted the specific algorithms and many of the formulas, and focus on the principles, the ideas, and the comparisons.
The rapid development of Internet technology has put an enormous amount of information in front of us all at once. Traditional search algorithms can only present the same ranked results to every user and cannot provide services tailored to the interests of different users. This explosion of information actually lowers the rate at which information is used, a phenomenon called information overload. Personalized recommendation, including personalized search, is considered one of the most effective tools for solving this problem at present. Fundamentally, the recommendation problem is to estimate, on the user's behalf, how the user would rate products they have never seen, including books, movies, CDs, web pages, and even restaurants, music, paintings, and so on.
By establishing a binary relationship between users and information products, a personalized recommendation system uses existing selection histories or similarity relationships to uncover the objects each user is potentially interested in, and then makes personalized recommendations. An effective recommendation system can uncover users' potential consumption tendencies and provide personalized service to a large number of users. A complete recommendation system consists of 3 parts:
A behavior recording module, for collecting user information
A model analysis module, for analyzing user preferences
A recommendation algorithm module
The recommendation algorithm module is the core part. Depending on the recommendation algorithm used, recommendation systems can be divided into the following categories:
Collaborative filtering systems
Content-based recommendation systems
Network-based recommendation systems, built on the user-product bipartite graph network structure
Hybrid recommendation systems
Others
1. Collaborative filtering system
This was the first generation of recommendation system to be proposed and widely used, for example in Amazon's book recommendations and Jester's joke recommendations.
1) Core idea: use users' historical information to compute the similarity between users --> use the ratings given by users who are highly similar to the target user to predict how much the target user will like specific products --> make recommendations to the target user according to that predicted preference.
User similarity is mostly computed from the ratings that two users have given to products they have both rated. The most common measures are the Pearson correlation and cosine similarity (the cosine of the angle between the rating vectors).
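As a minimal sketch of these two measures, assuming each user's ratings are stored as a dictionary from product to score (the data layout, function names, and sample users are illustrative and not taken from the paper):

```python
from math import sqrt


def co_rated(ratings_u, ratings_v):
    """Products that both users have rated."""
    return sorted(set(ratings_u) & set(ratings_v))


def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation over co-rated products (0.0 if undefined)."""
    common = co_rated(ratings_u, ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)


def cosine_similarity(ratings_u, ratings_v):
    """Cosine of the angle between the two co-rated rating vectors."""
    common = co_rated(ratings_u, ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical users: bob's ratings rise and fall with alice's.
alice = {"book_a": 5, "book_b": 3, "book_c": 1}
bob = {"book_a": 4, "book_b": 3, "book_c": 2, "book_d": 5}

pearson_ab = pearson_similarity(alice, bob)
cosine_ab = cosine_similarity(alice, bob)
```

Pearson subtracts each user's mean first, so it is insensitive to one user rating everything higher than another. Cosine works on the raw scores and is therefore affected by such offsets.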
Collaborative filtering algorithms can be divided into two types: memory-based and model-based. The former makes its predictions from all the products that have been rated in the system, and focuses on predicting a user's relative preference rather than the absolute value of the rating. The latter collects rating data to learn and infer a model of user behavior, and then predicts the rating of a given product.
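A memory-based prediction might look like the sketch below. The excerpt omits the paper's formulas, so the mean-centred weighted average used here is an assumption (it is one commonly used form), and all names and sample ratings are hypothetical.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated products (0.0 if undefined)."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = mean(ratings_u[i] for i in common)
    mean_v = mean(ratings_v[i] for i in common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((ratings_v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict user's rating of item from every other user who rated it.

    Assumed formula: the user's own mean plus the similarity-weighted
    average of the neighbours' deviations from their own means.
    """
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        num += sim * (other_ratings[item] - mean(other_ratings.values()))
        den += abs(sim)
    return user_mean if den == 0 else user_mean + num / den


# Hypothetical rating table: bob agrees with alice, carol disagrees.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 1},
    "bob": {"a": 4, "b": 3, "c": 2, "d": 5},
    "carol": {"a": 1, "b": 3, "c": 5, "d": 2},
}
predicted = predict(ratings, "alice", "d")
```

Because carol's tastes are the opposite of alice's, her low rating of "d" counts as evidence that alice will like it. This is why the prediction ends up above alice's own average.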
2) Advantages:
Can discover users' latent preferences and recommend information that is new to them
Can recommend products whose content is difficult to analyze
3) Disadvantages:
Because it relies on users' ratings of products, accuracy is low when recommending to new users or when recommending new products
As the number of users grows, the amount of computation increases linearly, which hurts system performance
2. Content-based recommendation system
This is a continuation and development of collaborative filtering technology.
1) Core idea: build a profile for each user and each product --> compare the similarity between user profiles and product profiles --> recommend to each user the products most similar to that user's profile.
For example, in movie recommendation, a content-based system first analyzes what the movies a user has watched and rated highly have in common (actors, directors, styles, etc.), and then recommends other movies whose content is highly similar to those movies. Content-based recommendation algorithms are rooted in information retrieval and information filtering. Because research on retrieving and filtering text is relatively mature, many content-based recommender systems make recommendations by analyzing the text information of products. In information retrieval, the most commonly used method is TF-IDF.
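The movie example can be sketched with a hand-rolled TF-IDF. This assumes each movie is described by a short list of feature words, that the user profile is the average of the vectors of the movies the user liked, and that tf is the relative term frequency and idf is log(N/df). These choices, the function names, and the tiny catalogue are illustrative and not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log, sqrt


def tfidf_vectors(docs):
    """docs maps item id -> list of feature words; returns sparse TF-IDF vectors."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for item, tokens in docs.items():
        counts = Counter(tokens)
        vectors[item] = {
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(vec_a, vec_b):
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = sqrt(sum(w * w for w in vec_a.values()))
    norm_b = sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(vectors, liked):
    """User profile = average of the vectors of the items the user liked."""
    profile = defaultdict(float)
    for item in liked:
        for term, weight in vectors[item].items():
            profile[term] += weight / len(liked)
    return dict(profile)


def recommend(docs, liked, top_n=2):
    vectors = tfidf_vectors(docs)
    profile = build_profile(vectors, liked)
    scores = {
        item: cosine(profile, vec)
        for item, vec in vectors.items()
        if item not in liked
    }
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical catalogue described by feature words.
movies = {
    "m1": ["space", "robot", "director_x"],
    "m2": ["space", "robot", "director_y"],
    "m3": ["romance", "paris", "director_y"],
    "m4": ["space", "comedy", "director_z"],
}
suggestions = recommend(movies, liked=["m1"])
```

With this data "m2" ranks first because it shares two features with the liked movie, and one of them ("robot") is rarer than "space" and so carries more weight under IDF.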
2) Advantages:
Can handle the cold-start problem of new users and new products
In real systems users rate very few products; content-based recommendation systems are not constrained by this rating sparsity
Can recommend new and unpopular products, and uncover hidden information
By listing the content features behind a recommendation, the system can explain why the products were recommended, which gives users a better experience
3) Disadvantages:
Constrained by information retrieval technology; for example, automatically extracting content features from multimedia data (images, video streams, audio streams, etc.) is technically difficult
If two different products are represented by exactly the same feature words, they cannot be distinguished
If the system only recommends products that closely match the user's profile, the user will only ever see products very similar to those they have already purchased, so diversity of recommendations cannot be guaranteed
3. Recommendation algorithms based on network structure
The content features of users and products are not considered. Users and products are simply treated as abstract nodes, and all the information the algorithm uses is hidden in the selection relationships between users and products.
1) Core idea: build a user-product bipartite network
For any target user i, assume that every product i has selected has some ability to recommend other products to i. All the products i has not selected are then ranked by i's predicted preference for them, and the top-ranked ones are recommended to i.
For the same degree of user preference, recommending unpopular products is more meaningful than recommending popular ones. For the same accuracy, the fewer products recommended, the better.
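One way to make "each selected product recommends other products" concrete is a two-step resource allocation (often called mass diffusion) on the bipartite network. The excerpt omits the paper's formulas, so the equal-split rule below is an assumption, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def mass_diffusion(selections, target, top_n=3):
    """Rank the products target has not selected.

    selections maps user -> set of selected products. Each product the
    target selected starts with one unit of resource.
    """
    selected_by = defaultdict(set)
    for user, products in selections.items():
        for product in products:
            selected_by[product].add(user)

    # Step 1: every product of the target splits its unit equally
    # among all the users who selected it.
    user_resource = defaultdict(float)
    for product in selections[target]:
        share = 1.0 / len(selected_by[product])
        for user in selected_by[product]:
            user_resource[user] += share

    # Step 2: every user splits the received resource equally
    # among all the products that user selected.
    product_score = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(selections[user])
        for product in selections[user]:
            product_score[product] += share

    candidates = [
        (product, score)
        for product, score in product_score.items()
        if product not in selections[target]
    ]
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical selection data.
selections = {
    "u1": {"p1", "p2"},
    "u2": {"p1", "p2", "p3"},
    "u3": {"p2", "p4"},
    "u4": {"p4", "p5"},
}
ranking = mass_diffusion(selections, "u1")
```

Here "p3" outranks "p4" because it is reached through u2, who shares both of u1's selections. "p5" receives nothing, since none of its users overlaps with u1.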
There are also ways to improve accuracy:
Removing redundancy
Introducing a coupling threshold (that is, considering only users whose similarity is greater than or equal to a given threshold, together with the products connected to those users)
2) Advantages: opens up a new research direction for recommendation algorithms
3) Disadvantages:
It also faces the new-user and new-product problem. When a new user or product has just entered the system and has no selection information yet, the system cannot link it to other users or products in the network.
It is affected by when the user-product relationships were formed. If every selection a user has ever made is linked in the network, long-term interests cannot be distinguished from short-term ones; giving too much weight to long-term interests prevents the system from recommending products that match users' short-term interests, which greatly reduces recommendation accuracy
4. Hybrid recommendation
The recommendation methods above are combined; in practical recommendation systems the most common combination is collaborative filtering with content-based recommendation.
1) Independent systems combined with each other
The collaborative filtering, content-based, and network-structure-based algorithms are each applied independently, and the results of two or more of them are then combined, for example by recommending according to a linear combination of their predicted ratings. Alternatively, only the results of whichever algorithm performs better on a given evaluation metric at a given moment are recommended.
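The linear combination of predicted ratings can be sketched as follows. The weight value, the choice to treat a missing prediction as 0.0, and the sample scores are all assumptions made for illustration.

```python
def blend_scores(cf_scores, content_scores, weight_cf=0.5):
    """Linearly combine two systems' predicted ratings per product.

    A product missing from one system is treated as scoring 0.0 there
    (an assumption; a real system might fall back to the other score).
    """
    products = set(cf_scores) | set(content_scores)
    return {
        product: weight_cf * cf_scores.get(product, 0.0)
        + (1.0 - weight_cf) * content_scores.get(product, 0.0)
        for product in products
    }


def top_n(scores, n=2):
    """Highest-scoring products first; ties broken by product name."""
    return sorted(scores, key=lambda product: (-scores[product], product))[:n]


# Hypothetical predicted ratings from two independent recommenders.
cf_scores = {"a": 4.0, "b": 3.0, "c": 2.0}
content_scores = {"a": 2.0, "b": 5.0, "d": 4.5}

blended = blend_scores(cf_scores, content_scores, weight_cf=0.5)
best = top_n(blended)
```

In practice the weight would be tuned on held-out data against whichever evaluation metric the system cares about.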
2) Adding content-based algorithms to a collaborative filtering system
Traditional collaborative filtering is carried out using users' profiles: the similarity between users is computed from their content-based profiles rather than from the products they have rated in common. This overcomes the sparsity problem of collaborative filtering systems. A further benefit is that a product can be recommended not only when it is rated highly by users with similar profiles, but also directly, when the product itself is similar to the user's profile.
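A minimal sketch of this idea, assuming user profiles are dictionaries of non-negative feature weights and the prediction is a plain similarity-weighted average (both are assumptions; names and data are hypothetical):

```python
from math import sqrt


def profile_similarity(profile_a, profile_b):
    """Cosine similarity between two content-based user profiles."""
    dot = sum(w * profile_b.get(f, 0.0) for f, w in profile_a.items())
    norm_a = sqrt(sum(w * w for w in profile_a.values()))
    norm_b = sqrt(sum(w * w for w in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(profiles, ratings, user, item):
    """Similarity-weighted average of other users' ratings of item.

    Similarity comes from profiles, so it is defined even when the two
    users have no rated products in common. Returns None if no user
    with a similar profile has rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = profile_similarity(profiles[user], profiles[other])
        num += sim * other_ratings[item]
        den += sim
    return num / den if den > 0 else None


# Hypothetical data: alice has rated nothing yet, but has a profile.
profiles = {
    "alice": {"sci-fi": 1.0, "thriller": 0.5},
    "bob": {"sci-fi": 0.8, "thriller": 0.6},
    "carol": {"romance": 1.0},
}
ratings = {"alice": {}, "bob": {"movie_x": 5}, "carol": {"movie_x": 1}}

predicted = predict_rating(profiles, ratings, "alice", "movie_x")
```

Rating-based similarity would be undefined for alice, who has no ratings at all. The profile-based version still finds that she resembles bob and not carol.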
5. Other methods
1) Association rule analysis: this focuses on association patterns in user behavior. People who buy cigarettes tend to buy lighters, so a relationship can be established between cigarettes and lighters, and other products can be recommended through that relationship.
2) Recommendation algorithms based on social network analysis: for example, users' purchasing behavior is used to establish the similarity of their preferences for products; products are then recommended to users and product sales are forecast on that basis, which improves user stickiness.
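The cigarettes-and-lighters rule from item 1) can be quantified with the two standard association-rule measures, support and confidence. The sketch below assumes purchases are recorded as sets of product names; the baskets are made up for illustration.

```python
def rule_stats(baskets, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = share of all baskets containing both products
    confidence = share of antecedent baskets that also contain consequent
    """
    total = len(baskets)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical purchase records.
baskets = [
    {"cigarettes", "lighter"},
    {"cigarettes", "lighter", "gum"},
    {"cigarettes", "beer"},
    {"beer", "gum"},
]
support, confidence = rule_stats(baskets, "cigarettes", "lighter")
```

A system would typically keep only rules whose support and confidence exceed chosen thresholds, and recommend the consequent to users who have bought the antecedent.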