Recommender System Practice: A Summary




This week I read the book Recommender System Practice. It covers the field fairly comprehensively, but no part goes very deep. Much of the real substance is in the notes, which cite many papers worth studying further.


Let's start by reviewing the overall framework and thinking through some of its important parts.

There are three ways to connect users and items:

1 Based on the user's historical behavior, predict items similar to the ones the user has given feedback on. This is traditional ItemCF.

2 Based on the user's historical behavior, find similar users and predict from their behavior. This is UserCF.

3 Extract features from the user's preferences and personal information, extract the corresponding item features, and predict from both. This is essentially a model-building approach.
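For approach 1, here is a minimal Python sketch of ItemCF on implicit feedback. It assumes the data is a dict mapping each user to the set of items they acted on. The function names and the neighbour count `k` are illustrative, not taken from the book. Similarity is the common cosine form |N(i) ∩ N(j)| / sqrt(|N(i)| |N(j)|), where N(i) is the set of users who acted on item i.

```python
from collections import defaultdict
from math import sqrt

def item_similarity(user_items):
    """Cosine item-item similarity from implicit feedback.

    user_items: dict user -> set of items the user acted on (hypothetical format).
    w[i][j] = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|)
    """
    co_counts = defaultdict(lambda: defaultdict(int))  # |N(i) & N(j)|
    item_counts = defaultdict(int)                     # |N(i)|
    for items in user_items.values():
        for i in items:
            item_counts[i] += 1
            for j in items:
                if i != j:
                    co_counts[i][j] += 1
    return {
        i: {j: c / sqrt(item_counts[i] * item_counts[j]) for j, c in related.items()}
        for i, related in co_counts.items()
    }

def recommend_itemcf(user, user_items, sim, k=10, n=5):
    """Score unseen items by summing similarities to the user's items (top-k neighbours of each)."""
    seen = user_items[user]
    scores = defaultdict(float)
    for i in seen:
        neighbours = sorted(sim.get(i, {}).items(), key=lambda x: x[1], reverse=True)[:k]
        for j, w in neighbours:
            if j not in seen:
                scores[j] += w
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```

For example, with `{"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"b", "c"}}`, user `u1` gets item `c` recommended. UserCF has the same shape, with the roles of users and items swapped.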

User features come from three sources: 1) the user's own attributes, such as age and gender; 2) historical behavior, where time and location matter just as much; and 3) interests mined from historical behavior, such as the types of goods the user likes (this may involve a topic model). Item features must also be extracted, such as product type and attributes. The book's treatment of item cold start likewise relies on item features, for example a keyword vector d = {(e1, w1), (e2, w2), ...}, where each ei is a keyword and wi is its weight. The weights can be computed with TF-IDF.
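The keyword vector d = {(e1, w1), ...} with TF-IDF weights can be sketched in Python as follows. The sketch assumes the item text has already been segmented into keywords, and uses the plain weighting tf × log(N / df). Real systems often use smoothed variants.

```python
from collections import Counter
from math import log

def tfidf_vectors(docs):
    """Build a keyword vector {keyword: weight} for each item using TF-IDF.

    docs: dict item id -> list of already-segmented keywords (assumed input format).
    TF  = keyword count / number of keywords in the item's text
    IDF = log(N / df), where df = number of items containing the keyword
    """
    n_docs = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # count each keyword once per item
    vectors = {}
    for item, words in docs.items():
        tf = Counter(words)
        total = len(words)
        vectors[item] = {w: (c / total) * log(n_docs / df[w]) for w, c in tf.items()}
    return vectors
```

A keyword that appears in every item gets weight 0, which is the intended effect of IDF: such a keyword does not help distinguish items.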

Once the features are extracted, the similarity of two items is computed by representing each as a feature vector and taking the (normalized) dot product of the two vectors. If you instead want to predict an outcome directly, for example to rank by estimated CTR, you train a machine learning model on the features.
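Both cases are sketched below in plain Python. Cosine similarity is used as the normalized dot product. Logistic regression trained by SGD stands in for "a machine learning model"; that choice is mine, and the book does not prescribe a specific model. Sparse features are dicts `{feature: value}`.

```python
from math import exp, sqrt

def cosine(u, v):
    """Cosine similarity of two sparse feature vectors {feature: weight}."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def train_ctr_model(samples, epochs=200, lr=0.1):
    """Fit logistic regression with plain SGD.

    samples: list of (features, clicked), features as {name: value}, clicked in {0, 1}.
    Returns (weights, bias).
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in x.items()))
            g = p - y  # gradient of the log loss with respect to the logit
            bias -= lr * g
            for f, v in x.items():
                weights[f] = weights.get(f, 0.0) - lr * g * v
    return weights, bias

def predict_ctr(model, x):
    """Predicted click probability for a sparse feature vector x."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in x.items()))
```

Candidates can then be sorted by `predict_ctr`. Production systems typically add regularization and feature hashing, which are omitted here.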


Real systems have a very large number of features; an example I worked on recently had more than 20 million. Handling every feature in a single engine is unrealistic and makes configuration cumbersome. So typically several recommendation engines work together, and their results are combined according to weights and priorities.
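The two combination strategies mentioned above can be sketched as follows. A weighted linear blend uses per-engine weights. A priority merge fills the list from higher-priority engines first. Engine names, weights and the assumption that scores are already on a comparable scale are all hypothetical.

```python
from collections import defaultdict

def blend(engine_results, weights, n=10):
    """Weighted linear blend of several engines' scores.

    engine_results: dict engine name -> {item: score}; scores are assumed to be
    normalized to a comparable range (e.g. [0, 1]) beforehand.
    weights: dict engine name -> weight; engines without a weight are ignored.
    """
    combined = defaultdict(float)
    for engine, scores in engine_results.items():
        w = weights.get(engine, 0.0)
        for item, s in scores.items():
            combined[item] += w * s
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)[:n]

def merge_by_priority(engine_results, priority, n=10):
    """Fill the list from engines in priority order, skipping items already taken."""
    out, seen = [], set()
    for engine in priority:
        ranked = sorted(engine_results.get(engine, {}).items(), key=lambda x: x[1], reverse=True)
        for item, _ in ranked:
            if item not in seen:
                seen.add(item)
                out.append(item)
                if len(out) == n:
                    return out
    return out
```

Blending lets every engine influence every position. Priority merging is simpler to reason about when one engine (e.g. editorial picks) must always come first.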


The structure of a recommender system:

A User Features

User features generally fall into two groups. Behavioral features are extracted from the user's behavior logs. Attribute features can be converted directly from the user's profile information into a feature vector.


B Initial Recommendation List

Using the user's features and the feature-item relevance matrix computed offline, the user's features are converted into a list of related items. One important point: a candidate item set is usually introduced at this stage. Only items in the candidate set may be recommended.

Suppose the user has acted on item A, and item B is in the candidate list we want to recommend. If A is popular and B is not, B's relevance score with respect to A will certainly not rank near the top (even when a method for suppressing popularity is applied). So the top-N list may not contain B at all. If the candidate set is then applied only as a simple filter at the end, very few recommendations may remain. Therefore, when generating the initial list, the weight of items in the candidate list should be increased appropriately.
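The steps of module B can be sketched as follows. The user's feature weights are multiplied into the offline feature-item relevance matrix. Candidate items are then boosted *before* top-N truncation, so a weakly related candidate such as B is not cut off and then lost to the final filter. The multiplicative `boost` is a hypothetical choice; restricting scoring to the candidate set before truncating would be another option.

```python
from collections import defaultdict

def initial_recommendations(user_features, feature_items, candidates, boost=2.0, n=10):
    """Turn user features into an initial ranked item list.

    user_features: {feature: weight} for the user (e.g. items acted on, tags).
    feature_items: offline relevance matrix {feature: {item: relevance}}.
    candidates: set of items eligible for recommendation.
    boost: hypothetical multiplier for candidate items, applied before truncation.
    """
    scores = defaultdict(float)
    for f, fw in user_features.items():
        for item, r in feature_items.get(f, {}).items():
            scores[item] += fw * r
    for item in scores:
        if item in candidates:
            scores[item] *= boost
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```

With `feature_items = {"A": {"hot1": 0.9, "hot2": 0.8, "B": 0.3}}` and `n=2`, candidate B drops out of the list without a boost but ranks first with `boost=4.0`.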


C Filter Module

This module filters out unsuitable items, such as items the user has already interacted with, items of poor quality, and so on.
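A minimal sketch of the filter module follows. It assumes a per-item quality score in [0, 1] and a hypothetical threshold `min_quality`.

```python
def filter_items(ranked, seen, quality, min_quality=0.5):
    """Drop items the user has already interacted with or whose quality is too low.

    ranked: list of (item, score); seen: set of items the user already acted on;
    quality: {item: score in [0, 1]}. Items with no quality score are kept
    (an assumption; a stricter system might drop them instead).
    """
    return [(i, s) for i, s in ranked
            if i not in seen and quality.get(i, 1.0) >= min_quality]
```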


D Ranking Module

1 Novelty: during recommendation, reduce the weight of popular items.

2 Diversity: reduce the weight of items that have already been shown, and mine more topics from the user's historical behavior to recommend from.

3 Time diversity: in a real-time system, item scores decay according to a time factor.
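The three ranking adjustments can be sketched as multiplicative factors on each item's score. The specific functional forms are my assumptions, not the book's: a log-popularity penalty for novelty, a penalty on how often the item was already shown for diversity, and half-life decay on item age for time. The parameters `alpha`, `beta` and `half_life` are hypothetical tuning knobs. Topic-based diversification is not covered here.

```python
from math import log

def rerank(ranked, popularity, shown_counts, age_days, alpha=0.5, beta=0.5, half_life=7.0):
    """Re-rank (item, score) pairs with novelty, diversity and time-decay factors.

    popularity:   {item: number of users who acted on it}    -> novelty penalty
    shown_counts: {item: times already shown to this user}  -> diversity penalty
    age_days:     {item: days since the item went online}   -> time decay
    """
    out = []
    for item, score in ranked:
        novelty = 1.0 / (1.0 + log(1 + popularity.get(item, 0))) ** alpha
        diversity = 1.0 / (1.0 + shown_counts.get(item, 0)) ** beta
        decay = 0.5 ** (age_days.get(item, 0) / half_life)  # halves every half_life days
        out.append((item, score * novelty * diversity * decay))
    return sorted(out, key=lambda x: x[1], reverse=True)
```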


Here are a few of the more important points. Each is noted briefly first, with a more in-depth treatment to follow:


1 In CF recommendation, be sure to suppress the influence of item popularity and user activity. Also pay attention to normalization when computing relevance: items belong to many categories, and typical similarity levels differ from category to category, so it is best to normalize within each category.
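Both points can be folded into the ItemCF similarity computation, as sketched below. Each co-occurrence is weighted by 1 / log(1 + |N(u)|) so that very active users count less (the book's IUF-style weighting). The cosine denominator penalizes popular items. Finally, for each item i, similarities to items in each category are divided by the maximum within that category. This per-category max normalization is my reading of the note above, and `item_category` is a hypothetical input.

```python
from collections import defaultdict
from math import log, sqrt

def item_similarity_iuf(user_items, item_category):
    """ItemCF similarity that damps active users and normalizes within categories.

    user_items: dict user -> set of items; item_category: dict item -> category.
    """
    co = defaultdict(lambda: defaultdict(float))
    n_i = defaultdict(int)  # |N(i)|: users who acted on item i
    for items in user_items.values():
        for i in items:
            n_i[i] += 1
        if len(items) < 2:
            continue  # no co-occurrences; also avoids dividing by log(1) = 0
        penalty = 1.0 / log(1 + len(items))  # active users contribute less
        for i in items:
            for j in items:
                if i != j:
                    co[i][j] += penalty
    sim = {}
    for i, related in co.items():
        raw = {j: c / sqrt(n_i[i] * n_i[j]) for j, c in related.items()}
        cat_max = defaultdict(float)
        for j, w in raw.items():
            c = item_category.get(j)
            cat_max[c] = max(cat_max[c], w)
        # Most similar item in each category gets 1.0, so categories become comparable.
        sim[i] = {j: w / cat_max[item_category.get(j)] for j, w in raw.items()}
    return sim
```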


2 The latent semantic model (LFM) links user interests and items through latent features. It clusters automatically from user behavior, modeling the relationship between user u and K latent classes, and between those K latent classes and item i. The computation is expensive. Unless it can be restricted to a candidate list, it is not well suited to real-time applications, and its results are hard to interpret.
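The model can be sketched as below. Preference is the dot product of a user vector p_u (affinity to each of the K latent classes) and an item vector q_i (membership in each class). The vectors are trained by SGD on squared error with L2 regularization. Hyperparameter values are arbitrary placeholders, and the book's version also decays the learning rate, which is omitted here. Each epoch costs O(samples × K), which is why LFM is usually trained offline.

```python
import random

def train_lfm(samples, k=4, epochs=50, lr=0.05, reg=0.01, seed=0):
    """Latent factor model trained by SGD on (user, item, label) samples, label in {0, 1}.

    Returns (P, Q): P[u][f] is user u's affinity to latent class f,
    Q[i][f] is item i's membership in latent class f.
    """
    rng = random.Random(seed)
    P, Q = {}, {}

    def vec():
        return [rng.random() / k ** 0.5 for _ in range(k)]

    for u, i, _ in samples:
        P.setdefault(u, vec())
        Q.setdefault(i, vec())
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                pu_f = pu[f]  # keep the old value for the item update
                pu[f] += lr * (err * qi[f] - reg * pu_f)
                qi[f] += lr * (err * pu_f - reg * qi[f])
    return P, Q

def predict(P, Q, u, i):
    """Predicted preference of user u for item i."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```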

3 When sampling or generating negative samples for the model, try to keep positive and negative samples balanced. As negatives, choose items that are as popular as possible but that the user has not acted on.
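A sketch of this sampling rule follows. For each user, it draws about as many negatives as positives from items the user has not acted on, with probability proportional to item popularity. Weighted sampling without replacement uses Efraimidis-Spirakis keys (r^(1/w), keep the largest). The input format and `ratio` parameter are assumptions.

```python
import random

def sample_negatives(user_items, item_popularity, ratio=1.0, seed=0):
    """Build (user, item, label) samples with about `ratio` negatives per positive.

    user_items: dict user -> set of items acted on (positives).
    item_popularity: {item: count}; popular unseen items are preferred as negatives.
    """
    rng = random.Random(seed)
    items = list(item_popularity)
    samples = []
    for u, pos in user_items.items():
        samples.extend((u, i, 1) for i in pos)
        pool = [i for i in items if i not in pos and item_popularity[i] > 0]
        want = min(int(len(pos) * ratio), len(pool))
        # Weighted sampling without replacement: larger popularity -> larger expected key.
        keyed = sorted(pool, key=lambda i: rng.random() ** (1.0 / item_popularity[i]),
                       reverse=True)
        samples.extend((u, i, 0) for i in keyed[:want])
    return samples
```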

4 How to meet real-time requirements in a real-time recommender system

5 Cold start problem

User cold start: a user provides basic information at registration, and this information can be used to classify the user. Then, for each user feature f, find the items i that users with feature f like, and recommend according to how strongly users with feature f prefer item i. Users can also be asked at registration time to give some explicit feedback. As I recall, Weibo, Lofter and others include such a step.
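The feature-based approach can be sketched with the score p(f, i) = |N(i) ∩ U(f)| / (|N(i)| + α). As I recall, this is the form the book uses, where N(i) is the set of users who acted on item i and U(f) is the set of users with feature f. The constant α damps items with very few interactions; the value used here is arbitrary.

```python
from collections import defaultdict

def feature_item_scores(user_items, user_features, alpha=10.0):
    """For each feature f, score items by |N(i) & U(f)| / (|N(i)| + alpha).

    user_items: dict user -> set of items; user_features: dict user -> set of features.
    """
    n_i = defaultdict(int)                          # |N(i)|
    hits = defaultdict(lambda: defaultdict(int))    # |N(i) & U(f)|
    for u, items in user_items.items():
        for i in items:
            n_i[i] += 1
            for f in user_features.get(u, ()):
                hits[f][i] += 1
    return {f: {i: c / (n_i[i] + alpha) for i, c in row.items()} for f, row in hits.items()}

def recommend_new_user(features, table, n=5):
    """Sum p(f, i) over a new user's registration features (e.g. gender, age band)."""
    scores = defaultdict(float)
    for f in features:
        for i, s in table.get(f, {}).items():
            scores[i] += s
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```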

Item cold start: the simple method is to display the new item randomly, then collect and process the feedback it receives. The most commonly used method is to generate a keyword vector from the item's content, find similar items by comparing keyword vectors, and recommend in a way similar to CF. Because this works on words, synonyms must be taken into account, and an LDA model may be needed.

LDA involves three elements: documents, topics and words. Through iteration until convergence, it assigns words to different topics. The similarity between items is then computed from their distributions over topics, for example with the KL divergence between the distributions.
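Comparing two items' topic distributions with KL divergence can be sketched as below. The distributions are assumed to come from an already-trained LDA model, which is not shown. KL divergence is asymmetric and is a divergence rather than a similarity. So this sketch symmetrizes it and maps it to (0, 1]; that mapping is my choice, not something the book specifies.

```python
from math import log

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    eps smoothing avoids log(0) when q puts zero mass on a topic.
    """
    return sum(pi * log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def topic_similarity(p, q):
    """Symmetrized KL mapped to a similarity: 1.0 for identical distributions."""
    d = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + d)
```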


6 Tag the items, compute each item's tag distribution, and compute similarities from those distributions. Tags play the same role as keywords, so TF-IDF can also be used when computing their weights.


7 When making recommendations, pay attention to time and location factors, to the life cycle of items, and to the life cycle of the system. To measure how time-sensitive a system is, compute item popularity and use the average number of days items stay online as one indicator. Then compute the similarity of item popularity between days separated by T days; this measures the timeliness of the system.
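Both timeliness measures can be sketched from daily per-item activity counts. The first is the average number of days an item has activity. The second is the average cosine similarity of the item-popularity vectors of day d and day d + T. The input format is an assumption, and the book's exact definitions may differ in detail. Few online days and low similarity across T days both point to a highly time-sensitive system, such as news.

```python
from collections import Counter
from math import sqrt

def average_online_days(daily_popularity):
    """daily_popularity: list of Counter({item: actions}), one per day."""
    online = Counter()
    for day in daily_popularity:
        online.update(i for i, c in day.items() if c > 0)  # +1 day for each active item
    return sum(online.values()) / len(online) if online else 0.0

def _cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def popularity_similarity(daily_popularity, T):
    """Average cosine similarity of popularity vectors between days d and d + T."""
    pairs = [(daily_popularity[d], daily_popularity[d + T])
             for d in range(len(daily_popularity) - T)]
    return sum(_cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
```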

Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
