A summary of the 2015 Ali Mobile Recommendation Algorithm Contest (II.)--Recommendation algorithm

Source: Internet
Author: User

Although I started off in the wrong direction, I still learned something. The Meituan technical team's documentation is good, and anyone interested can check it regularly; the links are given at the end.

——————————————————————————————————————————————————————————————


Specific process

The basic flow is as follows, borrowing a diagram from Meituan.


From an architectural point of view, the recommendation system can be divided into a data layer, a trigger layer, a fusion and filtering layer, and a sorting layer. The data layer covers data generation and data storage: various data processing tools clean the raw logs, turn them into formatted data, and write them to different types of storage systems for downstream algorithms and models. The candidate set trigger layer produces recommendation candidate sets from the user's historical behavior, real-time behavior, geographic location, and other triggering strategies. The candidate set fusion and filtering layer has two functions. First, it fuses the different candidate sets generated by the trigger layer to improve the coverage and accuracy of the recommendation strategy. Second, it takes on filtering duties: rules defined manually from the product and operations perspective filter out ineligible items. The sorting layer mainly uses a machine learning model to reorder the candidate sets that were produced by the trigger layer and then fused and filtered.

In this contest the data is given, so data generation does not need to be considered; storage could be considered but is set aside for now. The general process is therefore: analyze the data, preprocess it, generate candidate sets in the trigger stage using collaborative filtering and location clustering, and then train a machine learning model to produce the final result.
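The layered flow described above can be sketched as a small pipeline. This is only an illustration: the function `recommend`, the toy trigger strategies, the filter rule, and the scores are all hypothetical and stand in for real components.

```python
def recommend(user_id, triggers, filters, rank, top_n=3):
    """Trigger -> fuse -> filter -> rank, mirroring the layers described above."""
    # Trigger layer: every strategy proposes its own candidate items.
    candidate_sets = [trigger(user_id) for trigger in triggers]
    # Fusion: merge the candidate sets, dropping duplicates but keeping order.
    fused = list(dict.fromkeys(item for s in candidate_sets for item in s))
    # Filtering: keep only items that pass every hand-written rule.
    kept = [item for item in fused if all(rule(user_id, item) for rule in filters)]
    # Sorting layer: reorder the surviving candidates with a scoring model.
    return sorted(kept, key=lambda item: rank(user_id, item), reverse=True)[:top_n]


# Toy stand-ins (hypothetical): two trigger strategies, one rule, one scorer.
by_history = lambda user: ["item1", "item2"]
by_location = lambda user: ["item2", "item3", "item4"]
not_sold_out = lambda user, item: item != "item3"
toy_score = {"item1": 0.2, "item2": 0.9, "item4": 0.5}

result = recommend("user1", [by_history, by_location], [not_sold_out],
                   lambda user, item: toy_score[item])
print(result)
```

In a real system the scorer would be the trained reorder model rather than a lookup table.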


Applying theory to analyze the data

| Behavior category | Behavior details |
|---|---|
| Active behavior data | Search, filter, click, bookmark, order, pay, score |
| UGC | Text evaluation, uploading pictures |
| Negative feedback data | Left-slide delete, cancel bookmark, cancel order, refund, negative rating, low rating |
| User portrait | User demographics, group DNA, category preference, consumption level, place of work and place of residence |


User active behavior data records the user's various behaviors on the platform. On the one hand, these behaviors are used for offline computation in the candidate set triggering algorithm (described in the next section). On the other hand, different behaviors signal intent of different strength, so different regression target values can be set for different behaviors when training the reorder model, describing the strength of the user's intent more finely. In addition, the user's behavior toward a deal can also serve as a cross-feature of the reorder model, for both offline training and online prediction.

Negative feedback data indicates that the current results may fail to meet the user's needs in some respect. The subsequent candidate set trigger process therefore needs to filter out or down-weight the specific factors involved, reducing the chance that the negative factor recurs and improving the user experience. In reorder model training, negative feedback data can also be used as scarce negative examples; these negative examples carry a much stronger signal than samples that were displayed but not clicked or not ordered.

The user portrait is the basic data that describes user attributes. Some of it is raw data obtained directly, and some is derived data produced by mining. These attributes can be used in the candidate set trigger process to up-weight or down-weight deals, and they can also serve as user-dimension features in the reorder model.

Data mining of UGC can extract keywords, which are then used to tag deals for personalized display.


Recommendation engine
1. By whether the recommendation engine recommends different data to different users

Recommendation engines based on popular behavior give every user the same recommendations. These can be set statically by the system administrator or calculated from the feedback statistics of all users of the system.

Personalized recommendation engines give more accurate recommendations to different users according to their tastes and preferences. For this, the system needs to understand the characteristics of both the recommended content and the users, or it works from a social network, finding users who share the current user's preferences and recommending based on them.

This is the most basic classification of recommendation engines. In practice, most discussions of recommendation engines mean personalized ones, because fundamentally only a personalized recommendation engine is a more intelligent information discovery process.
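As a contrast to the personalized case, the popularity-based engine can be sketched in a few lines. This is a minimal sketch that assumes feedback arrives as `(user, item)` pairs; the function name `popular_items` and the sample events are hypothetical.

```python
from collections import Counter


def popular_items(events, top_n=2):
    """Return the top_n items with the most feedback across all users."""
    counts = Counter(item for _user, item in events)
    return [item for item, _count in counts.most_common(top_n)]


# Hypothetical feedback log: every user receives the same list.
events = [("u1", "A"), ("u2", "A"), ("u3", "A"),
          ("u1", "B"), ("u2", "B"), ("u3", "C")]
top = popular_items(events)
print(top)
```

Because the result does not depend on who is asking, it can be computed once offline and served to everyone.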


2. By the recommendation engine's data source

This is really about how to discover the relevance of data, because most recommendation engines work by recommending from a set of similar items or similar users. Depending on the data source, the methods for discovering relevance fall into the following types:

Discovering the relevance of users from the basic information of the system's users is called demographic-based recommendation.

Discovering the relevance of items or content from the metadata of the recommended items or content is called content-based recommendation.

Discovering the relevance of the items or content themselves, or the relevance of users, from users' preferences for items or information is called collaborative filtering-based recommendation.


3. By how the recommendation model is built

In a system with a large number of items and users, the recommendation engine has a considerable amount of computation to do, so real-time recommendation requires building a recommendation model. Recommendation models can be built in the following ways:


Based on the items and the users themselves: this kind of recommendation engine treats each user and each item as an independent entity and predicts how much each user likes each item; this information is usually described by a two-dimensional matrix. Because the items a user is interested in are far fewer than the total number of items, such a model leaves a large number of entries empty, so the two-dimensional matrix is usually a very large sparse matrix. To reduce the amount of computation, we can cluster items and users and then record and compute the preference of a class of users for a class of items, but such a model loses recommendation accuracy.
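One common way to hold such a sparse user-item matrix is a dictionary of dictionaries that stores only the known preferences. The sketch below is illustrative; the data, the catalogue size of 1000 items, and the helper names `get_rating` and `density` are assumptions.

```python
# Only known preferences are stored; every missing entry is an empty cell.
ratings = {
    "user1": {"item1": 4, "item2": 5},
    "user2": {"item1": 5},
    "user3": {"item3": 3},
}


def get_rating(user, item):
    """Return the stored preference, or None for an empty cell."""
    return ratings.get(user, {}).get(item)


def density(matrix, n_items):
    """Fraction of the full users x items grid that is actually filled."""
    stored = sum(len(row) for row in matrix.values())
    return stored / (len(matrix) * n_items)


d = density(ratings, n_items=1000)  # hypothetical catalogue of 1000 items
print(get_rating("user2", "item2"), d)
```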


Recommendation based on association rules (rule-based recommendation): mining association rules is a classic data mining problem whose main purpose is to uncover dependencies in data. The typical scenario is the "shopping basket problem": by mining association rules we can find which items are often purchased together, or which other items users usually buy after buying certain items. Once these rules have been mined, we can make recommendations based on them.
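To make the shopping basket idea concrete, the sketch below counts how often pairs of items appear together and keeps the rules "a implies b" whose support and confidence pass a threshold. It is a brute-force illustration limited to item pairs, not a full Apriori implementation; the baskets and thresholds are made up.

```python
from collections import Counter
from itertools import permutations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return {(a, b): (support, confidence)} for rules 'a implies b'."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    # Ordered pairs, so (a, b) and (b, a) are separate candidate rules.
    pair_counts = Counter(p for b in baskets
                          for p in permutations(sorted(set(b)), 2))
    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n                  # share of baskets containing a and b
        confidence = both / item_counts[a]  # share of a's baskets that also hold b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


baskets = [["bread", "milk"], ["bread", "milk", "eggs"],
           ["bread", "eggs"], ["milk"]]
rules = pair_rules(baskets)
for rule, stats in sorted(rules.items()):
    print(rule, stats)
```

A rule such as ("eggs", "bread") can then drive a recommendation: a user with eggs in the basket is shown bread.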


Model-based recommendation: this is a typical machine learning problem. Existing user preference information is used as training samples to train a model that predicts user preferences, and when a user later enters the system, recommendations are computed from this model. The difficulty with this approach is how to feed the user's real-time or recent preference information back into the trained model to improve recommendation accuracy.

In today's recommendation systems, very few use only one recommendation mechanism. Different strategies are generally used in different scenarios to achieve the best results. Amazon, for example, makes recommendations based on the user's own purchase history, based on the item the user is currently viewing, and based on popular preferences in different regions, so that users can find the items they are really interested in from a full range of recommendations.


Algorithm principle
1. Recommendations based on demographic statistics

The demographic-based recommendation mechanism is one of the easiest to implement. It simply discovers how closely users are related from the basic information of the system's users, and then recommends to the current user other items that similar users like. The working principle is shown in the figure.

Working principle diagram of recommendation mechanism based on demographic statistics

As the figure shows, the system first models a profile for each user, including basic information such as age and gender. It then calculates the similarity of users from their profiles; here the profiles of user A and user C are the same, so the system treats users A and C as similar users, which a recommendation engine may call "neighbors". Finally, items preferred by the "neighbor" user group are recommended to the current user; in the figure, item A, which user A likes, is recommended to user C.
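The example in the figure can be reproduced with a small sketch. The concrete profile values, user B's data, and the choice of "fraction of matching attributes" as the similarity measure are assumptions made for illustration.

```python
# Hypothetical profiles: users A and C share the same basic information.
profiles = {
    "A": {"age_group": "25-30", "gender": "F"},
    "B": {"age_group": "40-45", "gender": "M"},
    "C": {"age_group": "25-30", "gender": "F"},
}
likes = {"A": {"item A"}, "B": {"item B"}, "C": set()}


def profile_similarity(p, q):
    """Fraction of profile attributes on which two users agree."""
    return sum(p[key] == q[key] for key in p) / len(p)


def recommend_by_demographics(user, k=1):
    """Recommend what the k most similar profiles like and the user lacks."""
    others = [u for u in profiles if u != user]
    others.sort(key=lambda u: profile_similarity(profiles[user], profiles[u]),
                reverse=True)
    neighbors = others[:k]
    candidates = set().union(*(likes[u] for u in neighbors))
    return sorted(candidates - likes[user])


recs = recommend_by_demographics("C")
print(recs)
```

Note that user C gets a recommendation despite having no history at all, which is the reason this mechanism has no cold start problem for new users.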

The benefits of this demographic-based recommendation mechanism are:

1. Because it does not use the current user's historical preference data for items, there is no "cold start" problem for new users.

2. The method does not depend on data about the items themselves, so it can be used across different item domains; it is domain-independent.

So what are the drawbacks and problems of this approach? Classifying users by basic information alone is too coarse, especially in domains where taste matters a great deal, such as books, movies, and music, and it cannot produce very good recommendations there. On some e-commerce sites this method may be able to give some simple recommendations. Another limitation is that the method may involve sensitive information unrelated to the information discovery problem itself, such as the user's age, and such user information is not easy to obtain.


2. Content-based recommendations

Content-based recommendation was the most widely used mechanism when recommendation engines first appeared. Its core idea is to discover the relevance of items or content from their metadata, and then recommend similar items to the user based on the user's past preferences. The basic principle is shown in the figure.

Fundamentals of Content-based recommendation mechanism

The figure shows a typical example of content-based recommendation, a movie recommendation system. First we need to model the movie metadata; here only the genre of each movie is described. The similarity between movies is then found from the metadata: because both are of the genre "love, romance", movies A and C are considered similar (of course, genre alone is not enough; to get better recommendations we can also consider the director, actors, and so on). Finally the recommendation is made: user A likes movie A, so the system can recommend the similar movie C to him.
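The movie example can be sketched with genre tags as the only metadata. Jaccard overlap is used here as one possible similarity measure, and the genres of movie B are hypothetical since the text only gives those of movies A and C.

```python
# Genre tags as item metadata; movie B's genres are made up for contrast.
movies = {
    "Movie A": {"love", "romance"},
    "Movie B": {"horror", "thriller"},
    "Movie C": {"love", "romance"},
}


def jaccard(a, b):
    """Shared tags divided by all tags of the two movies."""
    return len(a & b) / len(a | b)


def similar_movies(liked, top_n=1):
    """Rank the other movies by metadata similarity to a liked movie."""
    others = [m for m in movies if m != liked]
    others.sort(key=lambda m: jaccard(movies[liked], movies[m]), reverse=True)
    return others[:top_n]


recs = similar_movies("Movie A")
print(recs)
```

Adding director or actor tags to each set would refine the similarity without changing the code.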

The benefit of this content-based recommendation mechanism is that it can model the user's tastes well and provide more accurate recommendations. But it also has the following problems:

1. Items need to be analyzed and modeled, and recommendation quality depends on how complete and comprehensive the item model is. In current applications, keywords and tags are seen as a simple and effective way to describe item metadata.

2. Item similarity analysis relies only on the characteristics of the items themselves and does not take people's attitudes toward the items into account.

3. Because recommendations are made from the user's past preference history, there is a "cold start" problem for new users.

Although this method has many shortcomings, it has been applied successfully on some movie, music, and book social sites. Some sites also hire professionals to encode items by hand. Pandora, for example, said in a report that in its recommendation engine each song has more than 100 metadata features, including style, year, singer, and so on.


3. Recommendations based on collaborative filtering

With the development of Web 2.0, web sites came to advocate user participation and user contribution, and recommendation mechanisms based on collaborative filtering emerged. The principle is simple: based on users' preferences for items or information, discover the relevance of the items or content themselves, or the relevance of users, and then recommend based on these relationships. Collaborative filtering recommendations fall into three sub-categories: user-based recommendation, item-based recommendation, and model-based recommendation. Each of the three is described in detail below.


4. User-based collaborative filtering recommendations

The basic principle of user-based collaborative filtering is to use all users' preferences for items or information to find the "neighbor" user group whose tastes and preferences are similar to the current user's; in practice, a "K-nearest-neighbor" algorithm is generally used. Recommendations are then made to the current user based on the historical preferences of these K neighbors. The figure gives a schematic.

The basic principle of user-based collaborative filtering recommendation mechanism

Suppose user A likes item A and item C, user B likes item B, and user C likes item A, item C, and item D. From these historical preferences we can see that the tastes of user A and user C are similar, and since user C also likes item D, we can infer that user A may like item D too, so item D can be recommended to user A.

User-based collaborative filtering and demographic-based recommendation both compute user similarity and both compute recommendations from a "neighbor" user group. They differ in how user similarity is calculated: the demographic-based mechanism considers only the user's own characteristics, whereas user-based collaborative filtering computes similarity from users' historical preference data. Its basic assumption is that users who like similar items are likely to have the same or similar tastes and preferences.
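The three-user example can be worked through in code. This is a minimal sketch that assumes binary "likes" and uses Jaccard overlap between liked-item sets as the user similarity; other similarity measures would fit equally well.

```python
# The preferences from the example: user -> set of liked items.
likes = {
    "A": {"item A", "item C"},
    "B": {"item B"},
    "C": {"item A", "item C", "item D"},
}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def user_based_recommend(user, k=1):
    """Score items liked by the k nearest neighbors but not yet by the user."""
    neighbors = sorted((u for u in likes if u != user),
                       key=lambda u: jaccard(likes[user], likes[u]),
                       reverse=True)[:k]
    scores = {}
    for neighbor in neighbors:
        weight = jaccard(likes[user], likes[neighbor])
        for item in likes[neighbor] - likes[user]:
            scores[item] = scores.get(item, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)


recs = user_based_recommend("A")
print(recs)
```

User C is the nearest neighbor of user A (two shared items out of three), so item D is the recommendation, matching the reasoning in the text.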


5. Item-based collaborative filtering recommendations

The principle of item-based collaborative filtering is similar, except that it uses all users' preferences for items or information to discover the similarity between items, and then recommends similar items to a user based on that user's historical preferences. The figure illustrates the principle.

Suppose user A likes item A and item C, user B likes item A, item B, and item C, and user C likes item A. From these historical preferences we can see that item A and item C are similar: people who like item A also like item C. From this we can infer that user C is very likely to like item C as well, so the system recommends item C to user C. As above, item-based collaborative filtering and content-based recommendation both predict from item similarity, but they compute similarity differently: the former infers it from users' historical preferences, while the latter uses the attribute information of the items themselves.

The basic principle of item-based collaborative filtering recommendation mechanism
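The item-based example can be sketched the same way. Here item similarity is assumed to be the Jaccard overlap between the sets of users who like each item; this is one simple choice among several.

```python
# The preferences from the example: user -> set of liked items.
likes = {
    "A": {"item A", "item C"},
    "B": {"item A", "item B", "item C"},
    "C": {"item A"},
}


def fans(item):
    """Users who like the given item."""
    return {user for user, items in likes.items() if item in items}


def item_similarity(i, j):
    """Overlap between the user sets of two items."""
    return len(fans(i) & fans(j)) / len(fans(i) | fans(j))


def item_based_recommend(user):
    """Rank unseen items by their similarity to the items the user likes."""
    all_items = set().union(*likes.values())
    scores = {candidate: sum(item_similarity(candidate, liked)
                             for liked in likes[user])
              for candidate in all_items - likes[user]}
    return sorted(scores, key=scores.get, reverse=True)


recs = item_based_recommend("C")
print(recs)
```

Item C ranks first for user C because two of the three users who like item A also like item C.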

How should we choose between the user-based and item-based strategies? Item-based collaborative filtering is in fact a strategy Amazon introduced as an improvement on the user-based mechanism. On most web sites the number of items is much smaller than the number of users, and the number of items and their similarities are relatively stable, so the item-based mechanism performs better in real time than the user-based one. But this is not true of every scenario. In a news recommendation system, for example, the number of items (news articles) may exceed the number of users, and news is updated very quickly, so item similarity is not stable. The choice of recommendation strategy therefore depends heavily on the specific application scenario.


6. Model-based collaborative filtering recommendations

Model-based collaborative filtering trains a recommendation model on sample user preference information, and then predicts and computes recommendations from real-time user preference information.

Collaborative filtering is the most widely used recommendation mechanism today, and it has several notable advantages:

1. It does not require strict modeling of items or users, and it does not require item descriptions to be machine-understandable, so this method is also domain-independent.

2. The recommendations it computes are open-ended: they draw on other people's experience, which helps users discover latent interests and preferences.

It also has the following problems:

1. The core of the approach is based on historical data, so there is a "cold start" problem for new items and new users.

2. Recommendation quality depends on the amount and accuracy of users' historical preference data.

3. In most implementations, user history preferences are stored in sparse matrices, and computation on sparse matrices has some obvious problems, including that a few people's erroneous preferences can greatly affect recommendation accuracy.

4. Users with unusual tastes cannot be given good recommendations.

5. Because it is based on historical data, once a user's preferences have been captured and modeled it is difficult to modify or evolve the model as the user's preferences change, which makes the approach inflexible.


7. Hybrid recommendation mechanism

Recommendations on today's web sites are often not based on a single mechanism and strategy; several methods are usually mixed to achieve better results. Some of the more popular ways to combine recommendation mechanisms are:

1. Weighted hybridization: a linear formula combines several different recommendations according to certain weights. The weight values have to be determined by repeated experiments on a test data set to achieve the best results.

2. Switching hybridization: as mentioned above, the appropriate recommendation strategy may differ greatly between situations (data volume, system health, number of users and items, and so on). Switching hybridization selects the most suitable recommendation mechanism for each situation.

3. Mixed hybridization: several recommendation mechanisms are used, and their different results are shown to the user in different display areas. Amazon, Dangdang, and many other e-commerce sites work this way, so users get very comprehensive recommendations and find what they want more easily.

4. Meta-level hybridization: several recommendation mechanisms are used, with the results of one mechanism serving as the input to another, combining the strengths and weaknesses of each mechanism to get more accurate recommendations.
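The first two combination methods are easy to sketch. In the code below the weights, the `min_ratings` threshold, and the per-recommender scores are all hypothetical; in practice the weights would come from experiments on a test set, as noted above.

```python
def weighted_hybrid(score_lists, weights):
    """Linear combination of per-recommender scores (a missing item counts as 0)."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return sorted(combined, key=combined.get, reverse=True)


def switching_hybrid(n_user_ratings, cf_recommender, popular_recommender,
                     min_ratings=5):
    """Pick a recommender by situation: little history means no personalization."""
    return cf_recommender if n_user_ratings >= min_ratings else popular_recommender


# Hypothetical scores from a user-based and an item-based recommender.
user_cf = {"item1": 0.9, "item2": 0.4}
item_cf = {"item2": 0.8, "item3": 0.5}
ranking = weighted_hybrid([user_cf, item_cf], weights=[0.6, 0.4])
print(ranking)
```

With these numbers item2 comes out on top because both recommenders support it, even though neither ranks it first on its own weight alone.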


Concrete Models
1 Building a matrix

User scores are divided into explicit and implicit scores. The data in this contest contains only implicit scores: browsing, favoriting, adding to the shopping cart, and purchasing.

Table 1. User behavior definitions

| Behavior name | Behavior description |
|---|---|
| Browse count | Number of browse clicks |
| Favorite | 0 or 1; 1 means favorited |
| Add to shopping cart | 0 or 1; 1 means added |
| Buy | 0 or 1; 1 means purchased |

Processing the data yields structured data:

Table 2. Structured access behavior data

| Serial number | User ID | Product ID | Browse count | Favorite | Add to shopping cart | Buy |
|---|---|---|---|---|---|---|
| 1 | User1 | Item1 | 1 | 1 | 1 | 0 |
| 2 | User1 | Item2 | 5 | 0 | 1 | 1 |
| 3 | User2 | Item1 | 1 | 1 | 0 | 1 |
| N | UserN | Item1 | 0 | 0 | 0 | 1 |


Let m be the number of users and n the number of commodities, and let $Y_{ij}$ denote the rating of user i for commodity j, where 1 ≤ i ≤ m and 1 ≤ j ≤ n. The rules for converting user behavior into implicit scores are as follows:

1) If user i purchased commodity j, then $Y_{ij} = 5$;

2) If user i added commodity j to the shopping cart, then $Y_{ij} = 4$;

3) If user i added commodity j to favorites, then $Y_{ij} = 3$;

4) If user i visited commodity j more than 2 times, then $Y_{ij} = 2$; if the number of clicks is 1, then $Y_{ij} = 1$.

The scoring rules can be adjusted according to the accuracy of the recommendation results.

A user usually performs several operations on the same product. For example, a user first clicks on the product, adds it to favorites, then puts it in the shopping cart, and finally buys it; in that case the highest score is taken.

A user-product scoring matrix can then be built.
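The conversion rules can be applied directly to rows shaped like Table 2. This is a sketch under two labeled assumptions: a browse count of exactly 2 is scored as 2 (the rules only name "more than 2" and "1"), and a row with no recorded behavior is scored 0.

```python
rows = [
    # (user, item, browse_count, favorited, added_to_cart, purchased)
    ("User1", "Item1", 1, 1, 1, 0),
    ("User1", "Item2", 5, 0, 1, 1),
    ("User2", "Item1", 1, 1, 0, 1),
]


def implicit_score(browse, favorite, cart, buy):
    """Return the highest score among the rules that apply."""
    if buy:
        return 5
    if cart:
        return 4
    if favorite:
        return 3
    if browse >= 2:  # assumption: exactly two views also scores 2
        return 2
    return 1 if browse == 1 else 0  # assumption: no behavior scores 0


# Sparse user-product scoring matrix: matrix[user][item] = score.
matrix = {}
for user, item, browse, favorite, cart, buy in rows:
    matrix.setdefault(user, {})[item] = implicit_score(browse, favorite, cart, buy)

print(matrix)
```

Checking the rules from the highest score downward implements the "take the highest scoring value" convention without computing every rule.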



2 Time Dimension
2.1 Ebbinghaus forgetting curve

User interest changes dynamically. Products the user visited and scored recently better reflect the user's current interests and have more influence on the current purchase decision. Products accessed long ago have less bearing on what the user may be interested in now; in other words, the importance of the user's access behavior and scores fades over time. Consumer behavior can be regarded as a kind of psychological behavior that follows the forgetting curve.

Figure 1 Forgetting Curve


One symbol denotes the valid start time and another the time at which the item was scored. The item's scoring time is the time of the user's overall score for the product, that is, the time at which the behavior occurred. t is the interval between the user's scoring time for a product and the valid start time: t = (scoring time) - (valid start time).

The exponential function formula (1), which represents the change in user interest over time, is as follows:

(1)

Here the weight λ belongs to (0,1) and can be adjusted dynamically according to the accuracy of the recommendation results. The larger λ is, the faster interest decays over time; the smaller λ is, the slower it decays.
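The exact form of formula (1) is not reproduced here, so the sketch below only illustrates the behavior described for λ. It assumes a plain exponential decay exp(-λ · elapsed), where `elapsed_days` is the time since the behavior occurred; the time unit, the default λ of 0.1, and the function names are all hypothetical.

```python
import math


def forgetting_weight(elapsed_days, lam=0.1):
    """Assumed decay: weight 1 for a fresh behavior, shrinking toward 0 with age."""
    return math.exp(-lam * elapsed_days)


def decayed_score(score, elapsed_days, lam=0.1):
    """Scale an implicit score by how long ago the behavior happened."""
    return score * forgetting_weight(elapsed_days, lam)


# A purchase (score 5) from yesterday versus one from a month ago.
print(decayed_score(5, 1), decayed_score(5, 30))
# A larger lambda makes the same behavior fade faster.
print(forgetting_weight(10, lam=0.1), forgetting_weight(10, lam=0.5))
```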

2.2 Recommendation process
2.2.1 User-based collaborative filtering

Step 1: Use the improved Pearson correlation coefficient formula to calculate the similarity between two users. The formula is as follows:

(2)

In the formula, $Y_{Aj}$ and $Y_{Bj}$ are the ratings of user A and user B for commodity j, $I_{AB}$ is the set of items that both user A and user B have scored, f(t) is the forgetting function, and $\bar{Y}_A$ and $\bar{Y}_B$ are the average scores of the sets of items that user A and user B have scored, respectively.

Step 2: Take the top K users with the highest similarity to user A as its nearest neighbor set U.

Step 3: Combine the neighbor users' ratings of commodity j to predict user A's score for commodity j. With c denoting a neighbor user and PS(A, j) the target user's predicted score, the prediction formula is as follows:

(3)

Step 4: The top N products with the highest predicted scores are the system's recommendations.
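The four steps can be sketched with the standard, unmodified Pearson correlation and the usual mean-centered weighted prediction. This is an assumption-laden illustration, not the improved formulas (2) and (3): the forgetting function f(t) is left out, and the ratings are made up.

```python
import math

# Hypothetical user -> {item: score} matrix.
ratings = {
    "A": {"i1": 5, "i2": 3, "i3": 4},
    "B": {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "C": {"i1": 1, "i2": 5, "i4": 2},
}


def mean(user):
    return sum(ratings[user].values()) / len(ratings[user])


def pearson(a, b):
    """Step 1: plain Pearson correlation over the items both users scored."""
    common = ratings[a].keys() & ratings[b].keys()
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a), mean(b)
    num = sum((ratings[a][j] - ma) * (ratings[b][j] - mb) for j in common)
    den = (math.sqrt(sum((ratings[a][j] - ma) ** 2 for j in common))
           * math.sqrt(sum((ratings[b][j] - mb) ** 2 for j in common)))
    return num / den if den else 0.0


def predict(a, item, k=2):
    """Steps 2-3: take the K most similar raters of the item, then predict."""
    neighbors = sorted((u for u in ratings if u != a and item in ratings[u]),
                       key=lambda u: pearson(a, u), reverse=True)[:k]
    norm = sum(abs(pearson(a, c)) for c in neighbors)
    if norm == 0:
        return mean(a)
    offset = sum(pearson(a, c) * (ratings[c][item] - mean(c)) for c in neighbors)
    return mean(a) + offset / norm


def recommend(a, n=1):
    """Step 4: the N unseen items with the highest predicted score."""
    unseen = {j for row in ratings.values() for j in row} - set(ratings[a])
    return sorted(unseen, key=lambda j: predict(a, j), reverse=True)[:n]


print(pearson("A", "B"), pearson("A", "C"), recommend("A"))
```

A time-weighted variant would multiply each term by a forgetting weight, which is where the improved formula departs from this sketch.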


2.2.2 Product-based collaborative filtering

Step 1: Calculate the similarity between two items using the improved Pearson correlation coefficient formula, to which the item category attribute can be added, as follows:

(4)

In the formula, the two rating terms are a user's ratings of commodity a and commodity b, and the user set is the set of users who have rated both commodities. f(t) is the forgetting function, and the two mean terms are the average scores of commodity a and commodity b. The category similarity of commodity a and commodity b takes the value 0 or 1. The balance parameter can be set to 0.5 to begin with.

Step 2: Take the top K commodities with the highest similarity to commodity a as its nearest neighbor set I.

Step 3: For the products the target user has not rated, compute the predicted score according to formula (4), sort from largest to smallest, and recommend the items corresponding to the top N values.
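How the balance parameter might trade rating similarity against category similarity can be sketched as follows. The convex combination used here is an assumption, since the body of formula (4) is not reproduced; the names `blended_similarity` and `top_k_neighbors` and the similarity values are hypothetical.

```python
def blended_similarity(rating_sim, same_category, alpha=0.5):
    """Assumed blend: alpha weighs rating similarity against category similarity."""
    category_sim = 1.0 if same_category else 0.0  # category similarity is 0 or 1
    return alpha * rating_sim + (1 - alpha) * category_sim


def top_k_neighbors(sims, k=2):
    """Step 2: the K items with the highest similarity to the target item."""
    return sorted(sims, key=sims.get, reverse=True)[:k]


# Hypothetical rating similarities of three items to commodity a.
sims = {
    "b": blended_similarity(0.4, same_category=True),
    "c": blended_similarity(0.6, same_category=False),
    "d": blended_similarity(0.1, same_category=True),
}
neighbors = top_k_neighbors(sims)
print(sims, neighbors)
```

With the balance parameter at 0.5, an item in the same category can outrank one with a higher rating similarity, which is the effect the category term is meant to have.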


2.2.3 Mixed weighting

The user-based recommendations and the product-based recommendations are combined, the geographical location factor is then taken into account, and everything is weighted together.


Spatial geographic locations are clustered directly, and the result is added in with a weight.

——————————————————————————————————————————————————————————————

Resources:

[1] Meituan recommendation algorithm practice http://tech.meituan.com/mt-recommend-practice.html

[2] A very good introductory article on collaborative filtering http://www.cnblogs.com/wentingtu/archive/2011/12/16/2289926.html

[3] Mrzhang, Yang Yi, Rangwi. E-Business recommendation system based on change of customer behavior and interest [J]. Journal of Baoji University of Arts and Sciences: Natural Science Edition, 2012, 32 (2): 52-56. doi:10.3969/j.issn.1007-1261.2012.02.011.

[4] Wesseuine, Yip Ning, Yang Xu Bing. Collaborative filtering algorithm combining project category and dynamic time weighting [J]. Computer engineering, 2014, 40 (6): 206-210. doi:10.3969/j.issn.1000-3428.2014.06.044.

[5] Zhu Yansong, Kanguichen. A collaborative filtering model for time-dependent allocation of weights and values of integrated projects [J]. Computer Engineering and Science, 2014, 36 (11). doi:10.3969/j.issn.1007-130x.2014.11.030.
