Password in the Recommendation System

Source: Internet
Author: User

From: http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy1/index.html

Recommendation Engine

We have entered an era of data explosion. With the development of Web 2.0, the Web has become a platform for sharing data, and it is increasingly difficult for people to find the information they need in massive volumes of data.

In this situation, search engines (Google, Bing, Baidu, and so on) have become the best way to find target information quickly. When users have a specific need, they can use a search engine to find the information with keywords. However, search engines do not fully meet users' requirements for information discovery, because in many cases users do not know exactly what they need, their needs are hard to express with simple keywords, or they want results that better suit their personal tastes. As a result, the recommendation system, also called the recommendation engine, emerged as a counterpart to the search engine.

Recommendation engines have already achieved great success in e-commerce (such as Amazon and Dangdang) and on socially oriented sites for sharing music, movies, and books (such as Douban and mtime). This further demonstrates that in the Web 2.0 environment, users facing massive amounts of data need an information discovery mechanism that is more intelligent and better understands their needs and tastes.

A recommendation engine uses specialized information filtering techniques to recommend different items or content to the users who may be interested in them.

Figure 1. How the recommendation system works

The recommendation system takes three inputs: item information, that is, the metadata (data about data) that describes each item; user information, such as gender and age; and users' preferences for items, which consist of ratings, browsing behavior, feedback, and evaluations.
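
As a rough illustration of these three inputs, the sketch below models them as plain Python records. The class names, field names, and sample values are all hypothetical; a real system would choose its own schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Item:
    """Item information: an identifier plus descriptive metadata."""
    item_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class User:
    """Basic user information such as gender and age."""
    user_id: str
    gender: Optional[str] = None
    age: Optional[int] = None


@dataclass
class Preference:
    """One observed preference signal of a user for an item."""
    user_id: str
    item_id: str
    kind: str     # assumed labels: "rating", "view", "feedback"
    value: float  # e.g. a star rating, or 1.0 for a page view


# Hypothetical sample data
item = Item("i1", {"genre": "romance"})
user = User("u1", gender="F", age=28)
prefs = [
    Preference("u1", "i1", "rating", 4.0),
    Preference("u1", "i1", "view", 1.0),
]
```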

Classification of Recommendation Systems

1. Classification by whether the recommendation engine recommends different data to different users. By this criterion, recommendation engines can be divided into engines based on public behavior (which give every user the same recommendations) and personalized recommendation engines. Fundamentally, only a personalized recommendation engine offers a more intelligent information discovery process.

2. Data source-based classification

Discovering the relevance of users based on their basic information is called demographic-based recommendation.
Discovering the relevance of items based on the metadata of the recommended items is called content-based recommendation.
Discovering the relevance of items or content, or the relevance of users, based on users' preferences is called collaborative filtering-based recommendation.

3. Classification based on how the recommendation model is built

  • Recommendation based on items and users. This kind of recommendation engine treats each user and each item as an independent entity and predicts each user's preference for each item. This information is often described with a two-dimensional matrix. Because the number of items a user is interested in is much smaller than the total number of items, most entries in such a model are empty; that is, the matrix is usually large and sparse. To reduce the amount of computation, we can cluster items and users and then record and compute the preferences of a class of users for a class of items, but such a model may lose recommendation accuracy.
  • Recommendation based on association rules. Association rule mining is a classic problem in data mining. It mainly mines dependencies within data, and a typical scenario is the shopping basket problem. Through association rule mining, we can find out which items are often purchased together, or which other items users usually purchase after buying certain items. Once these association rules have been mined, we can make recommendations to users based on them.
  • Model-based recommendation. This is a typical machine learning problem. Existing user preferences are used as training samples to train a model that predicts user preferences. When a user later enters the system, recommendations can be computed from this model. The difficulty with this method is how to feed the user's real-time or recent preference information back into the trained model to improve recommendation accuracy.
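
The shopping basket idea behind association rules can be sketched as follows. This is a minimal illustration that only considers pairs of items; the function name `pair_rules`, the thresholds, and the sample baskets are assumptions, not something the article specifies. A rule "a implies b" is kept when the pair appears in at least `min_support` baskets and the confidence (pair count divided by the count of a) is high enough.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=2, min_confidence=0.6):
    """Return rules (a, b, confidence) meaning 'buyers of a also buy b'."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue  # pair is too rare to trust
        # Check the rule in both directions: a -> b and b -> a
        for x, y in ((a, b), (b, a)):
            confidence = n / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, confidence))
    # Strongest rules first, ties broken alphabetically
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


# Hypothetical purchase history
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets)
print(rules[0])  # ('eggs', 'milk', 1.0)
```

With this data, everyone who bought eggs also bought milk, so a shopper with eggs in the basket could be shown milk. Production systems typically use algorithms such as Apriori or FP-Growth to handle larger item sets.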

In-depth recommendation Mechanism

Demographic-based recommendation

First, the system creates a profile for each user, containing basic information such as age and gender. Then the system computes user similarity from these profiles. In the figure, the profile of user A is the same as that of user C, so the system considers users A and C to be similar; in a recommendation engine, such users are called "neighbors". Finally, items are recommended to the current user based on the preferences of the "neighbor" group. In the figure, item A, which user A likes, is recommended to user C.
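
The steps above can be sketched in a few lines of Python. This is only an illustration: the profile attributes, their values, and the function names are assumptions, and similarity is simplified to the fraction of matching attributes.

```python
def profile_similarity(p, q):
    """Fraction of shared profile attributes on which two users agree."""
    keys = p.keys() & q.keys()
    if not keys:
        return 0.0
    return sum(p[k] == q[k] for k in keys) / len(keys)


def recommend_by_demographics(target, profiles, likes, threshold=1.0):
    """Recommend items liked by users whose profile matches the target's."""
    neighbors = [
        u for u in profiles
        if u != target
        and profile_similarity(profiles[target], profiles[u]) >= threshold
    ]
    already_liked = likes.get(target, set())
    recommended = set()
    for u in neighbors:
        recommended |= likes.get(u, set()) - already_liked
    return sorted(recommended)


# Hypothetical profiles: users A and C share the same age band and gender
profiles = {
    "A": {"age_band": "25-30", "gender": "F"},
    "B": {"age_band": "30-35", "gender": "M"},
    "C": {"age_band": "25-30", "gender": "F"},
}
likes = {"A": {"item A"}, "B": {"item B"}, "C": set()}

print(recommend_by_demographics("C", profiles, likes))  # ['item A']
```

Note that user C gets a recommendation without having expressed any preference, which is why this approach avoids the cold start problem for new users.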

The benefits of this demographic-based recommendation mechanism are:

  1. Because the current user's preferences for items are not used, there is no "cold start" problem for new users.
  2. This method does not depend on item data, so it can be used in different item domains. It is domain-independent.

So what are the shortcomings of this method? Classifying users by their basic information is too coarse, and in domains where taste matters a great deal, such as books, movies, and music, it cannot produce good recommendations. It may be adequate for simple recommendations on some e-commerce sites. Another limitation is that the method may require sensitive information that is irrelevant to the information discovery problem itself, such as the user's age, and such information is not easy to obtain.

Content-based recommendation

The following is a typical example of content-based recommendation, taken from a movie recommendation system. First, we model the metadata of each movie; here we describe only the movie's genre. Then we discover the similarity between movies from this metadata: because movies A and C both have the genres "love, romantic", they are considered similar. (Genre alone is of course not enough; for better recommendations we can also consider directors, actors, and so on.) Finally, we make the recommendation: user A likes movie A, so the system can recommend the similar movie C to that user.
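
The movie example can be sketched as follows, assuming each movie is described by a set of genre tags and that similarity is measured with the Jaccard index (one common choice; the article does not name a specific measure). The genres of movie B and the function names are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar_movies(liked, genres, min_similarity=0.5):
    """Recommend unseen movies whose genres resemble any liked movie."""
    scores = {}
    for candidate, candidate_genres in genres.items():
        if candidate in liked:
            continue
        # Score a candidate by its best match among the liked movies
        best = max(
            (jaccard(candidate_genres, genres[m]) for m in liked),
            default=0.0,
        )
        if best >= min_similarity:
            scores[candidate] = best
    return sorted(scores, key=lambda m: (-scores[m], m))


genres = {
    "movie A": {"love", "romantic"},
    "movie B": {"horror", "thriller"},  # assumed genres
    "movie C": {"love", "romantic"},
}

print(recommend_similar_movies({"movie A"}, genres))  # ['movie C']
```

Adding directors or actors would only mean adding more tags to each set, or combining several similarity scores.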

The benefit of this content-based recommendation mechanism is that it models user tastes well and can provide more accurate recommendations. However, it also has the following problems:

  1. Items must be analyzed and modeled, and the recommendation quality depends on how complete and comprehensive the item model is. In current applications, keywords and tags are considered a simple and effective way to describe item metadata.
  2. Item similarity is analyzed only from the characteristics of the items themselves; people's attitudes toward the items are not considered.
  3. Because recommendations are based on the user's past preferences, there is a "cold start" problem for new users.

Although this method has many shortcomings, it is still applied successfully on some movie, music, and book sites. Some sites also have professionals encode the "genes" of their items, as Pandora does. According to one report, each song in Pandora's recommendation engine has over 100 metadata features, including the style, year, and singer of the song.

Collaborative Filtering-based recommendation

With the development of Web 2.0, websites increasingly promote user participation and user contribution, and the collaborative filtering-based recommendation mechanism developed as a result. The principle is simple: based on users' preferences for items or information, discover the relevance between items or content, or the relevance between users, and then make recommendations based on that relevance. Collaborative filtering-based recommendation can be divided into three subcategories: user-based recommendation, item-based recommendation, and model-based recommendation.

User-based collaborative filtering recommendation

Assume that user A likes items A and C, user B likes item B, and user C likes items A, C, and D. From these users' historical preferences, we can see that users A and C have similar tastes. Because user C also likes item D, we can infer that user A may like item D as well, so item D can be recommended to user A.
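
A minimal sketch of this inference, using toy data modeled on the example above (the exact preference sets are an assumption). Users are compared by the Jaccard similarity of the sets of items they like, which is only one of several possible similarity measures, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(target, likes, k=1):
    """Recommend items liked by the k users most similar to the target."""
    others = [
        (jaccard(likes[target], likes[u]), u) for u in likes if u != target
    ]
    neighbors = sorted(others, key=lambda t: (-t[0], t[1]))[:k]

    scores = {}
    for similarity, u in neighbors:
        if similarity <= 0:
            continue  # a user with no overlap is not a real neighbor
        for item in likes[u] - likes[target]:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda i: (-scores[i], i))


likes = {
    "A": {"item A", "item C"},
    "B": {"item B"},
    "C": {"item A", "item C", "item D"},
}

print(recommend_user_based("A", likes))  # ['item D']
```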

The user-based collaborative filtering mechanism and the demographic-based mechanism both compute user similarity and then compute recommendations from the "neighbor" user group. They differ in how they compute user similarity: the demographic-based mechanism considers only the characteristics of the users themselves, while the user-based collaborative filtering mechanism computes similarity from users' historical preference data. Its basic assumption is that users who like similar items are likely to have the same or similar tastes.

Item-based collaborative filtering recommendation

The basic principle of item-based collaborative filtering is similar to that of the user-based approach, except that it uses all users' preferences for items or information to discover the similarity between items, and then recommends similar items to a user based on that user's historical preferences.

Assume that user A likes items A and C, user B likes items A, B, and C, and user C likes item A. From these users' historical preferences, we can see that items A and C are similar: people who like item A also like item C. Based on this data, we can infer that user C is likely to like item C as well, so the system recommends item C to user C.
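
The same example can be sketched in code. Here two items are considered similar when largely the same users like them, measured with the Jaccard index over their sets of fans; this measure and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def item_similarities(likes):
    """Jaccard similarity between items, based on who likes them."""
    fans = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            fans[item].add(user)
    sims = {}
    for i in fans:
        for j in fans:
            if i != j:
                sims[(i, j)] = len(fans[i] & fans[j]) / len(fans[i] | fans[j])
    return sims


def recommend_item_based(target, likes):
    """Rank unseen items by their similarity to items the target likes."""
    sims = item_similarities(likes)
    scores = defaultdict(float)
    for liked in likes[target]:
        for (i, j), s in sims.items():
            if i == liked and j not in likes[target] and s > 0:
                scores[j] += s
    return sorted(scores, key=lambda j: (-scores[j], j))


likes = {
    "A": {"item A", "item C"},
    "B": {"item A", "item B", "item C"},
    "C": {"item A"},
}

print(recommend_item_based("C", likes))  # ['item C', 'item B']
```

Item C ranks first because two of the three fans of item A also like it, while only one of them likes item B. In practice the item similarity table would be computed offline and reused, which is part of what makes this approach attractive when the item catalog is stable.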

How, then, should we choose between user-based and item-based collaborative filtering? The item-based collaborative filtering mechanism is in fact a strategy that Amazon developed as an improvement on the user-based mechanism. On most websites, the number of items is far smaller than the number of users, and the number of items and the similarities between them are relatively stable. The item-based mechanism also offers better real-time performance than the user-based one. However, this does not hold in every scenario. In some news recommendation systems, the number of items (that is, news articles) may exceed the number of users, and news is updated quickly, so item similarity remains unstable. The choice of recommendation strategy therefore depends closely on the specific application scenario.

Model-based collaborative filtering recommendation

Model-based collaborative filtering recommendation trains a recommendation model on sample user preference data, and then computes predicted recommendations from users' real-time preference information.
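
The article does not say which model to train. As one possible illustration, the sketch below uses a small matrix factorization model trained with stochastic gradient descent: each user and item gets a short vector of latent factors, and a rating is predicted as their dot product. The ratings, hyperparameters, and function names are all assumptions chosen for the example.

```python
import random


def train_factors(ratings, n_factors=2, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn latent factor vectors for users (P) and items (Q) by SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted preference: dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


def mean_squared_error(P, Q, ratings):
    """Average squared prediction error over the given ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Hypothetical training samples: (user, item, rating on a 1-5 scale)
ratings = [
    ("A", "item A", 5.0), ("A", "item C", 4.0),
    ("B", "item A", 4.0), ("B", "item B", 2.0), ("B", "item C", 5.0),
    ("C", "item A", 5.0),
]

P, Q = train_factors(ratings)
estimate = predict(P, Q, "C", "item C")  # a rating the model has never seen
```

Once trained, the model can score any user and item pair, including pairs with no recorded preference. Incorporating a user's latest behavior would require retraining or an incremental update, which is the difficulty with model-based approaches noted earlier.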

In summary, the recommendation mechanism based on collaborative filtering is the most widely used recommendation mechanism today. It has the following significant advantages:

  1. It does not need strict modeling of items or users, and it does not require item descriptions to be machine-understandable, so this method is also domain-independent.
  2. The recommendations it computes are open and can draw on the experience of other people, which helps users discover potential interests.

It also has the following problems:

  1. The method is fundamentally based on historical data, so there is a "cold start" problem for both new items and new users.
  2. The quality of the recommendations depends on the quantity and accuracy of users' historical preference data.
  3. In most implementations, users' historical preferences are stored in a sparse matrix, and computing on a sparse matrix has some obvious problems; for example, a small number of users can have a large impact on recommendation accuracy.
  4. It cannot give good recommendations to users with unusual tastes.
  5. Because the method is based on historical data, once user preferences have been captured and modeled, it is difficult to modify them or adapt as they evolve, so the method is not flexible enough.

Hybrid recommendation Mechanism

Recommendations on today's websites rarely rely on a single recommendation mechanism or strategy; sites often combine several methods to achieve better results. Here are several popular ways to combine recommendation mechanisms.

  1. Weighted hybridization: uses a linear formula to combine several different recommendations according to given weights. The specific weight values need to be tuned by repeated experiments on a test dataset to achieve the best recommendation results.
  2. Switching hybridization: the best recommendation strategy may vary greatly from one situation to another, so switching hybridization selects the most suitable recommendation mechanism to compute recommendations in each situation.
  3. Mixed hybridization: uses multiple recommendation mechanisms and displays their different results to users in different areas of the interface. Amazon, Dangdang, and many other e-commerce websites use this method, so users can get comprehensive recommendations and more easily find what they want.
  4. Meta-level hybridization: uses multiple recommendation mechanisms and feeds the results of one mechanism into another as input, combining the strengths of each mechanism to get more accurate recommendations.
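
Of the four methods above, weighted hybridization is the simplest to sketch. The example below assumes each mechanism has already produced a score per item; the mechanism names, scores, and weights are made up, and in practice the weights would be tuned on a test dataset as described above.

```python
def weighted_hybrid(score_lists, weights):
    """Combine per-mechanism item scores with a weighted linear sum."""
    combined = {}
    for mechanism, scores in score_lists.items():
        w = weights[mechanism]
        for item, s in scores.items():
            combined[item] = combined.get(item, 0.0) + w * s
    # Highest combined score first, ties broken alphabetically
    return sorted(combined, key=lambda i: (-combined[i], i))


# Hypothetical scores from two recommendation mechanisms
scores = {
    "content": {"item A": 0.9, "item B": 0.2},
    "collaborative": {"item B": 0.8, "item C": 0.5},
}
weights = {"content": 0.4, "collaborative": 0.6}

ranking = weighted_hybrid(scores, weights)
print(ranking)  # ['item B', 'item A', 'item C']
```

Item B wins because both mechanisms support it, even though neither ranks it overwhelmingly high on its own.
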
Summary

In an age of exploding network data, helping users find the data they want more quickly and discover their own potential interests and needs is vital for both e-commerce and social network applications. With the emergence of the recommendation engine, this problem has attracted more and more attention. Yet most people may still wonder how a recommendation engine always seems to guess what they want. The magic of a recommendation engine lies in the fact that users do not know what the engine records and how it reasons behind the recommendations.
