[Core Tip] After reading Xiang Liang's "Recommendation System Practice", I came away with this conclusion: algorithms cannot solve every problem, but they can become more humane. The network is a society, and the line between algorithms and people has never been that clear-cut.
Recommendation systems are everywhere in daily life. When I buy dumplings in the morning, the owner often asks whether I would like a cup of soy milk as well; that is a simple recommendation. With the growth of the Internet, moving this offline model online has become the trend, and it has greatly expanded where recommendation systems are applied: Amazon's product recommendations, Facebook's friend recommendations, Digg's article recommendations, music recommendations on Douban, Last.fm and Douban FM, Gmail ads, and so on. Now that information on the Internet is overloaded, information consumers want to find what interests them easily, and information producers want to push their content to the most suitable audience. The recommendation system acts as the intermediary between the two and solves both problems at once.
Evaluation criteria for recommendation systems
First we need to be clear about what makes a good recommendation system. It can be judged by the following criteria.
User satisfaction: how satisfied users are with the recommended results. This is the most important indicator of a recommendation system. It is generally obtained through user questionnaires or by monitoring users' online behavior.
Predictive accuracy: the ability of the recommendation system to predict user behavior. It is generally computed on an offline dataset as the overlap between the recommendation list produced by the algorithm and the user's actual behavior. The greater the overlap, the higher the accuracy.
Coverage: the recommendation system's ability to explore the long tail of items. It is generally computed from the proportion of all items that get recommended and from the probability distribution of how often each item is recommended. The greater the proportion and the more uniform the distribution, the greater the coverage.
Diversity: whether the recommended results cover a user's different areas of interest. It is generally computed from the pairwise similarity between items in the recommendation list; the less similar the items, the better the diversity.
Novelty: if the user has not heard of most of the items in the recommendation list, the system has good novelty. It can be measured by the average popularity of the recommended results (lower is more novel) and by user questionnaires.
Surprise (serendipity): if a recommendation is not similar to the user's historical interests yet leaves the user very satisfied, it is a surprising recommendation. It can be assessed qualitatively from the similarity to the user's historical interests together with the user's satisfaction.
In short, a good recommendation system builds on accurate recommendations. Across all users it recommends as wide a range of items as possible (mining the long tail); for each individual user it covers as many categories as possible without leaning too heavily on popular items; and best of all, it leaves users feeling they have stumbled on something by chance.
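Several of these criteria are easy to compute offline. The Python below is only a minimal sketch under stated assumptions: the function names, the user-to-set input shape and the `similarity` callback are made up for illustration, and coverage is reduced to the simple proportion of items recommended, leaving out the probability distribution part.

```python
from itertools import combinations


def precision_recall(recommended, actual):
    """Overlap between recommendation lists and real user behavior.

    Both arguments map a user to a set of items (an assumed input shape).
    """
    hits = sum(len(items & actual.get(user, set()))
               for user, items in recommended.items())
    n_recommended = sum(len(items) for items in recommended.values())
    n_actual = sum(len(actual.get(user, set())) for user in recommended)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_actual if n_actual else 0.0
    return precision, recall


def coverage(recommended, all_items):
    """Share of the catalogue that appears in at least one list."""
    shown = set()
    for items in recommended.values():
        shown |= items
    return len(shown) / len(all_items) if all_items else 0.0


def diversity(rec_list, similarity):
    """One minus the average pairwise similarity inside a single list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Toy data, invented for this example.
recommended = {"u1": {"a", "b"}, "u2": {"b", "c"}}
actual = {"u1": {"a"}, "u2": {"c", "d"}}
precision, recall = precision_recall(recommended, actual)
catalogue_coverage = coverage(recommended, {"a", "b", "c", "d", "e"})
```

Here two of the four recommended items were actually liked, so precision is 0.5, and three of the five catalogue items were shown to someone, so coverage is 0.6.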
Classification of recommendation systems
A recommendation system is built on a large amount of valid data, and there are many ideas behind the algorithms. A rough classification can start from the kind of data being processed.
1. Using user behavior data
User behavior on the Internet takes countless forms, from simple page views to complex actions such as rating and placing orders. It carries a great deal of user feedback, and by analyzing it we can infer users' interests and preferences. The most basic approach here is the collaborative filtering algorithm.
Collaborative filtering comes in two types: user-based (UserCF) and item-based (ItemCF). The user-based approach uses users' behavior toward items to find users with similar interests, then recommends what one of them likes to the other. For example, Lao Zhang likes books A, B, C and D, and Lao Wang likes books A, B, C and E. From this data we can judge that Lao Zhang and Lao Wang have similar tastes, so we recommend book E to Lao Zhang and book D to Lao Wang. The item-based approach, correspondingly, first finds similar items. How? Again by looking at user preferences: the more people like both of two items, the more similar the two items can be considered. Then we simply recommend items similar to those the user already likes. For example, if we find that most people who like "One Two Three... Infinity" also like "What Is Mathematics?", then when you have just enjoyed "One Two Three... Infinity" we can immediately recommend "What Is Mathematics?".
When to use UserCF and when to use ItemCF depends on the situation. Generally speaking, UserCF is closer to social recommendation and suits cases with few users, many items and strong timeliness, such as Digg's article recommendations. ItemCF is closer to personalized recommendation and suits cases with many users and few items, such as Douban's "Douban Guess" feature and Douban FM. ItemCF can also give a credible reason for each recommendation, such as Douban's "people who like OO also like XX" and Amazon's "people who bought XX also bought OO".
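The Lao Zhang and Lao Wang example of user-based filtering can be sketched in a few lines of Python. This is an illustration under assumptions, not a production algorithm: the function names are hypothetical, and cosine similarity over sets of liked books is just one common choice of similarity measure.

```python
from math import sqrt

# The reading lists from the example above.
likes = {
    "Lao Zhang": {"A", "B", "C", "D"},
    "Lao Wang": {"A", "B", "C", "E"},
}


def user_similarity(items_a, items_b):
    """Cosine similarity between two users' sets of liked items."""
    if not items_a or not items_b:
        return 0.0
    return len(items_a & items_b) / sqrt(len(items_a) * len(items_b))


def recommend_user_cf(target, likes, k=1):
    """Recommend items liked by the k most similar users."""
    neighbours = sorted(
        ((user_similarity(likes[target], items), other)
         for other, items in likes.items() if other != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbours:
        # Only items the target user has not liked yet are candidates.
        for item in likes[other] - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With this data the two readers share three of four books, giving a similarity of 0.75, and `recommend_user_cf("Lao Zhang", likes)` returns `["E"]` while `recommend_user_cf("Lao Wang", likes)` returns `["D"]`.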
Collaborative filtering also has shortcomings, the most obvious being interference from popular items. For example, it often assigns a high similarity to the top items of two different fields, so it may well recommend "Harry Potter" to a student who likes "Introduction to Algorithms", which is clearly unreasonable. To avoid this you have to start from the content data of the items; the content filtering algorithm described later in this article is one such approach.
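To make the item-based approach and the popular-item problem concrete, here is a small Python sketch. The function names, user names and short item ids are assumptions made for illustration. Dividing the co-occurrence count by the square root of both items' popularity is one common way to dampen the pull of popular items; it does not remove it.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(likes):
    """Item-item similarity from co-occurrence in users' liked sets."""
    count = defaultdict(int)                          # item -> number of fans
    co_count = defaultdict(lambda: defaultdict(int))  # item -> item -> shared fans
    for items in likes.values():
        for item in items:
            count[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_count[a][b] += 1
            co_count[b][a] += 1
    # Normalise by popularity so widely liked items do not dominate.
    return {a: {b: n / sqrt(count[a] * count[b]) for b, n in row.items()}
            for a, row in co_count.items()}


def recommend_item_cf(user, likes, sim):
    """Return (item, reason) pairs; reason is the liked item most similar to it."""
    scores = defaultdict(float)
    reason = {}
    for liked in likes[user]:
        for other, s in sim.get(liked, {}).items():
            if other in likes[user]:
                continue
            scores[other] += s
            if other not in reason or s > sim[reason[other]][other]:
                reason[other] = liked
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [(item, reason[item]) for item in ranked]


# Hypothetical readers; the ids stand for three book titles.
likes = {
    "u1": {"infinity", "math"},
    "u2": {"infinity", "math"},
    "u3": {"infinity", "potter"},
    "you": {"infinity"},
}
sim = item_similarities(likes)
suggestions = recommend_item_cf("you", likes, sim)
```

The `reason` returned with each suggestion is what lets a site say "people who like XX also like OO".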
Besides collaborative filtering, the latent factor model (LFM) is also widely used. It automatically clusters items based on user behavior, so that items are grouped along several dimensions and at several granularities, and recommendations are then made according to the categories a user prefers. This machine-learning-based method beats collaborative filtering on many metrics, but its runtime performance is weak, so a common approach is to obtain a recommendation list from another algorithm and then optimize it with LFM.
2. Using user tag data
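As a rough illustration of the latent factor idea, the sketch below learns a small vector for every user and every item by stochastic gradient descent, so that their dot product approximates the observed feedback. Everything specific here is an assumption made for the example: the function names, the hyperparameters, and the toy samples in which 1 means "liked" and 0 means "not interested".

```python
import random


def predict(p, q, user, item):
    """Predicted interest: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def train_lfm(samples, n_factors=2, epochs=300, lr=0.05, reg=0.01, seed=0):
    """Fit user factors p and item factors q to (user, item, label) samples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in samples})
    items = sorted({i for _, i, _ in samples})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, label in samples:
            err = label - predict(p, q, u, i)
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step with L2 regularisation on both vectors.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def mean_squared_error(samples, p, q):
    return sum((label - predict(p, q, u, i)) ** 2
               for u, i, label in samples) / len(samples)


# Toy feedback, invented for this example.
samples = [
    ("u1", "a", 1), ("u1", "b", 1), ("u1", "c", 0),
    ("u2", "a", 1), ("u2", "b", 1), ("u2", "d", 0),
    ("u3", "c", 1), ("u3", "d", 1), ("u3", "a", 0),
]
p, q = train_lfm(samples)
```

Each factor ends up acting like an automatically discovered category: items that tend to be liked by the same users get similar vectors, without anyone naming the category.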
Many sites let users classify items with their own tags, such as bookmarks on Delicious, tag clouds on blogs, and the book, music and video tags on Douban. A tag is itself a user's clustering of items, and it works well as a basis for recommendation.
Tag-related recommendation comes in two forms: recommending items to users based on their tagging behavior, and recommending suitable tags to users while they are tagging an item.
The basic idea of tag-based item recommendation is to find the tags a user uses most often, then find popular items carrying those tags and recommend them to the user. Two issues need attention. First, to preserve novelty and diversity, a TF-IDF-style method can be used to reduce the weight of popular tags and items. Second, synonymous, duplicate and meaningless tags need to be cleaned up.
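The tag-based idea can be sketched as follows. This is a simplified illustration: the function name and the `(user, item, tag)` record format are assumptions, and the two logarithmic penalties are one TF-IDF-style way of down-weighting tags and items that many users share.

```python
from collections import defaultdict
from math import log


def recommend_by_tags(user, records):
    """records is a list of (user, item, tag) tagging events."""
    user_tags = defaultdict(int)                       # tag -> uses by target user
    tag_items = defaultdict(lambda: defaultdict(int))  # tag -> item -> times applied
    tag_users = defaultdict(set)                       # tag -> users who used it
    item_users = defaultdict(set)                      # item -> users who tagged it
    seen = set()                                       # items the user already tagged
    for u, item, tag in records:
        tag_items[tag][item] += 1
        tag_users[tag].add(u)
        item_users[item].add(u)
        if u == user:
            user_tags[tag] += 1
            seen.add(item)
    scores = defaultdict(float)
    for tag, n_user_tag in user_tags.items():
        for item, n_tag_item in tag_items[tag].items():
            if item in seen:
                continue
            # Penalise tags and items that are popular with everyone.
            tag_weight = n_user_tag / log(1 + len(tag_users[tag]))
            item_weight = n_tag_item / log(1 + len(item_users[item]))
            scores[item] += tag_weight * item_weight
    return sorted(scores, key=lambda item: (-scores[item], item))


# Toy tagging log, invented for this example.
records = [
    ("me", "book1", "math"),
    ("a", "book2", "math"),
    ("a", "book3", "novel"),
]
picks = recommend_by_tags("me", records)
```

Here `picks` contains only `book2`: it shares the "math" tag with something the user tagged, while `book3` carries a tag the user never used and `book1` is already known to the user.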
Recommending tags while the user is tagging is also important: it makes tags easier to enter, and it improves tag quality and reduces redundancy. A typical scenario is tagging books, music and videos on Douban. The idea is to combine the most popular tags on the current item with the tags the user uses most often and recommend the result. Douban does in fact work this way: when a user tags an item, the suggested tags are split into "My tags" and "Common tags", and "My tags" also takes the item itself into account.
Tag-based recommendation has many advantages. It can give users more precise reasons for a recommendation, and the tag cloud format also increases diversity and gives users some choice. Tags can in fact be seen as part of an item's content, such as a book's author, publisher and genre or a piece of music's country, style and artist, and recommendations based on this information can make up for some of the weaknesses of recommendations based on user behavior.
3. Using contextual information
Context refers to the user's time, place, mood and so on. These factors are also critical to recommendation: the songs people want depend on their mood, and some goods are seasonal.
Take time as the main example. On many news and information sites, timeliness matters a great deal; recommend year-old news to users and you can expect to be roundly cursed. Such recommendations need a time decay factor: the older the item, the smaller its weight. The same idea applies to recommendations based on user behavior, where there is a lot to optimize. For ItemCF, items that the same user liked within a short interval can be given a higher similarity, and when looking for similar items we can focus on the items the user liked recently. For UserCF, two users who liked the same item at around the same time can be given a higher similarity, and when recommending we can focus on items that users with similar tastes liked recently. In both cases the similarity and the user's behavior are weighted by time, with longer intervals getting lower weights. Collaborative filtering improved in this way often produces results that users are more satisfied with.
Similarly, now that location-based services (LBS) have become standard in applications, we can assign weights according to the distance between the item and the user, then combine them with other factors to produce reliable place recommendations.
4. Using social network data
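A minimal sketch of the time-weighted item similarity described above might look like this. The decay formula `1 / (1 + alpha * |dt|)`, the function names and the use of days as the time unit are all assumptions chosen for illustration; other decay curves would serve the same purpose.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def decay(dt, alpha=1.0):
    """Weight that shrinks as the time gap dt grows (assumed formula)."""
    return 1.0 / (1.0 + alpha * abs(dt))


def time_aware_item_similarity(events, alpha=1.0):
    """events maps user -> {item: timestamp in days}.

    Two items liked by the same user count for more when the two
    likes happened close together in time.
    """
    count = defaultdict(int)
    weighted = defaultdict(float)
    for items in events.values():
        for item in items:
            count[item] += 1
        for a, b in combinations(sorted(items), 2):
            weighted[(a, b)] += decay(items[a] - items[b], alpha)
    return {pair: w / sqrt(count[pair[0]] * count[pair[1]])
            for pair, w in weighted.items()}


# One hypothetical user who liked a and b a day apart, and c a month later.
events = {"u1": {"a": 0, "b": 1, "c": 30}}
sim = time_aware_item_similarity(events)
```

In this toy data `a` and `b` end up more similar than `a` and `c`, even though the same user liked all three.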
Social networks, led by Facebook and Twitter, are a huge treasure trove of data. Experiments show that, thanks to trust, recommendations from friends tend to get a higher click-through rate. For this reason Amazon has used Facebook data to recommend products that a user's friends like. This kind of recommendation resembles UserCF, except that when relating users it considers familiarity (such as the number of mutual friends) in addition to similarity of interests, so items your close friends like are likely to be recommended to you.
There are also many recommendation algorithms within social networks themselves. The most important is friend recommendation, which can draw on many kinds of data: demographics (such as finding classmates on Renren), shared interests (such as what users retweet on Twitter), and friend relationships (the number of mutual friends, N-degree connections). There is also timeline recommendation, represented by Facebook's EdgeRank: if a feed item was produced recently by a friend you are close to, it gets a higher weight in the ranking of your timeline. In addition, targeted advertising based on a social network's interest graph and social graph is a key application of recommendation systems, and it determines how well a social networking site can monetize.
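Friend recommendation by mutual friends is simple enough to sketch directly. The function name and the toy friendship graph are invented for illustration, and counting mutual friends is only the most basic of the signals listed above.

```python
def recommend_friends(user, friends):
    """Rank friends-of-friends by how many mutual friends they share with user.

    friends maps each user to the set of their friends (undirected graph).
    """
    scores = {}
    for friend in friends[user]:
        for candidate in friends.get(friend, set()):
            # Skip the user and people who are already friends.
            if candidate == user or candidate in friends[user]:
                continue
            scores[candidate] = scores.get(candidate, 0) + 1
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical friendship graph.
friends = {
    "me": {"a", "b"},
    "a": {"me", "c", "d"},
    "b": {"me", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
suggested = recommend_friends("me", friends)
```

Here `c` is suggested before `d` because `c` shares two mutual friends with the user and `d` shares only one.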
The cold start problem of recommendation systems
Having introduced so many kinds of recommendation system, I will end with a major problem they all face: the cold start problem. It covers three situations: how to give personalized recommendations to new users, how to recommend new items to users, and how a new site can make personalized recommendations when its data is sparse.
There are corresponding solutions. For new users, we can first make coarse-grained recommendations based on registration information such as age, gender and hobbies. We can also show new users some content after registration so that they can give feedback on how interesting they find it, then recommend based on that data. This content needs to be both popular and diverse. Recommendations for new items can be based on their content data. We can use semantic analysis to extract keywords and assign weights, which gives each item a content feature in the form of a vector; the similarity between items is then found from the cosine of the angle between their vectors, and recommendations follow from that. This kind of content filtering algorithm is widely used in services whose items (content) are updated quickly, such as personalized news recommendation.
When a site does not have enough data, it may have to build its early recommendation system by hand. A simple approach is a manually edited list of popular items; a more advanced one is manual classification and annotation. The personalized music radio service Pandora hired a group of musicians with computer skills to annotate a large amount of music, an effort known as the Music Genome Project. With this initial data, recommendation becomes easy. In China, Jing.fm also started by manually classifying music by its physical, emotional and social information, then kept improving through machine learning and recommendation algorithms to create a distinctive personalized radio.
Beyond this, borrowing data from social networking platforms is also a good idea, especially for sites that rely on another SNS's account system.
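The content-based approach for new items can be sketched with keyword vectors and cosine similarity. In this illustration the keyword weights are typed in by hand and the article names are invented; in practice they would come from the semantic analysis step, which is not shown.

```python
from math import sqrt


def cosine(v1, v2):
    """Cosine similarity between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * v2.get(k, 0.0) for k, w in v1.items())
    norm1 = sqrt(sum(w * w for w in v1.values()))
    norm2 = sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def most_similar(new_item, catalogue, top_n=3):
    """Names of existing items whose keywords overlap most with new_item."""
    scored = [(cosine(new_item, vec), name) for name, vec in catalogue.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical keyword weights for two existing articles and one new one.
catalogue = {
    "article_1": {"football": 0.8, "league": 0.6},
    "article_2": {"stocks": 0.9, "market": 0.4},
}
new_article = {"football": 0.5, "transfer": 0.5}
similar = most_similar(new_article, catalogue)
```

A new article with no behavior data can then be shown to readers who liked the existing items it resembles, here `article_1`.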
Algorithms vs. people
Many people worry that recommendation systems will make a person's field of view narrower and narrower, but after reading the above you will see that this need not be the case: diversity, novelty and surprise are also criteria by which a recommendation system is judged. As for algorithms themselves, I agree with Li Ruyi, founder of Tangcha:
In discussions in the technical community, it is taken for granted that making recommendation algorithms smarter and more "intelligent" is a good thing. But people cannot be that lazy. Do you really have to leave it to a machine to "discover what you might be interested in"? Don't take me for a Luddite. True believers in technology always put people first.
I would add that although algorithms cannot solve every problem, they can become more humane. To borrow someone's argument that "the network is a society", the line between algorithms and people is in fact not that clear-cut.
Header image source: Dribbble
Unless otherwise stated, Geek Observation articles are original reports by GeekPark. When reposting, please credit the author and link to the original.
Original address: http://www.geekpark.net/read/view/190041