As the Internet develops, most products will probably need some form of recommendation mechanism, and the people who build Internet products need to understand the core algorithms behind it. This is a concise article on the basics of recommendation mechanisms that I came across and am reposting to share with everyone.
Information Discovery
We have entered an era of data explosion. As Web 2.0 has grown and the Web has become a platform for sharing data, it has become increasingly difficult for people to find the information they need in massive amounts of data.
In this situation, search engines (Google, Bing, Baidu, etc.) are the best way to find information quickly. When users know fairly clearly what they need, a keyword search finds it quickly and conveniently. But search engines do not fully meet users' needs for information discovery, because in many cases users are not really clear about what they need, or their needs are hard to express in a few keywords, or they want results that better match their personal tastes and preferences. This is why the recommendation system appeared; by analogy with the search engine, it is also commonly called the recommendation engine.
With the advent of recommendation engines, the way users obtain information has shifted from simple, targeted search to a higher-level form of information discovery that better fits how people actually behave.
Today, as recommendation technology has developed, recommendation engines have achieved great success in e-commerce (such as Amazon and Dangdang) and on social sites built around music, film and book sharing (such as Douban and Mtime). This further shows that, facing the massive data of the Web 2.0 environment, users need this more intelligent information discovery mechanism, one that understands their needs, tastes and preferences.
Recommendation engine
Having seen why recommendation engines matter to today's Web 2.0 sites, we now look at how a recommendation engine actually works. A recommendation engine uses special information filtering techniques to recommend different items or content to the users who may be interested in them.
Figure 1. How a recommendation engine works
Figure 1 shows how the recommendation engine works. Here the engine is treated as a black box whose input is the data source for recommendation. In general, the data sources a recommendation engine needs include:
Item metadata: metadata of the items or content to be recommended, for example keywords and gene descriptions.
User information: basic information about the system's users, such as gender and age.
User preferences: users' preferences for items or information. Depending on the application, these may include users' ratings of items, records of the items users viewed, users' purchase records, and so on.
In fact, these user preferences fall into two categories:
Explicit user feedback: information the user provides deliberately, over and above naturally browsing or using the site, such as rating an item or commenting on it.
Implicit user feedback: data generated as a by-product of using the site that implicitly reflects the user's preferences, such as buying an item or viewing an item's information.
Explicit feedback accurately reflects a user's real preference for an item, but it costs the user extra effort. Implicit behavior, after some analysis and processing, also reflects user preferences, but the data is less precise, and the analysis of some behaviors is quite noisy. With the right behavioral features, however, implicit feedback can also work very well. Which features are right differs greatly between applications; on an e-commerce site, for example, a purchase is implicit feedback that expresses user preference well.
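The difference between the two kinds of feedback can be illustrated with a minimal sketch. The event names (`view`, `purchase`), the weights and the rule that an explicit rating overrides implicit signals are all assumptions made for this example, not something the article prescribes.

```python
from collections import defaultdict

# Hypothetical weights for implicit signals: a purchase is treated as a
# stronger signal than a page view. The numbers are illustrative only.
IMPLICIT_WEIGHTS = {"view": 1.0, "purchase": 5.0}

def build_preferences(explicit_ratings, implicit_events):
    """Merge both kinds of feedback into a user -> item -> score mapping.

    explicit_ratings: iterable of (user, item, rating) tuples.
    implicit_events:  iterable of (user, item, event_type) tuples.
    An explicit rating, when present, overrides the implicit score.
    """
    prefs = defaultdict(dict)
    for user, item, event in implicit_events:
        if event not in IMPLICIT_WEIGHTS:
            continue  # unknown event types are ignored
        prefs[user][item] = prefs[user].get(item, 0.0) + IMPLICIT_WEIGHTS[event]
    for user, item, rating in explicit_ratings:
        prefs[user][item] = float(rating)
    return dict(prefs)

ratings = [("alice", "book_1", 4)]
events = [
    ("alice", "book_1", "view"),
    ("alice", "book_2", "view"),
    ("alice", "book_2", "purchase"),
    ("bob", "book_2", "view"),
]
prefs = build_preferences(ratings, events)
```

In a real system the two scales would need to be normalized before being mixed; the sketch only shows where each kind of signal enters.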
Depending on its recommendation mechanism, a recommendation engine may use only part of these data sources. From this data it either derives certain rules or directly predicts the user's preference for other items. Then, when the user enters the site, the engine can recommend items he or she may be interested in.
Classification of recommendation engines
Recommendation engines can be classified by several criteria, which we introduce in turn:
By whether the engine recommends different data to different users. By this criterion, recommendation engines divide into engines based on popular behavior and personalized engines.
Engine based on popular behavior: gives every user the same recommendations. These can be set statically by the system administrator, or computed as the currently popular items from the feedback statistics of all users of the system.
Personalized recommendation engine: gives different users more precise recommendations according to their tastes and preferences. Here the system needs to understand the characteristics of both the content to be recommended and the users, or work from a social network, finding users whose preferences match the current user's in order to make recommendations.
This is the most basic classification. In practice, when people discuss recommendation engines they mostly mean personalized ones, because fundamentally only a personalized engine is the more intelligent information discovery process.
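As a contrast to personalized engines, an engine based on popular behavior can be sketched in a few lines. The function name, the `(user, item)` input format and the tie-breaking rule are assumptions made for this illustration.

```python
from collections import Counter

def most_popular(feedback, top_n=3):
    """Rank items by how many distinct users gave positive feedback.

    feedback: iterable of (user, item) pairs, one per positive signal.
    Every user receives this same list, which is what makes the engine
    non-personalized.
    """
    distinct = set(feedback)  # count each user at most once per item
    counts = Counter(item for _user, item in distinct)
    # Sort by count descending, then by item name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _count in ranked[:top_n]]

feedback = [
    ("u1", "item_a"), ("u2", "item_a"), ("u3", "item_a"),
    ("u1", "item_b"), ("u2", "item_b"),
    ("u3", "item_c"), ("u3", "item_c"),  # repeated signal from one user
]
popular = most_popular(feedback, top_n=2)
```

Here `popular` holds the two items with the most distinct supporters, regardless of who is asking.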
By data source. This criterion is really about how the engine discovers correlations in the data, since most recommendation engines work from sets of similar items or similar users. Referring to the schematic in Figure 1, the methods of discovering correlation divide by data source as follows:
Demographic-based recommendation (Demographic-based Recommendation): discovers the correlation between users from their basic information.
Content-based recommendation (Content-based Recommendation): discovers the correlation between items or content from their metadata.
Collaborative filtering-based recommendation (Collaborative Filtering-based Recommendation): discovers the correlation between items or content, or between users, from users' preferences for items or information.
By how the recommendation model is built. In a system with a large number of items and users, the amount of computation is considerable, so real-time recommendation requires building a recommendation model. The model can be built in the following ways:
Based on items and users themselves: this engine treats every user and every item as a separate entity and predicts how much each user likes each item, usually described with a two-dimensional matrix. Because the items a user is interested in are far fewer than the total number of items, such a model has a great deal of missing data; that is, the matrix is usually large and sparse. To reduce the amount of computation, we can cluster the items and the users and then record and compute one class of users' preference for one class of items, but such a model loses some recommendation accuracy.
Association rule-based recommendation (Rule-based Recommendation): mining association rules is a classic data mining problem, mainly concerned with finding dependencies in data. The typical scenario is the "shopping basket problem": by mining association rules we can find which items are often purchased together, or which other items users usually buy after buying certain items. Once these rules are mined, we can make recommendations to users based on them.
Model-based recommendation (Model-based Recommendation): this is a typical machine learning problem. Existing user preference data serves as training samples for a model that predicts user preferences; when a user later enters the system, recommendations are computed from the model. The difficulty with this approach is feeding the user's real-time or recent preferences back into the trained model so as to improve recommendation accuracy.
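The "shopping basket problem" mentioned above can be illustrated with a deliberately simplified sketch that only looks at rules between pairs of single items. Full association rule mining (for example the Apriori algorithm) also handles larger item sets; the function name, thresholds and sample baskets here are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations

def mine_pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Find rules "X -> Y" between single items from a list of baskets.

    support(X, Y)    = share of baskets containing both X and Y
    confidence(X->Y) = baskets containing both / baskets containing X
    Returns a sorted list of (x, y, support, confidence) tuples.
    """
    total = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        # Ordered pairs, because X -> Y and Y -> X are different rules.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (x, y), both in pair_counts.items():
        support = both / total
        confidence = both / item_counts[x]
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return sorted(rules)

baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk"],
]
rules = mine_pair_rules(baskets)
```

With these sample baskets, the rule from eggs to bread has confidence 1.0, because every basket that contains eggs also contains bread, while eggs and milk appear together too rarely to pass the support threshold.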
In fact, very few of today's recommendation systems use only one recommendation strategy. They usually apply different strategies in different scenarios to get the best result. Amazon, for example, shows recommendations based on the user's own purchase history, recommendations based on the item the user is currently browsing, and popular items based on popular preferences, each in a different area of the page, so that users can find the items they are really interested in from a full range of recommendations.
The rest of this chapter details the working principle of each recommendation mechanism, its pros and cons, and its application scenarios.
Recommendations based on demography
The demographic-based recommendation mechanism (Demographic-based Recommendation) is one of the easiest to implement. It simply discovers how closely users are related from their basic information, and then recommends to the current user other items that similar users like. Figure 2 shows how this mechanism works.
Figure 2. How the demographic-based recommendation mechanism works
As the figure shows, the system first keeps a profile model for each user, containing basic information such as age and gender. It then computes the similarity between users from these profiles. Here the profiles of user A and user C match, so the system treats users A and C as similar users; in a recommendation engine they can be called "neighbors". Finally, the system recommends items to the current user based on the preferences of the "neighbor" group; in the figure, item A, which user A likes, is recommended to user C.
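The three steps just described (build profiles, find neighbors, recommend what neighbors like) can be sketched as follows. The similarity rule, the ages and genders, and the function names are all assumptions chosen for this illustration; the article only says that profiles contain information such as age and gender.

```python
def profile_similarity(p, q, max_age_gap=10):
    """Score two user profiles between 0 and 1.

    Hypothetical rule: same gender contributes 0.5, and age closeness
    contributes up to 0.5 (falling to 0 at max_age_gap years apart).
    """
    gender_score = 0.5 if p["gender"] == q["gender"] else 0.0
    age_gap = abs(p["age"] - q["age"])
    age_score = 0.5 * max(0.0, 1.0 - age_gap / max_age_gap)
    return gender_score + age_score

def recommend_by_demography(target, profiles, liked, k=1):
    """Recommend items liked by the k users whose profiles are most similar."""
    others = [u for u in profiles if u != target]
    neighbors = sorted(
        others,
        key=lambda u: (-profile_similarity(profiles[target], profiles[u]), u),
    )[:k]
    seen = liked.get(target, set())
    recommended = []
    for neighbor in neighbors:
        for item in sorted(liked.get(neighbor, set())):
            if item not in seen and item not in recommended:
                recommended.append(item)
    return recommended

# Assumed sample data: users A and C have near-identical profiles.
profiles = {
    "A": {"age": 27, "gender": "F"},
    "B": {"age": 45, "gender": "M"},
    "C": {"age": 28, "gender": "F"},
}
liked = {"A": {"item_a"}, "B": {"item_b"}, "C": set()}
suggestions = recommend_by_demography("C", profiles, liked)
```

Note that user C needs no preference history at all for this to work, which is why the approach avoids the new-user "cold start" problem.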
The benefits of this demographic-based recommendation mechanism are:
Because it does not use the current user's preferences for items, there is no "cold start" problem for new users.
It does not depend on data about the items themselves, so it can be used across different kinds of items; it is domain-independent.
So what are the drawbacks and problems of this approach? Classifying users by their basic information is too coarse, especially in fields where taste matters a great deal, such as books, movies and music, and it cannot produce good recommendations there. On some e-commerce sites it may be able to give a few simple recommendations. Another limitation is that this approach may involve information that is unrelated to the information discovery problem itself and is sensitive, such as the user's age, and such information is not easy to obtain.
Content-based recommendations
Content-based recommendation was the most widely used mechanism when recommendation engines first appeared. Its core idea is to discover the correlation between items or content from their metadata, and then recommend similar items to users based on their past preferences. Figure 3 shows the basic principle of content-based recommendation.
Figure 3. Basic principle of the content-based recommendation mechanism
Figure 3 gives a typical example of content-based recommendation, a movie recommendation system. First we need to model the movies' metadata; here we describe only the movie's genre. We then find the similarity between movies from their metadata: because movies A and C both have the genre "love, romance", they are considered similar. (Of course, genre alone is not enough; for better recommendations we could also consider the director, the actors and so on.) Finally, because user A likes movie A, the system can recommend the similar movie C to him.
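The movie example can be reproduced with a small sketch that compares genre sets. Jaccard similarity is an assumed choice of measure, as are the function names, the threshold and the genres given to movie B, which the text does not state.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend_by_content(user_liked, metadata, min_similarity=0.5):
    """Recommend items whose genre set is similar to something the user liked."""
    scores = {}
    for candidate, genres in metadata.items():
        if candidate in user_liked:
            continue
        # Best match against any item the user already likes.
        best = max(
            (jaccard(genres, metadata[item]) for item in user_liked),
            default=0.0,
        )
        if best >= min_similarity:
            scores[candidate] = best
    return sorted(scores, key=lambda item: (-scores[item], item))

movies = {
    "movie_a": {"love", "romance"},
    "movie_b": {"horror", "thriller"},  # assumed genres for illustration
    "movie_c": {"love", "romance"},
}
picks = recommend_by_content({"movie_a"}, movies)
```

Only item metadata and the user's own history are consulted here; no other user's opinion enters the calculation.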
The advantage of this content-based recommendation mechanism is that it can model the user's tastes well and provide more accurate recommendations. But it also has the following problems:
Items must be analyzed and modeled, and recommendation quality depends on how complete and comprehensive the item model is. In current applications, keywords and tags are regarded as a simple and effective way to describe item metadata.
Item similarity is analyzed only from the features of the items themselves, without considering people's attitudes toward the items.
Because recommendations are made from the user's past preferences, there is a "cold start" problem for new users.
Despite its many deficiencies and problems, this method has been applied successfully on some movie, music and book social sites, and some sites even employ professionals to encode the "genes" of their items. For Pandora, for example, one report says that each song in its recommendation engine has more than 100 metadata features, including the song's style, year, performer and so on.
Recommendation based on collaborative filtering
With the development of Web 2.0, websites encourage user participation and user contribution, and the collaborative filtering-based recommendation mechanism was born. Its principle is simple: from users' preferences for items or information, discover the correlation between items or content, or between users, and then recommend on the basis of those correlations. Collaborative filtering-based recommendation divides into three subclasses: user-based recommendation (User-based Recommendation), item-based recommendation (Item-based Recommendation) and model-based recommendation (Model-based Recommendation). Each of the three is introduced in detail below.
User-based collaborative filtering recommendation
The basic principle of user-based collaborative filtering is to use all users' preferences for items or information to find the group of "neighbor" users whose tastes and preferences are similar to the current user's (in typical applications this is done with a "K-nearest neighbors" algorithm), and then make recommendations to the current user based on the historical preferences of those K neighbors. Figure 4 gives the schematic.
Figure 4. Basic principle of the user-based collaborative filtering recommendation mechanism
The figure illustrates the basic principle. Suppose user A likes items A and C, user B likes item B, and user C likes items A, C and D. From this historical preference information we can see that the tastes of user A and user C are quite similar, and user C also likes item D, so we can infer that user A may like item D too and recommend item D to user A.
Both the user-based collaborative filtering mechanism and the demographic-based mechanism compute the similarity between users and make recommendations from the "neighbor" group; they differ in how they compute that similarity. The demographic-based mechanism considers only the characteristics of the users themselves, whereas the user-based collaborative filtering mechanism computes similarity from users' historical preference data. Its basic assumption is that users who like similar items are likely to have the same or similar tastes and preferences.
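The example with users A, B and C can be worked through in code. This is a minimal sketch, assuming Jaccard similarity over sets of liked items and a simple similarity-weighted vote; the article does not specify a similarity measure, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend_user_based(target, likes, k=1):
    """Recommend items liked by the k users most similar to the target user."""
    target_items = likes[target]
    # Step 1: find the k nearest neighbors by preference overlap.
    neighbors = sorted(
        (u for u in likes if u != target),
        key=lambda u: (-jaccard(target_items, likes[u]), u),
    )[:k]
    # Step 2: score the neighbors' items that the target has not seen yet.
    scores = {}
    for neighbor in neighbors:
        weight = jaccard(target_items, likes[neighbor])
        for item in likes[neighbor] - target_items:
            scores[item] = scores.get(item, 0.0) + weight
    return sorted(
        (item for item in scores if scores[item] > 0),
        key=lambda item: (-scores[item], item),
    )

# Preferences taken from the example in the text.
likes = {
    "A": {"item_a", "item_c"},
    "B": {"item_b"},
    "C": {"item_a", "item_c", "item_d"},
}
result = recommend_user_based("A", likes)
```

User C is the nearest neighbor of user A (they share two of three items), so item D, the one item C likes that A has not seen, is recommended.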
Item-based collaborative filtering recommendation
The basic principle of item-based collaborative filtering is similar, except that it uses all users' preferences for items or information to find the similarity between items, and then recommends similar items to a user based on the user's historical preferences. Figure 5 illustrates the principle.
Suppose user A likes items A and C, user B likes items A, B and C, and user C likes item A. From these historical preferences we can work out that items A and C are relatively similar: people who like item A also like item C. From this we can infer that user C is likely to like item C as well, so the system recommends item C to user C.
As above, item-based collaborative filtering and content-based recommendation both predict from item similarity, but they compute similarity differently: the former infers it from users' historical preferences, the latter from the attribute features of the items themselves.
Figure 5. Basic principle of the item-based collaborative filtering recommendation mechanism
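The item-based example above can be sketched in the same style. Cosine similarity over the sets of users who like each item is an assumed choice here, and the function names are hypothetical.

```python
from math import sqrt

def item_similarity(users_i, users_j):
    """Cosine similarity between two items, each given as the set of its fans."""
    if not users_i or not users_j:
        return 0.0
    return len(users_i & users_j) / sqrt(len(users_i) * len(users_j))

def recommend_item_based(target, likes, top_n=1):
    """Recommend the items most similar to those the target user already likes."""
    # Invert user -> items into item -> users.
    fans = {}
    for user, items in likes.items():
        for item in items:
            fans.setdefault(item, set()).add(user)
    scores = {}
    for liked_item in likes[target]:
        for candidate, users in fans.items():
            if candidate in likes[target]:
                continue  # never recommend what the user already has
            sim = item_similarity(fans[liked_item], users)
            scores[candidate] = scores.get(candidate, 0.0) + sim
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

# Preferences taken from the example in the text.
likes = {
    "A": {"item_a", "item_c"},
    "B": {"item_a", "item_b", "item_c"},
    "C": {"item_a"},
}
result = recommend_item_based("C", likes)
```

Item C shares two fans with item A while item B shares only one, so item C ranks first for user C. Because the item-to-item similarities depend only on the preference data, they can be computed ahead of time and reused across requests.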
Given both, how should we choose between the user-based and item-based strategies? In fact, the item-based collaborative filtering mechanism is a strategy Amazon developed as an improvement on the user-based mechanism. On most websites the number of items is much smaller than the number of users, and the number of items and the similarity between them are relatively stable; the item-based mechanism also responds in real time somewhat better than the user-based one. But not every scenario is like this. In a news recommendation system, for instance, the number of items, that is, news stories, may exceed the number of users, and news is updated very quickly, so the similarity between items remains unstable. In practice, then, the choice of recommendation strategy depends heavily on the specific application scenario.
Model-based collaborative filtering recommendation
Model-based collaborative filtering trains a recommendation model on sample user preference data and then uses real-time user preference information to predict and compute recommendations.
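The article does not name a specific model. As one deliberately simple stand-in, the sketch below fits a baseline predictor (a global mean plus a per-user and a per-item offset) to sample ratings and then uses it to predict an unseen rating. Real systems typically use richer models; the function names and sample ratings are assumptions for illustration.

```python
from collections import defaultdict

def train_baseline(ratings):
    """Fit a global mean plus per-user and per-item offsets.

    ratings: list of (user, item, rating) training samples.
    """
    mu = sum(r for _u, _i, r in ratings) / len(ratings)
    # User offset: how far this user's ratings sit from the global mean.
    by_user = defaultdict(list)
    for user, _item, r in ratings:
        by_user[user].append(r - mu)
    user_bias = {u: sum(v) / len(v) for u, v in by_user.items()}
    # Item offset: what remains after removing the mean and the user offset.
    by_item = defaultdict(list)
    for user, item, r in ratings:
        by_item[item].append(r - mu - user_bias[user])
    item_bias = {i: sum(v) / len(v) for i, v in by_item.items()}
    return {"mu": mu, "user_bias": user_bias, "item_bias": item_bias}

def predict(model, user, item):
    """Predict a rating; unseen users or items fall back to a zero offset."""
    return (
        model["mu"]
        + model["user_bias"].get(user, 0.0)
        + model["item_bias"].get(item, 0.0)
    )

training = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i2", 2),
    ("u3", "i1", 5),
]
model = train_baseline(training)
estimate = predict(model, "u3", "i2")  # u3 has not rated i2
```

The expensive step (training) happens offline, and prediction is a cheap lookup, which is the property that makes model-based approaches attractive for real-time recommendation.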
Collaborative filtering is the most widely used recommendation mechanism today. It has several notable advantages:
It needs no rigorous modeling of items or users, and does not require item descriptions to be machine-understandable, so it is also domain-independent.
The recommendations it computes are open: they draw on other people's experience, which helps users discover potential interests and preferences.
It also has the following problems:
The core of the method is historical data, so there is a "cold start" problem for both new items and new users.
The quality of recommendations depends on the amount and accuracy of users' historical preference data.
In most implementations users' historical preferences are stored in a sparse matrix, and computation on sparse matrices has some well-known problems, including that a few users' erroneous preferences can greatly affect recommendation accuracy.
It cannot give good recommendations to users with unusual tastes.
Because it is based on historical data, once user preferences have been captured and modeled it is hard to revise them or let them evolve with the user's behavior, which makes the method somewhat inflexible.
Hybrid recommendation mechanisms
Recommendations on today's websites usually do not rely on a single mechanism or strategy; they mix several methods to get better results. Several of the more popular ways of combining recommendation mechanisms are described here.
Weighted hybridization: combines the results of several recommendation mechanisms with a linear formula according to certain weights. The specific weights have to be found by repeated experiments on a test data set so as to achieve the best recommendation result.
Switching hybridization: as mentioned above, the most suitable recommendation strategy may differ greatly between situations (data volume, system operating conditions, numbers of users and items, and so on). Switching hybridization selects the most appropriate mechanism for the situation at hand.
Mixed hybridization: uses several recommendation mechanisms and presents their different results to the user in different areas. Amazon, Dangdang and many other e-commerce sites work this way, so users get a comprehensive set of recommendations and find what they want more easily.
Meta-level hybridization: uses several recommendation mechanisms and feeds the result of one mechanism into another as input, combining the strengths and offsetting the weaknesses of each to get more accurate recommendations.
Recommendation engine applications
Having covered the fundamentals of recommendation engines and the basic recommendation mechanisms, we now briefly analyze a few representative applications, chosen from two areas: Amazon as the representative of e-commerce and Douban as the representative of social networks.
Recommendation in e-commerce: Amazon
Amazon, a pioneer of the recommendation engine, has worked recommendation into every corner of its application. The core of Amazon's recommendation is to use data mining algorithms to compare a user's consumption preferences with those of other users and so predict the products the user may be interested in. In terms of the mechanisms described above, Amazon uses the mixed (partitioned) hybrid approach and shows the different recommendation results to the user in different areas. Figures 6 and 7 show the recommendations a user sees on Amazon.
Figure 6. Amazon's recommendation mechanism: home page
Figure 7. Amazon's recommendation mechanism: browsing an item
Amazon uses every user behavior that can be recorded on the site, processes it according to the characteristics of the different data, and pushes recommendations to users in separate areas:
Today's recommendations (Today's Recommendations For You): usually based on the user's recent purchase or viewing history, combined with currently popular items to give a balanced recommendation.
New product recommendations (New For You): uses the content-based recommendation mechanism (Content-based Recommendation) to recommend new items to the user. Because new items do not yet have much user preference information, content-based recommendation handles this "cold start" problem well.
Bundled sales (Frequently Bought Together): uses data mining to analyze users' purchase behavior and find sets of items that are often bought together or by the same person, which are then sold as a bundle. This is a typical item-based collaborative filtering mechanism.
Items bought or viewed by others (Customers Who Bought/Viewed This Item Also Bought/Viewed): also a typical application of item-based collaborative filtering; through this social mechanism users find items that interest them faster and more conveniently.
It is worth mentioning that Amazon's design and user experience around recommendations are also distinctive:
Amazon uses the advantage of its large volume of historical data to quantify the reasons for each recommendation.
For socially based recommendations, Amazon gives users factual data to convince them, for example what percentage of users bought the item. For recommendations based on the items themselves, Amazon also lists the reason, for example: because * * * is in your shopping cart, or because you bought * * *, we recommend the similar * * * to you.
In addition, many of Amazon's recommendations are computed from the user's profile, which records the user's behavior on Amazon: which items were viewed, which were bought, which are in the favorites and the wish list, and so on. Amazon also incorporates ratings and other user feedback as part of the profile, and lets users manage their own profile, so that they can tell the recommendation engine more precisely what their tastes and intentions are.
Recommendation on social networking sites: Douban
Douban is a relatively successful social networking site in China. Centered on books, films, music and city events, it has formed a diverse social networking platform in which recommendation is naturally essential. Below we look at how Douban makes recommendations.
Figure 8. Douban's recommendation mechanism: Douban Movies
On Douban Movies, once you add movies you have seen or are interested in to your "seen" and "want to see" lists and rate them, Douban's recommendation engine has some of your preference information and shows you movie recommendations like those in Figure 8.
Figure 9. Douban's recommendation mechanism: recommendations based on user taste
Douban presents its recommendations through "Douban Guess". So that users know where these recommendations come from, Douban also gives a brief introduction to "Douban Guess":
"Your personal recommendations are generated automatically from your collections and ratings, so everyone's recommendation list is different. The more you collect and rate, the more accurate and richer your recommendations will be.
The content recommended each day may change. As Douban grows, the content recommended to you will become more and more accurate."
From this we can see clearly that Douban's recommendations must be based on social collaborative filtering: the more users there are and the more feedback they give, the more accurate the recommendations become.
Compared with Amazon's user behavior model, Douban Movies' model is simpler, just "seen" and "want to see". This also lets its recommendations focus more on the user's taste; after all, the motives for shopping and for watching movies are quite different.
Douban also makes recommendations based on the item itself. When you view the details of a movie, it recommends "people who like this movie also like these movies", as shown in Figure 10. This is an application of collaborative filtering.
Summary
In an era of exploding online data, how to let users find the data they want faster, and how to let them discover their latent interests and needs, is critical for both e-commerce and social network applications. The emergence of the recommendation engine has drawn more and more attention to this problem. For most people, though, it may still seem puzzling that the engine can always guess what you want. The magic of the recommendation engine lies in the fact that you do not know what it is recording and inferring behind each recommendation.
From this overview you can see that the recommendation engine simply records and observes your every move in silence, analyzes the massive data generated by all users to discover patterns, gradually comes to understand you, your needs and your habits, and then quietly helps you solve your problem quickly and find what you want.
In fact, come to think of it, the recommendation engine often knows you better than you know yourself.
After this first article, you should have a clear first impression of recommendation engines. The next article in this series will go deep into the recommendation strategy based on collaborative filtering. Among today's recommendation techniques and algorithms, collaborative filtering is the most widely accepted and adopted. Its simple model, low dependence on data, easily collected data and good recommendation results are among the advantages that have made it the "No.1" recommendation algorithm in the public eye. That article will take you into the secrets of collaborative filtering and give an efficient implementation of collaborative filtering algorithms based on Apache Mahout. Apache Mahout is a relatively new ASF open source project that originated in Lucene, is built on Hadoop, and focuses on efficient implementations of classic machine learning algorithms over massive data.
Figure 10. Douban's recommendation mechanism: recommendations based on the movie itself
From: http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy1/index.html