Online shoppers are by now used to the "Guess You (Also) Like" suggestions a site offers, and sometimes the system seems to know you better than you know yourself. How does a recommender system "guess" what is on your mind?
(By Joseph A. Konstan & John Riedl) Nowadays, people who shop online are accustomed to receiving personalized recommendations. Netflix recommends videos you might like to watch. TiVo automatically records programs it thinks you will be interested in. Pandora generates a personalized music stream by predicting which songs we want to hear.
All of these recommendations come from recommender systems of one kind or another. Driven by computer algorithms, they use a customer's browsing, searches, orders, and preferences to pick out products the customer may like and may buy, thereby serving the consumer. Recommender systems were originally designed to help online retailers increase sales, and they are now a huge and growing business. Meanwhile, recommender system development has grown from a field studied by only a few dozen people in the mid-1990s to one that today occupies hundreds of researchers at universities, large online retailers, and dozens of other companies focused on such systems.
Over the years, recommender systems have made considerable progress. They were fairly crude at first and often predicted behavior inaccurately, but as more and more varied data about website users became available, and as innovative algorithms were applied to that data, they improved rapidly. Today, recommender systems are extremely complex and highly specialized, and they often seem to know you better than you know yourself. Recommender systems are also expanding beyond retail sites: universities use them to guide students toward elective courses, mobile phone companies rely on them to predict which users are likely to switch to another provider, and conference organizers have tested them for assigning papers to reviewers.
The two of us have been developing and studying recommender systems since their early days, initially as academic researchers in the GroupLens project. Starting in 1992, GroupLens sorted through messages in Usenet discussion groups and pointed users to topics they might be interested in but had not yet discovered. A few years later we founded Net Perceptions, a recommendation algorithm company that was among the industry leaders during the first Internet boom (1997-2000). So although companies such as Amazon and other online retailers rarely talk openly about how their recommender systems work, our experience gives us insight into what goes on behind the scenes. (Our analysis in this article is based on observation and reasoning and contains no inside information.)
Here's what we see.
What is the recommendation algorithm behind "Guess You Like"?
Source: recommenderapi.com
Have you ever wondered what you look like in Amazon's eyes? The answer: you are a very long row of numbers in a very, very large table. Your row describes everything you have viewed, every link you have clicked, and every item you have bought on Amazon; the rest of the table represents the millions of other people who shop at Amazon. Your row changes every time you visit the site, and it keeps changing with every action you take there. This information in turn affects what you see on each page you visit and which emails and offers you receive from Amazon.
For many years, the developers of recommender systems have tried a variety of ways to collect and parse all of this data. These days, most have settled on an algorithm known as personalized collaborative recommendation (personalized collaborative recommender). It is the core algorithm behind Amazon, Netflix, Facebook's friend suggestions, and Last.fm, a British pop music website. It is "personalized" because it tracks each user's behavior (such as pages viewed, order history, and product ratings) to make its recommendations, so the suggestions are not a matter of blind luck. It is "collaborative" because it treats two items as related when many other customers have bought both or shown a liking for both, rather than judging by an analysis of product features or keywords.
Personalized collaborative recommender systems of various kinds have been around since at least 1992. Besides the GroupLens project, another early recommender system was MIT's Ringo, which used users' music lists to recommend other music they might like.
User-user algorithm: Calculating the similarity between users
Both GroupLens and Ringo used a simple collaborative algorithm known as the "user-user" algorithm. This type of algorithm calculates the "distance" between a pair of users based on how similarly they rate the same items. For example, if Jim and Jane both give the movie Tron 5 points, the distance between them is 0. If Jim then gives the sequel, Tron: Legacy, 5 points while Jane gives it only 3, the distance between them grows. Users whose tastes turn out to be relatively "close" by this calculation are said to share a "neighborhood".
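The Jim and Jane example can be made concrete with a minimal sketch. The article does not say which distance formula GroupLens or Ringo used, so the mean absolute difference over co-rated items below is an assumption chosen for simplicity, and the function name user_distance is hypothetical.

```python
from typing import Dict, Optional


def user_distance(a: Dict[str, float], b: Dict[str, float]) -> Optional[float]:
    """Mean absolute rating difference over items both users rated.

    Assumption: this metric is illustrative only; real systems typically
    use correlation- or cosine-based measures.
    Returns None when the two users have no rated item in common.
    """
    common = a.keys() & b.keys()
    if not common:
        return None
    return sum(abs(a[item] - b[item]) for item in common) / len(common)


jim = {"Tron": 5, "Tron: Legacy": 5}
jane = {"Tron": 5, "Tron: Legacy": 3}

print(user_distance({"Tron": 5}, {"Tron": 5}))  # identical ratings
print(user_distance(jim, jane))                 # the sequel pushes them apart
```

Under this assumed metric, two users with no co-rated items have no defined distance at all, which is one way to see why sparse ratings make neighborhoods hard to form.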
However, the user-user approach does not work very well. First, it is hard to form meaningful neighborhoods: many pairs of users have only a few ratings in common, some have none at all, and the few items they have both rated are often box-office blockbusters that basically everybody likes. Second, because the distance between users can change very quickly, the algorithm has to do most of its calculations on the spot, and that can take longer than someone clicking around a site is willing to wait before their next action.
Item-item algorithm: Calculating the association between objects
As a result, most recommender systems now rely on an "item-item" algorithm, which calculates the distance between two books, two movies, or two other things based on how similarly the users who rated them scored them. People who rate Tom Clancy's books highly are likely to rate Clive Cussler's works highly too, so Clancy and Cussler books sit in the same neighborhood. The distance between a pair of items may be calculated from the ratings of millions of users and tends to stay relatively stable over time, so a recommender system can compute the distances in advance and produce recommendations faster. Both Amazon and Netflix have said publicly that they use variants of the item-item algorithm, but they are silent on the details.
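The idea of precomputing item distances and then looking up neighbors can be sketched as follows. This is not Amazon's or Netflix's method, which the article notes is undisclosed; the ratings data, the mean-absolute-difference metric, and the function names item_distances and nearest_items are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical ratings: user -> {item: score on a 1-5 scale}.
ratings = {
    "ann": {"Clancy": 5, "Cussler": 5, "Austen": 1},
    "bob": {"Clancy": 4, "Cussler": 5, "Austen": 2},
    "cat": {"Clancy": 1, "Cussler": 2, "Austen": 5},
}


def item_distances(ratings):
    """Precompute a distance for every pair of items.

    Distance = mean absolute score difference over users who rated both
    items (an assumed metric; production systems use other measures).
    """
    by_item = {}
    for user, scores in ratings.items():
        for item, score in scores.items():
            by_item.setdefault(item, {})[user] = score

    table = {}
    for x, y in combinations(sorted(by_item), 2):
        shared = by_item[x].keys() & by_item[y].keys()
        if shared:
            d = sum(abs(by_item[x][u] - by_item[y][u]) for u in shared) / len(shared)
            table[(x, y)] = table[(y, x)] = d
    return table


def nearest_items(table, item, k=1):
    """Return the k items closest to `item` using the precomputed table."""
    neighbors = [(d, other) for (x, other), d in table.items() if x == item]
    return [other for d, other in sorted(neighbors)[:k]]


table = item_distances(ratings)      # done offline, ahead of time
print(nearest_items(table, "Clancy"))  # cheap lookup at request time
```

The expensive pairwise pass happens once in item_distances; serving a recommendation is then only a lookup in the table, which is the speed advantage the paragraph above describes.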
The user-user and item-item algorithms share one problem: users' ratings are inconsistent. Given the chance to rate an item again, users often give the same item a different score. Tastes change, moods change, and impressions change too. A study at MIT in the 1990s showed that a year after their initial rating, users changed their scores by an average of 1 point (on a 7-point scale). Researchers keep experimenting with ways of building this variability into their models; for example, if a user gives a product a score that is inconsistent with everything else the algorithm knows about that person and that product, some recommendation algorithms invite the user to rate the product again.
Dimensionality reduction algorithm: Generalizing the characteristics of things
However, the user-user and item-item algorithms have a bigger problem than consistency: they are too rigid. That is, they can find people who like the same thing, but they overlook potential pairs of users whose tastes are very similar. Say you like Monet's water lilies. Which of the roughly 250 water lily paintings by the French Impressionist master do you like best? Within a group of Monet lovers, it is quite possible that each person prefers a different water lily painting, and a basic algorithm may fail to recognize that these people share a common interest.
About 10 years ago, researchers came up with a way to represent things more generally, through a process called dimensionality reduction. This method is far more computationally intensive than the user-user and item-item algorithms, so it was not adopted as quickly. But as computers have become faster and cheaper, dimensionality reduction algorithms have gained ground.
To see how dimensionality reduction works, consider the foods you like to eat and how they compare with the food preferences of 1 million other people. You can represent this information in a giant matrix in which each column stands for one food and each person's food preferences form a row. Your row might show that you gave roast steak 5 stars, braised short ribs 4 stars, roast chicken wings 2 stars, frozen tofu rolls 1 star, cheese baked mushrooms 5 stars, salted boiled soybeans 4 stars, and so on.
However, a recommendation algorithm that uses this matrix does not really care how many stars you gave to which food. It wants to understand your general preferences so that it can apply them to a much wider variety of foods. For example, based on the information you have given, the algorithm may conclude that you like beef, salty things, and baked dishes, dislike chicken and anything fried, neither like nor dislike vegetables, and so on. The number of characteristics, or dimensions, needed to describe the foods you love is much smaller than the number of foods: at most 50 or 100. By checking these dimensions, the recommendation algorithm can quickly determine whether you would like a new food (say, salted ribs) by comparing the food's dimensions (salty, beef, not chicken, not fried, not vegetable, not baked) with your profile. This more general representation lets the recommendation algorithm accurately detect users whose preferences are similar but not identical. It also greatly compresses the matrix, which makes the algorithm more efficient.
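The comparison of a food's dimensions with a user's profile can be sketched as a simple weighted sum. Everything here is an illustrative assumption: the six named dimensions, the hand-picked weights, the "fried chicken" entry, and the function name predicted_affinity. Real systems learn abstract dimensions from the ratings rather than naming them.

```python
# Hypothetical taste dimensions, named only for readability.
DIMENSIONS = ["beef", "salty", "baked", "chicken", "fried", "vegetables"]

# +1 = likes, -1 = dislikes, 0 = indifferent (hand-picked for illustration).
you = {"beef": 1.0, "salty": 1.0, "baked": 1.0,
       "chicken": -1.0, "fried": -1.0, "vegetables": 0.0}

# Each food is described on the same dimensions; missing entries count as 0.
foods = {
    "salted ribs": {"salty": 1.0, "beef": 1.0},
    "fried chicken": {"chicken": 1.0, "fried": 1.0, "salty": 1.0},
}


def predicted_affinity(user, food):
    """Dot product of the user's profile and the food's description."""
    return sum(user[d] * food.get(d, 0.0) for d in DIMENSIONS)


for name, description in foods.items():
    print(name, predicted_affinity(you, description))
```

A positive score suggests a food worth recommending and a negative one a food to avoid. Note that the user never rated either food: the prediction comes entirely from the handful of dimensions.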
This is a neat solution. But where do the dimensions of your food preferences come from? Certainly not from asking a chef. Recommender systems use a mathematical technique called singular value decomposition to compute the dimensions. The technique splits the original giant matrix into two "taste matrices" (one containing all the users and 100 taste dimensions, the other containing all the foods and 100 taste dimensions) plus a third matrix that, multiplied together with the first two, gives back the original matrix (※ this passage has been revised).
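A minimal sketch of the decomposition, using NumPy's numpy.linalg.svd on a tiny, fully filled ratings matrix. The 4x3 matrix R and the choice of k = 2 dimensions are assumptions for illustration; real rating matrices are enormous and mostly empty, which this sketch does not attempt to handle.

```python
import numpy as np

# Hypothetical ratings: 4 users (rows) x 3 foods (columns), 1-5 stars.
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 2.0],
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 4.0],
])

# U: users x dimensions, s: strength of each dimension,
# Vt: dimensions x foods.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Multiplying the three factors together recovers the original matrix.
reconstructed = U @ np.diag(s) @ Vt
print(np.allclose(reconstructed, R))

# Keeping only the k strongest dimensions gives a compressed approximation.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(approx, 2))
```

Here U and Vt play the role of the two "taste matrices" and the diagonal matrix built from s is the third factor. Dropping the weakest dimensions is what shrinks the representation, at the cost of an approximate rather than exact reconstruction.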
Unlike in the example above, the computed dimensions are neither descriptive nor intuitive; they are purely abstract values. That does not matter, as long as the values ultimately produce accurate recommendations. The main drawback of this approach is that the time needed to create the matrices grows rapidly with the number of customers and products: creating the matrices for 250 million customers and 10 million products takes about 1 billion times as long as creating them for 250,000 customers and 10,000 products. And the process has to be repeated frequently. The matrices are out of date as soon as a new rating arrives, and at a company like Amazon new ratings arrive every second. Fortunately, even slightly outdated matrices still work quite well. Researchers have also been designing new algorithms that approximate singular value decomposition and sharply cut the computation time.
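The "1 billion times" figure is consistent with a simple cost model, sketched below. The article does not state the model; the assumption here is the standard estimate that a dense singular value decomposition of a matrix with m rows and n columns (m at least n) takes time proportional to m times n squared. The function name svd_cost is hypothetical.

```python
def svd_cost(customers: int, products: int) -> int:
    """Relative cost under the assumed model: rows * columns squared."""
    return customers * products ** 2


small = svd_cost(250_000, 10_000)
large = svd_cost(250_000_000, 10_000_000)

# 1,000 times more customers and 1,000 times more products.
ratio = large // small
print(f"{ratio:,}")
```

Under this assumption, a thousandfold increase in customers contributes a factor of 1,000 and a thousandfold increase in products contributes a factor of 1,000,000, and their product is the article's 1 billion.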
Joseph A. Konstan and John Riedl are both professors of computer science at the University of Minnesota. Konstan, an IEEE senior member, and Riedl, also of IEEE, were both involved in creating the MovieLens recommender system. In the next article, the two authors go on to explain what recommendation algorithms will never recommend to you.
Correction note: the statement about singular value decomposition in the second-to-last paragraph of the original article has been revised; this note records the change. (2012-11-13)
Compiled from: IEEE Spectrum, "Deconstructing Recommender Systems"
Article Picture: IEEE spectrum.org