(By Joseph A. Konstan & John Riedl) Nowadays, people who shop online are accustomed to receiving personalized recommendations. Netflix recommends videos you might like to watch. TiVo automatically records programs it thinks you may be interested in, for you to watch later. Pandora generates a personalized music stream by predicting which songs you want to hear.
All of these recommendations come from recommender systems of one kind or another. They run on computer algorithms that draw on a customer's browsing, searches, orders, and stated preferences to pick out items the customer may like and may buy, thereby serving the consumer. Recommender systems were originally designed to help online retailers increase sales, and they are now a huge and growing business. Meanwhile, the field has grown from a few dozen researchers in the mid-1990s to hundreds today, working at universities, large online retailers, and dozens of other companies focused on such systems.
Recommender systems have progressed considerably over the years. At first they were relatively crude and often predicted behavior inaccurately. But as more, and more varied, data about site users became available, and as innovative algorithms were applied to those data, the systems improved quickly. Today, recommender systems are extremely complex and sophisticated, and they often seem to know you better than you know yourself. They are also expanding beyond retail sites: universities use them to guide students in choosing electives, mobile phone companies rely on them to predict which users are likely to switch to another provider, and conference organizers have tested them for assigning papers to expert reviewers.
The two of us have been developing and studying recommender systems since their early days, initially as academic researchers on the GroupLens Project. Starting in 1992, GroupLens sorted through the messages in Usenet discussion forums and pointed users to threads that they might be interested in but had yet to discover. A few years later, we founded Net Perceptions, a recommender algorithm company that was an industry leader during the first Internet boom (1997-2000). So although these companies rarely talk openly about how their recommender systems work, our experience gives us insight into what goes on behind the scenes at Amazon and other online retailers. (Our analysis in this article is based on observation and reasoning and contains no inside information.)
Here's what we've seen.
[Figure: What is the recommendation algorithm behind "Guess What You Like"? Source: recommenderapi.com]
Have you ever wondered what you look like in Amazon's eyes? The answer: you are a very long row of numbers in a very, very large table. That row describes everything you have looked at, every link you have clicked, and every item you have bought on Amazon's site; the rest of the table represents the millions of other people who shop on Amazon. Your row changes every time you visit the site, and every move you make while there changes it further. That information, in turn, affects what you see on each page you visit, as well as the emails and offers you receive from Amazon.
For many years, developers of recommender systems have tried to collect and parse all of this data in a variety of ways. More recently, most have settled on what are called personalized collaborative recommender algorithms. These are at the core of Amazon, Netflix, Facebook's friend suggestions, and the British pop music site Last.fm. They are "personalized" because they track each user's behavior (such as pages viewed, orders placed, and product ratings) to make recommendations; the results are not a matter of blind luck. They are "collaborative" because they treat two items as related when many other customers have bought both or expressed a preference for both, rather than by analyzing the features of the products or their keywords.
Various kinds of personalized collaborative recommender systems have been around since at least 1992. Besides the GroupLens project, another early recommender system was MIT's Ringo, which recommended other music that users might like based on each user's own music list.
User-user algorithm: Calculating the similarity between users
Both GroupLens and Ringo used a simple collaborative algorithm known as the "user-user" algorithm. This type of algorithm computes the "distance" between a pair of users based on how closely their ratings of the same items agree. For example, if Jim and Jane both gave the movie "Tron" 5 stars, the distance between them is 0. If Jim then gives the sequel, "Tron: Legacy", 5 stars while Jane gives it only 3, the distance between them grows. Users whose tastes come out relatively "close" by this calculation are said to share a "neighborhood".
However, the user-user approach does not work very well. First, it is hard to form meaningful neighborhoods: many pairs of users have only a few ratings in common, or none at all, and the items they have both rated are often box-office blockbusters that nearly everyone likes. Second, because the distance between users can change quickly, the algorithm has to do most of its calculations on the spot, and that can take longer than a person clicking around a site is willing to wait before moving on.
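The Jim and Jane example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say which distance formula GroupLens or Ringo used, so the root-mean-square difference over co-rated items below is a stand-in, and the function name `user_distance` is hypothetical.

```python
from math import sqrt

# Ratings on a 1-5 star scale, taken from the Jim/Jane example in the text.
jim = {"Tron": 5, "Tron: Legacy": 5}
jane = {"Tron": 5, "Tron: Legacy": 3}


def user_distance(a, b):
    """Root-mean-square rating difference over items both users rated.

    Assumed formula (not from the article). Returns None when the two
    users have no rated items in common, so no distance can be computed.
    """
    common = set(a) & set(b)
    if not common:
        return None
    squared = sum((a[item] - b[item]) ** 2 for item in common)
    return sqrt(squared / len(common))


# With only "Tron" rated by both, the two users are at distance 0.
only_tron = user_distance({"Tron": 5}, {"Tron": 5})

# Adding the sequel, where they disagree by 2 stars, pushes them apart.
with_sequel = user_distance(jim, jane)
```

Returning `None` for users with no overlap makes the sparse-overlap case explicit: such a pair simply cannot be placed in a neighborhood.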
Item-item algorithm: Calculating the association between items
As a result, most recommender systems now rely on an "item-item" algorithm, which calculates the distance between two books, two movies, or two other items based on how similarly the users who rated both scored them. People who like Tom Clancy's books are likely to rate Clive Cussler's work highly, so Clancy's and Cussler's books end up in the same neighborhood. The distance between a pair of items may be computed from the ratings of millions of users and tends to stay relatively stable over time, so the recommender system can calculate distances in advance and generate recommendations faster. Both Amazon and Netflix have said publicly that they use variants of the item-item algorithm, but neither has disclosed the details.
The user-user and item-item algorithms share one problem: users rate inconsistently. When given a chance to rate the same item again, users often give it a different score. Tastes change, moods change, and impressions change. An MIT study in the 1990s found that a year after their initial ratings, users' scores had shifted by an average of 1 point on a 7-point scale. Researchers have been experimenting with different ways of building this variability into their models. For example, if a user's rating of a product does not fit with everything else the recommender knows about that person and that product, some algorithms invite the user to rate the product again.
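The idea of computing item-to-item relationships in advance and then answering requests with a lookup might look roughly like the sketch below. The ratings are invented, the item labels are just author names used as stand-ins, and cosine similarity over co-rating users is an assumed measure (a similarity rather than a distance); neither Amazon nor Netflix has published its actual formula.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical 1-5 star ratings; users and scores are made up for illustration.
ratings = {
    "u1": {"Clancy": 5, "Cussler": 5, "Austen": 1},
    "u2": {"Clancy": 4, "Cussler": 5},
    "u3": {"Clancy": 1, "Austen": 5},
    "u4": {"Cussler": 2, "Austen": 4},
}


def item_vectors(user_ratings):
    """Invert user -> {item: score} into item -> {user: score}."""
    items = defaultdict(dict)
    for user, scores in user_ratings.items():
        for item, score in scores.items():
            items[item][user] = score
    return items


def cosine(a, b):
    """Cosine similarity restricted to users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def build_similarity_table(user_ratings):
    """Offline step: score every item pair once and store the results."""
    items = item_vectors(user_ratings)
    return {
        i: {j: cosine(items[i], items[j]) for j in items if j != i}
        for i in items
    }


SIMILARITY = build_similarity_table(ratings)


def most_similar(item, n=1):
    """Online step: a table lookup that never touches the raw ratings."""
    ranked = sorted(SIMILARITY[item].items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]
```

Because `SIMILARITY` is built once, `most_similar("Clancy")` costs only a sort over one row of the table, which is the property that lets item-item systems respond quickly.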
Dimensionality reduction algorithm: Generalizing the characteristics of things
However, the user-user and item-item algorithms have a bigger problem than consistency: they are too rigid. That is, they can find people who like the same thing, but they miss potential groupings of users whose tastes are merely very similar. Say you like Monet's water lilies. Of the 250 water lily paintings by the French Impressionist master, which one do you like best? In a group of people who like Monet, each person may well favor a different water lily painting, and the basic algorithms may fail to recognize that these people share a taste.
About 10 years ago, researchers came up with a way to represent items more generally through a process called dimensionality reduction. This method is far more computationally intensive than the user-user and item-item algorithms, so it was slower to be adopted. But as computers have become faster and cheaper, dimensionality reduction has gradually gained ground.
To see how dimensionality reduction works, consider the foods you love and how your tastes compare with those of 1 million other people. You can represent this information in a giant matrix in which each column stands for one food and each person's ratings form a row. Your row might show that you gave grilled steak 5 stars, braised short ribs 4 stars, grilled chicken wings 2 stars, frozen tofu rolls 1 star, cheese-grilled mushrooms 5 stars, salted soybeans 4 stars, and so on.
However, a recommender that uses this matrix does not really care how many stars you gave to any particular food. What it wants to know is your general preferences, so that it can apply this information to a much wider variety of foods. For example, from the ratings above, the algorithm might conclude that you like beef, salty things, and baked dishes, dislike chicken and anything fried, neither like nor dislike vegetables, and so on. The number of such traits, or dimensions, needed to describe the foods you like is far smaller than the number of possible foods: perhaps 50 or 100 at most. By checking these dimensions, the recommender can quickly determine whether you would like a new food (say, salted ribs) by comparing the food's dimensions (salty, beef, not chicken, not fried, not a vegetable, not baked) with your profile. This more general representation lets the recommender find users whose preferences are similar but not identical. It also greatly compresses the matrix, making the algorithm more efficient.
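Comparing a food's dimensions with a person's profile can be pictured as a weighted sum. The sketch below uses the six dimensions named in the text; the numeric weights, the second food, and the function name `affinity` are all invented for illustration, and a real system would learn abstract dimensions rather than use hand-labeled ones.

```python
# Dimensions named in the text; every numeric value below is an assumption.
DIMENSIONS = ["salty", "beef", "chicken", "fried", "vegetable", "baked"]

# Hypothetical profile: positive = likes, negative = dislikes, 0 = neutral.
user_profile = {
    "salty": 0.8, "beef": 0.9, "chicken": -0.6,
    "fried": -0.7, "vegetable": 0.0, "baked": 0.5,
}

# Hypothetical food descriptions: 1 = has the trait, 0 = does not.
foods = {
    # Matches the text: salty, beef, not chicken, not fried, not vegetable, not baked.
    "salted ribs": {"salty": 1, "beef": 1, "chicken": 0,
                    "fried": 0, "vegetable": 0, "baked": 0},
    # Invented contrast case.
    "fried chicken wings": {"salty": 1, "beef": 0, "chicken": 1,
                            "fried": 1, "vegetable": 0, "baked": 0},
}


def affinity(profile, food):
    """Dot product of the user's taste weights and the food's traits."""
    return sum(profile[d] * food[d] for d in DIMENSIONS)


# Score every food against the profile and rank from best to worst match.
scores = {name: affinity(user_profile, traits) for name, traits in foods.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Each prediction touches only six numbers per food, however many foods or users exist, which is where the efficiency gain comes from.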
This is a neat solution. But where do the dimensions of your food preferences come from? Certainly not from asking a chef. Recommender systems use a mathematical technique called singular value decomposition to compute the dimensions. It decomposes the original giant matrix into two "taste matrices", one relating all the users to 100 taste dimensions and the other relating all the foods to the same 100 taste dimensions, plus a third matrix; multiplying the three together approximately re-creates the original matrix (* revised here).
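On a toy scale, the decomposition can be tried with NumPy's `numpy.linalg.svd`. This is a minimal sketch, not how a production recommender works: the first row reuses the six ratings from the food example, the other rows are invented, only 2 taste dimensions are kept instead of 100, and the matrix is assumed to have no missing ratings, which real rating data never satisfies. The helper name `truncated_svd` is hypothetical.

```python
import numpy as np

# Rows are people, columns are foods. Row 0 follows the example in the text;
# the remaining rows are made up for illustration.
ratings = np.array([
    [5.0, 4.0, 2.0, 1.0, 5.0, 4.0],
    [4.0, 5.0, 1.0, 2.0, 4.0, 5.0],
    [1.0, 2.0, 5.0, 4.0, 1.0, 2.0],
    [2.0, 1.0, 4.0, 5.0, 2.0, 1.0],
])


def truncated_svd(matrix, k):
    """Return the three factors, keeping only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], np.diag(s[:k]), vt[:k, :]


# users x k, k x k, and k x foods: the two "taste matrices" and the third matrix.
user_tastes, strengths, food_tastes = truncated_svd(ratings, k=2)

# Multiplying the three factors approximately re-creates the original ratings.
approx = user_tastes @ strengths @ food_tastes
error = np.linalg.norm(ratings - approx)
```

Raising `k` shrinks `error` at the cost of larger factors; keeping every singular value reproduces the matrix exactly, up to floating-point rounding.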
Unlike in the example above, the computed dimensions are neither descriptive nor intuitive; they are purely abstract values. That is fine, as long as those values end up producing accurate recommendations. The main drawback of this approach is that the time needed to build the matrices grows rapidly with the number of customers and products: building them for 250 million customers and 10 million products takes 1 billion times as long as building them for 250,000 customers and 10,000 products. And the process has to be repeated often. As soon as a new rating arrives, the matrices are out of date, and at a company like Amazon new ratings arrive every second. Fortunately, the matrices still work quite well even when slightly out of date. Researchers have also been designing new algorithms that approximate singular value decomposition and significantly shorten the computation time.
Joseph A. Konstan and John Riedl are both professors of computer science at the University of Minnesota. Konstan, an IEEE Senior Member, and Riedl, an IEEE Fellow, helped create the MovieLens recommender system. In the next article, the two authors will go on to describe what recommender algorithms will never recommend to you.
Correction note: As first published, the passage on singular value decomposition in the second-to-last paragraph of this article contained an error. It has been corrected. (2012-11-13)
How does "Guess What You Like" guess what is on your mind?