Summary of the main recommendation system algorithms and an example of YouTube's deep learning recommendation algorithm

Source: Internet
Author: User
By Zhuzhibosmith, July 09, 2017 17:00

Nowadays, many companies use big data to make highly relevant recommendations and increase revenue. Among the many recommendation algorithms available, data scientists need to choose the best one based on business constraints and requirements. To make this simpler, the Statsbot team has prepared an overview of the major existing recommendation system algorithms.

Collaborative filtering

Collaborative filtering (CF) and its variants are among the most commonly used recommendation algorithms. Even data science beginners can use it to build their own personalized movie recommendation system, for example as a resume project.

When we want to recommend something to a user, the most reasonable thing to do is to find people with similar interests, analyze their behavior, and recommend the same items. Alternatively, we can look at items similar to the ones the user bought before and recommend those.

Collaborative filtering (CF) has two basic approaches: user-based collaborative filtering and item-based collaborative filtering.

In either case, the recommendation algorithm has two steps:

1. Find how many users/items in the database are similar to the target user/item.

2. Assess the other users/items to predict the rating the target user would give to a product, using a weighted total in which users/items more similar to the target count for more.

What does "most similar" mean in this algorithm?

All we have is a preference vector for each user (a row of matrix R) and a vector of user ratings for each product (a column of matrix R).

First, we keep only the elements whose values are known in both vectors.

For example, if we want to compare Bill and Jane, and we know that Bill hasn't seen Titanic and Jane hasn't seen Batman, then we can measure their similarity only by Star Wars. How could anyone not have watched Star Wars, right?

The most popular ways to measure similarity are the cosine similarity or the correlation between user/item vectors. The final step is to fill the empty cells in the table with a weighted arithmetic mean of the known ratings, weighted by similarity.
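The user-based variant described above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the ratings, the extra user Tom, and the function names are all hypothetical, and each user's row of R is stored as a dictionary so that missing keys stand for unknown ratings.

```python
from math import sqrt

# Hypothetical ratings: one dict per user (a row of R); missing keys are unknown.
ratings = {
    "Bill": {"Batman": 4, "Star Wars": 5},
    "Jane": {"Titanic": 5, "Star Wars": 2},
    "Tom": {"Batman": 5, "Star Wars": 4, "Titanic": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed only over items rated by both users."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

# Bill and Jane share only Star Wars, so their similarity rests on one movie.
print(cosine_similarity(ratings["Bill"], ratings["Jane"]))
print(predict_rating(ratings, "Bill", "Titanic"))
```

Note that with a single movie in common, the cosine similarity of two positive ratings is always 1, which is one reason correlation or a minimum-overlap threshold is often preferred in practice.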

Matrix factorization for recommendation

Another interesting approach uses matrix factorization. It is an elegant recommendation algorithm because, when we factorize a matrix, we usually don't think much about what the rows and columns of the resulting matrices mean. In a recommender, however, the meaning is clear: u is the vector of interests of the i-th user, and v is the vector of parameters of the j-th movie.

So we can approximate x (the rating of the i-th user for the j-th movie) by the dot product of u and v. We build these vectors from known ratings and use them to predict unknown ratings.

For instance, suppose that after matrix factorization we obtain Ted's vector (1.4; 0.9) and movie A's vector (1.4; 0.8). We can then restore Ted's rating for movie A just by computing the dot product of (1.4; 0.9) and (1.4; 0.8), which gives 1.4 × 1.4 + 0.9 × 0.8 = 2.68.
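As a rough sketch of the idea, the snippet below reproduces the Ted example and then learns user and movie vectors from known ratings. Stochastic gradient descent is one common way to fit such vectors and is an assumption here; the source does not say which method is used. The function name `factorize`, its hyperparameters, and the sample ratings are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The worked example from the text: Ted's vector and movie A's vector.
ted = [1.4, 0.9]
movie_a = [1.4, 0.8]
print(round(dot(ted, movie_a), 2))  # 1.4*1.4 + 0.9*0.8 = 2.68

def factorize(known, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit U (users x k) and V (items x k) so that dot(U[i], V[j]) ~ x_ij.

    `known` is a list of (user_index, item_index, rating) triples.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for i, j, x in known:
            err = x - dot(U[i], V[j])
            for f in range(k):
                u_f, v_f = U[i][f], V[j][f]
                # Gradient step on squared error with L2 regularization.
                U[i][f] += lr * (err * v_f - reg * u_f)
                V[j][f] += lr * (err * u_f - reg * v_f)
    return U, V

# Hypothetical known ratings; the cell (user 0, item 2) is unknown.
known = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
U, V = factorize(known, n_users=3, n_items=3)
print(dot(U[0], V[2]))  # predicted rating for the unknown cell
```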

Clustering

The previous recommendation algorithms are fairly simple and are suited to small systems. Until now, we have also treated the recommendation problem as a supervised machine learning task. It is time to apply unsupervised methods to the problem.

Imagine we are building a large recommendation system in which collaborative filtering and matrix factorization would take too long. The first idea is clustering.

At the start of a business, there are often few previous user ratings, and clustering is then the best approach.

But used alone, clustering is a bit weak, because what we are actually doing is identifying user groups and recommending the same items to every user in a group. When we have enough data, it is better to use clustering as a first step, to narrow down the selection of relevant neighbors in collaborative filtering algorithms. It can also improve the performance of complex recommendation systems.

Each cluster is assigned representative preferences based on the preferences of the users who belong to it. Users in each cluster receive recommendations computed at the cluster level.
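The cluster-level step can be sketched as below. This assumes the cluster assignments already exist (for example from k-means, which is not shown), and it takes the per-item mean of member ratings as the "representative preference", which is one simple choice among several. All names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical ratings and cluster labels (as if produced by a clustering step).
ratings = {
    "Ann": {"Batman": 5, "Star Wars": 4},
    "Bob": {"Batman": 4, "Star Wars": 5, "Titanic": 2, "Alien": 5},
    "Cat": {"Titanic": 5},
    "Dan": {"Titanic": 4, "Batman": 1},
}
cluster_of = {"Ann": 0, "Bob": 0, "Cat": 1, "Dan": 1}

def cluster_preferences(ratings, cluster_of):
    """Average the known ratings of each cluster's members, item by item."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user, user_ratings in ratings.items():
        c = cluster_of[user]
        for item, r in user_ratings.items():
            sums[c][item] += r
            counts[c][item] += 1
    return {
        c: {item: sums[c][item] / counts[c][item] for item in sums[c]}
        for c in sums
    }

def recommend(user, ratings, cluster_of, prefs, n=1):
    """Top-n items of the user's cluster that the user has not rated yet."""
    seen = ratings[user]
    candidates = {
        item: score
        for item, score in prefs[cluster_of[user]].items()
        if item not in seen
    }
    # Sort by score (descending), breaking ties by item name.
    return sorted(candidates, key=lambda item: (-candidates[item], item))[:n]

prefs = cluster_preferences(ratings, cluster_of)
print(recommend("Ann", ratings, cluster_of, prefs))
```

Every member of a cluster is scored against the same preference table, which is what makes this cheap and also what makes it weakly personalized.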

Deep learning approach for recommendation systems

Over the past decade, neural networks have taken a huge leap forward. They are now used in a wide variety of applications and are gradually replacing traditional machine learning methods. Below I will show how the deep learning approach is used at YouTube.

There is no doubt that building a recommender system for such a service is a challenging task because of its scale, its constantly changing corpus, and a variety of unobservable external factors.

According to the paper "Deep Neural Networks for YouTube Recommendations", the YouTube recommendation system consists of two neural networks: one for candidate generation and one for ranking. If you don't have enough time to read it, here is a brief summary.

Taking the user's history as input, the candidate generation network significantly reduces the number of videos, selecting a set of the most relevant ones from a large corpus. The generated candidates are those most relevant to the user, and the purpose of this network is only to provide broad personalization via collaborative filtering.

At this step, we have a smaller number of candidates that are closer to the user's needs. Our goal now is to analyze all the candidates carefully so that we can make the best decision. This task is done by the ranking network, which assigns a score to each video according to a desired objective function, using data that describes the video and information about the user's behavior.

With this two-stage approach, we can make video recommendations from a very large corpus while still being confident that the small number of videos finally shown is personalized and engaging for the user. This design also allows us to blend in candidates generated by other sources.

The recommendation task is posed as an extreme multiclass classification problem. The prediction problem becomes accurately classifying a specific video watch (w_t) at a given time t among millions of video classes (i) from the corpus (V), based on the user (U) and context (C).
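The shape of the two-stage pipeline can be illustrated with the sketch below. It is not YouTube's implementation: both neural networks are replaced by simple stand-ins (a dot-product retrieval for candidate generation and a linear scorer for ranking), and every vector, feature name, and weight is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generate_candidates(user_vec, video_vecs, n):
    """Stage 1: cheap retrieval, keep the n videos that best match the user."""
    ordered = sorted(video_vecs, key=lambda vid: (-dot(user_vec, video_vecs[vid]), vid))
    return ordered[:n]

def rank(candidates, features, weights):
    """Stage 2: score only the candidates with a richer (here linear) model."""
    def score(vid):
        return sum(weights[name] * value for name, value in features[vid].items())
    return sorted(candidates, key=lambda vid: (-score(vid), vid))

# Hypothetical user vector and video vectors for stage 1.
user_vec = [1.0, 0.2]
video_vecs = {
    "v1": [0.9, 0.1],
    "v2": [0.1, 0.9],
    "v3": [0.8, 0.3],
    "v4": [0.2, 0.1],
    "v5": [0.7, 0.0],
}

# Hypothetical per-video features and weights for stage 2.
features = {
    "v1": {"freshness": 0.1, "watch_rate": 0.5},
    "v2": {"freshness": 0.4, "watch_rate": 0.9},
    "v3": {"freshness": 0.9, "watch_rate": 0.6},
    "v4": {"freshness": 0.3, "watch_rate": 0.1},
    "v5": {"freshness": 0.5, "watch_rate": 0.2},
}
weights = {"freshness": 1.0, "watch_rate": 2.0}

candidates = generate_candidates(user_vec, video_vecs, n=3)
print(candidates)                           # the shortlist from stage 1
print(rank(candidates, features, weights))  # the final order from stage 2
```

The point of the split is cost: the expensive scorer runs only on the shortlist, so it can afford many more features per video than the retrieval step.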
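In the YouTube paper this classification is expressed as a softmax over the corpus: the probability that the watch w_t is video i is proportional to the exponential of the dot product between a user/context embedding u and the video embedding v_i. The sketch below computes that distribution for a toy corpus; the embeddings are hypothetical, and a real system would approximate the softmax rather than sum over millions of videos.

```python
from math import exp

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def watch_probabilities(user_vec, video_vecs):
    """Softmax over all videos: P(w_t = i | U, C) proportional to exp(v_i . u)."""
    logits = {vid: dot(user_vec, v) for vid, v in video_vecs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {vid: exp(z - m) for vid, z in logits.items()}
    total = sum(exps.values())
    return {vid: e / total for vid, e in exps.items()}

# Hypothetical embedding of the user and context, and a three-video corpus.
u = [1.0, 0.5]
corpus = {"v1": [2.0, 0.0], "v2": [0.0, 1.0], "v3": [1.0, 1.0]}

probs = watch_probabilities(u, corpus)
print(max(probs, key=probs.get))  # the most likely video class
```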

The key points to note before creating your own recommendation system:

1. If you have a large database and you want to use it for online recommendations, the best approach is to split the problem into two subproblems: first select the top N candidates, then rank them.

2. How do you measure the quality of your model? In addition to the standard quality metrics, there are metrics specific to recommendation problems, such as Recall@k and Precision@k. It is also worth looking up a good description of metrics for recommender systems.

3. If you are solving the recommendation problem with a classification algorithm, you should think about how to generate negative samples. If a user buys a recommended item, you should not simply add it as a positive sample and treat all the other items as negative samples.

4. Consider both the online and the offline scores of your algorithm's quality. A model trained only on historical data can produce primitive recommendations, because the algorithm does not know about new trends and preferences.
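The two metrics named above are straightforward to compute. The sketch below uses the usual definitions (hits in the top k divided by k, and hits in the top k divided by the number of relevant items); the function names and sample data are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical ranked recommendations and the items the user actually liked.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}

print(precision_at_k(recommended, relevant, k=4))  # 2 hits out of 4 -> 0.5
print(recall_at_k(recommended, relevant, k=4))     # 2 hits out of 3 relevant
```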
