Collaborative Filtering with Temporal Dynamics ------ Yehuda Koren


The preferences of individual users change over time; we call this concept drift. A major task of a recommender system is to separate these transient effects from the features that reflect users' long-term preferences. Concept drift includes community-wide changes driven by the emergence of new items or services, special holidays, or seasonal changes; these affect everyone. It also includes individual changes (changes in family structure, a user maturing and shifting preferences in movies and commodities), which cannot be captured by community-level models. Therefore, we aim to build a concept-drift model for each user that tracks behavioral change along the timeline.

Building a model for each user inevitably means each model rests on a small amount of behavior data (only that user's ratings). It is therefore inappropriate to simply discard old ratings or assign them low weight; instead, one needs to extract the long-term, persistent signal from the entire rating history while filtering out noise.

Taking a movie system as an example: a user who once used a rating of "3" for movies she neither liked nor disliked may now use the same "3" for movies she dislikes. In addition, many systems cannot distinguish among multiple users sharing a single device or account. A simple workaround might be to split such a profile by time.

Netflix dataset: collected from December 31, 1999 to December 31, 2005; over 100 million ratings given by about 480,000 users to 17,770 movies. Each movie received about 5,600 ratings on average, and each user gave about 208 ratings on average. Two temporal effects appear in the Netflix data: (1) the average rating jumped from about 3.4 to about 3.6; (2) ratings rise as movies age (old movies get higher scores than new movies).

RMSE is used as the evaluation metric: even small improvements in RMSE yield large improvements in the top-N quality of a recommender system.

Approaches to the concept-drift problem:

1. Instance selection: use a time window. The problem is that only instances inside the window are considered (and all of them are weighted equally), while all instances outside the window are discarded.

2. Instance weighting: estimate each instance's relevance to the current concept and weight it accordingly, typically using a time-decay function that down-weights older instances.

3. Ensemble learning: combine many predictors, weighting each predictor by its relevance to the present (predictors that were more successful on recent instances receive higher weights).
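The decay-function idea behind approach 2 can be sketched as follows; the half-life value is an assumed illustration, not a number from the paper:

```python
def decay_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential time-decay weight: a rating loses half its
    influence every half_life_days days (half-life is illustrative)."""
    return 0.5 ** (age_days / half_life_days)

# Today's rating keeps full weight; older ratings fade geometrically.
weights = [decay_weight(age) for age in (0, 90, 180)]
# weights == [1.0, 0.5, 0.25]
```

This is exactly the kind of uniform down-weighting the paper argues against: it suppresses old ratings even when they carry persistent signal.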

 

Principles:

1. A model is required that explains behavior changes across the entire timeline, not just the current behavior. This is the precondition for extracting signal at each point in time while removing noise.

2. Multiple kinds of concept drift need to be captured: user-dependent, item-dependent, gradual, and sudden.

3. It is essential to combine all these drifting concepts within a single framework. This allows modeling of interactions that cross users and items, thereby identifying higher-level patterns.

4. There is no need to extrapolate users' future preferences (which is difficult with sparse data); rather, the goal is to isolate the persistent signal from transient noise in the historical data.

 

4. Time-aware Factor Model

The baseline predictor (μ is the overall mean rating, b_u and b_i the user and item biases):

  b_ui = μ + b_u + b_i

The factor model:

  r̂_ui = μ + b_u + b_i + q_i^T p_u

Complete model (including implicit feedback), where R(u) is the set of items rated by user u:

  r̂_ui = μ + b_u + b_i + q_i^T ( p_u + |R(u)|^{-1/2} · Σ_{j∈R(u)} y_j )

We would not expect significant temporal variation of the item characteristics (the q_i). More specifically, we identify the following effects: (1) user biases (b_u) change over time; (2) item biases (b_i) change over time; (3) user preferences (p_u) change over time.

4.2 Time-changing Baseline Predictors

The two main temporal effects associated with the baseline predictors are: (1) an item's popularity changes over time; (2) a user's average rating changes over time. The new time-aware baseline predictor, where t_ui is the date of user u's rating of item i:

  b_ui = μ + b_u(t_ui) + b_i(t_ui)

First, consider b_i(t). Since we do not expect a movie's bias to change much over short periods (unlike a user's), we split the entire timeline into bins (smaller bins give finer temporal resolution, while larger bins put more ratings into each bin). Using 10 weeks per bin gives 30 bins in total; each date t is assigned a bin Bin(t) (an integer between 1 and 30), so that b_i(t) splits into a static part and a time-changing part:

  b_i(t) = b_i + b_{i,Bin(t)}

For b_u(t), we need to detect temporal effects on much shorter time scales (user bias can change substantially within a short period); moreover, we do not expect to bin users the way we bin items, because each user has far less rating data.

First, define a deviation function measuring the signed, sub-linearly scaled distance between the rating date t and the user's mean rating date t_u (β = 0.4 in this paper):

  dev_u(t) = sign(t − t_u) · |t − t_u|^β
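A minimal sketch of this deviation function, with days as the time unit:

```python
def dev_u(t: float, t_mean: float, beta: float = 0.4) -> float:
    """dev_u(t) = sign(t - t_mean) * |t - t_mean| ** beta, where
    t_mean is user u's mean rating date (beta = 0.4 per the paper)."""
    d = t - t_mean
    return (1.0 if d >= 0 else -1.0) * abs(d) ** beta

# The deviation is antisymmetric around the user's mean rating date:
# dev_u(110, 100) == 10 ** 0.4 and dev_u(90, 100) == -(10 ** 0.4)
```

The exponent beta < 1 compresses large time gaps, so a rating a year away from the mean date does not dominate a rating a month away.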

Assigning an α_u to each user yields a simple linear model, in which the parameters b_u and α_u are learned:

  b_u^(1)(t) = b_u + α_u · dev_u(t)

In addition, a more flexible spline-based curve model is provided, where the t_l^u are k_u control points spread over the user's rating dates:

  b_u^(2)(t) = b_u + ( Σ_{l=1}^{k_u} e^{−γ|t−t_l^u|} · b_{t_l}^u ) / ( Σ_{l=1}^{k_u} e^{−γ|t−t_l^u|} )

These models can only capture gradual concept drift. In the Netflix dataset, however, some users tend to give many ratings with the same value on a single day, which can be described as the user's mood that day. To handle such short-lived effects, each user is given a per-day parameter b_{u,t} that absorbs day-specific variability.

Notice that in some applications the basic time unit can be shorter or longer than a day; e.g., the notion of a day could be exchanged for a notion of a user session.

In the Netflix dataset, each user rated on, on average, 40 distinct days, so the day-specific user bias requires about 40 parameters per user. Adding it to the model gives:

  b_u(t) = b_u + α_u · dev_u(t) + b_{u,t}

and the full time-aware baseline predictor becomes:

  b_ui = μ + b_u + α_u · dev_u(t_ui) + b_{u,t_ui} + b_i + b_{i,Bin(t_ui)}
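Putting the bins, the deviation term, and the day-specific bias together, a sketch of this full baseline predictor; all parameter values in the call below are illustrative stand-ins, not learned values:

```python
BETA = 0.4      # deviation exponent (paper)
BIN_DAYS = 70   # 10-week item-bias bins (paper)

def dev(t, t_mean, beta=BETA):
    d = t - t_mean
    return (1.0 if d >= 0 else -1.0) * abs(d) ** beta

def baseline(mu, b_u, alpha_u, b_ut, b_i, b_ibin, t, t_mean):
    """b_ui = mu + b_u + alpha_u*dev_u(t) + b_{u,t} + b_i + b_{i,Bin(t)}."""
    return (mu + b_u + alpha_u * dev(t, t_mean)
            + b_ut.get(t, 0.0)               # day-specific user bias
            + b_i + b_ibin[t // BIN_DAYS])   # static + binned item bias

# Illustrative call: rating on day 500, user's mean rating day 400.
r_hat = baseline(mu=3.6, b_u=0.1, alpha_u=0.02, b_ut={500: 0.3},
                 b_i=-0.2, b_ibin=[0.0] * 31, t=500, t_mean=400)
```

Days with no rating simply fall back to 0.0 for b_{u,t}, matching the idea that the day parameter only absorbs variability observed on rating days.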

The parameters are learned by stochastic gradient descent (20-30 iterations are required, λ = 0.01), minimizing the regularized squared error:

  min Σ_{(u,i)∈K} ( r_ui − μ − b_u − α_u·dev_u(t_ui) − b_{u,t_ui} − b_i − b_{i,Bin(t_ui)} )² + λ·( b_u² + α_u² + b_{u,t_ui}² + b_i² + b_{i,Bin(t_ui)}² )

For each rating, the prediction error e_ui drives updates of the form:

  b_u ← b_u + γ · (e_ui − λ·b_u)

with analogous updates for the other parameters (for α_u, the error term is additionally multiplied by dev_u(t_ui)).
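A sketch of one such gradient step for the additive bias terms; the learning rate gamma is an assumed value, while lambda = 0.01 comes from the paper:

```python
GAMMA, LAM = 0.005, 0.01  # learning rate (assumed), regularization (paper)

def sgd_step(params, error):
    """One SGD update for additive bias parameters: each theta moves
    along gamma * (error - lambda * theta), where error is the
    prediction residual r_ui - r_hat_ui."""
    return {name: theta + GAMMA * (error - LAM * theta)
            for name, theta in params.items()}

params = {"b_u": 0.0, "b_i": 0.0}
params = sgd_step(params, error=0.4)   # rating 4.0, prediction 3.6
# both biases nudged by 0.005 * 0.4 = 0.002
```

For parameters that multiply a feature (e.g. alpha_u, which multiplies dev_u(t)), the error term would additionally be scaled by that feature before applying the same update rule.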

A comparison of the different baseline predictors (table in the paper) shows that the temporal effects on user bias differ markedly from those on item bias, and that the day-specific variability b_{u,t} has the greatest impact.

 

Periodic effects: some items are more popular in specific seasons or around holidays; TV programs have different popularity at different times of day (dayparting); users may be more open to shopping on weekends. The solution is to add a periodic parameter to the item or user bias:

  b_i(t) = b_i + b_{i,Bin(t)} + b_{i,period(t)}

  b_u(t) = b_u + α_u · dev_u(t) + b_{u,t} + b_{u,period(t)}

(Since no periodic effects were found in the movie-rating data, these terms are not used in the paper.)

Because different users apply different rating scales, a multiplicative user factor c_u(t) is added to the item bias (bringing RMSE down to 0.9555):

  b_ui = μ + b_u + α_u · dev_u(t_ui) + b_{u,t_ui} + (b_i + b_{i,Bin(t_ui)}) · c_u(t_ui)

  c_u(t) = c_u + c_{u,t}

Here c_u is the static part and c_{u,t} absorbs day-specific variability.

Even without modeling any user-item interaction, the baseline predictors alone reach an RMSE of 0.9555, close to the 0.9514 achieved by Netflix's own Cinematch system.

4.3 Time-changing Factor Models

User preferences also change over time. Analogously to the time-changing user bias of formula (9), each component of the user factor vector p_u is modeled as (the parameters have essentially the same meaning as in formula (9)):

  p_uk(t) = p_uk + α_uk · dev_u(t) + p_{uk,t},  k = 1, …, f

Merging all of the above formulas yields the final model (timeSVD++, which converges after about 30 iterations):

  r̂_ui = μ + b_u(t_ui) + b_i(t_ui) + q_i^T ( p_u(t_ui) + |R(u)|^{-1/2} · Σ_{j∈R(u)} y_j )
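A hedged sketch of this prediction rule using NumPy; the factor dimensionality and all vectors below are illustrative, not trained values:

```python
import numpy as np

def timesvdpp_predict(mu, b_u_t, b_i_t, q_i, p_u_t, y_rated):
    """timeSVD++ rule:
    r_hat = mu + b_u(t) + b_i(t)
          + q_i . (p_u(t) + |R(u)|**-0.5 * sum_{j in R(u)} y_j)"""
    implicit = y_rated.sum(axis=0) / np.sqrt(len(y_rated))
    return mu + b_u_t + b_i_t + q_i @ (p_u_t + implicit)

# Illustrative call with f = 2 factors and |R(u)| = 4 rated items.
r_hat = timesvdpp_predict(3.6, 0.1, -0.2,
                          q_i=np.array([0.5, -0.5]),
                          p_u_t=np.array([0.2, 0.1]),
                          y_rated=np.zeros((4, 2)))
# with y_rated all zero this reduces to mu + b_u + b_i + q_i . p_u
```

The time dependence enters only through the scalar biases b_u(t), b_i(t) and the vector p_u(t), which the caller evaluates at the rating date; the item factors q_i and implicit factors y_j stay static, as in the paper.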

Comparing the SVD, SVD++, and timeSVD++ algorithms (table in the paper), timeSVD++ achieves the lowest RMSE at every factor dimensionality, showing the value of modeling temporal dynamics.

5. Temporal Dynamics in Neighborhood Models

This part extends the model to item-item similarities that change over time; the details are omitted here for now.

 
