Matrix decomposition with bias

Source: Internet
Author: User

I. The basic concept

Basic matrix decomposition models the interaction between users and items through their feature vectors. The user's feature vector represents the user's interests, the item's feature vector represents the item's characteristics, the dimensions of the two vectors correspond one to one, and their inner product expresses how much the user prefers the item. However, much of the variation in the ratings we observe comes from factors that have nothing to do with this interaction: they depend only on the user or only on the item. For example, an optimistic user tends to give high ratings across the board, while a critical user tends to give low ones. Even if the two give the same item the same score, they do not have the same preference for it. The same holds for items. Taking films as an example, popular films generally receive high ratings and bad films generally receive low ones. These effects belong to the user alone or to the item alone and are unrelated to the user's preference for the specific item.

II. The model

We call these factors, which depend only on the user or only on the item, the bias part, and we call the interaction between user and item, that is, the user's preference for the item, the personalization part. In matrix decomposition models, the bias part actually contributes much more to rating prediction accuracy than the personalization part. Taking the Netflix Prize data set as an example, Yehuda Koren found that the bias part alone reduces the rating error by 32%, and adding the personalization part brings the reduction to 42%. In other words, only 10 percentage points come from the personalization part, which shows how important the bias part is. Koren describes the remaining 58% of the error as unexplained by the model; it includes data noise and other factors.

The bias part is expressed as

$$b_{ui} = \mu + b_u + b_i$$

The bias part consists of three components:

    • The global average μ of all rating records in the training set represents the overall rating level of the training data. It is a constant for a fixed data set.
    • The user bias b_u, a factor independent of item features, captures the rating habits of a particular user. For example, critical users tend to give low ratings overall, while optimistic users are more generous and give higher ratings overall.
    • The item bias b_i, a factor independent of user interest, captures how a particular item tends to be rated. Taking films as an example, a good film receives high ratings overall, while a bad film generally receives low ones, and the item bias captures this characteristic.
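To make the three components concrete, here is a minimal sketch that estimates them from a handful of ratings by simple averaging. This is an illustrative shortcut, not the learned biases that the model obtains through optimization; the function name `baseline_biases` and the sample data are hypothetical.

```python
from collections import defaultdict


def baseline_biases(ratings):
    """Estimate mu, b_u and b_i by averaging (an assumption, not the trained values).

    ratings: list of (user, item, rating) triples.
    """
    # Global average over all rating records.
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Collect each rating's deviation from the global average,
    # grouped by user and by item.
    user_dev = defaultdict(list)
    item_dev = defaultdict(list)
    for u, i, r in ratings:
        user_dev[u].append(r - mu)
        item_dev[i].append(r - mu)

    # A user's (item's) bias is the mean deviation of its ratings.
    b_u = {u: sum(d) / len(d) for u, d in user_dev.items()}
    b_i = {i: sum(d) / len(d) for i, d in item_dev.items()}
    return mu, b_u, b_i


# Hypothetical data: alice rates generously, bob rates critically.
ratings = [
    ("alice", "film_a", 5), ("alice", "film_b", 4),
    ("bob", "film_a", 3), ("bob", "film_b", 2),
]
mu, b_u, b_i = baseline_biases(ratings)
```

In this toy data set the global average is 3.5, alice's bias is +1.0 and bob's is -1.0, so the two users differ in rating level even though both rank film_a one point above film_b.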

None of these biases depends on the user's preference for the item. We take the bias part as a baseline prediction and add the user preference information, that is, the personalization part, on top of it. The full rating prediction formula is:

$$\hat{r}_{ui} = \underbrace{\mu + b_u + b_i}_{\text{bias}} + q_i^{T} p_u$$

where p_u and q_i are the feature vectors of user u and item i. The last term is the personalization part of the model and the part under the brace is the bias part; adding the two gives the final predicted score.
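As a small illustration, the prediction can be sketched as a function that adds the bias part and the personalization part. The function name `predict` and all numeric values are made up for the example.

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """Predicted rating = bias part + personalization part."""
    # Bias part: global average plus user bias plus item bias.
    bias = mu + b_u + b_i
    # Personalization part: inner product of the user and item feature vectors.
    personalization = sum(p * q for p, q in zip(p_u, q_i))
    return bias + personalization


# Hypothetical values: a slightly generous user, a slightly weak item,
# and two-dimensional feature vectors.
score = predict(mu=3.5, b_u=0.5, b_i=-0.25, p_u=[1.0, 0.5], q_i=[0.5, 1.0])
```

Here the bias part is 3.75 and the inner product is 1.0, so the predicted score is 4.75.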

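During model training, the squared error is used as the loss function. Over the set K of observed ratings r_ui, and with the L2 regularization term (weight λ) that is usually added to limit overfitting, the objective can be written as:

$$\min_{P, Q, b} \sum_{(u,i) \in K} \left( r_{ui} - \mu - b_u - b_i - q_i^{T} p_u \right)^2 + \lambda \left( \|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2 \right)$$

Optimizing this function yields the user feature matrix P (with rows p_u), the item feature matrix Q (with rows q_i), the user biases b_u and the item biases b_i. As in basic matrix decomposition, the optimization can use alternating least squares or stochastic gradient descent.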
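The stochastic gradient descent option can be sketched as follows. This is a minimal pure-Python illustration, not a tuned implementation: the function names, the learning rate `lr`, the L2 regularization weight `reg`, the number of factors and the sample ratings are all assumptions chosen for the example.

```python
import random


def train_biased_mf(ratings, n_factors=2, lr=0.02, reg=0.02, n_epochs=300, seed=0):
    """Fit mu, b_u, b_i, P and Q with stochastic gradient descent."""
    rng = random.Random(seed)
    ratings = list(ratings)  # copy so shuffling does not touch the caller's list
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})

    # Biases start at zero, feature vectors at small random values.
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        rng.shuffle(ratings)
        for u, i, r in ratings:
            pred = mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            # Step against the gradient of the regularized squared error.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return mu, b_u, b_i, P, Q


def predict(model, u, i):
    """Bias part plus inner product of the user and item feature vectors."""
    mu, b_u, b_i, P, Q = model
    return mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(ratings, score):
    """Root mean squared error of score(u, i) against the given ratings."""
    return (sum((r - score(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Hypothetical ratings: (user, item, rating).
ratings = [
    ("alice", "film_a", 5), ("alice", "film_b", 4),
    ("bob", "film_a", 3), ("bob", "film_b", 2),
    ("carol", "film_a", 4),
]
model = train_biased_mf(ratings)
baseline_rmse = rmse(ratings, lambda u, i: model[0])  # global average only
trained_rmse = rmse(ratings, lambda u, i: predict(model, u, i))
```

Comparing `trained_rmse` with `baseline_rmse` shows how much of the training error the biases and feature vectors remove relative to predicting the global average alone. On real data the comparison should be made on held-out ratings.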

III. Summary

Data set bias: the overall rating level of the data.

User bias: the user's rating habits.

Item bias: bad items score low, good items score high.

The bias plus the inner product of P and Q gives the predicted score.
