Recommender Systems (1): Splitting Approaches for Context-Aware Recommendation

Source: Internet
Author: User

Opening words: I first came into contact with recommender systems as a freshman, under the guidance of my lab advisor and senior labmates. Time flies, and in the blink of an eye I am a junior. Because the junior year has very few classes, I now have time to summarize what I have learned. This first blog post is dedicated to the teachers and seniors who have carried me along over the past three years. Thank you for your selfless help and teaching!

Collaborative filtering algorithm:

A traditional collaborative filtering algorithm makes recommendations from a user-item rating matrix. The core idea is to use information from similar users or similar items. Take Figure 1 as an example, where we want to predict Xiao Hong's rating (1-5) for the movie "The Great Gatsby". One of the simplest ideas is to look among all users for the people whose taste is most similar to Xiao Hong's. At a glance, Xiao Ming's taste is the most similar to hers (are they both Leonardo DiCaprio fans? Do they both like sentimental movies?). We can therefore predict that Xiao Hong's rating for "The Great Gatsby" should be fairly high, and recommend the film to her. An algorithm that uses the information of similar users in this way is called user-based collaborative filtering.

Its counterpart is item-based collaborative filtering. The idea there is roughly that a user should, to some extent, also like items that are similar to the items he or she already likes. Here, Xiao Hong likes "Titanic", and "Titanic" is quite similar to "The Great Gatsby", so Xiao Hong should also like "The Great Gatsby". This blog post gives a detailed introduction to the collaborative filtering algorithm.


(Figure 1)
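As a rough illustration of the user-based idea, here is a minimal sketch. The rating values, the third user "Xiao Gang", and the movie "Inception" are made up for the example (the numbers in Figure 1 are not reproduced here), and cosine similarity is only one of several possible similarity measures.

```python
import math

# Hypothetical ratings (1-5); NOT the values from Figure 1.
ratings = {
    "Xiao Hong": {"Titanic": 5, "Inception": 4},
    "Xiao Ming": {"Titanic": 5, "Inception": 4, "The Great Gatsby": 5},
    "Xiao Gang": {"Titanic": 1, "Inception": 5, "The Great Gatsby": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items that both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of the ratings other users gave to item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None

print(predict("Xiao Hong", "The Great Gatsby"))
```

Because Xiao Ming's ratings match Xiao Hong's most closely, his rating of "The Great Gatsby" carries the most weight in the prediction.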

To lead into what follows, here is a short and rather plain little story:
Company A uses a collaborative filtering algorithm to generate personalized recommendations for each of its users, and its revenue grows day by day. As its data becomes richer, Company A holds not only users' ratings of items but also a variety of contextual information (for example, when Xiao Hong watched a movie, whom she watched it with, what mood she was in afterwards, and so on). The boss wants to make use of this contextual information. But as mentioned above, traditional collaborative filtering makes recommendations from users' ratings of items alone. So is there a way for Company A to use the contextual information without replacing its existing algorithm?

Splitting:

Categories of splitting:
Splitting methods come in three kinds: item splitting, user splitting, and UI splitting (splitting both users and items).

The basic idea of splitting:
For a thing X, if X changes a great deal when the environment (context) changes, we should regard X as a different thing in each context, and therefore separate these versions and treat them as distinct.

How splitting works:
We use item splitting as the illustration. Literally, item splitting means dividing one item (product) into several. Since we are going to divide it, three questions arise: what do we split on, how do we split, and how many parts do we split into?


Let's start with the first two questions: what do we split on, and how? Thinking back to the little story, the natural answer is to use the contextual information to split the item. Consider the two matrices in Figure 2 and Figure 3:
(Figure 2)


(Figure 3)

Figure 3 holds the users' ratings of "Titanic", and Figure 2 holds contextual information about how each user watched the movie. The idea of item splitting is to tentatively split an item on one context condition into two rating vectors. If the two vectors differ significantly, the original item should be split. Suppose that, among the context conditions above, "with a partner or not" makes the two halves of the "Titanic" column differ significantly. Then the column should be split into two vectors, as shown in Figure 4:


(Figure 4)


In Figure 4, the "Titanic" column has been split on the context "with a partner or not" into two different vectors (with a partner, not with a partner). Item splitting then treats these two vectors as two different movies.
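The split of a single column can be sketched in a few lines. The user ids, ratings, and context flags below are invented stand-ins for Figures 2 and 3, and `split_item` is a hypothetical helper name, not something taken from paper [1].

```python
# Hypothetical data: (user, rating of Titanic, watched with a partner?).
titanic_ratings = [
    ("u1", 5, True),
    ("u2", 4, True),
    ("u3", 5, True),
    ("u4", 2, False),
    ("u5", 1, False),
    ("u6", 2, False),
]

def split_item(item_name, rows, context_name):
    """Split one item's ratings into two virtual items by a boolean context."""
    in_context = {user: rating for user, rating, flag in rows if flag}
    out_of_context = {user: rating for user, rating, flag in rows if not flag}
    return {
        f"{item_name} ({context_name})": in_context,
        f"{item_name} (not {context_name})": out_of_context,
    }

split = split_item("Titanic", titanic_ratings, "with partner")
for name, column in split.items():
    print(name, column)
```

Each resulting column is sparser than the original one, since every rating lands in exactly one of the two virtual items.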

This raises a new question: when a vector is split in two, how do we tell whether the two resulting vectors differ significantly? That is, how do we judge whether the column vectors "Titanic (with a partner)" and "Titanic (not with a partner)" are significantly different?

Paper [1] explains the whole splitting process in detail. One of the formulas it gives for judging whether two vectors differ significantly is:

(Figure 5)


The formula is a two-sample t-test. Provided the p-value meets the threshold (usually p <= 0.05), the larger the t value, the more the two vectors differ. Here μ_i is the item's mean rating, s_i is the variance of its ratings, and n_i is the number of non-zero ratings it has; the subscripts c and non-c denote the two contexts (with a partner and not with a partner).
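The image of the formula is not reproduced here. Assuming it is the usual unequal-variance two-sample t statistic, and writing it with the symbols just defined (mean μ, variance s, count n, with c̄ standing for "non-c"), it would read:

```latex
t = \frac{\left| \mu_{c} - \mu_{\bar{c}} \right|}
         {\sqrt{\dfrac{s_{c}}{n_{c}} + \dfrac{s_{\bar{c}}}{n_{\bar{c}}}}}
```

As a made-up worked example, suppose the ratings with a partner are (5, 4, 5) and the ratings without are (2, 1, 2). The means are about 4.67 and 1.67, both sample variances are about 0.33, and both counts are 3, so t ≈ 3 / sqrt(0.33/3 + 0.33/3) ≈ 3 / 0.471 ≈ 6.36.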

The pseudocode of the item splitting algorithm (from paper [1]) is shown in Figure 6:
(Figure 6)
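Since the pseudocode image is not available here, the following is only my own rough sketch of the selection step, not the algorithm from paper [1] verbatim: for one item, try every context condition, compute the t statistic of the two resulting rating vectors, and keep the condition with the largest t. The function names, the data layout, and the `min_t` cutoff (a simple stand-in for the p-value check) are all assumptions.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Unequal-variance two-sample t statistic; None if a sample is too small."""
    if len(a) < 2 or len(b) < 2:
        return None
    diff = abs(mean(a) - mean(b))
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both samples are constant: infinitely different unless the means match.
        return math.inf if diff > 0 else 0.0
    return diff / se

def best_split(item_rows, min_t=4.0):
    """item_rows: list of (rating, context_dict) for ONE item.
    Returns (factor, value, t) for the best condition, or None if no
    condition reaches the hypothetical cutoff min_t."""
    conditions = {(f, v) for _, ctx in item_rows for f, v in ctx.items()}
    best = None
    for factor, value in sorted(conditions):
        inside = [r for r, ctx in item_rows if ctx.get(factor) == value]
        outside = [r for r, ctx in item_rows if ctx.get(factor) != value]
        t = t_statistic(inside, outside)
        if t is not None and t >= min_t and (best is None or t > best[2]):
            best = (factor, value, t)
    return best

# Hypothetical ratings of one movie with two context factors.
rows = [
    (5, {"companion": "partner", "weather": "sunny"}),
    (4, {"companion": "partner", "weather": "rainy"}),
    (5, {"companion": "partner", "weather": "sunny"}),
    (2, {"companion": "alone", "weather": "sunny"}),
    (1, {"companion": "alone", "weather": "rainy"}),
    (2, {"companion": "alone", "weather": "sunny"}),
]
print(best_split(rows))
```

In this toy data the companion factor separates the ratings clearly while the weather factor does not, so only the companion condition passes the cutoff. The loop over every item and every condition is also where the running time comes from.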


After item splitting is run on the matrix in Figure 1, and assuming that the other two films have been split while "The Great Gatsby" has not, we obtain the matrix shown in Figure 7:
(Figure 7)
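One way to picture this transformation is as a renaming of items in the rating records. The sketch below assumes the split decisions have already been made; `apply_splits`, the record layout, and the sample ratings are all hypothetical.

```python
def apply_splits(records, splits):
    """records: list of (user, item, rating, context_dict).
    splits: {item: (factor, value)} for the items chosen to be split.
    Returns plain (user, item, rating) triples with split items renamed."""
    out = []
    for user, item, rating, ctx in records:
        if item in splits:
            factor, value = splits[item]
            if ctx.get(factor) == value:
                item = f"{item} [{factor}={value}]"
            else:
                item = f"{item} [{factor}!={value}]"
        out.append((user, item, rating))
    return out

# Hypothetical records; "The Great Gatsby" is left unsplit.
records = [
    ("Xiao Hong", "Titanic", 5, {"companion": "partner"}),
    ("Xiao Ming", "Titanic", 2, {"companion": "alone"}),
    ("Xiao Hong", "The Great Gatsby", 4, {"companion": "alone"}),
]
splits = {"Titanic": ("companion", "partner")}
triples = apply_splits(records, splits)
for triple in triples:
    print(triple)
```

The output contains no context at all, only users, (virtual) items, and ratings, which is why an unmodified collaborative filtering algorithm can consume it.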

Once the contextual information has been used to produce the matrix in Figure 7, a traditional collaborative filtering algorithm can be applied to it directly.

Finally, let's return to the remaining question from above. In the process described so far, each item is split into 2 parts: "in this context" and "not in this context". Could we split it into more parts? For example, using the weather context in Figure 2, each item could be split into 4 parts: sunny, rainy, cloudy, and snowy. This is possible, but paper [1] points out that such a split not only increases the time cost but also makes the rating matrix too sparse, and can cause overfitting as well.

Advantages and disadvantages of the splitting method, and some thoughts:
First, the advantage: splitting brings valuable contextual information into a traditional collaborative filtering algorithm and so improves recommendation quality.
Next, the disadvantages. From the pseudocode above we can see that when the context has many dimensions, and each dimension takes many values, the whole splitting process is very time-consuming, and the time spent may not improve recommendation quality much. Note that in describing the advantage I specifically said "valuable contextual information". But which contextual information is valuable? That depends on the business setting. Researchers who have studied context selection found that not all contextual information improves recommendation quality, and that introducing unsuitable context can actually reduce it. Deciding which context to split on therefore requires some understanding of the business background.

Here is a very good GitHub project on context-aware recommender systems: https://github.com/irecsys/CARSKit

Reference study materials:
[1] L. Baltrunas and F. Ricci. Experimental evaluation of context-dependent collaborative filtering using item splitting.
[2] Yong Zheng, Robin Burke, and Bamshad Mobasher. Splitting approaches for context-aware recommendation: An empirical study. 2014.







