Recommender System (1): Splitting Approaches for Context-Aware Recommendation

Source: Internet
Author: User

Opening words: When I was a freshman, under the guidance of my lab teacher and senior labmates, I started working with recommender systems. Time flies; in the blink of an eye I am a junior, and because junior year has very few classes, I have had time to summarize what I have learned.

This is my first blog post. It is dedicated to the teachers and seniors who have guided me over the past three years. Thank you for your selfless help and teaching!



Collaborative filtering algorithm:

The traditional collaborative filtering algorithm makes recommendations from a user-item rating matrix. The core idea is to use information from similar users or similar items to recommend to a user. Take the matrix in Figure 1 as an example. Here we want to predict how Xiao Hong would rate The Great Gatsby (on a scale of 1-5). One of the simplest ideas is to find, among all users, the people whose tastes are similar to Xiao Hong's. Looking over the matrix, Xiao Ming's tastes are the most similar to Xiao Hong's (are both of them Leonardo DiCaprio fans? Or do they both like sentimental movies?). So we can predict that Xiao Hong's rating for The Great Gatsby should be fairly high, and we can recommend The Great Gatsby to her when making movie recommendations. This algorithm, which uses information from similar users, is called user-based collaborative filtering.

Correspondingly, there is item-based collaborative filtering. Its idea is roughly this: items that are similar to the items a user likes are, to some degree, items the user should also like. Applied here: Xiao Hong likes Titanic, and because Titanic is similar to The Great Gatsby, Xiao Hong will probably like The Great Gatsby as well. This blog is a detailed introduction to the collaborative filtering algorithm.
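The user-based idea above can be sketched in a few lines of Python. This is only a minimal illustration, not the exact method of any particular system: the ratings are made up (the numbers in the figure are not reproduced here), "Movie X" and "Xiao Gang" are hypothetical placeholders, and cosine similarity over co-rated movies with a similarity-weighted average is just one common choice.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale (NOT the values from the figure).
ratings = {
    "Xiao Hong": {"Titanic": 5, "Movie X": 4},
    "Xiao Ming": {"Titanic": 5, "Movie X": 4, "The Great Gatsby": 5},
    "Xiao Gang": {"Titanic": 1, "Movie X": 5, "The Great Gatsby": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def most_similar_user(ratings, user):
    """Return the other user whose rating vector is closest to `user`."""
    others = [u for u in ratings if u != user]
    return max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None

neighbor = most_similar_user(ratings, "Xiao Hong")
prediction = predict_user_based(ratings, "Xiao Hong", "The Great Gatsby")
print(neighbor, round(prediction, 2))
```

With these made-up numbers, the nearest neighbor dominates the weighted average, so the prediction is pulled toward that neighbor's rating.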


(Figure 1)

To lead into what follows, here is a plain little story:
Company A uses the collaborative filtering algorithm to generate personalized recommendations for each of its users, and its revenue grows day by day. As its data becomes richer, Company A has more than the users' actual ratings of products. It also has a variety of contextual information (such as when Xiao Hong watched the movie, who she watched it with, what mood she was in afterward, and so on), and the boss wants to put this contextual information to use. But as mentioned above, the traditional collaborative filtering algorithm recommends based only on users' ratings of items. So is there a way for Company A to use this contextual information without replacing its existing algorithm?

Splitting:

Categories of splitting:
There are three kinds of splitting methods: item-splitting, user-splitting, and UI-splitting.

The basic idea of splitting:
Consider a thing X. If X changes greatly when the environment (context) changes, then X should be regarded as a different thing in each context, so we should split it apart and treat the pieces as different things.

Splitting's approach:
Here we use item-splitting to illustrate. Item-splitting literally means cutting an item (a product) into several items. Since we are going to cut it, what do we cut on? How do we cut? And into how many pieces?

Cutting??? ~~~


Anyway, let us look at the first and second questions first: what do we cut on, and how do we cut? Thinking back to the little story, it is natural to use the contextual information to split the items. Let us look at the two matrices in Figure 2 and Figure 3:
(Figure 2)


(Figure 3)

Figure 3 shows the users' ratings of Titanic, and Figure 2 shows the contextual information at the time each user watched the movie. The idea of item-splitting is to use a context condition to tentatively split an item into two vectors. If the two vectors are significantly different, the original item should be split on that context condition. In the example above, if the context "with a lover or not" gives Titanic two significantly different column vectors after splitting, then the Titanic column should be divided into two vectors, as shown in Figure 4:


(Figure 4)


In Figure 4, the Titanic column vector has been split on the context "with a lover or not", forming two different vectors (with a lover, not with a lover). In item-splitting, these two vectors are in effect treated as two different movies.
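As a rough illustration of this step, the sketch below splits one item column on a binary context condition. The user ids, ratings, and the `split_item` helper are all hypothetical and are not taken from the figures or from paper [1].

```python
# Hypothetical (user, rating, watched_with_lover) records for Titanic.
titanic_ratings = [
    ("u1", 5, True), ("u2", 4, True), ("u3", 5, True),
    ("u4", 2, False), ("u5", 1, False), ("u6", 2, False),
]

def split_item(records, item_name, context_name):
    """Split one item's ratings into two columns by a yes/no context flag.

    Each returned column maps user -> rating, and the two columns are
    treated from then on as two different items.
    """
    in_context = {user: rating for user, rating, flag in records if flag}
    out_of_context = {user: rating for user, rating, flag in records if not flag}
    return {
        f"{item_name} ({context_name})": in_context,
        f"{item_name} (not {context_name})": out_of_context,
    }

columns = split_item(titanic_ratings, "Titanic", "with lover")
for name, column in columns.items():
    print(name, column)
```

Each rating lands in exactly one of the two new columns, so no rating is lost or duplicated by the split.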

This raises a new question: when a vector is split into two, how do we judge whether the two vectors are significantly different? In other words, how do we judge whether the two column vectors Titanic (with a lover) and Titanic (not with a lover) differ significantly?

The whole splitting process is explained in detail in paper [1]. Here is a formula for judging whether two vectors are significantly different:

(Figure 5)


The formula is a two-sample t-test. When the p-value meets the threshold (typically p <= 0.05), the larger the t value, the more different the two vectors are. In the formula, μ_i is the movie's mean rating, s_i is the movie's rating variance, and n_i is the number of nonzero ratings the movie has. The subscripts c and non-c denote the two contexts (with a lover and not with a lover).
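Since the formula itself survives only as a figure, here is a small sketch under the assumption that it is the standard unequal-variance (Welch) two-sample statistic, t = |μ_c - μ_non-c| / sqrt(s_c / n_c + s_non-c / n_non-c). The absolute value, the use of the sample variance, and the example ratings are assumptions of this sketch, not details confirmed by the source.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(ratings_c, ratings_not_c):
    """Two-sample t statistic for two rating vectors (unequal variances).

    Each list holds only the actual (nonzero) ratings and needs at least
    two values so that the sample variance is defined.
    """
    mu_c, mu_nc = mean(ratings_c), mean(ratings_not_c)
    s_c, s_nc = variance(ratings_c), variance(ratings_not_c)  # sample variance
    n_c, n_nc = len(ratings_c), len(ratings_not_c)
    return abs(mu_c - mu_nc) / sqrt(s_c / n_c + s_nc / n_nc)

# Hypothetical Titanic ratings in the two contexts.
with_lover = [5, 4, 5]
not_with_lover = [2, 1, 2]
t = t_statistic(with_lover, not_with_lover)
print(round(t, 3))
```

This sketch computes only the t value. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value for the same kind of test.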

The pseudocode of the item-splitting algorithm (from paper [1]) is shown in Figure 6:
(Figure 6)
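Because the pseudocode is only available as an image, the following is a loose Python sketch of the procedure as described in this post, not a transcription of the algorithm in paper [1]. For each item it tries every candidate context condition, keeps the one with the largest t value, and splits the item only if that value reaches a threshold. The record layout, the `item_splitting` function, the threshold of 4.0, and all the data are assumptions made for illustration; thresholding on t directly, rather than on the p-value, is a simplification.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Unequal-variance two-sample t value; 0.0 when a side has under 2 ratings."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0:
        return float("inf") if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / denom

def item_splitting(records, conditions, threshold):
    """records: (user, item, rating, context_dict); conditions: (dimension, value).

    Returns (user, item_name, rating) triples in which every item that passed
    the test has been replaced by one of its two context-specific copies.
    """
    chosen = {}
    for item in sorted({r[1] for r in records}):
        rows = [r for r in records if r[1] == item]
        best_t, best_condition = 0.0, None
        for dim, val in conditions:
            inside = [r[2] for r in rows if r[3].get(dim) == val]
            outside = [r[2] for r in rows if r[3].get(dim) != val]
            t = t_statistic(inside, outside)
            if t > best_t:
                best_t, best_condition = t, (dim, val)
        if best_condition is not None and best_t >= threshold:
            chosen[item] = best_condition
    result = []
    for user, item, rating, ctx in records:
        if item in chosen:
            dim, val = chosen[item]
            tag = f"{dim}={val}" if ctx.get(dim) == val else f"{dim}!={val}"
            result.append((user, f"{item} [{tag}]", rating))
        else:
            result.append((user, item, rating))
    return result

# Hypothetical data: Titanic ratings depend on the companion, Gatsby ratings do not.
records = [
    ("u1", "Titanic", 5, {"companion": "lover", "weather": "sunny"}),
    ("u2", "Titanic", 4, {"companion": "lover", "weather": "rainy"}),
    ("u3", "Titanic", 5, {"companion": "lover", "weather": "sunny"}),
    ("u4", "Titanic", 2, {"companion": "alone", "weather": "rainy"}),
    ("u5", "Titanic", 1, {"companion": "alone", "weather": "sunny"}),
    ("u6", "Titanic", 2, {"companion": "alone", "weather": "rainy"}),
    ("u1", "The Great Gatsby", 4, {"companion": "lover", "weather": "sunny"}),
    ("u2", "The Great Gatsby", 3, {"companion": "alone", "weather": "sunny"}),
    ("u3", "The Great Gatsby", 4, {"companion": "alone", "weather": "rainy"}),
    ("u4", "The Great Gatsby", 3, {"companion": "lover", "weather": "rainy"}),
]
conditions = [("companion", "lover"), ("weather", "sunny")]
split_records = item_splitting(records, conditions, threshold=4.0)
new_items = {item for _, item, _ in split_records}
print(sorted(new_items))
```

The output triples form an ordinary user-item rating list again, which is what lets an unmodified collaborative filtering algorithm run on the result.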


Suppose item-splitting is run on the matrix in Figure 1. If the other two films, that is, all except The Great Gatsby, are split, the matrix becomes the one shown in Figure 7:

(Figure 7)

Once contextual information has been used to obtain the matrix in Figure 7, the traditional collaborative filtering algorithm can be applied to that matrix directly.



Finally, let us look at the "how many pieces" question raised above. In the process above, each item is divided into 2 parts: in the context and not in the context. Can we divide it into more parts? For example, on the weather dimension in Figure 2, each item could be divided into 4 parts: sunny, rainy, cloudy, and snowy. This is possible, but paper [1] points out that such partitioning not only adds time cost but also makes the rating matrix overly sparse, which can cause overfitting.
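The sparsity point can be made concrete with simple arithmetic: splitting adds columns but no new ratings, so the fraction of filled cells shrinks. The sketch below uses made-up sizes (100 users, 50 items, 1000 ratings) and assumes the extreme case in which every item is split into the same number of parts.

```python
def density(num_ratings, num_users, num_items):
    """Fraction of cells in the user-item matrix that hold a rating."""
    return num_ratings / (num_users * num_items)

# Hypothetical sizes; splitting multiplies the columns, not the ratings.
num_users, num_items, num_ratings = 100, 50, 1000

original = density(num_ratings, num_users, num_items)
two_way = density(num_ratings, num_users, num_items * 2)   # in context / not
four_way = density(num_ratings, num_users, num_items * 4)  # e.g. four weather values

print(original, two_way, four_way)
```

In practice only the items that pass the significance test are split, so the real loss of density is smaller than in this worst case, but the direction of the effect is the same.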

Advantages and disadvantages of the splitting method, and some thoughts:
First, the advantage: the splitting method can incorporate valuable contextual information into the traditional collaborative filtering algorithm, which improves the recommendation results.
Next, the disadvantages. From the pseudocode above we can see that when the context has many dimensions and each dimension has many values, the whole splitting process is very time-consuming, and the time spent on splitting does not necessarily bring a large improvement in recommendation quality. Note that when I described the advantage, I specifically said "valuable contextual information". But which contextual information is valuable? That depends on the business setting. Some researchers have studied context selection specifically and found that not all contextual information improves recommendations. Sometimes introducing inappropriate context actually makes the recommendations worse. So deciding which context to use in the splitting process in order to improve recommendations requires some understanding of the business background.

Here is a very good GitHub project on context-aware recommender systems: https://github.com/irecsys/CARSKit

Study materials:
[1] L. Baltrunas and F. Ricci. Experimental evaluation of context-dependent collaborative filtering using item splitting.
[2] Yong Zheng, Robin Burke, Bamshad Mobasher. Splitting approaches for context-aware recommendation: An empirical study. 2014.







