Collaborative filtering recommendation (1st week)

Source: Internet
Author: User

Description: These notes are for beginners. They were made while reading the recommendation System (LANCI), combined with notes found on the Internet, and their correctness is not guaranteed.

First, the current mainstream recommendation methods are:

1, collaborative filtering recommendation;

2, content-based recommendation;

3, knowledge-based recommendation;

4, hybrid recommendation;

(Note: when learning, there is no need to deliberately keep these methods separate. Each method's formula shows its flaws, and the methods can then be used in combination.)

1, Collaborative filtering (CF, Collaborative Filtering):

1-1: User-based nearest neighbor recommendation:

1) (My understanding of the definition) The words "collaborative filtering" already say what it refers to. The usual idea of recommendation is to ask the user what kind of things he likes and recommend according to his answer, or to look at the things he already likes and recommend similar things. Collaborative recommendation differs from these ideas: the recommendation system needs the "collaboration" of other users. It "filters" all users, selects those whose interests are similar to the target user's, and then recommends based on them;

2) Example (items are scored from 1 to 5, from dislike to like):

          Item 1   Item 2   Item 3   Item 4   Item 5   Average
Alice       5        3        4        4        ?      4 (averaged over 4 items)
User 1      3        1        2        3        3      2.4 (averaged over 5 items, as are the rows below)
User 2      4        3        4        3        5      3.8
User 3      3        3        1        5        4      3.2
User 4      1        5        5        2        1      2.8

A, now we want to figure out whether Alice likes Item 5, that is, use collaborative filtering to predict Alice's rating of Item 5. A high predicted score indicates that Alice likes Item 5, and a low score indicates that she does not.

B, now we "filter" all users and choose those whose tastes are similar to Alice's. To judge whether two people have similar tastes, we usually just compare their scores on the same items. The following is our first thought:

A, the most intuitive method (sum of absolute score differences over Items 1 to 4; a smaller value means more similar):

Alice and User 1: |5-3|+|3-1|+|4-2|+|4-3| = 7

Alice and User 2: |5-4|+|3-3|+|4-4|+|4-3| = 2

......
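The sums above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the ratings of Items 1 to 4 from the table; the name `abs_distance` is made up for illustration.

```python
# Ratings of Items 1-4 from the example table (Alice has not rated Item 5,
# so Item 5 is left out of the comparison).
ratings = {
    "Alice":  [5, 3, 4, 4],
    "User 1": [3, 1, 2, 3],
    "User 2": [4, 3, 4, 3],
    "User 3": [3, 3, 1, 5],
    "User 4": [1, 5, 5, 2],
}

def abs_distance(a, b):
    """Sum of absolute score differences; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))

distances = {
    name: abs_distance(ratings["Alice"], row)
    for name, row in ratings.items()
    if name != "Alice"
}
print(distances)
```

For User 1 and User 2 this gives the same values as the hand calculation (7 and 2).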

B, but this method has a disadvantage: users apply different scoring standards. For example, suppose Tom and Mary have similar tastes, but Tom scores leniently and gives the items {4,3,5,5,4}, while Mary scores strictly and gives {2,1,3,3,2}. These scores most likely mean that Tom and Mary both like Items 3 and 4, yet the calculation above says they differ greatly (a distance of 10), which is inaccurate. Therefore, we subtract each user's average from every score in that user's row, which converts the table above into the following one:

          Item 1   Item 2   Item 3   Item 4   Item 5
Alice       1       -1        0        0        ?
User 1      0.6     -1.4     -0.4      0.6      0.6
User 2      0.2     -0.8      0.2     -0.8      1.2
User 3     -0.2     -0.2     -2.2      1.8      0.8
User 4     -1.8      2.2      2.2     -0.8     -1.8

This table removes the influence of different users' scoring standards. Whether two users are similar no longer depends on how far apart their raw scores are, but on whether their scoring trends are similar (we can see that Alice's tastes are very similar to User 1's and very different from User 4's).
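The subtraction of each user's average can be sketched as follows. This is an illustrative snippet, not code from the book: `mean_center` is a hypothetical helper, `None` marks the score Alice has not given, and the Tom and Mary score lists are the ones from the example in the text.

```python
ratings = {
    "Alice":  [5, 3, 4, 4, None],   # None marks the unrated Item 5
    "User 1": [3, 1, 2, 3, 3],
    "User 2": [4, 3, 4, 3, 5],
    "User 3": [3, 3, 1, 5, 4],
    "User 4": [1, 5, 5, 2, 1],
}

def mean_center(row):
    """Subtract the user's average (over rated items only) from each score."""
    rated = [r for r in row if r is not None]
    mean = sum(rated) / len(rated)
    # Round to one decimal only to hide floating-point noise in the output.
    return [None if r is None else round(r - mean, 1) for r in row]

centered = {name: mean_center(row) for name, row in ratings.items()}
for name, row in centered.items():
    print(name, row)

# Tom scores leniently, Mary strictly, but their centered scores coincide.
tom = mean_center([4, 3, 5, 5, 4])
mary = mean_center([2, 1, 3, 3, 2])
print(tom == mary)
```

After centering, Tom and Mary have identical rows, which is the point of the correction: only the trend of the scores is compared, not their level.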


C, the absolute values above are inconvenient to calculate with, and the alternative of squaring the differences and taking the square root is also awkward to calculate. The book instead uses the Pearson correlation coefficient.
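The formula image is missing from these notes. For reference, the Pearson correlation coefficient between two users a and b is usually written as below. The notation is an assumption rather than a copy of the original figure: P is the set of items both users have scored, r_{a,p} is user a's score for item p, and \bar{r}_a is user a's average score.

```latex
\mathrm{sim}(a,b) =
\frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)\,(r_{b,p} - \bar{r}_b)}
     {\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\;
      \sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
```

The value lies between -1 and 1. Values near 1 mean the two users' scores rise and fall together, and values near -1 mean they move in opposite directions.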


Understanding the Pearson coefficient: apply the Pearson coefficient to two users, User A and User B:


My understanding of why this works: look at the two-dimensional case. Because we are comparing the directions of the two users' scoring trends, the cosine of the angle between their score vectors can indicate their similarity (this should generalize to n dimensions as well, but I do not yet know the reason why it generalizes):

          Item 1   Item 2
User A      A1       A2
User B      B1       B2
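One way to see why the two-dimensional picture generalizes: the cosine of the angle between two vectors is defined through the dot product, and that definition has the same form in any number of dimensions. Written with the symbols of the small table above (A = (A1, A2), B = (B1, B2)) and then for n items:

```latex
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^2}\;\sqrt{\sum_{i=1}^{n} B_i^2}},
\qquad
n = 2:\quad
\cos\theta = \frac{A_1 B_1 + A_2 B_2}
                  {\sqrt{A_1^2 + A_2^2}\;\sqrt{B_1^2 + B_2^2}}
```

If each score is first replaced by its deviation from the user's average over the same items, this cosine is exactly the Pearson coefficient.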

D, from the analysis above, we can use the following formula to calculate whether two users are similar:


E, calculation results:
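The result figure is missing here, so the following sketch recomputes the similarities from the example table. It rests on one stated assumption: each user's average is taken over all items that user has scored (the averages column of the table), while the sums run over the items both users scored. Taking the averages over the co-rated items only is another common convention and gives slightly different numbers. The function names are illustrative.

```python
from math import sqrt

ratings = {
    "Alice":  {"Item 1": 5, "Item 2": 3, "Item 3": 4, "Item 4": 4},
    "User 1": {"Item 1": 3, "Item 2": 1, "Item 3": 2, "Item 4": 3, "Item 5": 3},
    "User 2": {"Item 1": 4, "Item 2": 3, "Item 3": 4, "Item 4": 3, "Item 5": 5},
    "User 3": {"Item 1": 3, "Item 2": 3, "Item 3": 1, "Item 4": 5, "Item 5": 4},
    "User 4": {"Item 1": 1, "Item 2": 5, "Item 3": 5, "Item 4": 2, "Item 5": 1},
}

def mean(user):
    """Average over every item the user has scored (assumption, see above)."""
    scores = ratings[user].values()
    return sum(scores) / len(scores)

def pearson(a, b):
    """Pearson similarity of users a and b over the items both have scored."""
    common = [p for p in ratings[a] if p in ratings[b]]
    mean_a, mean_b = mean(a), mean(b)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
    den_a = sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common))
    den_b = sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # a user with constant scores carries no trend information
    return num / (den_a * den_b)

similarities = {u: pearson("Alice", u) for u in ratings if u != "Alice"}
for user, sim in sorted(similarities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {sim:.2f}")
```

Under this assumption User 1 comes out as the most similar to Alice and User 4 as the least similar, which matches the observation made from the mean-centered table.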


F, having found the users who are similar to Alice, we can use these similar users to calculate Alice's rating of the item:
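The prediction formula is also missing from these notes. A commonly used form adds the similarity-weighted average of the neighbors' deviations to the target user's own average. The sketch below assumes that form. The similarity inputs 0.84 and 0.61 are assumed values: they are rounded Pearson similarities for User 1 and User 2 when each user's average is taken over all of that user's scores, and other conventions give slightly different inputs. `predict` is a hypothetical helper name.

```python
def predict(target_mean, neighbors):
    """Predict a score as the target user's mean plus the similarity-weighted
    average of each neighbor's deviation from that neighbor's own mean.

    neighbors: list of (similarity, neighbor_score, neighbor_mean) tuples.
    """
    weight_sum = sum(sim for sim, _, _ in neighbors)
    if weight_sum == 0:
        return target_mean  # no usable neighbors: fall back to the user's mean
    weighted_dev = sum(sim * (score - mean) for sim, score, mean in neighbors)
    return target_mean + weighted_dev / weight_sum

# Alice's average is 4. Neighbors: User 1 (scored Item 5 as 3, average 2.4)
# and User 2 (scored Item 5 as 5, average 3.8). The similarities are the
# assumed inputs described in the lead-in.
predicted = predict(4.0, [(0.84, 3, 2.4), (0.61, 5, 3.8)])
print(round(predicted, 2))
```

With these inputs the prediction is about 4.85, which is above Alice's average of 4 and therefore suggests that Alice would like Item 5.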


3) Defects:

A, when there are thousands of users and items, the computational complexity is high;

B, it is difficult to apply when users have scored too few items, or none at all.

1-2: Item-based nearest neighbor recommendation:

1) For example, to find out whether User 5 likes "Sunny day", we notice that the vast majority of users give "Sunny day" and "Qi Qi Xiang" similar scores, so we select "Qi Qi Xiang" as the reference. Because the two songs are scored so similarly, if User 5 likes "Qi Qi Xiang" he is likely to like "Sunny day", and if he does not like "Qi Qi Xiang" he is likely not to like "Sunny day";

                    User 1   User 2   User 3   User 4   User 5
Sunny day              1        2        3        4        ?
Qi Qi Xiang            2        2        3        4        4
Sky                    3        2        1        4        5
Anonymous Friends      2        4        5        2        2

2) The calculation is similar to the user-based method above and follows the same steps as the user-based example, with the roles of users and items swapped.
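The item-based calculation can be sketched on the song table as follows. The choices here are assumptions, not the author's method: similarity between two songs is the Pearson coefficient of their score rows over Users 1 to 4 (the users who scored "Sunny day"), and the prediction is a similarity-weighted average of User 5's own scores on the positively similar songs.

```python
from math import sqrt

# Scores by Users 1-4, the users who have scored every song.
scores = {
    "Sunny day":         [1, 2, 3, 4],
    "Qi Qi Xiang":       [2, 2, 3, 4],
    "Sky":               [3, 2, 1, 4],
    "Anonymous Friends": [2, 4, 5, 2],
}
# User 5's known scores; "Sunny day" is the one to predict.
user5 = {"Qi Qi Xiang": 4, "Sky": 5, "Anonymous Friends": 2}

def pearson(xs, ys):
    """Pearson coefficient of two equal-length score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

target = "Sunny day"
item_sims = {
    song: pearson(scores[target], row)
    for song, row in scores.items()
    if song != target
}
nearest = max(item_sims, key=item_sims.get)

# Weighted average of User 5's scores, using only positively similar songs.
positive = {song: s for song, s in item_sims.items() if s > 0}
prediction = sum(s * user5[song] for song, s in positive.items()) / sum(positive.values())

print(nearest)
print(round(prediction, 2))
```

The song most similar to "Sunny day" turns out to be "Qi Qi Xiang", in line with the reasoning in 1), so User 5's high score for "Qi Qi Xiang" dominates the prediction.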

1-3, SVD (singular value decomposition) (I do not fully understand this method yet and need to keep reading about it):

(ref. 1)

(Ref. 2)

1) Basic knowledge:

A, diagonal matrix: a matrix in which all elements off the diagonal are 0;

B, eigenvalue: for an m*m matrix C, if there is a constant λ and a nonzero vector x that satisfy the formula below, then λ is called an eigenvalue of the matrix C, and x is called an eigenvector of the matrix C;

C, the formula relating the eigenvalue λ and the eigenvector x:

Cx = λx


D, decomposing a matrix A into the following form is called eigenvalue decomposition, where Q is the matrix whose columns are the eigenvectors of A, and Λ is a diagonal matrix whose diagonal elements are the eigenvalues:

A = Q Λ Q^(-1)


E, singular values: eigenvalue decomposition is a good way to extract the features of a matrix, but it applies only to square matrices. For a general n*m matrix A, we decompose it into the following form, where U is an n*n square matrix (its columns are called the left singular vectors), Σ is an n*m matrix (all elements off the diagonal are 0, and the diagonal elements are called the singular values), and V^T, the transpose of V, is an m*m matrix (the columns of V are called the right singular vectors):

A = U Σ V^T



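F, multiplying the transpose of A by A gives the square matrix A^T A, which can be processed with the eigenvalue formula above. Its eigenvectors v_i are the right singular vectors, and from them we obtain the singular values σ_i and the left singular vectors u_i:

(A^T A) v_i = λ_i v_i,   σ_i = sqrt(λ_i),   u_i = (1/σ_i) A v_i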
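The relation between A^T A and the singular vectors can be tried out numerically. The sketch below uses power iteration, a technique not mentioned in these notes and chosen here only because it needs no libraries: repeatedly multiplying a vector by A^T A pulls it toward the eigenvector with the largest eigenvalue, which is the first right singular vector. The matrix is the complete part of the score table (Users 1 to 4), and the function names are illustrative.

```python
from math import sqrt

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def top_singular_triplet(A, iterations=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest
    singular value. Assumes the start vector is not orthogonal to v."""
    At = [list(col) for col in zip(*A)]       # transpose of A
    v = [1.0] * len(A[0])                     # arbitrary nonzero start vector
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))          # w = (A^T A) v
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]             # keep v at unit length
    Av = matvec(A, v)
    sigma = sqrt(sum(x * x for x in Av))      # sigma = |A v| = sqrt(lambda)
    u = [x / sigma for x in Av]               # u = (1/sigma) A v
    return sigma, u, v

# Scores of Users 1-4 on Items 1-5 from the example table (a 4*5 matrix).
A = [
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
]
sigma, u, v = top_singular_triplet(A)
print(round(sigma, 3))
```

The returned v satisfies (A^T A) v = σ² v up to rounding error, and u has unit length, as the formulas in F require.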



G, singular values, like eigenvalues, are arranged in the matrix Σ from largest to smallest. In most cases the sum of the first 10% or even 1% of the singular values accounts for more than 99% of the sum of all singular values, so we can keep only the first r singular values (and r can be far smaller than n and m):

A (n*m) ≈ U (n*r) Σ (r*r) V^T (r*m)

H, as described above, the closer r is to n, the closer the product of the three matrices is to A, and the smaller r is, the less storage is needed. That is, once A is decomposed into the three matrices on the right-hand side, we only need to store those three smaller matrices.
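The storage saving can be checked with simple arithmetic. In this sketch the sizes n = 10000, m = 1000 and r = 50 are made-up values for illustration, and only the r singular values on the diagonal of the truncated Σ are counted, since its other entries are 0.

```python
def full_storage(n, m):
    """Number of values needed to store the n*m matrix A directly."""
    return n * m

def truncated_storage(n, m, r):
    """Values needed for U (n*r), the r singular values, and V^T (r*m)."""
    return n * r + r + r * m

n, m, r = 10000, 1000, 50   # hypothetical sizes
print(full_storage(n, m))
print(truncated_storage(n, m, r))
```

With these sizes the truncated form needs 550,050 values instead of 10,000,000, roughly 5.5% of the original storage.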

2, Content-based recommendation introduction:

1) My understanding: analyze the items the user has liked or purchased to see what characteristics they have, and then recommend to the user according to these characteristics. For example, if the user buys "Introduction to Computers" and "Data Structures", we analyze the two books and find that both come from "Tsinghua University Press" and both are "computer specialized books", so the system looks for books that satisfy these two characteristics and recommends them to the user;

2) Advantages:

A, high accuracy can be obtained without large-scale data;

B, knowing the attributes of an item is enough to search for similar books to recommend;

C, I also think that content-based recommendation needs updating less often than collaborative filtering. Collaborative filtering has to keep updating its user data: for example, after books are recommended to User 1, other users keep purchasing books over time, so User 1's nearest neighbors may change;

3) Applicable: When the amount of data is very large;
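The idea in 1) above, finding the characteristics shared by the purchased books and then matching other books against them, can be sketched as follows. The catalog entries beyond the two purchased books, the attribute names and the function names are all hypothetical and serve only as illustration.

```python
# Hypothetical catalog; only the two purchased titles come from the example.
catalog = [
    {"title": "Introduction to Computers", "publisher": "Tsinghua University Press", "category": "computer"},
    {"title": "Data Structures", "publisher": "Tsinghua University Press", "category": "computer"},
    {"title": "Operating Systems", "publisher": "Tsinghua University Press", "category": "computer"},
    {"title": "Organic Chemistry", "publisher": "Tsinghua University Press", "category": "chemistry"},
    {"title": "Compiler Design", "publisher": "Another Press", "category": "computer"},
]
purchased = {"Introduction to Computers", "Data Structures"}
FEATURES = ("publisher", "category")

def shared_features(books):
    """Attribute/value pairs that every book in the list has in common."""
    common = {(f, books[0][f]) for f in FEATURES}
    for book in books[1:]:
        common &= {(f, book[f]) for f in FEATURES}
    return common

def recommend_by_content(catalog, purchased):
    """Recommend unpurchased books that match every shared characteristic."""
    bought = [b for b in catalog if b["title"] in purchased]
    if not bought:
        return []  # no purchase history, so no profile to match against
    profile = shared_features(bought)
    return [
        b["title"]
        for b in catalog
        if b["title"] not in purchased
        and all(b[f] == value for f, value in profile)
    ]

recommendations = recommend_by_content(catalog, purchased)
print(recommendations)
```

Only the book that shares both the publisher and the category with the purchased books is recommended. Note that the sketch needs no other users' data, which is the contrast with collaborative filtering drawn above.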

3, knowledge-based recommendation introduction:

1) My understanding: the two methods above need the user's historical data or behavior as the basis for analysis. For products such as cameras and mobile phones, which are purchased only occasionally, it is difficult to derive a recommendation from historical data, so the recommendation is made mainly from the information the user provides while interacting with the system. For example, when the user is purchasing a computer, the system lets the user choose the CPU type, main memory, RAM size;

2) Disadvantages:

For some specialized products, users know little about the product, so it is hard for them to provide the information the system asks for, which causes problems for the recommendation.
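A knowledge-based recommender of the kind described in 1) can be sketched as a filter over the requirements the user states during the interaction. The product list, the attribute names and the function name below are hypothetical.

```python
# Hypothetical product data, for illustration only.
computers = [
    {"name": "Model A", "cpu": "i5", "ram_gb": 8},
    {"name": "Model B", "cpu": "i7", "ram_gb": 16},
    {"name": "Model C", "cpu": "i5", "ram_gb": 16},
]

def recommend_by_constraints(products, cpu=None, min_ram_gb=None):
    """Keep the products that satisfy every requirement the user stated.
    A requirement left as None was not stated and is not checked."""
    matches = []
    for p in products:
        if cpu is not None and p["cpu"] != cpu:
            continue
        if min_ram_gb is not None and p["ram_gb"] < min_ram_gb:
            continue
        matches.append(p["name"])
    return matches

print(recommend_by_constraints(computers, cpu="i5", min_ram_gb=16))
print(recommend_by_constraints(computers))
```

The second call shows the disadvantage listed above: a user who cannot state any requirement gets the whole list back, so the quality of the recommendation depends on how well the user understands the product.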
