14-Simplifying data with SVD

Source: Internet
Author: User

SVD (singular value decomposition):

Advantages: Simplifies data, removes noise, and can improve the results of an algorithm.

Disadvantages: The transformed data can be difficult to interpret.

Applicable data type: numeric data.

I. SVD and recommendation system

Consider a matrix of restaurant dishes rated by diners. Each diner can rate a dish with any integer from 1 to 5; a dish the diner has not tasted is recorded as 0.

Create a new file svdrec.py and add the following code:

from numpy import linalg as la

def loadexdata():
    # Rows are users, columns are dishes; 0 means "not rated".
    return [[0, 0, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]]

U, s, vt = la.svd(loadexdata())
print(s)
# [  9.64365076e+00   5.29150262e+00   9.99338251e-16   4.38874654e-16
#    1.19121230e-16]

Looking at the singular values, the first two are much larger than the others. The last three are effectively zero (on the order of 1e-16), so we can drop them with very little effect on the data.

In the matrix, the first three people rated only roast beef and pulled pork (the last two columns), dishes served at American barbecue restaurants. The two significant singular values can be taken to correspond to two categories of food, American BBQ and Japanese food. The first three people can therefore be treated as one class of users and the remaining four as another, which makes recommendation straightforward.
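The grouping described above can be sketched with NumPy. This is only an illustration: the matrix is written out in full so the block stands alone, and the rule of labelling each user by the larger of the two leading left singular vector weights is an assumption made here, not something the original text prescribes.

```python
import numpy as np

# Ratings from the first example: rows are users, columns are dishes.
data = np.array([
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [5, 5, 5, 0, 0],
    [1, 1, 1, 0, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(data)

# Hypothetical grouping rule: for each user, take whichever of the two
# leading left singular vectors carries the larger absolute weight.
groups = np.argmax(np.abs(U[:, :2]), axis=1)
print(groups)
```

With this data the first three users share one label and the remaining four share the other, matching the two classes of users described in the text.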

Now change the data in svdrec.py so that one user has rated dishes in both categories:

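from numpy import linalg as la

def loadexdata():
    # Rows are users, columns are dishes; 0 means "not rated".
    return [[1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [1, 1, 1, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1]]

U, s, vt = la.svd(loadexdata())
print(s)
# [  9.72140007e+00   5.29397912e+00   6.84226362e-01   1.18665567e-15
#    3.51083347e-16]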

Looking at the singular values, the first three are much larger than the other two. The last two are effectively zero, so we can drop them with very little effect on the data.
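One common way to put a number on "very little effect" is to look at the share of the total squared singular values that the leading ones account for. The sketch below does this for the second example; the variable names `energy` and `cumulative` are introduced here for illustration only.

```python
import numpy as np

# Ratings from the second example: rows are users, columns are dishes.
data = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Only the singular values are needed here.
s = np.linalg.svd(data, compute_uv=False)

# Squared singular values and the cumulative share kept by the first k.
energy = s ** 2
cumulative = np.cumsum(energy) / energy.sum()
print(np.round(cumulative, 4))
```

The third entry of `cumulative` is 1 up to rounding error, which is the sense in which the last two singular values can be discarded.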

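Keeping only the first three singular values, the original data in the example above can be approximated as Data ≈ U[:, :3] * Sig3 * vt[:3, :], where Sig3 is the 3x3 diagonal matrix built from those three singular values.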
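A minimal sketch of that approximation, assuming NumPy arrays rather than the list returned by `loadexdata`. The names `k`, `sig_k`, `approx`, and `max_err` are introduced here for illustration.

```python
import numpy as np

# Ratings from the second example: rows are users, columns are dishes.
data = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(data)

k = 3                                   # number of singular values kept
sig_k = np.diag(s[:k])                  # k x k diagonal matrix
approx = U[:, :k] @ sig_k @ Vt[:k, :]   # rank-k reconstruction

# Largest absolute difference between the reconstruction and the original.
max_err = np.abs(approx - data).max()
print(np.round(approx, 2))
print(max_err)
```

Because this matrix has rank 3, the rank-3 reconstruction matches the original up to floating-point error.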

II. Recommendation engine based on collaborative filtering

Collaborative filtering makes recommendations by comparing a user's data with the data of other users.

1. Calculation of similarity

from numpy import corrcoef
from numpy import linalg as la

# ina and inb are expected to be column vectors of type numpy matrix.

def ecludsim(ina, inb):
    # la.norm computes the 2-norm of the difference, which is the Euclidean distance.
    return 1.0 / (1.0 + la.norm(ina - inb))

def pearssim(ina, inb):
    # With fewer than 3 points the two vectors are perfectly correlated, so return 1.0.
    if len(ina) < 3:
        return 1.0
    # corrcoef computes the Pearson correlation coefficient directly;
    # 0.5 + 0.5 * r rescales it from [-1, 1] to [0, 1].
    return 0.5 + 0.5 * corrcoef(ina, inb, rowvar=0)[0][1]

def cossim(ina, inb):
    # Cosine similarity, rescaled from [-1, 1] to [0, 1].
    num = float(ina.T * inb)
    denom = la.norm(ina) * la.norm(inb)
    return 0.5 + 0.5 * (num / denom)
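To see how the three measures differ, the sketch below re-declares them for plain 1-D NumPy arrays (an assumption made here so the block runs on its own; the listing above multiplies a transposed vector by another, which suits column matrices). The vectors `a` and `b` are made-up sample data.

```python
import numpy as np

def ecludsim(a, b):
    # Similarity derived from the Euclidean distance.
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearssim(a, b):
    # Fewer than 3 points are treated as perfectly correlated.
    if len(a) < 3:
        return 1.0
    # Pearson correlation rescaled from [-1, 1] to [0, 1].
    return 0.5 + 0.5 * np.corrcoef(a, b)[0][1]

def cossim(a, b):
    # Cosine of the angle between a and b, rescaled to [0, 1].
    num = float(np.dot(a, b))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.5 + 0.5 * (num / denom)

# b is exactly twice a: same direction, different magnitude.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

print(ecludsim(a, b), pearssim(a, b), cossim(a, b))
```

For these two vectors the Pearson and cosine similarities are 1.0 up to rounding, while the Euclidean similarity is about 0.21. The first two ignore differences in magnitude; the third does not.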

2. Item-based versus user-based similarity

When the number of users is very large, item-based similarity is usually the better choice, because the cost of computing user-based similarity grows with the number of users.
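The difference can be sketched as follows: item-based similarity compares columns of the ratings matrix, while user-based similarity compares rows, so the number of pairs to compare depends on the number of items or users respectively. The helper `cossim` and the variable names are illustrative, and the data is the second example matrix.

```python
import numpy as np

# Ratings from the second example: rows are users, columns are dishes.
data = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

def cossim(a, b):
    # Cosine similarity rescaled from [-1, 1] to [0, 1].
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.5 + 0.5 * (float(np.dot(a, b)) / denom)

n_users, n_items = data.shape

# Item-based: compare two dish columns.
item_sim = cossim(data[:, 0], data[:, 3])
# User-based: compare two user rows.
user_sim = cossim(data[0, :], data[1, :])

# Number of distinct pairs each approach has to consider.
item_pairs = n_items * (n_items - 1) // 2
user_pairs = n_users * (n_users - 1) // 2
print(item_sim, user_sim, item_pairs, user_pairs)
```

Even in this small matrix there are more user pairs than item pairs, and the gap widens as users are added while the menu stays the same size.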

3. Example: Restaurant dish recommendation engine based on item similarity
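A minimal sketch of such an engine, under stated assumptions: ratings are held in a NumPy array, an unrated dish is scored by a similarity-weighted average of the user's existing ratings, and similarity between two dishes is computed only over users who rated both. The function names `estimate` and `recommend` are hypothetical, and the data is the second example matrix from section I.

```python
import numpy as np

def cossim(a, b):
    # Cosine similarity rescaled from [-1, 1] to [0, 1].
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.5 + 0.5 * (float(np.dot(a, b)) / denom)

def estimate(data, user, item, sim=cossim):
    """Estimate the user's rating for an unrated item."""
    sim_total = 0.0
    weighted = 0.0
    for j in range(data.shape[1]):
        rating = data[user, j]
        if rating == 0 or j == item:
            continue  # only the user's existing ratings contribute
        # Users who rated both the target item and item j.
        both = np.nonzero((data[:, item] > 0) & (data[:, j] > 0))[0]
        if len(both) == 0:
            continue  # no overlap, so no similarity can be computed
        s = sim(data[both, item], data[both, j])
        sim_total += s
        weighted += s * rating
    return weighted / sim_total if sim_total > 0 else 0.0

def recommend(data, user, n=3):
    """Return up to n (item index, estimated rating) pairs, best first."""
    unrated = np.nonzero(data[user, :] == 0)[0]
    scores = [(int(j), estimate(data, user, j)) for j in unrated]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:n]

data = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

result = recommend(data, user=4)
print(result)
```

User 4 has a single unrated dish (index 2), so the result holds one entry. Any of the similarity functions from section II could be passed as `sim` instead of the cosine measure.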
