SVD (singular value decomposition):
Advantages: Simplifies data, removes noise, and can improve the results of an algorithm.
Disadvantages: The transformed data can be difficult to interpret.
Applicable data type: numeric data.
I. SVD and recommendation system
Each dish is rated by the restaurant's diners. A diner can give a dish any integer rating from 1 to 5; a dish the diner has not tasted is recorded as 0.
Create a new file svdrec.py and add the following code:
    from numpy import linalg as la

    def loadexdata():
        # Rows are diners, columns are dishes; 0 means "not rated".
        return [[0, 0, 0, 2, 2],
                [0, 0, 0, 3, 3],
                [0, 0, 0, 1, 1],
                [1, 1, 1, 0, 0],
                [2, 2, 2, 0, 0],
                [5, 5, 5, 0, 0],
                [1, 1, 1, 0, 0]]

    U, s, vt = la.svd(loadexdata())
    print(s)
    # [ 9.64365076e+00  5.29150262e+00  9.99338251e-16  4.38874654e-16
    #   1.19121230e-16]
Looking at the singular values, the first two are much larger than the others (the last three are zero up to floating-point rounding), so the last three can be dropped with very little effect.
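One way to make "much larger" concrete is to look at how much of the total squared singular-value mass (often called energy) the leading values retain. The sketch below assumes NumPy and hard-codes the rating matrix from this example; the 90% cutoff is a common rule of thumb chosen for illustration, not something fixed by the method.

```python
import numpy as np

# Rating matrix from the example above (rows: diners, columns: dishes).
data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

s = np.linalg.svd(data, compute_uv=False)  # singular values, largest first

# Fraction of the total energy (sum of squared singular values)
# retained by the first k singular values, for k = 1..5.
energy = np.cumsum(s ** 2) / np.sum(s ** 2)

# Smallest k whose retained energy reaches the assumed 90% threshold.
k = int(np.searchsorted(energy, 0.90) + 1)
print(energy)
print(k)
```

For this matrix the first singular value alone falls short of the threshold, while the first two together retain essentially all of the energy.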
The top three people like roast beef and pulled pork, dishes found at American barbecue restaurants, while the bottom four rated only the other three dishes. The two large singular values can be taken to correspond to two categories of food, American BBQ and Japanese food. The top three people can therefore be treated as one class of users and the bottom four as another, which makes recommendation straightforward.
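The grouping of users can be checked numerically. The sketch below, which assumes NumPy and reuses the example matrix, projects each diner onto the two retained singular directions and labels the diner by the dominant direction; the variable names are illustrative.

```python
import numpy as np

data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

U, s, vt = np.linalg.svd(data, full_matrices=False)

# Coordinates of each diner in the 2-D space spanned by the two
# largest singular values (one row per diner).
user_coords = U[:, :2] * s[:2]

# Label each diner by the direction that carries most of their ratings.
# The sign of a singular vector is arbitrary, so compare magnitudes.
groups = np.argmax(np.abs(user_coords), axis=1)
print(groups)
```

The first three diners end up with one label and the last four with the other, matching the two classes of users described above.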
Now change the data in svdrec.py to the following and run the decomposition again:
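    from numpy import linalg as la

    def loadexdata():
        # Rows are diners, columns are dishes; 0 means "not rated".
        return [[1, 1, 1, 0, 0],
                [2, 2, 2, 0, 0],
                [1, 1, 1, 0, 0],
                [5, 5, 5, 0, 0],
                [1, 1, 0, 2, 2],
                [0, 0, 0, 3, 3],
                [0, 0, 0, 1, 1]]

    U, s, vt = la.svd(loadexdata())
    print(s)
    # [ 9.72140007e+00  5.29397912e+00  6.84226362e-01  1.18665567e-15
    #   3.51083347e-16]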
Here the first three singular values are much larger than the remaining two, which are zero up to floating-point rounding, so the last two can be dropped with very little effect.
The original data can therefore be approximated closely using only the first three singular values.
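A minimal sketch of that approximation, assuming NumPy and the second example matrix: the data is rebuilt from the first k columns of U, the first k singular values, and the first k rows of vt. The name max_error is illustrative.

```python
import numpy as np

data = np.array([[1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [1, 1, 1, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1]], dtype=float)

U, s, vt = np.linalg.svd(data, full_matrices=False)

k = 3  # number of singular values to keep
# Rebuild the matrix from the first k columns of U, the first k
# singular values, and the first k rows of vt.
approx = U[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Largest absolute difference between the original and the approximation.
max_error = np.max(np.abs(data - approx))
print(np.round(approx, 2))
print(max_error)
```

Because this matrix has rank 3, the reconstruction differs from the original only by floating-point rounding.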
II. Recommendation engine based on collaborative filtering
Collaborative filtering makes recommendations by comparing a user's data with the data of other users.
1. Calculation of similarity
    from numpy import corrcoef
    from numpy import linalg as la

    # inA and inB are assumed to be NumPy column vectors (matrix type),
    # so inA.T * inB below is their inner product.

    def ecludSim(inA, inB):
        # The 2-norm of the difference vector is the Euclidean distance.
        return 1.0 / (1.0 + la.norm(inA - inB))

    def pearsSim(inA, inB):
        # pearsSim checks whether there are 3 or more points. If not, it
        # returns 1.0, because the two vectors are then perfectly correlated.
        if len(inA) < 3:
            return 1.0
        # corrcoef computes the Pearson correlation coefficient directly.
        return 0.5 + 0.5 * corrcoef(inA, inB, rowvar=0)[0][1]

    def cosSim(inA, inB):
        # Cosine similarity, rescaled from [-1, 1] to [0, 1].
        num = float(inA.T * inB)
        denom = la.norm(inA) * la.norm(inB)
        return 0.5 + 0.5 * (num / denom)
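As a usage sketch, the three measures can be tried on small rating vectors. The versions below are rewritten for plain 1-D NumPy arrays (using np.dot instead of a matrix product), which is an assumption of this example rather than the form used above; the snake_case names and the sample vectors are likewise illustrative.

```python
import numpy as np

def euclid_sim(a, b):
    # 1 / (1 + Euclidean distance): 1.0 for identical vectors.
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearson_sim(a, b):
    # Fewer than 3 points are always perfectly correlated.
    if len(a) < 3:
        return 1.0
    return 0.5 + 0.5 * np.corrcoef(a, b)[0][1]

def cos_sim(a, b):
    # Cosine of the angle between a and b, rescaled to [0, 1].
    num = float(np.dot(a, b))
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.5 + 0.5 * (num / denom)

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 8.0])  # same direction as a, larger ratings
c = np.array([4.0, 3.0, 2.0, 1.0])  # reversed preferences

same = euclid_sim(a, a)       # identical vectors
aligned = cos_sim(a, b)       # same direction
opposite = pearson_sim(a, c)  # perfectly anti-correlated
print(same, aligned, opposite)
```

All three measures are scaled so that values near 1 mean "very similar" and values near 0 mean "very dissimilar", which lets them be swapped for one another in a recommender.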
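2. Item-based versus user-based similarity
Item-based similarity compares the columns (dishes) of the rating matrix, while user-based similarity compares the rows (users), so the cost of each grows with the number of items or users respectively. When the number of users is very large, the item-based similarity calculation is usually the better choice.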
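The difference in cost can be made concrete by counting pairwise comparisons. The sizes below are made up purely for illustration, and pair_count is a hypothetical helper.

```python
def pair_count(n):
    """Number of distinct unordered pairs among n objects."""
    return n * (n - 1) // 2

n_users = 100_000  # hypothetical number of users (rows)
n_items = 500      # hypothetical number of dishes (columns)

user_pairs = pair_count(n_users)  # comparisons for user-based similarity
item_pairs = pair_count(n_items)  # comparisons for item-based similarity

print(user_pairs)
print(item_pairs)
```

With far more users than dishes, the number of user pairs exceeds the number of item pairs by several orders of magnitude.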
3. Example: Restaurant dish recommendation engine based on item similarity
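A minimal sketch of such an engine, assuming NumPy and cosine similarity over plain arrays. The function names (estimate_rating, recommend) and the rating matrix are hypothetical and chosen for illustration. For each dish the user has not rated, the estimate is a similarity-weighted average of the user's own ratings, where the similarity between two dishes is computed only over diners who rated both.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity rescaled to the range [0, 1]."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.5 + 0.5 * (float(np.dot(a, b)) / denom)

def estimate_rating(data, user, item, sim=cos_sim):
    """Estimate the user's rating of `item` from the dishes they did rate."""
    sim_total = 0.0
    weighted_total = 0.0
    for j in range(data.shape[1]):
        user_rating = data[user, j]
        if user_rating == 0 or j == item:
            continue
        # Diners who rated both dishes.
        overlap = np.nonzero((data[:, item] > 0) & (data[:, j] > 0))[0]
        if len(overlap) == 0:
            similarity = 0.0
        else:
            similarity = sim(data[overlap, item], data[overlap, j])
        sim_total += similarity
        weighted_total += similarity * user_rating
    return 0.0 if sim_total == 0 else weighted_total / sim_total

def recommend(data, user, n=3, sim=cos_sim):
    """Return up to n (dish index, estimated rating) pairs for unrated dishes."""
    unrated = np.nonzero(data[user, :] == 0)[0]
    scores = [(int(j), estimate_rating(data, user, j, sim)) for j in unrated]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical ratings (rows: diners, columns: dishes; 0 = not rated).
ratings = np.array([[4, 4, 0, 2, 2],
                    [4, 0, 0, 3, 3],
                    [4, 0, 0, 1, 1],
                    [1, 1, 1, 2, 0],
                    [2, 2, 2, 0, 0],
                    [5, 5, 5, 0, 0],
                    [1, 1, 1, 0, 0]], dtype=float)

top = recommend(ratings, user=2)
print(top)
```

Passing a different function as sim switches the similarity measure without changing the rest of the engine.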