[Machine Learning Notes] A brief introduction to singular value decomposition (SVD) and its simple application in recommender systems

In this article, we introduce the singular value decomposition (SVD) from a geometric point of view, then analyze the difference and the relation between eigenvalue decomposition and singular value decomposition, and finally use Python to apply SVD to a recommender system.
1. SVD explanation
SVD stands for singular value decomposition. SVD has many uses, such as LSA (latent semantic analysis), recommender systems, and feature compression (that is, data dimensionality reduction). SVD can be understood as follows: a complex matrix is represented as the product of three smaller, simpler sub-matrices, and these three small matrices describe the important characteristics of the large matrix.
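As a minimal sketch of this factorization (plain NumPy; the small matrix A is an illustrative example of mine, not data from the recommender below):

import numpy as np

A = np.array([[4.0, 0.0, 2.0],
              [1.0, 3.0, 0.0]])

# Full SVD: A = U * Sigma * Vt, where U and Vt are orthogonal.
U, sigma, Vt = np.linalg.svd(A)

# np.linalg.svd returns the singular values as a 1-D array;
# rebuild the 2 x 3 diagonal matrix Sigma to verify the product.
Sigma = np.zeros(A.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

print(np.allclose(A, U @ Sigma @ Vt))  # True: the three factors reconstruct A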
1.1 The geometric meaning of singular value decomposition (the formulas are troublesome to enter here, so the derivation is only summarized)
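Although the original figures are not reproduced, the statement they illustrate is standard (see reference 2): for a real m x n matrix A, the singular value decomposition is

A = U \Sigma V^{T}

where U (m x m) and V (n x n) are orthogonal matrices and \Sigma is an m x n diagonal matrix whose non-negative diagonal entries \sigma_1 \ge \sigma_2 \ge \dots \ge 0 are the singular values. Geometrically, A maps the unit sphere to an ellipsoid: V^{T} first rotates (or reflects) the sphere, \Sigma then stretches it along the coordinate axes by the singular values, and U rotates the result; the semi-axes of the ellipsoid have lengths equal to the singular values.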
2. Applying SVD to recommender systems
In the data set, rows represent users and columns represent items; each value is a user's rating of an item. The advantage of SVD here is that the rating data form a sparse matrix: SVD can map the original data to a low-dimensional space, and the similarity between items can then be computed in that low-dimensional space.
Overall idea: first find the items the user has not rated; for each of them, obtain a predicted score by computing the similarity between that unrated item and the items the user has rated; then sort these predicted scores from high to low and return the top n items as recommendations to the user.
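Written out, the predicted score in that step is a similarity-weighted average (this is exactly what the function svdEst below computes): for user u and an unrated item i,

\hat{r}_{u,i} = \frac{\sum_{j \in R_u} \mathrm{sim}(i,j)\, r_{u,j}}{\sum_{j \in R_u} \mathrm{sim}(i,j)}

where R_u is the set of items user u has already rated, r_{u,j} is the rating of item j, and \mathrm{sim}(i,j) is the similarity between items i and j computed in the reduced space.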
The code below is divided into 5 main parts:
Part 1: load the test data set;
Part 2: define three methods of computing similarity;
Part 3: determine the appropriate number of dimensions to keep from the sum of squares of the singular values, and return that number k (a vectorized sketch of this rule follows the list);
Part 4: in the reduced-dimension data, predict the user's scores for unrated items based on SVD, and return those predicted scores;
Part 5: produce the top n items with the highest predicted scores, returning the item numbers and predicted score values.
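As a standalone sketch of the dimension-selection rule in Part 3 (the helper name choose_k is my own; it assumes the singular values arrive as a 1-D NumPy array, and the 0.9 default mirrors the threshold used in the full code below):

import numpy as np

def choose_k(sigma, percentage=0.9):
    # Keep the smallest k such that the sum of squares of the first k
    # singular values reaches at least `percentage` of the total sum of squares.
    energy = sigma ** 2
    cumulative = np.cumsum(energy)
    return int(np.searchsorted(cumulative, percentage * energy.sum()) + 1)

This is the same rule the function sigmaPct implements below with an explicit loop.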
To restate the advantage: the user rating data form a sparse matrix, so SVD can be used to map the data to a low-dimensional space; the similarity between items is then computed in that space, predicted scores are produced for the items the user has not rated, and the items with the highest predicted scores are finally recommended to the user.
# coding=utf-8
from numpy import *
from numpy import linalg as la

def loadExData():
    """Load the test data set: rows correspond to users, columns to items,
    and each value is a user's rating of an item (0 means not rated)."""
    return mat([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
                [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
                [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
                [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
                [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
                [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
                [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
                [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
                [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
                [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
                [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]])

# Three ways of computing similarity: Euclidean distance, Pearson correlation
# coefficient, and cosine similarity. Note that the parameters inA and inB of
# all three functions are column vectors.

def ecludSim(inA, inB):
    # The norm is computed by la.norm(); 1/(1 + distance) places the
    # similarity in the range [0, 1].
    return 1.0 / (1.0 + la.norm(inA - inB))

def pearsSim(inA, inB):
    if len(inA) < 3:
        return 1.0
    # Pearson correlation via corrcoef(); rowvar=0 computes the similarity
    # by columns, and 0.5 + 0.5*corrcoef() normalizes the result to [0, 1].
    return 0.5 + 0.5 * corrcoef(inA, inB, rowvar=0)[0][1]

def cosSim(inA, inB):
    num = float(inA.T * inB)
    denom = la.norm(inA) * la.norm(inB)
    return 0.5 + 0.5 * (num / denom)  # place the similarity in [0, 1]

def sigmaPct(sigma, percentage):
    """Determine k from the ratio of the sum of squares of the first k singular
    values to the sum of squares of all singular values, so that the later SVD
    step maps the original matrix to a k-dimensional space."""
    sigma2 = sigma ** 2        # square each singular value
    sumSgm2 = sum(sigma2)      # sum of squares of all singular values
    sumSgm3 = 0                # running sum of squares of the first k values
    k = 0
    for i in sigma:
        sumSgm3 += i ** 2
        k += 1
        if sumSgm3 >= sumSgm2 * percentage:
            return k

def svdEst(dataMat, user, simMeas, item, percentage):
    """Predict the user's score for `item`. Parameters: the data matrix
    (rows correspond to users, columns to items), a user number, a similarity
    measure, an item number, and the singular-value threshold. The prediction
    is based on the similarity between `item` and the items the user has rated."""
    n = shape(dataMat)[1]
    simTotal = 0.0; ratSimTotal = 0.0
    U, Sigma, VT = la.svd(dataMat)
    k = sigmaPct(Sigma, percentage)    # determine the value of k
    sigmaK = mat(eye(k) * Sigma[:k])   # build the k x k diagonal matrix
    # Convert the raw data to the k-dimensional (low-dimensional) space;
    # xformedItems holds the items' coordinates after the conversion.
    xformedItems = dataMat.T * U[:, :k] * sigmaK.I
    for j in range(n):
        userRating = dataMat[user, j]
        if userRating == 0 or j == item:
            continue
        # compute the similarity between `item` and item j
        similarity = simMeas(xformedItems[item, :].T, xformedItems[j, :].T)
        simTotal += similarity                  # sum all the similarities
        ratSimTotal += similarity * userRating  # weight the rating of item j by the similarity and sum
    if simTotal == 0:
        return 0
    else:
        return ratSimTotal / simTotal  # the predicted score for `item`

def recommend(dataMat, user, n=5, simMeas=cosSim, estMethod=svdEst, percentage=0.9):
    """Produce the n recommendations with the highest predicted scores
    (5 by default); the similarity measure defaults to cosine similarity."""
    # build the list of items the user has not rated
    unratedItems = nonzero(dataMat[user, :].A == 0)[1]
    if len(unratedItems) == 0:
        return 'you rated everything'  # the user has rated every item
    itemScores = []
    for item in unratedItems:          # predict a score for each unrated item
        estimatedScore = estMethod(dataMat, user, simMeas, item, percentage)
        itemScores.append((item, estimatedScore))
    # sort by predicted score, from high to low
    itemScores = sorted(itemScores, key=lambda x: x[1], reverse=True)
    return itemScores[:n]              # top-n item numbers and their predicted scores
Save the code as svd2.py and enter the following at the Python prompt:

>>> import svd2
>>> testdata = svd2.loadExData()
>>> svd2.recommend(testdata, 1, n=3, percentage=0.8)  # recommend the 3 items with the highest predicted scores to user number 1

The call returns a list of (item number, predicted score) pairs.
References:
1. Peter Harrington, Machine Learning in Action, People's Posts and Telecommunications Press, 2013.
2. http://www.ams.org/samplings/feature-column/fcarc-svd (a very good explanation of SVD, very helpful for understanding it; the geometric meaning of SVD in this article follows that column)
3. http://blog.csdn.net/xiahouzuoxin/article/details/41118351 (an article explaining the difference between SVD and eigenvalue decomposition)