MovieLens user movie rating data set download:
http://grouplens.org/datasets/movielens/
1) Item-based, non-personalized: every user sees the same recommendations.
2) User-based, personalized: different users see different recommendations.
Once analysis of a user's behavior yields the user's preferences, we can compute similar users and similar items from those preferences, and then recommend based on either. These are the two branches of collaborative filtering: user-based and item-based collaborative filtering.
When computing similarity between users, a user's preferences over all items form the vector; when computing similarity between items, all users' preferences for an item form the vector. Once the similarities are computed, the next step is to find the similar neighbors.
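The two similarity directions above can be sketched on a toy rating matrix (rows are users, columns are items). The matrix, the cosine measure, and the variable names below are illustrative assumptions, not code from this post:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two preference vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy ratings: 3 users x 4 items (0 = not rated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 5, 4]], dtype=float)

# User-user similarity: each user's ROW (preferences over all items) is the vector.
sim_users_01 = cosine(R[0], R[1])   # ≈ 0.861

# Item-item similarity: each item's COLUMN (all users' preferences for it) is the vector.
sim_items_03 = cosine(R[:, 0], R[:, 3])   # ≈ 0.473
```

With similarities in hand, the "similar neighbors" step is just taking the top-k users (or items) by these scores.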
3) Model-based (Model CF)
According to the model used, this can be divided into:
1) Nearest-neighbor model: distance-based collaborative filtering
2) Latent factor model (SVD): a model based on matrix factorization
3) Graph: graph model, social-network graph model
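To illustrate the latent factor idea in 2): factor the rating matrix into low-rank user and item factors, and predict missing entries from their product. A minimal sketch using a truncated SVD on a toy matrix (the data and names are my assumptions, not from the post):

```python
import numpy as np

# Toy rating matrix; 0 marks a missing rating we would like to predict.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)

# Truncated SVD with k latent factors: R ~ U_k * S_k * Vt_k.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Predicted score for user 0 on item 2, which was unrated in R.
print(round(R_hat[0, 2], 2))
```

Production systems (including Spark's ALS, shown later) fit the factors only on observed entries instead of treating 0 as a rating, but the low-rank reconstruction idea is the same.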
Applicable scenarios
For an online retail website, the number of users is usually far larger than the number of items, and the item catalogue is relatively stable, so computing item similarity is both cheap and rarely needs updating. However, this only holds for e-commerce-style sites. For news, blogs, and similar content sites, the situation is often the opposite: the number of items is huge and updated frequently.
R implementation of the item-based collaborative filtering algorithm
  
# load the plyr package
library(plyr)

# read the data set (u.data from MovieLens 100k is tab-separated)
train <- read.table(file = "C:/users/administrator/desktop/u.data", sep = "\t")
train <- train[1:3]
names(train) <- c("user", "item", "pref")

# function returning the sorted list of users
usersUnique <- function() {
  users <- unique(train$user)
  users[order(users)]
}

# function returning the sorted list of items
itemsUnique <- function() {
  items <- unique(train$item)
  items[order(items)]
}

# user list
users <- usersUnique()
# item list
items <- itemsUnique()

# index of an item within the item list
index <- function(x) which(items %in% x)
data <- ddply(train, .(user, item, pref), summarize, idx = index(item))

# co-occurrence matrix
cooccurrence <- function(data) {
  n <- length(items)
  co <- matrix(rep(0, n * n), nrow = n)
  for (u in users) {
    idx <- index(data$item[which(data$user == u)])
    m <- merge(idx, idx)
    for (i in 1:nrow(m)) {
      co[m$x[i], m$y[i]] <- co[m$x[i], m$y[i]] + 1
    }
  }
  return(co)
}

# recommendation algorithm
recommend <- function(udata, co, num = 0) {
  n <- length(items)

  # the user's preference vector over all items
  pref <- rep(0, n)
  pref[udata$idx] <- udata$pref

  # user rating matrix
  userx <- matrix(pref, nrow = n)

  # co-occurrence matrix %*% rating matrix
  r <- co %*% userx

  # sort the recommendations;
  # set the score of items the user has already rated to 0
  r[udata$idx] <- 0
  idx <- order(r, decreasing = TRUE)
  topn <- data.frame(user = rep(udata$user[1], length(idx)),
                     item = items[idx], val = r[idx])
  topn <- topn[which(topn$val > 0), ]

  # keep only the top num results
  if (num > 0) {
    topn <- head(topn, num)
  }

  # return the result
  return(topn)
}

# build the co-occurrence matrix
co <- cooccurrence(data)

# compute recommendations for every user
recommendation <- data.frame()
for (i in 1:length(users)) {
  udata <- data[which(data$user == users[i]), ]
  recommendation <- rbind(recommendation, recommend(udata, co, 0))
}
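The R pipeline above reduces to four steps: build an item co-occurrence matrix from each user's set of rated items, multiply it by the user's rating vector, zero out already-rated items, and sort. A compact sketch of the same steps (the tiny data set and all names are mine, not the post's):

```python
import numpy as np

# (user, item, pref) triples; item ids are 0-based indices here for brevity.
train = [(1, 0, 5.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 1.0), (3, 1, 2.0), (3, 2, 5.0)]
n_items = 3

# Co-occurrence matrix: co[i, j] += 1 whenever items i and j were rated by the same user.
co = np.zeros((n_items, n_items))
for u in sorted({u for u, _, _ in train}):
    rated = [i for uu, i, _ in train if uu == u]
    for i in rated:
        for j in rated:
            co[i, j] += 1

def recommend(u, num=0):
    # The user's rating vector over all items.
    pref = np.zeros(n_items)
    for uu, i, p in train:
        if uu == u:
            pref[i] = p
    r = co @ pref            # co-occurrence matrix * rating vector
    r[pref > 0] = 0          # never re-recommend an already-rated item
    order = np.argsort(-r)
    top = [(int(i), float(r[i])) for i in order if r[i] > 0]
    return top[:num] if num > 0 else top

print(recommend(1))  # → [(2, 8.0)]
```

For user 1 only item 2 is unrated, and its score 8.0 comes from co-occurrence with the user's rated items 0 and 1.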
MapReduce implementation
Reference article:
http://www.cnblogs.com/anny-1980/articles/3519555.html
Code download
https://github.com/bsspirit/maven_hadoop_template/releases/tag/recommend
Spark ALS Implementation
Spark MLlib implements collaborative filtering via matrix factorization (ALS), rather than the user-based or item-based approaches.
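Before the full Spark listing, the ALS idea itself can be sketched in a few lines: alternately hold the item factors fixed and solve a regularized least-squares problem for the user factors, then swap. This is a toy NumPy sketch of the algorithm, not MLlib code; the data, rank, and lambda are illustrative assumptions:

```python
import numpy as np

# Toy rating matrix; 0 = unobserved.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
k, lam = 2, 0.1                       # rank and regularization (cf. ALS.train's rank and lambda)
rng = np.random.default_rng(0)
U = rng.normal(size=(R.shape[0], k))  # user factors
V = rng.normal(size=(R.shape[1], k))  # item factors

for _ in range(20):                   # alternate: fix V, solve for U; fix U, solve for V
    for u in range(R.shape[0]):
        Vu = V[mask[u]]               # item factors of the items user u rated
        U[u] = np.linalg.solve(Vu.T @ Vu + lam * np.eye(k), Vu.T @ R[u, mask[u]])
    for i in range(R.shape[1]):
        Ui = U[mask[:, i]]            # user factors of the users who rated item i
        V[i] = np.linalg.solve(Ui.T @ Ui + lam * np.eye(k), Ui.T @ R[mask[:, i], i])

pred = U @ V.T                        # predicted ratings for ALL (user, item) pairs
rmse = np.sqrt(((pred[mask] - R[mask]) ** 2).mean())
print(round(rmse, 3))                 # training RMSE after the sweeps
```

Each least-squares subproblem has a closed-form solution, which is why ALS parallelizes well: Spark distributes the per-user and per-item solves across the cluster.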
Reference article:
http://www.mamicode.com/info-detail-865258.html
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd._
import scala.io.Source

object MovieLensALS {

  def main(args: Array[String]) {
    // set up the runtime environment
    val sparkConf = new SparkConf().setAppName("MovieLensALS").setMaster("local[5]")
    val sc = new SparkContext(sparkConf)

    // load the user's own ratings (generated file personalRatings.txt)
    val myRatings = loadRatings(args(1))
    val myRatingsRDD = sc.parallelize(myRatings, 1)

    // sample data directory
    val movieLensHomeDir = args(0)

    // load the sample ratings; the last column (timestamp) mod 10 is the key,
    // the Rating is the value, i.e. (Int, Rating)
    val ratings = sc.textFile(movieLensHomeDir + "/ratings.dat").map { line =>
      val fields = line.split("::")
      // format: (timestamp % 10, Rating(userId, movieId, rating))
      (fields(3).toLong % 10, Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble))
    }

    // load the movie catalogue (movie ID -> movie title)
    val movies = sc.textFile(movieLensHomeDir + "/movies.dat").map { line =>
      val fields = line.split("::")
      // format: (movieId, movieName)
      (fields(0).toInt, fields(1))
    }.collect().toMap

    // count the ratings, the users, and the movies rated
    val numRatings = ratings.count()
    val numUsers = ratings.map(_._2.user).distinct().count()
    val numMovies = ratings.map(_._2.product).distinct().count()
    println("Got " + numRatings + " ratings from " + numUsers + " users on " + numMovies + " movies")

    // split the ratings by key into three parts: training (60%, plus the user's own
    // ratings), validation (20%), and test (20%); this data is reused many times
    // during the computation, so cache it in memory
    val numPartitions = 4
    val training = ratings.filter(x => x._1 < 6).values
      .union(myRatingsRDD).repartition(numPartitions).persist()
    val validation = ratings.filter(x => x._1 >= 6 && x._1 < 8).values
      .repartition(numPartitions).persist()
    val test = ratings.filter(x => x._1 >= 8).values.persist()

    val numTraining = training.count()
    val numValidation = validation.count()
    val numTest = test.count()
    println("Training: " + numTraining + " validation: " + numValidation + " test: " + numTest)

    // train models under different parameters, validate on the validation set,
    // and keep the model with the best parameters
    val ranks = List(8, 12)
    val lambdas = List(0.1, 10.0)
    val numIters = List(10, 20)
    var bestModel: Option[MatrixFactorizationModel] = None
    var bestValidationRmse = Double.MaxValue
    var bestRank = 0
    var bestLambda = -1.0
    var bestNumIter = -1
    for (rank <- ranks; lambda <- lambdas; numIter <- numIters) {
      val model = ALS.train(training, rank, numIter, lambda)
      val validationRmse = computeRmse(model, validation, numValidation)
      println("RMSE (validation) = " + validationRmse + " for the model trained with rank = "
        + rank + ", lambda = " + lambda + ", and numIter = " + numIter + ".")
      if (validationRmse < bestValidationRmse) {
        bestModel = Some(model)
        bestValidationRmse = validationRmse
        bestRank = rank
        bestLambda = lambda
        bestNumIter = numIter
      }
    }

    // predict the test set with the best model and compute the root mean square
    // error (RMSE) against the actual ratings
    val testRmse = computeRmse(bestModel.get, test, numTest)
    println("The best model was trained with rank = " + bestRank + " and lambda = " + bestLambda
      + ", and numIter = " + bestNumIter + ", and its RMSE on the test set is " + testRmse + ".")

    // create a naive baseline and compare it with the best model
    val meanRating = training.union(validation).map(_.rating).mean()
    val baselineRmse = math.sqrt(
      test.map(x => (meanRating - x.rating) * (meanRating - x.rating)).reduce(_ + _) / numTest)
    val improvement = (baselineRmse - testRmse) / baselineRmse * 100
    println("The best model improves the baseline by " + "%1.2f".format(improvement) + "%.")

    // recommend the 10 most interesting movies, excluding those the user has already rated
    val myRatedMovieIds = myRatings.map(_.product).toSet
    val candidates = sc.parallelize(movies.keys.filter(!myRatedMovieIds.contains(_)).toSeq)
    val recommendations = bestModel.get
      .predict(candidates.map((0, _)))
      .collect()
      .sortBy(-_.rating)
      .take(10)

    var i = 1
    println("Movies recommended for you:")
    recommendations.foreach { r =>
      println("%2d".format(i) + ": " + movies(r.product))
      i += 1
    }

    sc.stop()
  }

  /** Compute the RMSE between the predicted data and the actual data **/
  def computeRmse(model: MatrixFactorizationModel, data: RDD[Rating], n: Long): Double = {
    val predictions: RDD[Rating] = model.predict(data.map(x => (x.user, x.product)))
    val predictionsAndRatings = predictions.map { x => ((x.user, x.product), x.rating) }
      .join(data.map(x => ((x.user, x.product), x.rating))).values
    math.sqrt(predictionsAndRatings.map(x => (x._1 - x._2) * (x._1 - x._2)).reduce(_ + _) / n)
  }

  /** Load the user ratings file personalRatings.txt **/
  def loadRatings(path: String): Seq[Rating] = {
    val lines = Source.fromFile(path).getLines()
    val ratings = lines.map { line =>
      val fields = line.split("::")
      Rating(fields(0).toInt, fields(1).toInt, fields(2).toDouble)
    }.filter(_.rating > 0.0)
    if (ratings.isEmpty) {
      sys.error("No ratings provided.")
    } else {
      ratings.toSeq
    }
  }
}
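The computeRmse helper above joins predictions with actuals on the (user, item) key and then takes the root mean square of the differences. The same metric in a few lines of Python (the ratings are hypothetical, just to show the formula):

```python
import math

# (user, item) -> rating; hypothetical predictions vs. actuals.
predictions = {(1, 10): 4.1, (1, 20): 2.9, (2, 10): 3.5}
actuals     = {(1, 10): 4.0, (1, 20): 3.0, (2, 10): 4.5}

# Join on the (user, item) key, then take the root mean square of the differences.
pairs = [(predictions[k], actuals[k]) for k in predictions.keys() & actuals.keys()]
rmse = math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))
print(round(rmse, 3))  # → 0.583
```

A lower RMSE means the model's predicted ratings are, on average, closer to the held-out actual ratings, which is exactly how the Spark code picks its best (rank, lambda, numIter) combination.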
Reference article:
http://blog.csdn.net/acdreamers/article/details/44672305
http://www.cnblogs.com/technology/p/4467895.html
http://blog.fens.me/rhadoop-mapreduce-rmr/
This article is from the "not what Daniel qq:934033381" blog; please be sure to keep the source: http://tianxingzhe.blog.51cto.com/3390077/1710048
Collaborative filtering algorithm: multi-language implementations in R, MapReduce, and Spark MLlib