"Spark MLlib" performance evaluation: MSE/RMSE and MAPK/MAP

Recommendation Model Evaluation

In this article, we evaluate the performance of the movie recommendation model built in "Spark Machine Learning 1.0: Recommendation engine - Movie recommendation".

MSE/RMSE

The Mean Squared Error (MSE) is the sum of pow(predicted rating - actual rating, 2) over every existing rating, divided by the number of ratings. The Root Mean Squared Error (RMSE) is the square root of the MSE.
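As a concrete illustration, the two formulas can be computed by hand on plain Scala collections, without Spark (the sample (actual, predicted) rating pairs below are made up):

```scala
// A few made-up (actual, predicted) rating pairs
val pairs = Seq((3.0, 2.5), (4.0, 4.5), (5.0, 4.0))

// MSE: mean of the squared (predicted - actual) differences
val mse = pairs.map { case (actual, predicted) =>
  math.pow(predicted - actual, 2)
}.sum / pairs.size

// RMSE: square root of the MSE
val rmse = math.sqrt(mse)

println(s"MSE = $mse, RMSE = $rmse")
// MSE = 0.5, RMSE = 0.7071067811865476
```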

We first use ratings to generate a (user, product) RDD and pass it to model.predict(), which yields an RDD of predicted ratings keyed by (user, product). Then we use ratings again to generate an RDD with (user, product) as the key and the actual rating as the value, and join it with the predictions:

val usersProducts = ratings.map { case Rating(user, product, rating) => (user, product) }
val predictions = model.predict(usersProducts).map { case Rating(user, product, rating) =>
  ((user, product), rating)
}
val ratingsAndPredictions = ratings.map { case Rating(user, product, rating) =>
  ((user, product), rating)
}.join(predictions)
ratingsAndPredictions.first()
// res21: ((Int, Int), (Double, Double)) = ((291,800),(2.0,2.052364223387371))

To use MLlib's evaluation functions, we pass in an RDD of (actual, predicted) pairs. Since the squared-error metrics are symmetric, the actual and predicted positions can be exchanged:

import org.apache.spark.mllib.evaluation.RegressionMetrics

val predictedAndTrue = ratingsAndPredictions.map { case ((user, product), (actual, predicted)) => (actual, predicted) }
val regressionMetrics = new RegressionMetrics(predictedAndTrue)
println("Mean Squared Error = " + regressionMetrics.meanSquaredError)
println("Root Mean Squared Error = " + regressionMetrics.rootMeanSquaredError)
// Mean Squared Error = 0.08231947642632852
// Root Mean Squared Error = 0.2869137090247319
MAPK/MAP

Mean Average Precision at K (MAPK) can be understood simply as follows:
Suppose we recommend K = 10 items per user. Take the 10 item IDs with the highest predicted ratings as list 1, and the item IDs the user has actually rated as list 2, then measure how well the two agree. (In my opinion, this evaluation method is not a great fit here, because list 2 contains every item the user rated, not just the highly rated ones.)
We rank item IDs by predicted rating and traverse the ranking from the top. Whenever a predicted ID appears in the set of IDs the user actually rated, we add to the score; a hit at a higher rank earns more than a hit at a lower rank, because it better reflects the accuracy of the recommendations. Finally, we divide the accumulated score by min(K, actual.size).
For all users, we sum the per-user scores and divide by the number of users.
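The per-user accumulation described above can be sketched as a plain Scala helper (the function name avgPrecisionK and its exact scoring scheme are our own illustration, not an MLlib API):

```scala
// Average precision at K for one user: walk the top-K predictions and
// reward each hit by the precision at its rank (earlier hits score more).
def avgPrecisionK(actual: Seq[Int], predicted: Seq[Int], k: Int): Double = {
  val predK = predicted.take(k)
  var score = 0.0
  var numHits = 0.0
  for ((p, i) <- predK.zipWithIndex) {
    if (actual.contains(p)) {
      numHits += 1.0
      score += numHits / (i + 1.0) // precision at rank i + 1
    }
  }
  if (actual.isEmpty) 1.0
  else score / math.min(actual.size, k).toDouble
}

// Hits at ranks 1 and 3 out of K = 3: (1/1 + 2/3) / 3
println(avgPrecisionK(Seq(1, 2, 3), Seq(1, 4, 3), 3))
// 0.5555555555555556
```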
MLlib computes the global Mean Average Precision (MAP, with no K cutoff). It requires us to pass in an RDD of (predicted Array, actual Array) pairs.
Now, let's start by generating the predicted side. First, build the item-factor matrix:

// Compute recommendations for all users
import org.jblas.DoubleMatrix

val itemFactors = model.productFeatures.map { case (id, factor) => factor }.collect()
val itemMatrix = new DoubleMatrix(itemFactors)
println(itemMatrix.rows, itemMatrix.columns)
// (1682,50)

So that the worker nodes can access it, we distribute the matrix as a broadcast variable:

// Broadcast the item factor matrix
val imBroadcast = sc.broadcast(itemMatrix)

We multiply the item matrix by each user vector to compute the scores, pair each score with its index via scores.data.zipWithIndex, sort by score, generate recommendedIds, and build a (userId, recommendedIds) RDD:

val allRecs = model.userFeatures.map { case (userId, array) =>
  val userVector = new DoubleMatrix(array)
  val scores = imBroadcast.value.mmul(userVector)
  val sortedWithId = scores.data.zipWithIndex.sortBy(-_._1)
  val recommendedIds = sortedWithId.map(_._2 + 1).toSeq // item IDs are 1-based
  (userId, recommendedIds)
}

Extract actual values:

// Next get all the movie IDs per user, grouped by user ID
val userMovies = ratings.map { case Rating(user, product, rating) => (user, product) }.groupBy(_._1)
// userMovies: org.apache.spark.rdd.RDD[(Int, Seq[(Int, Int)])] = MapPartitionsRDD[277] at groupBy at <console>:21

Build the (predicted Array, actual Array) RDD and use the evaluation function:

import org.apache.spark.mllib.evaluation.RankingMetrics

val predictedAndTrueForRanking = allRecs.join(userMovies).map { case (userId, (predicted, actualWithIds)) =>
  val actual = actualWithIds.map(_._2)
  (predicted.toArray, actual.toArray)
}
val rankingMetrics = new RankingMetrics(predictedAndTrueForRanking)
println("Mean Average Precision = " + rankingMetrics.meanAveragePrecision)
// Mean Average Precision = 0.07171412913757183
