Machine Learning with Spark study notes (training a recommendation model on 100,000 movie ratings)

Source: Internet
Author: User

We can now train the model. ALS takes the following parameters:
rank: The number of latent factors in the ALS model. A larger rank usually gives a better model but directly increases memory usage; values between 10 and 200 are typical.
iterations: The number of iterations to run. Each iteration reduces the reconstruction error of the ALS model. ALS converges to a reasonably good result after relatively few iterations, so a large number is rarely needed (about 10 is typical).
lambda: The regularization parameter, which controls overfitting. The higher the value, the stronger the regularization.
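
To make the role of lambda concrete, here is a minimal plain-Scala sketch of a regularized squared-error objective on a tiny, made-up factorization. It assumes the plain L2 form (squared reconstruction error plus lambda times the sum of squared factor entries); MLlib's actual implementation may scale the penalty differently. The function name regularizedLoss and all the toy values are hypothetical.

```scala
// Regularized squared error for a tiny, made-up factorization.
// ratings: (user, item, rating); factors: id -> latent vector.
def regularizedLoss(
    ratings: Seq[(Int, Int, Double)],
    userFactors: Map[Int, Array[Double]],
    itemFactors: Map[Int, Array[Double]],
    lambda: Double): Double = {
  // Sum of squared differences between actual and reconstructed ratings.
  val squaredError = ratings.map { case (u, i, r) =>
    val predicted = userFactors(u).zip(itemFactors(i)).map { case (a, b) => a * b }.sum
    (r - predicted) * (r - predicted)
  }.sum
  // L2 penalty: sum of squared entries of every factor vector.
  val penalty = (userFactors.values ++ itemFactors.values)
    .map(vec => vec.map(x => x * x).sum)
    .sum
  squaredError + lambda * penalty
}

// Hypothetical toy data: one user, one item, one rating, rank 2.
val toyRatings = Seq((1, 10, 5.0))
val toyUsers = Map(1 -> Array(1.0, 0.0))
val toyItems = Map(10 -> Array(4.0, 1.0))

println(regularizedLoss(toyRatings, toyUsers, toyItems, 0.0)) // error term only
println(regularizedLoss(toyRatings, toyUsers, toyItems, 0.5)) // error plus penalty
```

With a larger lambda, large factor values cost more, which pushes the optimizer toward smaller factors and away from fitting noise in the ratings.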

We will train the model with 50 factors, 8 iterations, and a regularization parameter of 0.01:

val model = ALS.train(ratings, 50, 8, 0.01)

Note: The original book uses 10 iterations, but on my machine 10 iterations caused a heap memory overflow, so after some debugging I reduced it to 8.
ALS.train returns a MatrixFactorizationModel object, which holds the user and item factors as RDDs of (id, factor) pairs, named userFeatures and productFeatures.

println(model.userFeatures.count)
println(model.productFeatures.count)



The MatrixFactorizationModel class has a handy predict method, which predicts the rating for a given user and item pair.

val predictedRating = model.predict(789, 123)

Here we predict the rating that user 789 would give to movie 123. The result is as follows:

The results you get may not be the same as mine, because the ALS model is randomly initialized.
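
Conceptually, a matrix factorization model scores a (user, item) pair as the dot product of the user's factor vector and the item's factor vector. The following is a minimal plain-Scala sketch of that calculation, not MLlib's own code. The vectors are made up and only 3 entries long; in the trained model they would come from model.userFeatures and model.productFeatures and have 50 entries each.

```scala
// Hypothetical factor vectors for one user and one movie.
val exampleUserFactor: Array[Double] = Array(0.5, 1.0, -0.5)
val exampleMovieFactor: Array[Double] = Array(2.0, 1.5, 1.0)

// Dot product of two equal-length vectors.
def dotProduct(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same length")
  a.zip(b).map { case (x, y) => x * y }.sum
}

// The predicted rating is the dot product of the two factor vectors.
val examplePrediction = dotProduct(exampleUserFactor, exampleMovieFactor)
println(examplePrediction)
```

Because the factor vectors start from random values, two training runs generally end with different vectors and therefore slightly different predictions.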

The predict method can also take an RDD of (user, item) pairs and predict a rating for each pair. To generate personalized recommendations for a single user, MatrixFactorizationModel provides a convenient method, recommendProducts. It takes two parameters: user, the user ID, and num, the number of items to recommend.
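
The idea behind a top-N recommendation can be sketched in plain Scala: score every item for the user, then keep the N highest scores. This is only an illustration under that assumption, with made-up 2-entry factor vectors and hypothetical names (toyUserFactor, toyItemFactors, topN); MLlib's implementation works on RDDs and may differ in detail.

```scala
// Hypothetical factors: one user and three items, rank 2.
val toyUserFactor: Array[Double] = Array(1.0, 0.5)
val toyItemFactors: Map[Int, Array[Double]] = Map(
  101 -> Array(2.0, 0.0),
  102 -> Array(0.0, 2.0),
  103 -> Array(3.0, 2.0)
)

// Score every item for the user and keep the n highest-scoring ones.
def topN(user: Array[Double], items: Map[Int, Array[Double]], n: Int): Seq[(Int, Double)] =
  items.toSeq
    .map { case (id, factor) =>
      (id, user.zip(factor).map { case (x, y) => x * y }.sum)
    }
    .sortBy { case (_, score) => -score } // descending by score
    .take(n)

val toyRecs = topN(toyUserFactor, toyItemFactors, 2)
println(toyRecs.mkString("\n"))
```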

Now recommend 10 movies for user 789:

val userID = 789
val K = 10
val topKRecs = model.recommendProducts(userID, K)
println(topKRecs.mkString("\n"))

The results are as follows:

Next, load the movie titles so that movie IDs can be mapped to names:

val movies = sc.textFile("F:\\ScalaWorkSpace\\data\\ml-100k\\u.item")
// Each line is pipe-delimited; keep only the ID and the title
val titles = movies.map(line => line.split("\\|").take(2)).map(array => (array(0).toInt, array(1))).collectAsMap()
println(titles(123))

The results are as follows:
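
Separately from the Spark run, the parsing step can be tried on a local collection. The sketch below assumes lines in the same pipe-delimited layout (ID, then title, then further fields); the sample lines and titles are made up, and sampleLines and sampleTitles are hypothetical names.

```scala
// Hypothetical lines in a pipe-delimited layout: id|title|release date|...
val sampleLines = Seq(
  "1|Example Movie (1995)|01-Jan-1995||http://example.com/1|0|1|0",
  "2|Another Film (1996)|15-Mar-1996||http://example.com/2|1|0|0"
)

// Keep the first two fields of each line and build an id -> title map.
// The pipe is escaped because String.split takes a regular expression.
val sampleTitles: Map[Int, String] = sampleLines
  .map(line => line.split("\\|").take(2))
  .map(fields => (fields(0).toInt, fields(1)))
  .toMap

println(sampleTitles(2))
```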

Let's see how many movies user 789 has rated:

val moviesForUser = ratings.keyBy(_.user).lookup(789)
println(moviesForUser.size)

The results are as follows:

You can see that user 789 has rated 33 movies.
Next, we take this user's 10 highest-rated movies, sorting by the rating field of the Rating object and looking up each title by movie ID:

moviesForUser.sortBy(-_.rating).take(10).map(rating => (titles(rating.product), rating.rating)).foreach(println)

The results are as follows:
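
The same filter, sort, and take pattern can be tried without Spark on a local collection. This is a minimal sketch with made-up ratings; SampleRating is a hypothetical stand-in for MLlib's Rating class, with the same three fields.

```scala
// Local stand-in for a (user, product, rating) record.
case class SampleRating(user: Int, product: Int, rating: Double)

// Hypothetical ratings from two users.
val sampleRatings = Seq(
  SampleRating(789, 10, 3.0),
  SampleRating(789, 11, 5.0),
  SampleRating(1, 10, 4.0),
  SampleRating(789, 12, 4.0)
)

// Keep one user's ratings, sort by rating in descending order, take the top 2.
val sampleTop2 = sampleRatings
  .filter(_.user == 789)
  .sortBy(-_.rating)
  .take(2)
  .map(r => (r.product, r.rating))

println(sampleTop2.mkString("\n"))
```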

Then we'll see which 10 movies are recommended for this user:

topKRecs.map(rating => (titles(rating.product), rating.rating)).foreach(println)

The results are as follows:

Finding similar movies

We measure similarity by the cosine of the angle between two vectors. A value of 1 means the two vectors point in exactly the same direction, 0 means they are uncorrelated, and -1 means they are exact opposites. First, we write a method that computes the cosine similarity of two vectors:

import org.jblas.DoubleMatrix
// dot(vec1, vec2) / (|vec1| * |vec2|)
def cosineSimilarity(vec1: DoubleMatrix, vec2: DoubleMatrix): Double = {
  vec1.dot(vec2) / (vec1.norm2() * vec2.norm2())
}

To check that it is correct, pick a movie and verify that its similarity with itself is 1:

val itemId = 567
val itemFactor = model.productFeatures.lookup(itemId).head
val itemVector = new DoubleMatrix(itemFactor)
println(cosineSimilarity(itemVector, itemVector))


You can see that the result is 1!
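
The three reference values (1, 0, and -1) can also be checked with a plain-Scala version of the same formula that needs neither Spark nor jblas. The function name cosine and the 2-entry vectors below are made up for illustration.

```scala
// Cosine similarity on plain arrays: dot(a, b) / (|a| * |b|).
def cosine(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same length")
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(a.map(x => x * x).sum)
  val normB = math.sqrt(b.map(x => x * x).sum)
  dot / (normA * normB)
}

val base = Array(3.0, 4.0)
println(cosine(base, base))               // same direction
println(cosine(base, Array(-4.0, 3.0)))   // perpendicular
println(cosine(base, Array(-3.0, -4.0)))  // opposite direction
```

Note that the formula divides by the norms, so it is undefined for an all-zero vector.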

Next, we compute the similarity of every movie to it:

val sims = model.productFeatures.map { case (id, factor) =>
  val factorVector = new DoubleMatrix(factor)
  val sim = cosineSimilarity(factorVector, itemVector)
  (id, sim)
}

Then take the 10 most similar movies:

val sortedSims = sims.top(K)(Ordering.by[(Int, Double), Double] {
  case (id, similarity) => similarity
})
println(sortedSims.take(10).mkString("\n"))

The results are as follows:

Now look up the movie titles. We take K + 1 results and skip the first one, which is the movie itself:

val sortedSims2 = sims.top(K + 1)(Ordering.by[(Int, Double), Double] {
  case (id, similarity) => similarity
})
println(sortedSims2.slice(1, 11).map { case (id, sim) => (titles(id), sim) }.mkString("\n"))

The results are as follows:
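
Separately from the Spark output, the "take K + 1 and skip the first" step can be sketched on a local collection. The similarity values here are made up, and localSims, bySimilarity, and topSimilar are hypothetical names. A local sort stands in for the RDD's top method, which also returns the largest elements in descending order.

```scala
// Hypothetical (movieId, similarity) pairs; 567 is the query movie itself.
val localSims: Seq[(Int, Double)] = Seq(
  (567, 1.0), (1, 0.42), (2, 0.91), (3, -0.10), (4, 0.77)
)
val localK = 2

// Same ordering as in the Spark code: compare pairs by similarity only.
val bySimilarity = Ordering.by[(Int, Double), Double] { case (_, sim) => sim }

// Take the K + 1 most similar, then drop the first (the movie itself).
val topSimilar = localSims.sorted(bySimilarity.reverse).take(localK + 1).drop(1)
println(topSimilar.mkString("\n"))
```

Dropping the first element relies on the query movie ranking highest, which holds when its self-similarity is 1. Filtering out the query ID explicitly would be an alternative that does not depend on that.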

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
