Machine Learning with Spark learning notes (extracting features from the 100,000-record movie dataset)


Note: The code in the original book is written for the Spark shell, while I run it in Eclipse, so my output may differ slightly from the book's.

First, read the user data file u.data into a SparkContext, then print the first record to check the result:

val sc = new SparkContext("local", "ExtractFeatures")
val rawData = sc.textFile("F:\\ScalaWorkSpace\\data\\ml-100k\\u.data")
println(rawData.first())

Note: The first line creates the Spark context. If you run the code in the Spark shell, a context named sc is created automatically; since I am writing code in Eclipse, I create it myself. Running this prints the first record of u.data.

Each record's fields are separated by tabs ("\t"). We now split each line and keep the first three fields: the user ID, the movie ID, and the user's rating of that movie:

val rawRatings = rawData.map(_.split("\t").take(3))
rawRatings.first().foreach(println)

This prints the three fields of the first record, one per line.
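To make the transformation concrete, here is a pure-Scala sketch (no Spark needed) of what split and take do to a single line; the sample values are illustrative of u.data's tab-separated layout, not necessarily the real first record:

```scala
// One line in u.data's format: userID \t movieID \t rating \t timestamp.
// The values below are illustrative, not necessarily the real first record.
val sampleLine = "196\t242\t3\t881250949"

// Split on tabs and keep only the first three fields,
// dropping the timestamp we do not need.
val fields = sampleLine.split("\t").take(3)

fields.foreach(println)
```

Each of the three kept fields is printed on its own line, just as rawRatings.first().foreach(println) does on the RDD.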

Next we'll use Spark's built-in MLlib library to train our model. Let's see which methods are available and which parameters they require as input. First, import the ALS object:

import org.apache.spark.mllib.recommendation.ALS

The next step is done in the Spark shell. At the console, type ALS. (note the dot after ALS) followed by the Tab key to list the available methods:

The method we are going to use is the train method.

If we enter ALS.train without arguments, we get an error, but the error message shows us the details of this method:

As you can see, we must provide at least three parameters: ratings, rank, and iterations; the second overload requires an additional parameter, lambda. Let's take a look at the Rating class expected by the ratings parameter:

As we can see, the ALS model needs an RDD of Rating objects, where each Rating encapsulates a user ID, a movie ID (called product here), and the rating value. We will apply map to the raw ratings dataset to convert each array of IDs and rating into a Rating object:

import org.apache.spark.mllib.recommendation.Rating

val ratings = rawRatings.map {
  case Array(user, movie, rating) =>
    Rating(user.toInt, movie.toInt, rating.toDouble)
}
println(ratings.first())

The output is as follows:

We now have an RDD of Rating objects, which is exactly what ALS.train expects.
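As a self-contained sketch of the same pattern-match mapping without a Spark cluster, the snippet below uses plain Scala collections and a local case class that mirrors the field names of MLlib's Rating; the two input lines are illustrative, not taken from the real file:

```scala
// Local stand-in for org.apache.spark.mllib.recommendation.Rating,
// mirroring its field names (user, product, rating).
case class Rating(user: Int, product: Int, rating: Double)

// Illustrative lines in u.data's tab-separated format.
val rawData = Seq("196\t242\t3\t881250949", "186\t302\t4\t891717742")

// Same transformation as on the RDD: split on tabs, keep three fields,
// then pattern-match the array into a Rating.
val ratings = rawData
  .map(_.split("\t").take(3))
  .map { case Array(user, movie, rating) =>
    Rating(user.toInt, movie.toInt, rating.toDouble)
  }

println(ratings.head)  // Rating(196,242,3.0)
```

On the real RDD the same map body produces an RDD[Rating] rather than a Seq[Rating], but the element-level logic is identical.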
