Note: The code in the original book is written for spark-shell, while I run my code in Eclipse, so the output here may not match the book exactly.
First, we read the user rating data u.data into an RDD via the SparkContext, then print the first record to check it. The code is as follows:
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "ExtractFeatures")
val rawData = sc.textFile("F:\\ScalaWorkSpace\\data\\ml-100k\\u.data")
println(rawData.first())
Note: in the first line of code I create the SparkContext myself. If you run the code in spark-shell, a SparkContext named sc is created for you automatically; since I am writing code in Eclipse, I have to create it in my own code. Running this prints the first record of u.data.
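For reference, here is a minimal standalone skeleton of the kind I run in Eclipse (the object name and main wrapper are mine, not from the book):

import org.apache.spark.SparkContext

// Minimal standalone driver; in spark-shell, sc is created for you.
object ExtractFeatures {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "ExtractFeatures")
    val rawData = sc.textFile("F:\\ScalaWorkSpace\\data\\ml-100k\\u.data")
    println(rawData.first())
    sc.stop()
  }
}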
Each field in a record is separated by a tab ("\t"). We now want to split each record and take its first three fields, that is, the user ID, the movie ID, and the user's rating of the movie. The code is as follows:
val rawRatings = rawData.map(_.split("\t").take(3))
rawRatings.first().foreach(println)
This prints the three fields of the first record one per line: the user ID, the movie ID, and the rating.
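For reference, the MovieLens 100k README describes each line of u.data as user id, item id, rating, and timestamp, tab-separated. A quick sanity check on the record count (the 100,000 figure comes from the dataset's name; this check is my addition, not from the book):

// Sanity check: ml-100k should contain 100,000 ratings in u.data.
println(rawData.count())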
Next we'll use Spark's built-in MLlib library to train our model. Let's see which methods are available and what parameters they require as input. First we import the ALS class:
import org.apache.spark.mllib.recommendation.ALS
The next operation is done in spark-shell. At the console, type ALS. (note the dot after ALS) and press the TAB key to list the available methods:
The method we are going to use is the train method.
If we enter ALS.train without arguments, we get an error, but from that error we can read off the details of the method's signatures:
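Abridged from the MLlib API (worth checking against the docs for your Spark version), the relevant overloads look like this:

// Abridged ALS.train signatures in org.apache.spark.mllib.recommendation.ALS:
def train(ratings: RDD[Rating], rank: Int, iterations: Int): MatrixFactorizationModel
def train(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double): MatrixFactorizationModel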
As you can see, we have to provide at least three parameters: ratings, rank, and iterations; the second overload requires an additional parameter, lambda. Let's take a look at the Rating class used for the ratings parameter:
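Rating is a simple case class; its definition in the Spark source is essentially:

// Abridged from org.apache.spark.mllib.recommendation:
case class Rating(user: Int, product: Int, rating: Double)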
As we can see, we need to provide the ALS model with an RDD of Rating objects, where a Rating encapsulates the user ID, the movie ID (called product here), and the rating. We will use the map method on the raw ratings dataset to convert each array of IDs and rating into a Rating object:
import org.apache.spark.mllib.recommendation.Rating

val ratings = rawRatings.map { case Array(user, movie, rating) => Rating(user.toInt, movie.toInt, rating.toDouble) }
println(ratings.first())
This prints the first record wrapped as a Rating object. We now have an RDD of Rating objects, ready to feed into ALS.
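From here, training the model is a single call; a minimal sketch, assuming illustrative (untuned) hyperparameters of rank 50, 10 iterations, and lambda 0.01:

import org.apache.spark.mllib.recommendation.ALS

ratings.cache() // ALS is iterative, so caching the input avoids recomputing it each iteration
val model = ALS.train(ratings, 50, 10, 0.01) // rank, iterations, lambda: illustrative values
println(model.userFeatures.count()) // number of users with learned factor vectors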