"Go" prediction model of movie scoring based on R language construction

Source: Internet
Author: User

First, preparation

1. R packages: ggplot2 (plotting), recommenderlab, and reshape (data processing).
2. Data: the data sets can be downloaded free of charge from the University of Minnesota's Social Computing Research Center website at http://grouplens.org/datasets/movielens/, or from the network disk at https://yunpan.cn/Oc6R9apvCnVXGc (access password E1AF).

The download contains the data set and a description of it. The data are ratings of 1682 movies by 943 users, and each rating is one of 1, 2, 3, 4, or 5. The description file documents the data in detail, so it is not repeated here.

Second, data processing

First we load the packages we need:
library(recommenderlab)
library(reshape)
library(ggplot2)

Next we read the data. If the file is in the current working directory, the file name alone, u.data, is enough in the code below. If it is somewhere else, pass the full path instead.

mydata <- read.table("E:/my blog/r blog/movie/ml-100k/u.data", header = FALSE, stringsAsFactors = TRUE)

stringsAsFactors = TRUE tells read.table() to convert character columns to factors. u.data contains only numeric columns, so the argument has no effect here and every column is read as numeric data.

We can view the first 6 rows of the data set with the head() function. The first column is the user ID, the second column is the movie ID, the third column is the rating, and the fourth column is the time of the rating. These are all described in the data description. The rating time is of no use to our analysis, so we delete that column.
mydata <- mydata[, -4]
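
As a small self-contained sketch of this step, the snippet below parses three made-up records in the same four-column layout, inspects them with head(), and drops the timestamp column. The values are invented for illustration and are not rows from u.data; the names sample_text and ratings are hypothetical.

```r
# Three made-up records in a four-column whitespace-separated layout:
# user ID, movie ID, rating, timestamp (values invented for illustration).
sample_text <- "1 10 4 880000001
2 10 3 880000002
2 11 5 880000003"

# read.table() names unnamed columns V1..V4 when header = FALSE.
ratings <- read.table(text = sample_text, header = FALSE)
head(ratings)

# Drop the fourth column (the timestamp) by negative indexing.
ratings <- ratings[, -4]
names(ratings)
```

The automatic column names V1, V2, and V3 are the ones the rest of the article refers to.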

Now the data set has only three columns. I will use ggplot2 to look at how users rated the movies, and I chose a pie chart because it shows the distribution of the rating column well.

ggplot(mydata, aes(x = factor(1), fill = factor(V3))) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  ggtitle("Score Map") +
  labs(x = "", y = "") +
  guides(fill = guide_legend(title = "Score"))

The chart shows that ratings of 1 and 2 are comparatively rare, while ratings of 3 and 4 are the most common and together account for about three-fifths of all ratings. By that measure, a new movie that scores below 3.5 stands to lose almost half of its users.
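
The shares behind a chart like this can also be checked numerically. The sketch below uses a short invented vector, toy_scores, rather than the real data; on the real data one would presumably pass mydata$V3 instead.

```r
# Toy ratings, invented for illustration (not taken from u.data).
toy_scores <- c(1, 2, 3, 3, 3, 4, 4, 4, 4, 5)

# Count how often each rating occurs, then convert counts to shares.
score_counts <- table(toy_scores)
score_share <- prop.table(score_counts)
print(round(score_share, 2))

# Combined share of ratings 3 and 4.
share_3_4 <- sum(score_share[c("3", "4")])
print(share_3_4)
```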

Next, the reshape package is used to pivot the data into a V1 × V2 matrix (users by movies) filled with the V3 ratings.

mydata <- cast(mydata, V1 ~ V2, value = "V3")  # build a matrix with V1 as rows and V2 as columns, filled with V3
mydata <- mydata[, -1]  # the first column is just the row index (V1) and can be deleted
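
To make the reshaping concrete, here is a minimal base R sketch of the same long-to-wide idea on invented data. It does not use reshape's cast(); it fills a matrix by index instead, and the names long, users, movies, and wide are hypothetical.

```r
# Toy long-format ratings: one row per (user, movie, rating) triple.
long <- data.frame(V1 = c(1, 1, 2, 3, 3),
                   V2 = c(1, 2, 2, 1, 3),
                   V3 = c(5, 3, 4, 2, 5))

users  <- sort(unique(long$V1))
movies <- sort(unique(long$V2))

# Start from an all-NA users x movies matrix, then fill in the known ratings.
wide <- matrix(NA_real_, nrow = length(users), ncol = length(movies),
               dimnames = list(as.character(users), as.character(movies)))
wide[cbind(match(long$V1, users), match(long$V2, movies))] <- long$V3
print(wide)
```

Cells for user and movie pairs that never occur in the long table stay NA, which is where the missing values in the rating matrix come from.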

At this point mydata has two class attributes, cast_df and data.frame. To learn more about cast_df, see https://www.r-statistics.com/tag/cast_df/. We want mydata to be a plain data frame, because a cast_df object cannot be converted directly to a matrix, so we remove that class attribute and keep only data.frame.

class(mydata) <- "data.frame"
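
The effect of overwriting the class vector can be seen on a toy object. In this sketch the reshape package is not loaded, so "cast_df" is only a label attached by hand to mimic the situation described above; df and m are hypothetical names.

```r
# A toy data frame with an extra leading class attached by hand.
df <- data.frame(a = c(1, 2), b = c(3, 4))
class(df) <- c("cast_df", "data.frame")
print(class(df))

# Overwriting the class vector keeps the data and drops the extra class.
class(df) <- "data.frame"
print(class(df))

# The plain data frame converts to a numeric matrix as usual.
m <- as.matrix(df)
print(dim(m))
```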

Next, the data still has to be converted into the realRatingMatrix class that the recommenderlab package works with. We first convert mydata to a matrix and then use the as() function to coerce it to the type we want.

mydata <- as.matrix(mydata)
mydata <- as(mydata, "realRatingMatrix")
mydata  # a 943 x 1682 matrix of class realRatingMatrix containing 100,000 ratings

We also need to give each column a name, otherwise an error occurs after modelling.

colnames(mydata) <- paste0("M", 1:1682)
as(mydata, "matrix")[1:6, 1:6]

Third, build the model

For the realRatingMatrix data type, the recommenderlab package provides 6 models: item-based collaborative filtering (IBCF), principal component analysis (PCA), popularity-based recommendation (POPULAR), random recommendation (RANDOM), singular value decomposition (SVD), and user-based collaborative filtering (UBCF).
Collaborative filtering has two main steps: ① use the target user's known movie ratings to find a group of users with similar viewing tastes; ② compute that group's ratings for other movies and use them as the predicted ratings for the target user.
The data are ratings of 1682 movies by 943 users, but nobody has seen every movie, and people do not rate every movie they have seen, so the rating matrix we just generated is very sparse, with many missing values. This does not stop collaborative filtering from working, so we choose collaborative filtering to build our model.
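
The two steps above can be sketched in base R on a tiny invented matrix. This is an illustration of the idea only, not recommenderlab's UBCF implementation: it assumes cosine similarity over co-rated movies and a similarity-weighted mean, skips any normalization, and the names r, cosine_sim, and predict_rating are hypothetical.

```r
# Toy users x movies rating matrix; NA marks an unrated movie.
# All values are invented for illustration.
r <- matrix(c(5, 4, NA, 1,
              4, 5,  2, 1,
              1, 2,  5, 4,
              5, NA, 1, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), paste0("M", 1:4)))

# Cosine similarity between two users over the movies both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Predict the target user's rating for one movie.
predict_rating <- function(r, target, movie, k = 2) {
  # Step 1: among the other users who rated this movie, find the most similar.
  others <- setdiff(rownames(r), target)
  others <- others[!is.na(r[others, movie])]
  if (length(others) == 0) return(NA_real_)
  sims <- sapply(others, function(u) cosine_sim(r[target, ], r[u, ]))
  sims <- sims[!is.na(sims)]
  if (length(sims) == 0) return(NA_real_)
  nearest <- names(sort(sims, decreasing = TRUE))[seq_len(min(k, length(sims)))]
  # Step 2: similarity-weighted mean of the neighbours' ratings.
  sum(sims[nearest] * r[nearest, movie]) / sum(sims[nearest])
}

pred <- predict_rating(r, target = "u1", movie = "M3", k = 2)
print(pred)
```

In this toy matrix u1's closest neighbours both rated M3 low, so the prediction for u1 is low as well. Note that a neighbour who shares only one rated movie with the target gets a cosine similarity of 1, which is one reason practical implementations add safeguards that this sketch leaves out.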

mydata.model <- Recommender(mydata[1:800], method = "UBCF")
mydata.predict <- predict(mydata.model, mydata[801:803], type = "ratings")  # predict ratings
as(mydata.predict, "matrix")[1:3, 1:6]

          M1       M2       M3       M4       M5       M6
801 4.023833 4.017790 4.099041 4.061437 4.038462 4.038462
802 3.719220 3.505469 3.482577 3.485396 3.373351 3.493333
803 3.021637 3.090909 3.099141 3.099141 3.090909 3.090909

These are the predicted ratings of users 801, 802, and 803 for movies M1 to M6. The predictions fall roughly between 3 and 4, in line with the rating distribution we looked at earlier.

We can also recommend movies to users with the same predict() function; only the arguments need to change.

mydata.predict2 <- predict(mydata.model, mydata[801:803], n = 5)
as(mydata.predict2, "list")

The results of the operation are as follows:


$`801`
[1] "M272" "M258" "M315" "M327" "M298"

$`802`
[1] "M313" "M50"  "M298" "M328" "M127"

$`803`
[1] "M302" "M268" "M272" "M313" "M9"

This means that the 5 movies recommended to user 801 are "M272", "M258", "M315", "M327", and "M298"; the other entries read the same way.
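
Conceptually, a top-n list like this amounts to ranking a user's predicted ratings and keeping the best n. The base R sketch below shows that idea on invented numbers; it is not how recommenderlab builds its lists internally, and predicted and top_n are hypothetical names.

```r
# Toy predicted ratings for one user; NA marks movies the user has already
# rated, for which no prediction is needed. Values are invented.
predicted <- c(M1 = 3.9, M2 = NA, M3 = 4.6, M4 = 2.1, M5 = 4.2, M6 = NA)

# Return the names of the n highest predicted ratings, skipping NAs.
top_n <- function(pred, n = 5) {
  pred <- pred[!is.na(pred)]
  names(sort(pred, decreasing = TRUE))[seq_len(min(n, length(pred)))]
}

top2 <- top_n(predicted, n = 2)
print(top2)
```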

Reference books: R language Combat: Programming basics, statistical analysis and data mining