The ALS matrix factorization recommendation model
Using this model to predict a user's rating of an item follows the same idea as making predictions with linear regression, roughly:
Define a prediction model (a mathematical formula),
then determine a loss function,
use the existing data as a training set,
iterate to minimize the value of the loss function,
and finally substitute the learned parameters into the prediction model to make predictions.
The prediction model for matrix factorization is:
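(The formula image from the original post does not survive here. A conventional form of the matrix factorization prediction model, consistent with the "Matrix Factorization Techniques for Recommender Systems" reference listed at the end, is reconstructed below; p_u and q_i are the user and item feature vectors discussed later.)

\hat{r}_{ui} = q_i^T p_u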
The loss function is:
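(This formula image is also missing; the usual regularized squared-error objective for this model, again following the Koren et al. reference, is:)

\min_{P, Q} \sum_{(u,i) \in \kappa} \left( r_{ui} - q_i^T p_u \right)^2 + \lambda \left( \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)

where \kappa is the set of (user, item) pairs with known ratings and \lambda is the regularization coefficient.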
We simply minimize this loss function to obtain the parameters P and Q.
The physical meaning of the matrix factorization model
We want to learn a matrix P whose rows represent user features and a matrix Q whose rows represent item features. Each dimension of a feature vector represents a latent factor; for movies, these might correspond to director, actor, and so on. Of course, these latent factors are learned by the machine, and we cannot say exactly what each one means.
After learning P and Q, we can predict every user's rating for every item by simply multiplying P by Q.
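As a concrete illustration (this is not part of the Spark examples discussed below, and the names PredictSketch, userFeatures, itemFeatures and predict are made up for this sketch), here is a minimal Scala example of the prediction step once the feature vectors have been learned:

    object PredictSketch {
      // One row of P (a user's latent-factor vector) and one row of Q (an item's).
      type Factors = Array[Double]

      // r_hat(u, i) = p_u dot q_i
      def predict(pu: Factors, qi: Factors): Double =
        pu.zip(qi).map { case (a, b) => a * b }.sum

      def main(args: Array[String]): Unit = {
        val userFeatures: Factors = Array(0.8, 0.1, 0.5) // p_u with F = 3 latent factors
        val itemFeatures: Factors = Array(0.9, 0.2, 0.4) // q_i
        println(f"predicted rating = ${predict(userFeatures, itemFeatures)}%.3f")
      }
    }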
That covers the matrix factorization recommendation model; now let's turn to ALS (Alternating Least Squares). ALS is simply one way to minimize the loss function above; there are other methods, such as SGD.
The loss function in the ALS paper is slightly different from the one above:
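(The formula image is missing here; the weighted-λ-regularization objective, reconstructed from the ALS-WR paper cited in the resources, is:)

f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^T m_j \right)^2 + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{m_j} \lVert m_j \rVert^2 \right)

where I is the set of known ratings, and n_{u_i} and n_{m_j} are the number of ratings made by user i and received by movie j, respectively.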
In each iteration:
Fix M and update each user's feature vector u (take the partial derivative with respect to u, set it to zero, and solve).
Fix U and update each movie's feature vector m (take the partial derivative with respect to m, set it to zero, and solve).
The derivation in the paper proceeds exactly this way, setting the partial derivative to zero and solving:
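(The derivation figure is missing from this copy; the resulting closed-form update for each user, reconstructed from the ALS-WR paper, is shown below.)

u_i = A_i^{-1} V_i, \qquad A_i = M_{I_i} M_{I_i}^T + \lambda n_{u_i} E, \qquad V_i = M_{I_i} R^T(i, I_i)

Here I_i is the set of movies rated by user i, M_{I_i} is the sub-matrix of M whose columns correspond to those movies, n_{u_i} = |I_i|, E is the f x f identity matrix, and R(i, I_i) is the row vector of user i's ratings for those movies.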
This is the formula used to update U in each iteration; the formula for M is analogous.
To make this clearer, let's walk through Spark's ALS code.
There are three implementations of ALS in the Spark source: LocalALS.scala (no Spark), SparkALS.scala (using Spark for parallelization), and the ALS in MLlib.
LocalALS.scala and SparkALS.scala are official examples meant to show developers how to use Spark,
while the ALS in MLlib is the one intended for real recommendation work.
However, the MLlib version has been heavily optimized and is not well suited for beginners trying to understand the ALS algorithm.
So I will use LocalALS.scala and SparkALS.scala to explain ALS.
LocalALS.scala
    // Iteratively update movies, then users
    for (iter <- 1 to iterations) {
      println(s"Iteration $iter:")
      // Fix the users and update the features of every movie, one by one
      ms = (0 until M).map(i => updateMovie(i, ms(i), us, R)).toArray
      // Fix the movies and update the features of every user, one by one
      us = (0 until U).map(j => updateUser(j, us(j), ms, R)).toArray
      println("RMSE = " + rmse(R, ms, us))
      println()
    }
Update the feature vector of the j-th user:

    def updateUser(j: Int, u: RealVector, ms: Array[RealVector], R: RealMatrix): RealVector = {
      var XtX: RealMatrix = new Array2DRowRealMatrix(F, F) // F is the number of latent factors
      var Xty: RealVector = new ArrayRealVector(F)
      // For each movie that the user rated. The example simply assumes this user rated
      // every movie, hence 0 until M; in a real application you would only iterate over
      // the movies this user actually rated.
      for (i <- 0 until M) {
        val m = ms(i)
        // Add m * m^T to XtX: the outer product of two vectors (one treated as a column
        // vector, one as a row vector) is a matrix, which is accumulated into XtX.
        XtX = XtX.add(m.outerProduct(m))
        // Add m * rating to Xty
        Xty = Xty.add(m.mapMultiply(R.getEntry(i, j)))
      }
      // Add the regularization coefficient to the diagonal entries
      for (d <- 0 until F) {
        XtX.addToEntry(d, d, LAMBDA * M)
      }
      // Solve with a Cholesky decomposition; this is really just solving A * x = b
      new CholeskyDecomposition(XtX).getSolver.solve(Xty)
    }
Comparing this with the update formula from the paper: XtX in the code corresponds to the matrix A_i = M_{I_i} M_{I_i}^T + \lambda n_{u_i} E on the left-hand side, and Xty corresponds to the vector V_i = M_{I_i} R^T(i, I_i) on the right-hand side, so the Cholesky solve computes u_i = A_i^{-1} V_i.
Updating each movie's feature vector m is similar and is not repeated here.
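The iteration loop above also calls an rmse helper. For completeness, here is a sketch of what that helper does, written to match the structure of the example (the exact code in LocalALS.scala may differ slightly): it reconstructs the full predicted rating matrix from ms and us and measures the root-mean-square error against R.

    // Sketch only: assumes the same Commons Math imports and the globals M, U
    // used by the example (e.g. import org.apache.commons.math3.linear._).
    def rmse(targetR: RealMatrix, ms: Array[RealVector], us: Array[RealVector]): Double = {
      val r = new Array2DRowRealMatrix(M, U)
      // The predicted rating for (movie i, user j) is the dot product of their feature vectors
      for (i <- 0 until M; j <- 0 until U) {
        r.setEntry(i, j, ms(i).dotProduct(us(j)))
      }
      val diffs = r.subtract(targetR)
      var sumSqs = 0.0
      for (i <- 0 until M; j <- 0 until U) {
        val diff = diffs.getEntry(i, j)
        sumSqs += diff * diff
      }
      math.sqrt(sumSqs / (M.toDouble * U.toDouble))
    }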
SparkALS.scala
    for (iter <- 1 to iterations) {
      println(s"Iteration $iter:")
      ms = sc.parallelize(0 until M, slices)
        .map(i => update(i, msb.value(i), usb.value, Rc.value))
        .collect()
      msb = sc.broadcast(ms) // Re-broadcast ms because it was updated
      us = sc.parallelize(0 until U, slices)
        .map(i => update(i, usb.value(i), msb.value, Rc.value.transpose()))
        .collect()
      usb = sc.broadcast(us) // Re-broadcast us because it was updated
      println("RMSE = " + rmse(R, ms, us))
      println()
    }
Compared with LocalALS, the highlight of the SparkALS version is parallelization: in LocalALS each user's features are updated serially, while in SparkALS they are updated in parallel. The 0 until U range is distributed across the cluster with sc.parallelize, and the current ms and us arrays are shared with the workers as broadcast variables.
Resources:
"Large-scale Parallel collaborative Filtering for the Netflix Prize" (ALS-WR original paper)
"Matrix factorization Techniques for recommender Systems" (good material for matrices decomposition models)
Https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LocalALS.scala
Https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkALS.scala
Author: linger
This article link: http://blog.csdn.net/lingerlanlan/article/details/44085913