Spark Machine Learning (10): The ALS (Alternating Least Squares) Algorithm

Last Update:2017-07-20 Source: Internet

Author: User


1. Alternating Least Squares

ALS stands for Alternating Least Squares. In machine learning, the term refers specifically to a collaborative filtering recommendation algorithm that is solved with the least squares method. In the user-product rating matrix, u denotes a user and v denotes a product. Users rate products, but not every user rates every product. For example, if user u6 has not rated product v3, we need to infer that rating, and this inference is the machine learning task.

Since not every user rates every product, the rating matrix is sparse, and ALS assumes that it is low-rank: the $m \times n$ matrix can be obtained by multiplying an $m \times k$ matrix by a $k \times n$ matrix, where $k \ll m, n$.

$$A_{m \times n} = U_{m \times k} \times V_{k \times n}$$

This assumption is reasonable because users and products share a small number of hidden (latent) features. For example, if we know that someone likes carbonated drinks, we can infer that he likes Pepsi, Coca-Cola, and Fanta without being told explicitly that he likes each of the three. "Carbonated drink" here plays the role of a hidden feature.

In the formula above, $U_{m \times k}$ represents each user's preference for the hidden features, and $V_{k \times n}$ represents the extent to which each product contains those features. The machine learning task is to find $U_{m \times k}$ and $V_{k \times n}$. The product $u_i^T v_j$ is user i's predicted preference for product j, and the Frobenius norm is used to quantify the error made when the rating matrix is reconstructed from U and V. Because many entries of the matrix are blank (the user has not rated the product), the error is not computed for the unknown entries, only over the set R of observed (user, product) pairs.
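The objective function itself is not reproduced in the text. A common way to write it, given here as a sketch consistent with the description rather than as the article's exact formula, is the squared error over the observed set R. The regularization term with weight $\lambda$ is an assumption, suggested by the regularization parameter that is passed to the training function later in the article, and $a_{ij}$ denotes the observed rating of product j by user i:

$$\min_{U, V} \sum_{(i,j) \in R} \left(a_{ij} - u_i^T v_j\right)^2 + \lambda \left(\lVert U \rVert_F^2 + \lVert V \rVert_F^2\right)$$

Here $u_i$ is the k-dimensional feature vector of user i and $v_j$ is the k-dimensional feature vector of product j.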

This turns the collaborative recommendation problem into an optimization problem. In the objective function, U and V are coupled to each other, which is why alternating least squares is used. First fix an initial value $U^{(0)}$, so that the problem becomes an ordinary least squares problem and $V^{(0)}$ can be computed from $U^{(0)}$. Then compute $U^{(1)}$ from $V^{(0)}$, and keep iterating until a fixed number of iterations is reached or the solution converges. Convergence to the global optimum cannot be guaranteed, but in practice this has little effect.
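The alternation can be sketched in plain Scala, without Spark. This is a minimal illustration under stated assumptions, not MLlib's implementation: the object name `AlsSketch`, the helper names, the sample ratings, and the λ-regularized form of the objective are all hypothetical choices, and V is stored as n rows of length k (that is, transposed) for convenience. With V fixed, each user vector solves the small k×k system $(\lambda I + \sum_j v_j v_j^T)\, u_i = \sum_j a_{ij} v_j$, summed over the products j that user i rated, and symmetrically for each $v_j$ with U fixed.

```scala
object AlsSketch {
  // One observed rating: (userIndex, productIndex, rating). Unobserved cells are simply absent.
  type Obs = (Int, Int, Double)

  // Solve the small k x k system M x = b by Gaussian elimination with partial pivoting.
  def solve(m: Array[Array[Double]], b: Array[Double]): Array[Double] = {
    val n = b.length
    val a = Array.tabulate(n)(i => m(i).clone() :+ b(i)) // augmented matrix [M | b]
    for (col <- 0 until n) {
      val pivot = (col until n).maxBy(r => math.abs(a(r)(col)))
      val tmp = a(col); a(col) = a(pivot); a(pivot) = tmp
      for (r <- col + 1 until n) {
        val f = a(r)(col) / a(col)(col)
        for (c <- col to n) a(r)(c) -= f * a(col)(c)
      }
    }
    val x = new Array[Double](n)
    for (i <- n - 1 to 0 by -1) { // back substitution
      val s = (i + 1 until n).map(j => a(i)(j) * x(j)).sum
      x(i) = (a(i)(n) - s) / a(i)(i)
    }
    x
  }

  // Recompute all rows of one factor while the other factor (`fixed`) is held constant.
  // `rowOf` picks the index being solved for, `colOf` the index into `fixed`.
  def updateFactors(nRows: Int, k: Int, lambda: Double, obs: Seq[Obs],
                    fixed: Array[Array[Double]],
                    rowOf: Obs => Int, colOf: Obs => Int): Array[Array[Double]] = {
    val byRow = obs.groupBy(rowOf)
    Array.tabulate(nRows) { i =>
      val m = Array.tabulate(k, k)((p, q) => if (p == q) lambda else 0.0) // lambda * I
      val b = new Array[Double](k)
      for (o <- byRow.getOrElse(i, Seq.empty)) {
        val f = fixed(colOf(o))
        for (p <- 0 until k) {
          b(p) += o._3 * f(p)
          for (q <- 0 until k) m(p)(q) += f(p) * f(q)
        }
      }
      solve(m, b)
    }
  }

  // Squared error over observed entries plus lambda times the squared Frobenius norms.
  def objective(obs: Seq[Obs], u: Array[Array[Double]], v: Array[Array[Double]],
                lambda: Double): Double = {
    def dot(x: Array[Double], y: Array[Double]): Double = x.indices.map(p => x(p) * y(p)).sum
    val fit = obs.map { case (i, j, a) => val e = a - dot(u(i), v(j)); e * e }.sum
    val reg = (u ++ v).map(r => dot(r, r)).sum
    fit + lambda * reg
  }

  // Returns (U, V, objective value before training and after each iteration).
  def run(m: Int, n: Int, k: Int, lambda: Double, iters: Int, obs: Seq[Obs],
          seed: Long = 42L): (Array[Array[Double]], Array[Array[Double]], Seq[Double]) = {
    val rnd = new scala.util.Random(seed)
    var u = Array.fill(m, k)(rnd.nextDouble()) // U(0): random initial guess
    var v = Array.fill(n, k)(rnd.nextDouble()) // overwritten by the first half-step
    val history = scala.collection.mutable.ArrayBuffer(objective(obs, u, v, lambda))
    for (_ <- 1 to iters) {
      v = updateFactors(n, k, lambda, obs, u, _._2, _._1) // fix U, solve for V
      u = updateFactors(m, k, lambda, obs, v, _._1, _._2) // fix V, solve for U
      history += objective(obs, u, v, lambda)
    }
    (u, v, history.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Made-up ratings for 3 users and 3 products; 3 of the 9 cells are unobserved.
    val obs = Seq((0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0))
    val (_, _, history) = run(3, 3, 2, 0.01, 10, obs)
    history.zipWithIndex.foreach { case (loss, it) => println(s"iteration $it: objective = $loss") }
  }
}
```

Because each half-step exactly minimizes the objective over one factor while the other is held fixed, the recorded objective values never increase from one iteration to the next. That is why the loop can stop after a fixed number of iterations or once the decrease becomes negligible.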

2. MLlib's ALS implementation

MLlib's ALS uses a partitioned data structure: U is decomposed into u1, u2, u3, ..., um and V into v1, v2, v3, ..., vn, and related u and v are stored in the same partition, which reduces the cost of exchanging data between partitions. For example, when V is computed from U, the partitions that store U are P1, P2, ... and the partitions that store V are Q1, Q2, .... Different u must be sent to different Q, and the block that records this relationship is called the OutBlock. Computing each v requires certain u, and the block that records this relationship is called the InBlock.

For example, suppose R contains a12, a13, and a15, u1 is stored in P1, v2 and v3 are stored in Q2, and v5 is stored in Q3. Then u1 in P1 must be sent to Q2 and Q3, and this information is stored in the OutBlock. R also contains a12 and a32, so computing v2 requires u1 and u3, and this information is stored in the InBlock.
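The routing information in this example can be modelled with ordinary Scala collections. The sketch below is only an illustration of the idea and does not reproduce MLlib's internal data structures: the object name `BlockRoutingSketch`, the function names, and the Map-based layout are hypothetical, and the placement of u3 in a partition called P2 is an assumption added so that the rating a32 can be included.

```scala
object BlockRoutingSketch {
  // For each user partition: which product partitions need which users' vectors (the OutBlock idea).
  def outBlocks(rated: Seq[(Int, Int)],
                userPartition: Map[Int, String],
                itemPartition: Map[Int, String]): Map[String, Map[String, Set[Int]]] =
    rated
      .map { case (user, item) => (userPartition(user), itemPartition(item), user) }
      .groupBy(_._1)
      .map { case (p, rows) =>
        p -> rows.groupBy(_._2).map { case (q, rs) => q -> rs.map(_._3).toSet }
      }

  // For each product partition: which users' vectors each local product needs (the InBlock idea).
  def inBlocks(rated: Seq[(Int, Int)],
               itemPartition: Map[Int, String]): Map[String, Map[Int, Set[Int]]] =
    rated
      .groupBy { case (_, item) => itemPartition(item) }
      .map { case (q, rows) =>
        q -> rows.groupBy(_._2).map { case (item, rs) => item -> rs.map(_._1).toSet }
      }

  def main(args: Array[String]): Unit = {
    // Observed (user, product) pairs: a12, a13, a15, a32.
    val rated = Seq((1, 2), (1, 3), (1, 5), (3, 2))
    val userPartition = Map(1 -> "P1", 3 -> "P2") // P2 for u3 is an assumption
    val itemPartition = Map(2 -> "Q2", 3 -> "Q2", 5 -> "Q3")

    // P1 sends u1 to Q2 and Q3; P2 sends u3 to Q2.
    println(outBlocks(rated, userPartition, itemPartition))
    // In Q2, v2 needs u1 and u3 and v3 needs u1; in Q3, v5 needs u1.
    println(inBlocks(rated, itemPartition))
  }
}
```

Precomputing both views once means that each iteration only has to ship every user vector to the partitions that actually use it, instead of broadcasting all of U to every partition.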

Here is the code:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating

/**
 * Created by Administrator on 2017/7/19.
 */
object ALSTest01 {

  def main(args: Array[String]): Unit = {
    // Set up the runtime environment
    val conf = new SparkConf().setAppName("ALS 01")
      .setMaster("spark://master:7077")
      .setJars(Seq("E:\\intellij\\projects\\machinelearning\\machinelearning.jar"))
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)

    // Read the sample data and parse each "user,item,rate" line
    val dataRDD = sc.textFile("hdfs://master:9000/ml/data/test.data")
    val ratingRDD = dataRDD.map(_.split(',') match {
      case Array(user, item, rate) =>
        Rating(user.toInt, item.toInt, rate.toDouble)
    })

    // Split into a training set and a test set
    val dataParts = ratingRDD.randomSplit(Array(0.8, 0.2))
    val trainingRDD = dataParts(0)
    val testRDD = dataParts(1)

    // Build and train the ALS model
    val rank = 10
    val numIterations = 10
    val alsModel = ALS.train(trainingRDD, rank, numIterations, 0.01)

    // Predict (note: predictions are made for the training set; testRDD is not used)
    val user_product = trainingRDD.map { case Rating(user, product, rate) =>
      (user, product)
    }
    val predictions = alsModel.predict(user_product).map {
      case Rating(user, product, rate) => ((user, product), rate)
    }

    // Join actual and predicted ratings on the (user, product) key
    val ratesAndPredictions = trainingRDD.map { case Rating(user, product, rate) =>
      ((user, product), rate)
    }.join(predictions)

    val MSE = ratesAndPredictions.map { case ((user, product), (r1, r2)) =>
      val err = r1 - r2
      err * err
    }.mean()

    println("Mean Squared Error = " + MSE)
    println("User" + "\t" + "Products" + "\t" + "Rate" + "\t" + "Prediction")
    ratesAndPredictions.collect.foreach(rating => {
      println(rating._1._1 + "\t" + rating._1._2 + "\t" + rating._2._1 + "\t" + rating._2._2)
    })
  }
}

The four parameters of the ALS.train() function are the training data set, the number of features (the rank), the number of iterations, and the regularization parameter.
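The evaluation step in the code keys both the actual and the predicted ratings by (user, product), joins them, and averages the squared differences. The same computation can be sketched with plain Scala Maps in place of RDDs. The object name `MseSketch`, its helper names, and the sample ratings are made up for illustration:

```scala
object MseSketch {
  type Key = (Int, Int) // (user, product)

  // Inner join on the key, mirroring RDD.join: only keys present on both sides survive.
  def join(actual: Map[Key, Double], predicted: Map[Key, Double]): Map[Key, (Double, Double)] =
    actual.collect {
      case (key, r1) if predicted.contains(key) => (key, (r1, predicted(key)))
    }

  // Mean of the squared differences between actual and predicted ratings.
  def mse(pairs: Map[Key, (Double, Double)]): Double = {
    val squared = pairs.values.map { case (r1, r2) => (r1 - r2) * (r1 - r2) }
    squared.sum / squared.size
  }

  def main(args: Array[String]): Unit = {
    val actual = Map((1, 1) -> 5.0, (1, 2) -> 3.0, (2, 1) -> 4.0)
    val predicted = Map((1, 1) -> 4.5, (1, 2) -> 3.5) // no prediction for (2, 1)
    println("Mean Squared Error = " + mse(join(actual, predicted))) // prints 0.25
  }
}
```

Pairs without a matching prediction drop out of the join, so they do not contribute to the error.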

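Result:

The predicted ratings are very close to the actual ratings. Note that the comparison is made on the training set, since the code predicts for the same (user, product) pairs that the model was trained on.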


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
