The first reference is this article: http://blog.csdn.net/sadfasdgaaaasdfa/article/details/45970185
However, the functions it uses are too old, so the code had to be changed. The other starting point is the gradient descent diagram in my own article: http://www.cnblogs.com/charlesblc/p/6206198.html
Switching to a random initial vector took a lot of time, but it finally works. The code is as follows:
package com.spark.my

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import breeze.linalg.DenseVector
import breeze.numerics.exp

/**
 * Created by Baidu on 16/11/28.
 */
object GradientDemo {
  // Case class: see the "Case class" section below
  case class DataPoint(x: DenseVector[Double], y: Double)

  def parsePoint(x: Array[Double]): DataPoint = {
    // Earlier version: DataPoint(Vectors.dense(x.slice(0, x.size - 2)), x(x.size - 1))
    // Note: slice(0, x.size - 2) keeps columns 0 .. size-3, so the column just
    // before the label is not used as a feature.
    DataPoint(DenseVector(x.slice(0, x.size - 2)), x(x.size - 1))
  }

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    println("Begin Load Gradient file")
    // Load data set
    val text = sc.textFile("hdfs://master.hadoop:8390/gradient_data/spam.data.txt")
    val lines = text.map { line =>
      line.split(" ").map(_.toDouble)
    }
    val points = lines.map(parsePoint(_)) // lines.map(parsePoint) appears to behave the same

    // Random initial weights, one per feature kept by parsePoint
    var w = DenseVector.rand(lines.first().size - 2)
    val iterations = 100
    for (i <- 1 to iterations) {
      // Sum the per-point gradient terms over the whole data set
      val gradient = points.map(p =>
        (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
      ).reduce(_ + _)
      w -= gradient
    }
    println("Finish data loading, w num:" + w.length + "; w: " + w)
  }
}
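One detail of parsePoint is easy to miss: slice(0, n) excludes index n, so x.slice(0, x.size - 2) keeps only the first size - 2 columns. The sketch below uses plain Scala collections instead of Breeze to make the difference visible. The names ParseSketch, Point, parseAsInListing and parseAllFeatures are hypothetical and not part of the original code, and whether dropping the last feature column was intended is not something the code itself tells us.

```scala
object ParseSketch {
  // Plain-collection stand-in for the DataPoint case class in the listing.
  final case class Point(x: Vector[Double], y: Double)

  // Parse one space-separated text line into doubles, as the listing does.
  def parseLine(line: String): Array[Double] =
    line.split(" ").map(_.toDouble)

  // Same slicing as the listing: features are columns 0 .. size-3,
  // the label is the last column.
  def parseAsInListing(row: Array[Double]): Point =
    Point(row.slice(0, row.length - 2).toVector, row(row.length - 1))

  // Hypothetical variant: every column except the label is a feature.
  def parseAllFeatures(row: Array[Double]): Point =
    Point(row.slice(0, row.length - 1).toVector, row(row.length - 1))
}
```

For a row with five columns, parseAsInListing keeps three features and parseAllFeatures keeps four. If the second variant were used in the job, the initial w would need lines.first().size - 1 entries to match.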
Then, on the M42n05 machine, first download the file http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/spam.data and copy it to Hadoop:
$ hadoop fs -mkdir /gradient_data
$ hadoop fs -put spam.data.txt /gradient_data/
$ hadoop fs -ls /gradient_data/
Found 1 items
-rw-r--r--   3 work supergroup     698341 2016-12-21 17:59 /gradient_data/spam.data.txt
Then copy the jar package and run the command:
$ ./bin/spark-submit --class com.spark.my.GradientDemo --master spark://10.117.146.12:7077 myjars/scala-demo.jar
Get output:
16/12/21 18:17:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/21 18:17:58 INFO util.log: Logging initialized @1689ms
16/12/21 18:17:58 INFO server.Server: jetty-9.2.z-SNAPSHOT
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/jobs,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/jobs/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/jobs/job,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/jobs/job/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages/stage,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages/stage/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages/pool,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages/pool/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/storage,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/storage/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/storage/rdd,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/storage/rdd/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/environment,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/environment/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/executors,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/executors/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/executors/threaddump,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/executors/threaddump/json,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/static,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/api,null,AVAILABLE}
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/stages/stage/kill,null,AVAILABLE}
16/12/21 18:17:58 INFO server.ServerConnector: Started [email protected]{HTTP/1.1}{0.0.0.0:4040}
16/12/21 18:17:58 INFO server.Server: Started @1811ms
16/12/21 18:17:58 INFO handler.ContextHandler: Started [email protected]{/metrics/json,null,AVAILABLE}
Begin Load Gradient file
16/12/21 18:18:00 INFO mapred.FileInputFormat: Total input paths to process : 1
16/12/21 18:18:02 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
16/12/21 18:18:02 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Finish data loading, w num:56; w: DenseVector(0.5742670447735152, 0.3793477463119241, 0.9681722093411653, 0.5967720119758925, 1.513648869152009, 0.8246263930800145, 0.8513296345703405, 0.5016541916805365, 0.10371045067354999, 1.0622529560536655, 0.7333760424194737, 2.1149483032187897, 0.9299367625800867, 0.7255747859512406, 0.13008556580706143, 1.4831202765138185, 0.7729907277492736, 0.9723309264036033, 13.394753146641808, 0.5531526429090097, 2.7444722115693665, 0.11325813324181622, 0.5096129116641023, 0.7201439311127137, 0.44719912156747926, 0.8273500952621051, 0.6736417633922696, 0.046531684571481415, 0.017895929000231802, 0.4726397794671698, 0.394438566392741, 0.8438784726078483, 0.4144073806784945, 0.18873920886297268, 0.4760240368798872, 0.31604719205329873, 0.694745503752298, 0.721380820951884, 0.988535475648986, 0.13515871744899247, 0.15694652560543523, 0.6939378895510522, 0.9279201378471407, 0.3336083293555714, 0.38938263676999685, 0.17159756568171308, 0.18897754115255144, 0.7281027812135723, 0.7233165381530381, 1.1093715737790655, 0.15675561193336351, 2.059622965151493, 0.6839713282339183, 0.11528695729374866, 7.413534050555067, 23.13404922028611)
16/12/21 18:18:07 INFO server.ServerConnector: Stopped [email protected]{HTTP/1.1}{0.0.0.0:4040}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages/stage/kill,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/api,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/static,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/executors/threaddump/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/executors/threaddump,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/executors/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/executors,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/environment/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/environment,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/storage/rdd/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/storage/rdd,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/storage/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/storage,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages/pool/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages/pool,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages/stage/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages/stage,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/stages,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/jobs/job/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/jobs/job,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/jobs/json,null,UNAVAILABLE}
16/12/21 18:18:07 INFO handler.ContextHandler: Stopped [email protected]{/jobs,null,UNAVAILABLE}
You can see that the data is processed properly.
To watch the process, add this statement inside the iteration loop of the code:
println("In data loading, w num:" + w.length + "; w: " + w)
Then copy the rebuilt jar package again and run it. The output now contains a lot of intermediate data, but each change between iterations is small, and for some coordinates only the last few digits change:
In data loading, w num:56; w: DenseVector(0.8387794911469437, 0.041931950643148204, 0.610593576873822, 0.775693127624059, 0.9595814255406686, 0.8346753461732199, 1.3049939469403333, 0.7056665962054256, 0.4607139317388798, 0.7272237992038442, 0.658182563650663, 0.733627042229442, 0.49543528179048996, 0.43928474305383947, 0.7784540121519834, 3.3618947233533456, 0.8863247999385253, 0.4007587753541083, 2.0631977325748334, 0.8211289850510815, 1.2076387347473903, 0.43209585536401196, 0.8361371667999544, 0.3902040623717107, 0.9249800607229486, 0.9684655358995048, 0.7122113545634148, 0.7564214721597596, 0.9295754044438086, 0.0667831407627083, 0.8262226990678785, 0.9866253536733688, 0.7214690647928418, 0.5992067836236182, 0.801215365214358, 1.0206941788488395, 0.8887684894893382, 0.39696145592511084, 0.7994301499483707, 0.39766237687949973, 0.3213782652296576, 0.3959330364022269, 0.6573698429264838, 0.5725594506918451, 0.932872703406284, 0.4276515117478306, 0.8908902872993782, 0.6281143587881469, 0.5136752276267151, 1.0933173640821512, 0.10820509511118362, 1.9426418431339785, 0.2017114624971559, 0.9827542778431644, 5.224634203803431, 16.694903977208174)
In data loading, w num:56; w: DenseVector(0.8387794911469437, 0.041931950643148204, 0.6105935768739001, 0.775693127624059, 0.9595814255414439, 0.8346753461732199, 1.3049939469403333, 0.7056665962054256, 0.4607139317388798, 0.7272237992038442, 0.658182563650663, 0.733627042229442, 0.49543528179048996, 0.43928474305383947, 0.7784540121519834, 3.3618947233534118, 0.8863247999385373, 0.4007587753541083, 2.0631977325749897, 0.8211289850510815, 1.2076387347474142, 0.43209585536401196, 0.8361371667999544, 0.3902040623717107, 0.9249800607229486, 0.9684655358995048, 0.7122113545634148, 0.7564214721597596, 0.9295754044438086, 0.0667831407627083, 0.8262226990678785, 0.9866253536733688, 0.7214690647928418, 0.5992067836236182, 0.801215365214358, 1.0206941788488395, 0.8887684894893382, 0.39696145592511084, 0.7994301499483707, 0.3976623768795117, 0.3213782652296576, 0.3959330364022269, 0.6573698429264838, 0.5725594506918451, 0.932872703406296, 0.4276515117478306, 0.8908902872993782, 0.6281143587881469, 0.5136752276267151, 1.093317364082217, 0.10820509511118362, 1.942641843152015, 0.2017114624971559, 0.982754277843168, 5.22463420411604, 16.694903977520784)
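Comparing two 56-element vectors by eye is tedious. A small helper can report the largest per-coordinate change instead, and the same number could serve as a stopping test. This is only a sketch on plain arrays: ConvergenceSketch, maxAbsDiff, converged and the tolerance value are my own assumptions, not something the original job contains.

```scala
object ConvergenceSketch {
  // Largest absolute per-coordinate change between two weight vectors.
  def maxAbsDiff(prev: Array[Double], next: Array[Double]): Double = {
    require(prev.nonEmpty, "vectors must not be empty")
    require(prev.length == next.length, "vectors must have the same length")
    prev.zip(next).map { case (a, b) => math.abs(a - b) }.max
  }

  // True when no coordinate moved by more than tol (an assumed threshold).
  def converged(prev: Array[Double], next: Array[Double], tol: Double = 1e-6): Boolean =
    maxAbsDiff(prev, next) <= tol
}
```

Applied to the last two coordinates of the two printed vectors, the change is on the order of 1e-10, which matches the observation that only the trailing digits move.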
Gradient Descent principle
A relatively good explanation of the gradient descent principle can be found here:
http://blog.csdn.net/woxincd/article/details/7040944
And in this article:
http://www.cnblogs.com/maybe2030/p/5089753.html?utm_source=tuicool&utm_medium=referral
On a closer look, the formula in those articles and the formula in the code do not seem to be the same. The code appears to use the sigmoid function.
This needs a more careful look.
The formula used in the code above is mainly:
(1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
Here p.x is an n-dimensional vector and p.y is a number.
Then reduce(_ + _) adds up the results of all rows. The result is again an n-dimensional vector.
w -= gradient
Repeating this for the configured number of iterations (100 in the code) gives the final w.
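As far as I can tell, the expression above is the gradient of the logistic loss log(1 + exp(-y * (w dot x))) with respect to w, which works out to (sigmoid(y * (w dot x)) - 1) * y * x when the labels y are -1 or +1. That is where the sigmoid comes from. The sketch below reproduces the same update on plain arrays, without Spark or Breeze. The names (LogisticGradientSketch, pointGradient, fit) and the stepSize parameter are assumptions of this sketch; the original code has no step size, which amounts to a step size of 1.

```scala
object LogisticGradientSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Per-point term from the code: (1 / (1 + exp(-y * (w dot x))) - 1) * y * x
  def pointGradient(w: Vec, x: Vec, y: Double): Vec = {
    val scale = (1.0 / (1.0 + math.exp(-y * dot(w, x))) - 1.0) * y
    x.map(_ * scale)
  }

  // Sum of the per-point terms; plays the role of map(...).reduce(_ + _).
  def gradient(w: Vec, data: Seq[(Vec, Double)]): Vec =
    data.map { case (x, y) => pointGradient(w, x, y) }
      .reduce((a, b) => a.zip(b).map { case (u, v) => u + v })

  // Repeats w = w - stepSize * gradient for the given number of iterations.
  // stepSize is an assumption of this sketch, not part of the original code.
  def fit(data: Seq[(Vec, Double)], w0: Vec, iterations: Int, stepSize: Double): Vec =
    (1 to iterations).foldLeft(w0) { (w, _) =>
      val g = gradient(w, data)
      w.zip(g).map { case (wi, gi) => wi - stepSize * gi }
    }
}
```

One consequence of the formula is that a row with y = 0 contributes a zero vector, because the whole term is multiplied by y. If the labels in the data file are 0 and 1, as I believe is the case for spam.data, only the rows labelled 1 move w; mapping 0 to -1 first would be one way to use both classes.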
Case class
The difference between a case class and an ordinary class is described here: http://www.tuicool.com/articles/yEZr6ve
Scala has case classes, which are really ordinary classes with a few differences:
1. An instance can be created without new (adding it is still allowed); an ordinary class requires new.
2. The toString implementation is nicer.
3. equals and hashCode are implemented by default.
4. They are serializable by default, that is, they implement Serializable.
5. Some functions are inherited automatically from scala.Product.
6. The constructor parameters of a case class are public, so they can be accessed directly.
7. Pattern matching is supported.
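The points above can be tried out with a small example. Sample and PlainSample are hypothetical classes made up for this sketch; they only stand in for something like the DataPoint case class used earlier.

```scala
object CaseClassSketch {
  // A case class with simple field types.
  case class Sample(name: String, label: Double)

  // An ordinary class with the same fields, for comparison.
  class PlainSample(val name: String, val label: Double)

  // Pattern matching on the constructor fields (point 7).
  def describe(s: Sample): String = s match {
    case Sample(n, l) if l > 0 => s"$n is positive"
    case Sample(n, _)          => s"$n is not positive"
  }

  def main(args: Array[String]): Unit = {
    val a = Sample("a", 1.0)                        // point 1: no new needed
    println(a)                                      // point 2: prints Sample(a,1.0)
    println(a == Sample("a", 1.0))                  // point 3: structural equality, true
    println(a.isInstanceOf[java.io.Serializable])   // point 4: true
    println(a.productArity)                         // point 5: from scala.Product, 2
    println(a.label)                                // point 6: public field access
    println(describe(a))                            // point 7: a is positive
    // Ordinary classes compare by reference unless equals is overridden.
    println(new PlainSample("a", 1.0) == new PlainSample("a", 1.0)) // false
  }
}
```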
Breeze
In addition, the DenseVector used above is actually a class from the Breeze library.
LinearRegressionWithSGD
In addition, this is the linear regression implemented inside Spark, which is based on stochastic gradient descent. Similar functions include the following:
The linear regression algorithms available in MLlib are LinearRegressionWithSGD, RidgeRegressionWithSGD and LassoWithSGD; the main classes involved in MLlib regression analysis are GeneralizedLinearAlgorithm and GradientDescent.
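To make "linear regression based on stochastic gradient descent" concrete, here is a minimal plain-Scala sketch of the idea. It is not MLlib's implementation and does not call any MLlib class; the object name, the parameters (epochs, stepSize, seed) and the convention of passing the intercept as a constant 1.0 feature are all assumptions of the sketch.

```scala
import scala.util.Random

object SgdLinearRegressionSketch {
  // Each epoch visits the points in random order and, for every point,
  // steps along the negative gradient of 0.5 * ((w dot x) - y)^2.
  def fit(data: Seq[(Array[Double], Double)],
          epochs: Int,
          stepSize: Double,
          seed: Long): Array[Double] = {
    require(data.nonEmpty, "data must not be empty")
    val rng = new Random(seed)
    val w = Array.fill(data.head._1.length)(0.0)
    for (_ <- 1 to epochs; (x, y) <- rng.shuffle(data)) {
      val err = x.indices.map(i => w(i) * x(i)).sum - y
      for (i <- w.indices) w(i) -= stepSize * err * x(i)
    }
    w
  }
}
```

With points generated from y = 1 + 2 * x and a small enough step size, the returned weights approach (1, 2). Unlike the full-batch loop in GradientDemo, each update here looks at a single point.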
Scala with Java
In the end DenseVector was used, so the following snippet was not needed. It does show, however, that Java classes can be used from Scala:
import java.util.Random
new Random(53)
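As a sketch of how such a Java class could be used, the following builds a random initial weight vector with java.util.Random on a plain array. The object and method names are hypothetical; the idea is only to show a seeded alternative to DenseVector.rand, which as far as I know also draws uniform values between 0 and 1 by default.

```scala
import java.util.Random

object JavaRandomSketch {
  // Builds an n-element weight vector from a seeded java.util.Random.
  // nextDouble() returns values in [0.0, 1.0).
  def randomWeights(n: Int, seed: Long): Array[Double] = {
    val rand = new Random(seed)
    Array.fill(n)(rand.nextDouble())
  }
}
```

Because the generator is seeded, two calls with the same seed return the same vector, which makes runs easier to compare than with an unseeded random start.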
Using Scala to experiment with the gradient descent algorithm on Spark