MLlib -- Logistic Regression Notes

Source: Internet
Author: User

For logistic regression with batch gradient descent, see this article: http://blog.csdn.net/pakko/article/details/37878837

After reading up on some Scala syntax, I wanted to see how MLlib parallelizes its machine learning algorithms, starting with logistic regression. In the package org.apache.spark.mllib.classification, find the class LogisticRegressionWithSGD and search directly for its train() function.

  def train(
      input: RDD[LabeledPoint],
      numIterations: Int,
      stepSize: Double,
      miniBatchFraction: Double,
      initialWeights: Vector): LogisticRegressionModel = {
    new LogisticRegressionWithSGD(stepSize, numIterations, 0.0, miniBatchFraction)
      .run(input, initialWeights)
  }

train() calls the run function of GeneralizedLinearAlgorithm, an abstract class defined in GeneralizedLinearAlgorithm.scala. The class LogisticRegressionWithSGD extends GeneralizedLinearAlgorithm, so it inherits run:

  def run(input: RDD[LabeledPoint], initialWeights: Vector): M = {

    if (numFeatures < 0) {
      numFeatures = input.map(_.features.size).first()
    }

    if (input.getStorageLevel == StorageLevel.NONE) {
      logWarning("The input data is not directly cached, which may hurt performance if its"
        + " parent RDDs are also uncached.")
    }

    // Check the data properties before running the optimizer
    if (validateData && !validators.forall(func => func(input))) {
      throw new SparkException("Input validation failed.")
    }

    /**
     * Scaling columns to unit variance as a heuristic to reduce the condition number:
     *
     * During the optimization process, the convergence depends on the condition number of
     * the training dataset. Scaling the variables often reduces this condition number
     * heuristically, thus improving the convergence rate. Without reducing the condition number,
     * some training datasets mixing the columns with different scales may not be able to converge.
     *
     * GLMNET and LIBSVM packages perform the scaling to reduce the condition number, and return
     * the weights in the original scale.
     * See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
     *
     * Here, if useFeatureScaling is enabled, we will standardize the training features by dividing
     * the variance of each column (without subtracting the mean), and train the model in the
     * scaled space. Then we transform the coefficients from the scaled space to the original scale
     * as GLMNET and LIBSVM do.
     *
     * Currently, it's only enabled in LogisticRegressionWithLBFGS
     */
    val scaler = if (useFeatureScaling) {
      new StandardScaler(withStd = true, withMean = false).fit(input.map(_.features))
    } else {
      null
    }

    // Prepend an extra variable consisting of all 1.0's for the intercept.
    // TODO: Apply feature scaling to the weight vector instead of input data.
    val data =
      if (addIntercept) {
        if (useFeatureScaling) {
          input.map(lp => (lp.label, appendBias(scaler.transform(lp.features)))).cache()
        } else {
          input.map(lp => (lp.label, appendBias(lp.features))).cache()
        }
      } else {
        if (useFeatureScaling) {
          input.map(lp => (lp.label, scaler.transform(lp.features))).cache()
        } else {
          input.map(lp => (lp.label, lp.features))
        }
      }

    /**
     * TODO: For better convergence, in logistic regression, the intercepts should be computed
     * from the prior probability distribution of the outcomes; for linear regression,
     * the intercept should be set as the average of response.
     */
    val initialWeightsWithIntercept = if (addIntercept && numOfLinearPredictor == 1) {
      appendBias(initialWeights)
    } else {
      /** If `numOfLinearPredictor > 1`, initialWeights already contains intercepts. */
      initialWeights
    }

    // This is where the optimization itself starts
    val weightsWithIntercept = optimizer.optimize(data, initialWeightsWithIntercept)

    val intercept = if (addIntercept && numOfLinearPredictor == 1) {
      weightsWithIntercept(weightsWithIntercept.size - 1)
    } else {
      0.0
    }

    var weights = if (addIntercept && numOfLinearPredictor == 1) {
      Vectors.dense(weightsWithIntercept.toArray.slice(0, weightsWithIntercept.size - 1))
    } else {
      weightsWithIntercept
    }

    /**
     * The weights and intercept are trained in the scaled space; we're converting them back to
     * the original scale.
     *
     * Math shows that if we only perform standardization without subtracting means, the intercept
     * will not be changed. w_i = w_i' / v_i where w_i' is the coefficient in the scaled space, w_i
     * is the coefficient in the original space, and v_i is the variance of the column i.
     */
    if (useFeatureScaling) {
      if (numOfLinearPredictor == 1) {
        weights = scaler.transform(weights)
      } else {
        /**
         * For `numOfLinearPredictor > 1`, we have to transform the weights back to the original
         * scale for each set of linear predictor. Note that the intercepts have to be explicitly
         * excluded when `addIntercept == true` since the intercepts are part of weights now.
         */
        var i = 0
        val n = weights.size / numOfLinearPredictor
        val weightsArray = weights.toArray
        while (i < numOfLinearPredictor) {
          val start = i * n
          val end = (i + 1) * n - { if (addIntercept) 1 else 0 }

          val partialWeightsArray = scaler.transform(
            Vectors.dense(weightsArray.slice(start, end))).toArray

          System.arraycopy(partialWeightsArray, 0, weightsArray, start, partialWeightsArray.size)
          i += 1
        }
        weights = Vectors.dense(weightsArray)
      }
    }

    // Warn at the end of the run as well, for increased visibility.
    if (input.getStorageLevel == StorageLevel.NONE) {
      logWarning("The input data is not directly cached, which may hurt performance if its"
        + " parent RDDs are also uncached.")
    }

    // Unpersist cached data
    if (data.getStorageLevel != StorageLevel.NONE) {
      data.unpersist(false)
    }

    createModel(weights, intercept)
  }
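The bias and scaling steps in run() can be illustrated without Spark. The following is a minimal sketch on plain arrays, not the MLlib implementation: the object name ScalingSketch and its helpers are hypothetical, and it assumes that scaling to unit variance means dividing each column by its standard deviation (the source comment says "variance"). It shows why weights trained in the scaled space can be mapped back with w_i = w_i' / s_i without changing the margin.

```scala
object ScalingSketch {
  // Dot product of two equally sized arrays.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Divide each component by the per-column scale s_i (no mean subtraction).
  // Assumption: s_i is the column's standard deviation; zero scales map to 0.0.
  def divideByStd(v: Array[Double], std: Array[Double]): Array[Double] =
    v.zip(std).map { case (vi, si) => if (si != 0.0) vi / si else 0.0 }

  // Append a constant 1.0 so that the last weight plays the role of the intercept.
  def appendBias(x: Array[Double]): Array[Double] = x :+ 1.0

  // Split trained weights into (weights, intercept), as run() does
  // when there is a single linear predictor.
  def splitIntercept(w: Array[Double]): (Array[Double], Double) =
    (w.init, w.last)
}
```

With features x, scales s, and scaled-space weights w', dot(w', divideByStd(x, s)) equals dot(divideByStd(w', s), x), which is why applying the same transform to the weights recovers the original-scale coefficients.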

In the code above, optimizer.optimize is passed the data and the initial theta (the weights). The optimizer itself is initialized in LogisticRegressionWithSGD:

class LogisticRegressionWithSGD private[mllib] (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {

  private val gradient = new LogisticGradient()
  private val updater = new SquaredL2Updater()
  @Since("0.8.0")
  override val optimizer = new GradientDescent(gradient, updater)
    .setStepSize(stepSize)
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setMiniBatchFraction(miniBatchFraction)
  override protected val validators = List(DataValidators.binaryLabelValidator)

  /**
   * Construct a LogisticRegression object with default parameters: {stepSize: 1.0,
   * numIterations: 100, regParm: 0.01, miniBatchFraction: 1.0}.
   */
  @Since("0.8.0")
  def this() = this(1.0, 100, 0.01, 1.0)

  override protected[mllib] def createModel(weights: Vector, intercept: Double) = {
    new LogisticRegressionModel(weights, intercept)
  }
}

The optimizer is set to a GradientDescent(gradient, updater) instance. Next, look at the GradientDescent class:

class GradientDescent private[spark] (private var gradient: Gradient, private var updater: Updater)
  extends Optimizer with Logging {

  private var stepSize: Double = 1.0
  private var numIterations: Int = 100
  private var regParam: Double = 0.0
  private var miniBatchFraction: Double = 1.0
  private var convergenceTol: Double = 0.001
  ...
  @DeveloperApi
  def optimize(data: RDD[(Double, Vector)], initialWeights: Vector): Vector = {
    val (weights, _) = GradientDescent.runMiniBatchSGD(
      data,
      gradient,
      updater,
      stepSize,
      numIterations,
      regParam,
      miniBatchFraction,
      initialWeights,
      convergenceTol)
    weights
  }
}

optimize in turn calls the mini-batch version of stochastic gradient descent, runMiniBatchSGD:

  def runMiniBatchSGD(
      data: RDD[(Double, Vector)],
      gradient: Gradient,
      updater: Updater,
      stepSize: Double,
      numIterations: Int,
      regParam: Double,
      miniBatchFraction: Double,
      initialWeights: Vector,
      convergenceTol: Double): (Vector, Array[Double]) = {

    // convergenceTol should be set with non minibatch settings
    if (miniBatchFraction < 1.0 && convergenceTol > 0.0) {
      logWarning("Testing against a convergenceTol when using miniBatchFraction " +
        "< 1.0 can be unstable because of the stochasticity in sampling.")
    }

    val stochasticLossHistory = new ArrayBuffer[Double](numIterations)
    // Record previous weight and current one to calculate solution vector difference

    var previousWeights: Option[Vector] = None
    var currentWeights: Option[Vector] = None

    val numExamples = data.count()

    // if no data, return initial weights to avoid NaNs
    if (numExamples == 0) {
      logWarning("GradientDescent.runMiniBatchSGD returning initial weights, no data found")
      return (initialWeights, stochasticLossHistory.toArray)
    }

    if (numExamples * miniBatchFraction < 1) {
      logWarning("The miniBatchFraction is too small")
    }

    // Initialize weights as a column vector
    var weights = Vectors.dense(initialWeights.toArray)
    val n = weights.size

    /**
     * For the first iteration, the regVal will be initialized as sum of weight squares
     * if it's L2 updater; for L1 updater, the same logic is followed.
     */
    var regVal = updater.compute(
      weights, Vectors.zeros(weights.size), 0, 1, regParam)._2  // compute the initial regularization value

    var converged = false // indicates whether converged based on convergenceTol
    var i = 1
    // The iteration starts here; it runs until convergence or until
    // the maximum number of iterations is reached
    while (!converged && i <= numIterations) {
      val bcWeights = data.context.broadcast(weights)
      // Sample a subset (fraction miniBatchFraction) of the total data
      // compute and sum up the subgradients on this subset (this is one map-reduce)
      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
          seqOp = (c, v) => {
            // c: (grad, loss, count), v: (label, features)
            // compute the gradient of each example in the batch
            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
            (c._1, c._2 + l, c._3 + 1)
          },
          combOp = (c1, c2) => {
            // c: (grad, loss, count)
            // add up the gradients and the loss values of all examples in the batch,
            // and record the batch size
            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
          })

      if (miniBatchSize > 0) {
        /**
         * lossSum is computed using the weights from the previous iteration
         * and regVal is the regularization value computed in the previous iteration as well.
         */
        // loss = total loss of the batch / batch size + regularization value
        stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
        // update the weights and compute the next regularization value
        val update = updater.compute(
          weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble),
          stepSize, i, regParam)
        weights = update._1
        regVal = update._2

        previousWeights = currentWeights
        currentWeights = Some(weights)
        if (previousWeights != None && currentWeights != None) {
          converged = isConverged(previousWeights.get,
            currentWeights.get, convergenceTol)
        }
      } else {
        logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
      }
      i += 1
    }

    logInfo("GradientDescent.runMiniBatchSGD finished. Last 10 stochastic losses %s".format(
      stochasticLossHistory.takeRight(10).mkString(", ")))

    (weights, stochasticLossHistory.toArray)
  }
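The seqOp/combOp pattern passed to treeAggregate can be imitated on local collections. This is only a sketch under stated assumptions: the object AggregateSketch, the Example and Acc type aliases, and the pointGradient parameter are hypothetical names, a Seq of Seqs stands in for the RDD's partitions, and the tree-shaped combine is replaced by a plain left fold. The accumulator has the same shape as in runMiniBatchSGD: (gradient sum, loss sum, example count).

```scala
object AggregateSketch {
  type Example = (Double, Array[Double])     // (label, features)
  type Acc = (Array[Double], Double, Long)   // (gradient sum, loss sum, count)

  def aggregate(
      partitions: Seq[Seq[Example]],
      n: Int,
      pointGradient: Example => (Array[Double], Double)): Acc = {
    // A fresh zero accumulator for every fold, so that no buffer is shared.
    def zero: Acc = (new Array[Double](n), 0.0, 0L)

    // seqOp: fold one example into a partition-local accumulator.
    val seqOp = (c: Acc, v: Example) => {
      val (g, loss) = pointGradient(v)
      var j = 0
      while (j < n) { c._1(j) += g(j); j += 1 }
      (c._1, c._2 + loss, c._3 + 1L)
    }

    // combOp: merge two partition-level accumulators.
    val combOp = (c1: Acc, c2: Acc) => {
      var j = 0
      while (j < n) { c1._1(j) += c2._1(j); j += 1 }
      (c1._1, c1._2 + c2._2, c1._3 + c2._3)
    }

    partitions.map(p => p.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)
  }
}
```

Dividing the returned gradient sum and loss sum by the count gives the averaged mini-batch gradient and loss that the loop then hands to the updater.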

The gradient of each example in the batch is computed by calling gradient.compute. For binary classification:

  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val dataSize = data.size

    // (weights.size / dataSize + 1) is number of classes
    require(weights.size % dataSize == 0 && numClasses == weights.size / dataSize + 1)
    numClasses match {
      case 2 =>
        /**
         * For Binary Logistic Regression.
         *
         * Although the loss and gradient calculation for multinomial one is more generalized,
         * and multinomial one can also be used in binary case, we still implement a specialized
         * binary version for performance reason.
         */
        val margin = -1.0 * dot(data, weights)
        val multiplier = (1.0 / (1.0 + math.exp(margin))) - label
        // the gradient is multiplier * data, i.e. (h(x) - y) * x
        axpy(multiplier, data, cumGradient)
        if (label > 0) {
          // The following is equivalent to log(1 + exp(margin)) but more numerically stable.
          MLUtils.log1pExp(margin)  // return the value of the loss function
        } else {
          MLUtils.log1pExp(margin) - margin
        }
      ...  // the multinomial case follows; not examined here
    }
  }
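The binary branch above can be reproduced on plain arrays to check the arithmetic by hand. This is a minimal sketch, not MLlib code: the object name LogisticGradientSketch is hypothetical, labels are assumed to be 0.0 or 1.0, and the function returns the gradient instead of accumulating it into cumGradient. The log1pExp helper is written here from the standard stable formulation of log(1 + exp(x)).

```scala
object LogisticGradientSketch {
  // Numerically stable log(1 + exp(x)).
  def log1pExp(x: Double): Double =
    if (x > 0) x + math.log1p(math.exp(-x)) else math.log1p(math.exp(x))

  // Returns (gradient, loss) for a single example with label in {0.0, 1.0}.
  def compute(
      data: Array[Double],
      label: Double,
      weights: Array[Double]): (Array[Double], Double) = {
    // margin = -w . x, so h(x) = 1 / (1 + exp(margin)) is the sigmoid of w . x
    val margin = -1.0 * data.zip(weights).map { case (x, w) => x * w }.sum
    val multiplier = 1.0 / (1.0 + math.exp(margin)) - label   // h(x) - y
    val gradient = data.map(_ * multiplier)                   // (h(x) - y) * x
    val loss =
      if (label > 0) log1pExp(margin)      // -log(h(x))
      else log1pExp(margin) - margin       // -log(1 - h(x))
    (gradient, loss)
  }
}
```

With all-zero weights the prediction is 0.5 for any input, so the loss is log(2) for either label and the gradient is (0.5 - y) * x.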

After treeAggregate has processed the sampled data in parallel, gradientSum is divided by miniBatchSize and passed to updater.compute, which updates the weights theta and computes the regularization value for the next iteration:

@DeveloperApi
class SquaredL2Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    // add up both updates from the gradient of the loss (= step) as well as
    // the gradient of the regularizer (= regParam * weightsOld)
    // w' = w - thisIterStepSize * (gradient + regParam * w)
    // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
    //
    // This is the iterative update formula for the weights with L2 regularization;
    // the key part is the factor (1 - thisIterStepSize * regParam).
    // Recall the plain update formula w' = w - alpha * gradient, where alpha is the
    // learning rate, i.e. thisIterStepSize.
    // Note that alpha = thisIterStepSize = stepSize / sqrt(iter), which is 1 / sqrt(iter)
    // for the default stepSize of 1.0: the more iterations have run, the lower the
    // learning rate and the smaller the step.
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
    val norm = brzNorm(brzWeights, 2.0)

    // the regularization value is 0.5 * regParam * (squared 2-norm of w')
    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
  }
}
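The update rule is easy to verify with a small worked example. The sketch below re-implements the same formula on plain arrays, without Breeze; the object name L2UpdaterSketch is hypothetical and the code is an illustration of the formula in the comments above, not the MLlib class itself.

```scala
object L2UpdaterSketch {
  // Returns (new weights, regularization value) for one L2-regularized SGD step.
  def compute(
      weightsOld: Array[Double],
      gradient: Array[Double],
      stepSize: Double,
      iter: Int,
      regParam: Double): (Array[Double], Double) = {
    // The step size decays with the square root of the iteration number.
    val thisIterStepSize = stepSize / math.sqrt(iter.toDouble)
    // w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
    val shrink = 1.0 - thisIterStepSize * regParam
    val weights = weightsOld.zip(gradient).map {
      case (w, g) => shrink * w - thisIterStepSize * g
    }
    // Regularization value: 0.5 * regParam * ||w'||^2
    val normSquared = weights.map(w => w * w).sum
    (weights, 0.5 * regParam * normSquared)
  }
}
```

For example, with w = (1.0, 2.0), gradient = (0.5, -0.5), stepSize = 1.0, iter = 4 and regParam = 0.1, the step size is 0.5 and the shrink factor is 0.95, so w' = (0.70, 2.15) and the regularization value is 0.5 * 0.1 * (0.49 + 4.6225) = 0.255625.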
