You are welcome to reprint this article; please credit the source, huichiro.
Summary
This article briefly reviews the origins of the quasi-Newton method L-BFGS, and then reads through its source-code implementation in Spark MLlib.
Mathematical Principles of the Quasi-Newton Method
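Only a brief recap of the standard formulation is given here (textbook material, summarized rather than derived). Newton's method updates the weights using the inverse of the Hessian:

\[
w_{k+1} = w_k - H_k^{-1}\,\nabla f(w_k)
\]

Computing and storing $H_k^{-1}$ costs $O(n^2)$ memory, which is prohibitive for high-dimensional models. Quasi-Newton methods such as BFGS instead maintain an approximation $B_k \approx H_k^{-1}$ built from gradient differences, required to satisfy the secant condition $B_{k+1} y_k = s_k$ with $s_k = w_{k+1} - w_k$ and $y_k = \nabla f(w_{k+1}) - \nabla f(w_k)$, and updated by

\[
B_{k+1} = \left(I - \rho_k s_k y_k^{\mathsf{T}}\right) B_k \left(I - \rho_k y_k s_k^{\mathsf{T}}\right) + \rho_k s_k s_k^{\mathsf{T}},
\qquad \rho_k = \frac{1}{y_k^{\mathsf{T}} s_k}.
\]

L-BFGS (limited-memory BFGS) never materializes $B_k$ at all: it keeps only the last $m$ correction pairs $(s_k, y_k)$ and reconstructs the search direction $-B_k \nabla f(w_k)$ with the two-loop recursion, cutting memory from $O(n^2)$ to $O(mn)$. This $m$ is exactly the numCorrections parameter in the MLlib code below.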
Code Implementation
The regularization method used with the L-BFGS algorithm is SquaredL2Updater.
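As a sketch of what that updater contributes to the objective (l2Regularization is a hypothetical helper of my own naming, not MLlib code; SquaredL2Updater folds this logic into its compute method):

import breeze.linalg.{norm, DenseVector => BDV}

// Sketch: squared-L2 regularization adds 0.5 * regParam * ||w||^2 to the loss
// and regParam * w to the gradient. l2Regularization is a hypothetical helper,
// not part of MLlib.
def l2Regularization(weights: BDV[Double], regParam: Double): (Double, BDV[Double]) = {
  val n = norm(weights)                  // Euclidean norm ||w||
  val regVal = 0.5 * regParam * n * n    // contribution to the loss
  val regGradient = weights * regParam   // gradient of the penalty, lambda * w
  (regVal, regGradient)
}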
The algorithm itself is implemented by the LBFGS class in Breeze, a library from the ScalaNLP project (imported as BreezeLBFGS in the code below). MLlib's job is to supply the custom DiffFunction that Breeze's LBFGS requires.
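Before diving into MLlib's code, a minimal, self-contained sketch of the Breeze contract may help: define a DiffFunction whose calculate returns a (loss, gradient) pair and hand it to LBFGS. The toy quadratic objective and names here are made up; CostFun below plays this exact role for the real objective.

import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{DiffFunction, LBFGS}

object BreezeLBFGSSketch extends App {
  val target = BDV(3.0, -1.0, 2.0)

  // Toy objective f(x) = ||x - target||^2; calculate returns (value, gradient).
  val f = new DiffFunction[BDV[Double]] {
    override def calculate(x: BDV[Double]): (Double, BDV[Double]) = {
      val diff = x - target
      (diff dot diff, diff * 2.0)
    }
  }

  // m is the number of stored correction pairs -- numCorrections in MLlib's runLBFGS.
  val lbfgs = new LBFGS[BDV[Double]](maxIter = 100, m = 7, tolerance = 1e-9)
  val optimum = lbfgs.minimize(f, BDV.zeros[Double](target.length))
  println(optimum) // approximately (3.0, -1.0, 2.0)
}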
The source code of the runLBFGS function is as follows:
def runLBFGS(
    data: RDD[(Double, Vector)],
    gradient: Gradient,
    updater: Updater,
    numCorrections: Int,
    convergenceTol: Double,
    maxNumIterations: Int,
    regParam: Double,
    initialWeights: Vector): (Vector, Array[Double]) = {

  val lossHistory = new ArrayBuffer[Double](maxNumIterations)

  val numExamples = data.count()

  val costFun = new CostFun(data, gradient, updater, regParam, numExamples)

  val lbfgs = new BreezeLBFGS[BDV[Double]](maxNumIterations, numCorrections, convergenceTol)

  val states = lbfgs.iterations(
    new CachedDiffFunction(costFun), initialWeights.toBreeze.toDenseVector)

  /**
   * NOTE: lossSum and loss are computed using the weights from the previous iteration,
   * and regVal is the regularization value computed in the previous iteration as well.
   */
  var state = states.next()
  while (states.hasNext) {
    lossHistory.append(state.value)
    state = states.next()
  }
  lossHistory.append(state.value)

  val weights = Vectors.fromBreeze(state.x)

  logInfo("LBFGS.runLBFGS finished. Last 10 losses %s".format(
    lossHistory.takeRight(10).mkString(", ")))

  (weights, lossHistory.toArray)
}
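For context, here is a hypothetical driver showing how runLBFGS could be invoked. The data set is made up, but LogisticGradient and SquaredL2Updater are actual MLlib classes, and the parameters line up with the signature above.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}

object RunLBFGSDriver extends App {
  val sc = new SparkContext(
    new SparkConf().setAppName("lbfgs-demo").setMaster("local[2]"))

  // Made-up binary classification data: (label, features) pairs.
  val data = sc.parallelize(Seq(
    (1.0, Vectors.dense(1.0, 2.0)),
    (0.0, Vectors.dense(-1.0, -1.5)),
    (1.0, Vectors.dense(2.0, 1.0)),
    (0.0, Vectors.dense(-2.0, -0.5))))

  val (weights, lossHistory) = LBFGS.runLBFGS(
    data,
    new LogisticGradient(),   // loss and gradient of logistic regression
    new SquaredL2Updater(),   // supplies the L2 regularization
    numCorrections = 10,
    convergenceTol = 1e-4,
    maxNumIterations = 50,
    regParam = 0.1,
    initialWeights = Vectors.dense(0.0, 0.0))

  println(s"weights = $weights, final loss = ${lossHistory.last}")
  sc.stop()
}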
The CostFun class is the core of the algorithm implementation:
private class CostFun(
    data: RDD[(Double, Vector)],
    gradient: Gradient,
    updater: Updater,
    regParam: Double,
    numExamples: Long) extends DiffFunction[BDV[Double]] {

  private var i = 0

  override def calculate(weights: BDV[Double]) = {
    // Have a local copy to avoid the serialization of CostFun object which is not serializable.
    val localData = data
    val localGradient = gradient

    val (gradientSum, lossSum) = localData.aggregate((BDV.zeros[Double](weights.size), 0.0))(
      seqOp = (c, v) => (c, v) match {
        case ((grad, loss), (label, features)) =>
          val l = localGradient.compute(
            features, label, Vectors.fromBreeze(weights), Vectors.fromBreeze(grad))
          (grad, loss + l)
      },
      combOp = (c1, c2) => (c1, c2) match {
        case ((grad1, loss1), (grad2, loss2)) =>
          (grad1 += grad2, loss1 + loss2)
      })

    /**
     * regVal is sum of weight squares if it's L2 updater;
     * for other updater, the same logic is followed.
     */
    val regVal = updater.compute(
      Vectors.fromBreeze(weights),
      Vectors.dense(new Array[Double](weights.size)), 0, 1, regParam)._2

    val loss = lossSum / numExamples + regVal

    /**
     * It will return the gradient part of regularization using updater.
     *
     * Given the input parameters, the updater basically does the following,
     *
     * w' = w - thisIterStepSize * (gradient + regGradient(w))
     * Note that regGradient is function of w
     *
     * If we set gradient = 0, thisIterStepSize = 1, then
     *
     * regGradient(w) = w - w'
     *
     * TODO: We need to clean it up by separating the logic of regularization out
     * from updater to regularizer.
     */
    // The following gradientTotal is actually the regularization part of gradient.
    // Will add the gradientSum computed from the data with weights in the next step.
    val gradientTotal = weights - updater.compute(
      Vectors.fromBreeze(weights),
      Vectors.dense(new Array[Double](weights.size)), 1, 1, regParam)._1.toBreeze

    // gradientTotal = gradientSum / numExamples + gradientTotal
    axpy(1.0 / numExamples, gradientSum, gradientTotal)

    i += 1

    (loss, gradientTotal)
  }
}
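The trick in the comment block above deserves unpacking. The Updater interface only exposes a full gradient-descent step, so CostFun recovers the regularization gradient by calling compute with a zero gradient and a unit step size:

\[
w' = w - t\left(g + \nabla R(w)\right)
\;\;\xrightarrow{\;g = 0,\ t = 1\;}\;\;
w' = w - \nabla R(w)
\;\;\Longrightarrow\;\;
\nabla R(w) = w - w'.
\]

For SquaredL2Updater, $R(w) = \frac{\lambda}{2}\lVert w \rVert^2$ and thus $\nabla R(w) = \lambda w$. The final axpy call then adds the averaged data gradient, so calculate returns the gradient of the full regularized objective, $\frac{1}{N}\sum_{i=1}^{N} \nabla \ell_i(w) + \nabla R(w)$.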