The momentum method can be seen as a further refinement of SGD; background details are in the reference cited in the code header below.
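The core idea is to keep a velocity vector that accumulates a decaying sum of past gradients and to move the parameters along that velocity instead of along the raw gradient. A minimal sketch of a single update step, with illustrative helper names that are not taken from the implementation below:

def sgd_step(w, grad, alpha):
    # plain SGD: move directly against the current gradient
    # (w and grad are numpy arrays, alpha is the step size)
    return w - alpha * grad

def momentum_step(w, v, grad, alpha, eps):
    # momentum: the velocity v keeps a decayed memory of past gradients
    # (eps is the momentum factor), and w follows the velocity
    v = eps * v - alpha * grad
    return w + v, v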
Here is a simple implementation in Python:
# coding=utf-8
"""
Momentum, built on top of mini-batch gradient descent (MBGD).
Reference: 72615621
Effect: with a small learning rate, a suitable momentum accelerates convergence;
with a larger learning rate, a suitable momentum damps the oscillation during convergence.
@author: reynold
@date: 2018-08-21
"""
import random

import numpy as np

# construct training data
x = np.arange(0., 10., 0.2)
m = len(x)
x0 = np.full(m, 1.0)
input_data = np.vstack([x0, x]).T  # bias b as the first component of the weight vector
target_data = 3 * x + 8 + np.random.randn(m)

# two termination conditions
max_iter = 10000
epsilon = 1e-5

# initialize the weights
np.random.seed(0)
w = np.random.randn(2)
v = np.zeros(2)   # velocity used by the momentum update

alpha = 0.001     # step size
error = np.zeros(2)
count = 0         # loop counter
eps = 0.9         # momentum (decay) factor: the larger it is, the more the
                  # previous gradients influence the current update direction

while count < max_iter:
    count += 1
    sum_m = np.zeros(2)

    # sample a mini-batch of 20% of the data
    index = random.sample(range(m), int(np.ceil(m * 0.2)))
    sample_data = input_data[index]
    sample_target = target_data[index]

    # accumulate the gradient over the mini-batch
    for i in range(len(sample_data)):
        dif = (np.dot(w, sample_data[i]) - sample_target[i]) * sample_data[i]
        sum_m = sum_m + dif

    v = eps * v - alpha * sum_m   # the velocity update happens here
    w = w + v                     # use the momentum (velocity) to update the parameters

    if np.linalg.norm(w - error) < epsilon:
        break
    else:
        error = w

print('Loop count = %d' % count, '\tw:[%f, %f]' % (w[0], w[1]))
Under the same convergence condition, momentum is indeed faster than plain MBGD, reaching the stopping criterion in fewer iterations; a side-by-side sketch of the comparison follows the results below.
Results:
Loop count = 432 w:[8.285241, 3.150939]
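To make that comparison concrete, here is a minimal side-by-side sketch that re-runs the same experiment with and without the velocity term. The fit helper, its fixed seeding, and the printed comparison are illustrative additions, not part of the original post; the data, hyperparameters, and stopping rule mirror the script above.

import random

import numpy as np

def fit(use_momentum, alpha=0.001, eps=0.9, max_iter=10000, epsilon=1e-5):
    # fix both seeds so the two runs see identical data, initialization,
    # and mini-batch sequence
    np.random.seed(0)
    random.seed(0)

    x = np.arange(0., 10., 0.2)
    m = len(x)
    input_data = np.vstack([np.full(m, 1.0), x]).T
    target_data = 3 * x + 8 + np.random.randn(m)

    w = np.random.randn(2)
    v = np.zeros(2)
    error = np.zeros(2)

    for count in range(1, max_iter + 1):
        index = random.sample(range(m), int(np.ceil(m * 0.2)))
        grad = sum((np.dot(w, input_data[i]) - target_data[i]) * input_data[i]
                   for i in index)
        if use_momentum:
            v = eps * v - alpha * grad   # momentum: follow the velocity
            w = w + v
        else:
            w = w - alpha * grad         # plain MBGD: follow the raw gradient
        if np.linalg.norm(w - error) < epsilon:
            break
        error = w
    return count, w

print('momentum  :', fit(True))
print('plain MBGD:', fit(False))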