CS231n Introduction
See CS231n Course Note 1: Introduction.
This article is the author's own thinking; its correctness has not been verified, and advice is welcome.

Homework Notes
This part covers three optimization algorithms: Momentum, RMSProp, and Adam. An optimization algorithm starts from a random initial point and iteratively moves toward a local optimum. For a detailed description of the various iterative optimization algorithms (SGD, Momentum, Nesterov Momentum, Adagrad, RMSProp, Adam), refer to CS231n Course Note 6.1: Optimization.

1. Momentum
Equation:
v = mu * v - learning_rate * dx
x += v
Code:
v = config['momentum'] * v - config['learning_rate'] * dw
next_w = w + v
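Putting the two lines together, here is a minimal sketch of a complete momentum update rule, assuming the cs231n-style interface where config stores the hyperparameters and keeps the velocity between calls (the key names and defaults are assumptions):

import numpy as np

def sgd_momentum(w, dw, config=None):
    # Minimal sketch, assuming a cs231n-style update rule:
    # config carries the hyperparameters and the persistent velocity.
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)
    v = config.get('velocity', np.zeros_like(w))

    v = config['momentum'] * v - config['learning_rate'] * dw  # update velocity
    next_w = w + v                                             # take the step

    config['velocity'] = v
    return next_w, config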
2. RMSProp
Equation:
cache = decay_rate * cache + (1 - decay_rate) * dx * dx
x -= learning_rate * dx / (sqrt(cache) + 1e-7)
Code:
config['cache'] = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * dx * dx
next_x = x - config['learning_rate'] * dx / (np.sqrt(config['cache']) + config['epsilon'])
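As with momentum, a minimal sketch of the whole RMSProp update rule under the same assumed cs231n-style config interface (key names and defaults are assumptions):

import numpy as np

def rmsprop(x, dx, config=None):
    # Minimal sketch, assuming a cs231n-style update rule:
    # config carries the hyperparameters and the running cache of squared gradients.
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('decay_rate', 0.99)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('cache', np.zeros_like(x))

    config['cache'] = (config['decay_rate'] * config['cache']
                       + (1 - config['decay_rate']) * dx * dx)  # moving average of squared gradients
    next_x = x - config['learning_rate'] * dx / (np.sqrt(config['cache']) + config['epsilon'])

    return next_x, config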
3. Adam
Note that the equation in the lecture slides is wrong; the correct update is given below (see Adam: A Method for Stochastic Optimization). The main difference is the bias-correction part: the bias-corrected estimates are computed separately and do not overwrite m and v.
Also pay attention to the update of t, which is likewise not shown in the slides.
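In the same pseudocode style as the earlier sections, the bias-corrected update from the paper is:
Equation:
m = beta1*m + (1-beta1)*dx
v = beta2*v + (1-beta2)*dx*dx
t += 1
mb = m / (1 - beta1**t)
vb = v / (1 - beta2**t)
x -= learning_rate * mb / (sqrt(vb) + eps)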
Code:
m = config['beta1'] * config['m'] + (1 - config['beta1']) * dx
v = config['beta2'] * config['v'] + (1 - config['beta2']) * dx * dx
config['t'] += 1
mb = m / (1 - config['beta1'] ** config['t'])
vb = v / (1 - config['beta2'] ** config['t'])
next_x = x - config['learning_rate'] * mb / (np.sqrt(vb) + config['epsilon'])
config['m'] = m
config['v'] = v
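For completeness, a minimal sketch of the whole Adam update rule under the same assumed cs231n-style interface (key names and defaults are assumptions):

import numpy as np

def adam(x, dx, config=None):
    # Minimal sketch, assuming a cs231n-style update rule:
    # config carries the hyperparameters and the state m, v, t.
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-3)
    config.setdefault('beta1', 0.9)
    config.setdefault('beta2', 0.999)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('m', np.zeros_like(x))
    config.setdefault('v', np.zeros_like(x))
    config.setdefault('t', 0)

    config['t'] += 1
    m = config['beta1'] * config['m'] + (1 - config['beta1']) * dx       # first-moment estimate
    v = config['beta2'] * config['v'] + (1 - config['beta2']) * dx * dx  # second-moment estimate
    mb = m / (1 - config['beta1'] ** config['t'])                        # bias-corrected first moment
    vb = v / (1 - config['beta2'] ** config['t'])                        # bias-corrected second moment
    next_x = x - config['learning_rate'] * mb / (np.sqrt(vb) + config['epsilon'])

    config['m'] = m
    config['v'] = v
    return next_x, config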