Activation functions:
1) Sigmoid function: f(z) = 1/(1 + e^{-z}), range (0,1)
2) Tanh function: f(z) = \tanh(z) = (e^z - e^{-z})/(e^z + e^{-z}), range (-1,1)
Both functions extend elementwise to vector inputs: f([z_1, ..., z_n]^T) = [f(z_1), ..., f(z_n)]^T (a short code sketch follows).
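A minimal sketch of the two activations and their derivatives (which backpropagation will need later), assuming NumPy; the helper names are my own:

import numpy as np

def sigmoid(z):
    # Range (0, 1); applies elementwise to vectors.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # f'(z) = f(z) * (1 - f(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_prime(z):
    # tanh'(z) = 1 - tanh(z)^2; np.tanh itself is the activation.
    return 1.0 - np.tanh(z) ** 2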
Notation:
- n_l: number of network layers
- s_l: number of nodes in layer l (excluding the bias unit)
- W^{(l)}_{ij}: weight of the connection between unit j of layer l and unit i of layer l+1; W^{(l)} has size s_{l+1} × s_l
- b^{(l)}_i: bias of unit i of layer l+1
- a^{(l)}: activation values of layer l
- z^{(l)}: weighted sum of the inputs to the units of layer l (including the bias term)
- (x, y): a training sample
- m: number of samples
- α: learning rate
- λ: weight decay parameter; controls the relative importance of the two terms of the cost function
h_{W,b}(x) = a^{(n_l)}, the output of the network.
Forward propagation
Initialization: a^{(1)} = x
(1)  z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, \quad a^{(l+1)} = f(z^{(l+1)})
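A minimal sketch of formula (1), assuming the parameters are stored as lists of NumPy arrays W[l] of shape (s_{l+1}, s_l) and b[l] of shape (s_{l+1},), reusing sigmoid from above (names are illustrative):

def forward_prop(x, W, b, f=sigmoid):
    # Returns the weighted sums z^{(l)} and activations a^{(l)} of every layer.
    a = [x]                       # initialization: a^{(1)} = x
    zs = []
    for Wl, bl in zip(W, b):
        z = Wl @ a[-1] + bl       # z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}
        zs.append(z)
        a.append(f(z))            # a^{(l+1)} = f(z^{(l+1)})
    return zs, a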
Backpropagation
===> Objective: minimize the overall cost function
(2)  J(W,b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 + \frac{\lambda}{2} \sum_{l=1}^{n_l-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2
The first term is the average single-sample cost; the second is a regularization (weight decay) term that shrinks the weight magnitudes to prevent overfitting.
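A sketch of cost (2) under the same storage assumptions as above (names illustrative):

def cost(X, Y, W, b, lam):
    # X, Y: lists of samples and targets; lam: weight decay parameter λ.
    m = len(X)
    data_term = sum(0.5 * np.sum((forward_prop(x, W, b)[1][-1] - y) ** 2)
                    for x, y in zip(X, Y)) / m
    decay_term = 0.5 * lam * sum(np.sum(Wl ** 2) for Wl in W)
    return data_term + decay_term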
===> Gradient descent method
(3)  W^{(l)}_{ij} := W^{(l)}_{ij} - \alpha \frac{\partial J(W,b)}{\partial W^{(l)}_{ij}}, \quad b^{(l)}_i := b^{(l)}_i - \alpha \frac{\partial J(W,b)}{\partial b^{(l)}_i}
The problem reduces to computing these two partial derivatives.
===> Convert the overall partial derivatives into single-sample partial derivatives
(4)  \frac{\partial J(W,b)}{\partial W^{(l)}_{ij}} = \frac{1}{m} \sum_{k=1}^{m} \frac{\partial J(W,b; x^{(k)}, y^{(k)})}{\partial W^{(l)}_{ij}} + \lambda W^{(l)}_{ij}, \quad \frac{\partial J(W,b)}{\partial b^{(l)}_i} = \frac{1}{m} \sum_{k=1}^{m} \frac{\partial J(W,b; x^{(k)}, y^{(k)})}{\partial b^{(l)}_i}
===> Use backpropagation to compute the single-sample partial derivatives
Compute the error term of the last layer first:
(5)  \delta^{(n_l)} = -(y - a^{(n_l)}) \odot f'(z^{(n_l)})
     \delta^{(l)} = \left( (W^{(l)})^T \delta^{(l+1)} \right) \odot f'(z^{(l)}), \quad l = n_l-1, \ldots, 2
     \frac{\partial J(W,b;x,y)}{\partial W^{(l)}} = \delta^{(l+1)} (a^{(l)})^T, \quad \frac{\partial J(W,b;x,y)}{\partial b^{(l)}} = \delta^{(l+1)}
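A sketch of formula group (5) for a single sample, reusing forward_prop and sigmoid_prime from the sketches above (names illustrative):

def backprop(x, y, W, b, f=sigmoid, fprime=sigmoid_prime):
    # Returns the single-sample gradients dW[l], db[l] for every layer.
    zs, a = forward_prop(x, W, b, f)
    delta = -(y - a[-1]) * fprime(zs[-1])                 # δ^{(n_l)}
    dW, db = [], []
    for l in range(len(W) - 1, -1, -1):
        dW.insert(0, np.outer(delta, a[l]))               # δ^{(l+1)} (a^{(l)})^T
        db.insert(0, delta)
        if l > 0:                                         # propagate the error backward
            delta = (W[l].T @ delta) * fprime(zs[l - 1])  # δ^{(l)}
    return dW, db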
Neural network pseudo-code
While the stopping condition is not met (iteration count / result error rate, etc.):
    For all layers l: set ΔW^{(l)} := 0, Δb^{(l)} := 0
    For each sample (x, y):
        # forward propagation: compute z^{(l)} and a^{(l)} with formula (1)
        # backward propagation
        Use formula group (5) to compute the single-sample gradients and accumulate them into ΔW^{(l)} and Δb^{(l)}
    Update the weight parameters using formulas (3) and (4)
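Putting the pieces together, a runnable sketch of this loop (batch gradient descent, reusing the helpers sketched above; the small random weight initialization is my assumption, since the notes do not specify one):

def train(X, Y, sizes, alpha=0.1, lam=1e-4, n_iters=1000):
    # sizes: nodes per layer, e.g. [2, 3, 1]; alpha: learning rate α; lam: weight decay λ.
    rng = np.random.default_rng(0)
    W = [rng.normal(0, 0.01, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
    b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
    m = len(X)
    for _ in range(n_iters):                      # stopping condition: iteration count
        dW = [np.zeros_like(Wl) for Wl in W]      # ΔW^{(l)} := 0
        db = [np.zeros_like(bl) for bl in b]      # Δb^{(l)} := 0
        for x, y in zip(X, Y):
            gW, gb = backprop(x, y, W, b)         # formula group (5)
            dW = [d + g for d, g in zip(dW, gW)]
            db = [d + g for d, g in zip(db, gb)]
        # update with (3) and (4): average gradient plus weight decay term
        W = [Wl - alpha * (d / m + lam * Wl) for Wl, d in zip(W, dW)]
        b = [bl - alpha * (d / m) for bl, d in zip(b, db)]
    return W, b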
Neural Networks - UFLDL tutorial notes