From linear regression to neural networks
Mini-batch SGD
Forward propagation computes the loss, backpropagation computes the gradients, and the parameters are then updated according to the gradients.
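A minimal sketch of such a mini-batch SGD loop for linear regression; the synthetic data, batch size, learning rate, and step count below are illustrative choices, not values from these notes.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))              # 1000 samples, 10 features (illustrative)
    true_w = rng.normal(size=10)
    y = X @ true_w + 0.1 * rng.normal(size=1000)

    w = np.zeros(10)
    lr, batch_size, n_steps = 0.1, 32, 500       # illustrative hyperparameters

    for step in range(n_steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)  # sample a mini-batch
        Xb, yb = X[idx], y[idx]
        pred = Xb @ w                             # forward pass: predictions
        loss = np.mean((pred - yb) ** 2)          # mean squared error loss
        grad = 2 * Xb.T @ (pred - yb) / batch_size  # backward pass: gradient of the loss
        w -= lr * grad                            # update parameters along the negative gradient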
Both the forward and the backward pass traverse the computational graph in topological order:
class ComputationalGraph(object):
    def forward(self, inputs):
        # 1. pass the inputs to the input gates...
        # 2. forward the computational graph:
        for gate in self.graph.nodes_topologically_sorted():
            gate.forward()
        return loss  # the final gate in the graph outputs the loss

    def backward(self):
        for gate in reversed(self.graph.nodes_topologically_sorted()):
            gate.backward()  # little piece of backprop (chain rule applied)
        return inputs_gradients
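A single gate's local forward/backward step might look like the sketch below; MultiplyGate and its attribute names are hypothetical, used only to illustrate how the chain rule is applied locally.

    class MultiplyGate(object):
        # Hypothetical gate computing z = x * y (names are illustrative, not from the notes).
        def forward(self, x, y):
            self.x, self.y = x, y     # cache inputs for the backward pass
            return x * y

        def backward(self, dz):
            dx = self.y * dz          # chain rule: dL/dx = dL/dz * dz/dx
            dy = self.x * dz          # chain rule: dL/dy = dL/dz * dz/dy
            return dx, dy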
Batch Normalization
Advantages: improves gradient flow through the network, allows higher learning rates, reduces dependence on weight initialization, acts as a form of regularization, and reduces the need for dropout.
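A minimal sketch of the training-time batch-normalization forward pass, assuming a mini-batch x of shape (N, D) and learned scale/shift parameters gamma and beta; the function name and eps value are illustrative.

    import numpy as np

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        # Normalize each feature over the mini-batch, then scale and shift.
        mu = x.mean(axis=0)                     # per-feature mean over the batch
        var = x.var(axis=0)                     # per-feature variance over the batch
        x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance; eps avoids division by zero
        return gamma * x_hat + beta             # learned scale and shift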
Activation functions
Data preprocessing
Learning Rate
If the loss does not decrease, the learning rate is too small.
If the loss explodes or becomes NaN, the learning rate is too large.
Learning rate decay (a sketch of these schedules follows the list):
1. Step decay: reduce the learning rate after a fixed number of epochs.
2. Exponential decay.
3. Decay over time (e.g. 1/t decay).
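A sketch of these decay schedules; the function names and the decay constants (drop, every, k) are illustrative defaults, not prescribed values.

    import math

    def step_decay(lr0, epoch, drop=0.5, every=10):
        # Halve the learning rate every `every` epochs.
        return lr0 * (drop ** (epoch // every))

    def exponential_decay(lr0, t, k=0.05):
        # lr = lr0 * exp(-k * t)
        return lr0 * math.exp(-k * t)

    def one_over_t_decay(lr0, t, k=0.05):
        # lr = lr0 / (1 + k * t)
        return lr0 / (1 + k * t)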
Optimization methods: Adam
RMSProp
Second-order optimization methods
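A sketch of one Adam update step; the function name and hyperparameter values below are the commonly used defaults, given here only for illustration. RMSProp corresponds to keeping only the second-moment accumulator v, without the momentum term m or the bias correction.

    import numpy as np

    def adam_update(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # One Adam step: m and v are running estimates of the first and second
        # moments of the gradient; t is the 1-based step counter used for bias correction.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v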
Dropout
Why is dropout effective? Dropout is equivalent to training an ensemble of models: each pattern of opened and closed units defines a different model. At test time, a Monte Carlo estimate averages the predictions of many such sampled models; alternatively, a single forward pass with all units active (scaled appropriately) approximates the same average.
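A minimal sketch of inverted dropout, consistent with the note that one forward pass with all units active can replace the Monte Carlo average; the function name and drop probability are illustrative.

    import numpy as np

    def dropout_forward(x, p=0.5, train=True):
        # Inverted dropout: drop units with probability p at training time and scale
        # the survivors by 1/(1-p), so the test-time pass uses all units unchanged.
        if not train:
            return x
        mask = (np.random.rand(*x.shape) >= p) / (1 - p)
        return x * mask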