Deep Learning Notes (i): Logistic classification
Deep Learning Notes (ii): Simple neural networks, the backpropagation algorithm, and implementation
Deep Learning Notes (iii): Activation functions and loss functions
Deep Learning Notes: A summary of optimization methods
Deep Learning Notes (iv): The concept, structure, and annotated code of recurrent neural networks
Deep Learning Notes (v): LSTM
Deep Learning Notes (vi): Encoder-decoder model and attention model
I have recently been reading Google's Deep Learning book and just got to the part on optimization methods. Before this I only had a smattering of knowledge about the optimization methods in TensorFlow, so after reading that chapter I wrote up this summary. It mainly covers the first-order gradient methods: SGD, Momentum, Nesterov Momentum, Adagrad, RMSProp, and Adam. SGD, Momentum, and Nesterov Momentum require a manually assigned learning rate, while Adagrad, RMSProp, and Adam can adjust the learning rate automatically.
As for second-order methods, for now I am not good enough to understand them...

BGD
That is, batch gradient descent: each training iteration uses the entire training set. The current parameters are used to produce an estimated output ŷ_i for every input x_i in the training set, each estimate is compared with the actual output y_i, the errors over all examples are accumulated and averaged, and that average error is the basis for updating the parameters.
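Before the formal steps below, here is a minimal Python sketch of this idea (the names `batch_gradient_descent` and `grad_fn` are my own, not from the book): every update averages the per-example gradients over the whole training set and then takes one step with a fixed learning rate ε.

```python
import numpy as np

def batch_gradient_descent(grad_fn, theta, X, Y, epsilon=0.01, n_steps=1000):
    """BGD sketch: each update uses the gradient averaged over the whole training set.

    grad_fn(theta, x, y) is assumed to return the gradient of the
    per-example loss L(f(x; theta), y) with respect to theta.
    """
    n = len(X)
    for _ in range(n_steps):
        # average the per-example gradients over the entire training set
        g = sum(grad_fn(theta, x, y) for x, y in zip(X, Y)) / n
        # update the parameters with the fixed learning rate epsilon
        theta = theta - epsilon * g
    return theta

# Toy usage: fit y = w*x by least squares; the gradient of (w*x - y)^2 / 2
# with respect to w is (w*x - y) * x.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
w = batch_gradient_descent(lambda w, x, y: (w * x - y) * x,
                           theta=0.0, X=X, Y=Y, epsilon=0.1)
print(w)  # converges toward 2.0
```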
Specific implementation:
Required: learning rate ε, initial parameters θ
Each iteration:
1. Take the entire training set {x_1, ..., x_n} together with the corresponding outputs y_i.
2. Compute the gradient and the error, and update the parameters: