Origin: the linear neural network and the single-layer Perceptron
The earliest linear neural network is the single-layer Rosenblatt Perceptron. The Perceptron model itself is no longer used, but you can see its improved descendant: logistic regression.
Looking at this network, the process of weighting the inputs, computing the classification error, and iteratively adjusting W and B is essentially indistinguishable from regression fitting in mathematics.
This mathematical process of iteratively adjusting the parameters until the objective function converges was likened by some neuroscientists to the way neurons in the brain work, and from there all kinds of grandiose "neural network" hype took off.
Logistic regression improved the model in two ways:
① The Rosenblatt Perceptron uses the sgn (sign) function to map the weighted input; that function is not smooth and does not separate features well.
Logistic regression therefore replaces it with the logistic sigmoid function. (The sigmoid is regarded as a powerful tool for mapping discriminative features, and it is also the core of the nonlinear classification ability of BP networks; a quick sketch of the two activations follows below.)
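A minimal sketch (Python/NumPy, not from the original post) contrasting the two activations; the -1/+1 output convention for sgn is one common choice:

```python
import numpy as np

def sgn(x):
    """Hard threshold used by the Rosenblatt Perceptron: not smooth, zero gradient almost everywhere."""
    return np.where(x >= 0, 1.0, -1.0)

def sigmoid(x):
    """Logistic sigmoid: smooth, squashes the weighted input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
print(sgn(x))       # jumps abruptly at x = 0
print(sigmoid(x))   # changes smoothly, so its gradient carries information everywhere
```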
② It uses maximum likelihood estimation from probability theory to design the error objective function.
The mathematical principle behind LMS (least mean squares), as used in linear neural networks, can be derived from maximum likelihood estimation plus an assumed error probability model (see Andrew Ng's lecture videos; a sketch of the derivation follows after this point).
The likelihood-based objective also has a larger numerical range than LMS; in the two-class case in particular, the LMS objective is squeezed into a smaller range of values.
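A sketch of the derivation mentioned in ②, under the usual assumption (mine, following the cited lectures) that the target equals the linear prediction plus Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$:

$$
\begin{aligned}
L(w) &= \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\bigl(y^{(n)} - w^{T}x^{(n)}\bigr)^{2}}{2\sigma^{2}}\right)\\
\log L(w) &= -\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\bigl(y^{(n)} - w^{T}x^{(n)}\bigr)^{2} + \text{const}\\
\arg\max_{w}\,\log L(w) &= \arg\min_{w}\,\sum_{n=1}^{N}\bigl(y^{(n)} - w^{T}x^{(n)}\bigr)^{2}
\end{aligned}
$$

So maximizing the likelihood under this error model reproduces exactly the LMS objective.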
In fact, both models trace back to least-squares linear regression. The difference is that early linear regression was solved directly with a matrix solver (Newton's method).
The mathematical technique of driving the objective function to convergence by gradient iteration can be regarded as the origin of neural networks (the sketch below contrasts the two routes).
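A minimal sketch of the contrast just described, using the normal equations as the closed-form matrix route; the data and learning rate are illustrative, not from the original:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]        # design matrix with a bias column
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=100)

# Closed-form "matrix solver" route: the normal equations
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative route: gradient descent on the least-squares objective
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad

print(w_closed, w)   # both arrive at roughly [2, -3]
```

Both routes reach essentially the same weights; the gradient route is the one that neural networks generalize.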
Part I: Structure and working mode of the BP network
In a BP network, the hidden layers extract discriminative features by aggressively weighting the parameters and mapping them through fully connected layers plus the sigmoid function.
Mathematically, it is hard to explain why this works. (Contrast this with the SVM, whose margin-maximization principle is rigorous enough to leave you speechless.)
The number of hidden layers and the number of neurons per layer determine the complexity of the network; empirical results suggest the network is most efficient when the number of neurons per layer is comparable to the number of samples.
A BP network works in two phases:
① FP (forward propagation): the input is linearly weighted and passed to the hidden layer; the hidden layer applies the sigmoid and produces its output, which is then linearly weighted and passed to the output layer.
The output layer applies the sigmoid (or a linear function) to produce the classification, computes the error, and accumulates it into the total error (the objective function).
② BP (backpropagation): starting from the output layer, the error of the single sample currently being processed is propagated back and W_ij, B_ij are updated by the gradient method.
Because of the structure of the network, the updates to W_mi and B_mi depend on W_ij and B_ij, so the parameters can only be updated in reverse, from the output layer back toward the input (see the sketch below).
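A minimal NumPy sketch of one FP + BP step for a single sample, assuming one hidden layer and a squared-error objective; the function name, array shapes, and learning rate are my own illustration, with u/v named after the notation used in Part II below:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fp_bp_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One forward pass, error computation, and backward update for a single sample."""
    # FP: linear weighting into the hidden layer, sigmoid, then linear weighting into the output layer
    u_i = W1 @ x + b1          # hidden-layer input  (u_i)
    v_i = sigmoid(u_i)         # hidden-layer output (v_i)
    u_j = W2 @ v_i + b2        # output-layer input  (u_j)
    v_j = sigmoid(u_j)         # output-layer output (v_j)
    err = 0.5 * np.sum((v_j - t) ** 2)   # this sample's contribution to the total error

    # BP: deltas start at the output layer; the hidden-layer update needs W2, so it can only go in reverse
    delta_j = (v_j - t) * v_j * (1 - v_j)           # uses S'(x) = S(x)(1 - S(x))
    delta_i = (W2.T @ delta_j) * v_i * (1 - v_i)
    W2 -= lr * np.outer(delta_j, v_i)
    b2 -= lr * delta_j
    W1 -= lr * np.outer(delta_i, x)
    b1 -= lr * delta_i
    return err
```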
There are two ways to feed training data to a BP network:
① Single-sample (serial): feed the samples one at a time, performing FP and BP for each sample and accumulating its error into the total error.
Once every sample has been processed, one iteration is complete, and the next iteration starts again from the first sample. Iteration stops when the total error converges (see the sketch after this list).
② Batch-sample (parallel): (left as a TODO; I am not going to hand-write that code.)
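A sketch of the single-sample serial scheme ①, reusing the hypothetical fp_bp_step from the previous sketch; the tolerance and epoch cap are placeholders:

```python
def train_serial(samples, targets, W1, b1, W2, b2, tol=1e-3, max_epochs=10000):
    """Single-sample serial training: one FP + BP per sample, repeated until the total error converges."""
    total_err = float("inf")
    for epoch in range(max_epochs):
        total_err = 0.0
        for x, t in zip(samples, targets):   # feed samples one at a time
            total_err += fp_bp_step(x, t, W1, b1, W2, b2)
        if total_err < tol:                  # total error has converged; stop iterating
            break
    return total_err
```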
Part II: Formula derivation of the BP process
Define $u_i, v_i, u_j, v_j$ as the inputs and outputs of the hidden layer and the output layer, respectively.
Derivative of the sigmoid function: $S'(x) = S(x)\,(1 - S(x))$
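A sketch of the chain-rule steps this derivation relies on, in the $u_i, v_i, u_j, v_j$ notation above; the squared-error objective $E = \tfrac{1}{2}\sum_j (v_j - t_j)^2$ and the target notation $t_j$ are my additions, and $v_m$ denotes the $m$-th input:

$$
\begin{aligned}
u_j &= \sum_i W_{ij}\,v_i + b_j, \qquad v_j = S(u_j)\\
\frac{\partial E}{\partial W_{ij}} &= \frac{\partial E}{\partial v_j}\,\frac{\partial v_j}{\partial u_j}\,\frac{\partial u_j}{\partial W_{ij}} = (v_j - t_j)\,v_j(1 - v_j)\,v_i\\
\frac{\partial E}{\partial W_{mi}} &= \Bigl(\sum_j (v_j - t_j)\,v_j(1 - v_j)\,W_{ij}\Bigr)\,v_i(1 - v_i)\,v_m
\end{aligned}
$$

The second line is the gradient for the hidden-to-output weights $W_{ij}$; the third shows why the input-to-hidden weights $W_{mi}$ can only be updated after the output-layer terms are available.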
(Figure: BP neural network structure)