Deep Learning Foundations--Neural Networks--the BP (Backpropagation) Algorithm


BP algorithm:

  1. It is a supervised learning algorithm, often used to train multilayer perceptrons.

2. The activation function used by each artificial neuron (i.e., node) must be differentiable.

(Activation function: the functional relationship between the input and output of a single neuron is called the activation function.)

(If no activation function is used, each layer of the neural network performs only a linear transformation, and stacking multiple layers still yields nothing more than a linear transformation of the input. Because a linear model's expressive power is limited, the activation function is introduced to add a nonlinear factor.)

  

The following two figures show a neural network without an activation function and a neural network with an activation function:

[Figure: network without an activation function vs. network with an activation function]

The difference after adding a nonlinear activation function: instead of splitting the plane with a piecewise linear approximation of a smooth curve, the plane can be split with a genuinely smooth curve.
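To make the collapse concrete, here is a minimal sketch (NumPy; the shapes and names are illustrative, not from the original post) showing that two stacked linear layers are exactly one linear layer, while a ReLU between them breaks the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # a batch of 4 inputs with 3 features each
W1 = rng.normal(size=(3, 5))   # first linear layer (biases omitted for brevity)
W2 = rng.normal(size=(5, 2))   # second linear layer

# Two stacked linear layers...
two_linear = (x @ W1) @ W2
# ...equal a single linear layer whose weight matrix is W1 @ W2.
one_linear = x @ (W1 @ W2)
print(np.allclose(two_linear, one_linear))   # True: no expressive power gained

# With a nonlinearity (ReLU) in between, the collapse no longer holds.
relu = lambda z: np.maximum(z, 0.0)
print(np.allclose(relu(x @ W1) @ W2, one_linear))   # False in general
```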

  

  3. This algorithm is particularly suitable for training feedforward neural networks.

Feedforward neural network: a kind of artificial neural network in which each neuron receives input from the previous layer and passes its output to the next layer, from the input layer through to the output layer. There is no feedback anywhere in the network, so it can be represented by a directed acyclic graph. By the number of layers, it can be divided into single-layer and multilayer feedforward neural networks. Common feedforward neural networks include the perceptron, the BP (back propagation) network, the RBF (radial basis function) network, etc.
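As a small illustration of this structure (a NumPy sketch; the layer sizes and function names are my own, not from the original post), here is a forward pass through a 3-4-2 multilayer feedforward network, where each layer consumes only the previous layer's output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate x layer by layer: input -> hidden -> output, no feedback."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)   # each layer feeds only the next one
    return a

rng = np.random.default_rng(1)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]  # 3-4-2 network
biases = [np.zeros(4), np.zeros(2)]
print(forward(rng.normal(size=(1, 3)), weights, biases))
```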

The commonly seen neural network is of this feedforward kind. During computation, the signal is passed forward from the input to the output.

So where does the feedback come from?

When the network is not yet well trained, the output will differ from what we expect. That deviation is propagated backward from the output, layer by layer; this is the feedback. The feedback is used to obtain partial derivatives, the partial derivatives are used for gradient descent, and gradient descent seeks the minimum of the cost function, so that the error between the expected and actual output becomes as small as possible. (With a differently defined cost function, feedback of this kind may not be needed.)
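For intuition about that last chain (partial derivative, then gradient descent, then the minimum of the cost), here is a toy example of my own, not from the original post: gradient descent on the one-parameter cost J(w) = (w - 3)^2:

```python
# Minimize J(w) = (w - 3)**2 by gradient descent.
w, lr = 0.0, 0.1           # initial weight and learning rate
for _ in range(50):
    grad = 2 * (w - 3)     # dJ/dw: the direction in which the cost grows
    w -= lr * grad         # step against the gradient to shrink the cost
print(w)                   # converges toward 3, the minimum of J
```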

4. It works mainly by iterating two phases (propagation and weight update) until the network's response to the input reaches the predetermined target range.

  Propagation:

The propagation phase in each iteration consists of two steps:

1. (Forward propagation phase) The training input is fed into the network to obtain the activation response.

2. (Backpropagation phase) The difference between the activation response and the target output corresponding to the training input is computed, yielding the response errors of the output layer and the hidden layers.

      

Weight update:

The weight on each synapse is updated as follows:

1. Multiply the input activation by the response error to obtain the gradient of the weight.

2. Multiply this gradient by a proportion (this ratio affects the speed and quality of training and is therefore called the 'training factor', i.e., the learning rate). The direction of the gradient indicates the direction in which the error grows, so the weight must be updated in the opposite direction: the scaled gradient is negated and added to the weight, thereby reducing the error contributed by that weight. A complete sketch of both phases follows below.
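Putting both phases together, here is a minimal one-hidden-layer training loop (a NumPy sketch under my own choices of sigmoid activation, squared-error cost, and layer sizes; it is not code from the original post):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(1, 2))   # one training input
y = np.array([[1.0]])         # its target output
W1 = rng.normal(size=(2, 3))  # input -> hidden weights
W2 = rng.normal(size=(3, 1))  # hidden -> output weights
lr = 0.5                      # the "training factor" (learning rate)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    # Forward propagation phase: obtain the activation response.
    h = sigmoid(x @ W1)       # hidden-layer activation
    out = sigmoid(h @ W2)     # output-layer activation

    # Backpropagation phase: difference between response and target output,
    # propagated backward to get each layer's response error.
    err_out = (out - y) * out * (1 - out)       # output-layer error
    err_hid = (err_out @ W2.T) * h * (1 - h)    # hidden-layer error

    # Weight update: input activation times response error gives the gradient;
    # scale it by the learning rate and step in the opposite direction.
    W2 -= lr * (h.T @ err_out)
    W1 -= lr * (x.T @ err_hid)

print(sigmoid(sigmoid(x @ W1) @ W2))   # output moves toward the target 1.0
```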

In summary:

  The idea of the backpropagation algorithm is as follows. Given a training sample, we first perform a "forward pass" to compute every activation value in the network, including the output value. Then, for each node i in layer l, we compute its "residual" δ_i^(l), which measures how much influence that node has on the error of the final output. For an output node, we can directly compute the difference between the activation value the network produces and the actual target value, and define that gap as δ_i^(n_l) (where layer n_l denotes the output layer). For a hidden unit, we compute its residual as a weighted average of the residuals of the nodes in layer l+1 that take a_i^(l) as input.
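Written out (a standard formulation consistent with the paragraph above; the symbols z^(l) for the pre-activation input of layer l, a^(l) for its activation, f for the activation function, and W^(l) for its weights are my notation, not the original post's): for the output layer n_l,

$$\delta^{(n_l)} = -\left(y - a^{(n_l)}\right) \odot f'\!\left(z^{(n_l)}\right),$$

for a hidden layer l, the residual is the weighted combination of the next layer's residuals,

$$\delta^{(l)} = \left( (W^{(l)})^{T} \delta^{(l+1)} \right) \odot f'\!\left(z^{(l)}\right),$$

and the weight gradient used for the update is

$$\nabla_{W^{(l)}} J = \delta^{(l+1)} \left(a^{(l)}\right)^{T}.$$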

Important reference: http://www.cnblogs.com/Crysaty/p/6126321.html
