Deep Learning Algorithm Practice 8---The BP Algorithm in Detail


The BP algorithm is the error back-propagation algorithm. Starting from the output layer, it compares the network's results with the expected results to obtain the error, then adjusts the connection weights of the neurons in the direction of steepest gradient descent, and then adjusts the connections between each pair of layers in turn, working backward. For the batch learning method, this process is repeated until the error becomes sufficiently small.

For the output layer, we can directly use the algorithm of the Perceptron model from the previous blog post. The difficulty of the BP algorithm is how to deal with the hidden layers, because a hidden layer has no known correct output from which to calculate the error.

Below we derive the BP algorithm, starting from the output layer. Before the derivation, we first fix our notation: subscript i denotes the index of an input (previous-layer) signal; superscript n denotes the nth training sample; subscript j denotes the index of a neuron in the output (current) layer; the current layer is denoted L and the previous layer L-1.

Let's look at the output layer first. Because we know the desired output for each training sample, we can easily define the error for each training sample:

Equation 8.1:
$$E^n = \frac{1}{2}\sum_{j=1}^{J}\left(d_j^n - y_j^n\right)^2$$

In this formula, j is the output-layer neuron index and J is the number of output-layer neurons. We take the output layer to be layer L and the hidden layer immediately before it to be layer L-1; the weight of the connection from node i of layer L-1 to node j of layer L (the output layer) is written $w_{ij}^L$. Our goal is to determine the adjustment of each weight based on the error value.
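To make Equation 8.1 concrete, here is a minimal sketch (not from the original post) that evaluates the per-sample error with NumPy; the arrays `d` and `y` are hypothetical desired and actual outputs:

```python
import numpy as np

# Hypothetical desired and actual outputs for one training sample (J = 3).
d = np.array([1.0, 0.0, 0.0])   # d_j^n: desired outputs
y = np.array([0.8, 0.1, 0.3])   # y_j^n: actual outputs

# Equation 8.1: E^n = 1/2 * sum_j (d_j^n - y_j^n)^2
E = 0.5 * np.sum((d - y) ** 2)
print(E)  # 0.5 * (0.04 + 0.01 + 0.09) = 0.07
```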

For each neuron in the output layer, we first take the partial derivative of the error with respect to the node's output:

Equation 8.2:
$$\frac{\partial E^n}{\partial y_j^n} = -\left(d_j^n - y_j^n\right)$$

Next, the partial derivative of the error with respect to the input of each output node, where $net_j^L = \sum_i w_{ij}^L\, y_i^{L-1}$ is the weighted input to node j and f is the activation function:

Equation 8.3:
$$\frac{\partial E^n}{\partial net_j^L} = \frac{\partial E^n}{\partial y_j^n}\,\frac{\partial y_j^n}{\partial net_j^L} = -\left(d_j^n - y_j^n\right) f'(net_j^L)$$

Then the partial derivative of the error with respect to each connection weight:

Equation 8.4:
$$\frac{\partial E^n}{\partial w_{ij}^L} = \frac{\partial E^n}{\partial net_j^L}\,\frac{\partial net_j^L}{\partial w_{ij}^L} = \frac{\partial E^n}{\partial net_j^L}\, y_i^{L-1}$$

Substituting Equation 8.3 into Equation 8.4 gives this value. The weight adjustment then follows the gradient-descent rule:

Equation 8.5:
$$\Delta w_{ij}^L = -\eta\,\frac{\partial E^n}{\partial w_{ij}^L} = \eta\,\left(d_j^n - y_j^n\right) f'(net_j^L)\, y_i^{L-1}$$

where $\eta$ is the learning rate. This gives the adjustment of every connection weight into the output layer.
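Equations 8.2 through 8.5 translate almost line for line into code. The following sketch is only an illustration under the assumption of a sigmoid activation (for which $f'(net) = y(1-y)$); the variable names and values are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

eta = 0.5                               # learning rate (assumed value)
y_prev = np.array([0.2, 0.7])           # y_i^{L-1}: previous-layer outputs (I = 2)
W = np.array([[0.1, -0.3, 0.5],         # W[i, j] = w_{ij}^L, shape (I, J)
              [0.4,  0.2, -0.1]])
d = np.array([1.0, 0.0, 0.0])           # d_j^n: desired outputs (J = 3)

net = y_prev @ W                        # net_j^L = sum_i w_{ij}^L * y_i^{L-1}
y = sigmoid(net)                        # y_j^n = f(net_j^L)

dE_dy = -(d - y)                        # Equation 8.2
dE_dnet = dE_dy * y * (1.0 - y)         # Equation 8.3, with f'(net) = y(1-y)
dE_dW = np.outer(y_prev, dE_dnet)       # Equation 8.4: dE/dw_{ij} = dE/dnet_j * y_i
W_new = W - eta * dE_dW                 # Equation 8.5; keep old W for the backward pass
```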

This completes the processing of the output layer. Next we look at layer L-1, the layer directly connected to the output layer, whose neurons are numbered by i. Again we first take the partial derivative of the error with respect to each neuron's output:

Equation 8.6:
$$\frac{\partial E^n}{\partial y_i^{L-1}} = \sum_{j=1}^{J}\frac{\partial E^n}{\partial net_j^L}\,\frac{\partial net_j^L}{\partial y_i^{L-1}} = \sum_{j=1}^{J}\frac{\partial E^n}{\partial net_j^L}\, w_{ij}^L$$

The factor $\partial E^n/\partial net_j^L$ in Equation 8.6 was already calculated when we processed the output layer (Equation 8.3) and can be used directly.

Next we take the partial derivative of the error with respect to the neuron's input:

Equation 8.7:
$$\frac{\partial E^n}{\partial net_i^{L-1}} = \frac{\partial E^n}{\partial y_i^{L-1}}\, f'(net_i^{L-1})$$

The factor $\partial E^n/\partial y_i^{L-1}$ was already calculated by Equation 8.6.

Then the partial derivative of the error with respect to each connection weight, where h numbers the neurons of layer L-2:

Equation 8.8:
$$\frac{\partial E^n}{\partial w_{hi}^{L-1}} = \frac{\partial E^n}{\partial net_i^{L-1}}\, y_h^{L-2}$$

The factor $\partial E^n/\partial net_i^{L-1}$ in Equation 8.8 was already calculated in Equation 8.7, so the weight adjustment formula is:

Equation 8.9:
$$\Delta w_{hi}^{L-1} = -\eta\,\frac{\partial E^n}{\partial w_{hi}^{L-1}}$$

Substituting Equation 8.8 into Equation 8.9, we can calculate the adjustment of all connection weights into layer L-1, the layer before the output layer.
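Continuing the sketch above, the step for layer L-1 reuses `dE_dnet` from the output layer exactly as Equation 8.6 prescribes (note that the pre-update weights `W` are used). The additional arrays `y_prev2` (outputs of layer L-2) and `W_prev` (weights into layer L-1) are again hypothetical, and a sigmoid activation is still assumed:

```python
y_prev2 = np.array([0.5, 0.9, 0.3])     # y_h^{L-2}: outputs of layer L-2 (H = 3)
W_prev = np.array([[ 0.2, -0.1],        # W_prev[h, i] = w_{hi}^{L-1}, shape (H, I)
                   [-0.4,  0.3],
                   [ 0.1,  0.6]])

dE_dy_prev = W @ dE_dnet                           # Equation 8.6: sum over j
dE_dnet_prev = dE_dy_prev * y_prev * (1 - y_prev)  # Equation 8.7, sigmoid f' = y(1-y)
dE_dW_prev = np.outer(y_prev2, dE_dnet_prev)       # Equation 8.8
W_prev_new = W_prev - eta * dE_dW_prev             # Equation 8.9
```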

In fact, the formulas above already constitute the complete derivation of the BP algorithm, but for the sake of intuition we also work out the formulas for layer L-2, the most typical hidden-layer case (recall that h numbers the neurons of layer L-2).

For each neuron in layer L-2, we take the partial derivative of the error with respect to its output:

Equation 8.10:
$$\frac{\partial E^n}{\partial y_h^{L-2}} = \sum_{i}\frac{\partial E^n}{\partial net_i^{L-1}}\, w_{hi}^{L-1}$$

The factor $\partial E^n/\partial net_i^{L-1}$ in Equation 8.10 was already computed when processing layer L-1 (Equation 8.7) and can be used directly here.

We then take the partial derivative of the error with respect to the input of each L-2 neuron:

Equation 8.11:
$$\frac{\partial E^n}{\partial net_h^{L-2}} = \frac{\partial E^n}{\partial y_h^{L-2}}\, f'(net_h^{L-2})$$

The factor $\partial E^n/\partial y_h^{L-2}$ in Equation 8.11 is given by Equation 8.10.

Next, the partial derivative of the error with respect to each connection weight; here we let g number the neurons of layer L-3:

Equation 8.12:
$$\frac{\partial E^n}{\partial w_{gh}^{L-2}} = \frac{\partial E^n}{\partial net_h^{L-2}}\, y_g^{L-3}$$

The factor $\partial E^n/\partial net_h^{L-2}$ in Equation 8.12 was calculated in Equation 8.11, so the weight adjustment formula is:

Equation 8.13:
$$\Delta w_{gh}^{L-2} = -\eta\,\frac{\partial E^n}{\partial w_{gh}^{L-2}}$$
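Comparing Equations 8.6-8.9 with 8.10-8.13 makes the general pattern explicit. Writing $\delta_p^l = \partial E^n/\partial net_p^l$ for neuron p in any hidden layer l (a shorthand not used above, introduced here only as a summary), with q numbering the neurons of layer l-1 and r those of layer l+1, the recursion is:

$$\delta_p^l = f'(net_p^l)\sum_{r} w_{pr}^{l+1}\,\delta_r^{l+1}, \qquad \Delta w_{qp}^l = -\eta\,\delta_p^l\, y_q^{l-1}$$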

At this point we have completed the full derivation of the BP algorithm for a multilayer network. As you can see, even the simplest neural network algorithm requires a fairly involved derivation. Moreover, the attentive reader may notice that the derivation above actually covers only the online learning method, which adjusts the network's connection weights after the error of each individual training sample. As we know, this approach can make the adjustments too random and inefficient, so in practice a batch learning approach is needed; the corresponding derivation is more complex and is not given here, because the mathematics is not the focus of our attention.
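To tie the whole derivation together, here is a minimal sketch of the online-learning procedure described above, written in plain NumPy rather than Theano (which the next post covers). The 2-4-1 network shape, the XOR task, and all hyperparameters are hypothetical choices for illustration; bias terms, which the derivation leaves implicit, are handled by appending a constant +1 input to each layer so they behave like ordinary connection weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# A tiny 2-4-1 network learning XOR (hypothetical example, not from the post).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(3, 4))  # (2 inputs + bias) -> 4 hidden units
W2 = rng.normal(scale=0.5, size=(5, 1))  # (4 hidden + bias) -> 1 output unit
eta = 0.5

def forward(x):
    x_aug = np.append(x, 1.0)            # constant +1 input acts as the bias
    y1 = sigmoid(x_aug @ W1)             # hidden-layer outputs
    y1_aug = np.append(y1, 1.0)
    y2 = sigmoid(y1_aug @ W2)            # output-layer outputs
    return x_aug, y1, y1_aug, y2

for epoch in range(10000):
    for x, d in zip(X, D):               # online learning: one sample at a time
        x_aug, y1, y1_aug, y2 = forward(x)

        # Output layer (Equations 8.2-8.3): dE/dnet at each output node.
        delta2 = -(d - y2) * y2 * (1 - y2)

        # Hidden layer (Equations 8.6-8.7); the bias row of W2 has no
        # upstream neuron, so it is excluded from the backward sum.
        delta1 = (W2[:-1] @ delta2) * y1 * (1 - y1)

        # Weight updates (Equations 8.4-8.5 and 8.8-8.9): delta w = -eta * dE/dw.
        W2 -= eta * np.outer(y1_aug, delta2)
        W1 -= eta * np.outer(x_aug, delta1)

for x in X:
    print(x, forward(x)[-1])             # outputs should approach 0, 1, 1, 0
```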

In fact, this is not the end. For mathematicians, even once the algorithm has been derived, one still needs to prove that the method can reach the global optimal solution within a finite number of iterations, that is, to prove its convergence. That is too difficult, so we will not get into it here.

In the next blog post, we will use the Theano framework to implement a simple multilayer feed-forward network, and you will see how theory and practice are combined.



