[NN] Some understandings of backpropagation (BP, error back-propagation)

This article draws heavily on David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, "Learning representations by back-propagating errors", Nature, 323: 533-536, 1986.

In modern neural networks, the most widely used training algorithm is backpropagation (BP). Although BP converges slowly and can get stuck in local minima, its ease of use and accuracy are hard to match with other algorithms.

In this article, $w_{ji}$ denotes the weight of the connection from $unit_{i}$ in the previous layer to $unit_{j}$ in the next layer.

In an MLP, the input $x_{j}$ to a neuron $unit_{j}$ in the next layer is computed as follows (ignoring the bias):

$x_{j} = \sum_{i} y_{i} w_{ji}$

That is, its input is the sum of the outputs $y_{i}$ of all neurons in the previous layer, weighted by the corresponding connection weights.

The output of $unit_{j}$ is then calculated as follows:

$y_{j} = \frac{1}{1+e^{-x_{j}}}$
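
As a concrete illustration, here is a minimal Python/NumPy sketch of this forward computation for one layer; the names (`sigmoid`, `forward_layer`, `y_prev`, `W`) and the toy sizes are assumptions for this example, not from the original article.

```python
import numpy as np

def sigmoid(x):
    # y_j = 1 / (1 + exp(-x_j)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

def forward_layer(y_prev, W):
    # x_j = sum_i y_i * w_ji; with W[j, i] = w_ji this is a matrix-vector product
    x = W @ y_prev
    return sigmoid(x)

# tiny example: 3 units in the previous layer, 2 units in this layer
y_prev = np.array([0.5, 0.1, 0.9])
W = np.random.default_rng(0).normal(scale=0.1, size=(2, 3))
y = forward_layer(y_prev, W)   # outputs of unit_j for j = 0, 1
```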

For supervised training, there is a desired output $d$ and an actual output $y$, and the error can be defined as:

$E = \frac{1}{2}\sum_{c} \sum_{j} (y_{j,c}-d_{j,c})^2$

where $c$ indexes the training cases and $j$ the output units.
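
For a single training case the index $c$ drops out, and the error is a one-liner; a sketch assuming `y` and `d` are NumPy arrays of the same shape:

```python
import numpy as np

def squared_error(y, d):
    # E = 1/2 * sum_j (y_j - d_j)^2 for one training case; summing over cases gives the total E
    return 0.5 * np.sum((y - d) ** 2)
```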

In order to find $\partial E/\partial w_{ji}$, we first compute $\partial E/\partial y_{j}$ (the reason will become clear shortly):

$\partial E/\partial y_{j} = y_{j}-d_{j}$

By the chain rule:

$\partial E/\partial x_{j} = \partial E/\partial y_{j} \cdot dy_{j}/dx_{j}$, and from the input-output relation above,

$dy_{j}/dx_{j} = \left(\frac{1}{1+e^{-x_{j}}}\right)' = \frac{e^{-x_{j}}}{(1+e^{-x_{j}})^{2}} = y_{j}(1-y_{j})$, we obtain:

$\partial E/\partial x_{j} = \partial E/\partial y_{j} \cdot y_{j}(1-y_{j})$
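
This quantity, often written $\delta_{j} = \partial E/\partial x_{j}$, is just an element-wise product in code; a sketch for the output layer, under the same NumPy conventions as above:

```python
import numpy as np

def output_delta(y, d):
    # dE/dx_j = dE/dy_j * y_j * (1 - y_j), with dE/dy_j = y_j - d_j at the output layer
    return (y - d) * y * (1.0 - y)
```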

At this point we have the derivative of the error $E$ with respect to the input $x_{j}$ of layer $j$. However, what the network actually trains are the weights (and biases), so we also need an expression for the partial derivative of $E$ with respect to $w_{ji}$.

Also by the chain rule:

$\partial E/\partial w_{ji} = \partial E/\partial x_{j} \cdot \partial x_{j}/\partial w_{ji}$, and from the relation between this layer's input and its weights,

$x_{j} = \sum_{i} y_{i} w_{ji}$, we get $\partial x_{j}/\partial w_{ji} = y_{i}$, i.e.:

$\partial E/\partial w_{ji} = \partial E/\partial x_{j} \cdot y_{i}$, which, combined with the result above, gives

$\partial E/\partial w_{ji} = (y_{j}-d_{j}) \cdot y_{j}(1-y_{j}) \cdot y_{i}$
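
Because $\partial E/\partial w_{ji} = \partial E/\partial x_{j} \cdot y_{i}$ holds for every pair $(j, i)$, the whole gradient matrix is an outer product; a sketch:

```python
import numpy as np

def weight_grad(delta, y_prev):
    # dE/dw_ji = (dE/dx_j) * y_i  -> outer product with the same shape as W
    return np.outer(delta, y_prev)
```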

Here $y_{i}$ is the output of $unit_{i}$ (after its nonlinear transformation) and $y_{j}$ is the output of layer $j$.

Also by the chain rule, for a neuron $unit_{i}$ in the previous layer, we can obtain the gradient of the error with respect to its output:

$\partial E/\partial y_{i} = \sum_{j} \partial E/\partial x_{j} \cdot \partial x_{j}/\partial y_{i} = \sum_{j} \partial E/\partial x_{j} \cdot w_{ji}$,

taking into account every $unit_{j}$ in the next layer that $unit_{i}$ feeds into. This is the same kind of quantity we started from, so the procedure can be repeated layer by layer, propagating the error backwards through the network.
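
In code, propagating the error one layer back is a product with the transposed weight matrix, followed by the same sigmoid-derivative factor; a sketch, again assuming `W[j, i] = w_ji`:

```python
import numpy as np

def backprop_to_prev(delta, W, y_prev):
    # dE/dy_i = sum_j (dE/dx_j) * w_ji  -> W^T @ delta
    dE_dy_prev = W.T @ delta
    # turn it into dE/dx_i for the previous layer using y_i * (1 - y_i)
    return dE_dy_prev * y_prev * (1.0 - y_prev)
```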

So far, using the formulas above, as long as we know the desired output $d_{j}$ and the output $y_{i}$ of every layer, we can work out the gradient of the error with respect to the weights of each layer and use it to adjust them. The update rule is:

$\Delta w = -\epsilon \, \partial E/\partial w$

where $\epsilon$ is the learning rate.
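
Putting the pieces together, here is a minimal end-to-end sketch of gradient descent for a two-layer network; the layer sizes, learning rate `eps`, and random data are illustrative assumptions, not values from the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))   # hidden-layer weights, W1[j, i] = w_ji
W2 = rng.normal(scale=0.1, size=(2, 4))   # output-layer weights
eps = 0.5                                  # learning rate (epsilon in the update rule)

x_in = rng.random(3)        # input vector (plays the role of y_i for the first layer)
d = np.array([0.0, 1.0])    # desired output

for step in range(1000):
    # forward pass
    h = sigmoid(W1 @ x_in)
    y = sigmoid(W2 @ h)

    # backward pass: dE/dx at each layer
    delta2 = (y - d) * y * (1.0 - y)
    delta1 = (W2.T @ delta2) * h * (1.0 - h)

    # weight update: Delta w = -eps * dE/dw
    W2 -= eps * np.outer(delta2, h)
    W1 -= eps * np.outer(delta1, x_in)
```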
