This article draws heavily on David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, "Learning representations by back-propagating errors", Nature, 323: 533-536, 1986.
In modern neural networks, the most widely used training algorithm is backpropagation (BP). Although BP converges slowly, tends to get stuck in local minima, and has other shortcomings, its ease of use and accuracy are hard to match with other algorithms.
In this article, $w_{ji}$ denotes the weight of the connection between $unit_{i}$ in the previous layer and $unit_{j}$ in the next layer.
In an MLP, the input $x_{j}$ of a neuron $unit_{j}$ in the next layer is computed as follows (ignoring the bias):
$x_{j} = \sum_{i} y_{i} w_{ji}$
That is, the input of $unit_{j}$ is the sum of the outputs $y_{i}$ of all neurons in the previous layer, each weighted by the corresponding connection weight.
The output of $unit_{j}$ is calculated as follows:
$y_{j} = \frac{1}{1+e^{-x_{j}}}$
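The forward computation of a single unit can be sketched in a few lines of NumPy. This is only an illustration; the function names (`sigmoid`, `forward_unit`) and the array shapes are my own assumptions, not anything from the paper.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation: y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward_unit(y_prev, w_j):
    """Forward pass for a single unit_j (bias ignored, as in the text).

    y_prev : outputs y_i of all units in the previous layer, shape (n_i,)
    w_j    : weights w_ji connecting each unit_i to unit_j, shape (n_i,)
    """
    x_j = np.dot(w_j, y_prev)   # x_j = sum_i y_i * w_ji
    y_j = sigmoid(x_j)          # y_j = 1 / (1 + exp(-x_j))
    return x_j, y_j
```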
For supervised training, we have the desired output $d$ and the actual output $y$, and the error $E$ can be defined as:
$E = \frac{1}{2}\sum_{c} \sum_{j} (y_{j,c}-d_{j,c})^2$
where $c$ indexes the training cases and $j$ indexes the output units.
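In code, with the actual and desired outputs for all training cases stacked into two arrays (a layout I am assuming here purely for illustration), the error is:

```python
def total_error(Y, D):
    """E = 1/2 * sum_c sum_j (y_jc - d_jc)^2.

    Y, D : arrays of shape (n_cases, n_outputs) holding the actual and
           desired outputs for every training case c.
    """
    return 0.5 * np.sum((Y - D) ** 2)
```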
To obtain $\partial E/\partial w_{ji}$, we first compute $\partial E/\partial y_{j}$ (the reason will become clear shortly):
$\partial E/\partial y_{j} = y_{j}-d_{j}$
By the chain rule:
$\partial E/\partial x_{j} = \partial E/\partial y_{j} \cdot dy_{j}/dx_{j}$, and from the input-output relation above,
$dy_{j}/dx_{j} = \left(\frac{1}{1+e^{-x_{j}}}\right)' = \frac{e^{-x_{j}}}{(1+e^{-x_{j}})^{2}} = y_{j}(1-y_{j})$, so:
$\partial E/\partial x_{j} = \partial E/\partial y_{j} \cdot y_{j}(1-y_{j})$
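As a sketch (element-wise on NumPy arrays, so it works for a whole layer at once), backing the gradient through the sigmoid is a one-liner:

```python
def grad_through_sigmoid(dE_dy, y):
    """dE/dx_j = dE/dy_j * y_j * (1 - y_j), applied element-wise."""
    return dE_dy * y * (1.0 - y)
```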
At this point we have the partial derivative of the error $E$ with respect to the input $x_{j}$ of $layer_{j}$, but training adjusts the weights (and biases), so we need an expression for the partial derivative of $E$ with respect to $w_{ji}$.
Also by the chain rule:
$\partial E/\partial w_{ji} = \partial E/\partial x_{j} \cdot \partial x_{j}/\partial w_{ji}$, and from the relation between this layer's input and its weights,
$x_{j} = \sum_{i} y_{i} w_{ji}$, we obtain $\partial x_{j}/\partial w_{ji} = y_{i}$, i.e.:
$\partial E/\partial w_{ji} = \partial E/\partial x_{j} \cdot y_{i}$. Putting the pieces together:
$\partial E/\partial w_{ji} = (y_{j}-d_{j}) \cdot y_{j}(1-y_{j}) \cdot y_{i}$
Here $y_{i}$ is the output of $unit_{i}$ (after its nonlinear transformation) and $y_{j}$ is the output of $layer_{j}$.
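A hypothetical helper for this step, assuming `dE_dx_j` is the scalar gradient at the input of $unit_{j}$ and `y_prev` holds the previous layer's outputs:

```python
def grad_weights(dE_dx_j, y_prev):
    """dE/dw_ji = dE/dx_j * y_i for every weight feeding unit_j.

    dE_dx_j : scalar, error gradient at the input of unit_j
    y_prev  : outputs y_i of the previous layer, shape (n_i,)
    Returns an array of shape (n_i,), one gradient per weight w_ji.
    """
    return dE_dx_j * y_prev
```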
Again by the chain rule, for a neuron $unit_{i}$ in the previous layer we can obtain the gradient of the error with respect to its output:
$\partial E/\partial y_{i} = \sum_{j} \partial E/\partial x_{j} \cdot \partial x_{j}/\partial y_{i} = \sum_{j} \partial E/\partial x_{j} \cdot w_{ji}$, where the sum runs over all units $j$ in the next layer that $unit_{i}$ connects to. This quantity plays the same role for $layer_{i}$ that $\partial E/\partial y_{j}$ played for $layer_{j}$, so the whole procedure can be repeated layer by layer backwards through the network.
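If the weights of a layer are stored as a matrix `W` with `W[j, i] = w_ji` (my own convention for this sketch), this backward step is a single matrix-vector product:

```python
def grad_prev_outputs(dE_dx, W):
    """dE/dy_i = sum_j dE/dx_j * w_ji, for all units i at once.

    dE_dx : gradients at the inputs of the next layer, shape (n_j,)
    W     : weight matrix with W[j, i] = w_ji, shape (n_j, n_i)
    Returns an array of shape (n_i,).
    """
    return W.T @ dE_dx
```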
So far, using the formulas above, given the desired outputs $d_{j}$ and the outputs $y_{i}$ of each layer, we can compute the gradient of the error with respect to the weights of every layer, working backwards from the output, and then adjust the weights. The update rule is:
$\Delta w = -\epsilon \, \partial E/\partial w$
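Putting the pieces together, here is a minimal sketch (my own illustration, not the paper's code) of one gradient-descent step for a two-layer sigmoid MLP without biases, following exactly the formulas derived above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(W1, W2, y0, d, eps=0.5):
    """One weight update for a 2-layer sigmoid MLP (no biases).

    W1  : hidden-layer weights,  W1[j, i] = w_ji, shape (n_hidden, n_in)
    W2  : output-layer weights,  W2[k, j] = w_kj, shape (n_out, n_hidden)
    y0  : input vector (the "outputs" of layer 0), shape (n_in,)
    d   : desired output, shape (n_out,)
    eps : learning rate epsilon in  delta_w = -eps * dE/dw
    """
    # Forward pass
    x1 = W1 @ y0
    y1 = sigmoid(x1)                      # hidden-layer outputs
    x2 = W2 @ y1
    y2 = sigmoid(x2)                      # output-layer outputs

    # Backward pass
    dE_dx2 = (y2 - d) * y2 * (1.0 - y2)   # dE/dx at the output layer
    dE_dW2 = np.outer(dE_dx2, y1)         # dE/dw_kj = dE/dx_k * y_j
    dE_dy1 = W2.T @ dE_dx2                # dE/dy_j = sum_k dE/dx_k * w_kj
    dE_dx1 = dE_dy1 * y1 * (1.0 - y1)     # back through the hidden sigmoid
    dE_dW1 = np.outer(dE_dx1, y0)         # dE/dw_ji = dE/dx_j * y_i

    # Weight update: delta_w = -eps * dE/dw
    W1 -= eps * dE_dW1
    W2 -= eps * dE_dW2
    return 0.5 * np.sum((y2 - d) ** 2)    # error E for this training case
```

Calling `train_step` repeatedly with small random initial weights on a toy input-output pair should steadily reduce the returned error.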