Neural network back-propagation algorithm

This article summarizes the neural network back-propagation algorithm, based on the Wikipedia entry http://en.wikipedia.org/wiki/Backpropagation, and gives a simple derivation of the formulas.

A typical back-propagation algorithm for a three-layer neural network with a single hidden layer is as follows:

    initialize network weights (often small random values)
    do
        for each training example ex
            prediction = neural-net-output(network, ex)   // forward pass
            actual = teacher-output(ex)
            compute error (prediction - actual) at the output units
            compute $\Delta w_h$ for all weights from hidden layer to output layer   // backward pass
            compute $\Delta w_i$ for all weights from input layer to hidden layer    // backward pass continued
            update network weights   // input layer not modified by error estimate
    until all examples classified correctly or another stopping criterion satisfied
    return the network
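As a concrete illustration of this pseudocode, here is a minimal NumPy sketch that trains a 2-4-1 network (one hidden layer, logistic activations) on XOR with per-example updates. The layer sizes, the XOR data, the learning rate, the bias terms and the iteration limit are illustrative choices, not part of the pseudocode above.

```python
# Minimal sketch of the pseudocode: a 2-4-1 network with logistic activations,
# trained on XOR with per-example (online) updates.  Layer sizes, data, learning
# rate, bias terms and iteration limit are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training examples: inputs X and teacher outputs T.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Initialize network weights (small random values); biases added so XOR is learnable.
W_ih = rng.normal(scale=0.5, size=(2, 4))   # input  -> hidden
b_h  = np.zeros(4)
W_ho = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output
b_o  = np.zeros(1)
alpha = 0.5                                  # learning rate

for epoch in range(20000):
    all_correct = True
    for x, t in zip(X, T):
        # Forward pass.
        o_h = logistic(x @ W_ih + b_h)       # hidden outputs o_j
        o_o = logistic(o_h @ W_ho + b_o)     # network prediction

        # Backward pass: delta for output units, then for hidden units.
        delta_o = (o_o - t) * o_o * (1 - o_o)          # (o_j - t_j) * phi'(net_j)
        delta_h = (W_ho @ delta_o) * o_h * (1 - o_h)   # (sum_l delta_l w_jl) * phi'(net_j)

        # Update weights: Delta w_ij = -alpha * delta_j * x_i (inputs themselves unchanged).
        W_ho -= alpha * np.outer(o_h, delta_o); b_o -= alpha * delta_o
        W_ih -= alpha * np.outer(x, delta_h);   b_h -= alpha * delta_h

        all_correct = all_correct and bool((o_o.round() == t).all())
    if all_correct:                          # stopping criterion from the pseudocode
        break

print("epochs:", epoch + 1)
print("predictions:", logistic(logistic(X @ W_ih + b_h) @ W_ho + b_o).ravel().round(3))
```

With these settings the loop typically stops once all four XOR patterns round to the correct class.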

The short explanation is: once the network's prediction is produced, the error is computed against the actual result, then the partial derivative of that error with respect to each weight $w_{ij}$ is computed, and the weights are adjusted according to those partial derivatives (the gradient).

Notation:

1. $\mathrm{net}_j$ denotes the raw (pre-activation) output of neuron $j$, i.e. its weighted input;

2. $o_j$ denotes the final output of neuron $j$;

3. the activation function is written $\varphi(\mathrm{net}_j) = \varphi\left(\sum_{k=1}^{n}w_{kj}x_k\right) = o_j$;

4. $E$ is the error function of the whole network, generally a function $E = f(t, y)$ of the target value $t$ and the predicted (output) value $y$;

5. $L_j$ is the set of all nodes that receive input from node $j$; when no ambiguity arises, the subscript is omitted.

What we ultimately want is $\frac{\partial E}{\partial w_{ij}}$; by the chain rule:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial \mathrm{net}_j} \frac{\partial \mathrm{net}_j}{\partial w_{ij}}$$

By definition, $\mathrm{net}_j$ is a linear function of $w_{ij}$, so the last factor on the right is:

$$\frac{\partial \mathrm{net}_j}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}}\left(\sum_{k=1}^{n}w_{kj}x_k\right) = x_i$$

The second factor on the right is the derivative of the activation function $\varphi(z)$. The logistic function $\varphi(z) = \frac{1}{1+e^{-z}}$ is most commonly used here, and its derivative is

$$\frac{\partial\varphi}{\partial z} = \varphi(1-\varphi)$$
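This identity can be checked directly from the definition; a short derivation for completeness:

$$\frac{d\varphi}{dz} = \frac{d}{dz}\left(1+e^{-z}\right)^{-1} = \frac{e^{-z}}{\left(1+e^{-z}\right)^{2}} = \frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}} = \varphi(z)\bigl(1-\varphi(z)\bigr)$$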

The main work is in the first factor on the right, $\frac{\partial E}{\partial o_j}$.

When $j$ is in the output layer, it is easy to compute: it is simply the partial derivative of $E$ with respect to $y$, i.e. $\frac{\partial E}{\partial o_j} = \frac{\partial E}{\partial y}$.

When $j$ is in the hidden layer, we have the following relationship:

$$\frac{\partial E}{\partial o_j} = \sum_{l \in L} \left(\frac{\partial E}{\partial \mathrm{net}_l}\frac{\partial \mathrm{net}_l}{\partial o_j}\right) = \sum_{l \in L} \left(\frac{\partial E}{\partial o_{l}}\frac{\partial o_{l}}{\partial \mathrm{net}_l}w_{jl}\right)$$
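The second equality uses the fact that $o_j$ enters $\mathrm{net}_l$ only through the weight $w_{jl}$; spelling out the intermediate step (with $o_k$ the outputs feeding node $l$):

$$\mathrm{net}_l = \sum_{k} w_{kl}\, o_k \qquad\Longrightarrow\qquad \frac{\partial \mathrm{net}_l}{\partial o_j} = w_{jl}$$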

This is a recursive formula, working backward layer by layer; the recursion bottoms out at the output layer, which is given by the formula above. Putting everything together, we have:

$$\frac{\partial E}{\partial w_{ij}} = \delta_{j} x_{i}$$

where

$$\delta_{j} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial\mathrm{net}_j} = \begin{cases} (o_{j}-t_{j})\,\varphi(\mathrm{net}_{j})(1-\varphi(\mathrm{net}_{j})) & \mbox{if } j \mbox{ is an output neuron,}\\ \left(\sum_{l\in L} \delta_{l} w_{jl}\right)\varphi(\mathrm{net}_{j})(1-\varphi(\mathrm{net}_{j})) & \mbox{if } j \mbox{ is an inner neuron.} \end{cases}$$


Given a learning rate $\alpha$, the weight adjustment is:

$$\Delta w_{ij} = -\alpha \frac{\partial E}{\partial w_{ij}}$$
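As a quick worked example with made-up numbers (a single output neuron with logistic activation, input $x=(1, 0.5)$, weights $w=(0.2, -0.4)$, target $t=1$, $\alpha=0.1$; all values are purely illustrative):

$$\mathrm{net} = 0.2\cdot 1 + (-0.4)\cdot 0.5 = 0,\qquad o=\varphi(0)=0.5,\qquad \delta=(o-t)\,o(1-o) = -0.125,$$

$$\Delta w_1 = -\alpha\,\delta\,x_1 = 0.0125,\qquad \Delta w_2 = -\alpha\,\delta\,x_2 = 0.00625.$$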

Differences from the L-BFGS algorithm

L-BFGS is a quasi-Newton method, i.e. an algorithm for finding the optimum of an objective function. Back-propagation is an optimization algorithm for neural networks, whose objective function is of course $E$. Neural network optimization is sample-based and supervised: $E$ is a function of both the samples and the weights (the coefficients to be adjusted). Each given sample fixes a configuration, so the sample values can be regarded as hyper-parameters and the weights as the variables; this is equivalent to finding the optimal variables for a fixed set of hyper-parameters. Once one sample has been optimized, the next sample is optimized.

L-BFGS can be seen as a setting in which the implied hyper-parameters are already given, and which is therefore unsupervised in this sense.

Broadly speaking, if all the samples are included, a single overall objective function can be written, which corresponds to batch optimization.

In a narrow sense, each sample corresponds to its own objective function, which corresponds to incremental (online) optimization.

Whether the optimization is incremental or batch, the neural network has an objective function; once the samples are given, the objective function is a function of the weights, $E = E(t, o(x,w)) = E(w; x, t)$, where $(x,t)$ is a sample. For a layered neural network, $o(w; x)$ is a nested (composite) function.

For a network with only an input layer and an output layer:

$$o(w; x) = \varphi(w \times x)$$
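For comparison, with one hidden layer the activation nests, which is the composite structure mentioned above (the layer-indexed weight matrices $w^{(1)}, w^{(2)}$ are notation introduced here only for illustration):

$$o(w; x) = \varphi\left(w^{(2)} \times \varphi\left(w^{(1)} \times x\right)\right)$$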

Only the gradient vector $g_k = \nabla E(w)$ and the Hessian matrix $H_k = \nabla^2 E(w)$ are needed to set up a (quasi-)Newton method.
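To make the contrast concrete, here is a small sketch (network shape, data and parameter packing are illustrative choices, not from the original text) that computes the batch error $E$ over all samples together with its back-propagated gradient and hands both to SciPy's L-BFGS-B implementation as a batch optimizer:

```python
# Batch optimization with L-BFGS: the same back-propagated gradients, but for the
# total error E over all samples at once.  Shapes, data and packing are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(w):
    """Split the flat parameter vector into weight matrices and biases (2-4-1 net)."""
    W1 = w[:8].reshape(2, 4);    b1 = w[8:12]
    W2 = w[12:16].reshape(4, 1); b2 = w[16:17]
    return W1, b1, W2, b2

def loss_and_grad(w):
    """Batch squared error E over all samples, plus its gradient via back-propagation."""
    W1, b1, W2, b2 = unpack(w)
    H = logistic(X @ W1 + b1)            # hidden outputs, shape (samples, hidden units)
    O = logistic(H @ W2 + b2)            # predictions, shape (samples, 1)
    E = 0.5 * np.sum((O - T) ** 2)

    dO = (O - T) * O * (1 - O)           # output deltas
    dH = (dO @ W2.T) * H * (1 - H)       # hidden deltas
    grad = np.concatenate([
        (X.T @ dH).ravel(), dH.sum(axis=0),
        (H.T @ dO).ravel(), dO.sum(axis=0),
    ])
    return E, grad

w0 = rng.normal(scale=0.5, size=17)
res = minimize(loss_and_grad, w0, jac=True, method="L-BFGS-B")
print("final batch error:", res.fun)
```

Here every L-BFGS iteration sees the whole training set, whereas the loop in the earlier sketch adjusts the weights after each individual sample.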
