Deriving the Error Backpropagation (BP) Algorithm for Neural Networks


The error backpropagation (BP) algorithm is by far the most successful neural network learning algorithm; when neural networks are applied to practical tasks, they are mostly trained with the BP algorithm.
Given a training set \(D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}\), \(x_i \in \mathbb{R}^d\), \(y_i \in \mathbb{R}^l\): each input example is described by \(d\) attributes, and each output is an \(l\)-dimensional real vector. Consider a typical single-hidden-layer feedforward network with \(d\) input neurons, \(l\) output neurons, and \(q\) hidden neurons. Here \(\theta_j\) denotes the threshold of the \(j\)-th output neuron, \(\gamma_h\) the threshold of the \(h\)-th hidden neuron, \(v_{ih}\) the connection weight between the \(i\)-th input neuron and the \(h\)-th hidden neuron, and \(w_{hj}\) the connection weight between the \(h\)-th hidden neuron and the \(j\)-th output neuron.
By the network's propagation rule, the \(h\)-th hidden neuron receives input \(\alpha_h=\sum_{i=1}^d v_{ih}x_i\) and produces output \(b_h=f(\alpha_h-\gamma_h)\); the \(j\)-th output neuron receives input \(\beta_j=\sum_{h=1}^q w_{hj}b_h\) and produces output \(\widehat{y}_j^k=f(\beta_j-\theta_j)\), where \(f\) is the activation function and \(\gamma_h\) and \(\theta_j\) are the hidden-layer and output-layer thresholds. We choose the sigmoid function \(f(x)=\frac{1}{1+e^{-x}}\) as the activation function.
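The forward pass just described can be sketched in NumPy (a minimal illustration, not from the original text; the variable names `v`, `gamma`, `w`, `theta` mirror the symbols above, and the layer sizes are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    """Activation function f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, v, gamma, w, theta):
    """Forward pass of the single-hidden-layer network described above.

    x:     (d,)   input example
    v:     (d, q) input-to-hidden weights v_ih
    gamma: (q,)   hidden-layer thresholds gamma_h
    w:     (q, l) hidden-to-output weights w_hj
    theta: (l,)   output-layer thresholds theta_j
    Returns the hidden outputs b (q,) and network outputs y_hat (l,).
    """
    alpha = x @ v                  # alpha_h = sum_i v_ih x_i
    b = sigmoid(alpha - gamma)     # b_h = f(alpha_h - gamma_h)
    beta = b @ w                   # beta_j = sum_h w_hj b_h
    y_hat = sigmoid(beta - theta)  # y_hat_j = f(beta_j - theta_j)
    return b, y_hat

# Illustrative sizes and random parameters.
rng = np.random.default_rng(0)
d, q, l = 3, 4, 2
x = rng.normal(size=d)
v, w = rng.normal(size=(d, q)), rng.normal(size=(q, l))
gamma, theta = rng.normal(size=q), rng.normal(size=l)
b, y_hat = forward(x, v, gamma, w, theta)
```

Because every output passes through the sigmoid, all components of `b` and `y_hat` lie strictly between 0 and 1.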
For a training example \((x_k,y_k)\), the network's output is \(\widehat{y}_k=(\widehat{y}_1^k,\widehat{y}_2^k,\ldots,\widehat{y}_l^k)\), and the mean square error is\[E_k=\frac{1}{2}\sum_{j=1}^{l}(\widehat{y}_j^k-y_j^k)^2\tag{1}\]
To minimize the mean square error of the output, we adjust the weights in the direction of the negative gradient of the error. Given a learning rate \(\eta\),\[\Delta w_{hj}=-\eta\frac{\partial E_k}{\partial w_{hj}}\tag{2}\]Why take the negative gradient direction here? Because we want to minimize the square error, and the updated estimate of \(w\) is\[w \leftarrow w+\Delta w\tag{3}\]If \(\frac{\partial E_k}{\partial w_{hj}}>0\), decreasing \(w\) reduces the mean square error, so \(\Delta w\) should be less than 0; conversely, if \(\frac{\partial E_k}{\partial w_{hj}}<0\), increasing \(w\) reduces the mean square error, so \(\Delta w\) should be greater than 0. Taking the negative of the partial derivative therefore guarantees that the weight change always moves in the direction that reduces the mean square error.
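The sign argument can be seen on a one-dimensional toy error function (an illustrative example added here, assuming a simple quadratic \(E(w)=(w-3)^2\), which is not part of the original derivation): stepping by \(\Delta w=-\eta\frac{dE}{dw}\) reduces the error whether the gradient is positive or negative.

```python
# Toy 1-D illustration: E(w) = (w - 3)^2 has its minimum at w = 3.
def toy_error(w):
    return (w - 3.0) ** 2

def toy_grad(w):
    return 2.0 * (w - 3.0)   # dE/dw

eta = 0.1
# The gradient is negative at w = 0 and positive at w = 5; the update
# w <- w + Δw with Δw = -η dE/dw reduces the error in both cases.
before_after = [(toy_error(w0), toy_error(w0 - eta * toy_grad(w0)))
                for w0 in (0.0, 5.0)]
```

Starting from either side of the minimum, the post-update error is strictly smaller than the pre-update error.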
In this network, \(E_k\) is a function of \(\widehat{y}_j^k\), \(\widehat{y}_j^k\) is a function of \(\beta_j\), and \(\beta_j\) is a function of \(w_{hj}\), so by the chain rule\[\frac{\partial E_k}{\partial w_{hj}}=\frac{\partial E_k}{\partial \widehat{y}_j^k}\cdot\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial w_{hj}}\tag{4}\]Clearly\[\frac{\partial \beta_j}{\partial w_{hj}}=b_h\tag{5}\]
The sigmoid function satisfies\[f'(x)=f(x)(1-f(x))\tag{6}\]so\[\begin{aligned}g_j&=-\frac{\partial E_k}{\partial \widehat{y}_j^k}\cdot\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\\&=-(\widehat{y}_j^k-y_j^k)f'(\beta_j-\theta_j)\\&=-(\widehat{y}_j^k-y_j^k)f(\beta_j-\theta_j)\bigl(1-f(\beta_j-\theta_j)\bigr)\\&=-(\widehat{y}_j^k-y_j^k)\widehat{y}_j^k(1-\widehat{y}_j^k)\end{aligned}\tag{7}\]
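Equation (7) can be verified numerically: \(g_j\) should equal \(-\frac{\partial E_k}{\partial \beta_j}\), which we approximate here with a central finite difference (a sanity check added for illustration, not part of the original derivation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def error(beta, theta, y):
    """Mean square error (1), viewed as a function of the output-layer inputs beta."""
    y_hat = sigmoid(beta - theta)
    return 0.5 * np.sum((y_hat - y) ** 2)

rng = np.random.default_rng(1)
l = 3
beta, theta = rng.normal(size=l), rng.normal(size=l)
y = rng.uniform(size=l)

y_hat = sigmoid(beta - theta)
g = -(y_hat - y) * y_hat * (1.0 - y_hat)   # closed form from (7)

# central finite difference of -dE_k/dbeta_j
eps = 1e-6
g_num = np.empty(l)
for j in range(l):
    step = np.zeros(l)
    step[j] = eps
    g_num[j] = -(error(beta + step, theta, y)
                 - error(beta - step, theta, y)) / (2 * eps)
```

The closed-form `g` and the finite-difference estimate `g_num` agree to numerical precision.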

Substituting (5) and (7) into (4), and then into (2), we obtain the BP update rule for \(w_{hj}\):\[\Delta w_{hj}=\eta g_jb_h\tag{8}\]Similarly we obtain
\[\Delta \theta_j=-\eta g_j\tag{9}\]\[\Delta v_{ih}=\eta e_hx_i\tag{10}\]\[\Delta \gamma_h=-\eta e_h\tag{11}\]
where, in formulas (10) and (11),
\[\begin{aligned}e_h&=-\frac{\partial E_k}{\partial b_h}\cdot\frac{\partial b_h}{\partial \alpha_h}\\&=-\sum_{j=1}^l\frac{\partial E_k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial b_h}f'(\alpha_h-\gamma_h)\\&=\sum_{j=1}^lw_{hj}g_jf'(\alpha_h-\gamma_h)\\&=b_h(1-b_h)\sum_{j=1}^lw_{hj}g_j\end{aligned}\tag{12}\]
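Putting equations (7)–(12) together, one full BP update for a single example can be sketched as follows (an illustrative NumPy implementation; the names and shapes follow the symbols in the derivation, while the layer sizes and learning rate are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, y, v, gamma, w, theta, eta=0.5):
    """One BP update on a single example (x, y), implementing (7)-(12).

    Returns updated copies of the parameters and the error before the update.
    """
    # forward pass
    b = sigmoid(x @ v - gamma)             # hidden outputs b_h
    y_hat = sigmoid(b @ w - theta)         # network outputs y_hat_j
    err = 0.5 * np.sum((y_hat - y) ** 2)   # mean square error (1)

    # error terms
    g = y_hat * (1 - y_hat) * (y - y_hat)  # g_j from (7)
    e = b * (1 - b) * (w @ g)              # e_h from (12)

    # parameter updates (8)-(11)
    w = w + eta * np.outer(b, g)           # Δw_hj = η g_j b_h
    theta = theta - eta * g                # Δθ_j = -η g_j
    v = v + eta * np.outer(x, e)           # Δv_ih = η e_h x_i
    gamma = gamma - eta * e                # Δγ_h = -η e_h
    return v, gamma, w, theta, err

# Repeated updates on one example should drive the error toward zero.
rng = np.random.default_rng(2)
d, q, l = 3, 4, 2                          # arbitrary layer sizes
x, y = rng.normal(size=d), rng.uniform(size=l)
v, w = rng.normal(size=(d, q)), rng.normal(size=(q, l))
gamma, theta = rng.normal(size=q), rng.normal(size=l)

errors = []
for _ in range(1000):
    v, gamma, w, theta, err = bp_step(x, y, v, gamma, w, theta)
    errors.append(err)
```

Iterating the step on the same example drives the mean square error toward zero, which is exactly the descent behaviour that equations (2) and (3) promise.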
At this point the derivation of the error backpropagation algorithm is complete. Looking back, why is it called the error backpropagation algorithm? As the name implies, the error is propagated backward through the network. From the derivation above,
\(\Delta w_{hj}=\eta(y_j^k-\widehat{y}_j^k)\cdot\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\cdot b_h=\eta g_jb_h\), where \((y_j^k-\widehat{y}_j^k)\) is the output error, and \(\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\),
the derivative of the output-layer output \(\widehat{y}\) with respect to its input \(\beta\), can be regarded as an adjustment factor on that error; we therefore call \(g_j\) the "adjusted error". Likewise, \(\Delta v_{ih}=\eta e_hx_i\), with \(e_h=b_h(1-b_h)\sum_{j=1}^lw_{hj}g_j=\frac{\partial b_h}{\partial \alpha_h}\sum_{j=1}^lw_{hj}g_j\), so \(e_h\) can be seen as the adjusted error \(g_j\) propagated backward through the network and adjusted once more. In short: weight adjustment = learning rate × adjusted error × output of the upstream node. This is the intuitive, surface-level reading of error backpropagation, useful as a memory aid.

