Deriving the Error Backpropagation (BP) Algorithm for Neural Networks


The error backpropagation (BP) algorithm is by far the most successful neural network learning algorithm; when neural networks are applied to practical tasks, they are mostly trained with the BP algorithm.
Given a training set \(D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_m,y_m)\}\), \(x_i \in \mathbb{R}^d\), \(y_i \in \mathbb{R}^l\): each input example is described by \(d\) attributes, and each output is an \(l\)-dimensional real vector. Consider a typical single-hidden-layer feedforward network with \(d\) input neurons, \(l\) output neurons, and \(q\) hidden neurons. Here \(\theta_j\) denotes the threshold of the \(j\)-th output neuron, \(\gamma_h\) the threshold of the \(h\)-th hidden neuron, \(v_{ih}\) the connection weight between the \(i\)-th input neuron and the \(h\)-th hidden neuron, and \(w_{hj}\) the connection weight between the \(h\)-th hidden neuron and the \(j\)-th output neuron.
By the network's propagation rule, the \(h\)-th hidden neuron receives input \(\alpha_h=\sum_{i=1}^d v_{ih}x_i\) and produces output \(b_h=f(\alpha_h-\gamma_h)\); the \(j\)-th output neuron receives input \(\beta_j=\sum_{h=1}^q w_{hj}b_h\) and produces output \(\widehat{y}_j^k=f(\beta_j-\theta_j)\), where \(f\) is the activation function and \(\gamma_h\) and \(\theta_j\) are the hidden-layer and output-layer thresholds. We choose the sigmoid function \(f(x)=\frac{1}{1+e^{-x}}\) as the activation function.
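The forward pass just described can be sketched in NumPy (a minimal illustration, not from the original text; the variable names `v`, `gamma`, `w`, `theta` mirror the symbols above, and the layer sizes are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    """Activation function f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, v, gamma, w, theta):
    """Forward pass of the single-hidden-layer network described above.

    x:     (d,)   input example
    v:     (d, q) input-to-hidden weights v_ih
    gamma: (q,)   hidden-layer thresholds gamma_h
    w:     (q, l) hidden-to-output weights w_hj
    theta: (l,)   output-layer thresholds theta_j
    Returns the hidden outputs b (q,) and network outputs y_hat (l,).
    """
    alpha = x @ v                  # alpha_h = sum_i v_ih x_i
    b = sigmoid(alpha - gamma)     # b_h = f(alpha_h - gamma_h)
    beta = b @ w                   # beta_j = sum_h w_hj b_h
    y_hat = sigmoid(beta - theta)  # y_hat_j = f(beta_j - theta_j)
    return b, y_hat

# Illustrative sizes and random parameters.
rng = np.random.default_rng(0)
d, q, l = 3, 4, 2
x = rng.normal(size=d)
v, w = rng.normal(size=(d, q)), rng.normal(size=(q, l))
gamma, theta = rng.normal(size=q), rng.normal(size=l)
b, y_hat = forward(x, v, gamma, w, theta)
```

Because every output passes through the sigmoid, all components of `b` and `y_hat` lie strictly between 0 and 1.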
For a training example \((x_k,y_k)\), the network's output is \(\widehat{y}_k=(\widehat{y}_1^k,\widehat{y}_2^k,\ldots,\widehat{y}_l^k)\), and the mean square error is\[E_k=\frac{1}{2}\sum_{j=1}^{l}(\widehat{y}_j^k-y_j^k)^2\tag{1}\]
To minimize the mean square error of the output, we adjust the weights in the direction of the negative gradient of the error. Given a learning rate \(\eta\),\[\Delta w_{hj}=-\eta\frac{\partial E_k}{\partial w_{hj}}\tag{2}\]Why take the negative gradient direction here? Because we want to minimize the square error, and the updated estimate of \(w\) is\[w \leftarrow w+\Delta w\tag{3}\]If \(\frac{\partial E_k}{\partial w_{hj}}>0\), decreasing \(w\) reduces the mean square error, so \(\Delta w\) should be less than 0; conversely, if \(\frac{\partial E_k}{\partial w_{hj}}<0\), increasing \(w\) reduces the mean square error, so \(\Delta w\) should be greater than 0. Taking the negative of the partial derivative therefore guarantees that the weight change always moves in the direction that reduces the mean square error.
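The sign argument can be seen on a one-dimensional toy error function (an illustrative example added here, assuming a simple quadratic \(E(w)=(w-3)^2\), which is not part of the original derivation): stepping by \(\Delta w=-\eta\frac{dE}{dw}\) reduces the error whether the gradient is positive or negative.

```python
# Toy 1-D illustration: E(w) = (w - 3)^2 has its minimum at w = 3.
def toy_error(w):
    return (w - 3.0) ** 2

def toy_grad(w):
    return 2.0 * (w - 3.0)   # dE/dw

eta = 0.1
# The gradient is negative at w = 0 and positive at w = 5; the update
# w <- w + Δw with Δw = -η dE/dw reduces the error in both cases.
before_after = [(toy_error(w0), toy_error(w0 - eta * toy_grad(w0)))
                for w0 in (0.0, 5.0)]
```

Starting from either side of the minimum, the post-update error is strictly smaller than the pre-update error.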
In this network, \(E_k\) is a function of \(\widehat{y}_j^k\), \(\widehat{y}_j^k\) is a function of \(\beta_j\), and \(\beta_j\) is a function of \(w_{hj}\), so by the chain rule\[\frac{\partial E_k}{\partial w_{hj}}=\frac{\partial E_k}{\partial \widehat{y}_j^k}\cdot\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial w_{hj}}\tag{4}\]Clearly\[\frac{\partial \beta_j}{\partial w_{hj}}=b_h\tag{5}\]
The sigmoid function satisfies\[f'(x)=f(x)(1-f(x))\tag{6}\]so\[\begin{aligned}g_j&=-\frac{\partial E_k}{\partial \widehat{y}_j^k}\cdot\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\\&=-(\widehat{y}_j^k-y_j^k)f'(\beta_j-\theta_j)\\&=-(\widehat{y}_j^k-y_j^k)f(\beta_j-\theta_j)\bigl(1-f(\beta_j-\theta_j)\bigr)\\&=-(\widehat{y}_j^k-y_j^k)\widehat{y}_j^k(1-\widehat{y}_j^k)\end{aligned}\tag{7}\]
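Equation (7) can be verified numerically: \(g_j\) should equal \(-\frac{\partial E_k}{\partial \beta_j}\), which we approximate here with a central finite difference (a sanity check added for illustration, not part of the original derivation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def error(beta, theta, y):
    """Mean square error (1), viewed as a function of the output-layer inputs beta."""
    y_hat = sigmoid(beta - theta)
    return 0.5 * np.sum((y_hat - y) ** 2)

rng = np.random.default_rng(1)
l = 3
beta, theta = rng.normal(size=l), rng.normal(size=l)
y = rng.uniform(size=l)

y_hat = sigmoid(beta - theta)
g = -(y_hat - y) * y_hat * (1.0 - y_hat)   # closed form from (7)

# central finite difference of -dE_k/dbeta_j
eps = 1e-6
g_num = np.empty(l)
for j in range(l):
    step = np.zeros(l)
    step[j] = eps
    g_num[j] = -(error(beta + step, theta, y)
                 - error(beta - step, theta, y)) / (2 * eps)
```

The closed-form `g` and the finite-difference estimate `g_num` agree to numerical precision.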

Substituting (5) and (7) into (4), and then into (2), we obtain the BP update rule for \(w_{hj}\):\[\Delta w_{hj}=\eta g_jb_h\tag{8}\]Similarly we obtain
\[\Delta \theta_j=-\eta g_j\tag{9}\]\[\Delta v_{ih}=\eta e_hx_i\tag{10}\]\[\Delta \gamma_h=-\eta e_h\tag{11}\]
where, in formulas (10) and (11),
\[\begin{aligned}e_h&=-\frac{\partial E_k}{\partial b_h}\cdot\frac{\partial b_h}{\partial \alpha_h}\\&=-\sum_{j=1}^l\frac{\partial E_k}{\partial \beta_j}\cdot\frac{\partial \beta_j}{\partial b_h}f'(\alpha_h-\gamma_h)\\&=\sum_{j=1}^lw_{hj}g_jf'(\alpha_h-\gamma_h)\\&=b_h(1-b_h)\sum_{j=1}^lw_{hj}g_j\end{aligned}\tag{12}\]
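Putting equations (7)–(12) together, one full BP update for a single example can be sketched as follows (an illustrative NumPy implementation; the names and shapes follow the symbols in the derivation, while the layer sizes and learning rate are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, y, v, gamma, w, theta, eta=0.5):
    """One BP update on a single example (x, y), implementing (7)-(12).

    Returns updated copies of the parameters and the error before the update.
    """
    # forward pass
    b = sigmoid(x @ v - gamma)             # hidden outputs b_h
    y_hat = sigmoid(b @ w - theta)         # network outputs y_hat_j
    err = 0.5 * np.sum((y_hat - y) ** 2)   # mean square error (1)

    # error terms
    g = y_hat * (1 - y_hat) * (y - y_hat)  # g_j from (7)
    e = b * (1 - b) * (w @ g)              # e_h from (12)

    # parameter updates (8)-(11)
    w = w + eta * np.outer(b, g)           # Δw_hj = η g_j b_h
    theta = theta - eta * g                # Δθ_j = -η g_j
    v = v + eta * np.outer(x, e)           # Δv_ih = η e_h x_i
    gamma = gamma - eta * e                # Δγ_h = -η e_h
    return v, gamma, w, theta, err

# Repeated updates on one example should drive the error toward zero.
rng = np.random.default_rng(2)
d, q, l = 3, 4, 2                          # arbitrary layer sizes
x, y = rng.normal(size=d), rng.uniform(size=l)
v, w = rng.normal(size=(d, q)), rng.normal(size=(q, l))
gamma, theta = rng.normal(size=q), rng.normal(size=l)

errors = []
for _ in range(1000):
    v, gamma, w, theta, err = bp_step(x, y, v, gamma, w, theta)
    errors.append(err)
```

Iterating the step on the same example drives the mean square error toward zero, which is exactly the descent behaviour that equations (2) and (3) promise.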
At this point the derivation of the error backpropagation algorithm is complete. Looking back, why is it called the error backpropagation algorithm? As the name implies, the error is propagated backward through the network. From the derivation above,
\(\Delta w_{hj}=\eta(y_j^k-\widehat{y}_j^k)\cdot\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\cdot b_h=\eta g_jb_h\), where \((y_j^k-\widehat{y}_j^k)\) is the output error, and \(\frac{\partial \widehat{y}_j^k}{\partial \beta_j}\),
the derivative of the output-layer output \(\widehat{y}\) with respect to its input \(\beta\), can be regarded as an adjustment factor on that error; we therefore call \(g_j\) the "adjusted error". Likewise, \(\Delta v_{ih}=\eta e_hx_i\), with \(e_h=b_h(1-b_h)\sum_{j=1}^lw_{hj}g_j=\frac{\partial b_h}{\partial \alpha_h}\sum_{j=1}^lw_{hj}g_j\), so \(e_h\) can be seen as the adjusted error \(g_j\) propagated backward through the network and adjusted once more. In short: weight adjustment = learning rate × adjusted error × output of the upstream node. This is the intuitive, surface-level reading of error backpropagation, useful as a memory aid.

