The calculation process of the BP algorithm



When the $k$-th input vector is applied, the net input to hidden-layer node $h$ is:

\[s_h^k = \sum\limits_i {w_{ih} x_i^k}\]

The output of the corresponding node is:

\[y_h^k = f(s_h^k) = f\Big(\sum\limits_i w_{ih}\, x_i^k\Big)\]

Similarly, the net input to output-layer node $j$ is:

\[s_j^k = \sum\limits_h w_{hj}\, y_h^k = \sum\limits_h w_{hj}\, f\Big(\sum\limits_i w_{ih}\, x_i^k\Big)\]

The output of the corresponding node is:

\[y_j^k = f(s_j^k) = f\Big(\sum\limits_h w_{hj}\, y_h^k\Big) = f\Big[\sum\limits_h w_{hj}\, f\Big(\sum\limits_i w_{ih}\, x_i^k\Big)\Big]\]

Here the threshold of each node is treated as an extra connection weight $\theta = w_{0h}$ or $w_{0j}$ from a bias node whose output is fixed at 1. This connection weight is adjustable and takes part in the training process just like the other weights.
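The two-stage forward computation above can be written out directly. The following is a minimal sketch in Python, assuming a sigmoid activation $f$ and one hidden layer; the array names and sizes are invented for the illustration, and only the input layer's bias node is included.

```python
import numpy as np

def f(s):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-s))

# One training vector x^k; x[0] = 1 plays the role of the bias node with fixed value 1.
x = np.array([1.0, 0.5, -0.3])

# Illustrative weight matrices (3 inputs -> 3 hidden nodes -> 2 output nodes).
w_ih = np.random.uniform(-0.5, 0.5, size=(3, 3))   # input -> hidden weights w_{ih}
w_hj = np.random.uniform(-0.5, 0.5, size=(3, 2))   # hidden -> output weights w_{hj}

s_h = x @ w_ih      # s_h^k = sum_i w_{ih} x_i^k
y_h = f(s_h)        # y_h^k = f(s_h^k)
s_j = y_h @ w_hj    # s_j^k = sum_h w_{hj} y_h^k
y_j = f(s_j)        # y_j^k = f(s_j^k)
```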

The error function is:

\[E(W) = \frac{1}{2}\sum\limits_{k,j} (t_j^k - y_j^k)^2 = \frac{1}{2}\sum\limits_{k,j} \Big\{t_j^k - f\Big[\sum\limits_h w_{hj}\, f\Big(\sum\limits_i w_{ih}\, x_i^k\Big)\Big]\Big\}^2\]
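As a small numerical illustration of this quantity, the sketch below evaluates $E(W)$ for a toy batch of two training vectors and two output nodes; the target and output values are invented for the example.

```python
import numpy as np

# Toy targets t_j^k and network outputs y_j^k: rows index the training vector k,
# columns index the output node j. The numbers are illustrative only.
t = np.array([[1.0, 0.0],
              [0.0, 1.0]])
y = np.array([[0.8, 0.3],
              [0.2, 0.6]])

E = 0.5 * np.sum((t - y) ** 2)   # E(W) = 1/2 * sum over k and j of (t_j^k - y_j^k)^2
print(E)                         # 0.165
```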

To minimize this error function, the weights are found by gradient descent: the weights of the output layer are corrected first, and then the weights of the preceding layers are corrected in turn, which is where the idea of propagating the error backwards comes from.

According to the gradient descent method, the adjustment of the connection weights from the hidden layer to the output layer is:

\[\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}} = \eta \sum\limits_k (t_j^k - y_j^k)\, f'(s_j^k)\, y_h^k = \eta \sum\limits_k \delta_j^k\, y_h^k\]

where $\delta_j^k$ is the error signal of output node $j$:

\[\delta_j^k = f'(s_j^k)\,(t_j^k - y_j^k) = f'(s_j^k)\,\Delta_j^k \quad\quad (1)\]

where

\[\Delta_j^k = t_j^k - y_j^k\]
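As a quick numerical illustration with arbitrary values, assuming a sigmoid activation (for which $f'(s_j^k) = y_j^k(1 - y_j^k)$): an output node with actual output $y_j^k = 0.8$ and target $t_j^k = 1$ has $\Delta_j^k = 0.2$ and error signal $\delta_j^k = 0.8 \times 0.2 \times 0.2 = 0.032$.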

For the correction $\Delta w_{ih}$ of the weights connecting the input layer to the hidden layer, $E(W)$ must be differentiated with respect to $w_{ih}$, so the chain rule is used:

\[\begin{array}{l}\Delta w_{ih} = -\eta \dfrac{\partial E}{\partial w_{ih}} = -\eta \sum\limits_k \dfrac{\partial E}{\partial y_h^k}\cdot\dfrac{\partial y_h^k}{\partial w_{ih}} = \eta \sum\limits_{k,j} (t_j^k - y_j^k)\, f'(s_j^k)\, w_{hj}\, f'(s_h^k)\, x_i^k \\ \quad\quad = \eta \sum\limits_{k,j} \delta_j^k\, w_{hj}\, f'(s_h^k)\, x_i^k = \eta \sum\limits_k \delta_h^k\, x_i^k \end{array}\]

where

\[\delta_h^k = f'(s_h^k) \sum\limits_j w_{hj}\, \delta_j^k = f'(s_h^k)\,\Delta_h^k \quad\quad (2)\]

where

\[\Delta_h^k = \sum\limits_j w_{hj}\, \delta_j^k\]
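Formulas (1) and (2) translate into a few lines of code. The sketch below again assumes a sigmoid activation, so that $f'(s) = y(1 - y)$ can be evaluated from the node outputs alone; all numeric values are invented for the illustration.

```python
import numpy as np

# Illustrative per-pattern quantities for a network with 3 hidden and 2 output nodes.
y_h = np.array([0.6, 0.3, 0.9])            # hidden-layer outputs y_h^k
y_j = np.array([0.8, 0.2])                 # output-layer outputs y_j^k
t   = np.array([1.0, 0.0])                 # targets t_j^k
w_hj = np.array([[ 0.4, -0.2],
                 [ 0.1,  0.5],
                 [-0.3,  0.2]])            # hidden -> output weights w_{hj}

delta_j = y_j * (1 - y_j) * (t - y_j)      # formula (1): f'(s_j) * Delta_j
Delta_h = w_hj @ delta_j                   # Delta_h^k = sum_j w_{hj} delta_j^k
delta_h = y_h * (1 - y_h) * Delta_h        # formula (2): f'(s_h) * Delta_h
```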

It can be seen that formulas (1) and (2) have the same form; the only difference is how the error value is defined. The weight correction of the BP algorithm can therefore be written in a general form that applies to any layer:

\[\Delta w_{pq} = \eta \sum\limits_{k} \delta_o\, y_{\mathrm{in}}\]

where the sum runs over the training vectors $k$. If the weights are instead adjusted once after every training vector, this can be written as:

\[\Delta w_{pq} = \eta\, \delta_o\, y_{\mathrm{in}}\]

where the subscripts $o$ and $\mathrm{in}$ refer to the output end and the input end of the connection concerned: $y_{\mathrm{in}}$ is the actual input at the input end and $\delta_o$ is the error at the output end. Their concrete meaning depends on the layer: for the output layer it is given by formula (1), and for a hidden layer by formula (2).
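In the per-pattern form the same rule covers every layer: $y_{\mathrm{in}}$ is whatever feeds the connection and $\delta_o$ is the error at the node the connection ends on. A minimal sketch, with invented values for the learning rate and the error signals:

```python
import numpy as np

eta = 0.5                                  # assumed training-rate coefficient

x       = np.array([1.0, 0.5, -0.3])       # y_in for the input -> hidden connections
y_h     = np.array([0.6, 0.3, 0.9])        # y_in for the hidden -> output connections
delta_h = np.array([0.02, -0.01, 0.03])    # delta_o at the hidden nodes, from formula (2)
delta_j = np.array([0.032, -0.032])        # delta_o at the output nodes, from formula (1)

# Delta w_pq = eta * delta_o * y_in, written as outer products over each layer.
dw_ih = eta * np.outer(x,   delta_h)       # corrections for the input -> hidden weights
dw_hj = eta * np.outer(y_h, delta_j)       # corrections for the hidden -> output weights
```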

For the output layer, $\Delta_j^k = t_j^k - y_j^k$ can be computed directly, so the error value $\delta_j^k$ is easily obtained. A hidden layer, however, has no target value given, so $\Delta_h^k$ cannot be computed directly; it must be computed from the $\delta_j^k$ of the output layer:

\[\Delta_h^k = \sum\limits_j w_{hj}\, \delta_j^k\]

Therefore, once the $\delta_j^k$ have been calculated, the $\delta_h^k$ can be calculated as well.

If there is another hidden layer in front of this one, the $\delta_h^k$ are used in the same way to compute the $\Delta$ and $\delta$ values of that layer, and so on, so that the output error $\delta$ is passed back layer by layer until the first hidden layer is reached. Once the $\delta$ of every layer has been obtained, the weight adjustment of every layer follows from the formula above. Because the errors $\delta_j^k$ are propagated in reverse, from the output back towards the input, this training algorithm is called the error back-propagation algorithm (BP algorithm).

The implementation steps of the BP training algorithm

Preparation: a set of training data. The network has $m$ layers; $y_j^m$ denotes the output of node $j$ in layer $m$, and $y_j^0$ (the output of layer 0) equals $x_j$, the $j$-th input. $w_{ij}^m$ denotes the connection weight from $y_i^{m-1}$ to $y_j^m$. Here the superscript $m$ is the layer index, not the index of the training vector.

1. Initialize each weight to a small random number. Uniformly distributed random numbers can be used; keeping the initial weights small ensures the network does not start out saturated by large weighted sums.

2. Select a data pair $(x^k, t^k)$ from the training set and apply the input vector to the input layer ($m=0$), setting $y_i^0 = x_i^k$ for all nodes $i$, where $k$ is the index of the training vector.

3. Propagate the signal forward through the network, using the relation

\[y_j^m = f(s_j^m) = f\Big(\sum\limits_i w_{ij}^m\, y_i^{m-1}\Big)\]

to compute the output $y_j^m$ of every node $j$ in every layer, starting from the first layer, until the outputs of all output-layer nodes have been computed.

4. Calculate the error value of each output-layer node using formula (1):

\[\delta_j^m = f'(s_j^m)\,(t_j^k - y_j^m) = y_j^m (1 - y_j^m)\,(t_j^k - y_j^m) \quad (\text{for the sigmoid } f)\]

It is obtained from the difference between the actual output and the desired target value.

5. Calculate the error values of the nodes in the preceding layer using formula (2):

\[\delta_j^{m-1} = f'(s_j^{m-1}) \sum\limits_i w_{ji}^m\, \delta_i^m\]

Propagate the error backwards layer by layer in this way until the error value of every node in every layer has been computed.

6. Using the weight-correction formula

\[
\Delta w_{ij}^m = \eta\, \delta_j^m\, y_i^{m-1}
\]

and the update relation

\[
w_{ij}^{\mathrm{new}} = w_{ij}^{\mathrm{old}} + \Delta w_{ij}^m
\]

adjust all connection weights. Typically $\eta$ lies between 0.01 and 1; it is called the training-rate coefficient.

7. Return to step 2 and repeat the above steps for the next input vector, until the network converges.
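Putting steps 1 through 7 together, the following is a minimal end-to-end sketch of the training loop for a network with one hidden layer, assuming a sigmoid activation and using the XOR problem as a stand-in training set; the layer sizes, learning rate, epoch limit and stopping threshold are all illustrative choices, not prescribed by the algorithm.

```python
import numpy as np

def f(s):
    """Sigmoid activation; its derivative is f(s) * (1 - f(s))."""
    return 1.0 / (1.0 + np.exp(-s))

# Stand-in training set (XOR): the first input component is the bias node fixed at 1.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
eta = 0.5                                        # training-rate coefficient
w_ih = rng.uniform(-0.5, 0.5, size=(3, 4))       # step 1: small random input -> hidden weights
w_hj = rng.uniform(-0.5, 0.5, size=(5, 1))       # step 1: hidden (+ bias node) -> output weights

for epoch in range(10000):                       # step 7: repeat until convergence
    E = 0.0
    for x, t in zip(X, T):                       # step 2: pick a training pair (x^k, t^k)
        y_h = np.append(f(x @ w_ih), 1.0)        # step 3: hidden outputs, plus bias node fixed at 1
        y_j = f(y_h @ w_hj)                      # step 3: output-layer outputs
        delta_j = y_j * (1 - y_j) * (t - y_j)    # step 4: output-layer errors, formula (1)
        delta_h = (w_hj[:-1] @ delta_j) * y_h[:-1] * (1 - y_h[:-1])  # step 5: hidden errors, formula (2)
        w_hj += eta * np.outer(y_h, delta_j)     # step 6: weight corrections Delta w = eta * delta * y_in
        w_ih += eta * np.outer(x, delta_h)
        E += 0.5 * np.sum((t - y_j) ** 2)        # accumulate the error function for monitoring
    if E < 1e-3:                                 # illustrative stopping criterion
        break
```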

