Preparation: the training data set. The network has $m$ layers; $y_j^m$ denotes the output of the $j$-th node in layer $m$, and $y_j^0$ (the layer-0 output) equals $x_j$, the $j$-th input. $w_{ij}^m$ denotes the connection weight from $y_i^{m-1}$ to $y_j^m$. Here $m$ is the layer index, not the index of the training vector.
1. Initialize each weight to a small random number. Uniformly distributed random values can be used, which keeps the network from being saturated by large weight values.
2. Select a data pair $(x^k, t^k)$ from the training set and apply the input vector to the input layer $(m=0)$, setting $y_i^0 = x_i^k$ for all nodes $i$; here $k$ is the index of the training vector.
3. Propagate the signal forward through the network using the relation
\[ y_j^m = f(s_j^m) = f\Big(\sum_i w_{ij}^m\, y_i^{m-1}\Big) \]
Compute the output $y_j^m$ of every node $j$ in each layer, starting at the first layer, until the outputs of all output-layer nodes have been obtained.
4. Compute the error of each output-layer node (using formula (1)):
\[ \delta_j^m = f'(s_j^m)\,(t_j^k - y_j^m) = y_j^m (1 - y_j^m)(t_j^k - y_j^m) \quad (\text{for the sigmoid activation } f) \]
That is, the error is obtained from the difference between the actual output and the desired target value.
5. Compute the error of each node in the preceding layer (using formula (2)):
\[ \delta_j^{m-1} = f'(s_j^{m-1}) \sum_i w_{ji}^m\, \delta_i^m \]
Back-propagate the errors layer by layer in this way until the error of every node in every layer has been computed.
6. Adjust all connection weights using the weight-correction formula
\[
\Delta w_{ij}^m = \eta\, \delta_j^m\, y_i^{m-1}
\]
together with the update relation
\[
w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + \Delta w_{ij}^m
\]
Typically $\eta$ lies between 0.01 and 1; it is called the training-rate coefficient.
7. Return to step 2 and repeat the above steps for the next input vector until the network converges (a code sketch of the full procedure is given after this list).
The implementation steps of the BP training algorithm.
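The following is a minimal NumPy sketch of steps 1 through 7, given only as an illustration: the function name `train_bp`, the layer sizes, the weight-initialization range, and the convergence test are assumptions and are not prescribed by the procedure above. It uses the sigmoid activation, so $f'(s) = y(1-y)$ as in formula (1).

```python
import numpy as np

def sigmoid(s):
    # Sigmoid activation f(s); note f'(s) = y * (1 - y) with y = f(s).
    return 1.0 / (1.0 + np.exp(-s))

def train_bp(X, T, layer_sizes, eta=0.1, epochs=1000, tol=1e-3, seed=0):
    """Train a fully connected feed-forward network with the BP steps above.

    X: (K, n_in) array of inputs x^k; T: (K, n_out) array of targets t^k;
    layer_sizes: e.g. [n_in, n_hidden, n_out]; eta: training-rate coefficient.
    """
    rng = np.random.default_rng(seed)
    # Step 1: initialize all weights to small, uniformly distributed random numbers.
    W = [rng.uniform(-0.1, 0.1, size=(layer_sizes[m - 1], layer_sizes[m]))
         for m in range(1, len(layer_sizes))]

    for _ in range(epochs):
        sq_err = 0.0
        for x, t in zip(X, T):          # Step 2: pick a training pair (x^k, t^k).
            # Step 3: forward pass, y_j^m = f(sum_i w_ij^m * y_i^(m-1)).
            ys = [x]
            for Wm in W:
                ys.append(sigmoid(ys[-1] @ Wm))

            # Step 4: output-layer error, delta_j = y_j (1 - y_j)(t_j - y_j).
            y_out = ys[-1]
            delta = y_out * (1.0 - y_out) * (t - y_out)
            sq_err += float(np.sum((t - y_out) ** 2))

            # Steps 5 and 6: back-propagate the error layer by layer and
            # correct each layer's weights once its delta is known.
            for m in range(len(W) - 1, -1, -1):
                # Step 6: Delta w_ij = eta * delta_j * y_i^(m-1); w_new = w_old + Delta w.
                dW = eta * np.outer(ys[m], delta)
                if m > 0:
                    # Step 5: delta_j^(m-1) = f'(s_j^(m-1)) * sum_i w_ji^m delta_i^m,
                    # computed with the weights before they are corrected.
                    delta = ys[m] * (1.0 - ys[m]) * (W[m] @ delta)
                W[m] += dW
        # Step 7: repeat over the training set until the network converges
        # (here, until the mean squared error falls below a fixed tolerance).
        if sq_err / len(X) < tol:
            break
    return W
```

The weights are corrected after every pattern, i.e. the per-sample update implied by steps 2 through 7. Bias terms are omitted because the description above does not include them, although practical implementations usually add a bias input to each layer.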