As mentioned in the previous post, the basic BP algorithm favors the samples presented later: the most recently seen samples have a disproportionately large influence on the network.
This post records how to eliminate that effect.
The idea is to accumulate the total effect of all samples (x1,y1), (x2,y2), ..., (xs,ys) on the weight matrices W^(1), W^(2), ..., W^(L) before applying it:

ΔW^(k)ij = Σp ΔpW^(k)ij

This accumulated update simply replaces the per-sample weight-modification step of the original algorithm.
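In code, this accumulation is just a running sum of the per-sample updates, applied once per pass instead of after every sample. A minimal sketch, where delta_for_sample is a hypothetical placeholder for whatever per-sample update the BP formulas produce:

```python
import numpy as np

def batch_update(W, samples, delta_for_sample, lr=1.0):
    """Accumulate the per-sample updates over all samples, then apply once.

    W: list of weight matrices W^(1)..W^(L)
    delta_for_sample: hypothetical helper returning [ΔpW^(1), ..., ΔpW^(L)]
    """
    # ΔW^(k) = 0 for every layer k
    dW = [np.zeros_like(Wk) for Wk in W]
    for x, y in samples:
        # ΔpW^(k): this sample's contribution
        dWp = delta_for_sample(W, x, y)
        for k in range(len(W)):
            dW[k] += dWp[k]       # ΔW^(k) = ΔW^(k) + ΔpW^(k)
    for k in range(len(W)):
        W[k] += lr * dW[k]        # applied once, so sample order no longer matters
    return W
```

Because the summed update is the same regardless of the order in which the samples are visited, the bias toward later samples disappears.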
The algorithm flow is as follows:
1 for k = 1 to L do
1.1 initialize W^(k)
2 initialize the precision-control parameter ε
3 E = ε + 1
4 while E > ε do
4.1 E = 0
4.2 for all i, j, k: ΔW^(k)ij = 0
4.3 for each sample (Xp, Yp) in S:
4.3.1 compute the actual output Op corresponding to Xp
4.3.2 compute Ep
4.3.3 E = E + Ep
4.3.4 for all i, j, compute ΔpW^(L)ij from the corresponding formula
4.3.5 for all i, j: ΔW^(L)ij = ΔW^(L)ij + ΔpW^(L)ij
4.3.6 k = L - 1
4.3.7 while k != 0 do
4.3.7.1 for all i, j, compute ΔpW^(k)ij from the corresponding formula
4.3.7.2 for all i, j: ΔW^(k)ij = ΔW^(k)ij + ΔpW^(k)ij
4.3.7.3 k = k - 1
4.4 for all i, j, k: W^(k)ij = W^(k)ij + ΔW^(k)ij
4.5 E = E / 2
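The loop above can be sketched as a runnable program. Below is a minimal batch-BP implementation for a one-hidden-layer network with sigmoid activations; the layer sizes, learning rate, ε, the epoch cap, and the bias inputs are illustrative assumptions, not values from the outline above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_batch_bp(X, Y, hidden=4, lr=0.5, eps=1e-3, max_epochs=10000, seed=0):
    """Batch BP for a one-hidden-layer network (illustrative sizes and rates)."""
    rng = np.random.default_rng(seed)
    # step 1: initialize W^(1), W^(2); +1 row adds a constant bias input (convenience)
    W1 = rng.uniform(-1, 1, (X.shape[1] + 1, hidden))
    W2 = rng.uniform(-1, 1, (hidden + 1, Y.shape[1]))
    E = eps + 1.0                      # step 3
    epoch = 0
    while E > eps and epoch < max_epochs:   # step 4 (epoch cap is a safety addition)
        E = 0.0
        dW1 = np.zeros_like(W1)        # 4.2: ΔW^(k) = 0
        dW2 = np.zeros_like(W2)
        for x, y in zip(X, Y):         # 4.3: each sample (Xp, Yp)
            xb = np.append(x, 1.0)     # append constant-1 bias input
            h = sigmoid(xb @ W1)       # 4.3.1: forward pass
            hb = np.append(h, 1.0)
            o = sigmoid(hb @ W2)
            E += np.sum((y - o) ** 2)  # 4.3.2-4.3.3: accumulate Ep
            # 4.3.4-4.3.5: output-layer per-sample update, accumulated
            delta_o = (y - o) * o * (1 - o)
            dW2 += np.outer(hb, delta_o)
            # 4.3.7: back-propagate to the hidden layer (bias row excluded)
            delta_h = (W2[:-1] @ delta_o) * h * (1 - h)
            dW1 += np.outer(xb, delta_h)
        W1 += lr * dW1                 # 4.4: apply the summed updates once
        W2 += lr * dW2
        E /= 2.0                       # 4.5
        epoch += 1
    return W1, W2, E

# example: learn logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [0], [0], [1]], dtype=float)
W1, W2, E = train_batch_bp(X, Y)
```

Note that the weights change only at step 4.4, once per pass over S, which is exactly what makes the result independent of the sample order.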
This is a form of steepest descent. It eliminates the accuracy problems caused by the order of the samples and the jitter during training, but its convergence is comparatively slow.
Here are a few questions worth discussing.
Convergence speed: how can convergence be accelerated?
Local minima: how can the network escape, or avoid, local minimum points?
Network paralysis: occasionally the derivative of the activation function becomes very small, so the update step becomes very small, training slows down, and eventually the network stops converging. What can be done about this?
Stability: if the network operates in a continuously changing environment, it can become ineffective. How is this solved?
Step size: if the step is too small, convergence is slow; if it is too large, it may cause network paralysis or instability. How should it be chosen?
The next chapter continues ^_^