A Well-Explained BP Neural Network


Learning is one of the most important and compelling features of neural networks. In the development of neural networks, the study of learning algorithms has always occupied an important position. At present, every neural network model that has been proposed is associated with a learning algorithm, so people do not always insist on a strict definition of, or distinction between, models and algorithms: some models admit several algorithms, some algorithms serve several models, and an algorithm is sometimes itself called a model.

Since Hebb proposed his learning rule in the 1940s, researchers have put forward a variety of learning algorithms. Among them, the error backpropagation method, i.e. the BP (error backpropagation) algorithm proposed by Rumelhart in 1986, has been the most influential. To this day, the BP algorithm remains the most important and most effective algorithm in automatic control.

1.2.1 Learning mechanisms and structures of neural networks

If a neural network can be trained on pattern samples supplied by the external environment and can store those patterns, it is called a perceptron; if it can adapt to the external environment and automatically extract the features of environmental changes, it is called a cognitron.

In learning, neural networks are generally divided into two kinds: learning with a teacher (supervised) and learning without a teacher (unsupervised). The perceptron learns with a teacher signal, while the cognitron learns without one. Among the main neural networks, such as BP networks, Hopfield networks, ART networks and Kohonen networks, BP networks and Hopfield networks require a teacher signal in order to learn, while ART networks and Kohonen networks can learn without one. The so-called teacher signal is a pattern sample signal supplied from outside during neural network learning.

First, the learning structure of the perceptron

The learning of the perceptron is the most typical form of neural network learning.

At present, the network most widely applied in control is the multilayer feedforward network, which is a perceptron-type model; its learning algorithm is the BP method, that is, a learning algorithm with a teacher.

A learning system with a teacher can be represented as in Figure 1-7. The learning system is divided into three parts: input, training, and output.

Fig. 1-7 Block diagram of the neural network learning system

The input part receives an external input sample x, the training part adjusts the network weight coefficients w, and the output part then produces the result. In this process, the desired output signal is supplied as a teacher signal; the teacher signal is compared with the actual output, and the resulting error is used to control the modification of the weight coefficients w.

The learning mechanism can be represented by the structure shown in Figure 1-8.

In the figure, x1, x2, ..., xn are the input sample signals and w1, w2, ..., wn are the weight coefficients. Each input signal xi takes the discrete value 0 or 1. The input sample signal is combined with the weight coefficients to produce at u the output ∑wi xi, that is:

u = ∑wi xi = w1 x1 + w2 x2 + ... + wn xn

The desired output signal y(t) is then compared with u, producing an error signal e. The weight adjustment mechanism modifies the weight coefficients of the learning system according to the error e, and the direction of change should make the error e smaller. This continues until the error e reaches zero, at which point the actual output u exactly equals the desired output y(t) and the learning process ends.

Neural network learning generally requires repeated training, so that the error gradually approaches zero and finally reaches zero, making the output agree with the expectation. Learning therefore takes a certain amount of time, and some learning processes must be repeated many times, even hundreds of thousands of times. The reason is that the weight coefficient w of the neural network has many components w1, w2, ..., wn; it is a multi-parameter modification system, and adjusting so many parameters inevitably consumes time. At present, raising the learning speed of neural networks and reducing the number of learning repetitions is an important research topic, and it is also a key problem for real-time control.
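As a minimal sketch of this error-driven weight adjustment for a single linear unit (the training pairs, learning rate, and iteration count below are illustrative assumptions, not values from the text):

import random

# One linear unit: u = sum(wi * xi); weights adjusted by the error e = y(t) - u.
samples = [([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([0.0, 0.0], 0.0)]
w = [0.0, 0.0]          # weight coefficients w1, w2
eta = 0.1               # weight change rate

for epoch in range(1000):                             # repeated training
    for x, y_desired in samples:
        u = sum(wi * xi for wi, xi in zip(w, x))      # u = sum(wi * xi)
        e = y_desired - u                             # error signal
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]   # change w to shrink e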

Second, the learning algorithm of the perceptron

The perceptron is a neural network with a single layer of computing units, composed of linear elements and threshold components, as shown in Figure 1-9.

Figure 1-9 Perceptron Architecture

The mathematical model of the perceptron is:

y = f[ ∑wi xi − θ ]    (1-12)

where f[·] is a step function:

f(x) = +1, x ≥ 0
f(x) = −1, x < 0    (1-13)

θ is the threshold value.

The most important capability of the perceptron is to classify input samples, so it can be used as a classifier. The perceptron classifies input signals as follows:

∑wi xi − θ > 0, y = +1: the input belongs to class A
∑wi xi − θ < 0, y = −1: the input belongs to class B    (1-14)

That is, when the output of the perceptron is +1, the input sample is said to belong to class A; when the output is −1, the input sample is said to belong to class B. From the above, the classification boundary of the perceptron is:

∑wi xi − θ = 0    (1-15)

When the input sample has only two components x1 and x2, the classification boundary condition is:

∑(i=1 to 2) wi xi − θ = 0    (1-16)

That is:

w1 x1 + w2 x2 − θ = 0    (1-17)

which can also be written as:

x2 = −(w1/w2) x1 + θ/w2    (1-18)

The classification at this time is as shown in Figure 1-10.
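To make the boundary concrete, here is a small sketch that classifies two-component samples by which side of the line w1 x1 + w2 x2 − θ = 0 they fall on (the weight and threshold values are illustrative assumptions):

# Classify points against the boundary w1*x1 + w2*x2 - theta = 0.
w1, w2, theta = 1.0, 1.0, 0.5    # assumed values for illustration

def classify(x1, x2):
    u = w1 * x1 + w2 * x2 - theta
    return "class A" if u > 0 else "class B"

print(classify(1, 1))   # above the line -> class A
print(classify(0, 0))   # below the line -> class B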

The aim of the perceptron learning algorithm is to find proper weight coefficients w = (w1, w2, ..., wn) such that the system produces the expected output d for a given sample x = (x1, x2, ..., xn). When x is classified into class A, the expected value is d = 1; when x belongs to class B, d = −1. To simplify the description of the perceptron learning algorithm, the threshold θ is merged into the weight coefficients w, and the sample x gains an additional component xn+1. Thus:

wn+1 = −θ, xn+1 = 1    (1-19)

The output of the perceptron can then be expressed as:

y = f[ ∑(i=1 to n+1) wi xi ]    (1-20)
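The effect of formula (1-19) is simply to fold the threshold into the weight vector; a minimal sketch of that bookkeeping (the numeric values are illustrative):

# Fold the threshold theta into the weights: append w_{n+1} = -theta to w
# and extend each sample with x_{n+1} = 1.
theta = 0.5
w = [1.0, 1.0]           # w1, w2
x = [0.0, 1.0]           # x1, x2
w_aug = w + [-theta]     # (w1, w2, -theta)
x_aug = x + [1.0]        # (x1, x2, 1)
u = sum(wi * xi for wi, xi in zip(w_aug, x_aug))   # equals w.x - theta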

The steps of the perceptron learning algorithm are as follows:
1. Set the initial values of the weight coefficients w
Give each component of the weight coefficients w = (w1, w2, ..., wn, wn+1) a small non-zero random value, but set wn+1 = −θ. Record them as w1(0), w2(0), ..., wn(0), with wn+1(0) = −θ. Here wi(t) is the weight coefficient on the i-th input at time t, i = 1, 2, ..., n, and wn+1(t) is the threshold term at time t.

Figure 1-10 Classification example of the perceptron

2. Input a sample x = (x1, x2, ..., xn+1) and its expected output d.

The expected output d differs according to the class of the sample: if x belongs to class A, take d = 1; if x belongs to class B, take d = −1. The expected output d is the teacher signal.

3. Calculate the actual output value:

y(t) = f[ ∑(i=1 to n+1) wi(t) xi ]

4. Obtain the error e from the actual output:

e = d − y(t)    (1-21)

5. Use the error e to modify the weight coefficients:

wi(t+1) = wi(t) + η[d − y(t)] xi,  i = 1, 2, ..., n, n+1    (1-22)

where η is called the weight change rate, 0 < η ≤ 1.

In formula (1-22), the value of η cannot be too large: if it is too large, the stability of wi(t) suffers. Nor can it be too small: if it is too small, the convergence of wi(t) is too slow.

When the actual output y and the expected output d are the same, we have:

wi(t+1) = wi(t)

6. Return to step 2 and repeat until the weight coefficients are stable for all samples.
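A compact sketch of steps 1 to 6 in Python (the random initialization range, learning rate, and stopping test are illustrative assumptions):

import random

def sign(u):                        # step function of formula (1-13)
    return 1 if u >= 0 else -1

def train_perceptron(samples, eta=0.1, max_epochs=100):
    # samples: list of (x, d) with x already extended by x_{n+1} = 1 (formula 1-19)
    n = len(samples[0][0])
    w = [random.uniform(-0.1, 0.1) for _ in range(n)]       # step 1: small random weights
    for _ in range(max_epochs):
        stable = True
        for x, d in samples:                                # step 2: sample and teacher d
            y = sign(sum(wi * xi for wi, xi in zip(w, x)))  # step 3: actual output
            e = d - y                                       # step 4: error (1-21)
            if e != 0:
                w = [wi + eta * e * xi for wi, xi in zip(w, x)]   # step 5: update (1-22)
                stable = False
        if stable:                                          # step 6: all samples stable
            break
    return w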

From formula (1-14) above, the perceptron is essentially a classifier whose classification corresponds to two-valued logic. Therefore, the perceptron can be used to implement logic functions. The implementation of a logic function by a perceptron is illustrated below.

Example: use a perceptron to implement the logic function x1 ∨ x2. Its truth table is:

x1:      0  0  1  1
x2:      0  1  0  1
x1 ∨ x2: 0  1  1  1

Taking x1 ∨ x2 = 1 as class A and x1 ∨ x2 = 0 as class B, we obtain the system of inequalities:

−θ < 0
w2 − θ ≥ 0
w1 − θ ≥ 0
w1 + w2 − θ ≥ 0    (1-23)

That is:

θ > 0, w1 ≥ θ, w2 ≥ θ    (1-24)

From formula (1-24) there is:

w1 ≥ θ, w2 ≥ θ

Let w1 = 1, w2 = 1; then θ ≤ 1. Take θ = 0.5. The classification boundary is then:

x1 + x2 − 0.5 = 0

and the classification is as shown in Figure 1-11.
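With the values w1 = w2 = 1 and θ = 0.5 derived above, the OR truth table can be checked directly; this is only an illustrative verification:

# Verify x1 OR x2 with w1 = w2 = 1, theta = 0.5 (class A -> +1, class B -> -1).
w1, w2, theta = 1.0, 1.0, 0.5
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = 1 if w1 * x1 + w2 * x2 - theta >= 0 else -1
    print(x1, x2, "-> class A" if y == 1 else "-> class B")
# prints class B for (0, 0) and class A for the other three rows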

1.2.2 The gradient algorithm for neural network learning

From the standpoint of the learning algorithm, the purpose of learning is to modify the network's weight coefficients so that the network can correctly classify the input pattern samples. When learning ends, that is, when the neural network can classify correctly, the weight coefficients clearly reflect the common features of pattern samples of the same class. In other words, the weight coefficients are where the input patterns are stored. Because the weight coefficients are dispersed across the network, the neural network naturally has the property of distributed storage.

The transfer function of the perceptron described earlier is a step function, which is why it can serve as a classifier. The perceptron learning algorithm of the previous section has limitations arising from the simplicity of this transfer function.

The perceptron learning algorithm is fairly simple and is guaranteed to converge when the function is linearly separable. But it also has problems: if the function is not linearly separable, no result can be obtained, and the algorithm cannot be generalized to general feedforward networks.

To overcome these problems, another algorithm, the gradient algorithm (also known as LMS), was proposed.

To implement the gradient algorithm, the excitation function of the neuron is changed to a differentiable function, such as the sigmoid function. The asymmetric sigmoid function is f(x) = 1/(1 + e^(−x)), and the symmetric sigmoid function is f(x) = (1 − e^(−x))/(1 + e^(−x)); these replace the step function of formula (1-13).
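Both sigmoid variants and the derivative used later are easy to state in code; a minimal sketch:

import math

def sigmoid(x):                 # asymmetric sigmoid: f(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_sym(x):             # symmetric sigmoid: f(x) = (1 - e^-x) / (1 + e^-x)
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def sigmoid_prime(x):           # derivative of the asymmetric sigmoid
    fx = sigmoid(x)
    return fx * (1.0 - fx)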

For a given sample set xi (i = 1, 2, ..., n), the goal of the gradient method is to find the weight coefficient vector w* that makes the actual output ȳi = f[w*·xi] as close as possible to the desired output yi.

Define the error e by the following expression:

e = (1/2) ∑(i=1 to n) (yi − ȳi)²    (1-25)

where ȳi = f[w*·xi] is the real-time (actual) output corresponding to the i-th sample xi, and yi is the desired output corresponding to the i-th sample xi.

To minimize the error e, first find the gradient of e:

∇e = ∂e/∂w = ∑(k=1 to n) (ȳk − yk) ∇ȳk    (1-26)

where:

∇ȳk = ∂ȳk/∂w    (1-27)

For uk = w·xk, we have:

∇ȳk = (∂ȳk/∂uk)(∂uk/∂w) = f′(uk)·xk    (1-28)

That is:

∇e = ∑(k=1 to n) (ȳk − yk) f′(uk) xk    (1-29)

Finally, the weight coefficients w are modified along the negative gradient direction:

w(t+1) = w(t) − μ∇e    (1-30)

which can also be written as:

w(t+1) = w(t) − μ ∑(k=1 to n) (ȳk − yk) f′(uk) xk    (1-31)

In formulas (1-30) and (1-31), μ is the weight change rate; its value is chosen according to the circumstances, generally a decimal between 0 and 1.

Clearly, the gradient method is a big step forward from the original perceptron learning algorithm. The key lies in two points:

1. The transfer function of the neuron is a continuous S-type (sigmoid) function instead of a step function;

2. The modification of the weight coefficients is controlled by the gradient of the error, not by the error itself. The method therefore has better dynamic performance, that is, convergence is strengthened.

But for practical learning, the gradient method is still too slow, so this algorithm is still not ideal.
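A minimal sketch of update rule (1-31) for a single sigmoid neuron follows; the OR-style data set, rate μ, and epoch count are illustrative assumptions, and the trailing 1 in each sample folds in the threshold as in formula (1-19):

import math

def f(u):                        # asymmetric sigmoid
    return 1.0 / (1.0 + math.exp(-u))

# Illustrative samples: (augmented input x, desired output y).
samples = [([0, 0, 1], 0.0), ([0, 1, 1], 1.0), ([1, 0, 1], 1.0), ([1, 1, 1], 1.0)]
w = [0.0, 0.0, 0.0]
mu = 0.5                         # weight change rate, a decimal between 0 and 1

for _ in range(5000):
    for x, y in samples:
        u = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = f(u)                                      # actual output
        scale = (y_hat - y) * y_hat * (1 - y_hat)         # (y_hat - y) * f'(u)
        w = [wi - mu * scale * xi for wi, xi in zip(w, x)]    # formula (1-31)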

1.2.3 The BP algorithm for backpropagation learning

The backpropagation algorithm is also called the BP algorithm. Since this algorithm is in essence a mathematical model of neural network learning, it is sometimes also called the BP model.

The BP algorithm was proposed to solve the optimization of the weight coefficients of multilayer feedforward neural networks; therefore, the BP algorithm usually implies that the topology of the neural network is a feedback-free multilayer feedforward network. Hence the feedback-free multilayer feedforward network is also sometimes called the BP model.

Here we need not argue too strictly about the similarities and differences between the algorithm and the model. The perceptron learning algorithm is a single-layer network learning algorithm; in a multilayer network it can modify only the final layer's weight coefficients, so it cannot be used for learning in multilayer neural networks. In 1986, Rumelhart proposed the backpropagation learning algorithm, i.e. the BP (backpropagation) algorithm. This algorithm can modify the weight coefficients of every layer in the network, so it is suitable for multilayer network learning. The BP algorithm is one of the most widely used neural network learning algorithms, and it is the most useful learning algorithm in automatic control.

First, the principle of the BP algorithm

The BP algorithm is a learning algorithm for feedforward multilayer networks. The structure of a feedforward multilayer network is generally as shown in Figure 1-12.

Figure 1-12 Network learning structure

It contains an input layer, an output layer, and intermediate layers between the input and output layers. The intermediate layers may be single or multiple; since they have no direct contact with the outside world, they are also called hidden layers, and the neurons in them are called hidden units. Although the hidden layers are not connected to the outside world, their states affect the relationship between input and output. That is to say, changing the weight coefficients of a hidden layer can change the performance of the whole multilayer neural network.

Suppose the neural network has m layers and a sample x is supplied to the input layer. Denote the weighted input sum of the i-th neuron in layer k by ui^k and its output by xi^k; denote the weight coefficient from the j-th neuron of layer k−1 to the i-th neuron of layer k by wij. With f as the excitation function of each neuron, the relationships among the variables can be expressed by the following formulas:

xi^k = f(ui^k)    (1-32)

ui^k = ∑j wij xj^(k−1)    (1-33)

1. Forward propagation

The input sample is processed from the input layer through the hidden units layer by layer, passing through all the hidden layers, and then reaches the output layer; during this layer-by-layer processing, the state of each layer of neurons affects only the state of the next layer of neurons. At the output layer the current output is compared with the desired output; if the current output does not equal the desired output, the process enters the backpropagation phase.

2. Backpropagation

During backpropagation, the error signal is transmitted back along the original forward propagation path, and the weight coefficients of the neurons in each hidden layer are modified so as to minimize the error signal.

Second, the mathematical expression of the BP algorithm

The BP algorithm essentially seeks the minimum of the error function. This algorithm uses the steepest descent method of nonlinear programming, modifying the weight coefficients along the negative gradient direction of the error function.

To explain the BP algorithm, first define the error function e. Take the squared sum of the differences between the desired outputs and the actual outputs as the error function:

e = (1/2) ∑i (xi^m − yi)²    (1-34)

where: yi is the desired value of output unit i, used here as the teacher signal;

xi^m is the actual output, layer m being the output layer.

Because the BP algorithm modifies the weight coefficients according to the negative gradient direction of the error function e, the modification Δwij of the weight coefficient wij is proportional to the negative gradient of e:

Δwij = −η ∂e/∂wij    (1-35)

which can also be written as

wij(t+1) = wij(t) − η ∂e/∂wij    (1-36)

where η is the learning rate, i.e., the step length.

Clearly, according to the principle of the BP algorithm, the key quantity is ∂e/∂wij. We now derive ∂e/∂wij:

∂e/∂wij = (∂e/∂ui^k)(∂ui^k/∂wij)    (1-37)

Since

ui^k = ∑j wij xj^(k−1)    (1-38)

we have

∂ui^k/∂wij = xj^(k−1)    (1-39)

and thus

∂e/∂wij = (∂e/∂ui^k)·xj^(k−1)    (1-40)

Define the learning error of each layer as

di^k = ∂e/∂ui^k    (1-41)

Then the learning formula is:

wij(t+1) = wij(t) − η di^k xj^(k−1)    (1-42)

where η is the learning rate, i.e., the step size, generally a number between 0 and 1.

From the above, di^k has not yet been given an explicit formula; the calculation formula for di^k is derived below.

di^k = ∂e/∂ui^k = (∂e/∂xi^k)(∂xi^k/∂ui^k)    (1-43)

From formula (1-32), the second factor in formula (1-43) is:

∂xi^k/∂ui^k = f′(ui^k)    (1-44)

To facilitate the derivation, take f as a continuous function. Generally one takes a nonlinear continuous function, such as the sigmoid function. When f is the asymmetric sigmoid function

f(u) = 1/(1 + e^(−u))

we then have:

f′(ui^k) = f(ui^k)(1 − f(ui^k)) = xi^k(1 − xi^k)    (1-45)
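The identity f′(u) = f(u)(1 − f(u)) can be checked numerically; a small sketch (the test point is arbitrary):

import math

def f(u):
    return 1.0 / (1.0 + math.exp(-u))

u = 0.3                                         # arbitrary test point
numeric = (f(u + 1e-6) - f(u - 1e-6)) / 2e-6    # central difference
analytic = f(u) * (1.0 - f(u))                  # formula (1-45)
print(abs(numeric - analytic) < 1e-9)           # True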

Next consider the partial derivative ∂e/∂xi^k in formula (1-43). There are two cases:

1. If k = m, this is the output layer; yi is the desired output, which is a constant. From formula (1-34):

∂e/∂xi^m = xi^m − yi    (1-46)

and thus:

di^m = xi^m(1 − xi^m)(xi^m − yi)    (1-47)
2. If k < m, the layer is a hidden layer. In this case the effect on the layer above must be considered, so:

∂e/∂xi^k = ∑l (∂e/∂ul^(k+1))(∂ul^(k+1)/∂xi^k)    (1-48)

From formula (1-41) it is known that:

∂e/∂ul^(k+1) = dl^(k+1)    (1-49)

From formula (1-33) it is known that:

∂ul^(k+1)/∂xi^k = wli    (1-50)

Therefore:

∂e/∂xi^k = ∑l wli dl^(k+1)    (1-51)

Finally:

di^k = xi^k(1 − xi^k) ∑l wli dl^(k+1)    (1-52)

From the above process we know that the training method of a multilayer network is to apply a sample to the input layer and, following the forward propagation rule:

xi^k = f(ui^k)

pass it layer by layer toward the output layer, finally obtaining the output xi^m at the output layer.

Compare xi^m with the desired output yi. If the two are not equal, an error signal e is produced, and the weight coefficients are then modified by backpropagation according to the following formula:

wij(t+1) = wij(t) − η di^k xj^(k−1)    (1-53)

where, for the output layer:

di^m = xi^m(1 − xi^m)(xi^m − yi)

and, for the hidden layers:

di^k = xi^k(1 − xi^k) ∑l wli dl^(k+1)

In the above formulas, obtaining di^k for a given layer requires dl^(k+1) from the layer above it; the error function is therefore propagated back from the output layer toward the input layer, and the errors are obtained recursively in this process.
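A minimal sketch of one step of this recursion, i.e. formula (1-52); the list-of-lists weight layout w_next[l][i] is an illustrative assumption:

# Given the deltas d_next of layer k+1 and the weights w_next[l][i] from
# neuron i of layer k to neuron l of layer k+1, compute the deltas of layer k.
def hidden_deltas(x_k, d_next, w_next):
    # x_k: outputs of layer k; implements formula (1-52)
    return [xi * (1 - xi) * sum(w_next[l][i] * d_next[l] for l in range(len(d_next)))
            for i, xi in enumerate(x_k)]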

Through repeated training on many samples, the weight coefficients are corrected in the direction of decreasing error, so as to finally eliminate the error. From the above formulas it can also be seen that if the network has many layers, the amount of computation is considerable, and convergence is therefore not fast.

To speed up convergence, the previous weight change is generally taken into account as one of the bases for the current correction; hence the correction formula:

wij(t+1) = wij(t) − η di^k xj^(k−1) + α[wij(t) − wij(t−1)]    (1-54)

where: η is the learning rate, i.e., the step size, generally around 0.1 to 0.4;

α is the weight correction constant (the momentum term in formula (1-54)), generally around 0.7 to 0.9.
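Formula (1-54) in code amounts to keeping the previous weight change around; a minimal sketch for a single weight, with illustrative η, α, and gradient values:

eta, alpha = 0.2, 0.8           # step size ~0.1-0.4, momentum constant ~0.7-0.9
w, w_prev = 1.0, 1.0            # current and previous value of one weight w_ij
grad = 0.05                     # d_i^k * x_j^(k-1) for this weight (illustrative)

w_next = w - eta * grad + alpha * (w - w_prev)   # formula (1-54)
w_prev, w = w, w_next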

As stated above, formula (1-53) is also known as the generalized delta rule. For a neural network without hidden layers, one may take:

Δwij = η(yi − xi^m) xj    (1-55)

where: yi is the desired output;

xi^m is the actual output of the output layer;

xj is the input to the input layer.

This is clearly a very simple case; formula (1-55) is also called the simple delta rule.

In practical applications, only the generalized delta rule, formula (1-53) or formula (1-54), is meaningful. The simple delta rule, formula (1-55), is useful only in theoretical derivations.

Third, the execution steps of the BP algorithm

When the backpropagation algorithm is applied to a feedforward multilayer network with the sigmoid as the excitation function, the following steps can be used to adjust the network's weight coefficients wij recursively. Note that when each layer has n neurons, we have i = 1, 2, ..., n and j = 1, 2, ..., n. For the i-th neuron of layer k there are n weight coefficients wi1, wi2, ..., win, plus one more, wi,n+1, used to represent the threshold θi; and when a sample x is input, it is written as x = (x1, x2, ..., xn, 1).

The algorithm is executed as follows:

1. Set the initial values of the weight coefficients wij

Set each layer's weight coefficients wij to small non-zero random numbers, but with wi,n+1 = −θ.

2. Input a sample x = (x1, x2, ..., xn, 1) and the corresponding desired output y = (y1, y2, ..., yn).

3. Calculate the outputs of each layer

For the output xi^k of the i-th neuron of layer k, we have:

ui^k = ∑(j=1 to n+1) wij xj^(k−1)

xi^k = f(ui^k)

4. Calculate the learning error di^k of each layer

For the output layer, k = m, and:

di^m = xi^m(1 − xi^m)(xi^m − yi)

For the other layers:

di^k = xi^k(1 − xi^k) ∑l wli dl^(k+1)

5. Correct the weight coefficients wij and the threshold θ

By formula (1-53):

wij(t+1) = wij(t) − η di^k xj^(k−1)

or, by formula (1-54):

wij(t+1) = wij(t) − η di^k xj^(k−1) + α[wij(t) − wij(t−1)]

6. After the weight coefficients of each layer have been obtained, judge whether they satisfy the given quality criterion. If the requirement is met, the algorithm ends; if not, return to step (3).

This learning process is carried out for every given sample xp = (xp1, xp2, ..., xpn, 1) and desired output yp = (yp1, yp2, ..., ypn), until all the input-output requirements are satisfied.
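The six steps above fit in a short program. The sketch below trains a network with one hidden layer; the layer sizes, learning rate, epoch count, and the XOR sample set are illustrative assumptions, not from the text:

import math, random

def f(u):                                   # sigmoid excitation function
    return 1.0 / (1.0 + math.exp(-u))

def train_bp(samples, n_in, n_hidden, n_out, eta=0.5, epochs=10000):
    # Step 1: small non-zero random weights; the extra column holds -theta,
    # fed by a constant input of 1, so x = (x1, ..., xn, 1).
    w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in samples:                # step 2: sample and desired output
            x_aug = list(x) + [1.0]
            # Step 3: forward propagation, xi^k = f(ui^k)
            h = [f(sum(wij * xj for wij, xj in zip(row, x_aug))) for row in w1]
            h_aug = h + [1.0]
            o = [f(sum(wij * hj for wij, hj in zip(row, h_aug))) for row in w2]
            # Step 4: learning errors; output layer, formula (1-47)
            d_out = [oi * (1 - oi) * (oi - yi) for oi, yi in zip(o, y)]
            # hidden layer, formula (1-52)
            d_hid = [hi * (1 - hi) * sum(w2[l][i] * d_out[l] for l in range(n_out))
                     for i, hi in enumerate(h)]
            # Step 5: correct the weight coefficients, formula (1-53)
            for i in range(n_out):
                for j in range(n_hidden + 1):
                    w2[i][j] -= eta * d_out[i] * h_aug[j]
            for i in range(n_hidden):
                for j in range(n_in + 1):
                    w1[i][j] -= eta * d_hid[i] * x_aug[j]
    return w1, w2                           # step 6 (quality test) left to the caller

# Illustrative usage: learn XOR, a function the single perceptron cannot represent.
xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
w1, w2 = train_bp(xor, n_in=2, n_hidden=3, n_out=1)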

From: http://www.cnblogs.com/wengzilin/archive/2013/04/24/3041019.html
