The artificial neural network (ANN) has been a research hotspot in artificial intelligence since the 1980s, and it underlies the many neural network models in use today. This article focuses on the BPNN model.

What is a neural network?

A neural network is a computational model made up of a large number of interconnected nodes (neurons). Each node applies a particular output function, called the activation function. Each connection between two nodes carries a weight that is applied to the signal passing through it, and these weights act as the network's memory. The output of the network varies with its connection pattern, its weights, and its activation functions. The network itself usually approximates some algorithm or function found in nature, or it may express a logical strategy.

Dozens of neural network models exist today. They can be grouped into four types: feedforward, feedback, stochastic, and competitive.

Feedforward type

A feedforward neural network arranges its neurons in layers: an input layer, a hidden layer (there may be several hidden layers), and an output layer. The neurons in each layer receive input only from the neurons in the previous layer, and no signal is fed back from a later layer to an earlier one. Each layer transforms its input in some way and passes its output on as the input to the next layer, until the final output is produced.

The widely used BPNN (back propagation neural network) belongs to this type.

Feedback type

A feedback network, also called a recurrent network, uses the input signal to set the initial state of the feedback system. After a series of state transitions, the system gradually converges to an equilibrium state, so stability is one of the most important criteria for judging a feedback network. The Hopfield neural network is a typical example.

Building on the analysis of nonlinear dynamics, Hopfield networks have been applied successfully to associative memory and optimization problems.

Stochastic type

Stochastic methods such as simulated annealing (SA) address the problem of getting trapped in local minima during optimization, and they have been applied successfully to the learning and optimization of neural networks.

Competitive type

A self-organizing neural network learns without a teacher (unsupervised learning). It mimics the behavior of the human brain and adapts automatically to unpredictable changes in its environment based on past experience. Because there is no supervision, this kind of network usually learns by the competition principle and clusters data automatically. It is widely used in automatic control, fault diagnosis, and other kinds of pattern recognition.

The BPNN studied here is a feedforward neural network.

Neurons

To describe neural networks, we start with the simplest possible network, one consisting of a single "neuron". Here is a diagram of such a neuron:

This neuron takes x1, x2, x3 and a bias input of +1, with weights w1, w2, w3 on the three inputs (the weight on the +1 input acts as the bias b). The weighted sum passes through a function f to produce the output, $f(w_1x_1 + w_2x_2 + w_3x_3 + b)$. The function f is called the activation function.
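The single-neuron computation takes only a few lines of Python. This is a minimal sketch that assumes the sigmoid activation this article adopts for f. The names `sigmoid` and `neuron_output` and the sample weights are hypothetical and exist only for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid f(z) = 1 / (1 + e^(-z)); its output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x, w, b):
    """One neuron: weighted sum of the inputs plus the bias, then the activation f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical inputs x1..x3, weights w1..w3 and bias b.
x = [1.0, 0.5, -1.0]
w = [0.2, -0.4, 0.1]
b = 0.3
print(neuron_output(x, w, b))  # f(0.2), roughly 0.55
```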

Here, we use the sigmoid function as the activation function: $f(x) = \frac{1}{1 + e^{-x}}$

Its graph is shown below:

Its output lies in the range (0, 1). For a single neuron, then, the whole process is simple: data enter the neuron, the activation function transforms the weighted input, and an output is produced.

Neural network

A neural network joins many single neurons together so that the output of one neuron can serve as the input of another. For example, the following figure shows a simple neural network:

Figure 1

The leftmost layer is the input layer, the middle layer is the hidden layer, and the rightmost layer is the output layer. The input layer and the hidden layer each have 3 nodes, and each node represents one neuron. The input layer has the 3 input nodes x1, x2, x3, plus a bias node (the circle labeled +1). Between each layer and the next there is a weight matrix W.

For such a simple network, the whole process works as follows. The input x is combined with the weight matrix W and bias b as Wx + b and fed into the hidden layer (layer L2). The activation function f(x) then produces the hidden outputs a1, a2, a3. These outputs are combined with the next layer's weights and bias to form the input of the output layer (layer L3), and one more activation produces the final output.
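The forward pass through the 3-3-1 network of Figure 1 can be sketched as follows. This sketch assumes sigmoid activations in both layers and stores weights as nested lists, where `W[i][j]` connects node i of one layer to node j of the next. The parameter values and the helper names `layer_forward` and `forward` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, W, b):
    """One layer: a_j = f(sum_i W[i][j] * inputs[i] + b[j]).

    W[i][j] is the weight from node i of the previous layer to node j of this
    layer, following the w_ij indexing used later in this article.
    """
    return [sigmoid(sum(W[i][j] * inputs[i] for i in range(len(inputs))) + b[j])
            for j in range(len(b))]

def forward(x, W1, b1, W2, b2):
    """Figure 1 network: input layer L1 -> hidden layer L2 -> output layer L3."""
    a = layer_forward(x, W1, b1)     # hidden outputs a1, a2, a3
    return layer_forward(a, W2, b2)  # network output (one value)

# Hypothetical parameters: 3 inputs, 3 hidden nodes, 1 output node.
W1 = [[0.1, 0.2, 0.3],
      [0.0, -0.1, 0.2],
      [0.4, 0.1, -0.3]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.5], [-0.6], [0.2]]
b2 = [0.05]
print(forward([1.0, 2.0, 3.0], W1, b1, W2, b2))
```

With all weights and biases set to zero, every weighted sum is 0, so each neuron outputs f(0) = 0.5.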

Having covered the basic structure of a neural network and how it computes its output, we now turn to the BPNN model studied in this article.

What is BPNN?

Understanding the BPNN algorithm and its principle

BPNN stands for back propagation neural network. As introduced earlier, it is a kind of feedforward neural network: the BP neural network adds a back-propagation algorithm on top of the feedforward structure. How does it differ from a plain feedforward network?

A feedforward neural network is like water flowing from its source to its end without ever flowing backward: "feedforward" means the signal travels forward only. In a BP network, the forward pass starts with the input signal at the input layer (the input layer itself performs no computation). Each layer computes the outputs of its neurons and passes them on, until the output layer computes the output of the network. The forward pass only computes the network's output and does not adjust the network's parameters.

Back propagation, by contrast, adjusts the network's weights and thresholds (biases) during training and requires supervised learning. While the network is not yet well trained, its output will differ from the expected output, which produces an error. This error is passed backward one layer at a time, yielding δ(i) for each layer. This backward pass is the feedback.

The feedback is used to compute partial derivatives, and the partial derivatives drive gradient descent. Gradient descent minimizes the cost function so that the error between the expected output and the actual output becomes as small as possible.

The BPNN algorithm flow

The algorithm flow is shown in the following illustration:

It can be roughly divided into five steps:

1. Initialize the network weights and biases. The weights of the connections between different neurons are not all the same, and their final values are the result of training. In the initialization phase, each connection weight is therefore set to a very small random number (typically in the range [-1.0, 1.0] or [-0.5, 0.5]). Each neuron also has a bias, which can be viewed as a weight attached to the neuron, and it is likewise initialized to a random number.
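Step 1 can be sketched as below, assuming uniform sampling in [-0.5, 0.5] (one of the ranges suggested above). A seeded generator keeps runs reproducible. The helper `init_layer` and its parameters are hypothetical.

```python
import random

def init_layer(n_in, n_out, low=-0.5, high=0.5, rng=None):
    """Return (W, b) for one layer, filled with small uniform random values.

    W[i][j] connects node i of the previous layer to node j of this layer;
    b[j] is the bias of node j.
    """
    if rng is None:
        rng = random.Random()
    W = [[rng.uniform(low, high) for _ in range(n_out)] for _ in range(n_in)]
    b = [rng.uniform(low, high) for _ in range(n_out)]
    return W, b

rng = random.Random(0)              # fixed seed so the example is reproducible
W1, b1 = init_layer(3, 3, rng=rng)  # input layer -> hidden layer (Figure 1)
W2, b2 = init_layer(3, 1, rng=rng)  # hidden layer -> output layer
```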

2. Forward propagation. Feed in a training sample and compute the output of each neuron. Every neuron computes its output the same way: it forms a linear combination of its inputs and passes it through the activation function. We use Figure 1 from the Neural network section of this article as an example:

We use $w^l_{ij}$ to denote the weight between the i-th node of layer l and the j-th node of layer l+1, writing layer Ll simply as l. In this figure, the weights between layers L1 and L2 are $w^1_{ij}$, and the weights between layers L2 and L3 are $w^2_{ij}$.

We use $b^l_i$ to denote the bias of the i-th node of layer l+1.

We use $s^l_j$ to denote the input value of the j-th node of layer l+1. When l = 1, $s^1_j = \sum_{i=1}^{m} w^1_{ij} x_i + b^1_j$, where m is the number of input nodes.

We use $\theta(s^l_j)$ to denote the output value of the j-th node of layer l+1 after the activation function $\theta(x)$ (here, the sigmoid).

For the network in Figure 1, this gives:

$$
\begin{aligned}
s^1_1 &= w^1_{11}x_1 + w^1_{21}x_2 + w^1_{31}x_3 + b^1_1\\
s^1_2 &= w^1_{12}x_1 + w^1_{22}x_2 + w^1_{32}x_3 + b^1_2\\
s^1_3 &= w^1_{13}x_1 + w^1_{23}x_2 + w^1_{33}x_3 + b^1_3\\
h_{w,b}(x) &= \theta(s^2_1) = \theta\big(w^2_{11}\theta(s^1_1) + w^2_{21}\theta(s^1_2) + w^2_{31}\theta(s^1_3) + b^2_1\big)
\end{aligned}
$$

This completes one forward pass and gives the output $h_{w,b}(x)$.

3. Error calculation and back propagation. This is the learning step of the algorithm. What is being learned? We want the algorithm's output to agree with the true values as closely as possible. When the output differs from the true value, the difference is an error, and the smaller the error, the better the algorithm's predictions. What affects the output? The input data are known, so the only variables are the connection weights. How, then, do these weights affect the output?

Suppose the weight from input node i to hidden node j changes by a very small amount $\Delta w_{ij}$. This changes $s_j$ by some $\Delta s_j$, which in turn changes $\theta(s_j)$ by $\Delta\theta(s_j)$. The change then spreads to every output node and finally produces a change $\Delta E$ in the error over all output nodes. Adjusting a weight therefore changes the output, and the next task is to work out how to adjust the weights so that the output moves in the right direction.

For a given sample, we know both the correct output and the network's output, and the two differ by some error. The smaller this error, the better the network performs. To make the error small, we first need a way to measure it. We usually use a squared-error criterion, here half the sum of squared errors (SSE) over the k output nodes, where $\bar{y}_j$ is the network's output and $y_j$ the true value:

$$L(e) = \frac{1}{2}\,\mathrm{SSE} = \frac{1}{2}\sum_{j=1}^{k} e_j^2 = \frac{1}{2}\sum_{j=1}^{k} (\bar{y}_j - y_j)^2$$
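The error measure is straightforward to compute. This sketch takes the network outputs and the targets as plain lists. The name `half_sse` is hypothetical.

```python
def half_sse(y_pred, y_true):
    """L(e) = 1/2 * sum_j (y_pred_j - y_true_j)^2 over the k output nodes."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(y_pred, y_true))

# Outputs (0.8, 0.2) against targets (1, 0): errors are -0.2 and 0.2,
# so L = 0.5 * (0.04 + 0.04) = 0.04 (up to floating-point rounding).
print(half_sse([0.8, 0.2], [1.0, 0.0]))
```

The factor 1/2 cancels the 2 produced by differentiating the square, which keeps the later derivative $e_i\,\theta'(s^2_i)$ free of constants.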

Once the error is defined, we minimize it with gradient descent: for each sample, the weights are moved in the direction of the negative gradient. In other words, we need the gradient of the error L with respect to the weights w.

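For the weights from the input layer to the hidden layer, the chain rule gives

$$\frac{\partial L}{\partial w^1_{ij}} = \frac{\partial L}{\partial s^1_j}\cdot\frac{\partial s^1_j}{\partial w^1_{ij}} \tag{1}$$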

Because

$$s^1_j = \sum_{i=1}^{m} w^1_{ij}\, x_i + b^1_j \tag{2}$$

So

$$\frac{\partial s^1_j}{\partial w^1_{ij}} = x_i \tag{3}$$

Substituting (3) into (1):

$$\frac{\partial L}{\partial w^1_{ij}} = \frac{\partial L}{\partial s^1_j}\cdot x_i \tag{4}$$

Next we need $\partial L/\partial s^1_j$. Every $s^1_j$ affects each $s^2_i$, where n is the number of hidden nodes:

$$s^2_i = \sum_{j=1}^{n} w^2_{ji}\,\theta(s^1_j) + b^2_i \tag{5}$$

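So by the chain rule, $\partial L/\partial s^1_j$ can be written as:

$$\frac{\partial L}{\partial s^1_j} = \sum_{i=1}^{k} \frac{\partial L}{\partial s^2_i}\cdot\frac{\partial s^2_i}{\partial s^1_j} \tag{6}$$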

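From (5) we get:

$$\frac{\partial s^2_i}{\partial s^1_j} = \frac{\partial s^2_i}{\partial \theta(s^1_j)}\cdot\frac{\partial \theta(s^1_j)}{\partial s^1_j} = w^2_{ji}\,\theta'(s^1_j) \tag{7}$$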

Substituting (7) into (6) gives:

$$\frac{\partial L}{\partial s^1_j} = \theta'(s^1_j)\sum_{i=1}^{k}\frac{\partial L}{\partial s^2_i}\cdot w^2_{ji} \tag{8}$$

Now define

$$\delta^l_i = \frac{\partial L}{\partial s^l_i} \tag{9}$$

We have:

$$\delta^1_j = \theta'(s^1_j)\sum_{i=1}^{k}\delta^2_i\, w^2_{ji} \tag{10}$$

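For the output layer, with $e_i = \bar{y}_i - y_i$ and $\bar{y}_i = \theta(s^2_i)$:

$$\delta^2_i = \frac{\partial L}{\partial s^2_i} = \frac{\partial}{\partial s^2_i}\left[\frac{1}{2}\sum_{j=1}^{k}(\bar{y}_j - y_j)^2\right] = (\bar{y}_i - y_i)\,\frac{\partial \bar{y}_i}{\partial s^2_i} = e_i\,\frac{\partial \theta(s^2_i)}{\partial s^2_i} = e_i\,\theta'(s^2_i)$$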
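For the sigmoid activation used in this article, $\theta'$ can be expressed through $\theta$ itself. Both deltas can therefore be computed from values already produced in the forward pass. This is a worked specialization of the general formulas, valid only for the sigmoid:

$$\theta'(s) = \frac{e^{-s}}{(1+e^{-s})^2} = \theta(s)\bigl(1-\theta(s)\bigr)$$

$$\delta^2_i = (\bar{y}_i - y_i)\,\bar{y}_i\,(1-\bar{y}_i), \qquad \delta^1_j = \theta(s^1_j)\bigl(1-\theta(s^1_j)\bigr)\sum_{i=1}^{k}\delta^2_i\, w^2_{ji}$$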

This step shows where the "back" in back propagation comes from: the error $e_i$. During back propagation, we can treat the output layer as if it were the input layer. The error $e_i$ at each output node is then the input. Multiplying it by the activation derivative $\theta'(s^2_i)$ gives the output layer's backward output $\delta^2_i$. Next, $\delta^2_i$ is combined with the connection weights and fed backward as the input of the hidden layer, which yields the hidden layer's backward output $\delta^1_j$:

$$\delta^1_j = \theta'(s^1_j)\sum_{i=1}^{k}\delta^2_i\, w^2_{ji}$$

With the backward outputs, we can compute the weight gradients of the first layer:

$$\frac{\partial L}{\partial w^1_{ij}} = \delta^1_j\, x_i$$

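and the weight gradients of the second layer:

$$\frac{\partial L}{\partial w^2_{ij}} = \frac{\partial L}{\partial s^2_j}\cdot\frac{\partial s^2_j}{\partial w^2_{ij}} = \delta^2_j\,\theta(s^1_i)$$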

A pattern emerges: the gradient of each weight equals the input carried by that weight (the output of the layer it comes from) multiplied by the backward output of the layer it feeds into. For example, the gradient of a weight between the input layer and the hidden layer equals the input $x_i$ times the hidden layer's backward output $\delta^1_j$, that is, $\delta^1_j x_i$.
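The delta rules above translate directly into code. This sketch assumes sigmoid activations and the 3-3-1 layout of Figure 1, with `W1[i][j]` playing the role of $w^1_{ij}$ and `W2[j][i]` the role of $w^2_{ji}$. The derivation is checked against a central finite difference, a standard gradient-checking technique that is not part of the original text. All function names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Return hidden outputs theta(s1_j) and network outputs theta(s2_i)."""
    h = [sigmoid(sum(W1[i][j] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y_hat = [sigmoid(sum(W2[j][i] * h[j] for j in range(len(h))) + b2[i])
             for i in range(len(b2))]
    return h, y_hat

def loss(x, y, W1, b1, W2, b2):
    """L = 1/2 * sum_i (y_hat_i - y_i)^2."""
    _, y_hat = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((p - t) ** 2 for p, t in zip(y_hat, y))

def backward(x, y, W1, b1, W2, b2):
    """Analytic gradients from the delta rules derived above."""
    h, y_hat = forward(x, W1, b1, W2, b2)
    # Output layer: delta2_i = e_i * theta'(s2_i), with theta' = theta * (1 - theta).
    d2 = [(y_hat[i] - y[i]) * y_hat[i] * (1.0 - y_hat[i]) for i in range(len(y))]
    # Hidden layer, eq. (10): delta1_j = theta'(s1_j) * sum_i delta2_i * w2_ji.
    d1 = [h[j] * (1.0 - h[j]) * sum(d2[i] * W2[j][i] for i in range(len(d2)))
          for j in range(len(h))]
    # Gradient = input carried by the weight * delta of the node it feeds.
    gW1 = [[d1[j] * x[i] for j in range(len(d1))] for i in range(len(x))]
    gW2 = [[d2[i] * h[j] for i in range(len(d2))] for j in range(len(h))]
    return gW1, d1, gW2, d2  # dL/db of a node equals that node's delta

def numeric_grad_W1(x, y, W1, b1, W2, b2, i, j, eps=1e-6):
    """Central finite difference of L with respect to W1[i][j], for checking."""
    original = W1[i][j]
    W1[i][j] = original + eps
    up = loss(x, y, W1, b1, W2, b2)
    W1[i][j] = original - eps
    down = loss(x, y, W1, b1, W2, b2)
    W1[i][j] = original  # restore the weight
    return (up - down) / (2 * eps)

# Hypothetical sample and parameters for the 3-3-1 network of Figure 1.
x, y = [1.0, 0.5, -1.0], [1.0]
W1 = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1], [-0.3, 0.2, 0.1]]
b1 = [0.05, -0.05, 0.0]
W2 = [[0.3], [-0.4], [0.2]]
b2 = [0.1]
gW1, gb1, gW2, gb2 = backward(x, y, W1, b1, W2, b2)
print(gW1[0][1], numeric_grad_W1(x, y, W1, b1, W2, b2, 0, 1))  # should nearly match
```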

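4. Adjust the network weights and neuron biases. With the weight gradients, updating the weights is straightforward, where α is the learning rate (step size):

$$w^l_{ij} \leftarrow w^l_{ij} - \alpha\,\frac{\partial L}{\partial w^l_{ij}}$$

Since $\partial s^l_j/\partial b^l_j = 1$ in (2) and (5), the gradient of a bias is simply its node's delta, so the biases are updated as $b^l_j \leftarrow b^l_j - \alpha\,\delta^l_j$.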
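The update rule is one subtraction per weight. The sketch below applies it to a small nested-list weight matrix. The function name `gd_update` and the numbers are hypothetical.

```python
def gd_update(W, grad, alpha):
    """Return new weights w_ij - alpha * dL/dw_ij (one gradient-descent step)."""
    return [[w - alpha * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]

W = [[0.5, -0.3]]
grad = [[0.2, -0.1]]
print(gd_update(W, grad, alpha=0.1))  # approximately [[0.48, -0.29]]
```

A larger α takes bigger steps but can overshoot, while a smaller α is more stable but slower.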

5. Check the stopping condition. For each sample, check whether the error is below the preset threshold or whether the maximum number of iterations has been reached. If either holds, training ends. Otherwise, return to step 2 and continue training.
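Putting steps 1 to 5 together gives a small training loop. This is a sketch under several assumptions: one hidden layer, sigmoid activations, weights updated after every sample, and a stopping test on the summed error of each full pass over the data rather than on each individual sample (a common variant). The function `train_bpnn`, its defaults, and the OR toy data are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bpnn(data, n_hidden=3, alpha=0.5, tol=1e-3, max_epochs=5000, seed=0):
    """Train a one-hidden-layer BP network sample by sample.

    data: list of (x, y) pairs of float lists.
    Returns the parameters (W1, b1, W2, b2) and the summed error of each epoch.
    """
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # Step 1: small random weights and biases in [-0.5, 0.5].
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
    b2 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    history = []
    for _ in range(max_epochs):
        total = 0.0
        for x, y in data:
            # Step 2: forward propagation.
            h = [sigmoid(sum(W1[i][j] * x[i] for i in range(n_in)) + b1[j])
                 for j in range(n_hidden)]
            out = [sigmoid(sum(W2[j][k] * h[j] for j in range(n_hidden)) + b2[k])
                   for k in range(n_out)]
            total += 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))
            # Step 3: backward outputs (deltas), computed before any weight changes.
            d2 = [(out[k] - y[k]) * out[k] * (1.0 - out[k]) for k in range(n_out)]
            d1 = [h[j] * (1.0 - h[j]) * sum(d2[k] * W2[j][k] for k in range(n_out))
                  for j in range(n_hidden)]
            # Step 4: w <- w - alpha * gradient, and b <- b - alpha * delta.
            for j in range(n_hidden):
                for k in range(n_out):
                    W2[j][k] -= alpha * d2[k] * h[j]
            for k in range(n_out):
                b2[k] -= alpha * d2[k]
            for i in range(n_in):
                for j in range(n_hidden):
                    W1[i][j] -= alpha * d1[j] * x[i]
            for j in range(n_hidden):
                b1[j] -= alpha * d1[j]
        history.append(total)
        # Step 5: stop once the epoch's summed error is below tol.
        if total < tol:
            break
    return (W1, b1, W2, b2), history

# Hypothetical toy data: the logical OR of two inputs.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
params, history = train_bpnn(data)
print(len(history), history[0], history[-1])
```

The deltas are computed before any weights change so that the hidden-layer delta uses the same $w^2_{ji}$ as the forward pass, matching the derivation.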

That completes the BPNN training process, and the trained network is ready for use.

Advantages and disadvantages of BPNN

Advantages:

It realizes a nonlinear mapping from input to output. In other words, a neural network can approximate nonlinear continuous functions. In theory, a network with enough hidden neurons can approximate any continuous function on a bounded domain to arbitrary accuracy. This makes it well suited to building multidimensional features in data mining.

It optimizes by gradient descent. Because the BP neural network uses gradient descent, its parameters can be optimized to reduce the error and obtain better results.

It has some ability to generalize. A BP neural network can be trained on relatively little sample data and still guarantee a certain precision within a certain range. Its generalization ability also depends on various parameters, and when new data are fed in for training, the network can adjust its weights to fit more data.

Differentiable transfer functions. Unlike some other neuron models, the BP neuron model uses differentiable, monotonically increasing transfer functions, such as the sigmoid-type logsig and tansig functions and the linear function purelin. The last layer of a BP network determines the output of the whole network. If the last-layer neurons use a sigmoid-type function, the network's output is confined to a small range; if they use purelin, the output can take any value.
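The three transfer functions can be compared side by side. The names follow MATLAB's toolbox functions (`logsig`, `tansig`, `purelin`), but these are plain-Python re-implementations written for illustration, not the toolbox code. The formula used for `tansig` is mathematically equal to tanh(n).

```python
import math

def logsig(n):
    """Log-sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    """Tan-sigmoid: output in (-1, 1); mathematically equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def purelin(n):
    """Linear transfer function: output is unbounded."""
    return n

for f in (logsig, tansig, purelin):
    print(f.__name__, [round(f(n), 4) for n in (-5.0, 0.0, 5.0)])
```

Only `purelin` passes large inputs through unchanged, which is why an output layer that uses it is not limited to a fixed range.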

Disadvantages:

Local minima. The BP neural network uses gradient descent, and gradient descent can settle in a local minimum when what we need is the global minimum. The weights may then converge to a local minimum, which causes network training to fail.
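The local-minimum problem appears even in one dimension. The sketch below uses a toy function, chosen only for illustration and not taken from the article, with a global minimum near x = -1.30 and a local minimum near x = 1.13. Plain gradient descent ends in whichever basin it starts in.

```python
def f(x):
    """Toy objective: global minimum near x = -1.30, local minimum near x = 1.13."""
    return x**4 - 3 * x**2 + x

def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, alpha=0.01, steps=1000):
    x = x0
    for _ in range(steps):
        x -= alpha * df(x)
    return x

left = gradient_descent(-2.0)   # starts in the basin of the global minimum
right = gradient_descent(2.0)   # starts in the basin of the local minimum
print(round(left, 2), round(f(left), 2))
print(round(right, 2), round(f(right), 2))
```

Starting from x = 2.0, descent stops at the local minimum even though f is lower near x = -1.30. A BP network's weights can get stuck in the same way, only in many dimensions.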

Slow convergence. The BP algorithm is essentially gradient descent, and the objective function it optimizes is very complex. This slows convergence in three ways. First, the search inevitably zigzags (the "sawtooth" phenomenon), which makes the algorithm inefficient. Second, the objective has flat regions where neuron outputs are close to 0 or 1; in these regions the weight error changes very little, so training almost stalls. Third, the BP algorithm cannot use a traditional one-dimensional line search to find the step size of each iteration, so a step-size update rule must be fixed in advance, which also makes the algorithm inefficient. Together, these factors make the BP algorithm converge slowly.
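The flat regions come from the $\theta'(s)$ factor in every delta. For the sigmoid, $\theta'(s) = \theta(s)(1-\theta(s))$ peaks at 0.25 and becomes tiny once the output nears 0 or 1, so the weight updates nearly vanish. A minimal sketch that prints this slope:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """theta'(z) = theta(z) * (1 - theta(z)); largest (0.25) at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  output={sigmoid(z):.5f}  slope={sigmoid_prime(z):.2e}")
```

At z = 10 the output is already about 0.99995 and the slope is below 1e-4, so any delta passing through this neuron is scaled down by more than a factor of 2000 relative to z = 0.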

Network structure design. There is no theoretical guidance for choosing the number of hidden layers or the number of nodes in each hidden layer.

The conflict between predictive ability and training ability. Predictive ability is also called generalization or extrapolation ability, and training ability is also called approximation or learning ability. Normally, a network with poor training ability also has poor predictive ability, and up to a point, predictive ability improves as training ability improves. This trend has a limit, however. Beyond it, further gains in training ability cause predictive ability to decline, which is the so-called "overfitting" phenomenon. It happens because the network learns too many details of the training samples and can no longer capture the underlying pattern in the data, so in practice a balance must be struck between the two.

Application scenarios of BP neural networks

1. Pattern recognition.

Pattern recognition usually refers to processing and analyzing the various forms of information (numerical, textual, and logical relationships) that characterize things or phenomena, in order to determine which category they belong to.

2. Function approximation.

For functions whose expressions are too complex, or that are difficult to express at all, we can use a neural network to approximate and simulate them arbitrarily closely.

3. Prediction.

Train the network on some known data, then use the trained network to make predictions on new data. Compared with traditional methods, a neural network can improve accuracy while requiring less data.

4. Data compression.

Digital image compression is an image processing technique that represents the original pixel matrix with fewer bits. In essence, it reduces the temporal, spatial, and spectral redundancy (among others) in the image data, and removing one or more of these kinds of redundancy makes storage and transmission more efficient. Whatever the specific architecture or technique, an image compression system follows the same basic process, which can be summarized in three parts: encoding, quantization, and decoding.

In theory, encoding and decoding can be treated as a mapping and optimization problem. A neural network realizes a nonlinear mapping from input to output, and its performance can be judged by its parallel-processing efficiency, its fault tolerance, and its robustness. Since the basic principle of image compression matches what a BP neural network does, a BP neural network can be used to solve image compression problems.

Sources and references:

http://www.cnblogs.com/chamie/p/5579884.html

http://blog.csdn.net/dengjiexian123/article/details/22829509

http://ufldl.stanford.edu/wiki/index.php/%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C

http://blog.csdn.net/zhongkejingwang/article/details/44514073

https://www.zhihu.com/question/22553761

http://blog.csdn.net/heyongluoyao8/article/details/48213345

http://blog.csdn.net/mysteryhaohao/article/details/51386235

http://www.docin.com/p-1625054975.html

http://blog.csdn.net/fengbingchun/article/details/50274471

http://www.jianshu.com/p/f129d1d73a1d