The BP neural network takes its name from back propagation. It was first proposed in 1986 by Rumelhart, McClelland, and other scientists; in the same year, Rumelhart and his co-authors published the famous article "Learning Representations by Back-Propagating Errors" in Nature. Over time, the theory of the BP neural network has been refined and updated, and it has become one of the most widely used neural network models. Let's explore the basic model and concepts of the BP neural network!
Starting from the biological model of the neural network
We know that the transmission of information in the human brain and its responses to external stimuli are governed by neurons, and the brain is made up of tens of billions of them. These neurons are not isolated but closely connected: each neuron is connected on average to several thousand others, and together they constitute the neural network of the human brain. Stimuli propagate through this network according to certain rules; a neuron does not respond every time a stimulus arrives from other neurons. Instead, it accumulates the stimuli from its neighboring neurons and, at some point, produces its own stimulus, which it passes on to some of the neurons adjacent to it. The tens of billions of neurons working in this way form the brain's responses to the outside world. The mechanism by which the human brain learns from external stimuli is the adjustment of the connections between these neurons and of their strength. Of course, this is a simplified biological model of the real neural network of the human brain; generalized to machine learning, this simplified model is described as an artificial neural network. The BP neural network is one of them. Let's look at the analysis of an individual neuron.
Figure 1: A neuron in a neural network
The stimulation accumulated by a neuron is the sum of the stimuli transmitted by the other neurons, each multiplied by its corresponding weight. Denote the accumulated amount by Xj, let Yi denote the amount of stimulation transmitted by neuron i, and let Wi denote the weight of the link carrying that stimulation. This yields the formula:
Xj = (Y1 * W1) + (Y2 * W2) + ... + (Yi * Wi) + ... + (Yn * Wn)
When the accumulation of Xj is complete, the accumulating neuron is itself stimulated and propagates this stimulus to some of the neurons around it; this output is denoted Yj, as shown below:
Yj = f(Xj)
That is, the neuron processes the accumulated result Xj and emits the external stimulus Yj. This processing is represented by the function mapping f, which is called the activation function.
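To make the single-neuron model concrete, here is a minimal sketch in C. It is only an illustration of the two formulas above; neuron_output and its parameters are invented for this example and are not part of the program developed later, and the sigmoid used as f here is the activation function introduced later in this article:

#include <math.h>

/* Illustrative only: one neuron accumulates n weighted input stimuli
   (Xj = Y1*W1 + ... + Yn*Wn) and fires through a sigmoid activation. */
double neuron_output(const double y[], const double w[], int n)
{
    double xj = 0;
    int i;
    for (i = 0; i < n; i++)
        xj += y[i] * w[i];        /* weighted accumulation Xj */
    return 1 / (1 + exp(-xj));    /* Yj = f(Xj) */
}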
The composition of the BP neural network
After analyzing an individual neuron, let's look at the situation once the neurons form a network. The most intuitive way to illustrate it is graphically, as shown in Figure 2:
Figure 2: BP neural network
The first region is equivalent to the external stimuli: it is the source of stimulation and transmits stimuli to the neurons, so it is named the input layer. The second region, where neurons transmit stimuli to each other, is equivalent to the inside of the brain, so it is named the hidden layer. The third region expresses the neurons' response to the outside world after multi-level transmission, so it is named the output layer.
Briefly: the input layer passes the stimuli to the hidden layer; the hidden layer, through the strength of the links between the neurons (weights) and the transfer rule (activation function), passes the stimuli to the output layer; and the output layer collates the results of the hidden layer's processing to produce the final result. If the correct result is known, it is compared with the produced result to obtain the error, and the link weights in the neural network are then corrected by feeding this error backward. This completes the learning process and is the feedback mechanism of the BP neural network; it is also the source of the name BP (back propagation): a backward feedback learning mechanism is used to correct the weights in the neural network, with the final goal of outputting the correct result!
Mathematical derivation of the BP neural network
Based on the basic model analyzed in the first part of this article, the mathematical analysis of the BP neural network starts from formula (1), the weighted accumulation of the stimuli that a neuron j receives, where Wji is the weight of the link from neuron i to neuron j:

Xj = Σi (Yi * Wji)    (1)
For the activation function that produces a neuron's own output, the Sigmoid function is generally chosen, which gives formula (2):

Yj = f(Xj) = 1 / (1 + e^(-Xj))    (2)
Through these two formulas we can analyze the calculation of the output in a BP neural network: each neuron receives stimuli Yi, completes the weighted accumulation (with weights Wji) to produce Xj, is then activated through the activation function to produce Yj, and passes this on to the next layer of neurons it is connected to; proceeding in this way, the result is eventually output.
We will now analyze how the backward feedback mechanism is used to correct the neuron weights Wji; this part of the mathematical derivation draws on multivariate calculus. To correct Wji, we need to obtain its share of the error amount. First, let dj denote the true correct result and let e denote the error. Then (Yj - dj) is exactly the differential increment of e with respect to Yj, i.e. subtracting the increment (Yj - dj) from Yj moves it toward the correct value, which gives formula (3):

e = (1/2) * Σj (Yj - dj)^2,   so that   ∂e/∂Yj = Yj - dj    (3)
Then, to clarify the target: what we need is the error amount attributable to the weight Wji, i.e. the value of ∂e/∂Wji. From formula (1) we know that Wji acts on e through Xj, so formula (4) can be deduced by the chain rule:

∂e/∂Wji = (∂e/∂Xj) * (∂Xj/∂Wji) = (∂e/∂Xj) * Yi    (4)
The error amount of Wji still needs to be converted into values we can compute. The deduction proceeds along the chain Wji → Xj → Yj → e:

∂e/∂Wji = (∂e/∂Yj) * (∂Yj/∂Xj) * (∂Xj/∂Wji) = (Yj - dj) * (∂Yj/∂Xj) * Yi
The value of ∂Yj/∂Xj can be derived from formula (2):

∂Yj/∂Xj = Yj * (1 - Yj)
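For completeness, this derivative property of the Sigmoid follows directly from formula (2); in standard notation:

\frac{\partial Y_j}{\partial X_j} = \frac{\partial}{\partial X_j}\left(\frac{1}{1 + e^{-X_j}}\right) = \frac{e^{-X_j}}{(1 + e^{-X_j})^2} = \frac{1}{1 + e^{-X_j}} \cdot \frac{e^{-X_j}}{1 + e^{-X_j}} = Y_j \,(1 - Y_j)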
So the resulting error amount is:

∂e/∂Wji = (Yj - dj) * Yj * (1 - Yj) * Yi
Pay attention to the subscripts in the above formula: the last factor is Yi, while those in front of it are Yj. Up to this point, the network's output value Yj and the correct value dj are fully sufficient to correct the weights Wji into the output layer (those leaving the last hidden layer). What about the other hidden layers? Read on.
The above derivation starts from formula (3). If we know ∂e/∂Yi (note: this is for Yi, whereas formula (3) gives it for Yj), the same deduction can be carried out for the other hidden layers to find the error amounts of their weights. Since Yi influences e through every neuron j of the next layer, the deduction is as follows:

∂e/∂Yi = Σj (∂e/∂Xj) * (∂Xj/∂Yi) = Σj (Yj - dj) * Yj * (1 - Yj) * Wji
So all the error amounts can be obtained by the same deduction!
The last step in correcting Wji is to apply the correction with a learning rate L (a value between 0 and 1):

Wji = Wji - L * (∂e/∂Wji)
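To summarize the whole derivation compactly, the backward pass can be written with a per-neuron error term δ. This is only a recap of the formulas above in standard notation; the symbol δ is introduced here for readability and is not used elsewhere in this article:

\delta_j = (Y_j - d_j)\, Y_j (1 - Y_j) \quad \text{(output-layer neuron } j\text{)}

\delta_i = \Big(\sum_j \delta_j\, W_{ji}\Big)\, Y_i (1 - Y_i) \quad \text{(hidden-layer neuron } i\text{, by the formula for } \partial e / \partial Y_i\text{)}

W_{mn} \leftarrow W_{mn} - L\, \delta_m\, Y_n \quad \text{(any weight into neuron } m \text{ from neuron } n\text{, with learning rate } L\text{)}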
At this point, the mathematical deduction of the feedback part of the BP neural network is complete.
The definition of some data
First, let's describe some of the important data definitions used in the program below.
#define Data 820
#define In 2
#define Out 1
#define Neuron 45
#define TrainC 5500
Data indicates the number of known samples, that is, the number of training samples. In indicates the number of input variables of each sample; Out indicates the number of output variables of each sample. Neuron is the number of neurons, and TrainC indicates the number of training iterations. Now let's look at the data definitions that describe the neural network; note that all the data in the following diagram are of type double.
Figure 3: Data definitions of the BP neural network
d_in[Data][In] stores the Data samples, with the In inputs of each sample. d_out[Data][Out] stores the Data samples, with the Out outputs of each sample. We represent the network in the figure with an adjacency-table-like scheme: w[Neuron][In] is the weight from an input to a neuron, v[Out][Neuron] is the weight from a neuron to an output, and the corresponding correction amounts are kept in the two arrays dw[Neuron][In] and dv[Out][Neuron]. The array o[Neuron] records a neuron's output through the activation function, and OutputData[Out] stores the output of the BP neural network.
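The declarations of these globals are never shown in the article; a plausible set, assumed here to match how the reference code below uses them (the min/max arrays appear in the initialization code that follows), would be:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double d_in[Data][In], d_out[Data][Out];    /* training samples */
double w[Neuron][In], v[Out][Neuron];       /* weights */
double dw[Neuron][In], dv[Out][Neuron];     /* weight corrections */
double o[Neuron];                           /* neuron outputs */
double OutputData[Out];                     /* network output */
double minIn[In], maxIn[In];                /* input ranges, for normalization */
double minOut[Out], maxOut[Out];            /* output ranges, for normalization */
double e;                                   /* average training error */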
Process of program execution
Here we do not consider the execution details of the individual functions and introduce the procedure only in general terms, expressed as pseudo-code; the specific contents are introduced step by step below:
main()
{
    readData();          /* read the sample data */
    initBPNetwork();     /* initialize the BP network: normalize the data,
                            initialize the neuron weights w[Neuron][In],
                            v[Out][Neuron], etc. */
    trainNetwork();      /* train the BP network:
                            do {
                                for (each of the Data samples, i) {
                                    computO(i);     compute the network output
                                                    for the input of sample i
                                    backUpdate(i);  feed back and adjust the
                                                    neurons, completing the
                                                    learning for sample i
                                }
                            } while (training count not exhausted
                                     && error accuracy not reached); */
    writeNeuron();       /* store the trained neuron information */
    /* test the trained BP network with some data */
    return 0;
}
The above is the overall processing flow. Reading and saving data and similar housekeeping are omitted in this article in order to highlight the main parts.
Initializing BP neural network
Initialization mainly involves two things. One is normalizing the training samples that are read in; normalization means converting the data into the interval (0, 1). BP neural network theory does not require this, but in practice normalization is indispensable: the theoretical model does not consider the convergence rate of the BP neural network, the neuron outputs are generally most sensitive to data between 0 and 1, and normalization markedly improves training efficiency. The following formula can be used, where a constant A is added to prevent the value 0 from occurring (0 cannot be a denominator):
y = (x - MinValue + A) / (MaxValue - MinValue + A)

For example, with MinValue = 1, MaxValue = 10 and A = 1, an input x = 6 is normalized to y = (6 - 1 + 1) / (10 - 1 + 1) = 0.6.
The other is initializing the neuron weights. Since the data have been normalized into (0, 1), the weights are initialized to random values between -1 and 1, and the correction amounts are set to 0. A reference implementation follows:
void initBPNetwork()
{
    int i, j;

    /* find the minimum and maximum of every input and output dimension */
    for (i = 0; i < In; i++) {
        minIn[i] = maxIn[i] = d_in[0][i];
        for (j = 0; j < Data; j++) {
            maxIn[i] = maxIn[i] > d_in[j][i] ? maxIn[i] : d_in[j][i];
            minIn[i] = minIn[i] < d_in[j][i] ? minIn[i] : d_in[j][i];
        }
    }
    for (i = 0; i < Out; i++) {
        minOut[i] = maxOut[i] = d_out[0][i];
        for (j = 0; j < Data; j++) {
            maxOut[i] = maxOut[i] > d_out[j][i] ? maxOut[i] : d_out[j][i];
            minOut[i] = minOut[i] < d_out[j][i] ? minOut[i] : d_out[j][i];
        }
    }

    /* normalize into (0, 1); the +1 plays the role of the constant A above */
    for (i = 0; i < In; i++)
        for (j = 0; j < Data; j++)
            d_in[j][i] = (d_in[j][i] - minIn[i] + 1) / (maxIn[i] - minIn[i] + 1);
    for (i = 0; i < Out; i++)
        for (j = 0; j < Data; j++)
            d_out[j][i] = (d_out[j][i] - minOut[i] + 1) / (maxOut[i] - minOut[i] + 1);

    /* initialize the weights with small random values, the corrections with 0 */
    for (i = 0; i < Neuron; ++i)
        for (j = 0; j < In; ++j) {
            w[i][j] = (rand() * 2.0 / RAND_MAX - 1) / 2;
            dw[i][j] = 0;
        }
    for (i = 0; i < Neuron; ++i)
        for (j = 0; j < Out; ++j) {
            v[j][i] = (rand() * 2.0 / RAND_MAX - 1) / 2;
            dv[j][i] = 0;
        }
}
BP Neural Network Training
This part can be said to be the engine of the whole BP neural network: it drives the execution of the sample training process. From the basic model of the BP neural network we know that the feedback learning mechanism consists of two parts: first the BP neural network predicts a result, then the predicted result is compared with the accurate result of the sample and the neuron weights are corrected by the error amounts. We therefore use two functions to represent these two processes. The training process also monitors the average error e; if the set accuracy is reached, training completes. Since the desired accuracy cannot always be reached, we also add a training-count parameter and exit training once that count is reached. A reference implementation follows:
void trainNetwork()
{
    int i, c = 0;
    do {
        e = 0;
        for (i = 0; i < Data; ++i) {
            computO(i);                                             /* forward pass */
            e += fabs((OutputData[0] - d_out[i][0]) / d_out[i][0]); /* relative error */
            backUpdate(i);                                          /* weight update */
        }
        printf("%d  %lf\n", c, e / Data);
        c++;
    } while (c < TrainC && e / Data > 0.01);
}
Of these two functions, computO(i) (the O is short for output) computes the BP neural network's predicted output for sample i, which is the first process; backUpdate(i) updates the neuron weights based on the predicted output for sample i, and e is used to monitor the error.
BP neural network output
The function computO(i) feeds the inputs of sample i through the mechanism of the BP neural network and predicts its output. Recall the basic model of the BP neural network (see the basic-model section for details): the weighted accumulation corresponds to formula (1), and the activation function corresponds to formula (2).
In the design of the BP neural network above, the weights between the input layer and the hidden layer correspond to the data structure w[Neuron][In], the weights between the hidden layer and the output layer correspond to v[Out][Neuron], the array o[Neuron] records the neurons' outputs through the activation function, and the predicted results of the BP neural network for a sample are saved in OutputData[Out]. From this we get the following reference implementation:
void computO(int var)
{
    int i, j;
    double sum;

    /* outputs of the hidden-layer neurons */
    for (i = 0; i < Neuron; ++i) {
        sum = 0;
        for (j = 0; j < In; ++j)
            sum += w[i][j] * d_in[var][j];
        o[i] = 1 / (1 + exp(-1 * sum));   /* sigmoid activation, formula (2) */
    }

    /* hidden layer to output layer */
    for (i = 0; i < Out; ++i) {
        sum = 0;
        for (j = 0; j < Neuron; ++j)
            sum += v[i][j] * o[j];
        OutputData[i] = sum;              /* linear output, no activation */
    }
}
Feedback learning of BP neural network
The function backUpdate(i) compares the predicted output with the sample's real result and then corrects the weights involved in the neural network; this is the key to the realization of the BP neural network. The crux is finding the correct error amounts for w[Neuron][In] and v[Out][Neuron]! The method for finding the error correction amounts was solved in the mathematical analysis of the basic model; applying it to the specific case of the BP neural network we designed, we need the correction error amounts for the two data structures w[Neuron][In] and v[Out][Neuron], and these error amounts are stored in the data structures dw[Neuron][In] and dv[Out][Neuron]. So what are these two correction error amounts? The idea is the same as the derivation of the error amounts in the basic model; here only the mathematical derivation for the BP neural network we designed is listed:
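The derivation itself is a reconstruction here, from the basic-model derivation and the reference code below, under the assumption that the output layer is linear as in computO() above (so no Sigmoid derivative appears in the output-layer term). Writing O_j for OutputData[j], d_j for d_out[var][j], o_i for o[i], x_k for d_in[var][k], v_{ji} for v[j][i] and w_{ik} for w[i][k], the two error amounts work out to:

\frac{\partial e}{\partial v_{ji}} = (O_j - d_j)\, o_i

\frac{\partial e}{\partial w_{ik}} = \Big(\sum_j (O_j - d_j)\, v_{ji}\Big)\, o_i\, (1 - o_i)\, x_k

These are exactly the quantities multiplied by the learning rates in backUpdate() below.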
If you do not want to follow the derivation process, you only need the two concluding formulas above to know which error amounts are required; if you want to understand it, you may need to work through the deduction with pen and paper. With the mathematical deduction completed, the implementation code is very easy to write. In the concrete implementation of the error correction we add a learning rate, as well as an inheritance of the correction amounts from the previous learning step, i.e. a momentum term; put plainly, everything is multiplied by a number between 0 and 1. See the following reference implementation, where A and a are the inheritance (momentum) coefficients and B and b are the learning rates:
#define A 0.2
#define B 0.4
#define a 0.2
#define b 0.3
void backUpdate(int var)
{
    int i, j;
    double t;
    for (i = 0; i < Neuron; ++i) {
        t = 0;
        for (j = 0; j < Out; ++j) {
            t += (OutputData[j] - d_out[var][j]) * v[j][i];  /* error fed back to neuron i */

            /* hidden-to-output weight: momentum A, learning rate B */
            dv[j][i] = A * dv[j][i] + B * (OutputData[j] - d_out[var][j]) * o[i];
            v[j][i] -= dv[j][i];
        }
        for (j = 0; j < In; ++j) {
            /* input-to-hidden weight: momentum a, learning rate b */
            dw[i][j] = a * dw[i][j] + b * t * o[i] * (1 - o[i]) * d_in[var][j];
            w[i][j] -= dw[i][j];
        }
    }
}
Well, the C-language implementation of the BP neural network is now complete. Finally, we can test how the BP neural network runs. The data I give here have two inputs a and b (numbers within 10) and one output c, with c = a + b; in other words, we teach the BP neural network to perform addition. With 45 neurons and 820 training samples, training completes when the average sample error falls below 0.01 (for the learning rates and so on, see the reference code), and the trained network is then used to predict (6, 8), (2.1, 7) and (4.3, 8).
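The testing code itself is not included in the article; the following is a minimal sketch of how such a prediction could be made with the trained globals. testPredict is a hypothetical helper invented for this example: it normalizes the raw inputs the same way initBPNetwork() does and maps the network output back to the original scale.

/* Hypothetical test helper (not from the original article): predict
   c = a + b for two raw inputs using the trained weights. */
double testPredict(double x1, double x2)
{
    double x[In] = { x1, x2 };
    double o2[Neuron];
    double sum;
    int i, j;

    /* hidden layer, normalizing each raw input as initBPNetwork() does */
    for (i = 0; i < Neuron; ++i) {
        sum = 0;
        for (j = 0; j < In; ++j)
            sum += w[i][j] * (x[j] - minIn[j] + 1) / (maxIn[j] - minIn[j] + 1);
        o2[i] = 1 / (1 + exp(-1 * sum));
    }

    /* output layer (Out == 1 here), then undo the output normalization:
       from y = (x - min + 1) / (max - min + 1) we get
       x = y * (max - min + 1) + min - 1 */
    sum = 0;
    for (j = 0; j < Neuron; ++j)
        sum += v[0][j] * o2[j];
    return sum * (maxOut[0] - minOut[0] + 1) + minOut[0] - 1;
}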
Finally, the reference implementation code is attached, as well as the data and neuron information from the training experiments. (This example is a simple demo of the BP neural network; more needs to be considered if it is used in practice!)