Over the past two days of studying artificial neural networks, I built a small project that uses the traditional neural network structure to recognize handwritten digits, as a way to practice. I came away with some insights and thoughts that I'd like to share; comments and suggestions are welcome so we can all improve together.
The BP neural network discussed here is the traditional artificial neural network, which is simpler than the convolutional neural network (CNN).
Artificial neural networks can represent complex patterns and perform association, reasoning, and memory; they are a powerful tool for solving problems that traditional methods cannot. They are receiving increasing attention, and developments in other disciplines offer them even greater opportunities. In 1986, Rumelhart and McClelland proposed the error back propagation algorithm, known as the BP algorithm. Because multi-layer feedforward networks are usually trained with this error back propagation algorithm, they are also called BP networks.
For readability, here is the logical order of the full text:
1. The structure and working principle of a common neural network, explained in plain language; simple and easy to understand, recommended reading
2. The mathematical derivation of the back propagation algorithm; if it feels too complicated, skip it for now
3. MATLAB code and the image library
(1) A plain-language explanation of the traditional neural network
First, let's look at the basic unit of a neural network: a single neuron:
The circle in the figure represents a neuron. A neuron receives stimuli from neighboring neurons, accumulates them with different weights, and at some point produces its own stimulus, which it passes on to the neurons adjacent to it. Countless neurons working this way form the brain's perception of the outside world, and the brain learns about the world by adjusting the weights of these connections between adjacent neurons.
In the figure, the stimuli transmitted by the surrounding neurons are denoted y, and the weights are denoted w. The stimulus of the neuron represented by the circle is the weighted sum of all incoming stimuli, i.e. s = w1*y1 + w2*y2 + ... + wn*yn.
At the same time, this neuron, as part of the network, must also pass a stimulus signal on to other neurons. But it does not pass on s directly; it passes on f(s). Why? There is a big-picture reason, which we will analyze later. Here f(s) is called the "activation function", and the commonly used ones are as follows:
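As a concrete illustration, here is a minimal MATLAB sketch of a single neuron (my own illustration with made-up values, not part of the original project code). It forms the weighted sum s and applies the sigmoid, f(s) = 1/(1+e^(-s)), which is also the activation used in the code later in this article:

% A single neuron: weighted sum of inputs followed by a sigmoid activation.
y = [0.5 0.1 0.9];           % stimuli from three neighboring neurons
w = [0.2 0.7 0.4];           % connection weights
s = sum(w .* y);             % weighted sum: s = w1*y1 + w2*y2 + w3*y3
f = @(s) 1 ./ (1 + exp(-s)); % sigmoid activation function
out = f(s)                   % the stimulus this neuron passes on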
Okay. If none of that gave you trouble, congratulations, you're getting started. Now we connect these basic units together, and that gives us the final neural network. The traditional neural network structure is as follows:
Does it look messy? No hurry; let's dissect it one piece at a time, from the whole down to the parts. First, its structure is divided into three parts: the input layer, the hidden layer, and the output layer. In general there is one input layer and one output layer, while there can be several hidden layers. As for the connection structure, each neuron in a layer is connected to all the neurons in the previous layer.
The handwritten digit recognition experiment uses a three-layer neural network structure, that is, with only one hidden layer, as described below.
Here is the notation for each layer and the relationships between the layers:
Input layer: x = (x1, x2, x3 ... xn)
Hidden layer: y = (y1, y2, y3 ... ym)
Output layer: o = (o1, o2, o3 ... or)
Two weight matrices:
Input-layer-to-hidden-layer weights: V = (v1, v2, v3 ... vm), where vj is a column vector indicating that all the neurons of the input layer, weighted by vj, produce the j-th neuron of the hidden layer
Hidden-layer-to-output-layer weights: W = (w1, w2, w3 ... wr), where wk is a column vector indicating that all the neurons of the hidden layer, weighted by wk, produce the k-th neuron of the output layer
Based on the single-neuron stimulus rule described above, many readers should by now have arrived at the following relationships between the layers:

o_k = f(y * w_k), k = 1 ... r    (Equation 1)
y_j = f(x * v_j), j = 1 ... m    (Equation 2)
At this point, the forward working process of the neural network is clear. To illustrate with an example: suppose the input is an image of size 16x16. It is converted into a two-dimensional gray-value matrix, and then each row is stitched onto the end of the previous row, producing a 1x256 row vector, which serves as the input-layer input x. Equation 2 then gives the hidden layer, and Equation 1 gives the output layer, from which we obtain the network's output. In this handwritten digit recognition project, the input images are 16x16, so the input layer has 256 neurons; I chose 64 neurons for the hidden layer and 10 for the output layer. Why 10? Because the digits 0 to 9 make 10 in total. The expectation is that when an image of the digit 1 is input, the output is {1 0 0 0 0 0 0 0 0 0}; when the image is a 2, the output is {0 1 0 0 0 0 0 0 0 0}; and so on. The outputs will not be exactly 0s and 1s; after tuning and training, an output above 0.9 for the right digit and around 0.0 for the others is already enough, since we only need the position of the maximum value to identify the digit in the image.
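To make the forward pass concrete, here is a minimal MATLAB sketch of the computation just described (my own illustration; the weights here are random stand-ins, since training, described below, is what finds their real values):

% Forward pass for one 16x16 image through a 256-64-10 network.
img = rand(16,16) > 0.5;     % stand-in for a binarized 16x16 image
tmp = img';
x = double(tmp(:)');         % flatten row by row into a 1x256 row vector
V = rand(256,64); W = rand(64,10); % random stand-ins for trained weights
f = @(s) 1 ./ (1 + exp(-s)); % sigmoid activation
y = f(x * V);                % hidden layer (Equation 2)
o = f(y * W);                % output layer (Equation 1)
[~, digit] = max(o)          % position of the largest output = recognized digit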
At this point, we have covered the structure of the entire network and its forward working process. You could say we already understand 50% of the neural network. Why only 50%? Think about it and you will find that we do not yet know how to obtain the two important quantities in the network: the weight matrices W and V.
To find W and V we need an algorithm: the error back propagation algorithm, BP for short. The name is obscure, so let's translate it into plain language. First, look at how the algorithm works. We randomly initialize the values of W and V, then feed in some images and compute the outputs. Of course, since the W and V parameters are far from perfect, the output will not look like the ideal {1 0 0 0 0 0 0 0 0 0}; there will be an error. Based on this error we can, in turn, correct W and V, so that the corrected W and V bring the output closer to the ideal output; this is what "error back propagation" means. After one correction, we feed in some more images, the outputs move a little closer to the ideal, we compute the error again, and correct W and V again. After many such iterations, the repeated corrections yield fairly good W and V matrices that make the output very close to the ideal output, and then our work is 100% done. This idea of measuring the error at the output and adjusting accordingly will feel familiar to students who have studied automatic control or taken part in the Freescale smart-car competitions: it has a lot in common with the PID control algorithm.
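Before the derivation, here is a deliberately tiny MATLAB sketch of this idea, assuming a single sigmoid neuron with one weight (my own toy example, not the project code): initialize randomly, compute the output, measure the error against the target, nudge the weight using the error, and repeat.

% Toy version of "measure the error, correct the weight, repeat".
x = 0.8; d = 0.9;            % one input and its ideal (target) output
w = rand;                    % random initialization
eta = 2;                     % learning rate (step size)
for iter = 1:1000
    o = 1 / (1 + exp(-w * x));          % forward pass through a sigmoid neuron
    e = d - o;                          % error between ideal and actual output
    w = w + eta * e * o * (1 - o) * x;  % correct the weight using the error
end
o                            % after training, o is very close to d = 0.9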
What follows is the mathematical derivation of exactly how the error between the actual output and the ideal output adjusts the values of W and V, and by how much. As said above, if you don't follow it for now you can skip the derivation and just read the final conclusion; I found that after working through the code the experience deepens and it slowly makes sense.
(2) Mathematical derivation of the back propagation algorithm
The ideal output of the output layer: d = (d1, d2, d3 ... dr), for example {1 0 0 0 0 0 0 0 0 0}, {0 1 0 0 0 0 0 0 0 0}, etc.
Suppose the gap between the actual output and the ideal output is E. Clearly E is a function of the input x, the weights W and V, and the output o. To correct W we need to know the specific correction increment ΔW. In the discrete case, characterizing the increment by the differential, we get:

ΔW = -η * ∂E/∂W
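The detailed derivation here was originally in images that have not survived, so as a reconstruction of the standard result for a sigmoid network with squared error E = (1/2) Σ_k (d_k - o_k)² (it matches the update expressions in the MATLAB code below), the chain rule gives:

\[
\begin{aligned}
\delta_k &= (d_k - o_k)\,o_k(1 - o_k) && \text{output-layer error term} \\
\Delta w_{jk} &= \eta\,\delta_k\,y_j && \text{hidden-to-output weight update} \\
e_j &= y_j(1 - y_j)\sum_{k} \delta_k w_{jk} && \text{error propagated back to hidden neuron } j \\
\Delta v_{ij} &= \eta\,e_j\,x_i && \text{input-to-hidden weight update}
\end{aligned}
\]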
Changing the size of η changes the amplitude of each adjustment. A large η adjusts faster and a small one slower, but too large a value causes oscillation, just like the proportional factor P in PID control. Finding an appropriate η usually takes several attempts.
Okay, that is the neural network. Here is one less important point: as we said, the weights W and V are found by iteration, but how do we judge when the iteration can stop? A natural idea is to check, each time, whether the output is close enough to the ideal output, so we can compute the distance between the two vectors, which is the same idea as the mean squared error:

s = (1/r) * Σk (dk - ok)²
When s is small enough, the iteration can end.
(3) Practice
Here is the MATLAB code I experimented with.
Training section: RECOGNIZE_HANDWRITING_NUMBERS_BY_SIMPLE_NN_TRAIN.M
V=double(rand(256,64));
W=double(rand(64,10));
delta_v=double(rand(256,64));
delta_w=double(rand(64,10));
yita=0.2;              % scaling factor; some articles call it the learning rate
yita1=0.05;            % a parameter I added: scales the activation function's argument to keep large inputs out of the saturated region; can be removed
train_number=9;        % how many digits in the training set: 9 in total, no 0
train_num=30;          % how many samples of each digit are used (100 available)
x=double(zeros(1,256));         % input layer
y=double(zeros(1,64));          % middle layer, i.e. hidden layer
output=double(zeros(1,10));     % output layer
tar_output=double(zeros(1,10)); % target output, i.e. the ideal output
delta=double(zeros(1,10));      % an intermediate variable, kept for convenience
s_record=1:1000;       % record of the total mean squared error
tic                    % start timing
for train_control_num=1:1000    % iteration count control; while tuning I found 1000 is more than needed, about 400 is completely enough
    s=0;
    % read the images and feed them into the network
    for number=1:train_number
        readdir=['E:\Matlab\recognize_handwiting_numbers\train_lib\']; % path of the samples
        for num=1:train_num     % how many images per digit
            photo_name=[num2str(number),num2str(num,'%05d'),'.png'];  % picture name
            photo_index=[readdir,photo_name]; % path plus picture name gives the full image index
            photo_matrix=imread(photo_index); % use imread to get the image matrix
            photo_matrix=uint8(photo_matrix<=230); % binarize; black is 1
            tmp=photo_matrix';
            tmp=tmp(:);         % these two steps turn the 2-D image matrix into a 256-dimensional column vector
            x=double(tmp');     % the input layer x is a row vector of floating-point numbers
            % hidden layer input
            y0=x*V;
            % activation
            y=1./(1+exp(-y0*yita1));
            % output layer input
            output0=y*W;
            output=1./(1+exp(-output0*yita1));
            % compute the expected output
            tar_output=double(zeros(1,10));
            tar_output(number)=1.0;
            % compute the error and the adjustments to W and V by the formulas;
            % to avoid time-consuming for loops, direct matrix operations are used, which is more efficient
            delta=(tar_output-output).*output.*(1-output);
            delta_w=yita*repmat(y',1,10).*repmat(delta,64,1);
            tmp=sum((W.*repmat(delta,64,1))');
            tmp=tmp.*y.*(1-y);
            delta_v=yita*repmat(x',1,64).*repmat(tmp,256,1);
            % accumulate the mean squared error
            s=s+sum((tar_output-output).*(tar_output-output))/10;
            % update the weights
            W=W+delta_w;
            V=V+delta_v;
        end
    end
    s=s/train_number/train_num     % no semicolon: print the error at any time to watch convergence
    train_control_num              % no semicolon: print the iteration count to watch the running state
    s_record(train_control_num)=s; % record it
end
toc                    % end timing
plot(1:1000,s_record);
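A side note on the vectorized updates (my own observation, not from the original post): since y is 1x64 and delta is 1x10, the repmat expressions are just element-wise ways of writing outer products, so the updates can be written more compactly. Both forms compute the same matrices; this snippet assumes the variables from the training loop above are in scope:

% Equivalent, more compact form of the weight updates above.
delta_w = yita * (y' * delta);           % 64x10 outer product, same as the repmat version
tmp     = (W * delta') .* (y .* (1-y))'; % error propagated back to the hidden layer (64x1)
delta_v = yita * (x' * tmp');            % 256x64 outer product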
Test section: RECOGNIZE_HANDWRITING_NUMBERS_BY_SIMPLE_NN_TEST.M
correct_num=0;         % count of correct recognitions
incorrect_num=0;       % count of errors
test_number=9;         % how many digits in the test set: 9 in total, no 0
test_num=100;          % how many samples of each digit, 100 at most
% load w;              % the previously trained W can be saved and then loaded directly here
% load v;
% load yita1;
tic                    % start timing
for number=1:test_number
    readdir=['E:\Matlab\recognize_handwiting_numbers\test_lib\'];
    for num=1:test_num % how many images per digit
        photo_name=[num2str(number),num2str(num,'%05d'),'.png'];
        photo_index=[readdir,photo_name];
        photo_matrix=imread(photo_index);
        % resize
        photo_matrix=imresize(photo_matrix,[16 16]);
        % binarize; black is 1
        photo_matrix=uint8(photo_matrix<=230);
        % turn into a row vector
        tmp=photo_matrix';
        tmp=tmp(:);
        x=double(tmp');          % input layer input
        % compute the hidden layer
        y0=x*V;
        y=1./(1+exp(-y0*yita1)); % activation
        % compute the output layer
        o0=y*W;
        o=1./(1+exp(-o0*yita1));
        % the position of the maximum output is the recognized digit
        [o,index]=sort(o);
        if index(end)==number
            correct_num=correct_num+1;
        else
            incorrect_num=incorrect_num+1;
            % display the misrecognized digits; displaying takes time
            % figure(incorrect_num);
            % imshow((1-photo_matrix)*255);
            % title(num2str(number));
        end
    end
end
correct_rate=correct_num/test_number/test_num
toc                    % end timing
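A small simplification worth mentioning (my own suggestion, not in the original code): since we only need the position of the largest output, MATLAB's max does this directly, without sorting the whole vector:

% Equivalent way to pick the recognized digit from the output vector o.
[~, index] = max(o);   % index of the largest activation
if index == number
    correct_num = correct_num + 1;
end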
The image library is from http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/#download
Specifically, the Englishfnt.tgz library is used: the first 100 images of each digit are selected for training, and another 100 are taken for testing. If you need the image collection, you can email me and I'll send it.
Results:
After final parameter tuning, good results were obtained with η = 0.2 and η1 = 0.05. On my computer (i5-3210M, 4 GB of memory), 1000 iterations took 468 s and produced the following curve, where the horizontal axis is the number of training iterations and the vertical axis is the mean squared error. You can see that a very small mean squared error is in fact already reached at around 350 iterations, so as a compromise between time and performance, 350 iterations would be enough.
After training with η = 0.2 and η1 = 0.05, testing on the test set gives 89.11% accuracy. With further parameter tuning, I once obtained 90.56% accuracy, but it is hard to go higher; that is probably the bottleneck of this setup.
If you follow this article step by step and finally write the code yourself, you should come away with a good understanding of the traditional type of neural network, which should also help when you move on to convolutional neural networks.
Finally, my own understanding of neural networks, taking handwritten digit recognition as the example. The spatial arrangement of the black and white pixels in an image is the digit we see. If we could find a very powerful equation whose input is an image and whose return value is a digit indicating the recognition result, that would of course be ideal. Unfortunately, it is very hard to find an equation that maps the spatial arrangement of an image's pixels to a recognition result. The neural network accomplishes exactly this mapping: all the information about the spatial arrangement of the pixels is hidden inside the matrices W and V, so W and V actually embody that mapping relationship. The clever part of the neural network is that, when the mapping is unknown, we design a mechanism that lets it search for the mapping by itself; this is the process of learning and training, and the final W and V are the result of that learning.

The process is like a baby learning to recognize things. At the beginning, the baby's cognitive system is blank, like the just randomly initialized W and V. We show it a book, like giving the network an input, and tell it "this is a book", like telling the network the ideal output. It begins to understand, building a mapping from the object to the word "book". At first it makes mistakes: seeing something similar, like a stack of paper, it may also think that is a book. But after seeing more books, and being told each time "this is a book", it keeps revising the mapping, finally establishing a complete mapping from the object to the word "book", and the initial cognitive system is established.

Now we can also say what the "activation function" above is for. The various mapping relationships in nature are obviously complex; linear relationships cannot describe them. But the operations used inside the neural network are weighted additions, so without the activation function, every input to a later layer could be reached by linear operations on the signals of the earlier layers, and such a purely linear model is clearly not expressive enough. So we add a nonlinear factor to the neural network model, and that is the activation function.
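The point about linearity can be checked in a few lines of MATLAB (a sketch of my own, using the layer sizes from this article): without an activation function, two layers of weights collapse into a single equivalent weight matrix, so the "deep" network is no stronger than one linear layer.

% Without activation functions, two layers are just one linear map.
x = rand(1,256);            % an arbitrary input vector
V = rand(256,64);
W = rand(64,10);
o_two_layers = (x*V)*W;     % forward pass with no activation
o_one_layer  = x*(V*W);     % a single 256x10 weight matrix does the same
max(abs(o_two_layers - o_one_layer))  % difference is only floating-point noise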
Finally, thank you for reading such a long article. Where it falls short, comments are welcome so we can improve together.