I. CNN's biological principles, applications and advantages
CNN is modeled on the local receptive fields of the neurons in the human visual system. It is widely used in image processing, pattern recognition, machine vision and speech recognition, and it is highly invariant to translation, scaling, rotation and other deformations of the image. In short, the core idea of CNN is to combine three ideas (local receptive fields, weight sharing, and sub-sampling in time or space) to obtain some degree of invariance to translation, scaling and rotation.
II. CNN's Network Structure
CNN is a multi-layer neural network in which each layer consists of several two-dimensional planes, and each plane consists of multiple independent neurons.
Figure 1 CNN's Network structure
The C layers are feature extraction layers: the input of each neuron is connected to a local receptive field in the previous layer, and it extracts a local feature. Once a local feature has been extracted, its positional relationship to the other features is also fixed. The S layers are feature mapping layers: each computational layer of the network consists of multiple feature maps, each feature map is a plane, and the weights of all neurons in a plane are shared. Each feature extraction layer (C layer) in a CNN is followed by a feature mapping layer (S layer). This characteristic two-stage feature extraction structure gives CNN high tolerance to distortions of the input samples.
According to Figure 1, the input image is first convolved with 3 convolution kernels (filters) plus bias terms, producing the 3 feature maps of the C1 layer. A sub-sampling operation on each feature map then yields the 3 feature maps of the S2 layer. These feature maps are convolved again to obtain the C3 layer, which is sub-sampled in turn to obtain S4. Finally, the pixel values are concatenated into a long vector that is fed into a traditional neural network to produce the output.
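The pipeline above can be traced as simple shape arithmetic. The sketch below is illustrative: the 5*5 kernel and 2*2 sub-sampling sizes are assumptions (they match LeNet-5 later in this article), not values stated for Figure 1.

```python
# Shape walk-through of a conv -> subsample -> conv -> subsample -> flatten
# pipeline like Figure 1. Kernel/pool sizes here are assumptions.

def conv_out(size, kernel):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output side length of non-overlapping pooling."""
    return size // pool

size = 32                  # input image: 32x32
size = conv_out(size, 5)   # C1: 5x5 kernels -> 28x28
size = pool_out(size, 2)   # S2: 2x2 sub-sampling -> 14x14
size = conv_out(size, 5)   # C3: 5x5 kernels -> 10x10
size = pool_out(size, 2)   # S4: 2x2 sub-sampling -> 5x5
flat = size * size * 3     # 3 feature maps flattened into one long vector
print(size, flat)          # 5 75
```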
III. CNN and Weight Sharing
The biggest feature of CNN is that it greatly reduces the number of parameters the neural network needs to train by using local receptive fields and weight sharing. So what exactly is weight sharing? Let's look at a concrete example.
Figure 2 Weight sharing
According to the left image of Figure 2, suppose we have an image of 1000*1000 pixels and 1 million hidden-layer neurons. With full connection (that is, each hidden-layer neuron connected to every pixel of the image), there are 1000*1000*1000000 = 10^12 connections, i.e. 10^12 parameters. According to the biological principle, however, each neuron only perceives a local region of the image; at a higher level, the responses of neurons covering different regions can then be combined to obtain global information. In doing so we reduce the number of connections, that is, the number of parameters the network needs to train.
According to the right image of Figure 2, assume the local receptive field is 10*10, so each hidden-layer neuron is connected only to a 10*10 patch of the image. The 1 million hidden-layer neurons then have only 10*10*1000000 = 10^8 parameters. That is still a lot of parameters; what else can we do?
We know that each hidden-layer neuron is connected to a 10*10 image region, which means each hidden-layer neuron has 10*10 = 100 connection parameters. What if these 100 parameters were the same for every hidden-layer neuron, that is, every hidden-layer neuron convolved the image with the same convolution kernel? Then we would have only 100 parameters in total. Regardless of the number of hidden-layer neurons, the connection between the two layers requires only 100 parameters. This is the benefit of weight sharing. But doesn't a question arise here?
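The three parameter counts discussed above (full connection, local connection, and local connection with weight sharing) can be checked with a few lines of arithmetic:

```python
# Parameter counts for the 1000x1000 image example above.
pixels = 1000 * 1000         # input pixels
hidden = 1_000_000           # hidden-layer neurons
field = 10 * 10              # local receptive field (10x10)

full_connection = pixels * hidden   # every neuron sees every pixel
local_connection = field * hidden   # every neuron sees one 10x10 patch
shared_weights = field              # one 10x10 kernel shared by all neurons

print(full_connection)   # 1000000000000 (10^12)
print(local_connection)  # 100000000 (10^8)
print(shared_weights)    # 100
```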
In fact, by doing so we extract only one feature of the image (for example, an edge in one direction). What if we want to extract various features of the image? The answer is to add more convolution kernels (filters). Suppose we now have 100 convolution kernels, each with different parameters, each extracting a different characteristic of the input image. Convolving the image with one kernel produces a map of one image characteristic, which we call a feature map. Thus 100 convolution kernels produce 100 feature maps, and these 100 feature maps form one layer of neurons. Now we can calculate how many parameters this layer has: 100 convolution kernels * 100 shared parameters per kernel = 100*100 = 10,000 parameters.
The above shows that the number of hidden-layer parameters is independent of the number of hidden-layer neurons; it depends only on the size and number of the convolution kernels. What, then, does the number of hidden-layer neurons depend on? It depends on the size of the original image (the number of input neurons), the kernel size, and the stride with which the kernel slides over the image. For example, suppose the image is 1000*1000 pixels and the kernel size is 10*10 with a stride of 10 (so the kernel positions do not overlap); the number of hidden-layer neurons is then (1000*1000)/(10*10) = 100*100. This is for a single convolution kernel, i.e. the number of neurons in one feature map. With 100 convolution kernels, i.e. 100 feature maps, the number of neurons is 100 times larger. Of course, if the stride is 8, adjacent kernel positions overlap by 2 pixels. Note that the bias of each neuron has not been considered above, so 1 must be added to the number of weight parameters per kernel. [Each convolution kernel shares one bias term; do multiple kernels share a single bias? No: each convolution kernel has its own shared bias term.]
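The neuron count per feature map follows from the standard sliding-window formula. A small sketch, using the numbers from the paragraph above:

```python
def feature_map_neurons(image, kernel, stride):
    """Neurons in one feature map when a kernel x kernel filter slides
    over an image x image input with the given stride (assumes the
    kernel positions fit exactly)."""
    steps = (image - kernel) // stride + 1
    return steps * steps

# Non-overlapping case from the text: stride equals the kernel size.
print(feature_map_neurons(1000, 10, 10))  # 10000, i.e. 100*100

# With stride 8, the 10x10 windows overlap by 2 pixels and the
# feature map becomes slightly larger.
print(feature_map_neurons(1000, 10, 8))
```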
IV. CNN Example: LeNet-5
LeNet-5 is a typical convolutional neural network used to recognize digits; it has 7 layers in total. See http://yann.lecun.com/exdb/lenet/index.html.
Figure 3 LeNet-5
In practical applications, multiple convolution layers are often used, followed by fully connected layers for training. The purpose of multi-layer convolution is that the features learned by a single convolution layer are often local; the higher the layer, the more global the learned features become. Overall, LeNet-5 has 7 layers, not counting the input. Each layer contains multiple feature maps; each feature map extracts one feature of the input via a convolution kernel, and each feature map contains multiple neurons. The input image size is 32*32.
Figure 4 Convolution and sub-sampling process
Analytical:
(1) Convolution process
Convolve the input image with a trainable filter and add a bias to obtain the convolution layer.
(2) Sub-sampling process
The 4 pixels of each neighborhood are summed into one pixel, weighted by a scalar, a bias is added, and the result is passed through a sigmoid activation function to produce a feature map reduced roughly by a factor of 4 (each dimension halved).
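The two steps above can be sketched in NumPy. This is a minimal illustration, not LeNet-5's actual code: the 8*8 input, the 3*3 kernel, and the weight/bias values are arbitrary assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convolve2d(image, kernel, bias):
    """'Valid' convolution plus a bias, as in the convolution step above."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel) + bias
    return out

def subsample(feature_map, weight, bias):
    """Sum each non-overlapping 2x2 neighborhood, scale by a trainable
    scalar, add a bias, and apply the sigmoid, as in the sub-sampling
    step above."""
    h, w = feature_map.shape
    pooled = (feature_map[0:h:2, 0:w:2] + feature_map[1:h:2, 0:w:2]
              + feature_map[0:h:2, 1:w:2] + feature_map[1:h:2, 1:w:2])
    return sigmoid(weight * pooled + bias)

image = np.random.rand(8, 8)
kernel = np.random.rand(3, 3)                     # illustrative 3x3 filter
conv = convolve2d(image, kernel, bias=0.1)        # -> 6x6
pooled = subsample(conv, weight=0.25, bias=0.0)   # -> 3x3
print(conv.shape, pooled.shape)
```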
Each layer of LeNet-5 is explained in detail below:
Figure 5 LeNet-5 structure diagram
(1) The C1 layer is a convolution layer (an important property of the convolution operation is that it enhances features of the original signal and reduces noise). It consists of 6 feature maps; each neuron in a feature map is connected to a 5*5 neighborhood of the input image. The size of each feature map is 28*28. C1 has (5*5+1)*6 = 156 parameters and (5*5+1)*6*28*28 = 122,304 connections. [The parameter count equals the kernel parameters plus one bias, multiplied by the number of feature maps; the connection count equals the parameter count multiplied by the feature-map size. The convolution kernel is simply a set of weight parameters.]
(2) The S2 layer is a down-sampling layer. (Why down-sample? By the principle of local correlation, sub-sampling the image reduces the amount of data to process while preserving useful information.) It consists of 6 feature maps of size 14*14. Each neuron in a feature map is connected to a 2*2 neighborhood of the corresponding feature map in C1. Because the 2*2 neighborhoods of different neurons do not overlap, each feature map in S2 is 1/4 the size of the corresponding feature map in C1. The S2 layer has (1+1)*6 = 12 parameters (one multiplicative coefficient and one bias per feature map) and 14*14*(4+1)*6 = 5,880 connections. [What is the difference between a pooling layer and a down-sampling layer? They are the same.]
(3) The C3 layer is a convolution layer consisting of 16 feature maps of size 10*10, obtained by convolving the S2 layer with 5*5 kernels. However, the way the convolution is performed changes, because the input to C3 is 6 feature maps, whereas the input to C1 was a single image. Each feature map in C3 is connected to all or several of the 6 feature maps in S2, meaning that each feature map of this layer is a different combination of the feature maps extracted by the previous layer (this choice is not unique).
A question naturally arises: why not connect every feature map in S2 to every feature map in C3? On the one hand, the incomplete connection scheme keeps the number of connections within a reasonable range. On the other hand, it breaks the symmetry of the network: because different feature maps receive different inputs, they are forced to extract different features.
LeCun's scheme is as follows: the first 6 feature maps of C3 take as input subsets of 3 adjacent feature maps in S2; the next 6 take subsets of 4 adjacent feature maps in S2; the next 3 take subsets of 4 non-adjacent feature maps in S2; and the last one takes all the feature maps in S2 as input. As shown below:
Fig. 6 S2 layer and C3 layer
The C3 layer has (25*3+1)*6 + (25*4+1)*6 + (25*4+1)*3 + (25*6+1)*1 = 1,516 parameters and ((25*3+1)*6 + (25*4+1)*6 + (25*4+1)*3 + (25*6+1)*1) * (10*10) = 151,600 connections.
(4) The S4 layer is a down-sampling layer consisting of 16 feature maps of size 5*5. Each neuron in a feature map is connected to a 2*2 neighborhood of the corresponding feature map in C3, just as between C1 and S2. The S4 layer has 16*(1+1) = 32 parameters and 5*5*(4+1)*16 = 2,000 connections.
(5) The C5 layer is a convolution layer that convolves the S4 layer with 5*5 kernels, producing 120 feature maps of size 1*1; in other words, S4 and C5 are fully connected. The C5 layer has (5*5*16+1)*120 = 48,120 parameters and 48,120*1*1 = 48,120 connections.
(6) The F6 layer has 84 neurons (this number comes from the design of the output layer) and is fully connected to C5. Like a classic neural network, the F6 layer computes the dot product between its input vector and its weight vector, adds a bias, and passes the result through a sigmoid activation function to produce the output. The F6 layer has (120+1)*84 = 10,164 parameters and 10,164 connections.
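The per-layer parameter counts derived above can be verified with a few lines of arithmetic (the output layer is excluded, as its parameter count is not given in the text):

```python
# Trainable-parameter counts for LeNet-5 layers C1 through F6,
# matching the per-layer figures derived above.
c1 = (5 * 5 + 1) * 6                           # 156
s2 = (1 + 1) * 6                               # 12: coefficient + bias per map
c3 = ((25 * 3 + 1) * 6 + (25 * 4 + 1) * 6
      + (25 * 4 + 1) * 3 + (25 * 6 + 1) * 1)   # 1516
s4 = (1 + 1) * 16                              # 32
c5 = (5 * 5 * 16 + 1) * 120                    # 48120
f6 = (120 + 1) * 84                            # 10164

print(c1, s2, c3, s4, c5, f6)
print(c1 + s2 + c3 + s4 + c5 + f6)  # 60000 trainable parameters in total
```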
(7) The output layer consists of Euclidean radial basis function (RBF) units, one per class [0-9], each with 84 inputs.
Note: the mapping from one plane to the next can be regarded as a convolution operation, and each S layer can be regarded as a blur filter, performing a second stage of feature extraction. From hidden layer to hidden layer the spatial resolution decreases while the number of feature maps per layer increases, which allows more feature information to be detected.
V. CNN Training Process
CNN's training process is similar to the BP algorithm; it consists of 4 steps divided into two stages, as follows:
1. First stage: forward propagation
(1) Take a sample from the training set and feed it into the network;
(2) Compute the corresponding actual output.
2. Second stage: backward propagation
(1) Compute the difference between the actual output and the corresponding ideal output;
(2) Adjust the weight matrices by backpropagating the error so as to minimize it.
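The four steps can be sketched on a toy problem. This is a minimal illustration with a single sigmoid unit and invented data, not a CNN: a real CNN backpropagates the error through its convolution and sub-sampling layers in the same spirit.

```python
import numpy as np

# Toy data: label is 1 when the two inputs sum to a positive number.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)   # weights to be trained
b = 0.0           # bias
lr = 0.5          # learning rate

def forward(X):
    """Forward propagation: compute the actual output."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

losses = []
for step in range(200):
    out = forward(X)              # stage 1: forward propagation
    err = out - y                 # stage 2, step 1: actual - ideal output
    grad_w = X.T @ err / len(X)   # stage 2, step 2: backpropagate the error
    grad_b = err.mean()
    w -= lr * grad_w              # adjust the weights to reduce the error
    b -= lr * grad_b
    losses.append(np.mean(err ** 2))

print(losses[0], losses[-1])      # the error shrinks during training
```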