The convolutional neural network (CNN) was the first multi-layer neural network structure to be successfully trained; it has strong fault tolerance, self-learning ability, and parallel processing ability.
I. Basic Principles
1. The CNN Algorithm Idea
A convolutional neural network can be regarded as a special case of the feedforward network: it simplifies and improves on the feedforward network mainly in its network structure, and in theory the backpropagation algorithm can be used to train it. Convolutional neural networks are widely used in speech recognition and image classification.
2. CNN Network Structure
A convolutional neural network is a multi-layer feedforward network in which each layer is composed of several two-dimensional planes, and each plane consists of multiple neurons.
The network input is a two-dimensional visual pattern. The middle layers of the network alternate between convolutional layers (C) and sampling layers (S). The output layer is fully connected, as in an ordinary feedforward network, and its dimensionality equals the number of classes in the classification task.
2.1 Input Layer
The input layer of a convolutional neural network receives a two-dimensional visual pattern directly, such as a two-dimensional image. No additional human effort is needed to select or design suitable features as input: the network automatically extracts features from the raw image data and learns the classifier, which greatly reduces manual preprocessing and helps the network learn the visual features most effective for the current classification task.
For clarity of presentation, the input layer here receives only a single two-dimensional visual pattern, i.e. a grayscale image. In practical applications the input layer can be a multi-channel image, such as a three-channel color image, or an image sequence, such as consecutive frames of a video.
2.2 Convolutional (C) Layer
The convolutional layer is a middle layer and serves as a feature-extraction layer. Each convolutional layer contains multiple convolutional neurons (C-elements). Each C-element is connected to a local receptive field at the corresponding position of the previous layer and extracts a local image feature; the specific feature extracted is embodied in the connection weights between the C-element and that local receptive field. Compared with a general feedforward network, this local connection scheme greatly reduces the number of network parameters.
To reduce the parameters further, convolutional neural networks also constrain the neurons within the same convolutional layer to use identical weights at different positions over the previous layer; that is, one convolutional layer extracts the same feature at different locations of the previous layer. This is called weight sharing. The assumption stems from practical experience in image processing: if a feature is useful somewhere in an image, it is likely to be useful elsewhere in the image. With this weight-shared connection design, the network parameters are further reduced, and the network learns location-independent, robust visual features for classification: no matter where in the image a learned feature appears, the network extracts it and uses it for classification.
By designing multiple convolutional layers, the network can extract multiple different features for the final classification task.
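Local connectivity and weight sharing can be illustrated with a minimal NumPy sketch (the function name and the toy edge-detection kernel are illustrative, not from the original text): a single 3x3 kernel, i.e. only 9 weights, is slid over every position of the image, so the same weights extract the same feature everywhere.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over every location of the image
    ('valid' mode: no padding). The kernel's weights are the ONLY
    parameters, no matter how large the image -- weight sharing."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            # local receptive field: each output neuron sees only a kh x kw patch
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel)
    return out

# Toy 6x6 image: left half 0, right half 1, with a simple vertical-edge kernel.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
k = np.array([[1., 0., -1.]] * 3)
fmap = conv2d_valid(img, k)     # 4x4 feature map; responds only at the edge
```

The output responds strongly (value -3) exactly where the kernel straddles the 0/1 boundary and is zero elsewhere, the same detector applied at every location.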
As shown in Figure 2, the local receptive field of the convolutional layer is 5x5. Yellow and red mark the correspondence between two convolutional neurons and the previous layer; each convolutional neuron has 5x5 connection weights to the previous layer.
2.3 Sampling (S) Layer
The sampling layer is a middle layer and serves as a feature-mapping layer. Each sampling layer contains multiple sampling neurons (S-elements), and each S-element is connected only to the local receptive field at the corresponding position of the previous layer. Unlike the C-element, all connection weights between an S-element and its local receptive field are fixed to a specific value and are not changed during training.
The sampling layer therefore introduces no new trainable parameters; moreover, by down-sampling the features produced by the previous layer it further reduces the network size. Down-sampling the local receptive fields of the previous layer also makes the network more robust to potential deformations of the input pattern.
As shown in Figure 3, the local receptive field of the sampling layer over the previous layer is set to 2x2; after the sampling layer, the dimensionality of the features extracted by the previous layer is reduced to 1/4 of the original.
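The fixed-weight down-sampling of the S-layer can be sketched in a few lines of NumPy (an illustrative mean-sampling variant; the function name is an assumption):

```python
import numpy as np

def downsample_mean(fmap, n=2):
    """Average each non-overlapping n x n block. The averaging weights are
    fixed (untrainable), mirroring the S-layer's fixed connection weights;
    each side shrinks by a factor of n, so dimensionality falls to 1/n^2."""
    h, w = fmap.shape
    assert h % n == 0 and w % n == 0
    return fmap.reshape(h // n, n, w // n, n).mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
s = downsample_mean(x)      # 2x2 output: 16 values reduced to 4 (1/4)
```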
2.4 Output Layer
The output layer of a convolutional neural network is fully connected, the same as in an ordinary feedforward network. The last hidden layer (which can be either a C layer or an S layer) is stretched into a vector and connected to the output layer in a fully connected manner.
This structure can fully exploit the relationship between the finally extracted features and the output class labels; in complex applications, the output layer can be designed as a multi-layer fully connected structure.
2.5 CNN Structure Example
Through its structure of local connections, weight sharing, and down-sampling, the complete network not only keeps its overall scale under control but also becomes robust to deformations of the input such as displacement, scaling, and distortion.
Consider a feedforward network with only one middle layer. Assume the input is a 1000x1000-pixel image, the hidden layer has 1 million neurons, and there are 10 output classes. If the network is fully connected, the number of parameters is 1000x1000x1000000 + 1000000x10 ≈ 10^12.
If a locally connected structure is used instead, with a local receptive field of 10x10, then each hidden neuron only connects to its own 10x10 image patch, and the number of parameters is 10x10x1000000 + 1000000x10 ≈ 10^8: the parameter count falls by about 4 orders of magnitude.
If weight sharing is further adopted, i.e. all 1 million hidden neurons use identical weights over their local receptive fields, then the number of parameters is 10x10 + 1000000x10 ≈ 10^7, a further reduction of one order of magnitude.
If down-sampling is then applied, with a sampling-layer local receptive field of 10x10, the number of parameters becomes 10x10 + 10000x10 ≈ 10^5, a further drop of two orders of magnitude.
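The four parameter counts above can be checked with a few lines of arithmetic (a sketch following the text's own assumptions):

```python
# 1000x1000 input image, 1e6 hidden neurons, 10 output classes,
# 10x10 local receptive fields, 10x10 down-sampling.
n_in     = 1000 * 1000      # input pixels
n_hidden = 1_000_000        # hidden-layer neurons
n_out    = 10               # output classes
field    = 10 * 10          # weights per local receptive field

fully_connected = n_in * n_hidden + n_hidden * n_out          # ~1e12
local           = field * n_hidden + n_hidden * n_out         # ~1e8
shared          = field + n_hidden * n_out                    # ~1e7
# 10x10 down-sampling leaves 1e6 / 100 = 10,000 neurons feeding the output:
pooled          = field + (n_hidden // 100) * n_out           # ~1e5
```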
Through these structural constraints, convolutional neural networks greatly reduce the number of parameters and realize a form of regularization built into the structure itself, making training relatively easy and the network less prone to overfitting. The structural constraints that reduce the network's weights are based on the invariance of the input patterns under translation, scaling, and distortion, so the trained network is more robust to these deformations.
3. CNN Network Learning
A convolutional neural network is, in general, a feedforward network, so in theory the backpropagation algorithm used to train feedforward networks can be used to train it. However, the convolutional neural network has structural particularities, and to train it efficiently the backpropagation algorithm needs to be adapted.
3.1 The Backpropagation Algorithm
Suppose the training sample set is {(x_n, t_n)}_{n=1}^N, and the dimensions of the input and the target category are N_I and N_O, respectively. With the sum-of-squares error as the loss function, the training error of the network over the whole training set is:
E = Σ_{n=1}^N E_n
Here y_n denotes the output of the network with x_n as input. The training error over the whole training set is the sum of the errors on all samples, and the training error on a single sample can be expressed as:
E_n = (1/2) ||y_n − t_n||^2
The training process of the backpropagation algorithm consists of a forward computation phase and a backward update phase. For an ordinary fully connected feedforward network, the forward phase computes the output of each layer from the input x as follows:
z^l = W^l a^{l−1} + b^l,    a^l = σ(z^l)
Here a^l denotes the activations of the layer-l neurons, σ(·) is the activation function of the neuron compute nodes, and z^l is the total input to the layer-l neurons. For the input layer a^1 = x, and for the output layer a^{n_l} = y, where n_l is the number of layers. Once a sample has been propagated through the network, the training error on that sample can be computed.
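The forward recursion can be sketched directly in NumPy (an illustrative two-layer network with sigmoid activations; the shapes and seed are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Layer-by-layer forward pass: z^l = W^l a^(l-1) + b^l, a^l = sigma(z^l).
    Returns ALL activations, since the backward pass needs them later."""
    acts = [x]                       # a^1 = x (the input layer)
    for W, b in zip(weights, biases):
        z = W @ acts[-1] + b         # total input of this layer
        acts.append(sigmoid(z))      # activation of this layer
    return acts

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
acts = forward(np.array([1.0, -1.0, 0.5]), Ws, bs)   # acts[-1] is y
```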
During the backward update phase, the derivative of the training error with respect to the network parameters is propagated from back to front. The derivative of the training error with respect to the bias b^l of an arbitrary layer is:
∂E_n/∂b^l = δ^l
The variable δ^l introduced here is exactly the derivative of the training error with respect to the bias b^l; its physical meaning is the rate of change of the training error with respect to the bias term, so it is called the sensitivity. For the output layer it takes the value:
δ^{n_l} = σ′(z^{n_l}) ⊙ (y_n − t_n)
where ⊙ denotes the element-wise product. For a middle layer l (1 < l < n_l), it takes the value:
δ^l = (W^{l+1})^T δ^{l+1} ⊙ σ′(z^l)
Given the sensitivity of each layer, the derivative of the training error with respect to that layer's connection weights W^l is:
∂E_n/∂W^l = δ^l (a^{l−1})^T
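The sensitivity recursion and weight gradients can be verified numerically. The sketch below (an illustrative two-layer sigmoid network, not the text's exact setup) implements the backward pass and checks one analytic gradient against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Ws, bs):
    acts = [x]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid(W @ acts[-1] + b))
    return acts

def backward(acts, t, Ws):
    """delta^L = sigma'(z^L) (.) (y - t); delta^l = (W^{l+1})^T delta^{l+1} (.) sigma'(z^l);
    dE/dW^l = delta^l (a^{l-1})^T, dE/db^l = delta^l.  For sigmoid, sigma'(z) = a(1-a)."""
    y = acts[-1]
    delta = y * (1 - y) * (y - t)            # output-layer sensitivity
    dWs, dbs = [], []
    for l in range(len(Ws) - 1, -1, -1):
        dWs.insert(0, np.outer(delta, acts[l]))
        dbs.insert(0, delta.copy())
        if l > 0:                            # propagate sensitivity backward
            a = acts[l]
            delta = (Ws[l].T @ delta) * a * (1 - a)
    return dWs, dbs

rng = np.random.default_rng(1)
Ws = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
bs = [np.zeros(3), np.zeros(1)]
x, t = np.array([0.5, -0.2]), np.array([1.0])
acts = forward(x, Ws, bs)
dWs, dbs = backward(acts, t, Ws)

# Central-difference check of one weight against the analytic gradient.
eps = 1e-6
Ws[0][0, 0] += eps
e_plus = 0.5 * np.sum((forward(x, Ws, bs)[-1] - t) ** 2)
Ws[0][0, 0] -= 2 * eps
e_minus = 0.5 * np.sum((forward(x, Ws, bs)[-1] - t) ** 2)
Ws[0][0, 0] += eps                           # restore
numeric = (e_plus - e_minus) / (2 * eps)
```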
3.2 The BP Algorithm for CNNs
Training a convolutional neural network with the BP algorithm follows the same process as for an ordinary feedforward network: a forward computation phase and a backward update phase. Structurally, convolutional layers and sampling layers alternate, followed by one or more fully connected layers at the end. For the fully connected layers, the forward and backward computations are identical to those of an ordinary feedforward network.
(1) Convolution layer
The inputs to a convolutional layer are the multiple two-dimensional feature maps of the previous layer. The convolutional layer convolves them with several learnable convolution kernels (the locally connected, weight-shared weight matrices), and the convolution results are passed through the neuron compute nodes to produce the layer's output two-dimensional feature maps. Each output feature map merges the convolution results of multiple input feature maps. The forward computation of the convolutional layer can be expressed as:
x_j^l = σ( Σ_{i ∈ M_j^l} x_i^{l−1} * k_{ij}^l + b_j^l )
Here M_j^l denotes the index set of input feature maps contributing to the j-th output feature map of layer l, the symbol "*" denotes the convolution operation, and all input feature maps in M_j^l share a single bias b_j^l.
Based on the basic BP algorithm, we now derive how to compute the gradients of the convolutional layer.
Assume the layer following convolutional layer l is a sampling layer l+1. The BP algorithm says that, to compute the weight gradients of layer l, we first need its sensitivities δ^l. To obtain the sensitivity of a layer-l node, we sum the sensitivities of the corresponding nodes of layer l+1 (those nodes of layer l+1 connected to the layer-l node in question) multiplied by the connection weights between them, and finally multiply by the derivative of the activation function evaluated at the layer-l node's total input u. However, because the sampling layer down-samples, each pixel (neuron) of the sampling layer corresponds to one block (the sampling window) of the convolutional layer's output feature map, so each node of an output feature map of layer l connects to only one node of the corresponding sensitivity map of sampling layer l+1. To compute δ^l efficiently, we can up-sample the sampling layer's sensitivity map (each pixel of a feature map carries a sensitivity, forming a sensitivity map) so that the up-sampled sensitivity map has the same size as the convolutional layer's sensitivity map; then take the element-wise product of the derivative of the convolutional layer's activation with the up-sampled sensitivity map. Since the sampling layer's down-sampling weight is a constant β, we simply multiply the result by this constant. This process must be repeated for every feature map of the convolutional layer and its corresponding sensitivity map in the sampling layer, and can be written as:
δ_j^l = β_j^{l+1} ( σ′(u_j^l) ⊙ up(δ_j^{l+1}) )
where u_j^l is the total input of the j-th map of layer l, analogous to z^l above.
Here up(·) denotes the up-sampling operation: if the down-sampling factor is n, it simply repeats each pixel n times in the horizontal and vertical directions. This function can be defined using the Kronecker product:
up(x) = x ⊗ 1_{n×n}
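The Kronecker-product definition of up(·) maps directly onto NumPy's `np.kron` (a minimal sketch; the variable names are illustrative):

```python
import numpy as np

def up(x, n):
    """up(x) = x (Kronecker product) 1_{n x n}: each sensitivity value is
    tiled over the n x n block of conv-layer pixels it was pooled from."""
    return np.kron(x, np.ones((n, n)))

delta_s = np.array([[1., 2.],
                    [3., 4.]])   # sensitivity map of the sampling layer
delta_up = up(delta_s, 2)        # 4x4, same size as the conv layer's map
```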
For each feature map of the convolutional layer, its sensitivity map can thus be computed. The gradient of the training error with respect to the bias is then obtained by summing over all nodes of the convolutional layer's sensitivity map:
∂E/∂b_j = Σ_{u,v} (δ_j^l)_{u,v}
Since many connection weights are shared, for a given weight we must compute the gradient over all connections that use that weight and then sum those gradients:
∂E/∂k_{ij}^l = Σ_{u,v} (δ_j^l)_{u,v} (p_i^{l−1})_{u,v}
where (p_i^{l−1})_{u,v} is the patch of input feature map i that was multiplied element-wise by k_{ij}^l to produce element (u, v) of output feature map j.
(2) Sampling layer
The sampling layer produces one down-sampled feature map for each input feature map: if there are N input feature maps there are N output feature maps, but each output map becomes smaller. The sampling layer can be expressed as:
x_j^l = σ( β_j^l · down(x_j^{l−1}) + b_j^l )
Here down(·) denotes the down-sampling function, which sums each non-overlapping n×n block of the input map to obtain one value of the output map, so the height and width of the output map are 1/n of the input's. Each output map has its own multiplicative bias β and additive bias b.
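The sampling-layer forward pass above can be sketched in NumPy (illustrative; β = 1/n² is chosen here so the block sum becomes a block mean):

```python
import numpy as np

def down(x, n):
    """Sum each non-overlapping n x n block; output is 1/n the size per side."""
    h, w = x.shape
    return x.reshape(h // n, n, w // n, n).sum(axis=(1, 3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sampling-layer map: x^l = sigma(beta * down(x^{l-1}) + b)
x_prev = np.ones((4, 4))
beta, b = 0.25, 0.0      # beta = 1/n^2 turns the block sum into a mean
out = sigmoid(beta * down(x_prev, 2) + b)
```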
Once the sensitivities of the sampling layer are known, the gradients of the bias parameters β and b can be computed. If the layer following the sampling layer is fully connected, the sampling layer's sensitivity maps can be computed directly with the standard BP algorithm.
If the layer following the sampling layer is a convolutional layer, then to compute the gradients of that layer's convolution kernels we must find which image block of the sampling layer corresponds to each pixel of the convolutional layer's sensitivity map. Since the weights connecting an input image block to an output pixel are exactly the (rotated) weights of the convolution kernel, the sensitivity relationship between the sampling layer and the convolutional layer can be computed efficiently with a convolution operation:
δ_j^l = σ′(u_j^l) ⊙ conv2( δ_j^{l+1}, rot180(k_j^{l+1}), 'full' )
In MATLAB, the 'full' convolution mode automatically zero-pads the image boundary. Based on the sampling layer's sensitivity map, the gradient of the training error with respect to the additive bias b_j is obtained by summing the elements of the sensitivity map:
∂E/∂b_j = Σ_{u,v} (δ_j^l)_{u,v}
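MATLAB's conv2(..., 'full') can be mimicked in NumPy by zero-padding and sliding the 180°-rotated kernel; a minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def conv2_full(x, k):
    """'full'-mode 2-D convolution: zero-pad x by (kh-1, kw-1) on every
    side, then correlate with the 180-degree-rotated kernel (which is
    what true convolution is)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    kr = k[::-1, ::-1]           # rot180: correlation -> convolution
    oh, ow = xp.shape[0] - kh + 1, xp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(xp[r:r+kh, c:c+kw] * kr)
    return out

delta_next = np.array([[1., 2.],
                       [3., 4.]])    # sensitivity map of the conv layer above
kernel = np.array([[1., 0.],
                   [0., 1.]])
# Full convolution grows the map back: (2+2-1) x (2+2-1) = 3 x 3,
# matching the size of the sampling layer's map in the formula above.
delta_full = conv2_full(delta_next, kernel)
```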
For the gradient of the training error with respect to the multiplicative bias β_j, note that the forward pass computes this layer's output from the down-sampled feature map, so that down-sampled map can be saved during the forward computation. Let d_j^l denote the down-sampled version of the previous layer's output feature map:
d_j^l = down(a_j^{l−1})
Then the gradient of the training error with respect to β_j is:
∂E/∂β_j = Σ_{u,v} (δ_j^l ⊙ d_j^l)_{u,v}
According to the structural characteristics of each layer of the convolutional neural network, the corresponding formula is chosen to obtain the gradient of the training error with respect to the trainable parameters, and the network parameters are then updated.
II. Algorithm Improvements
The training process of convolutional neural networks converges slowly and is time-consuming. Researchers have proposed many methods to improve their computational efficiency.
1. Designing new training strategies for convolutional neural networks
At present, convolutional neural networks are trained with the BP algorithm in a fully supervised manner. Some form of pre-training (supervised or unsupervised) can provide a better initial value for the network, greatly improving the convergence speed of the subsequent fully supervised BP training.
2. Using GPUs to accelerate the convolution operations
By implementing convolution in highly efficient C++ code running on GPUs, convolution operations can be sped up by roughly 3 to 10 times.
3. Using parallel computing to improve training and testing speed
A large convolutional neural network is divided into several small sub-networks, and the computations of the sub-networks are processed in parallel, which can effectively speed up the computation of the whole large network.
4. Using distributed computing to improve training and testing speed
This acceleration method uses tens of thousands of compute nodes, each of which performs only a small part of the whole network's computation. A scheduling node assigns each compute node its task; after all compute nodes have finished, the scheduling node aggregates their results.
With this acceleration the speed-up is dramatic, and some previously almost impossible network training tasks can be completed; the main drawbacks are that the associated programming is more complex and more computing resources are consumed.
5. Hardware implementations of convolutional neural networks
Convolutional neural networks have been successfully applied to more and more practical problems; if the convolutional layers and down-sampling layers could be implemented in hardware, their operational efficiency would improve yet again.
III. Simulation Experiments
This section first gives a simulation of CNN training, then introduces two well-known convolutional neural networks, detailing the concrete form the network structure takes in practical applications of CNNs.
1. Simulation of the convolutional neural network training algorithm
Algorithm 1: Training algorithm for convolutional neural networks based on the BP algorithm
Input: training samples {(x_n, t_n)}_{n=1}^N, convolutional neural network structure {h_l}_{l=1}^L, learning rate η
Output: Parameters of convolutional neural networks
Training process:
Initialize: set the convolution kernels and biases of all layers of the network to small random values;
While not converged
Draw a sample (x_n, t_n) from the training set
Forward computation process:
For l = 1 : L
If h_l is a convolutional layer: x_j^l = σ( Σ_{i ∈ M_j^l} x_i^{l−1} * k_{ij}^l + b_j^l )
If h_l is a down-sampling layer: x_j^l = σ( β_j^l · down(x_j^{l−1}) + b_j^l )
If h_l is a fully connected layer: a^l = σ(z^l), where z^l = W^l a^{l−1} + b^l
End for
Backpropagation process:
For l = L : 1
If h_l is a convolutional layer: δ_j^l = β_j^{l+1} ( σ′(u_j^l) ⊙ up(δ_j^{l+1}) )
If h_l is a down-sampling layer: δ_j^l = σ′(u_j^l) ⊙ conv2( δ_j^{l+1}, rot180(k_j^{l+1}), 'full' )
If h_l is a fully connected layer: δ^l = (W^{l+1})^T δ^{l+1} ⊙ σ′(z^l)
End for
Update parameters: for every trainable parameter θ, θ ← θ − η ∂E_n/∂θ
End while
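Algorithm 1's loop structure can be sketched end to end. The toy below substitutes small fully connected layers for the C/S layers (an assumption made for brevity; the data, layer sizes, and learning rate are also illustrative) but follows the same sample-by-sample cycle of forward pass, backward pass, and parameter update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Ws, bs):
    acts = [x]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid(W @ acts[-1] + b))
    return acts

def backward(acts, t, Ws):
    y = acts[-1]
    delta = y * (1 - y) * (y - t)                 # output-layer sensitivity
    grads = []
    for l in range(len(Ws) - 1, -1, -1):
        grads.insert(0, (np.outer(delta, acts[l]), delta.copy()))
        if l > 0:
            delta = (Ws[l].T @ delta) * acts[l] * (1 - acts[l])
    return grads

# Toy 2-D, 2-class data set (XOR-like labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
T = (X[:, 0] * X[:, 1] > 0).astype(float)

# Initialize: small random weights (Algorithm 1's initialization step).
Ws = [rng.normal(scale=0.5, size=(8, 2)), rng.normal(scale=0.5, size=(1, 8))]
bs = [np.zeros(8), np.zeros(1)]
eta = 0.5

def epoch_loss():
    return sum(0.5 * np.sum((forward(x, Ws, bs)[-1] - np.array([t])) ** 2)
               for x, t in zip(X, T))

loss_before = epoch_loss()
for _ in range(150):                 # "while not converged" (fixed budget here)
    for x, t in zip(X, T):           # draw a sample; forward, backward, update
        acts = forward(x, Ws, bs)
        for l, (dW, db) in enumerate(backward(acts, np.array([t]), Ws)):
            Ws[l] -= eta * dW
            bs[l] -= eta * db
loss_after = epoch_loss()
```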
2. Practical application examples of convolutional neural networks
(1) LeNet, the convolutional neural network developed by Yann LeCun
The LeNet-based handwritten digit classification system has been extremely successful commercially; an online demonstration of LeNet-5 is available on the web and can accurately recognize a wide variety of complex handwritten digits.
As shown in Figure 4, the LeNet-5 input layer consists of 32x32 sensing nodes, each receiving one pixel of the input 32x32 grayscale image.
The first hidden layer of the network is a convolutional layer with 6 convolution kernels of size 5x5 (the size of the local receptive field). After this layer, the input image is convolved into 6 feature maps of size 28x28: the convolution uses the 'valid' mode, so output map size = input map size − (kernel size − 1), i.e. 28 = 32 − (5 − 1).
The second hidden layer is a sampling layer with a 2x2 local receptive field: every 2x2 block of pixels is down-sampled to 1 pixel. After this layer the feature maps shrink to 14x14, and their number is unchanged, still 6.
The third hidden layer, C3, is a convolutional layer with 16 kernels of size 5x5, so its output feature maps have size 10x10 (10 = 14 − 5 + 1). Note that the 6 input maps become 16 output maps: each C3 kernel spans several of the S2 maps, so a C3 neuron's receptive field covers up to 5x5x6 input nodes, not just 5x5.
The 16 feature maps of C3 are not each connected to every one of S2's feature maps; each may be connected to only a few of them.
Figure 5 shows LeNet-5's specific connection pattern: the vertical axis is the input feature map index, the horizontal axis is the output feature map index, and an X mark indicates that a connection exists between the corresponding input and output feature maps.
The fourth hidden layer, S4, is a down-sampling layer with a 2x2 local receptive field; it outputs 16 feature maps of size 5x5.
The fifth hidden layer, C5, is the last convolutional layer, with 120 convolution kernels of size 5x5. After the convolution, C5 expands the resulting feature maps into a vector of size 120.
The sixth hidden layer, F6, is a fully connected layer with 84 nodes, fully connected to the 120 nodes of C5.
The network's output layer has 10 nodes, corresponding to the number of classes in the problem (the 10 digits 0–9).
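The layer sizes quoted above follow directly from the two size rules in the text (valid convolution: out = in − (k − 1); 2x2 down-sampling: out = in / 2); a quick check:

```python
# Tracing LeNet-5's feature-map sizes through the network.
def conv(size, k=5):
    return size - (k - 1)    # 'valid' convolution

def pool(size, n=2):
    return size // n         # n x n down-sampling

s_in = 32              # input image
c1 = conv(s_in)        # C1: 6 maps of 28x28
s2 = pool(c1)          # S2: 6 maps of 14x14
c3 = conv(s2)          # C3: 16 maps of 10x10
s4 = pool(c3)          # S4: 16 maps of 5x5
c5 = conv(s4)          # C5: 120 maps of 1x1 -> a 120-dim vector
```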
(2) The convolutional neural network of Geoffrey Hinton's group in the ILSVRC2012 competition
The image classification algorithm based on this network won first place in ImageNet ILSVRC2012 and greatly improved the classification results; it is a milestone work for convolutional neural networks on computer vision problems.
As shown in Figure 6, the network structure is more complex than LeNet-5's; the input is a 224x224x3 color image.
Each convolutional and sampling layer of the network splits its input feature maps into two parts for parallel processing. The first hidden layer has 96 convolution kernels, split into two groups of 48, and convolves with a stride of 4 pixels, greatly reducing the dimensions of the output feature maps.
The network's sampling layers do not use the mean pooling of LeNet-5; max pooling is used instead.
The last two hidden layers are fully connected, allowing the network to learn more complex relationships between the features extracted by the convolutional and sampling layers and the outputs.
The output layer has 1000 dimensions, meaning the input image is finally classified into one of 1000 classes.
IV. Characteristics of the Algorithm
1. Advantages
The convolutional neural network was the first successfully trained neural network with a multi-layer structure, and it has clear advantages over other neural networks.
First, the structural design of convolutional neural networks was inspired by neuroscientists' research on the visual nervous system, giving it a solid biological basis.
Second, convolutional neural networks can automatically learn the relevant features directly from the raw input data, eliminating the feature-design step required by general machine learning algorithms, saving a great deal of time, and often discovering more effective features.
Third, the structure of convolutional neural networks realizes a form of regularization: the number of trainable parameters is kept very small, so the network is not only easier to train but also generalizes very well.
Finally, the structural characteristics of convolutional neural networks make them adaptable to changes in the input such as noise, deformation, and distortion.
Early convolutional neural networks were used to process two-dimensional images; today they are also widely applied to one-dimensional sound signals, three-dimensional image sequences, and video signals, demonstrating strong extensibility.
2. Disadvantages
Convolutional neural networks also have obvious shortcomings.
First, because of its restrictive structure, a convolutional neural network is far weaker than the corresponding fully connected network in memory capacity and expressive power.
Second, because of the convolution operations, computation during training and testing is inefficient and time-consuming, nearly three times slower than the corresponding fully connected network.
Third, there is no general theory or method to guide structure design, so building an effective convolutional neural network structure takes a great deal of time.
Visual machine Learning notes------CNN Learning