http://blog.csdn.net/diamonjoy_zone/article/details/70576775
Reference:
1. Inception [V1]: Going Deeper with Convolutions
2. Inception [V2]: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
3. Inception [V3]: Rethinking the Inception Architecture for Computer Vision
4. Inception [V4]: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
1. Preface
The NIN model introduced in the previous article made a notable contribution to reshaping the traditional CNN structure: the network built from mlpconv layers and global average pooling achieved good results on several datasets while keeping the number of trainable parameters to about 1/12 of AlexNet's.
This article introduces GoogLeNet, the best-performing entry in ILSVRC 2014, and its core building block, the Inception module. The early V1 structure borrowed design ideas from NIN and modified the traditional convolutional layer; the later versions, up to V4, kept improving on the main problems that limit the performance of deep neural networks:
- The parameter space is large, so the model overfits easily when the training data is limited;
- The network structure is complex and computing resources are insufficient, making it hard to apply in practice;
- Deep network structures are prone to vanishing gradients (gradient dispersion), degrading model performance.
2. Inception
GoogLeNet modified the traditional convolutional layer and proposed a structure called Inception, which increases both the depth and the width of the network and improves the performance of deep neural networks.
The design ideas of its successive versions are as follows:
2.1 Inception V1
Naive Inception
The Inception module was proposed mainly from the consideration that convolution kernels of several different sizes can enhance the adaptability of the network. The paper uses 1*1, 3*3 and 5*5 convolution kernels in parallel, and also adds a 3*3 max pooling branch.
The paper then points out a problem with this naive structure: the number of output filters of each Inception module is the sum over all of its branches, so stacking multiple Inception modules quickly leads to a very large number of model parameters and a heavy dependence on computing resources.
The 1*1 convolution layer, equivalent to the mlpconv of the NIN model, can not only combine information across channels and improve the expressive power of the network, but can also reduce the output dimensionality. The paper therefore proposes the Inception module with dimension reduction, which reduces the number of filters and the complexity of the model without losing its feature-representation capability:
Inception Module
The 4 branches of the Inception module are finally combined by an aggregation operation: they are concatenated along the output-channel dimension (in TensorFlow this is done with tf.concat, e.g. tf.concat(3, [...]) in the old API, i.e. concatenation on axis 3).
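As an illustration (a minimal sketch against the TensorFlow 1.x layers API, not the actual GoogLeNet code), an Inception module with dimension reduction can be written as four parallel branches whose outputs are concatenated on the channel axis; the branch filter counts below follow the Inception (3a) configuration from the paper:

import tensorflow as tf

def conv_relu(x, filters, ksize, name):
    # Convolution + ReLU; 'same' padding keeps the spatial size so branches can be concatenated.
    return tf.layers.conv2d(x, filters, ksize, padding='same', activation=tf.nn.relu, name=name)

def inception_3a(x):
    b1 = conv_relu(x, 64, 1, 'b1_1x1')                                   # 1x1 branch
    b2 = conv_relu(conv_relu(x, 96, 1, 'b2_reduce'), 128, 3, 'b2_3x3')   # 1x1 reduce, then 3x3
    b3 = conv_relu(conv_relu(x, 16, 1, 'b3_reduce'), 32, 5, 'b3_5x5')    # 1x1 reduce, then 5x5
    pool = tf.layers.max_pooling2d(x, 3, 1, padding='same')
    b4 = conv_relu(pool, 32, 1, 'b4_pool_proj')                          # 3x3 max pool, then 1x1 projection
    # Concatenate on the channel axis (axis 3 for NHWC); tf.concat(values, axis) in TF 1.0+,
    # tf.concat(axis, values) in older releases as quoted above.
    return tf.concat([b1, b2, b3, b4], axis=3)                           # 64 + 128 + 32 + 32 = 256 channels

x = tf.placeholder(tf.float32, [None, 28, 28, 192])   # input feature map of Inception (3a)
y = inception_3a(x)                                   # -> [None, 28, 28, 256]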
The complete GoogLeNet places Inception modules after the traditional convolution and pooling layers at the front of the network. Compared with AlexNet, the number of layers increases but the number of parameters decreases, because most of AlexNet's parameters are concentrated in its fully connected layers. GoogLeNet finally achieved a 6.67% top-5 error rate on ImageNet.
2.2 Inception V2
Inception V2 learned from VGG to use two 3x3 convolutions in place of one large 5x5 convolution, which reduces the number of parameters while adding an extra non-linear transformation, making the CNN better at learning features:
Two 3x3 convolution layers are functionally similar to one 5x5 convolution layer
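A quick back-of-the-envelope check (a sketch assuming C input and C output channels, ignoring biases) shows where the parameter savings come from:

C = 192                                 # example channel count; the ratio is the same for any C
params_5x5 = 5 * 5 * C * C              # one 5x5 convolution layer
params_two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 convolutions, same 5x5 receptive field
print(params_5x5, params_two_3x3, params_two_3x3 / params_5x5)  # ratio 18/25 = 0.72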
In addition, this paper proposed the famous Batch Normalization (hereafter BN) method. BN is a very effective regularization method that can speed up the training of large convolutional networks many times over, and the classification accuracy after convergence is also significantly improved. When applied to a layer of a neural network, BN standardizes (normalizes) each mini-batch of data internally, normalizing the output towards an N(0, 1) normal distribution and reducing Internal Covariate Shift (the change in the distribution of internal neuron activations).
Note: in TensorFlow 1.0.0 an image can be standardized with tf.image.per_image_standardization(); in older versions the function was called tf.image.per_image_whitening().
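For example (a small sketch against the TensorFlow 1.x API), the call operates on a single 3-D image tensor and rescales it to zero mean and approximately unit variance:

import tensorflow as tf

image = tf.placeholder(tf.float32, [224, 224, 3])          # one HWC image
standardized = tf.image.per_image_standardization(image)   # (x - mean) / adjusted_stddev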
The BN paper points out that in traditional deep neural network training, the input distribution of each layer keeps changing, which makes training difficult and forces the use of a very small learning rate. Applying BN to every layer solves this problem effectively: the learning rate can be increased many times over, the number of iterations needed to reach the previous accuracy drops to 1/14, and training time is greatly shortened. Training can then continue beyond the previous accuracy, finally reaching a performance far better than Inception V1: a top-5 error rate of 4.8%, already better than human-level performance on this task. Because BN also acts as a regularizer in a certain sense, dropout and LRN can be reduced or removed, simplifying the network structure.
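As a rough sketch of what BN computes inside a layer (using generic TensorFlow 1.x ops rather than the actual Inception V2 code), each mini-batch is normalized with its own per-channel statistics and then rescaled by the learned scale and shift parameters gamma and beta:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 192])   # pre-activation feature map
mean, variance = tf.nn.moments(x, axes=[0, 1, 2])     # per-channel statistics of the mini-batch
gamma = tf.Variable(tf.ones([192]))                   # learned scale
beta = tf.Variable(tf.zeros([192]))                   # learned shift
bn = tf.nn.batch_normalization(x, mean, variance, beta, gamma, variance_epsilon=1e-3)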
2.3 Inception V3
Splitting a 3x3 convolution into a 1x3 convolution and a 3x1 convolution
First, the idea of factorizing into small convolutions is introduced: a larger two-dimensional convolution is split into two smaller one-dimensional convolutions, for example a 7x7 convolution into a 1x7 convolution followed by a 7x1 convolution, or a 3x3 convolution into a 1x3 convolution followed by a 3x1 convolution, as shown in the figure. On the one hand this saves a large number of parameters, speeds up computation and reduces overfitting (splitting a 7x7 convolution into 1x7 and 7x1 is more economical than splitting it into three 3x3 convolutions); on the other hand the extra layer adds a non-linearity that extends the model's expressive power. The paper points out that this asymmetric factorization gives better results than a symmetric split into several identical smaller kernels: it can handle more and richer spatial features and increases feature diversity.
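A small sketch of this factorization (TensorFlow 1.x layers API, channel counts chosen arbitrarily for illustration), replacing one 7x7 convolution with a 1x7 followed by a 7x1:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 17, 17, 128])
# Direct 7x7 convolution: 7 * 7 * 128 * 128 = 802,816 weights.
direct = tf.layers.conv2d(x, 128, (7, 7), padding='same', activation=tf.nn.relu)
# Factorized 1x7 then 7x1: 2 * 7 * 128 * 128 = 229,376 weights for the same receptive field,
# with an extra ReLU non-linearity in between.
f = tf.layers.conv2d(x, 128, (1, 7), padding='same', activation=tf.nn.relu)
factorized = tf.layers.conv2d(f, 128, (7, 1), padding='same', activation=tf.nn.relu)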
On the other hand, Inception V3 optimizes the structure of the Inception module itself: there are now three variants operating on 35x35, 17x17 and 8x8 feature maps. These Inception modules only appear in the later part of the network, while the front remains ordinary convolution layers. Moreover, besides using branches inside the Inception module, Inception V3 also uses branches within branches (in the 8x8 variant), which can be described as Network in Network in Network. The final result is a top-5 error rate of 3.5%.
2.4 Inception V4
Compared with V3, Inception V4 mainly incorporates Microsoft's ResNet, further reducing the error rate to 3.08%.
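The core idea can be sketched in a few lines (a simplified illustration, not the published Inception-ResNet block, which uses several parallel branches): an Inception-style branch is projected back to the input depth with a 1x1 convolution and added onto its input through a shortcut connection:

import tensorflow as tf

def inception_resnet_block(x, in_channels):
    # Simplified single branch standing in for the parallel Inception branches.
    branch = tf.layers.conv2d(x, 32, 1, padding='same', activation=tf.nn.relu)
    branch = tf.layers.conv2d(branch, 32, 3, padding='same', activation=tf.nn.relu)
    # Linear 1x1 projection back to the input depth so the element-wise addition is valid.
    branch = tf.layers.conv2d(branch, in_channels, 1, padding='same', activation=None)
    return tf.nn.relu(x + branch)   # residual (shortcut) connection

x = tf.placeholder(tf.float32, [None, 35, 35, 256])
y = inception_resnet_block(x, 256)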
3. GoogLeNet V1
For example, in TFLearn's googlenet.py, the branch structure of Inception (3a) is defined as follows, and the branch feature maps are combined with the merge function:
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge

network = input_data(shape=[None, 227, 227, 3])
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name='conv1_7_7_s2')
pool1_3_3 = max_pool_2d(conv1_7_7, 3, strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64, 1, activation='relu', name='conv2_3_3_reduce')
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192, 3, activation='relu', name='conv2_3_3')
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')
# Inception (3a): branch filter counts follow Table 1 of the GoogLeNet paper.
inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96, 1, activation='relu', name='inception_3a_3_3_reduce')
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128, filter_size=3, activation='relu', name='inception_3a_3_3')
inception_3a_5_5_reduce = conv_2d(pool2_3_3, 16, filter_size=1, activation='relu', name='inception_3a_5_5_reduce')
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name='inception_3a_5_5')
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1)
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')
# merge the four inception_3a branches along the channel axis
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode='concat', axis=3)
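Since the merge concatenates the four branch outputs along the channel axis, inception_3a_output has 64 + 128 + 32 + 32 = 256 channels on a 28x28 feature map, matching the Inception (3a) row of the GoogLeNet paper.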
Finally, the 7*7*1024 output of Inception (5b) is passed through 7*7 average pooling, 40% dropout, and a softmax output layer:
pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
pool5_7_7 = dropout(pool5_7_7, 0.4)
loss = fully_connected(pool5_7_7, 17, activation='softmax')
The final output has 17 channels because the classification task here is the Oxford 17 Category Flower Dataset, which has 17 classes.
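To actually train on that dataset, the usual TFLearn pattern (a hedged sketch following the library's standard API, assuming the loss layer defined above and the remaining GoogLeNet layers in between) wraps the softmax output in a regression layer and a DNN model:

import tflearn
from tflearn.layers.estimator import regression
import tflearn.datasets.oxflower17 as oxflower17

# 17-class Oxford flowers data, resized to the 227x227 input used by input_data above.
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))

network = regression(loss, optimizer='momentum',
                     loss='categorical_crossentropy', learning_rate=0.001)
model = tflearn.DNN(network)
model.fit(X, Y, n_epoch=100, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, run_id='googlenet_oxflowers17')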
4. Summary
Inception V1 - a branch structure of 1x1, 3x3 and 5x5 convolutions plus 3x3 pooling, drawing on mlpconv and global average pooling; it widens the convolutional network and increases its adaptability to different scales;
Inception V2 - Batch Normalization, whose regularization effect can replace dropout and LRN, speeds up the training of large convolutional networks many times over while significantly improving classification accuracy; following VGG, two 3x3 convolution kernels replace each 5x5 kernel, reducing the number of parameters while improving the network's ability to learn;
Inception V3 - introduces factorization: a larger two-dimensional convolution is split into two smaller one-dimensional convolutions, e.g. a 3x3 convolution into a 1x3 and a 3x1 convolution, saving a large number of parameters, speeding up computation and reducing overfitting while adding an extra non-linearity that extends the model's expressive power; in addition it uses branches inside the Inception module and branches within those branches (Network in Network in Network);
Inception V4 - combines the Inception module with residual connections; incorporating ResNet greatly accelerates training and further improves performance. Alongside the Inception-ResNet networks, a deeper and better-optimized pure Inception V4 model was also designed, achieving comparable performance.
"Turn" CNN convolutional Neural Network _ googlenet Inception (V1-V4)