"Turn" CNN convolutional Neural Network _ googlenet Inception (V1-V4)

Source: Internet
Author: User

http://blog.csdn.net/diamonjoy_zone/article/details/70576775

Reference:

1. Inception [V1]: Going Deeper with Convolutions

2. Inception [V2]: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

3. Inception [V3]: Rethinking the Inception Architecture for Computer Vision

4. Inception [V4]: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

1. Preface

The NIN presented in the previous article made a notable contribution to reworking the traditional CNN structure: its network, built from mlpconv layers and global average pooling, achieved good results on several datasets while keeping the number of trainable parameters to roughly 1/12 of AlexNet's.

This article introduces GoogLeNet, the winner of ILSVRC 2014, and its core building block, the Inception module. The early V1 structure borrowed design ideas from NIN and modified the traditional convolutional layers of the network; the successive improvements up to V4 target the main problems that limit the performance of deep neural networks:

    • The parameter space is large and prone to overfitting, while labeled training data are limited;
    • The network structure is complex and computation is expensive, which makes it hard to apply in practice;
    • Deep network structures suffer from vanishing gradients (gradient dispersion), which degrades model performance.

2. Inception

GoogLeNet modifies the traditional convolutional layers in the network and proposes a structure called Inception, which increases both the depth and the width of the network and improves the performance of deep neural networks.

The design ideas behind its successive versions are as follows:

2.1 Inception V1

Naive Inception

The main idea behind the Inception module is that convolution kernels of several different sizes improve the network's adaptability to scale. The paper uses 1×1, 3×3, and 5×5 convolution kernels in parallel and also adds a 3×3 max pooling branch.


The article then points out a problem with this naive structure: the number of filters in each Inception module is the sum over all of its branches, so stacking several Inception modules quickly leads to a very large number of model parameters and a heavy dependence on computing resources.

The 1×1 convolution layer, equivalent to the mlpconv in the NIN model, can not only combine information across channels and improve the expressive power of the network, but can also reduce the dimensionality of the output. The paper therefore proposes an Inception module with dimension reduction, which cuts the number of filters and the complexity of the model without losing its feature-representation ability:

Inception Module

The four branches of the Inception module are finally merged by an aggregation operation that concatenates them along the output-channel dimension (in TensorFlow this can be done with the tf.concat function, concatenating along axis 3).
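Below is a minimal sketch (not from the original post) of such an Inception module with dimension reduction, written with the TensorFlow Keras API for brevity; the filter counts follow the inception (3a) block of the GoogLeNet paper, and the function and variable names are illustrative:

    import tensorflow as tf
    from tensorflow.keras import layers

    def inception_module(x, f1x1=64, f3x3_reduce=96, f3x3=128,
                         f5x5_reduce=16, f5x5=32, pool_proj=32):
        # branch 1: 1x1 convolution
        b1 = layers.Conv2D(f1x1, 1, padding='same', activation='relu')(x)
        # branch 2: 1x1 reduction followed by 3x3 convolution
        b2 = layers.Conv2D(f3x3_reduce, 1, padding='same', activation='relu')(x)
        b2 = layers.Conv2D(f3x3, 3, padding='same', activation='relu')(b2)
        # branch 3: 1x1 reduction followed by 5x5 convolution
        b3 = layers.Conv2D(f5x5_reduce, 1, padding='same', activation='relu')(x)
        b3 = layers.Conv2D(f5x5, 5, padding='same', activation='relu')(b3)
        # branch 4: 3x3 max pooling followed by 1x1 projection
        b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
        b4 = layers.Conv2D(pool_proj, 1, padding='same', activation='relu')(b4)
        # concatenate the four branches along the channel axis
        return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

    inputs = layers.Input(shape=(28, 28, 192))
    outputs = inception_module(inputs)   # (28, 28, 64 + 128 + 32 + 32) = (28, 28, 256)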

The complete GoogLeNet architecture stacks Inception modules after the initial traditional convolution and pooling layers. Compared with AlexNet, the number of layers increases but the number of parameters decreases, because most of AlexNet's parameters are concentrated in its fully connected layers. GoogLeNet finally achieved a 6.67% top-5 error rate on ImageNet.

2.2 Inception V2

Inception V2 borrows from VGG the idea of replacing a large 5×5 convolution with two 3×3 convolutions, which reduces the number of parameters while adding an extra nonlinear transformation and improves the network's ability to learn features:

Two 3×3 convolution layers are functionally similar to one 5×5 convolution layer
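A quick parameter count (not in the original post) shows why the replacement helps: for C input and C output channels, one 5×5 kernel has 25·C² weights while two stacked 3×3 kernels have 18·C², about 28% fewer, with an extra ReLU nonlinearity in between. Biases are ignored and C = 192 is just an illustrative value:

    # parameter count of one 5x5 convolution vs. two stacked 3x3 convolutions
    C = 192                                  # channels in and out (illustrative)
    params_5x5 = 5 * 5 * C * C               # 921,600 weights
    params_two_3x3 = 2 * (3 * 3 * C * C)     # 663,552 weights
    print(1 - params_two_3x3 / params_5x5)   # 0.28 -> ~28% fewer parameters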

In addition, this paper proposes the now-famous Batch Normalization (hereinafter BN) method. BN is a very effective regularization technique that speeds up the training of large convolutional networks many times over and noticeably improves classification accuracy after convergence. Applied to a layer of a neural network, BN normalizes the activations within each mini-batch so that they have zero mean and unit variance (roughly N(0, 1)), reducing the Internal Covariate Shift (the change in the distribution of internal activations as training proceeds).
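A minimal NumPy sketch (not from the original post) of the transform BN applies within one mini-batch; gamma and beta stand in for the learned scale and shift parameters:

    import numpy as np

    def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        # x: activations of one layer for a mini-batch, shape (batch, features)
        mean = x.mean(axis=0)                     # per-feature mean over the mini-batch
        var = x.var(axis=0)                       # per-feature variance over the mini-batch
        x_hat = (x - mean) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
        return gamma * x_hat + beta               # learned scale and shift

    x = np.random.randn(64, 256) * 3.0 + 5.0      # activations with a shifted distribution
    y = batch_norm(x)
    print(y.mean(), y.std())                      # roughly 0 and 1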


Note: in TensorFlow 1.0.0 an image can be standardized with tf.image.per_image_standardization(); in older versions the same function was called tf.image.per_image_whitening().
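For illustration only (assuming a current TensorFlow installation), the function rescales a single image to zero mean and unit variance:

    import tensorflow as tf

    image = tf.random.uniform([227, 227, 3], maxval=255.0)    # a dummy 227x227 RGB image
    standardized = tf.image.per_image_standardization(image)  # zero mean, unit variance per image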

The BN paper points out that in traditional deep network training the input distribution of every layer keeps changing, which makes training difficult and forces the use of a very small learning rate. Applying BN to every layer solves this problem effectively: the learning rate can be increased many times over, and the number of iterations needed to reach the previous accuracy drops to 1/14, greatly shortening training time. Continuing to train beyond that point yields performance far better than the Inception V1 model: a top-5 error rate of 4.8%, already better than human-level accuracy on this task. Because BN also acts as a regularizer in a certain sense, dropout and LRN can be reduced or removed, simplifying the network structure.

2.3 Inception V3


Splitting a 3×3 convolution into a 1×3 convolution and a 3×1 convolution

First, Inception V3 introduces the idea of factorization into small convolutions: a larger two-dimensional convolution is split into two smaller one-dimensional convolutions, for example a 7×7 convolution into a 1×7 and a 7×1 convolution, or a 3×3 convolution into a 1×3 and a 3×1 convolution, as shown above. On the one hand this saves a large number of parameters, speeds up computation, and reduces overfitting (splitting a 7×7 convolution into 1×7 and 7×1 is even more economical than splitting it into three 3×3 convolutions), while adding an extra layer of nonlinearity that extends the model's expressive power. The paper points out that this asymmetric factorization works better than a symmetric split into several identical smaller kernels: it can handle more and richer spatial features and increases feature diversity.
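A small sketch (not from the original post, again using the Keras API) of the asymmetric factorization: for C input and output channels, a 7×7 kernel costs 49·C² weights, while the 1×7 plus 7×1 pair costs 14·C², roughly 71% fewer; the function name and filter count are illustrative:

    from tensorflow.keras import layers

    def factorized_conv_7x7(x, filters):
        # replace one 7x7 convolution with a 1x7 convolution followed by a 7x1 convolution
        x = layers.Conv2D(filters, (1, 7), padding='same', activation='relu')(x)
        x = layers.Conv2D(filters, (7, 1), padding='same', activation='relu')(x)
        return x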

On the other hand, Inception V3 also optimizes the structure of the Inception module itself: there are now three variants, used at feature-map sizes of 35×35, 17×17, and 8×8. These Inception modules appear only in the later part of the network; the earlier part consists of ordinary convolution layers. In addition to using branches inside the Inception module, Inception V3 also uses branches within branches (in the 8×8 variant), which can be described as "network in network in network". It finally reaches a top-5 error rate of 3.5%.

2.4 Inception V4

Compared with V3, Inception V4 mainly incorporates Microsoft's ResNet ideas, further reducing the error rate to 3.08%.
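A rough sketch (not from the original post) of the residual shortcut that Inception-ResNet wraps around an Inception-style block; block_fn and the 1×1 projection here are illustrative, not the paper's exact block:

    from tensorflow.keras import layers

    def inception_resnet_block(x, block_fn):
        branch = block_fn(x)                                             # any Inception-style stack of branches
        branch = layers.Conv2D(x.shape[-1], 1, padding='same')(branch)   # match the input channel count
        out = layers.Add()([x, branch])                                  # shortcut: add the input back to the branch output
        return layers.Activation('relu')(out)                            # the paper also scales the branch down for stability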

3. GoogLeNet V1

As an example, tflearn's googlenet.py defines the branch structure of inception (3a) and merges the resulting feature maps with the merge function:

    from tflearn.layers.core import input_data, dropout, fully_connected
    from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
    from tflearn.layers.normalization import local_response_normalization
    from tflearn.layers.merge_ops import merge

    network = input_data(shape=[None, 227, 227, 3])
    conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name='conv1_7_7_s2')
    pool1_3_3 = max_pool_2d(conv1_7_7, 3, strides=2)
    pool1_3_3 = local_response_normalization(pool1_3_3)
    conv2_3_3_reduce = conv_2d(pool1_3_3, 64, 1, activation='relu', name='conv2_3_3_reduce')
    conv2_3_3 = conv_2d(conv2_3_3_reduce, 192, 3, activation='relu', name='conv2_3_3')
    conv2_3_3 = local_response_normalization(conv2_3_3)
    pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')

    # the four parallel branches of inception (3a)
    inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
    inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96, 1, activation='relu', name='inception_3a_3_3_reduce')
    inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128, filter_size=3, activation='relu', name='inception_3a_3_3')
    inception_3a_5_5_reduce = conv_2d(pool2_3_3, 16, filter_size=1, activation='relu', name='inception_3a_5_5_reduce')
    inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name='inception_3a_5_5')
    inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1)
    inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')

    # merge the inception (3a) branches along the channel axis
    inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1],
                                mode='concat', axis=3)
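As a sanity check (not part of the tflearn script itself): the four branches of inception (3a) contribute 64, 128, 32, and 32 feature maps, so the merged tensor should carry 64 + 128 + 32 + 32 = 256 channels, matching the inception (3a) row of the GoogLeNet paper.

    print(inception_3a_output.get_shape())  # (?, H, W, 256) -- 64 + 128 + 32 + 32 channels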

Finally, the 7×7×1024 output of inception (5b) is passed through 7×7 average pooling, dropout (with the keep probability set to 0.4 in this example), and a softmax classifier:

    pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
    pool5_7_7 = dropout(pool5_7_7, 0.4)
    loss = fully_connected(pool5_7_7, 17, activation='softmax')

The final fully connected layer has 17 output units because this example targets the 17-category Oxford Flowers classification task.
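For completeness, a sketch of how the tflearn example turns the softmax output into a trainable model and loads the 17-category Oxford Flowers data; the hyperparameter values below are illustrative:

    import tflearn
    import tflearn.datasets.oxflower17 as oxflower17

    # load the 17-category Oxford Flowers dataset, resized to the 227x227 input used above
    X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))

    # 'loss' is the softmax output of the fully connected layer defined above
    network = tflearn.regression(loss, optimizer='momentum',
                                 loss='categorical_crossentropy', learning_rate=0.001)
    model = tflearn.DNN(network, checkpoint_path='model_googlenet', max_checkpoints=1)
    model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, run_id='googlenet_oxflowers17')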

4. Summary

Inception V1 -- a branched structure of 1×1, 3×3, and 5×5 convolutions plus 3×3 pooling, combined with mlpconv-style 1×1 convolutions and global average pooling; widening the convolutional layers increases the network's adaptability to scale;

Inception V2 -- the regularization effect of Batch Normalization replaces dropout and LRN, makes the training of large convolutional networks many times faster, and significantly improves classification accuracy; in addition, following VGG, two 3×3 convolution kernels replace a 5×5 kernel, reducing the number of parameters while improving the network's learning ability;

Inception V3 -- introduces factorization: a larger two-dimensional convolution is split into two smaller one-dimensional convolutions, e.g. a 3×3 convolution into a 1×3 and a 3×1 convolution, which on the one hand saves a large number of parameters, speeds up computation, and reduces overfitting, and on the other adds a layer of nonlinearity that extends the model's expressive power; it also uses branches inside the Inception module and branches within branches (network in network in network);

Inception V4 -- the Inception module is combined with residual connections; combining with ResNet greatly accelerates training and further improves performance. Alongside the Inception-ResNet networks, the authors also designed a deeper, more refined pure Inception V4 model that achieves comparable performance.

"Turn" CNN convolutional Neural Network _ googlenet Inception (V1-V4)
