
Inception V1 introduced the Inception module (1*1, 3*3 and 5*5 convolutions combined with 3*3 pooling). Its biggest highlight is the 1*1 convolution borrowed from NIN (Network in Network), as shown in the structure; the representative network is GoogLeNet.

Assuming the output of the previous layer is 28*28*192:

Weights of module A (the naive module): 1*1*192*64 + 3*3*192*128 + 5*5*192*32 = 387072

Output feature map of module A: 28*28*64 + 28*28*128 + 28*28*32 + 28*28*192 = 28*28*416

Weights of module B (with 1*1 reductions): 1*1*192*64 + (1*1*192*96 + 3*3*96*128) + (1*1*192*16 + 5*5*16*32) + 1*1*192*32 = 163328

Output feature map of module B: 28*28*64 + 28*28*128 + 28*28*32 + 28*28*32 = 28*28*256

Writing this down, one cannot help admiring the genius of the 1*1 conv: the numbers above show that it reduces the weights on the one hand and the feature map dimension on the other.
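The counts quoted above can be verified with a few lines of Python (a quick sketch; the per-branch channel numbers follow the figures in the text):

```python
# Weight counts of the two Inception module variants on a 28*28*192 input:
# module A is the naive version, module B adds 1*1 reduction layers.
C_in = 192

# A: 1*1 -> 64, 3*3 -> 128 and 5*5 -> 32 filters applied directly to the input
weights_a = 1 * 1 * C_in * 64 + 3 * 3 * C_in * 128 + 5 * 5 * C_in * 32

# B: 1*1 reductions to 96 and 16 channels before the 3*3 and 5*5 branches,
# plus a 1*1 projection to 32 channels after the pooling branch
weights_b = (1 * 1 * C_in * 64
             + (1 * 1 * C_in * 96 + 3 * 3 * 96 * 128)
             + (1 * 1 * C_in * 16 + 5 * 5 * 16 * 32)
             + 1 * 1 * C_in * 32)

print(weights_a)  # 387072
print(weights_b)  # 163328
```

The 1*1 reductions cut the module's weights by more than half while also shrinking the concatenated output from 416 to 256 channels.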

The highlights of Inception V1 are summarized below:

(1) A common function of convolution layers: they can reduce or expand the channel dimension, depending on the number of filters in the layer. In Inception V1 the 1*1 convolution is used for dimensionality reduction, shrinking both the weights size and the feature map dimension.

(2) A function unique to the 1*1 convolution: because each 1*1 filter holds only a single weight per input channel, it is equivalent to applying a learned scaling to the original feature map, and since this scale is learned during training, it undoubtedly improves recognition accuracy.

(3) Increases the depth of the network.

(4) Increases the width of the network.

(5) Using 1*1, 3*3 and 5*5 convolutions in parallel increases the network's adaptability to scale.

The GoogLeNet network structure:

There are two places to be aware of:

(1) To ensure the entire network converges, there are 3 losses (two auxiliary classifiers plus the main one).

(2) Global average pooling is used before the last fully connected layer; global average pooling works well and has many uses.

V2: **Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift**

Inception V2's network is an improved version of GoogLeNet that adds BN (Batch Normalization) layers and replaces the 5*5 convolution with two 3*3 convolutions.

The highlights of Inception v2 are summarized below:

(1) Added BN layers, reducing internal covariate shift (the change in the distribution of each layer's internal activations during training), so that each layer's output is normalized to an N(0, 1) Gaussian. This improves the robustness of the model: it can be trained at a larger learning rate, converges faster, is less sensitive to initialization, and, acting as a regularization technique, reduces the need for dropout.

(2) Replaced the 5*5 convolution in the Inception module with two consecutive 3*3 convolutions, increasing the depth of the network by 9 layers; the drawback is a 25% increase in weights and a 30% increase in computation cost.
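The normalization at the heart of BN can be sketched in NumPy (a training-mode sketch only; the learned per-channel scale/shift and the running statistics used at inference are simplified away):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature over the batch axis to zero mean and unit
    # variance, then apply the (here scalar) scale gamma and shift beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # approximately N(0, 1) per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # shifted, scaled activations
y = batch_norm(x)
print(y.mean(axis=0).round(6))  # ~0 for every feature
print(y.std(axis=0).round(3))   # ~1 for every feature
```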

V3: **Rethinking the Inception Architecture for Computer Vision**

Inception V3 mainly builds on V2 by proposing convolution factorization; the representative network is the Inception V3 version of GoogLeNet.

The highlights of Inception V3 are summarized below:

(1) Decomposes the 7*7 convolution into two one-dimensional convolutions (1*7 and 7*1), and likewise 3*3 into (1*3 and 3*1). The benefits are twofold: it speeds up computation (the spare capacity can be used to deepen the network), and splitting one conv into two further increases the network depth and nonlinearity. The 35*35, 17*17 and 8*8 modules are also designed more finely.

(2) Increases the network width; the network input changes from 224*224 to 299*299.
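The savings from these factorizations are easy to check (a sketch counting weights per input-output channel pair; V2's two-3*3-for-one-5*5 replacement is included for comparison):

```python
# n*n conv vs its 1*n + n*1 factorization, weights per channel pair:
for n in (3, 7):
    print(f"{n}*{n}: {n * n} weights -> 1*{n} + {n}*1: {2 * n} weights")

# V2-style replacement: two stacked 3*3 convs cover the same 5*5
# receptive field as one 5*5 conv, with fewer weights per channel pair.
print(f"5*5: {5 * 5} weights -> two 3*3: {2 * 3 * 3} weights")
```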

V4: **Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning**

Inception V4 mainly uses residual connections to improve the V3 structure; the representative networks are Inception-ResNet-v1, Inception-ResNet-v2 and Inception-v4.

The residual structure of ResNet is shown below; the design is very ingenious, simply genius: the original input and the feature map produced by two convolution layers are combined with an element-wise (eltwise) addition. The Inception-ResNet improvement is to replace the conv + 1*1 conv in the ResNet residual branch with the Inception module above.
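The eltwise idea reduces to one line (a minimal sketch; `transform` stands in for the two convolution layers, or for an Inception module in the Inception-ResNet case):

```python
import numpy as np

def residual_block(x, transform):
    # Element-wise (eltwise) addition of the input and the transformed map.
    return x + transform(x)

x = np.ones(4)
out = residual_block(x, lambda v: 2.0 * v)  # a stand-in transform
print(out)  # [3. 3. 3. 3.]
```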

The highlights of Inception V4 are summarized below:

(1) Combines the Inception module with residual connections, proposing Inception-ResNet-v1 and Inception-ResNet-v2, which accelerates training convergence and reaches higher accuracy.

The ILSVRC-2012 test results are as follows (single crop):

(2) Designs a deeper pure-Inception version, Inception-v4, whose results are comparable to Inception-ResNet-v2.

(3) The network input size is the same as V3: 299*299.

**Aggregated Residual Transformations for Deep Neural Networks**

This article presents an updated version of ResNet: ResNeXt, meaning "the next dimension", because the paper proposes another dimension, cardinality, alongside the channel and spatial dimensions. The cardinality dimension mainly represents the number of parallel modules in a ResNeXt block. Final conclusions:

(1) Increasing cardinality works better than increasing the width or depth of the model.

(2) Compared with ResNet, ResNeXt has fewer parameters, better results, a simpler structure, and is more convenient to design.

In the figure, the left image is a ResNet module and the right image is a ResNeXt module, an instance of the split-transform-merge idea.

**Xception: Deep Learning with Depthwise Separable Convolutions**

This article proposes Xception (Extreme Inception) on the basis of Inception V3; the basic idea is the depthwise separable convolution operation. It finally achieves:

(1) A small reduction in the number of model parameters; the reduction is very slight, specifically as follows,

(2) Higher accuracy than Inception V3; the ImageNet accuracy is as follows,

First of all, a convolution operation mainly performs two kinds of transformations:

(1) spatial dimensions: the spatial transform

(2) channel dimension: the channel transform

Xception works on these two transformations. The differences between Xception and Inception V3 are as follows:

(1) The order of the convolution operations differs.

Inception V3 first does the 1*1 convolution and then the 3*3 convolution, i.e. the channels are merged first (channel convolution) and the spatial convolution comes second. Xception is exactly the opposite: it does the spatial 3*3 convolution first, then the channel-wise 1*1 convolution.

(2) Whether there is a ReLU in between.

This is the biggest difference: in Inception V3 every operation in the module is followed by a ReLU, while in Xception there is no ReLU between the two operations.

**MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications**

MobileNets is actually an application of the Xception idea. The difference is that the Xception article focuses on improving accuracy, while MobileNets focuses on compressing the model while preserving accuracy.

The idea of depthwise separable convolutions is to decompose a standard convolution into a depthwise convolution followed by a pointwise convolution. A simple way to understand it is as a factorization of matrices.
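A minimal NumPy sketch of the decomposition (stride 1, no padding, naive loops for clarity; shapes and variable names are illustrative):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    # x: (M, H, W) input; dw_kernels: (M, DK, DK), one spatial filter per
    # input channel; pw_weights: (N, M), the 1*1 conv that mixes channels.
    M, H, W = x.shape
    _, DK, _ = dw_kernels.shape
    Ho, Wo = H - DK + 1, W - DK + 1

    # Depthwise step: filter each channel independently (spatial transform).
    dw_out = np.zeros((M, Ho, Wo))
    for m in range(M):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[m, i, j] = np.sum(x[m, i:i + DK, j:j + DK] * dw_kernels[m])

    # Pointwise step: 1*1 conv combines channels per position (channel transform).
    return np.tensordot(pw_weights, dw_out, axes=([1], [0]))  # shape (N, Ho, Wo)

rng = np.random.default_rng(0)
out = depthwise_separable_conv(rng.normal(size=(3, 8, 8)),
                               rng.normal(size=(3, 3, 3)),
                               rng.normal(size=(8, 3)))
print(out.shape)  # (8, 6, 6)
```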

The differences between the traditional convolution and the depthwise separable convolution are as follows.

Suppose the input feature map size is DF * DF with M channels, the filter size is DK * DK, the number of output channels is N, the padding is 1, and the stride is 1.

The original convolution operation requires DK * DK * M * N * DF * DF multiply-accumulates, and the convolution kernel has DK * DK * M * N parameters.

The depthwise separable convolution requires DK * DK * M * DF * DF + M * N * DF * DF multiply-accumulates, and the convolution kernels have DK * DK * M + M * N parameters.

Since the convolution process mainly reduces the spatial dimensions while increasing the channel dimension, i.e. N > M, we have DK * DK * M * N > DK * DK * M + M * N.
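Plugging in typical numbers makes the saving concrete (a sketch; the closed-form ratio 1/N + 1/DK^2 follows from dividing the two expressions above):

```python
# Multiply-accumulate counts of standard vs depthwise separable convolution
DF, M, N, DK = 14, 512, 512, 3  # illustrative layer sizes

standard = DK * DK * M * N * DF * DF
separable = DK * DK * M * DF * DF + M * N * DF * DF

print(separable / standard)  # ~0.113, i.e. roughly 8-9x less computation
print(1 / N + 1 / DK ** 2)   # the same ratio in closed form
```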

Therefore, depthwise separable convolutions compress both the model size and the computation a great deal, making the model fast, cheap to compute, and still accurate. As shown in the figure, the horizontal axis MACs represents the amount of multiply-accumulate computation and the vertical axis is accuracy.

In Caffe, depthwise separable convolutions are mainly implemented through the convolution layer's group parameter. The baseline model size is about 16 MB.

The MobileNet network structure is as follows:

**ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices**

This article makes one main improvement on the basis of MobileNet:

MobileNet applies depthwise convolution only to the 3*3 convolutions, while its 1*1 convolutions remain traditional convolutions, which leaves a lot of redundancy. ShuffleNet applies shuffle and group operations to the 1*1 convolutions as well, implementing channel shuffle and pointwise group convolution, and thereby achieves higher speed and accuracy than MobileNet.

As shown in the figure:

(a) is the original MobileNet framework, with no information exchange between groups.

(b) adds a shuffle operation on the feature map.

(c) is the result of the channel shuffle.

The basic idea of the shuffle is as follows, assuming 2 input groups and 5 output groups:

| Group 1 | Group 2 |
| --- | --- |
| 1,2,3,4,5 | 6,7,8,9,10 |

Reshape into a 2*5 matrix:

1 2 3 4 5
6 7 8 9 10

Transpose into a 5*2 matrix:

1 6
2 7
3 8
4 9
5 10

Flatten the matrix back into groups:

| Group 1 | Group 2 | Group 3 | Group 4 | Group 5 |
| --- | --- | --- | --- | --- |
| 1,6 | 2,7 | 3,8 | 4,9 | 5,10 |
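This reshape-transpose-flatten procedure is exactly how channel shuffle is implemented in practice; a NumPy sketch reproducing the example above:

```python
import numpy as np

def channel_shuffle(x, groups):
    # Reshape channels into (groups, channels_per_group), swap the two
    # axes, then flatten back: channels from different groups interleave.
    c, h, w = x.shape
    x = x.reshape(groups, c // groups, h, w)
    x = x.transpose(1, 0, 2, 3)
    return x.reshape(c, h, w)

# 1*1 spatial size so the 10 channels match the 1..10 example in the text
channels = np.arange(1, 11).reshape(10, 1, 1)
shuffled = channel_shuffle(channels, groups=2)
print(shuffled.flatten().tolist())  # [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
```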

The structure of the ShuffleNet unit is as follows:

(a) is a bottleneck unit with depthwise convolution (DWConv);

(b) adds pointwise group convolution (GConv) and channel shuffle on the basis of (a);

(c) is the final ShuffleNet unit, with AVG pooling and concat operations.

**MobileNetV2: Inverted Residuals and Linear Bottlenecks**

The main contributions are two points:

1. The inverted residual structure (inverted residuals) is proposed.

The residual structure used in MobileNetV2 originates from ResNet's residual structure and is similarly ingenious, but different.

Because ResNet does not use depthwise conv, the number of feature channels entering the 3*3 convolution is large, so its residual module first applies a 0.25x dimensionality reduction. MobileNetV2, which does use depthwise conv, has relatively few channels, so its residual block instead applies a 6x dimensionality expansion first.

Summed up, there are two differences:

(1) ResNet's residual structure reduces the dimension by 0.25x, while MobileNetV2's residual structure expands it by 6x.

(2) The 3*3 convolution in ResNet's residual structure is an ordinary convolution, while in MobileNetV2 it is a depthwise conv.
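The two channel flows can be compared with some quick arithmetic (a sketch with illustrative channel counts, not the exact numbers of either paper; bias and BN parameters are ignored):

```python
def resnet_bottleneck_params(c, reduce=0.25):
    # 1*1 reduce -> standard 3*3 -> 1*1 restore
    mid = int(c * reduce)
    return c * mid + 3 * 3 * mid * mid + mid * c

def mbv2_inverted_residual_params(c, expand=6):
    # 1*1 expand -> depthwise 3*3 (one filter per channel) -> 1*1 project
    mid = c * expand
    return c * mid + 3 * 3 * mid + mid * c

print(resnet_bottleneck_params(256))      # 69632
print(mbv2_inverted_residual_params(24))  # 8208
```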

MobileNet V1 and MobileNet V2 differ in two points:

(1) In V2, before the 3*3 convolution, a 1*1 pointwise conv is applied first to expand the dimension, followed by a ReLU.

(2) After the final 1*1 convolution, no ReLU operation is applied.

2. The linear bottleneck unit (linear bottlenecks) is proposed.

Why no ReLU?

First look at the properties of ReLU: it maps all negative values to 0, giving a high degree of nonlinearity. According to the paper's experiments, when the dimension is relatively low (2 or 3), using ReLU loses a lot of information; when the dimension is higher (15 or 30), the information loss is relatively small.

To avoid losing too much information, MobileNetV2 removes the last ReLU in the residual module; the block is therefore also called a linear bottleneck unit.

The MobileNetV2 network structure:

where t represents the channel expansion factor, c represents the number of output channels, n indicates the number of repetitions of the unit, and s indicates the stride.

The bottleneck module has stride=1 and stride=2 variants, as shown; only the stride=1 module has the residual connection.

Results:

MobileNetV2 is better than MobileNetV1 in both speed and accuracy.

References

http://iamaaditya.github.io/2016/03/one-by-one-convolution/

https://github.com/soeaver/caffe-model

https://github.com/facebookresearch/ResNeXt

https://github.com/kwotsin/TensorFlow-Xception

https://github.com/shicai/MobileNet-Caffe

https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md

https://github.com/HolmesShuan/ShuffleNet-An-Extremely-Efficient-CNN-for-Mobile-Devices-Caffe-Reimplementation

https://github.com/camel007/Caffe-ShuffleNet

https://github.com/chinakook/MobileNetV2.mxnet

From Inception V1, V2, V3, V4 and ResNeXt, to Xception, to MobileNets, ShuffleNet and MobileNetV2