ResNeXt: with the same number of parameters as ResNet, the results are better. A 101-layer ResNeXt network reaches accuracy similar to a 200-layer ResNet while requiring only about half the computation of the latter.


Background

Paper: Aggregated Residual Transformations for Deep Neural Networks
Code: GitHub
The paper appeared on arXiv right around the CVPR deadline, from which we can tell it was aimed at CVPR 2017. The authors include the familiar RBG (Ross Girshick) and Kaiming He; after their move to Facebook, the code is hosted on Facebook's GitHub page, and it also changed from the Caffe used for ResNet to Torch. :)

Contribution
    • A concise, modular network structure
    • A minimal number of hyper-parameters that need manual tuning
    • Better results than ResNet with the same number of parameters: a 101-layer ResNeXt reaches accuracy similar to a 200-layer ResNet while requiring only about half the computation of the latter
Method


The paper puts forward the concept of cardinality. The blocks on the left and right of Figure 1 have the same number of parameters: the left one is a ResNet block, while in the ResNeXt block on the right every branch is identical and the number of branches is the cardinality. The design borrows the split-transform-merge idea of GoogLeNet and the repeated-layer idea of VGG/ResNet.
The so-called split-transform-merge refers to placing 1x1 convolution layers before and after a large convolution layer to control the number of channels and reduce the number of parameters; see Fei-Fei Li's CS231n slides [1].
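As an illustration only (not the authors' released code), here is a minimal PyTorch sketch of such a block in its branched form: C identical 1x1 -> 3x3 -> 1x1 bottleneck paths whose outputs are summed and then added to the shortcut. The sizes (256 channels, cardinality 32, per-branch width 4) follow the paper's example; the exact placement of BatchNorm/ReLU is an assumption.

import torch
import torch.nn as nn

class SplitTransformMergeBlock(nn.Module):
    # Hypothetical sketch: C identical bottleneck branches, summed, plus the shortcut.
    def __init__(self, channels=256, cardinality=32, branch_width=4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, branch_width, 1, bias=False),                 # split: 1x1 reduce
                nn.BatchNorm2d(branch_width), nn.ReLU(inplace=True),
                nn.Conv2d(branch_width, branch_width, 3, padding=1, bias=False),  # transform: 3x3
                nn.BatchNorm2d(branch_width), nn.ReLU(inplace=True),
                nn.Conv2d(branch_width, channels, 1, bias=False),                 # 1x1 restore
                nn.BatchNorm2d(channels),
            )
            for _ in range(cardinality)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = sum(branch(x) for branch in self.branches)   # merge: element-wise sum over branches
        return self.relu(out + x)                          # residual shortcut

x = torch.randn(1, 256, 56, 56)
print(SplitTransformMergeBlock()(x).shape)   # torch.Size([1, 256, 56, 56])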

The repeated layer simply means repeating the same layer several times; the precondition is that these layers have the same output dimensions. Between successive groups of repeated layers, a stride-2 convolution is generally used to halve the spatial resolution while the number of convolution kernels is doubled.
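A rough sketch of that repeat-layer pattern (plain 3x3 convolutions used only for brevity; the helper name and layer choices are illustrative, not the paper's code):

import torch.nn as nn

def make_stage(in_channels, out_channels, num_layers):
    # The first conv of a stage uses stride 2 to halve the feature-map size while the
    # kernel count doubles; the remaining layers are identical repeats with matching dims.
    layers = [nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
    for _ in range(num_layers - 1):
        layers += [nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

# channels double (64 -> 128 -> 256) while the resolution is halved at each stage boundary
body = nn.Sequential(make_stage(64, 128, 3), make_stage(128, 256, 3))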

Network parameters


As shown in the table, the contents inside the brackets form the split-transform-merge block, the cardinality value (C) sets the number of branches, and the multiplier outside the brackets controls the repeated layers.
Between vertically adjacent stages of the table the output resolution is halved, while the number of convolution kernels after the comma inside the brackets keeps doubling.

Equivalent forms

The block on the right of Figure 1 has two equivalent forms. The rightmost one uses grouped convolution, as proposed in AlexNet, grouping the convolutions of a layer along its width. The authors ultimately adopt this rightmost form, which is more concise and faster to train.
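A minimal PyTorch sketch of that rightmost form (names and BatchNorm/ReLU placement are assumptions; the residual addition that wraps the block is omitted): the per-branch 1x1 layers are merged into single wide 1x1 convolutions, and the 32 separate 3x3 paths become one grouped convolution with groups equal to the cardinality.

import torch.nn as nn

def grouped_bottleneck(in_channels=256, width=128, cardinality=32):
    return nn.Sequential(
        nn.Conv2d(in_channels, width, 1, bias=False),                            # merged input 1x1
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, 3, padding=1, groups=cardinality, bias=False),   # grouped 3x3
        nn.BatchNorm2d(width), nn.ReLU(inplace=True),
        nn.Conv2d(width, in_channels, 1, bias=False),                            # merged output 1x1
        nn.BatchNorm2d(in_channels),
    )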

Model parameters

When the cardinality is changed, how is the number of parameters kept the same as in ResNet? The paper adjusts the number of convolution kernels in the middle (second) layer of the split-transform-merge block, as in the rough count below.
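A back-of-the-envelope count for one 256-d block (weights only, biases and BatchNorm ignored), following the paper's accounting of roughly 70k parameters for both designs:

# ResNet bottleneck (256-d in/out, inner width 64): 1x1 reduce, 3x3, 1x1 restore
resnet_params = 256 * 64 + 3 * 3 * 64 * 64 + 64 * 256

# ResNeXt block with cardinality C = 32 and per-branch width d = 4
C, d = 32, 4
resnext_params = C * (256 * d + 3 * 3 * d * d + d * 256)

print(resnet_params, resnext_params)   # 69632 and 70144 -- both roughly 70k weights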

Experiment

The setup is basically similar to ResNet, with the same kind of data augmentation and hyper-parameters.

Conclusion
    • With the same number of parameters, ResNeXt trains to a lower error rate than ResNet, although the error curves descend at a similar rate
    • With the same parameter budget, increasing the cardinality is more effective than increasing the number of convolution kernels (the width)
    • The 101-layer ResNeXt performs better than the 200-layer ResNet
    • Among the state-of-the-art models compared, ResNeXt achieves the highest accuracy
1. http://cs231n.stanford.edu/slides/winter1516_lecture11.pdf

Deep learning: the ResNeXt classification network. From: https://zhuanlan.zhihu.com/p/32913695, by Fan Shing (xfanplus), Computer Vision / Deep Learning (CV/DL).

Paper: Aggregated Residual Transformations for Deep Neural Networks

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

ImageNet top-5 error rate: 3.03%

Central idea: over on the Inception side, borrowing ResNet produced Inception-ResNet; here, ResNet borrows Inception's idea to produce ResNeXt. The main change is turning the single-path convolution into a multi-branch convolution with many branches of identical structure, which can then be implemented as grouped convolution.

The paradigm of convolution

The authors first summarize the pattern behind Inception: split-transform-merge.

As shown in the figure, the input is split across multiple paths, each path applies its own transformation, and the results of all branches are merged at the end.

Inception's shortcoming is worth mentioning: it is too complex, and the traces of hand-crafted design are too heavy.

Then, taking a higher-level view, the authors argue that the standard building blocks of neural networks already conform to the split-transform-merge model. Take the simplest ordinary neuron (for example, each neuron in a fully connected layer):

It splits the elements of the input vector across that many branches, weights each element, merges them by summing, and finally applies an activation: the inner product sum_i w_i * x_i.

Thus, a generic unit of a neural network can be expressed by the following formula:

F(x) = sum_{i=1}^{C} T_i(x)

Combined with ResNet's identity mapping, the structure with a residual connection can be represented by the following formula:

y = x + sum_{i=1}^{C} T_i(x)

The transformations T_i above can take any form, and there are C independent transformations in total. The authors call C the cardinality and point out that the cardinality C matters more for the result than width or depth.

Basic structure

As shown in the figure, on the left is the basic ResNet block, and on the right is the basic ResNeXt block:

Recalling the formula above, the residual connection corresponds to the directly connected x in the formula, and the rest consists of 32 independent transformations with identical structure whose outputs are finally fused, matching the split-transform-merge pattern.

The authors further point out that split-transform-merge is a general paradigm for neural networks; as mentioned earlier, even the basic neuron conforms to it, as shown in the figure:

(a) is the basic ResNeXt unit. If the output-side 1x1 convolutions are merged together, the equivalent network (b) has a structure similar to Inception-ResNet; if the input-side 1x1 convolutions are merged as well, the equivalent network (c) has a structure similar to a channel-grouped convolutional network.

Here you can see that the ambition of this paper is considerable: it effectively says that Inception-ResNet and channel-grouped convolutional networks are just special cases of the ResNeXt paradigm, which further shows the generality and effectiveness of split-transform-merge. The abstraction is a level higher and a bit closer to the essence. (The grouped-convolution equivalence is easy to check numerically, as in the sketch below.)
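A small self-contained check (an illustration, not code from the paper) that a grouped 3x3 convolution computes the same thing as C independent per-group 3x3 convolutions once their weights are shared:

import torch
import torch.nn as nn

C, d = 32, 4
grouped = nn.Conv2d(C * d, C * d, 3, padding=1, groups=C, bias=False)
branches = nn.ModuleList(nn.Conv2d(d, d, 3, padding=1, bias=False) for _ in range(C))
with torch.no_grad():
    for i, branch in enumerate(branches):
        branch.weight.copy_(grouped.weight[i * d:(i + 1) * d])   # give branch i the same weights as group i

x = torch.randn(2, C * d, 14, 14)
by_branches = torch.cat([b(part) for b, part in zip(branches, x.chunk(C, dim=1))], dim=1)
print(torch.allclose(grouped(x), by_branches, atol=1e-6))        # True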

ResNeXt

Next comes the concrete ResNeXt network structure.

Similar to ResNet, the authors choose a very simple basic structure in which each of the C branches applies the same simple transformation. Below is the configuration list for ResNeXt-50 (32x4d): 32 means that the number of groups C (the cardinality) in the first ResNeXt block of the network is 32, and 4d means that each group has a width of 4 channels (so the number of channels entering the first block's grouped convolution is 32 x 4 = 128):

You can see that ResNet-50 and ResNeXt-50 (32x4d) have essentially the same number of parameters, but the latter reaches higher accuracy.
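This is easy to verify with the reference implementations in torchvision (a quick illustrative check; the exact counts can vary slightly across library versions):

import torchvision.models as models

resnet50 = models.resnet50()
resnext50 = models.resnext50_32x4d()
print(sum(p.numel() for p in resnet50.parameters()))    # about 25.6 million
print(sum(p.numel() for p in resnext50.parameters()))   # about 25.0 million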

In the concrete implementation, since the branch 1x1 convolutions can be merged, they are merged; the code becomes simpler and more efficient.

The number of parameters stays the same and the results are better; at this point there is usually a "but"... And indeed: because of the grouping, the branches are processed separately, so compared with an ordinary undivided convolution the hardware execution efficiency is lower. Training ResNeXt-101 (32x4d) takes 0.95 s per mini-batch, while ResNet-101 needs only 0.70 s, even though the nominal amount of computation is the same; low-level optimization could narrow the gap. The good news is in the release notes of the latest cuDNN 7:

grouped convolutions for models such as ResNeXt and Xception, and CTC (connectionist temporal classification) loss layer for temporal classification

It seems grouped convolution has been optimized there. I have not tested it yet, but I would guess the efficiency improves a lot.

As for the actual results: ResNeXt-101 (32x4d) is about the same size as Inception-v4 and slightly worse in accuracy, but Inception-v4 is slow; ResNeXt-101 (64x4d) is a bit larger than Inception-ResNet-v2, with accuracy that is comparable or slightly lower.

The comparison above is not very rigorous: the results depend heavily on the training method and the implementation, and the differences in practical use are not small. I have not found a comprehensive benchmark that allows an exact comparison, but the numbers here can serve as a reference.

Inception-ResNet-v2 may work better thanks to its elaborate network structure, but the simpler ResNeXt structure is less prone to overfitting a particular data set. A simpler network is also easier to customize and modify for your own task.

Finally, a bit of gossip: the Inception-v4 paper argued against the ResNet authors, claiming that residual connections improve training convergence speed but do not help accuracy much; this ResNeXt paper immediately fires back, noting that removing the residual connections drops accuracy by several points, so they do help the optimization of the network...

Summing up: the split-transform-merge model is a very general, highly abstract paradigm summarized by the authors, and ResNeXt is a simple, standard implementation of that paradigm. Simple and efficient.
