Paper notes: Aggregated Residual Transformations for Deep Neural Networks


This article constructs a basic "block" and introduces a new dimension, "cardinality", on top of it (denoted by the letter "C" in the paper's figures and tables). The other two dimensions of a deep network are depth (the number of layers) and width (the number of channels in a layer).

First, let's look at how this "block" is built, as shown in Figure 1 of the paper (ResNeXt is the shorthand name of the model presented in this paper).

On the left of Figure 1 is the standard residual network "block"; on the right is the "block" introduced by the authors. What are the advantages of this new block? The authors were likely inspired by the Inception models; as the paper states, "unlike VGG-nets, the family of Inception models have demonstrated that carefully designed topologies are able to achieve compelling accuracy with low theoretical complexity". Further, "the split-transform-merge behavior of Inception modules is expected to approach the representational power of large and dense layers, but at a considerably lower computational complexity". Put simply, the goal is "to reduce the computational complexity of the model while achieving the accuracy of large, dense, deep networks" (this is the effect the paper pursues). Figure 1 right is built using the split-transform-merge strategy.
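To make split-transform-merge concrete, here is a minimal sketch of the block in its explicit multi-branch form (Figure 3(a) of the paper): the input is split into C low-dimensional branches, each branch applies the same 1×1 → 3×3 → 1×1 transformation, and the C outputs are summed and added to the shortcut, i.e. y = x + Σᵢ Tᵢ(x). PyTorch and the paper's example sizes (256-d input, C = 32, 4-d bottleneck per branch) are assumed; the class and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class ResNeXtBranchBlock(nn.Module):
    """Explicit split-transform-merge form of a ResNeXt block (Fig. 3(a) sketch)."""
    def __init__(self, channels=256, cardinality=32, bottleneck_width=4):
        super().__init__()
        # All C branches share the same topology (1x1 -> 3x3 -> 1x1),
        # differing only in their learned weights.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, bottleneck_width, 1, bias=False),
                nn.BatchNorm2d(bottleneck_width),
                nn.ReLU(inplace=True),
                nn.Conv2d(bottleneck_width, bottleneck_width, 3, padding=1, bias=False),
                nn.BatchNorm2d(bottleneck_width),
                nn.ReLU(inplace=True),
                nn.Conv2d(bottleneck_width, channels, 1, bias=False),
                nn.BatchNorm2d(channels),
            )
            for _ in range(cardinality)
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Aggregate the branch transformations, then add the shortcut:
        # y = x + sum_i T_i(x)
        out = sum(branch(x) for branch in self.branches)
        return self.relu(out + x)

block = ResNeXtBranchBlock()
y = block(torch.randn(1, 256, 56, 56))  # output shape: (1, 256, 56, 56)
```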

Inception models have one very inconvenient aspect in practice: the kernel sizes and widths of each branch are "custom-designed", and each "block" is "custom-designed" as well. If we want to apply this model, or design a new network under this framework, the "customization" introduces many hyper-parameters. Anyone who has designed their own network, or modified an existing one, knows that too many hyper-parameters are a "disaster" for design. Without a suitable design strategy, the design, to put it bluntly, ends up depending on experience and trial and error.

Inspired by the success of VGG/ResNets, the authors summarize the following two principles for designing a "block":

    1. "If producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes)"
    2. "Each time when the spatial map was downsampled by a factor of 2, the width of the blocks was multiplied by a factor of 2"

In addition, all "blocks" have the same topological structure. The authors give a design template; combined with the two principles above, we can build essentially any desired network (doesn't network structure design suddenly feel much easier?). The template is Table 1 of the paper (ResNeXt-50 shown next to ResNet-50); a sketch of its stage layout follows below.
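As an illustration of how the two principles and the template generate a whole network, the snippet below reproduces the stage layout of the ResNeXt-50 (32×4d) template from the paper's Table 1; the tuple format and names here are my own, not the paper's.

```python
# Stage layout of ResNeXt-50 (32x4d), following Table 1 of the paper.
# Principle 1: blocks producing same-size maps share hyper-parameters.
# Principle 2: each 2x spatial downsampling doubles the block width.
CARDINALITY = 32

# (stage, output map size, bottleneck width, output channels, num blocks)
STAGES = [
    ("conv2", 56, 128, 256, 3),
    ("conv3", 28, 256, 512, 4),
    ("conv4", 14, 512, 1024, 6),
    ("conv5", 7, 1024, 2048, 3),
]

for name, size, width, out_ch, blocks in STAGES:
    print(f"{name}: {size}x{size} maps, "
          f"[1x1,{width}; 3x3,{width},C={CARDINALITY}; 1x1,{out_ch}] x {blocks}")
```

Note how the bottleneck width doubles (128 → 256 → 512 → 1024) exactly when the map size halves (56 → 28 → 14 → 7), as the second principle requires.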

This is not all: the authors also give two equivalent representations of the new block on the right of Figure 1, shown in Figure 3 of the paper (form (b) concatenates the branch outputs before the final 1×1 convolution; form (c) replaces the branches with a single grouped convolution).

This is a great convenience for implementation. Here the group convolution concept introduced by AlexNet comes into play (a concept introduced at the time because of GPU memory constraints). In the form of Figure 3(c), the block can be implemented directly in Caffe without changing any source code.
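To show why form (c) is so convenient, here is a sketch of the same block written with a single grouped convolution: the 32 branches of width 4 collapse into one 3×3 convolution with 128 channels and groups=32. PyTorch is assumed for the sketch (in Caffe, as noted above, this is just the `group` parameter of a convolution layer); the class name is illustrative.

```python
import torch
import torch.nn as nn

class ResNeXtGroupedBlock(nn.Module):
    """Grouped-convolution form of the ResNeXt block (Fig. 3(c) sketch)."""
    def __init__(self, channels=256, cardinality=32, bottleneck_width=4):
        super().__init__()
        group_width = cardinality * bottleneck_width  # 32 * 4 = 128
        self.body = nn.Sequential(
            nn.Conv2d(channels, group_width, 1, bias=False),
            nn.BatchNorm2d(group_width),
            nn.ReLU(inplace=True),
            # One grouped 3x3 conv replaces the 32 separate 4-channel branches.
            nn.Conv2d(group_width, group_width, 3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(group_width),
            nn.ReLU(inplace=True),
            nn.Conv2d(group_width, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)

block = ResNeXtGroupedBlock()
y = block(torch.randn(1, 256, 56, 56))  # same shape in and out
```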

Let's take a look at how the model performs in the experiments.

It can be concluded from Table 4 of the paper that even at half the complexity, a 101-layer ResNeXt still achieves better results than ResNet-200, which meets the authors' goal of "reducing the computational complexity while achieving the accuracy of large, dense deep models".
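As a quick sanity check on the complexity claims (the arithmetic is from Section 3.3 of the paper; the helper names are mine), the ResNeXt block is deliberately sized so that one block costs about the same as one ResNet bottleneck:

```python
# Parameter counts for one block (bias terms ignored), per Section 3.3.

def resnet_bottleneck_params(c_in=256, d=64):
    # 1x1,64 -> 3x3,64 -> 1x1,256
    return c_in * d + 3 * 3 * d * d + d * c_in

def resnext_block_params(c_in=256, cardinality=32, d=4):
    # C parallel branches of 1x1,4 -> 3x3,4 -> 1x1,256
    return cardinality * (c_in * d + 3 * 3 * d * d + d * c_in)

print(resnet_bottleneck_params())  # 69632 (~70k)
print(resnext_block_params())      # 70144 (~70k)
```

So raising cardinality improves accuracy at roughly constant cost per block, which is what makes the half-complexity comparison against ResNet-200 meaningful.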

Summary:

    • The authors require that every "block" has the same topological structure, and give design principles and a template for extending the "block" (the network structure is obtained by repeating building blocks), which greatly simplifies the work of network structure design.
    • The equivalent forms of the same block both deepen our understanding and offer the possibility of a fast implementation.
    • This is really a masterpiece.
