Paper notes: Aggregated Residual Transformations for Deep Neural Networks


This article constructs a basic "block" and introduces a new dimension, "cardinality", on top of it (denoted by the letter "C" in the paper's figures and tables). The other two dimensions of a deep network are depth (the number of layers) and width (the number of channels in a layer).

First, let's look at how this "block" is built, as shown in Figure 1 of the paper (ResNeXt is the shorthand name of the model presented in this paper).

On the left of Figure 1 is the standard residual network "block"; on the right is the "block" introduced by the authors. What are the advantages of this new block? The authors were likely inspired by the Inception models; as the paper states, "unlike VGG-nets, the family of Inception models have demonstrated that carefully designed topologies are able to achieve compelling accuracy with low theoretical complexity". Further, "the split-transform-merge behavior of Inception modules is expected to approach the representational power of large and dense layers, but at a considerably lower computational complexity". Put simply, the goal is "to reduce the computational complexity of the model while achieving the accuracy of large, dense, deep networks" (this is the effect the paper pursues). Figure 1 right is built using the split-transform-merge strategy.
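To make split-transform-merge concrete, here is a minimal sketch of the block in its explicit multi-branch form (Figure 3(a) of the paper): the input is split into C low-dimensional branches, each branch applies the same 1×1 → 3×3 → 1×1 transformation, and the C outputs are summed and added to the shortcut, i.e. y = x + Σᵢ Tᵢ(x). PyTorch and the paper's example sizes (256-d input, C = 32, 4-d bottleneck per branch) are assumed; the class and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class ResNeXtBranchBlock(nn.Module):
    """Explicit split-transform-merge form of a ResNeXt block (Fig. 3(a) sketch)."""
    def __init__(self, channels=256, cardinality=32, bottleneck_width=4):
        super().__init__()
        # All C branches share the same topology (1x1 -> 3x3 -> 1x1),
        # differing only in their learned weights.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, bottleneck_width, 1, bias=False),
                nn.BatchNorm2d(bottleneck_width),
                nn.ReLU(inplace=True),
                nn.Conv2d(bottleneck_width, bottleneck_width, 3, padding=1, bias=False),
                nn.BatchNorm2d(bottleneck_width),
                nn.ReLU(inplace=True),
                nn.Conv2d(bottleneck_width, channels, 1, bias=False),
                nn.BatchNorm2d(channels),
            )
            for _ in range(cardinality)
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Aggregate the branch transformations, then add the shortcut:
        # y = x + sum_i T_i(x)
        out = sum(branch(x) for branch in self.branches)
        return self.relu(out + x)

block = ResNeXtBranchBlock()
y = block(torch.randn(1, 256, 56, 56))  # output shape: (1, 256, 56, 56)
```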

Inception models have one very inconvenient aspect in practice: the kernel sizes and widths of each branch are "custom-designed", and each "block" is "custom-designed" as well. If we want to apply this model, or design a new network under this framework, the "customization" introduces many hyper-parameters. Anyone who has designed their own network, or modified an existing one, knows that too many hyper-parameters are a "disaster" for design. Without a suitable design strategy, the design, to put it bluntly, ends up depending on experience and trial and error.

Inspired by the success of VGG/ResNets, the authors summarize the following two principles for designing a "block":

    1. "If producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes)"
    2. "Each time when the spatial map was downsampled by a factor of 2, the width of the blocks was multiplied by a factor of 2"

In addition, all "blocks" have the same topological structure. The authors give a design template; combined with the two principles above, we can build essentially any desired network (doesn't network structure design suddenly feel much easier?). The template is Table 1 of the paper (ResNeXt-50 shown next to ResNet-50); a sketch of its stage layout follows below.
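As an illustration of how the two principles and the template generate a whole network, the snippet below reproduces the stage layout of the ResNeXt-50 (32×4d) template from the paper's Table 1; the tuple format and names here are my own, not the paper's.

```python
# Stage layout of ResNeXt-50 (32x4d), following Table 1 of the paper.
# Principle 1: blocks producing same-size maps share hyper-parameters.
# Principle 2: each 2x spatial downsampling doubles the block width.
CARDINALITY = 32

# (stage, output map size, bottleneck width, output channels, num blocks)
STAGES = [
    ("conv2", 56, 128, 256, 3),
    ("conv3", 28, 256, 512, 4),
    ("conv4", 14, 512, 1024, 6),
    ("conv5", 7, 1024, 2048, 3),
]

for name, size, width, out_ch, blocks in STAGES:
    print(f"{name}: {size}x{size} maps, "
          f"[1x1,{width}; 3x3,{width},C={CARDINALITY}; 1x1,{out_ch}] x {blocks}")
```

Note how the bottleneck width doubles (128 → 256 → 512 → 1024) exactly when the map size halves (56 → 28 → 14 → 7), as the second principle requires.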

This is not all: the authors also give two equivalent representations of the new block on the right of Figure 1, shown in Figure 3 of the paper (form (b) concatenates the branch outputs before the final 1×1 convolution; form (c) replaces the branches with a single grouped convolution).

This is a great convenience for implementation. Here the group convolution concept introduced by AlexNet comes into play (a concept introduced at the time because of GPU memory constraints). In the form of Figure 3(c), the block can be implemented directly in Caffe without changing any source code.
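To show why form (c) is so convenient, here is a sketch of the same block written with a single grouped convolution: the 32 branches of width 4 collapse into one 3×3 convolution with 128 channels and groups=32. PyTorch is assumed for the sketch (in Caffe, as noted above, this is just the `group` parameter of a convolution layer); the class name is illustrative.

```python
import torch
import torch.nn as nn

class ResNeXtGroupedBlock(nn.Module):
    """Grouped-convolution form of the ResNeXt block (Fig. 3(c) sketch)."""
    def __init__(self, channels=256, cardinality=32, bottleneck_width=4):
        super().__init__()
        group_width = cardinality * bottleneck_width  # 32 * 4 = 128
        self.body = nn.Sequential(
            nn.Conv2d(channels, group_width, 1, bias=False),
            nn.BatchNorm2d(group_width),
            nn.ReLU(inplace=True),
            # One grouped 3x3 conv replaces the 32 separate 4-channel branches.
            nn.Conv2d(group_width, group_width, 3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(group_width),
            nn.ReLU(inplace=True),
            nn.Conv2d(group_width, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)

block = ResNeXtGroupedBlock()
y = block(torch.randn(1, 256, 56, 56))  # same shape in and out
```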

Let's take a look at how the model performs in the experiments.

It can be concluded from Table 4 of the paper that even at half the complexity, a 101-layer ResNeXt still achieves better results than ResNet-200, which meets the authors' goal of "reducing the computational complexity while achieving the accuracy of large, dense deep models".
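As a quick sanity check on the complexity claims (the arithmetic is from Section 3.3 of the paper; the helper names are mine), the ResNeXt block is deliberately sized so that one block costs about the same as one ResNet bottleneck:

```python
# Parameter counts for one block (bias terms ignored), per Section 3.3.

def resnet_bottleneck_params(c_in=256, d=64):
    # 1x1,64 -> 3x3,64 -> 1x1,256
    return c_in * d + 3 * 3 * d * d + d * c_in

def resnext_block_params(c_in=256, cardinality=32, d=4):
    # C parallel branches of 1x1,4 -> 3x3,4 -> 1x1,256
    return cardinality * (c_in * d + 3 * 3 * d * d + d * c_in)

print(resnet_bottleneck_params())  # 69632 (~70k)
print(resnext_block_params())      # 70144 (~70k)
```

So raising cardinality improves accuracy at roughly constant cost per block, which is what makes the half-complexity comparison against ResNet-200 meaningful.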

Summary:

    • The authors require that every "block" has the same topological structure, and give design principles and a template for extending the "block" (the network structure is obtained by repeating building blocks), which greatly simplifies the work of network structure design.
    • The equivalent forms of the same block both deepen our understanding and offer the possibility of a fast implementation.
    • This is really a masterpiece.
