ResNeXt: Aggregated Residual Transformations for Deep Neural Networks


"Aggregated Residual Transformations for Deep Neural Networks" was published on arXiv in 2016 by Saining Xie et al.:
https://arxiv.org/pdf/1611.05431.pdf

Innovation Point
1. Uses grouped convolution on top of the standard ResNet to obtain stronger representational power without increasing the number of parameters.

Naming
This paper proposes ResNeXt, an improved version of ResNet. The name comes from a newly proposed hyperparameter, cardinality, which the authors regard as a dimension for measuring a network model in addition to depth and width. Hence the name ResNeXt, suggesting "the next dimension".

1. Introduction

Research on image recognition has shifted from "feature engineering" to "network engineering", i.e., designing better network architectures. Designing a network involves many hyperparameters, such as width, filter size, and strides. VGG uses a simple but effective strategy: it stacks building blocks of the same shape, reusing the same block repeatedly. The authors believe the simplicity of this rule may reduce the risk of over-adapting the hyperparameters to a specific dataset. The Inception family, by contrast, relies on carefully designed topologies, and its most important feature is the split-transform-merge strategy.

Advantages of split-transform-merge: the split-transform-merge behavior of Inception modules is expected to approach the representational power of large, dense layers, but at a considerably lower computational complexity.

2. Related Work

This section briefly reviews prior work on multi-branch convolutional networks, grouped convolutions, compressing convolutional networks, and ensembling. In my view, the most important sentence is this one:

But we argue that it is imprecise to view our method as ensembling, because the members to be aggregated are trained jointly, not independently.
(I'll leave you to think that one over, haha.)

3. Method

3.1 Template
Following VGG and ResNet, the blocks are designed according to two rules:
The first: if blocks produce spatial maps of the same size, they share the same hyperparameters (width and filter sizes), i.e., the same kernel size and the same number of kernels.

The second: each time the spatial resolution of the feature map is halved, the number of channels is doubled. This rule keeps the computational complexity, in FLOPs, roughly the same for all blocks.
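The second rule can be checked with a little arithmetic. This is a minimal sketch, assuming the usual multiply-accumulate count for a convolution, H·W·C_in·C_out·k² (bias and padding ignored); the function name `conv_macs` and the layer sizes are illustrative, not taken from the paper.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a k x k convolution with an h x w output map (bias ignored)."""
    return h * w * c_in * c_out * k * k


# A 3x3 conv at 56x56 resolution with 64 input and 64 output channels ...
before = conv_macs(56, 56, 64, 64, 3)
# ... and the same layer after the map is halved to 28x28 and the width doubled to 128.
after = conv_macs(28, 28, 128, 128, 3)
print(before, after, before == after)  # 115605504 115605504 True
```

Halving H and W divides the cost by 4, while doubling both C_in and C_out multiplies it by 4, so the two effects cancel.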

3.2 Revisiting Simple Neurons

The authors use a computational model of a single neuron to analyze splitting, transforming, and aggregating.
As shown in the following figure, the input of a neuron is a D-dimensional vector $x = [x_1, x_2, \ldots, x_D]$, and the output of the neuron is the inner product of x and the weight vector w: $\sum_{i=1}^{D} w_i x_i$

The operation of a neuron can be divided into three steps: (1) splitting, (2) transforming, and (3) aggregating.
(1) Splitting: the input x is split into D parts, the scalars $x_i$.
(2) Transforming: each part is scaled, i.e., multiplied by $w_i$.
(3) Aggregating: finally, all the results are aggregated (summed) to give the output $\sum_{i=1}^{D} w_i x_i$.
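The three steps above can be written out literally. This is an illustrative sketch in plain Python; the function name `neuron` and the sample values are hypothetical, and the result is just the ordinary inner product.

```python
def neuron(x, w):
    """A single neuron written as split -> transform -> aggregate."""
    if len(x) != len(w):
        raise ValueError("x and w must both be D-dimensional")
    # (1) Splitting: x is broken into its D scalar components x_i.
    parts = list(x)
    # (2) Transforming: each component is scaled by its weight w_i.
    transformed = [w_i * x_i for w_i, x_i in zip(w, parts)]
    # (3) Aggregating: the transformed components are summed.
    return sum(transformed)


print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0]))  # 4.5
```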

3.3 Aggregated Transformations

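Having analyzed the split-transform-aggregate view of a single neuron, the paper generalizes it by replacing the elementary transformation $w_i x_i$ with a more generic function, which can itself be a network.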

First, the input x is split into C parts; each part is then transformed by a function $\mathcal{T}_i(x)$, and the results are summed (aggregated). Each $\mathcal{T}_i$ can be an arbitrary function; in ResNeXt all of them share the same bottleneck-shaped topology. The formula is:

$$\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x)$$
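The aggregated transformation is just "apply every branch to the same input, then sum". The sketch below assumes each $\mathcal{T}_i$ is any Python callable mapping a vector to a vector of a common length; `aggregate`, `Vector`, and the example branches are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def aggregate(transforms: Sequence[Callable[[Vector], Vector]], x: Vector) -> Vector:
    """F(x) = sum_i T_i(x): every transformation sees the same x; outputs are summed element-wise."""
    outputs = [t(x) for t in transforms]
    return [sum(column) for column in zip(*outputs)]


# The single neuron is the special case T_i(x) = w_i * x_i (returned as a 1-element vector).
w = [0.5, -1.0, 2.0]
neuron_branches = [lambda x, i=i: [w[i] * x[i]] for i in range(len(w))]
print(aggregate(neuron_branches, [1.0, 2.0, 3.0]))  # [4.5]

# More general T_i may look at the whole input and return a vector.
general_branches = [
    lambda x: [x[0] + x[1], x[2]],
    lambda x: [2 * x[2], -x[0]],
]
print(aggregate(general_branches, [1.0, 2.0, 3.0]))  # [9.0, 2.0]
```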

Here C is the cardinality, i.e., the size of the set of transformations to be aggregated. It is a hyperparameter of ResNeXt-style networks and can be understood as the number of groups; for example, the experiments in the paper use C = 32.

With the split-transform-aggregate operation written as $\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x)$, the output of one ResNeXt block (with C = 32) is $y = x + \sum_{i=1}^{32} \mathcal{T}_i(x)$. The block is shown in the following figure:

The first term, x, is the identity mapping (the shortcut line on the far right of the figure); the remaining term is the split-transform-aggregate operation applied to x. Here C = 32, so 32 terms are summed.
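A minimal sketch of $y = x + \sum_{i=1}^{32} \mathcal{T}_i(x)$, under loud simplifying assumptions: it looks at the channel vector at a single spatial position, so the 3×3 convolution of each branch is stood in for by a d×d matrix; batch normalization and the ReLU after the addition are omitted; weights are random. All names (`make_branch`, `resnext_block`, etc.) are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, a) for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_branch(channels, d, rng):
    """One hypothetical bottleneck branch T_i: channels -> d -> d -> channels."""
    w1 = random_matrix(d, channels, rng)  # 1x1 conv reducing to d channels
    w2 = random_matrix(d, d, rng)         # stand-in for the 3x3 conv at one position (assumption)
    w3 = random_matrix(channels, d, rng)  # 1x1 conv restoring the channel count

    def branch(x):
        h = relu(matvec(w1, x))
        h = relu(matvec(w2, h))
        return matvec(w3, h)

    return branch


def resnext_block(x, branches):
    """y = x + sum_i T_i(x): identity shortcut plus the aggregated branches."""
    total = [0.0] * len(x)
    for branch in branches:
        total = [t + o for t, o in zip(total, branch(x))]
    return [xi + ti for xi, ti in zip(x, total)]


rng = random.Random(0)
channels, d, cardinality = 256, 4, 32
branches = [make_branch(channels, d, rng) for _ in range(cardinality)]
x = [rng.uniform(-1, 1) for _ in range(channels)]
y = resnext_block(x, branches)
print(len(y))  # 256
```

With no branches the block reduces to the identity mapping, which is the residual-learning property the shortcut provides.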

As the figure above shows, a ResNeXt block in this form is still rather "bloated". It can be rewritten more concisely, step by step, until it reaches the equivalent form (c) in the following figure.

First, look at figure (a), where each triple gives (input channels, filter size, output channels): the first row, 32 × (256, 1×1, 4), is the splitting operation, and the second row, 32 × (4, 3×3, 4), is the transforming operation;

The third row, 32 × (4, 1×1, 256), together with the + sign, is the aggregating operation.
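Counting the convolution weights of figure (a) shows why the block does not cost more parameters than a ResNet bottleneck. This sketch ignores biases and batch-norm parameters and assumes the comparison is with a 256-channel ResNet bottleneck of width 64; the function names are hypothetical.

```python
def resnext_block_params(channels=256, cardinality=32, d=4):
    """Conv weights of figure (a): C paths of (channels,1x1,d) -> (d,3x3,d) -> (d,1x1,channels)."""
    per_path = channels * d + 3 * 3 * d * d + d * channels
    return cardinality * per_path


def resnet_bottleneck_params(channels=256, width=64):
    """Conv weights of a ResNet bottleneck: (channels,1x1,width) -> (width,3x3,width) -> (width,1x1,channels)."""
    return channels * width + 3 * 3 * width * width + width * channels


print(resnext_block_params())      # 70144
print(resnet_bottleneck_params())  # 69632
```

Both come to roughly 70k weights, so raising cardinality to 32 while shrinking each path to width 4 keeps the parameter budget essentially unchanged.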

Figure (b) reorganizes the aggregation: the feature maps produced by the transformations are first concatenated, and a single (128, 1×1, 256) convolution then produces the output;
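Why (a) and (b) are equivalent: summing C separate d→256 projections gives the same result as concatenating the C·d features and applying one 128→256 projection whose weight matrix is the per-path matrices placed side by side. The sketch below checks this numerically at a single spatial position with random, hypothetical values.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


rng = random.Random(1)
C, d, out_ch = 32, 4, 256
# Hypothetical per-path features h_i (d values each) and per-path 1x1 weights W_i (out_ch x d).
h = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(C)]
w = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(out_ch)] for _ in range(C)]

# Figure (a): each path projects its d features to out_ch channels; the C results are summed.
per_path = [matvec(w[i], h[i]) for i in range(C)]
summed = [sum(column) for column in zip(*per_path)]

# Figure (b): concatenate the C * d = 128 features, then apply a single (128, 1x1, 256)
# projection whose weight matrix is the C per-path matrices placed side by side.
concat = [value for h_i in h for value in h_i]
w_cat = [[value for i in range(C) for value in w[i][row]] for row in range(out_ch)]
projected = matvec(w_cat, concat)

print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(summed, projected)))  # True
```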

(c) simplifies (b) further by folding the splitting into the transformation. The input first passes through a single (256, 1×1, 128) convolution, the first row of (c), which produces 128 feature maps; the split is then implemented by a grouped convolution with 32 groups, each operating on 4 feature maps.
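A grouped convolution is an ordinary convolution in which output channels of group g are connected only to input channels of group g, i.e., its weights are block-diagonal across channels. This sketch checks that in plain Python using a 1-D convolution (kernel size 3, zero padding 1) instead of 2-D, to keep it short; `conv1d` and `grouped_conv1d` are hypothetical helpers. Frameworks expose the same idea directly, e.g. PyTorch's `torch.nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)`.

```python
import random


def conv1d(x, w):
    """Plain 1-D convolution (cross-correlation), kernel size 3, zero padding 1, stride 1.

    x: in_ch channels, each a list of length L; w: weights indexed [out_ch][in_ch][k].
    """
    length = len(x[0])
    out = []
    for w_out in w:
        row = []
        for t in range(length):
            acc = 0.0
            for c, w_in in enumerate(w_out):
                for k in range(3):
                    pos = t + k - 1
                    if 0 <= pos < length:
                        acc += w_in[k] * x[c][pos]
            row.append(acc)
        out.append(row)
    return out


def grouped_conv1d(x, w, groups):
    """Grouped convolution; w is indexed [out_ch][in_ch // groups][k]."""
    in_per_group, out_per_group = len(x) // groups, len(w) // groups
    out = []
    for g in range(groups):
        x_g = x[g * in_per_group:(g + 1) * in_per_group]
        w_g = w[g * out_per_group:(g + 1) * out_per_group]
        out.extend(conv1d(x_g, w_g))
    return out


rng = random.Random(2)
groups, width, length = 32, 4, 5
channels = groups * width  # 128 feature maps, as in figure (c)
x = [[rng.uniform(-1, 1) for _ in range(length)] for _ in range(channels)]
w_grouped = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(width)] for _ in range(channels)]

# Equivalent dense weights: each group's weights on the block diagonal, zeros elsewhere.
w_dense = [[[0.0] * 3 for _ in range(channels)] for _ in range(channels)]
for o in range(channels):
    g = o // width
    for j in range(width):
        w_dense[o][g * width + j] = list(w_grouped[o][j])

a = grouped_conv1d(x, w_grouped, groups)
b = conv1d(x, w_dense)
print(all(abs(p - q) < 1e-9 for ra, rb in zip(a, b) for p, q in zip(ra, rb)))  # True
```

The grouped layer stores 128 × 4 × 3 weights against 128 × 128 × 3 for the dense one, 32 times fewer, which is where the block saves the capacity it spends on cardinality.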
