"Aggregated Residual Transformations for Deep Neural Networks" was published by Saining Xie et al. in 2016 on arXiv:
https://arxiv.org/pdf/1611.05431.pdf
Innovation Point
1. Group convolution is used on top of the traditional ResNet, obtaining stronger representational ability without increasing the number of parameters.
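A quick arithmetic check of the "no extra parameters" claim, using the settings from the paper (C=32, d=4, 256 input/output channels). The function names are mine; the counts ignore biases and BatchNorm.

```python
# Weight counts for one block operating on 256 channels (paper, Sec. 3.3).

def resnet_bottleneck_params(c_in=256, width=64):
    # 1x1 reduce -> 3x3 -> 1x1 expand
    return c_in * width + 3 * 3 * width * width + width * c_in

def resnext_block_params(c_in=256, cardinality=32, d=4):
    # C parallel paths: 1x1 (256 -> d), 3x3 (d -> d), 1x1 (d -> 256)
    return cardinality * (c_in * d + 3 * 3 * d * d + d * c_in)

print(resnet_bottleneck_params())  # 69632, i.e. ~70k
print(resnext_block_params())      # 70144, i.e. ~70k
```

Both come out at roughly 70k parameters, which matches the paper's claim that ResNeXt trades no parameter budget for its extra cardinality.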
Naming
This paper presents an improved ResNet, named ResNeXt, because it introduces a new hyperparameter: cardinality. The author argues that cardinality measures a network model along another dimension, hence the name ResNeXt (suggesting the next dimension).

1. Introduction
The task of image recognition has shifted from hand-designed "representations" (feature engineering) to designing better network architectures. But designing a network involves too many hyperparameters (width, filter size, strides, etc.). VGG uses a simple strategy that works: stacking building blocks of the same shape, i.e., reusing the same block repeatedly. The author believes this strategy reduces the risk of over-adapting the hyperparameters to a specific dataset. The advantage of the Inception series lies in its carefully designed network topology, whose most important feature is the split-transform-merge idea.
Advantage of the split-transform-merge operation, quoting the paper: "The split-transform-merge behavior of Inception modules is expected to approach the representational power of large and dense layers, but at a considerably lower computational complexity."

2. Related Work
A brief introduction to multi-branch convolutional networks, grouped convolutions, compressing convolutional networks, and ensembling. I think the most important sentence is this one:
"But we argue that it is imprecise to view our method as ensembling, because the members to be aggregated are trained jointly, not independently."
(Everyone draws on their own experience ~ haha)

3. Method
3.1 Template
Following VGG and ResNet, the block design obeys two rules:
First: if blocks produce spatial maps of the same size, then these blocks share the same hyperparameters (width and filter size), i.e., the same kernel size and the same number of convolution kernels.
Second: whenever the feature-map resolution is halved, the number of channels is doubled. This rule keeps the computational complexity of every block roughly the same.
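The second rule can be checked with a line of arithmetic: the multiply count of a convolution scales as H·W·k²·C_in·C_out, so halving H and W while doubling both channel counts leaves it unchanged. A minimal sketch (the stage sizes 56/64 and 28/128 are the familiar ResNet ones, used here only as an illustration):

```python
def conv_mults(h, w, c_in, c_out, k=3):
    # multiply count of a k x k convolution on an h x w feature map
    return h * w * k * k * c_in * c_out

stage1 = conv_mults(56, 56, 64, 64)    # high resolution, narrow
stage2 = conv_mults(28, 28, 128, 128)  # half resolution, double width
print(stage1 == stage2)  # True: (1/2) * (1/2) * 2 * 2 = 1
```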
3.2 Revisiting simple neurons
The author uses the computation performed by a single neuron to analyze splitting, transforming, and aggregating.
As shown in the figure below, the input of a neuron is a D-dimensional vector x, and the output of the neuron is the inner product of x and the weight w, namely: $\sum_{i=1}^{D} w_i x_i$
The operation of a neuron can be divided into (1) splitting (2) transforming (3) aggregating
(1) Splitting: the input x is divided into D parts (the scalars $x_i$)
(2) Transforming: each part is scaled, i.e., multiplied by $w_i$
(3) Aggregating: finally all the results are summed up to get the output: $\sum_{i=1}^{D} w_i x_i$
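The three steps above can be written out explicitly; the values below are arbitrary example numbers, chosen only to show that the staged computation equals the plain inner product:

```python
# A single neuron computed explicitly as split -> transform -> aggregate.
x = [1.0, 2.0, 3.0, 4.0]    # D-dimensional input (D = 4)
w = [0.5, -1.0, 0.25, 2.0]  # weights

parts = list(x)                                     # (1) splitting into D scalars
scaled = [w_i * x_i for w_i, x_i in zip(w, parts)]  # (2) transforming: scale each part
output = sum(scaled)                                # (3) aggregating: sum them up

# Same result as the inner product sum_{i=1}^{D} w_i x_i
print(output)  # 7.25
```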
3.3 Aggregated Transformations
The paper then extends the splitting-transforming-aggregating of a single neuron. First, the input x is split into C parts; each part is then transformed, with $\mathcal{T}_i(x)$ denoting the i-th transformation; finally all parts are summed, i.e., aggregated. The formula is: $\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x)$
Here C denotes the cardinality, i.e., the size of the set of transformations. It is the new hyperparameter of a network such as ResNeXt and can be understood as the number of groups in a grouped convolution; for example, the experiments in this paper use C=32.
The splitting-transforming-aggregating operation is expressed by $\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x)$, so the output of one ResNeXt block is $y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)$; the block schematic is shown in the following illustration:
The x term is the identity mapping, the rightmost line in the figure; the rest is the splitting-transforming-aggregating operation on x. Here C=32, so 32 terms are summed.
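The block formula $y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)$ can be sketched in numpy. This is a minimal stand-in, not the paper's implementation: each path $\mathcal{T}_i$ is modeled as three linear maps (reduce, transform, expand) with random weights, in place of the real 1x1 / 3x3 / 1x1 convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, dim, d = 32, 256, 4  # cardinality, block width, bottleneck width per path

# Each path T_i: reduce 256 -> 4, transform 4 -> 4, expand 4 -> 256.
paths = [(rng.standard_normal((d, dim)),
          rng.standard_normal((d, d)),
          rng.standard_normal((dim, d))) for _ in range(C)]

def block(x):
    # Aggregate the C transformed parts, then add the identity shortcut.
    agg = sum(W3 @ (W2 @ (W1 @ x)) for W1, W2, W3 in paths)
    return x + agg  # y = x + sum_i T_i(x)

x = rng.standard_normal(dim)
y = block(x)
print(y.shape)  # (256,)
```

Note that the shortcut forces input and output to have the same width (256 here), exactly as in the figure.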
The ResNeXt block shown above is still too "bloated"; it can be made more concise, and is eventually rewritten as form (c) in the figure below.
First look at figure (a): in the first row, the 32 branches of (256, 1x1, 4) represent the splitting operation; in the second row, the 32 branches of (4, 3x3, 4) represent the transforming;
the third row and the + sign represent the aggregating.
Figure (b) merges the aggregating: the feature maps produced by the transforming step are first concatenated, and then a (128, 1x1, 256) convolution produces the output.
Figure (c) further "simplifies" (b) by folding the splitting into the transforming: the split of the input is done with a single (256, 1x1, 128) convolution. The first row of figure (c), (256, 1x1, 128), produces 128 feature maps, and the splitting is then implemented by a grouped convolution with 32 groups, each group operating on 4 feature maps.
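Why forms (b) and (c) are equivalent: 32 separate branches each transforming its own 4 channels, then concatenated, compute the same thing as one grouped operation whose weight matrix is block-diagonal. A minimal numpy sketch at a single spatial position (so the convolution reduces to a matrix multiply; the weights are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
groups, ch_per_group = 32, 4
channels = groups * ch_per_group  # the 128 feature maps of figure (c)

x = rng.standard_normal(channels)  # channel vector at one spatial position
Ws = [rng.standard_normal((ch_per_group, ch_per_group)) for _ in range(groups)]

# Form (b): 32 separate branches, each transforming its own 4 channels,
# then concatenated.
branches = [W @ x[g * ch_per_group:(g + 1) * ch_per_group]
            for g, W in enumerate(Ws)]
out_b = np.concatenate(branches)

# Form (c): one grouped operation = multiplication by a block-diagonal matrix.
W_grouped = np.zeros((channels, channels))
for g, W in enumerate(Ws):
    s = slice(g * ch_per_group, (g + 1) * ch_per_group)
    W_grouped[s, s] = W
out_c = W_grouped @ x

print(np.allclose(out_b, out_c))  # True
```

In a real implementation the grouped 3x3 convolution does this per spatial location with shared kernels, which is exactly what `groups=32` means in standard deep-learning frameworks.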