0. Contributions of the paper
The main contribution of this paper is a new building block called the inverted residual with linear bottleneck. It reverses the traditional residual block, which reduces the dimensions first: here the low-dimensional input feature map is first expanded to a high-dimensional representation, a depthwise convolution is applied in that space, and a final linear 1x1 convolution projects the result back down to a low-dimensional space.
For the principle of depthwise separable convolutions, refer to this article.
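As a quick illustration of why depthwise separable convolutions are cheaper, here is a minimal PyTorch sketch (the channel sizes 32/64 are arbitrary examples, not values from the paper):

```python
import torch
import torch.nn as nn

# Standard 3x3 convolution: every output channel looks at every input channel.
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1)

# Depthwise separable version: a per-channel 3x3 (groups=in_channels)
# followed by a 1x1 pointwise convolution that mixes channels.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)

x = torch.randn(1, 32, 56, 56)
assert pointwise(depthwise(x)).shape == standard(x).shape

def n_params(*modules):
    return sum(p.numel() for m in modules for p in m.parameters())

print(n_params(standard))              # 64*32*3*3 + 64 = 18496
print(n_params(depthwise, pointwise))  # (32*3*3 + 32) + (64*32 + 64) = 2432
```

The separable version produces the same output shape with roughly an order of magnitude fewer parameters.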
1. Brief introduction to the inverted residual block
In the figure, the traditional residual block on the left (a) uses a 1x1 convolution to reduce the dimension of the input feature map, performs a 3x3 convolution, and then uses another 1x1 convolution to raise the dimension again. On the right (b) is the structure proposed in this paper: a 1x1 convolution first raises the dimension of the input feature map, a 3x3 depthwise convolution then operates in this higher-dimensional space, and a final 1x1 convolution reduces the dimension again. Note that no ReLU follows this last 1x1 convolution; a linear activation is used instead, to preserve more feature information and guarantee the model's expressive power.
The concrete structure of the block is as follows:
When stride=1 the block has a shortcut connection; when stride=2 it does not.
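The structure described above can be sketched in PyTorch. This is a minimal sketch: the expand/depthwise/linear-project ordering, ReLU6, and the stride-1 shortcut rule follow the paper, but the class name, channel sizes, and default expansion ratio of 6 are illustrative choices:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual with linear bottleneck: expand -> depthwise -> linear project."""
    def __init__(self, in_ch, out_ch, stride, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Shortcut only when stride=1 and the shapes match.
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 "expansion" to a higher-dimensional space
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (one filter per channel)
            nn.Conv2d(hidden, hidden, 3, stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear bottleneck: note there is NO activation after this projection
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_shortcut else self.block(x)

x = torch.randn(1, 24, 56, 56)
assert InvertedResidual(24, 24, stride=1)(x).shape == (1, 24, 56, 56)  # with shortcut
assert InvertedResidual(24, 32, stride=2)(x).shape == (1, 32, 28, 28)  # no shortcut
```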
2. Differences between MobileNetV2 and V1
The differences between MobileNetV2 and MobileNetV1 (original link):
There are two main differences:
(1) A 1x1 "expansion" layer is inserted before the depthwise convolution; its purpose is to increase the number of channels and extract more features;
(2) At the end, ReLU is not applied; the projection is kept linear instead, to prevent ReLU from destroying features.
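The two differences can be seen side by side in a small sketch (channel sizes are illustrative, not from the papers):

```python
import torch
import torch.nn as nn

# MobileNetV1 unit (depthwise separable conv): depthwise 3x3 -> ReLU -> 1x1 -> ReLU.
v1_unit = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, groups=64), nn.ReLU6(),
    nn.Conv2d(64, 128, 1), nn.ReLU6(),
)

# MobileNetV2 unit: 1x1 expansion -> ReLU -> depthwise 3x3 -> ReLU -> linear 1x1.
# Difference (1): the 1x1 expansion comes *before* the depthwise conv.
# Difference (2): there is no ReLU after the final 1x1 projection.
v2_unit = nn.Sequential(
    nn.Conv2d(64, 64 * 6, 1), nn.ReLU6(),
    nn.Conv2d(64 * 6, 64 * 6, 3, padding=1, groups=64 * 6), nn.ReLU6(),
    nn.Conv2d(64 * 6, 128, 1),  # linear: no activation here
)

x = torch.randn(1, 64, 32, 32)
assert v1_unit(x).shape == v2_unit(x).shape == (1, 128, 32, 32)
```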
MobileNetV2 related information:
- A first reading of the MobileNetV2 paper
- [Paper Notes] (MobileNetV2) Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- A discussion of MobileNetV2
Note: the following paragraph is excerpted from the second article.
- Intuition
As shown, an m x n matrix B transforms the input tensor (2-dimensional, i.e. n=2) into m-dimensional space, ReLU is applied (y = ReLU(Bx)), and then the (pseudo-)inverse of the matrix maps the result back to the original space (that is, from m-dimensional space back to 2 dimensions). It can be seen that when m is small the recovered tensor collapses severely, while recovery is better when m is larger.
This shows that nonlinear transformations such as ReLU cause significant information loss on low-dimensional tensor representations. The paper therefore replaces the activation after the bottleneck with a linear transformation, and in the convolution layers that do need an activation, the tensor is first expanded to a larger dimension m before the activation is applied. The input and output of the whole unit are thus low-dimensional tensors, while the intermediate layers work on higher-dimensional tensors.
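This intuition is easy to reproduce numerically. The sketch below is in the spirit of the paper's figure but not identical to it: we use points on a circle instead of a spiral, a random Gaussian matrix B, and the pseudo-inverse for recovery, all of which are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-D points on a circle: a stand-in for a low-dimensional manifold of interest.
t = np.linspace(0, 2 * np.pi, 200)
x = np.stack([np.cos(t), np.sin(t)])            # shape (2, 200)

def relu_roundtrip_error(m):
    """Embed x into m dims with a random matrix B, apply ReLU,
    then map back with the pseudo-inverse and measure the loss."""
    B = rng.standard_normal((m, 2))
    y = np.maximum(B @ x, 0.0)                  # y = ReLU(Bx)
    x_rec = np.linalg.pinv(B) @ y               # back to 2-D
    return float(np.mean((x - x_rec) ** 2))

# Information loss tends to shrink as the embedding dimension m grows.
errors = {m: relu_roundtrip_error(m) for m in (2, 5, 15, 30)}
print(errors)
```

With a small m the recovered points are badly distorted; with a larger m much more of the structure survives the ReLU, which is why the unit activates only in the expanded high-dimensional space.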
3. The MobileNetV2 network
The MobileNetV2 network structure is as follows:
The performance of the network is as follows: