"Minimalist notes" encoder-decoder with atrous separable convolution for semantic Image segmentation
Article core: 1. The deeplabv3+ is proposed, and the Encoder-decoder structure is used (in fact, it is the common sampling and sampling of semantic segmentation); 2. The network can control the resolution of encoder feature arbitrarily with the hole convolution, and has better scale adaptability; 3. The modified Xception backbone network is adopted, and depthwise separable convolution is adopted in ASPP (with hole convolution module) and decoder module.
In a word, before the various article innovation point of the stack, plus a large number of structural tuning parameters.
The Encoder-decoder and the belt hole convolution are not spoken, the emphasis is on how it merges. As shown in the decoder section, it is a bilinear sampling x4, and the shallow layer through the 1x1conv, has the same spatial resolution feature map concatenate, and then a 3x3 conv, is a stage, So many times to return to the original size.
On the revision of Xception, 1. more layers; 2. The max pooling part is replaced with the depthwise convolution of the stride=2; 3. Both BN and Relu are added after each depthwise 3x3 convolution.