REF:
https://www.zhihu.com/question/56024942
http://m.blog.csdn.net/chaipp0607/article/details/60868689
The main functions of 1*1 convolution are as follows:
1. Dimension reduction. For example, applying a 1*1 convolution with 20 filters to a 500*500 image with a depth (number of channels) of 100 produces an output of size 500*500*20 (a code sketch follows this list).
2. Adding nonlinearity. A 1*1 convolution followed by an activation function applies a nonlinear transformation (non-linear activation) to the representation learned by the previous layer, enhancing the network's expressive power.
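A minimal sketch of point 1, assuming PyTorch (the original names no framework): a 1*1 convolution with 20 filters applied to a 500*500 input of depth 100.

```python
import torch
import torch.nn as nn

# 1*1 convolution taking 100 input channels down to 20 output channels.
conv1x1 = nn.Conv2d(in_channels=100, out_channels=20, kernel_size=1)

# One 500*500 "image" with a depth of 100 (batch, channels, height, width).
x = torch.randn(1, 100, 500, 500)
y = conv1x1(x)

print(y.shape)  # torch.Size([1, 20, 500, 500]) -- width and height unchanged
```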
In most cases, a 1*1 convolution serves to raise or lower the feature dimension, where "dimension" means the number of channels (the thickness); the width and height of the feature map are unchanged.
For example, suppose a convolution produces a w*h*6 feature map, and we now want to use 1*1 convolution kernels to turn it into w*h*5, i.e. to go from 6 channels to 5 channels:
The figure below shows a w*h*6 feature map with a 1*1 convolution kernel marked on it; note that the kernel itself also has a thickness of 6 (apologies for the rough drawing).
One such convolution turns the w*h*6 feature map into w*h*1. Using five 1*1 kernels therefore yields five w*h*1 maps, and concatenating them along the channel axis produces the desired w*h*5 output.
Let us first count the parameters: there are 5 kernels, each of size 1*1*6, giving 5*6 = 30 parameters in total.
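As a quick check of that count (again assuming PyTorch), bias is disabled so that only the kernel weights are counted:

```python
import torch.nn as nn

# 6 channels -> 5 channels with 1*1 kernels; bias=False so that only the
# 5 kernels of size 1*1*6, i.e. 5 * 6 = 30 weights, are counted.
conv = nn.Conv2d(in_channels=6, out_channels=5, kernel_size=1, bias=False)

print(conv.weight.shape)                          # torch.Size([5, 6, 1, 1])
print(sum(p.numel() for p in conv.parameters()))  # 30
```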
We can also understand the 1*1 convolution from another angle: it can be viewed as a fully connected layer, as shown below:
The first layer has 6 neurons, A1-A6, which are fully connected to 5 neurons in the second layer, B1-B5. (The figure only draws the connections from A1-A6 to B1.) In this fully connected layer, B1 is simply a weighted sum of the 6 first-layer neurons, with weights W1-W6:
The 6 neurons in the first layer correspond exactly to the 6 channels of the input feature map, while the 5 neurons in the second layer correspond to the 5 channels of the new feature map after the 1*1 convolution.
W1-W6 are the weights of one convolution kernel (the one that computes B1); to compute B2-B5, four more kernels of the same size are clearly needed.
One last point: in the image case, each first-layer "neuron" is a 2D matrix (a w*h feature map) rather than a single number. Yet even a 2D matrix needs only one parameter (the 1*1 kernel), because that weight is shared across all spatial positions.
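A sketch of this equivalence, under the same PyTorch assumption: a 1*1 convolution from 6 to 5 channels computes, at every spatial position, exactly the weighted sum a fully connected layer with the same weights would compute over the 6 channels.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(6, 5, kernel_size=1, bias=False)
fc = nn.Linear(6, 5, bias=False)

# Give the FC layer the same weights: the 5*6 matrix of the linear layer
# is just the 5*6*1*1 convolution kernel with the spatial dims dropped.
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(5, 6))

x = torch.randn(1, 6, 4, 4)  # a small w*h*6 feature map (w = h = 4)

y_conv = conv(x)  # shape (1, 5, 4, 4)
# Apply the FC layer independently at every spatial position.
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(y_conv, y_fc, atol=1e-6))  # True
```

The `permute` calls only move the channel axis to the last position so `nn.Linear` can act on it per pixel; the two outputs agree because both are the same per-position weighted sum with shared weights.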