The role of the 1x1 convolution kernel (with examples) __ Deep Learning


Table of Contents: Part I: Source; Part II: Applications; Part III: Roles (dimensionality reduction, dimensionality increase, cross-channel interaction, increased nonlinearity); Part IV: Understanding the 1x1 convolution from the perspective of fully-connected layers


First, source: [1312.4400] Network in Network (if a normal convolution layer is followed by 1x1 convolutions together with activation functions, the "network in network" structure is implemented).
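As a rough illustration (not from the paper; the channel sizes here are arbitrary), a minimal PyTorch sketch of an NIN-style "mlpconv" block, in which a normal convolution is followed by 1x1 convolutions, each with an activation:

```python
import torch
import torch.nn as nn

# Sketch of an NIN-style "mlpconv" block (assumed layer sizes):
# a normal 3x3 convolution followed by 1x1 convolutions, each with an
# activation, which acts like a small MLP applied at every spatial position.
mlpconv = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=3, padding=1),   # normal convolution
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 96, kernel_size=1),             # 1x1 "MLP" layer
    nn.ReLU(inplace=True),
    nn.Conv2d(96, 96, kernel_size=1),             # second 1x1 "MLP" layer
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 32, 32)                     # dummy input
print(mlpconv(x).shape)                           # torch.Size([1, 96, 32, 32])
```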

Second, applications: the Inception module in GoogLeNet and the residual module in ResNet.

Third, the roles:

1. Dimensionality reduction (reducing parameters)

Example 1: the inception 3a module in GoogLeNet

The input feature map is 28x28x192.

The 1x1 convolution has 64 channels.

The 3x3 convolution has 128 channels.

The 5x5 convolution has 32 channels.

Left figure, convolution kernel parameters: 192x(1x1x64) + 192x(3x3x128) + 192x(5x5x32) = 387072

The right figure adds 1x1 convolution layers with 96 and 16 channels respectively before the 3x3 and 5x5 convolution layers, so the convolution kernel parameters become:

192x(1x1x64) + (192x1x1x96 + 96x3x3x128) + (192x1x1x16 + 16x5x5x32) = 157184
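These counts can be checked with a few lines of Python (a quick sketch; biases are ignored, and the layer sizes are the ones listed above):

```python
# Convolution kernel parameters = in_channels * k * k * out_channels, ignoring biases.
def conv_params(in_ch, k, out_ch):
    return in_ch * k * k * out_ch

# Left figure: 1x1, 3x3 and 5x5 convolutions applied directly to the 192-channel input.
left = conv_params(192, 1, 64) + conv_params(192, 3, 128) + conv_params(192, 5, 32)

# Right figure: 1x1 reductions to 96 and 16 channels before the 3x3 and 5x5 convolutions.
right = (conv_params(192, 1, 64)
         + conv_params(192, 1, 96) + conv_params(96, 3, 128)
         + conv_params(192, 1, 16) + conv_params(16, 5, 32))

print(left, right)   # 387072 157184
```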


At the same time, adding a 1x1 convolution layer after the parallel pooling layer reduces the number of output feature maps (the feature map size refers to W and H, over which the weight-sharing window slides; the feature map number is the number of channels).

Left figure, feature map number: 64 + 128 + 32 + 192 (pooling leaves the number of feature maps unchanged) = 416 (if every module did this, the network output would keep growing)

Right figure, feature map number: 64 + 128 + 32 + 32 (the pooling branch is followed by a 1x1 convolution with 32 channels) = 256
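A minimal PyTorch sketch of the reduced module is given below (not from the original post; the padding values are chosen here so that every branch keeps the 28x28 size and the branches can be concatenated):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)  # input feature map 28x28x192

# Right-figure variant: 1x1 reductions before 3x3/5x5, and a 1x1 conv after pooling.
branches = nn.ModuleList([
    nn.Conv2d(192, 64, 1),                                                       # 1x1 branch
    nn.Sequential(nn.Conv2d(192, 96, 1), nn.Conv2d(96, 128, 3, padding=1)),      # 3x3 branch
    nn.Sequential(nn.Conv2d(192, 16, 1), nn.Conv2d(16, 32, 5, padding=2)),       # 5x5 branch
    nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1), nn.Conv2d(192, 32, 1)),  # pool branch
])

out = torch.cat([b(x) for b in branches], dim=1)
print(out.shape)   # torch.Size([1, 256, 28, 28]) -> 64 + 128 + 32 + 32 = 256
# Without the 1x1 convolution after pooling, the pool branch would keep 192 channels,
# giving 64 + 128 + 32 + 192 = 416 output channels, as in the left figure.
```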

By using 1x1 convolutions for dimensionality reduction, GoogLeNet obtains a more compact network structure: although it has 22 layers, its number of parameters is only about one-twelfth that of the 8-layer AlexNet (of course, a large part of the reason is the removal of the fully-connected layers).



Example 2: the residual module in ResNet

Suppose the feature map from the previous layer is w*h*256 and the final output is again 256 feature maps.

Left figure, amount of computation: w*h*256*3*3*256 = 589824*w*h

Right figure, amount of computation: w*h*256*1*1*64 + w*h*64*3*3*64 + w*h*64*1*1*256 = 69632*w*h; the left side is about 8.5 times the right side. (Dimensionality reduction is achieved, reducing computation and parameters.)
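The same arithmetic in a short Python sketch (per spatial position, with the w*h factor left out):

```python
# Multiply-accumulate count per spatial position (the w*h factor is left out).
# Left figure: a single 3x3 convolution with 256 input and 256 output channels.
left = 256 * 3 * 3 * 256

# Right figure (bottleneck): 1x1 down to 64, 3x3 at 64 channels, 1x1 back up to 256.
right = 256 * 1 * 1 * 64 + 64 * 3 * 3 * 64 + 64 * 1 * 1 * 256

print(left, right, left / right)   # 589824 69632 ~8.47
```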


2. Dimensionality increase (widening the network channels with the fewest parameters)

Example: in the previous example, there is a 1*1 convolution kernel not only at the input but also at the output. After the 3*3, 64-channel convolution, adding a 1*1, 256-channel convolution widens the network from 64 channels to 256 channels (four times wider) using only 64*256 parameters.
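A quick PyTorch check of this widening step (bias terms ignored; the 64 and 256 channel counts are taken from the example above):

```python
import torch.nn as nn

# 1x1 convolution widening the channels from 64 back up to 256.
expand = nn.Conv2d(64, 256, kernel_size=1, bias=False)
print(expand.weight.numel())   # 16384 = 64 * 256
```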

3. Cross-channel information interaction (channel transformation)

Example: with 1*1 convolution kernels, the dimension-reduction and dimension-increase operations are in fact linear combinations of information across channels. Adding a 1*1, 28-channel convolution after a 3*3, 64-channel convolution is equivalent to a single 3*3, 28-channel convolution: the original 64 channels are linearly combined across channels into 28 channels, which is information interaction between channels.

Note: the linear combination is performed only along the channel dimension; W and H are the dimensions over which the weight-sharing window slides.
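A minimal PyTorch sketch (the 64 and 28 channel counts are taken from the example above) showing that a 1x1 convolution is exactly a per-position linear combination across channels:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 7, 7)                       # 64-channel feature map
conv1x1 = nn.Conv2d(64, 28, kernel_size=1, bias=False)

# The same computation written as a matrix multiply over the channel dimension:
# every spatial position (w, h) is mixed independently with the same 28x64 matrix.
w = conv1x1.weight.view(28, 64)                    # drop the 1x1 spatial dims
manual = torch.einsum('oc,bchw->bohw', w, x)

print(torch.allclose(conv1x1(x), manual, atol=1e-6))   # True
```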

4. Increasing nonlinearity

A 1*1 convolution kernel can greatly increase nonlinearity (by using the nonlinear activation function that follows it) while keeping the feature map scale unchanged (i.e. without losing resolution), which allows the network to be made deeper.
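As a rough sketch (the 256-channel width here is arbitrary), stacking 1x1 convolutions with ReLU adds nonlinearity and depth while leaving the spatial resolution untouched:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=1), nn.ReLU(inplace=True),   # extra nonlinearity,
    nn.Conv2d(256, 256, kernel_size=1), nn.ReLU(inplace=True),   # no change in resolution
)

x = torch.randn(1, 256, 28, 28)
print(block(x).shape)    # torch.Size([1, 256, 28, 28]) -- same W and H as the input
```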


Note: each filter produces one feature map after convolution; different filters (different weights and biases) produce different feature maps and extract different features, giving the corresponding specialized "neurons".

Fourth, understanding the 1*1 convolution kernel from the perspective of fully-connected layers

Consider it as a fully connected layer


The 6 neurons on the left are A1-A6; after the full connection they become the 5 neurons B1-B5 on the right.

The 6 neurons on the left are equivalent to the input feature map with channels = 6.

The 5 neurons on the right are the new features, equivalent to a 1*1 convolution with channels = 5.

The w*h*6 feature map on the left can thus be "fully connected" via a 1*1 convolution with 5 output channels.
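A minimal PyTorch sketch of this equivalence (6 input channels and 5 output channels, as in the A1-A6 / B1-B5 figure): a 1x1 convolution applied at a single spatial position gives the same result as a fully-connected layer with the same weights.

```python
import torch
import torch.nn as nn

fc = nn.Linear(6, 5, bias=False)                      # fully-connected: 6 -> 5 neurons
conv = nn.Conv2d(6, 5, kernel_size=1, bias=False)     # 1x1 convolution:  6 -> 5 channels
conv.weight.data = fc.weight.data.view(5, 6, 1, 1)    # share the same 5x6 weight matrix

a = torch.randn(6)                                    # one "pixel" with 6 channels (A1..A6)
b_fc = fc(a)                                          # B1..B5 from the fully-connected layer
b_conv = conv(a.view(1, 6, 1, 1)).view(5)             # same pixel through the 1x1 convolution

print(torch.allclose(b_fc, b_conv, atol=1e-6))        # True: identical up to float precision
# On a w*h*6 feature map the 1x1 convolution applies this same fully-connected
# mapping at every spatial position, with shared weights.
```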


In Convolutional Nets, there is no such thing as "fully-connected layers". There are only convolution layers with 1x1 convolution kernels and a full connection table. – Yann LeCun



