A Deep Interpretation of GoogLeNet's Inception V1


The purpose of the GoogLeNet design

GoogLeNet is designed to improve the utilization of computing resources inside the network.

Motivation

The larger the network, the more parameters it has, and when the training set is small, a larger network is more prone to overfitting. Another drawback of a larger network is the dramatic increase in its consumption of computing resources: for example, if two convolutional layers are chained in series, any uniform increase in their filter counts causes a quadratic increase in computation, and if much of the added capacity is used inefficiently, most of that computation is wasted. The solution to both problems is to replace fully connected structures with sparsely connected ones. To break the symmetry of the network and improve its learning ability, the traditional approach is to use random sparse connections; however, computer hardware is very inefficient at computing non-uniform sparse connections, which is why AlexNet went back to full connections to better exploit parallel computation. The Inception structure was therefore proposed: it keeps the network structurally sparse while still exploiting the high performance of dense matrix computation.

Inception Structure

(a) The naive version

1. The structure uses convolution kernels of different sizes: smaller kernels extract local features, while larger kernels capture progressively more global features. Kernels of different sizes have different receptive fields, which improves the robustness of the network; the resulting feature maps are finally merged by concatenation.

2. The 1x1, 3x3, and 5x5 kernel sizes are chosen to make alignment easy: assuming a stride of 1, padding of 0, 1, and 2 respectively yields feature maps of the same spatial dimensions, so they can be concatenated directly (see the sketch after this list).

3. Max pooling is also added to the structure; it operates on the output of the previous layer and is intended to provide a degree of translation invariance.

4. At higher levels of the network, features become more abstract and receptive fields grow, so the number of 3x3 and 5x5 convolutions tends to increase, introducing a large number of parameters. The pooling branch makes this worse: since pooling does not change the channel count, its output has as many channels as the previous stage, and concatenating it with the convolution outputs leads to unavoidable growth in the number of output channels from stage to stage.
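To make the alignment in point 2 concrete, here is a minimal PyTorch sketch of the naive module in structure (a); the class name and channel counts are illustrative choices, not values fixed by the paper.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception module: parallel 1x1, 3x3, 5x5 convolutions and
    3x3 max pooling, concatenated along the channel dimension."""
    def __init__(self, in_ch, ch1x1, ch3x3, ch5x5):
        super().__init__()
        # Padding 0, 1, 2 at stride 1 keeps the spatial size identical
        # across branches, so the outputs can be concatenated directly.
        self.branch1 = nn.Conv2d(in_ch, ch1x1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_ch, ch3x3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, ch5x5, kernel_size=5, padding=2)
        # Max pooling with stride 1 and padding 1 also preserves spatial
        # size; note it keeps all in_ch channels (see point 4).
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)],
            dim=1)  # output channels = ch1x1 + ch3x3 + ch5x5 + in_ch

x = torch.randn(1, 192, 28, 28)
print(NaiveInception(192, 64, 128, 32)(x).shape)  # torch.Size([1, 416, 28, 28])
```

Note how the pooling branch alone forces at least 192 of the 416 output channels, which is exactly the parameter-growth problem structure (b) addresses.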

(b) With dimensionality reduction

To solve the problem of too many parameters, 1x1 convolutions are introduced into Inception. A 1x1 convolution has the following two benefits:

(1) Most importantly, 1x1 convolution performs dimensionality reduction, removing the computational bottleneck. Suppose the input feature map of the original Inception module is 28x28x192, with 64 output channels for the 1x1 convolution, 128 for the 3x3, and 32 for the 5x5. The kernel parameters are then 1x1x192x64 + 3x3x192x128 + 5x5x192x32 = 387,072. In structure (b), 1x1 convolutions with 96 and 16 channels are inserted before the 3x3 and 5x5 convolutions, giving 1x1x192x64 + (1x1x192x96 + 3x3x96x128) + (1x1x192x16 + 5x5x16x32) = 157,184, a reduction to roughly 40% of the original.

(2) A nonlinear activation function, namely ReLU, is usually applied after the 1x1 convolution, which introduces an additional nonlinear transformation and improves the representational ability of the network.
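A minimal sketch of structure (b) with the 1x1 reduction layers, using the channel numbers from the example above. To match the hand calculation, the parameter count below ignores biases, and the pooling branch (which in the paper's version also gets a 1x1 projection) is omitted; ReLU layers are left out for brevity.

```python
import torch
import torch.nn as nn

class InceptionReduced(nn.Module):
    """Inception module with 1x1 dimensionality reduction inserted
    before the expensive 3x3 and 5x5 convolutions."""
    def __init__(self, in_ch, ch1x1, red3, ch3x3, red5, ch5x5):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, ch1x1, 1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, red3, 1),             # reduce 192 -> 96 channels
            nn.Conv2d(red3, ch3x3, 3, padding=1))  # 3x3 now sees only 96
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, red5, 1),             # reduce 192 -> 16 channels
            nn.Conv2d(red5, ch5x5, 5, padding=2))  # 5x5 now sees only 16

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)

m = InceptionReduced(192, 64, 96, 128, 16, 32)
weights = sum(p.numel() for n, p in m.named_parameters() if n.endswith("weight"))
print(weights)  # 157184, vs. 387072 for the same branches without reduction
```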

GoogLeNet

As the figure shows, GoogLeNet is built by stacking multiple Inception modules to a depth of 22 layers. At the end, the network uses an average pooling layer instead of a fully connected layer, which reduces the number of parameters and helps prevent overfitting. To keep gradients from vanishing in such a deep network, two auxiliary softmax classifiers are attached to intermediate layers; their losses are added during training to inject gradient into the middle of the network, and both are removed at test time. As for why Inception modules are not stacked from the very beginning: in the early layers of the network, the output feature maps are spatially very large, so plain convolution and pooling layers are used first to shrink the feature maps, reducing parameters and preventing overfitting.
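For illustration, here is a sketch of one auxiliary classifier head. The layer sizes (5x5 average pooling with stride 3, a 128-channel 1x1 convolution, a 1024-unit fully connected layer, 70% dropout) follow those reported in the GoogLeNet paper, but the class name and the example input shape are assumptions made here.

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary softmax head attached to an intermediate Inception output.
    During training its loss is added (with a small weight, 0.3 in the
    paper) to strengthen the gradient signal in the middle of the network;
    it is discarded entirely at test time."""
    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)  # 14x14 -> 4x4
        self.conv = nn.Conv2d(in_ch, 128, kernel_size=1)
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.drop = nn.Dropout(p=0.7)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(self.pool(x)))
        x = torch.relu(self.fc1(torch.flatten(x, 1)))
        return self.fc2(self.drop(x))  # logits; softmax applied in the loss

# e.g. attached to a 14x14x512 intermediate feature map
print(AuxClassifier(512)(torch.randn(1, 512, 14, 14)).shape)  # [1, 1000]
```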
