Interpretation: (GoogLeNet) Going Deeper with Convolutions


Inception structure

Currently, the most direct way to improve a deep neural network is to increase its size, in both depth and width. This is the simplest and safest way to obtain a high-quality model when enough labeled training data is available. However, a larger network has more parameters, so when the training data is scarce it easily overfits, and it also requires more computing resources. This motivates replacing fully connected structures with sparsely connected ones:

The main function of the 1x1 convolution is dimension reduction; without it, the achievable size of the network would be limited. Applying 1x1 convolution kernels allows the network to grow in both depth and width without adding a heavy computational burden.
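Below is a minimal PyTorch sketch of this idea (the channel counts are illustrative, not prescribed by the paper): a 1x1 "reduce" layer shrinks the channel dimension before an expensive 5x5 convolution while leaving the spatial size unchanged.

    import torch
    import torch.nn as nn

    # Illustrative channel counts: reduce 192 input channels to 16
    # before an expensive 5x5 convolution.
    x = torch.randn(1, 192, 28, 28)                          # (batch, channels, height, width)

    reduce_1x1 = nn.Conv2d(192, 16, kernel_size=1)           # 1x1 "reduce" layer
    conv_5x5 = nn.Conv2d(16, 32, kernel_size=5, padding=2)   # 5x5 on the reduced input

    y = conv_5x5(torch.relu(reduce_1x1(x)))  # ReLU follows the reduction
    print(y.shape)                           # torch.Size([1, 32, 28, 28]) -- same spatial size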

In Inception, the 1x1 convolutions account for local regions, while the 3x3 and 5x5 convolutions account for more spatially spread-out clusters. Lower layers mostly capture local information, so the number of 1x1 outputs there is larger; higher layers capture features of higher abstraction, so the ratio of 3x3 and 5x5 convolutions should increase in the higher layers.

However, a serious problem with this naive Inception structure is that the number of output channels after the module grows too large, causing a computational blow-up after only a few stages. Since pooling only changes the spatial size of the feature map without changing the number of channels, the naive Inception module must concatenate the three convolution outputs with the pooling output, so when the previous layer has many channels, the output channel count becomes even larger. Moreover, the 5x5 convolution, even with a moderate number of outputs, is computationally enormous when the number of input channels is large.
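To make this concrete, take a 28x28 feature map with 192 input channels and a 5x5 convolution producing 32 outputs (the channel counts used by inception (3a) in the paper's table). The direct 5x5 convolution costs about 28 x 28 x 32 x 5 x 5 x 192 ≈ 120M multiply-adds; reducing the input to 16 channels first costs 28 x 28 x 16 x 192 ≈ 2.4M for the 1x1 plus 28 x 28 x 32 x 5 x 5 x 16 ≈ 10M for the 5x5, roughly a tenfold saving.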

The above problem leads to the Inception structure with dimension reduction. This approach stems from the observation that even a low-dimensional embedding can contain a lot of information about a relatively large image patch. However, an embedding compressed this way is too dense, while this structure requires sparsity, so 1x1 "reduce" convolutions are used to shrink the number of input channels before the computationally expensive 3x3 and 5x5 convolutions. A ReLU follows each reduction, which cuts the number of input channels on the one hand and adds nonlinearity on the other.
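A minimal PyTorch sketch of such a module follows; the default channel counts mirror inception (3a) from the paper's table, but the code itself is an illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Inception(nn.Module):
        """Inception module with dimension reduction (sketch).

        Default channel counts mirror inception (3a) from the paper's table.
        """
        def __init__(self, in_ch, n1x1=64, n3x3_red=96, n3x3=128,
                     n5x5_red=16, n5x5=32, pool_proj=32):
            super().__init__()
            # Branch 1: plain 1x1 convolution.
            self.b1 = nn.Sequential(
                nn.Conv2d(in_ch, n1x1, 1), nn.ReLU(inplace=True))
            # Branch 2: 1x1 reduction, then 3x3 convolution.
            self.b2 = nn.Sequential(
                nn.Conv2d(in_ch, n3x3_red, 1), nn.ReLU(inplace=True),
                nn.Conv2d(n3x3_red, n3x3, 3, padding=1), nn.ReLU(inplace=True))
            # Branch 3: 1x1 reduction, then 5x5 convolution.
            self.b3 = nn.Sequential(
                nn.Conv2d(in_ch, n5x5_red, 1), nn.ReLU(inplace=True),
                nn.Conv2d(n5x5_red, n5x5, 5, padding=2), nn.ReLU(inplace=True))
            # Branch 4: 3x3 max-pooling, then 1x1 projection.
            self.b4 = nn.Sequential(
                nn.MaxPool2d(3, stride=1, padding=1),
                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

        def forward(self, x):
            # Concatenate all branches along the channel dimension.
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    # 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels.
    out = Inception(192)(torch.randn(1, 192, 28, 28))
    print(out.shape)  # torch.Size([1, 256, 28, 28])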

This structure has two advantages. First, the number of units at each stage can be increased: even if the number of outputs grows, the next stage will reduce the number of channels through dimension reduction, so there is no computational explosion. Second, the structure processes visual signals at multiple scales and then aggregates them, so the next stage can continue to extract features from multiple scales simultaneously.

GoogLeNet (22 layers)

All convolutions, including those inside the Inception modules, use ReLU. The training images are 224x224 RGB with the mean subtracted. In the paper's table, "#3x3 reduce" and "#5x5 reduce" denote the number of 1x1 reduction filters placed before the 3x3 and 5x5 convolutions, and "pool proj" denotes the number of 1x1 projection filters after max-pooling. Inception is not used throughout the network: the first three layers are ordinary convolutions, for technical reasons (memory efficiency during training) rather than out of necessity.
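For reference, here is a sketch of that ordinary-convolution stem, following the layer sizes in the paper's table (the local response normalization layers are omitted for brevity):

    import torch.nn as nn

    # Stem: ordinary convolutions before the first Inception module.
    # Sizes follow the paper's table; LRN layers are omitted in this sketch.
    stem = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, stride=2, ceil_mode=True),
        nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(inplace=True),   # 1x1 reduce
        nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(3, stride=2, ceil_mode=True),
    )
    # A 224x224 RGB input comes out as 28x28x192, the input to inception (3a).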

At the end of the network, average pooling is used instead of fully connected layers, which improves the result by about 0.6%; dropout is still necessary. A linear layer is kept to make it easy to fine-tune the model for other label sets.
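A sketch of that classifier head (the 0.4 dropout rate and the 1024-channel input follow the paper; 1000 classes corresponds to ILSVRC):

    import torch.nn as nn

    # Classifier head: global average pooling replaces the fully connected
    # layers; one linear layer is kept for easy fine-tuning to other label sets.
    head = nn.Sequential(
        nn.AdaptiveAvgPool2d(1),  # 7x7x1024 -> 1x1x1024
        nn.Flatten(),
        nn.Dropout(p=0.4),        # dropout remains necessary
        nn.Linear(1024, 1000),    # logits for the 1000 ILSVRC classes
    )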

In addition, the features produced by the middle layers of the network were found to be very discriminative, so auxiliary classifiers were added at intermediate layers, in the hope of obtaining discriminative classifiers already in the shallow layers; this strengthens the gradient during back-propagation and adds regularization. During training, their losses are added to the total loss with a weight of 0.3; see the paper for the detailed structure.
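A sketch of how those weighted losses combine during training (the logits names are placeholders for the outputs of the main and two auxiliary classifiers):

    import torch.nn.functional as F

    def googlenet_loss(main_logits, aux1_logits, aux2_logits, target, aux_weight=0.3):
        """Total training loss with the two auxiliary losses weighted by 0.3.

        At inference time the auxiliary classifiers are discarded and only
        the main classifier's output is used.
        """
        loss = F.cross_entropy(main_logits, target)
        loss = loss + aux_weight * F.cross_entropy(aux1_logits, target)
        loss = loss + aux_weight * F.cross_entropy(aux2_logits, target)
        return loss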

Training Methodology

Training uses stochastic gradient descent with momentum 0.9 and a fixed learning-rate schedule that decreases the rate by 4% every 8 epochs. The training strategy changed over time; refer to the article "Some Improvements on Deep Convolutional Neural Network Based Image Classification".
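In PyTorch this schedule might look as follows (a 4% decrease every 8 epochs corresponds to gamma=0.96 with step_size=8 when the scheduler steps once per epoch; the base learning rate, `model`, `num_epochs`, and `train_one_epoch` are placeholders not given in the paper):

    import torch

    # SGD with momentum 0.9; learning rate decreased by 4% every 8 epochs.
    # The base learning rate of 0.01 is an assumed value, not from the paper.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8, gamma=0.96)

    for epoch in range(num_epochs):
        train_one_epoch(model, optimizer)  # placeholder training loop
        scheduler.step()                   # step once per epoch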

Tips for improving accuracy at test time

    1. Ensembling: seven GoogLeNet models with the same structure, the same initialization, and the same learning-rate policy were trained; they differ only in the sampling of image patches and the random order of the input images.
    2. Aggressive cropping: many of the images used in ILSVRC are rectangular, not square. Each image is resized to 4 scales so that the shorter side is 256, 288, 320, or 352; then the left, middle, and right square regions are taken (top, middle, and bottom for portrait images). From each square, 224x224 crops are taken at the 4 corners and the center, plus the whole square resized to 224x224, and each of these 6 views is also mirrored. One image thus yields 4x3x6x2 = 144 crops. Reference: ImageNet Classification with Deep Convolutional Neural Networks.
    3. Averaging the softmax probabilities over the multiple crops gives the best results, as sketched after this list.
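A sketch of that averaging step (assuming `model` returns class logits and `crops` stacks the 144 crops of one image):

    import torch

    @torch.no_grad()
    def predict_multicrop(model, crops):
        """Average softmax probabilities over a stack of crops.

        crops: tensor of shape (n_crops, 3, 224, 224), e.g. the 144 crops
        (4 scales x 3 squares x 6 views x 2 mirrors) described above.
        """
        probs = torch.softmax(model(crops), dim=1)  # (n_crops, n_classes)
        return probs.mean(dim=0)                    # averaged class probabilities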
