Neural Network (10): GoogLeNet

GoogLeNet Inception V1

This is the earliest version of GoogLeNet, which appeared in the 2014 paper "Going Deeper with Convolutions". It is spelled "GoogLeNet" rather than "GoogleNet"; the paper says this is a homage to the early LeNet.

Introduction

With the rapid development of deep learning and neural networks, people are no longer focused only on more hardware, larger datasets, and larger models, but pay more attention to new ideas, new algorithms, and architectural improvements.

In general, the most straightforward way to improve network performance is to increase network depth and width, but this means a huge number of parameters. A large number of parameters is prone to overfitting and also greatly increases the computational cost.

The paper argues that the fundamental way to address both shortcomings is to convert fully connected layers, and even general convolutions, into sparse connections. On the one hand, connections in real biological nervous systems are sparse; on the other hand, [1] shows that for large-scale sparse neural networks, an optimal network can be built layer by layer by analyzing the statistical properties of the activations and clustering highly correlated outputs. This indicates that a bloated sparse network can be simplified without sacrificing performance. Although the mathematical proof requires strict conditions, the Hebbian principle strongly supports it: neurons that fire together, wire together.

Earlier, in order to break network symmetry and improve learning ability, traditional networks used random sparse connections. However, computer software and hardware are very inefficient at computing on non-uniform sparse data, which is why AlexNet re-enabled fully connected layers to better exploit parallel computation.

So the question now is whether there is a way to keep the sparse structure of the network while still exploiting the high computational performance of dense matrices. A large body of literature indicates that clustering sparse matrices into denser submatrices improves computing performance, and the structure named Inception is proposed to achieve this goal.

Objective

The main idea of the Inception structure is to approximate the optimal local sparse structure with readily available dense components.
The authors first propose the following basic structure:

A few notes on this structure:
1. Using convolution kernels of different sizes means receptive fields of different sizes, and the final concatenation fuses features at different scales;
2. The kernel sizes 1, 3, and 5 are chosen mainly to make alignment easy. With the stride set to stride=1, setting pad=0, 1, and 2 respectively makes the convolutions produce feature maps of the same spatial size, so the features can be concatenated directly (see the sketch after this list);
3. The paper notes that pooling has been shown to be effective in many places, so a pooling path is also embedded in the Inception module;
4. The deeper the network, the more abstract the features and the larger the receptive field each feature covers, so the proportion of 3x3 and 5x5 convolutions increases in the later layers.
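As a quick sanity check of the padding choices in point 2, here is a minimal PyTorch sketch (the channel counts and input size are made up for illustration) showing that stride-1 convolutions with kernel sizes 1, 3, and 5 and padding 0, 1, and 2 keep the spatial dimensions identical, so the branch outputs can be concatenated along the channel axis:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)  # hypothetical input: 192 channels, 28x28 feature map

# stride-1 convolutions with pad = (kernel_size - 1) / 2 preserve height and width
branch1 = nn.Conv2d(192, 64, kernel_size=1, stride=1, padding=0)
branch3 = nn.Conv2d(192, 128, kernel_size=3, stride=1, padding=1)
branch5 = nn.Conv2d(192, 32, kernel_size=5, stride=1, padding=2)

outputs = [branch1(x), branch3(x), branch5(x)]
print([o.shape for o in outputs])   # all branches are (1, C, 28, 28)
fused = torch.cat(outputs, dim=1)   # concatenate along the channel dimension
print(fused.shape)                  # (1, 64 + 128 + 32, 28, 28)
```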

However, using 5x5 convolution kernels still leads to a huge amount of computation. For this reason, the paper uses 1x1 convolution kernels for dimensionality reduction, following NIN [2].
For example, suppose the output of the previous layer is 100x100x128. Passing it through a 5x5 convolution layer with 256 output channels (stride=1, pad=2) gives an output of 100x100x256, and that convolution layer has 128x5x5x256 parameters. If the previous output first passes through a 1x1 convolution layer with 32 output channels and then a 5x5 convolution layer with 256 output channels, the final output is still 100x100x256, but the number of convolution parameters drops to 128x1x1x32 + 32x5x5x256, roughly 4 times fewer.
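The parameter counts above can be verified with a few lines of Python (bias terms are ignored, as in the text):

```python
# parameters of a conv layer (ignoring biases): in_channels * kH * kW * out_channels
direct = 128 * 5 * 5 * 256                      # single 5x5 conv: 819,200
reduced = 128 * 1 * 1 * 32 + 32 * 5 * 5 * 256   # 1x1 reduction then 5x5: 208,896
print(direct, reduced, direct / reduced)        # ratio is roughly 3.9x
```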

The improved Inception module with this dimensionality reduction looks like this:
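As a rough illustration (a sketch, not the original Caffe definition), a minimal PyTorch version of such an Inception module with 1x1 reductions might look as follows; the channel counts roughly follow the first Inception module (3a) of the paper and would differ for other layers:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception module with 1x1 dimensionality reduction (illustrative sketch)."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)        # plain 1x1 branch
        self.branch3 = nn.Sequential(                              # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, c3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(                              # 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, c5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(                          # 3x3 max pool, then 1x1 projection
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        branches = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat(branches, dim=1)   # concatenate along channels; spatial size unchanged

# example with the assumed "3a" channel configuration
block = InceptionBlock(192, 64, 96, 128, 16, 32, 32)
print(block(torch.randn(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28])
```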

GoogLeNet

The overall structure of GoogLeNet is as follows:

A few notes on the overall structure:
1. GoogLeNet clearly adopts a modular structure, which makes it convenient to add and modify components;
2. The network replaces the final fully connected layer with average pooling, an idea from NIN; it turns out this improves top-1 accuracy by about 0.6%. In practice a fully connected layer is still added at the end, mainly to make finetuning convenient;
3. Although the fully connected layers are removed, dropout is still used in the network;
4. To mitigate vanishing gradients, the network adds two auxiliary softmax classifiers that inject gradients into the earlier layers. The paper says the losses of these two auxiliary classifiers should be multiplied by a discount factor, but the Caffe model does not apply any such factor. In addition, the two extra softmax branches are removed at test time.

Here is a relatively clear structure diagram:

The softmax classifiers added in the middle of the network output intermediate results, so gradient updates reach the earlier layers sooner and training becomes easier.
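As an illustration of how the auxiliary classifiers contribute during training, here is a minimal sketch of the combined loss (not the original Caffe recipe; the 0.3 weight is an assumed discount factor in the spirit of the paper):

```python
import torch.nn.functional as F

def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets, aux_weight=0.3):
    """Total training loss: main classifier plus discounted auxiliary classifiers.
    At test time only main_logits is used and the auxiliary branches are discarded."""
    main_loss = F.cross_entropy(main_logits, targets)
    aux_loss = F.cross_entropy(aux1_logits, targets) + F.cross_entropy(aux2_logits, targets)
    return main_loss + aux_weight * aux_loss
```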

GoogLeNet Inception V2

Objective

Building deeper networks has become mainstream, but larger models also make computation less efficient. Here the paper tries to find ways to scale up the network while keeping computation as efficient as possible.

First, VGGNet appeared in the same period as GoogLeNet V1 with roughly comparable performance, and both have been successfully applied to image classification and many other areas. In contrast, GoogLeNet is far more computationally efficient than VGGNet: it has only about 5 million parameters, roughly 1/12 of AlexNet (the GoogLeNet caffemodel is about 50 MB, while the VGGNet caffemodel is more than 600 MB).

GoogLeNet performs well, but if you try to build a larger network by simply scaling up the Inception structure, the computational cost immediately explodes. In addition, the V1 paper did not clearly describe the considerations behind the design of the Inception structure. Therefore, in this paper the authors first give some general criteria and optimization methods that have proved effective for scaling up networks. These guidelines apply to, but are not limited to, Inception-style structures.

Design principles

1. Avoid representational bottlenecks, especially early in the network. The forward pass should not go through layers that compress the representation too aggressively; that would be a representational bottleneck. The width and height of the feature maps should shrink gradually from input to output, not all at once. For example, starting with kernel=7 and stride=5 is clearly inappropriate.
In addition, the channel dimension of the output (num_output per layer) should generally increase gradually, otherwise the network becomes difficult to train. (The feature dimension does not directly represent the amount of information; it only serves as a rough estimate.)

2. High-dimensional features are easier to handle: they are easier to discriminate and they speed up training.

3. Spatial aggregation can be done over low-dimensional embeddings without losing much information. For example, before a 3x3 convolution, the input can be reduced in dimension (to lower the computational cost) without serious consequences. If the information can be compressed easily, training is also accelerated.

4. Balance the width and depth of the network.

Factorizing large convolution kernels into multiple small ones

A large convolution kernel gives a larger receptive field, but it also means more parameters; for example, a 5x5 kernel has 25/9 ≈ 2.78 times as many parameters as a 3x3 kernel. For this reason, the authors suggest that a small network of two consecutive 3x3 convolution layers (stride=1) can replace a single 5x5 convolution layer, keeping the receptive field while reducing the number of parameters, as shown below:

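A minimal PyTorch comparison of the two options (the channel counts are hypothetical) confirms the parameter saving while covering the same 5x5 receptive field:

```python
import torch.nn as nn

in_ch, out_ch = 64, 64   # hypothetical channel counts

single_5x5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2, bias=False)
stacked_3x3 = nn.Sequential(        # two 3x3 convs cover the same 5x5 receptive field
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(single_5x5), count(stacked_3x3))   # 102400 vs 73728, i.e. a 25 : 18 ratio
```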
This raises two questions:

1. Does this substitution reduce the expressive power?
Extensive experiments later in the paper show that it does not.

2. Should a nonlinear activation be added after the first of the two 3x3 convolutions?
The authors ran a comparative experiment showing that adding the extra nonlinear activation improves performance.

Given the above, a large convolution kernel can be replaced by a series of 3x3 convolutions. Can it be factorized even further? The paper considers nx1 convolution kernels.
For example, the 3x3 convolution can be replaced as shown:

Therefore, any nxn convolution can be replaced by a 1xn convolution followed by an nx1 convolution. In practice, the authors found that using this factorization in the early layers of the network does not work well; it performs better on medium-sized feature maps (for an mxm feature map, m between 12 and 20 is recommended).
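A minimal PyTorch sketch of this factorization, assuming n=7 (as used below for 17x17 feature maps) and hypothetical channel counts:

```python
import torch
import torch.nn as nn

in_ch, out_ch = 128, 128   # hypothetical channel counts

# a 1x7 convolution followed by a 7x1 convolution covers the same 7x7 receptive field
# with 2 * 7 * C^2 weights per position instead of 49 * C^2
factorized_7x7 = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=(1, 7), padding=(0, 3)),
    nn.ReLU(inplace=True),
    nn.Conv2d(out_ch, out_ch, kernel_size=(7, 1), padding=(3, 0)),
)

x = torch.randn(1, in_ch, 17, 17)   # medium-sized feature map, as the paper recommends
print(factorized_7x7(x).shape)      # torch.Size([1, 128, 17, 17])
```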

In summary:

(1) Figure 4 shows the Inception structure used in GoogLeNet V1;

(2) Figure 5 replaces the large convolution kernel with a sequence of 3x3 convolutions;

(3) Figure 6 replaces the large convolution kernel with nx1 convolutions; here n=7 to handle 17x17 feature maps. This structure is formally used in GoogLeNet V2.
