Very Deep Convolutional Networks for Large-Scale Image Recognition
Please credit the source when reprinting: http://blog.csdn.net/stdcoutzyx/article/details/39736509
This paper [1] came out in September of this year and is quite recent; its findings offer valuable guidance for tuning the parameters of convolutional neural networks, so I summarize it here.
This post does not cover the basic composition of convolutional neural networks (Convolutional Neural Network, CNN); impatient readers can look the basics up on Google or Baidu.
What follows are my notes on the paper; I have tried to extract its key points. If anything is missing, please read the original paper.
1. Main Contributions
- Investigated how CNN performance changes as network depth increases while the total number of parameters is kept roughly constant.
- The method in the paper won second place in the ILSVRC-2014 competition.
- ILSVRC: ImageNet Large-Scale Visual Recognition Challenge
2. CNN Improvements
After the appearance of [2], many improvements to the CNN architecture have been proposed, for example:
- Use smaller receptive window size and smaller stride of the first convolutional layer.
- Training and testing the networks densely over the whole image and over multiple scales.
3. CNN Configuration Principles
- The input to the CNN is a 224x224x3 image.
- The only preprocessing is subtracting the mean RGB value (computed on the training set) from each pixel.
- 1x1 convolution kernels can be viewed as linear transformations of the input channels.
- Use small convolution kernels of size 3x3 rather than larger ones.
- Max-pooling is performed over 2x2 windows with stride 2.
- All layers except the final fully connected classification layer are followed by a rectification non-linearity (ReLU).
- Local Response Normalization (LRN) is not needed: it does not improve accuracy, while adding computation time and memory cost.
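As a rough illustration, the preprocessing, ReLU, and pooling principles above can be sketched in NumPy (the function names and the toy mean values below are mine, not from the paper):

```python
import numpy as np

def preprocess(img, mean_rgb):
    # The only preprocessing: subtract the training-set mean RGB value.
    return img - mean_rgb.reshape(3, 1, 1)

def relu(x):
    # Rectification non-linearity applied after each conv layer.
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    # 2x2 max-pooling with stride 2 over a (C, H, W) feature map.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

img = np.ones((3, 224, 224), dtype=np.float32)        # dummy input image
mean_rgb = np.array([0.2, 0.3, 0.4], dtype=np.float32)  # made-up mean values
x = relu(preprocess(img, mean_rgb))
pooled = max_pool_2x2(x)
print(pooled.shape)  # (3, 112, 112)
```

Each 2x2/stride-2 pooling step halves the spatial resolution, which is what lets the channel width double without exploding the computation.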
4. CNN Configurations
- The width (number of channels) of the conv layers starts at 64 and doubles after each max-pooling layer, up to 512.
- Use filters of 3x3 size throughout the whole net, because a stack of two 3x3 conv layers (without spatial pooling in between) has an effective receptive field of 5x5, a stack of three 3x3 conv layers has an effective receptive field of 7x7, and so on.
- Why use three 3x3 layers instead of one 7x7 layer?
- First, three layers contain three non-linearities, making the decision function more discriminative than a single layer;
- Second, for the same number of channels C, the three-layer 3x3 stack has 3x(3x3)xCxC = 27C^2 parameters, while a single 7x7 layer has 7x7xCxC = 49C^2, a substantial reduction.
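The receptive-field and parameter arithmetic above can be checked with a few lines of Python (the helper names are mine):

```python
def receptive_field(num_layers, kernel=3):
    # Effective receptive field of a stack of stride-1 conv layers:
    # each extra layer widens the field by (kernel - 1) pixels.
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1
    return rf

def stack_params(num_layers, kernel, c):
    # Weight count of a stack of kernel x kernel conv layers,
    # each with c input and c output channels (biases ignored).
    return num_layers * kernel * kernel * c * c

C = 256
print(receptive_field(2), receptive_field(3))        # 5 7
print(stack_params(3, 3, C), stack_params(1, 7, C))  # 27*C*C vs 49*C*C
```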
1x1 convolution kernels add non-linear discriminative power without affecting the receptive field. This idea is used in the "Network in Network" architecture (reference 12 cited in the paper).
Figure 1 shows the network configurations used in the experiments; depth ranges from 11 to 19 weight layers, and the structures follow the principles summarized above. Figure 2 lists the total number of parameters of each network: despite the differences in depth, the parameter counts vary little.
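The parameter total can be reproduced for the 16-layer configuration ("D" in Figure 1) with plain Python. A sketch assuming 3x3 convs with biases and the standard 4096-4096-1000 fully connected head; it also shows why depth barely changes the count: the FC layers dominate the total.

```python
# Channel configuration of the 16-layer net ("D"); 'M' marks max-pooling.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def total_params(cfg, in_ch=3):
    n = 0
    for v in cfg:
        if v == 'M':
            continue
        n += (3 * 3 * in_ch + 1) * v  # 3x3 conv weights + biases
        in_ch = v
    # Fully connected layers: 7x7x512 -> 4096 -> 4096 -> 1000 classes
    for i, o in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
        n += (i + 1) * o
    return n

print(total_params(cfg))  # about 138 million, dominated by the FC layers
```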
Figure 1. ConvNet configurations
Figure 2. Number of parameters

5. Training
6. Testing
Testing proceeds in the following steps:
- First, the image is isotropically rescaled so that its shorter side Q is no less than 224. Q plays the same role for the test set that S plays for the training set: S is the training scale, Q the test scale. Q need not equal S; on the contrary, for a given S, testing at several values of Q and averaging the results improves performance.
- Then, the network is applied densely over the test image, following the method of reference 16 cited in the paper:
- The fully connected layers are converted to convolutional layers: the first FC layer becomes a 7x7 convolution, and the following ones become 1x1 convolutions.
- The resulting fully-convolutional net is then applied to the whole image by convolving the filters of each layer with the full-size input. The output is a class score map whose number of channels equals the number of classes and whose spatial resolution is variable, depending on the input image size.
- Finally, the class score map is spatially averaged (sum-pooled) to obtain a fixed-size vector of class scores for the image.
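The FC-to-convolution conversion rests on the fact that a fully connected layer applied to a 7x7 feature map is equivalent to a 7x7 convolution whose kernels are the reshaped FC weight rows. A toy NumPy check (sizes shrunk from the real 512-channel, 4096-unit layers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes standing in for the real 512 channels / 4096 FC units.
C, K, OUT = 8, 7, 16
x = rng.normal(size=(C, K, K))            # last conv feature map
W_fc = rng.normal(size=(OUT, C * K * K))  # first FC layer's weight matrix

# FC view: flatten the feature map and multiply.
y_fc = W_fc @ x.reshape(-1)

# Conv view: reshape each FC row into a C x 7 x 7 kernel; on a 7x7 input
# (no padding) the convolution output is 1x1 and equals the dot product.
W_conv = W_fc.reshape(OUT, C, K, K)
y_conv = np.array([(W_conv[k] * x).sum() for k in range(OUT)])

print(np.allclose(y_fc, y_conv))  # True
```

On a larger input the same kernels slide spatially, which is exactly what produces the variable-resolution class score map described above.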
7. Implementation
- Implemented using the C++ Caffe toolbox.
8. Experiments
A total of three groups of experiments were conducted:
8.1 Configuration Comparison
Using the CNN structures in Figure 1, the C/D/E networks are trained at multiple scales. Note that the test set in this group of experiments uses only a single scale; the results are shown in Figure 3.
Figure 3. ConvNet performance at a single test scale

8.2 Multi-Scale Comparison
The test set is evaluated at multiple scales. Since too large a difference between training and test scales hurts performance, the test scale Q is kept within ±32 of the training scale S.
When the training scale S is sampled from an interval, the test scales are the minimum, the median, and the maximum of that interval.
Figure 4. ConvNet performance at multiple test scales

8.3 ConvNet Fusion
Model fusion averages the posterior probability estimates of the individual models.
Fusing the two best models from Figures 3 and 4 achieves the best result, while fusing seven models performs worse.
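Averaging posterior probabilities can be sketched as follows (the logits below are made-up numbers for illustration):

```python
import numpy as np

def softmax(z):
    # Numerically stable soft-max over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from two trained nets for one image, 5 classes.
logits_a = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
logits_b = np.array([1.5, 1.2, 0.3, -0.5, 0.2])

# Fusion: average the posterior (soft-max) probabilities, then argmax.
p = (softmax(logits_a) + softmax(logits_b)) / 2
print(p.argmax())  # class 0 has the highest averaged posterior
```

Averaging probabilities (rather than logits) is what "mean of the posterior probability estimates" refers to.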
Figure 5. ConvNet fusion

9. References
[1] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014.
[2] Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012: 1097-1105.
Copyright notice: this is the blogger's original article and may not be reproduced without consent.