Very Deep Convolutional Networks for Large-Scale Image Recognition



Reprint: please credit http://blog.csdn.net/stdcoutzyx/article/details/39736509

Paper [1] appeared in September of this year and is relatively new. Its observations on convolutional neural networks provide great guidance for tuning network depth and parameters, so I summarize it specially here.

This note does not explain the basics of convolutional neural networks (CNNs); impatient readers can look them up on Google or Baidu.

What follows are my notes on the paper, extracting what I consider its key points. Where the notes fall short, please read the original.

1. Main Contributions
    • With the total number of parameters held roughly constant, the paper studies how CNN performance changes as the number of layers increases.

    • The method in the paper won second place in the ILSVRC-2014 classification task.

      • ILSVRC: ImageNet Large-Scale Visual Recognition Challenge
2. CNN Improvement

After paper [2] appeared, many improvements to the CNN architecture were proposed. For example:

    • Use smaller receptive window size and smaller stride of the first convolutional layer.
    • Training and testing the networks densely over the whole image and over multiple scales.
3. CNN Configuration Principles
    • The input to the CNN is a 224x224x3 image.
    • The only preprocessing is subtracting the mean RGB value, computed on the training set, from each pixel.

    • 1x1 kernels can be viewed as linear transformations of the input channels.
    • Small 3x3 convolution kernels are used throughout.
    • Max-pooling is done over 2x2 pixel windows with stride 2.
    • Apart from the last fully connected classification layer, all layers are followed by rectification non-linearity (ReLU).
    • Local Response Normalization (LRN) is not needed: it does not improve performance, while adding computation time and memory consumption.
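The mean-subtraction preprocessing above can be sketched as follows. This is a minimal NumPy sketch; the per-channel mean values below are illustrative assumptions, not the ones computed on the paper's training set.

```python
import numpy as np

def preprocess(image, mean_rgb):
    """Subtract the per-channel mean from an HxWx3 image.

    `mean_rgb` is the mean RGB value computed over the training set;
    the paper applies no other preprocessing.
    """
    return image.astype(np.float32) - np.asarray(mean_rgb, dtype=np.float32)

# Illustrative mean values (assumed for this sketch)
mean_rgb = [123.68, 116.78, 103.94]
img = np.zeros((224, 224, 3), dtype=np.uint8)
out = preprocess(img, mean_rgb)
```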
4. CNN Configuration
    • The number of channels (width) of the convolutional layers starts at 64 and doubles after each max-pooling layer, up to 512.
    • Filters of size 3x3 are used throughout the whole net, because a stack of two 3x3 conv layers (without spatial pooling in between) has an effective receptive field of 5x5, a stack of three 3x3 conv layers has an effective receptive field of 7x7, and so on.
    • Why use three 3x3 layers instead of a single 7x7 layer?
      • First, three layers with non-linearities in between are more discriminative than a single layer;
      • Second, with the same number of channels C, three 3x3 layers have 3 x (3x3) x C x C = 27C^2 parameters, while a single 7x7 layer has 7 x 7 x C x C = 49C^2. The stack greatly reduces the number of parameters.

    • 1x1 convolution kernels add non-linear discriminative power without affecting the receptive field. Such kernels are used in the "Network in Network" architecture; see reference [12].

    • Figure 1 shows the network structures used in the experiments; the number of layers grows from 11 to 19, and the structures follow the points summarized above. Figure 2 lists the total number of parameters of each network: despite the increasing depth, the parameter count changes little.
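The parameter-count and receptive-field arguments above can be checked with a short sketch (biases ignored, as in the note; `C` is an illustrative channel count):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a single k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def stacked_receptive_field(k, n_layers):
    """Effective receptive field of n stacked k x k convs with stride 1:
    each extra layer grows the field by (k - 1)."""
    return 1 + n_layers * (k - 1)

C = 256  # illustrative channel count
three_3x3 = 3 * conv_params(3, C, C)   # 3 x (3x3) x C x C = 27 C^2
one_7x7 = conv_params(7, C, C)         # 7 x 7 x C x C = 49 C^2
```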

Figure 1. ConvNet configurations

Figure 2. Number of parameters

5. Training
    • Apart from using multiple scales, the experiments in [1] basically follow the settings of paper [2]: batch size 256, momentum 0.9, weight-decay factor 5x10^-4, dropout of 0.5 for the first two fully connected layers, and learning rate initialized to 10^-2 and divided by 10 when the validation accuracy stops improving (decreased three times in total).

      Training stopped after 370K iterations (74 epochs).
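The learning-rate schedule described above can be sketched as a small plateau rule. This is a minimal sketch of the "divide by 10 when validation accuracy stops improving, at most three times" logic; the function name and simulated accuracies are hypothetical.

```python
def step_lr_on_plateau(lr, val_acc, best_acc, drops_done, factor=0.1, max_drops=3):
    """Divide the learning rate by 10 when validation accuracy stops
    improving, at most three times. Returns (lr, best_acc, drops_done)."""
    if val_acc > best_acc:
        return lr, val_acc, drops_done                 # still improving: keep lr
    if drops_done < max_drops:
        return lr * factor, best_acc, drops_done + 1   # plateau: divide lr by 10
    return lr, best_acc, drops_done                    # already dropped 3 times

# Simulated run: lr starts at 1e-2 and ends at 1e-5 after three plateaus
lr, best, drops = 1e-2, 0.0, 0
for acc in [0.3, 0.5, 0.5, 0.6, 0.6, 0.6]:
    lr, best, drops = step_lr_on_plateau(lr, acc, best, drops)
```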

    • The paper conjectures that its networks converge faster than the original network for two reasons:
      • implicit regularisation imposed by greater depth and smaller conv filter sizes;
      • pre-initialisation of certain layers: the shallow network A is trained first, and when a deeper network such as E is trained, the layers it shares with A are initialized with A's trained parameters while the new layers are randomly initialized. Note that the learning rate of the pre-initialised layers is not decreased during this procedure.

    • To obtain 224x224 inputs, the original image is rescaled isotropically so that its short side is no less than 224, and a random 224x224 crop is then taken. For further data augmentation, random horizontal flipping and random RGB colour shift are also applied.
    • Multi-scale training. Objects in images appear at different scales, and training at multiple scales lets the network recognize them better.

      There are two ways to do multi-scale training:

      • Train a separate classifier for each scale. The scale parameter S is the length of the short side after rescaling the original image. The paper trains two classifiers with S=256 and S=384; the S=384 classifier is initialized with the S=256 weights and uses a smaller learning rate of 10^-3.
      • Alternatively, train a single classifier with scale jittering: every time an image is fed in, it is rescaled with the short side S sampled randomly from [S_min, S_max]; the paper uses the interval [256, 512]. This network is initialized with the weights of the pre-trained S=384 classifier.
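The scale-jittering step above can be sketched in NumPy. This is a minimal sketch: nearest-neighbour rescaling stands in for proper interpolation, and the flip/colour-shift augmentations are omitted.

```python
import random
import numpy as np

def rescale_short_side(image, s):
    """Isotropically rescale so the shorter side equals s (nearest neighbour)."""
    h, w = image.shape[:2]
    scale = s / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows][:, cols]

def random_train_crop(image, s_min=256, s_max=512, crop=224, rng=random):
    """Scale jittering: sample S uniformly from [s_min, s_max], rescale,
    then take a random crop x crop patch."""
    s = rng.randint(s_min, s_max)
    img = rescale_short_side(image, s)
    h, w = img.shape[:2]
    top = rng.randint(0, h - crop)
    left = rng.randint(0, w - crop)
    return img[top:top + crop, left:left + crop]
```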
6. Testing

Testing uses the following steps:

    • First, the image is isotropically rescaled so that its short side equals Q, with Q no less than 224. Q has the same meaning as S, except that S is the training-set scale and Q is the test-set scale.

      Q does not have to equal S. On the contrary, for a given S, testing at several values of Q and averaging the results improves performance.

    • Then the rescaled image is tested densely, following the method of reference [16]:
      • The fully connected layers are converted to convolutional layers: the first fully connected layer becomes a 7x7 convolution and the subsequent ones become 1x1 convolutions.
      • The resulting fully convolutional net is applied to the whole image by convolving the filters of each layer with the full-size input. The output is a class score map whose number of channels equals the number of classes and whose spatial resolution varies with the input image size.
      • Finally, the class score map is spatially averaged (sum-pooled) to obtain a fixed-size vector of class scores for the image.
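The FC-to-conv weight reshaping and the final spatial averaging can be sketched as follows. This is a minimal sketch of the shapes involved; the function names are my own, and the 4096/512 sizes are the usual VGG fully connected dimensions used for illustration.

```python
import numpy as np

def fc_to_conv_weights(fc_w, in_c, k):
    """Reshape fully connected weights of shape (out_features, in_c*k*k)
    into convolution filters of shape (out_features, in_c, k, k)."""
    return fc_w.reshape(fc_w.shape[0], in_c, k, k)

def average_score_map(score_map):
    """Spatially average a class score map (H, W, num_classes) to get a
    fixed-size vector of class scores, regardless of input resolution."""
    return score_map.mean(axis=(0, 1))
```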
7. Implementation
    • Implemented using C + + Caffe Toolbox
      • Support for single-system multi-GPU
      • Multi-GPU divides batch into multiple gpu-batch, computes on each GPU, obtains the gradient of sub-batch, and averages it as the gradient of the whole batch.
      • In the reference document [9], a lot of accelerated training methods are proposed in this paper.

        The experimental results show that. Up to 3.75 times times faster on a 4-GPU system.
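The synchronous gradient averaging described above amounts to a per-parameter mean over the GPUs' sub-batch gradients. A minimal NumPy sketch (assuming equal sub-batch sizes, so the mean equals the full-batch gradient):

```python
import numpy as np

def average_gradients(per_gpu_grads):
    """Average the parameter gradients computed on each GPU's sub-batch.

    `per_gpu_grads` is a list (one entry per GPU) of lists of gradient
    arrays, one array per parameter. With equal sub-batch sizes, the
    mean over GPUs equals the gradient of the whole batch.
    """
    return [np.mean(param_grads, axis=0) for param_grads in zip(*per_gpu_grads)]
```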

8. Experiments

A total of three groups of experiments were conducted:

8.1 Configuration Comparison

Using the CNN structures in Figure 1, the C/D/E networks are trained at multiple scales. Note that the test set in this group of experiments has only a single scale. The results are shown below:

Figure 3. Performance at a single test scale

8.2 Multi-scale Comparison

The test set is evaluated at multiple scales. Since too large a discrepancy between training and testing scales hurts performance, the test scale Q floats within plus or minus 32 of S.

For networks trained over a scale interval, the test scales are the minimum, median, and maximum of that interval.

Figure 4. ConvNet performance at multiple test scales

8.3 ConvNet Fusion

Models are fused by averaging their posterior probability estimates.

Fusing the two best models from Figures 3 and 4 achieves a better result, while fusing seven models performs worse.
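The posterior-averaging fusion above can be sketched in a few lines. A minimal sketch; the function name is my own, and the two-class probabilities below are made up for illustration.

```python
import numpy as np

def fuse_posteriors(model_probs):
    """Fuse an ensemble by averaging the models' soft-max posteriors,
    then predicting the arg-max class of the averaged distribution."""
    avg = np.mean(np.asarray(model_probs), axis=0)
    return avg, avg.argmax(axis=-1)

# Two hypothetical models, one sample, two classes
avg, pred = fuse_posteriors([[[0.6, 0.4]], [[0.2, 0.8]]])
```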

Figure 5. ConvNet fusion

9. References

[1]. Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014.

[2]. Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012: 1097-1105.

Copyright notice: this is the blogger's original article and may not be reproduced without consent.
