Understanding the SPP-net network

Source: Internet
Author: User

Introduction:

The previous article covered object detection with R-CNN, which successfully introduced a convolutional neural network (CNN) for feature extraction. One problem remains: the CNN places a strict requirement on the size of the image it extracts features from. To work around this, Ross Girshick, the author of R-CNN, takes each of the more than 2000 candidate regions and crops or warps it to a fixed size. This satisfies the CNN's input-size requirement, but cropping loses information and warping distorts the image, and both reduce recognition accuracy.

[Figure: cropping versus warping a candidate region to a fixed size]
     

Body:

Kaiming He analyzed the CNN used in R-CNN and observed that it consists of two parts: the convolutional layers and the fully connected layers. The convolutional part places no requirement on the input size. For an image of any size (w, h) and a convolution kernel of any size (a, b), with the default stride of 1 and no padding, the convolution yields a feature map F of size (w-a+1, h-b+1). The requirement comes from the fully connected layers: the number of input neurons is fixed, and each input neuron corresponds to one feature. Ross Girshick warps each image before it enters the CNN precisely so that the number of features after convolution equals the number of input neurons of the fully connected layer.
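The size argument above can be checked with a few lines of Python. This is only a sketch: the function name conv_output_size, the two example image sizes, and the channel count of 256 are assumptions chosen for illustration, and the formula assumes a convolution without padding.

```python
def conv_output_size(w, h, a, b, stride=1):
    """Spatial size of a convolution output with no padding.

    With stride 1 this reduces to (w - a + 1, h - b + 1).
    """
    return (w - a) // stride + 1, (h - b) // stride + 1


# Two input sizes (hypothetical) give two different feature-map sizes.
small = conv_output_size(224, 224, 3, 3)
large = conv_output_size(320, 240, 3, 3)

# Flattening each map gives a different number of features, which is
# why a fully connected layer with a fixed input width cannot accept both.
channels = 256  # assumed channel count, for illustration only
flat_small = small[0] * small[1] * channels
flat_large = large[0] * large[1] * channels

print(small, large)
print(flat_small, flat_large)
```

The convolution itself runs on either input; only the flattened length differs, and that is the quantity the fully connected layer fixes.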

    

Kaiming He saw a more interesting possibility: process the feature map further so that it always yields exactly as many features as the fully connected layer has input neurons. The image then no longer needs to be warped, yet the number of features is still fixed. So how does he process the feature map?

    

Following the paper, take one image as an example:

              

We run this image through the convolutional layers (using the ZF network as an example) and obtain the following feature map after the last convolutional layer:

    

The figure shows a 60×40×256 feature map, that is, 256 channels of size 60×40. To obtain a fixed number of features from it (the paper's example uses 21), each 60×40 map, which we call feature map A, needs further processing. How is this done?

Let's look at a figure first:

        

As the figure shows, feature map A is pooled by a three-level pyramid pooling layer. Each level specifies how many blocks the feature map is divided into; in the paper's example the levels use 1, 4, and 16 blocks. Feature map A is processed level by level (in code, a for loop over the levels).

The first level pools over the whole of feature map A. Pooling can be max pooling, average pooling, or stochastic pooling; the paper uses max pooling. This yields 1 feature.

The second level first divides feature map A into 4 blocks of 30×20 each, then pools each block with a pooling window of the matching size, which yields 4 features.

The third level first divides feature map A into 16 blocks of 15×10 each, then pools each block in the same way, which yields 16 features.

The 1+4+16=21 features per channel (21×256 = 5376 values across the 256 channels) are then fed into the fully connected layer for the weight computation.
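The three pooling levels can be sketched in plain Python for a single channel. This is a minimal sketch rather than the paper's implementation: the function name spp_single_channel, the list-of-lists representation, and the rounding of block boundaries are assumptions. The levels (1, 2, 4) are grid sizes per side, giving 1, 4, and 16 blocks.

```python
def spp_single_channel(fmap, levels=(1, 2, 4)):
    """Max-pool one H x W feature map into a fixed-length list.

    Level n splits the map into an n x n grid of blocks and keeps the
    maximum of each block, so the output length is sum(n * n), which is
    1 + 4 + 16 = 21 for the default levels, whatever H and W are.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Round the start down and the end up so the blocks cover
            # the whole map even when H or W is not divisible by n.
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(max(fmap[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# A 60-wide, 40-high map (as in the text) and a map of another size.
fmap_a = [[r * 60 + c for c in range(60)] for r in range(40)]
fmap_b = [[(r * c) % 7 for c in range(47)] for r in range(33)]

print(len(spp_single_channel(fmap_a)))  # 21
print(len(spp_single_channel(fmap_b)))  # 21
```

Applying the same function to each of the 256 channels and concatenating the results gives the 21×256 values passed to the fully connected layer.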

This is the core idea of SPP-net. In this model Kaiming He also made a second optimization to R-CNN. Replacing warping with the pyramid pooling described above is the most important change, but the second one is also very important. What is it?

He reasoned that running the convolution separately on each of the more than 2000 candidate regions provided by selective search (SS) is bound to take a lot of time. Instead, the whole image can be convolved once to obtain a single feature map. The positions of the more than 2000 candidate regions provided by the SS algorithm are recorded, and each region is mapped proportionally onto the feature map of the whole image to extract that region's feature map B. B is then fed into the pyramid pooling layer, and the pooled features go on to the weight computation.
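The proportional mapping from image coordinates to feature-map coordinates can be sketched as follows. This is a simplified sketch and not the paper's exact rounding rule: the function name map_box_to_feature, the (x0, y0, x1, y1) box format, and the total stride of 16 between image and feature map are assumptions made for illustration.

```python
import math


def map_box_to_feature(box, stride=16):
    """Map an image-space box (x0, y0, x1, y1) onto the feature map.

    `stride` is the assumed total downsampling factor of the
    convolutional layers. The result is a half-open range of
    feature-map cells that always contains at least one cell.
    """
    x0, y0, x1, y1 = box
    fx0 = math.floor(x0 / stride)
    fy0 = math.floor(y0 / stride)
    fx1 = max(fx0 + 1, math.ceil(x1 / stride))
    fy1 = max(fy0 + 1, math.ceil(y1 / stride))
    return fx0, fy0, fx1, fy1


# One hypothetical candidate region in image coordinates.
region = (160, 96, 480, 352)
print(map_box_to_feature(region))  # (10, 6, 30, 22)
```

The cells in the returned range form feature map B for that region. Because the convolution runs once per image rather than once per region, only this cheap mapping and the pooling are repeated for each of the candidate regions.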

Experiments showed that this approach works, so adding these two optimizations to R-CNN produced the new network, SPP-net.

It is worth mentioning that SPP-net's idea of using pyramid pooling so that a CNN can process images of any size has been widely recognized, and many later models borrow from it to some degree. Even Ross Girshick's later Fast R-CNN was inspired by this idea.

Reference:

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
