SPP (Spatial Pyramid Pooling)

Overview of the spatial pyramid pooling layer: in a convolutional layer, the size of the kernel is fixed; backpropagation adjusts only the values of the weights, not how many there are. The size of the input image, however, is not something we can always control. When the input size changes, the convolution and pooling operations are unaffected, but the fully connected layers are. The core problem this paper solves is how images of different sizes can be fed directly into an already trained network.


Why introduce spatial pyramid pooling?
First, why do we need this layer at all? The images we deal with come in different sizes, yet a batch must have a uniform shape, so images are usually cropped to the same size. For example, we can crop the four corners of an image plus a central region, giving five patches, then repeat the same operation on the horizontally flipped image, for a total of ten same-sized patches. That is one method; there are others, such as the one mentioned in the OverFeat paper. These cropping tricks achieve good results, but they still have problems: some cropped regions overlap, which implicitly increases the weight of those areas. This paper therefore proposes spatial pyramid pooling to handle inputs of different sizes directly.
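The ten-crop scheme described above can be sketched as follows. This is a minimal, dependency-free illustration: images are plain nested lists (H×W) rather than real image arrays, and `ten_crop` and its parameters are hypothetical names, not an API from the paper.

```python
def crop(img, top, left, size):
    """Extract a size x size patch starting at (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

def hflip(img):
    """Flip an image left-right."""
    return [list(reversed(row)) for row in img]

def ten_crop(img, size):
    """Four corner crops + one center crop, then the same five
    from the horizontally flipped image: ten patches in total."""
    h, w = len(img), len(img[0])
    positions = [
        (0, 0),                              # top-left
        (0, w - size),                       # top-right
        (h - size, 0),                       # bottom-left
        (h - size, w - size),                # bottom-right
        ((h - size) // 2, (w - size) // 2),  # center
    ]
    crops = [crop(img, t, l, size) for t, l in positions]
    crops += [crop(hflip(img), t, l, size) for t, l in positions]
    return crops

# A toy 6x6 "image" cropped to 4x4 yields ten 4x4 patches.
img = [[r * 6 + c for c in range(6)] for r in range(6)]
patches = ten_crop(img, 4)
```

Note how the corner and center windows overlap whenever `size` is more than half the image side, which is exactly the double-counting problem the text points out.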
Looking at Figure 1: the top shows the original image being cropped or warped to a fixed size, and the middle is the corresponding network model. The bottom is this paper's model, which inserts a spatial pyramid pooling layer after the last convolutional layer, followed by the fully connected layers. With this change, the network can be tested regardless of the input image size.
But what we need to understand is why this layer goes after the last convolutional layer. In other words, why are convolution and pooling insensitive to image size while the fully connected layer is not? Let's take a look. Suppose the input image is 100×100. After a convolutional layer with five 3×3 kernels we get feature maps of size 5×98×98; if the input were 102×102 instead, the feature maps would be 5×100×100. After 2×2 pooling, those feature maps become 49×49 and 50×50 respectively. No problem so far: the kernel size is fixed, and it can be convolved over an image of any size. The fully connected layer is different. Suppose the flattened output of the last convolutional layer has 50 units and the next fully connected layer has 1000 units; the weight matrix is then 50×1000. Now think about it: if the input image size differs each time, how can this matrix be used? A different input size will generally not produce exactly 50 values after the last convolutional layer. That is why we have to do something about the fully connected layer.
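The size arithmetic above can be checked with a few lines of pure Python. This is only a back-of-the-envelope sketch (valid convolution, no padding, non-overlapping pooling, as in the example), not a framework implementation:

```python
def conv_out(size, kernel=3, stride=1):
    """Output side length of a 'valid' convolution (no padding)."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    """Output side length of non-overlapping window pooling."""
    return size // window

for in_size in (100, 102):
    after_conv = conv_out(in_size)     # a 3x3 kernel slides over any size
    after_pool = pool_out(after_conv)  # 2x2 pooling also adapts
    flat = 5 * after_pool * after_pool # 5 channels flattened for the FC layer
    print(in_size, "->", after_conv, "->", after_pool, "flat:", flat)
```

The two inputs flatten to 5×49×49 = 12005 and 5×50×50 = 12500 values: different lengths, so one fixed fully-connected weight matrix cannot accept both. That is exactly the mismatch SPP removes.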
The characteristics of the spatial pyramid pooling layer
Of course, this paper's contribution is not just one thing. The pyramid pooling layer has the following three advantages. First, it removes the defect caused by fixed input image sizes. Second, because features are extracted from each feature map at several scales and then aggregated, the algorithm is more robust. Third, it also improves object recognition accuracy. In fact, the most impressive part is that this multi-scale feature extraction applied after the convolutional layers improves accuracy on the task, much like training on images of different sizes improves a model's accuracy. SPP has pushed various existing network models, such as R-CNN, toward state of the art. Moreover, R-CNN must feed each differently sized image region through the network separately, which makes the whole process particularly time-consuming; SPP-net avoids this problem and greatly reduces the time.
What is the pyramid pooling layer?
We have talked for a while without reaching the main point, so let's get to it: what exactly is the pyramid pooling layer?

As shown above, from the bottom up: this is a traditional network architecture with five convolutional layers (where "convolutional layer" here means the combination of convolution and pooling), followed by the fully connected layers. What we do is insert a pyramid pooling layer in front of the first fully connected layer to handle the varying size of the input image. As the figure shows, the spatial pyramid pooling layer performs three pooling operations on each feature map from the previous convolutional layer: one over the whole map (1 bin), one over the map divided into a 2×2 grid (4 bins), and one over the map divided into a 4×4 grid (16 bins). Every feature map thus produces 16+4+1=21 values, so the output length is fixed. This solves the problem of feature maps having different sizes.
So what is the concrete operation? Let's take a look. Suppose the output of the fifth convolutional layer is an a×a feature map (for example, 13×13) and the pooling grid has n×n bins. Then each window has size win = ceil(a/n) and the stride is stride = floor(a/n); the former rounds up and the latter rounds down, which yields the three pooling operations shown in the figure above. All three are essentially max pooling, just with different window sizes and strides. FC6 denotes the first fully connected layer. Experiments show that this multi-level pooling improves the final accuracy. Inputs of other sizes are handled in exactly the same way.
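The ceil/floor windowing above can be sketched directly. This is a minimal single-channel illustration using nested Python lists instead of framework tensors; for a real network you would apply it per channel and concatenate the results. `spp_max_pool` is a hypothetical name, not an API from the paper.

```python
import math

def spp_max_pool(feature_map, levels=(4, 2, 1)):
    """Spatial pyramid max pooling over a single a x a feature map.

    For each level with n x n bins:
        win    = ceil(a / n)   # window size, rounded up
        stride = floor(a / n)  # step, rounded down
    so the output always has sum(n * n for n in levels) values,
    whatever a is.
    """
    a = len(feature_map)
    pooled = []
    for n in levels:
        win = math.ceil(a / n)
        stride = math.floor(a / n)
        for i in range(n):
            for j in range(n):
                rows = feature_map[i * stride : i * stride + win]
                pooled.append(max(
                    max(row[j * stride : j * stride + win]) for row in rows
                ))
    return pooled

# A 13x13 map and a 10x10 map both yield 16 + 4 + 1 = 21 values.
fm13 = [[r * 13 + c for c in range(13)] for r in range(13)]
fm10 = [[r * 10 + c for c in range(10)] for r in range(10)]
```

For a = 13 and n = 4 this gives win = 4 and stride = 3, matching the worked example in the text; the last bin (the 1×1 level) is simply the global maximum of the map.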
