Summary of pooling methods (Pooling)

Source: Internet
Author: User

In convolutional neural networks we often encounter pooling operations. A pooling layer usually follows a convolutional layer; pooling reduces the dimensionality of the feature vectors output by the convolutional layer and at the same time improves the results (overfitting becomes less likely).

Why is it possible to reduce the dimensions?

Because images have a "stationarity" property: features that are useful in one region of an image are very likely to be equally useful in another region. Therefore, to describe a large image, a natural idea is to aggregate statistics of the features at different locations. For example, one can compute the average (or maximum) value of a particular feature over a region of the image and use it to represent that region. [1]


1. General pooling (General Pooling)

Pooling is applied to non-overlapping regions of the image (this differs from the convolution operation, where adjacent windows typically overlap).

We define the size of the pooling window as sizeX, i.e. the side length of the square window, and define the horizontal/vertical displacement between two adjacent pooling windows as stride. In general pooling the windows do not overlap, so sizeX = stride.

The most common pooling operations are average pooling (mean pooling) and max pooling:

Average pooling: the average value of an image region is used as the pooled value of that region.

Max pooling: the maximum value of an image region is used as the pooled value of that region.
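
The two operations can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from any of the cited works: the function name general_pool and the 4x4 example array are made up here, and the input is assumed to be a single 2-D feature map. Rows and columns that do not fill a whole window are simply dropped.

```python
import numpy as np

def general_pool(image, size, mode="max"):
    """Pool non-overlapping size*size windows (sizeX == stride == size)."""
    h, w = image.shape
    # Drop trailing rows/columns that do not fill a whole window.
    h, w = h - h % size, w - w % size
    # Axis 1 and axis 3 index the positions inside each window.
    blocks = image[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

# Hypothetical 4x4 feature map, pooled with a 2x2 window.
image = np.array([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 10, 13, 14],
                  [11, 12, 15, 16]], dtype=float)

print(general_pool(image, 2, "mean"))  # [[2.5, 6.5], [10.5, 14.5]]
print(general_pool(image, 2, "max"))   # [[4, 8], [12, 16]]
```

Each output value summarizes exactly one 2x2 block, so the 4x4 input shrinks to 2x2.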


2. Overlapping pooling (Overlapping Pooling) [2]

As its name says, overlapping pooling leaves overlapping regions between adjacent pooling windows; in this case sizeX > stride.

In [2], the authors used overlapping pooling (window size 3, stride 2) with all other settings unchanged, and the top-1 and top-5 error rates dropped by 0.4% and 0.3% respectively, compared with non-overlapping pooling.
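
To make the difference from general pooling concrete, here is a minimal NumPy sketch of pooling with an independent window size and stride. The helper name pool2d and the 5x5 test array are assumptions for illustration; the window size 3 with stride 2 follows the setting reported in [2], but the code itself is not taken from that paper.

```python
import numpy as np

def pool2d(image, size, stride, mode="max"):
    """Slide a size*size window over a 2-D array with the given stride."""
    h, w = image.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i * stride:i * stride + size,
                           j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)

# sizeX = 3 > stride = 2: adjacent windows share one row or column.
print(pool2d(image, size=3, stride=2))  # [[12, 14], [22, 24]]

# sizeX = stride = 2: general (non-overlapping) pooling for comparison.
print(pool2d(image, size=2, stride=2))  # [[6, 8], [16, 18]]
```

With size 3 and stride 2 the middle row and column of the 5x5 input fall inside two windows each, which is exactly the overlap described above.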



3. Spatial pyramid pooling (Spatial Pyramid Pooling) [3]

Spatial pyramid pooling converts the convolutional features of an image of any scale into a vector of the same dimension. This not only lets a CNN process images of arbitrary scale, it also avoids the cropping and warping operations that cause some information to be lost, which is very significant.

A typical CNN requires a fixed input image size, because the fully connected layers need a fixed input dimension; the convolution operations themselves place no restriction on image scale. The authors therefore proposed spatial pyramid pooling: first apply the convolutions to the image, then convert the resulting features into a fixed-dimension vector that is fed to the fully connected layer. This extends the CNN to images of any size.


The idea of spatial pyramid pooling comes from the Spatial Pyramid Model: a single pooling operation becomes pooling at multiple scales. Applying pooling windows of different sizes to the convolutional features gives 1x1, 2x2 and 4x4 pooled results. Because conv5 has 256 filters, this yields one 256-dimensional feature, four 256-dimensional features and sixteen 256-dimensional features. These 21 256-dimensional features are then concatenated (21 x 256 = 5376 values) and fed into the fully connected layer. In this way images of different sizes are converted into features of the same dimension.


To get pooled results of the same size for different images, the size and stride of the pooling window must be computed dynamically from the size of the image's conv5 feature map. Assuming the conv5 output has size a*a and a pooled result of size n*n is needed, the window size and stride are set to sizeX = ceil(a/n) and stride = floor(a/n). Taking a 13*13 conv5 output as an example: [pool1*1] uses sizeX = stride = 13, [pool2*2] uses sizeX = 7 and stride = 6, and [pool4*4] uses sizeX = 4 and stride = 3.
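
The following NumPy sketch puts the pieces together. It assumes the window rule from [3], sizeX = ceil(a/n) and stride = floor(a/n), a square feature map of shape (channels, a, a) with a >= n, and max pooling in every bin; the function names spp_window and spatial_pyramid_pool are made up for this example and the random inputs merely stand in for real conv5 outputs.

```python
import math
import numpy as np

def spp_window(a, n):
    """Window size and stride for pooling an a*a map down to n*n."""
    return math.ceil(a / n), a // n

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, a, a) map at each level and concatenate."""
    channels, a, _ = feature_map.shape
    pooled = []
    for n in levels:
        size, stride = spp_window(a, n)
        for i in range(n):
            for j in range(n):
                window = feature_map[:,
                                     i * stride:i * stride + size,
                                     j * stride:j * stride + size]
                # One value per channel for this bin.
                pooled.append(window.max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two stand-in conv5 outputs of different spatial size, 256 channels each.
small = np.random.rand(256, 10, 10)
large = np.random.rand(256, 13, 13)

print(spp_window(13, 4))                  # (4, 3)
print(spatial_pyramid_pool(small).shape)  # (5376,)
print(spatial_pyramid_pool(large).shape)  # (5376,)
```

Both inputs end up as a vector of 21 x 256 = 5376 values, which is what allows a fully connected layer with a fixed input dimension to follow.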


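Question: if the conv5 output size is 14*14, then [pool1*1] has sizeX = stride = 14 and [pool2*2] has sizeX = stride = 7, neither of which is a problem. However, [pool4*4] has sizeX = ceil(14/4) = 4 and stride = floor(14/4) = 3, so the four windows span only the first 13 rows and columns, and the features in the last row and last column are not included in the pooling operation.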
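
This coverage gap can be checked directly. The sketch below assumes the window rule from [3], sizeX = ceil(a/n) and stride = floor(a/n), and that exactly n windows are taken along each axis; the helper name uncovered_rows is hypothetical. It lists the (0-based) row indices that no window touches; by symmetry the same indices apply to columns.

```python
import math

def uncovered_rows(a, n):
    """Row indices of an a*a map that none of the n pooling windows cover."""
    size, stride = math.ceil(a / n), a // n
    covered = set()
    for i in range(n):
        covered.update(range(i * stride, i * stride + size))
    return sorted(set(range(a)) - covered)

print(uncovered_rows(13, 4))  # []
print(uncovered_rows(14, 2))  # []
print(uncovered_rows(14, 4))  # [13]
```

For a 14*14 map and a 4*4 level, index 13 (the last row or column) is left out, while the 13*13 case is covered completely.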

4. References

[1] UFLDL Tutorial

[2] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, 2012.

[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," LSVRC-2014 contest.


