Convolutional Neural Networks: How They Learn Translation-Invariant Features

Source: Internet
Author: User

After some thought, I do not believe that the pooling operation is responsible for the translation-invariance property of CNNs. I believe that invariance (at least to translation) is due to the convolution filters (not specifically the pooling) and to the fully-connected layer.

For instance, let's use Fig. 1 as a reference:

The blue volume represents the input image, while the green and yellow volumes represent the layer 1 and layer 2 output activation volumes (see CS231n Convolutional Neural Networks for Visual Recognition if you are not familiar with these volumes). At the end, we have a fully-connected layer that is connected to all activation points of the yellow volume.

These volumes are built using a convolution plus a pooling operation. The pooling operation reduces the height and width of these volumes, while the increasing number of filters in each layer increases the volume depth.
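The effect on the volume shapes can be sketched with a few lines of arithmetic. This is only an illustration: the 32x32 input, the 3x3 kernels, the "same" padding, the 2x2 pooling and the filter counts (8 and 16) are assumptions chosen for the example, not values taken from the figures.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after a pooling step."""
    return (size - window) // stride + 1


def volume_shapes(height, width, layers):
    """Return (height, width, depth) after each conv + pool layer.

    `layers` is a list of (num_filters, kernel_size) pairs. Padding of
    kernel_size // 2 is assumed so the convolution keeps the spatial size.
    """
    shapes = []
    for num_filters, kernel in layers:
        height = pool_output_size(conv_output_size(height, kernel, padding=kernel // 2))
        width = pool_output_size(conv_output_size(width, kernel, padding=kernel // 2))
        shapes.append((height, width, num_filters))
    return shapes


# Hypothetical network: 32x32 input, 8 filters in layer 1, 16 in layer 2.
shapes = volume_shapes(32, 32, [(8, 3), (16, 3)])
print(shapes)  # [(16, 16, 8), (8, 8, 16)]
```

Under these assumptions, height and width halve at each layer while the depth follows the number of filters.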

For the sake of argument, let's suppose that we have very "ludic" (playful, illustrative) filters, as shown in Fig. 2:

    • The first layer filters (which would generate the green volume) detect eyes, noses and other basic shapes (in real CNNs, first layer filters would match lines and very basic textures);
    • The second layer filters (which would generate the yellow volume) detect faces, legs and other objects that are aggregations of the first layer filters. Again, this is only an example: real-life convolution filters may detect objects that have no meaning to humans.

Now suppose that there is a face at one of the corners of the image (represented by red and magenta points). The eyes are detected by the first filter, and therefore appear as activations in the first slice of the green volume. The same happens for the nose, except that it is detected by the second filter and appears in the second slice. Next, the face filter finds that there are eyes and a nose next to each other, and it generates an activation in the yellow volume (within the same region as the face in the input image). Finally, the fully-connected layer detects that there is a face (and maybe a leg and an arm detected by other filters) and outputs that it has detected a human body.

Now suppose that the face has moved to another corner of the image, as shown in Fig. 3:

The same number of activations occurs in this example; however, they occur in a different region of the green and yellow volumes. Therefore, any activation in the first slice of the yellow volume means that a face was detected, independently of the face location. The fully-connected layer is then responsible for "translating" a face into a human body. In both examples, an activation is received at one of the fully-connected neurons. However, in each example, the activation path inside the FC layer is different, meaning that correct learning at the FC layer is essential to ensure the invariance property.
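The behaviour described here (the same activation, only at a different place) can be checked with a toy example. The sketch below is an assumption-laden illustration: the 3x3 binary "face" pattern, the 8x8 image size and the helper names are all made up for the example, and the filter is simply set equal to the pattern instead of being learned.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the activation map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def place_patch(size, patch, top, left):
    """Return a size x size image of zeros with `patch` pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


def argmax2d(activation_map):
    """Return ((row, col), value) of the strongest activation."""
    best = max(max(row) for row in activation_map)
    for r, row in enumerate(activation_map):
        for c, value in enumerate(row):
            if value == best:
                return (r, c), best


# Hypothetical toy "face" pattern; the filter is the same pattern.
face = [[1, 0, 1],
        [0, 1, 0],
        [1, 1, 1]]

face_top_left = place_patch(8, face, 0, 0)
face_bottom_right = place_patch(8, face, 5, 5)

pos_a, peak_a = argmax2d(correlate2d_valid(face_top_left, face))
pos_b, peak_b = argmax2d(correlate2d_valid(face_bottom_right, face))

print(pos_a, peak_a)  # (0, 0) 6
print(pos_b, peak_b)  # (5, 5) 6
```

The peak value is identical in both cases; only its position in the activation map changes. Whatever reads the map afterwards (here, the FC layer) has to handle both positions.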

It must be noted that the pooling operation only "compresses" the activation volumes. If there were no pooling in this example, an activation in the first slice of the yellow volume would still mean a face.
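A small sketch of this point, assuming a 6x6 activation map with a single "face" activation and a 2x2 max-pooling window (both chosen only for illustration): pooling shrinks the map, but the activation is still present, and it still sits at a position that depends on where the face was.

```python
def max_pool2d(feature_map, window=2):
    """Non-overlapping max pooling over window x window blocks."""
    rows = len(feature_map) // window
    cols = len(feature_map[0]) // window
    return [
        [
            max(
                feature_map[r * window + i][c * window + j]
                for i in range(window)
                for j in range(window)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def single_activation_map(size, row, col, value=1.0):
    """Hypothetical activation map with one active location."""
    fmap = [[0.0] * size for _ in range(size)]
    fmap[row][col] = value
    return fmap


def peak(feature_map):
    """Strongest activation in the map."""
    return max(max(row) for row in feature_map)


face_at_corner_a = single_activation_map(6, 0, 0)
face_at_corner_b = single_activation_map(6, 5, 5)

pooled_a = max_pool2d(face_at_corner_a)
pooled_b = max_pool2d(face_at_corner_b)

# With or without pooling, the activation that "means a face" survives.
print(peak(face_at_corner_a), peak(pooled_a))  # 1.0 1.0
print(peak(face_at_corner_b), peak(pooled_b))  # 1.0 1.0

# Pooling only shrinks the map: 6x6 becomes 3x3, and the activation moves
# from (5, 5) to (2, 2) rather than to a fixed, location-independent slot.
print(len(pooled_b), len(pooled_b[0]))  # 3 3
print(pooled_a[0][0], pooled_b[2][2])   # 1.0 1.0
```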

In conclusion, what makes a CNN invariant to object translation is the architecture of the neural network: the convolution filters and the fully-connected layer. Additionally, I believe that if a CNN is trained with faces shown only at one corner, the fully-connected layer may become insensitive to faces at other corners during the learning process.
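The role of the fully-connected layer can be illustrated with a hand-built example. Nothing here is trained: the two weight vectors are set by hand to mimic what an FC neuron might look like after seeing faces only at one corner versus at every position, and the 3x3 "face" activation maps are hypothetical.

```python
def flatten(feature_map):
    """Flatten a 2D activation map into the vector an FC layer receives."""
    return [value for row in feature_map for value in row]


def fc_neuron(inputs, weights, bias=0.0):
    """A single fully-connected neuron without a nonlinearity."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical "face" slice of the last activation volume (3x3 after pooling).
face_top_left = [[1, 0, 0],
                 [0, 0, 0],
                 [0, 0, 0]]
face_bottom_right = [[0, 0, 0],
                     [0, 0, 0],
                     [0, 0, 1]]

# Hand-set weights (an assumption, not a training result):
# one neuron only listens to the top-left position, the other to all positions.
corner_only_weights = [1, 0, 0, 0, 0, 0, 0, 0, 0]
all_positions_weights = [1] * 9

for name, weights in [("corner only", corner_only_weights),
                      ("all positions", all_positions_weights)]:
    out_a = fc_neuron(flatten(face_top_left), weights)
    out_b = fc_neuron(flatten(face_bottom_right), weights)
    print(name, out_a, out_b)
# corner only 1.0 0.0
# all positions 1.0 1.0
```

The convolutional part produces an equally strong activation for both face positions; whether the final output responds to both depends on the weights the FC layer ends up with.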

Source

https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features/answer/Jean-Da-Rolt
