Andrew Ng Deep Learning Course Notes: Convolutional Neural Network Basic Operations in Detail


Convolutional Layer

The role of the convolutional layer in a CNN:

In many network architectures the convolutional layer is written as Conv, short for convolution.

The convolutional layer plays a central role in a CNN: the abstraction and extraction of features. This is also a significant difference between CNNs and traditional ANNs or SVMs.

An image is two-dimensional data. How can a network learn the right patterns from an image so that it classifies the image correctly? One proposal is to fully connect every pixel to the next layer with its own weight, stack several such layers, and classify at the end. This works in principle, but an image has far too many pixels, and the parameter count explodes.

An alternative is to look at only a small window of the image at a time, and to apply the same kind of small window everywhere else in the image. From classical image processing we know that convolving an image can produce many effects, such as blurring the whole image or extracting edges; convolution is very good at extracting features from an image. By backpropagating the error, we can learn, for each task, the convolution kernels best suited to that task. The logic of sharing weights is that if a kernel characterizes one small region of an image well, it will characterize other regions of the image well too.
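The sliding-window convolution described above can be sketched in a few lines of NumPy. This is an illustrative example, not from the course notes; the function name `conv2d` and the vertical-edge kernel are chosen for demonstration:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (strictly, cross-correlation, as CNNs use it)."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # Multiply the f x f window by the kernel and sum
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# A classic vertical-edge-detection kernel
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# 6x6 image: bright left half, dark right half
image = np.zeros((6, 6))
image[:, :3] = 10.0

result = conv2d(image, kernel)  # shape (4, 4); each row is [0, 30, 30, 0]
```

The large activations in the middle columns mark exactly where the vertical edge lies. In a CNN the kernel values would not be fixed by hand like this; they would be learned by backpropagation.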

Padding

Valid: no padding.
Same: the image edges are padded so that the input and output have the same size.

Consequences of not using padding: the output image shrinks, and pixels at the edges are covered by fewer filter positions, so edge information is under-sampled.

Padding is usually used to keep the input and output dimensions equal during convolution. It also lets pixels near the edge of the image contribute to the output roughly as much as pixels near the center.

Suppose the input image size is n×n, the filter size is f×f, the padding size is p, and the stride is s.

Then the output size is as follows.

① Assuming the stride s is 1 and there is no padding, the output is:

(n − f + 1) × (n − f + 1)

② Assuming the stride s is 1 and the padding size is p, the output is:

(n + 2p − f + 1) × (n + 2p − f + 1)

Formula ① shows that without padding the output is smaller than the input. In practice, however, we often want the output to be the same size as the input, which gives the following equation:

(n + 2p − f + 1) × (n + 2p − f + 1) = n × n

Solving for p:

p = (f − 1) / 2

The formula above shows that once f is odd, the padding size is determined. You might ask whether the filter size must be odd. In theory an even f is also possible, but in engineering practice f is almost always odd (very often 3), for two reasons: with an even f the padding would be asymmetric, which we would rather avoid; and an odd f gives the filter a central pixel, which makes it easy to refer to the filter's position.
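The relationship p = (f − 1)/2 is easy to check in code. A trivial sketch; the helper name `same_padding` is chosen here for illustration:

```python
def same_padding(f):
    """Padding that keeps output size equal to input size (stride 1, odd f)."""
    assert f % 2 == 1, "'same' padding needs an odd filter size"
    return (f - 1) // 2

# Common filter sizes and their 'same' padding; verify n + 2p - f + 1 == n
for f in (1, 3, 5, 7):
    n = 32
    p = same_padding(f)
    assert n + 2 * p - f + 1 == n  # output equals input
```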

Stride

③ Assuming the stride is s and the padding size is p, the output is:
⌊(n + 2p − f) / s⌋ + 1

Note that when the result is not an integer, we take the floor (round down).
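The general output-size formula, floor included, fits in a one-line helper. Illustrative only; `conv_output_size` is a name chosen here:

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Output side length of an n x n input convolved with an f x f filter,
    padding p and stride s: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))            # 4: no padding, stride 1 (formula ①)
print(conv_output_size(6, 3, p=1))       # 6: 'same' padding      (formula ②)
print(conv_output_size(7, 3, p=0, s=2))  # 3: stride 2, floor applies (formula ③)
```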

Pooling

The convolutional layer convolves neighborhoods of the image to obtain the image's local features; the pooling layer then uses pooling to aggregate the feature points in each small neighborhood into new features.

Advantages: it greatly reduces the number of parameters, and pooling units are translation invariant.

In practice, max pooling is by far the most common choice.
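A minimal sketch of 2×2 max pooling with stride 2, the usual default (`max_pool` is an illustrative helper, assuming NumPy):

```python
import numpy as np

def max_pool(x, f=2, s=2):
    """Max pooling with window size f and stride s (no padding)."""
    n = x.shape[0]
    m = (n - f) // s + 1  # same output-size formula as convolution, p = 0
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # Keep only the largest activation in each f x f window
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)
print(max_pool(x))  # [[9. 2.]
                    #  [6. 3.]]
```

Note that the pooling layer itself has no learned parameters; it only aggregates the features produced by the convolutional layer.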

How a convolutional neural network reduces its parameter count:

1) Sparse connectivity

It is generally believed that human perception of the outside world proceeds from local to global, and spatial relationships in an image are likewise local: nearby pixels are strongly correlated, while distant pixels are only weakly correlated. Thus each neuron does not need to perceive the whole image; it can perceive a local region, and higher layers then combine the local information into a global picture. The idea of a partially connected network is also inspired by the structure of the biological visual system: neurons in the visual cortex have local receptive fields, i.e. they respond only to stimuli in certain regions. In the following illustration, the left image shows full connectivity and the right image shows sparse connectivity.

In the image on the right, suppose the input is a 1000×1000 image and there are 1,000,000 hidden neurons. With full connectivity this would take 1000×1000 × 1,000,000 = 10^12 weights; if instead each neuron connects to only a 10×10 patch of pixels, the weight count is 1,000,000 × 100 = 10^8, one ten-thousandth of the original. And those 100 parameters applied to a 10×10 patch are, in effect, a convolution operation.

2) Parameter sharing

But 10^8 parameters is still too many, so we bring in the second trick: weight sharing. In the local connectivity above, each neuron has 100 parameters and there are 1,000,000 neurons in total; if all 1,000,000 neurons use the same 100 parameters, the parameter count drops to just 100.
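The arithmetic behind these counts, using the same 1000×1000 image and 1,000,000 hidden neurons:

```python
pixels = 1000 * 1000   # input image size
neurons = 1_000_000    # hidden units

fully_connected = pixels * neurons  # 10**12 weights: every pixel to every neuron
local_10x10 = neurons * 10 * 10     # 10**8 weights: each neuron sees a 10x10 patch
shared_kernel = 10 * 10             # 100 weights: all neurons share one kernel

print(fully_connected, local_10x10, shared_kernel)
```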

How should we understand weight sharing? We can view these 100 parameters (that is, the convolution kernel) as a way of extracting features that is independent of position. The underlying assumption is that the statistics of one part of the image are the same as those of any other part. This means that features learned on one part of the image can also be used on other parts, so we can apply the same learned features at every location in the image.

More concretely, suppose we randomly select a small patch from a large image, say 8×8, as a sample and learn some features from it. We can then apply the features learned from this 8×8 patch as detectors anywhere in the image. In particular, we can convolve the original large image with the features learned from the 8×8 sample, obtaining an activation value for each feature at every position in the large image.

The following figure shows the process of convolving a 5×5 image with a 3×3 kernel. Each convolution kernel is a feature extractor: like a sieve, it filters out the parts of the image that match its criterion (the larger the activation value, the better the match).

