005 - Convolutional Neural Network: 01 - The Convolutional Layer


Steps to train a network (the author asks, as a Chinese writer teaching Chinese readers, why write these in English at all?):

1. Sample a batch of data.

2. Forward it through the graph and compute the loss.

3. Backprop to calculate the gradients.

4. Update the parameters using the gradients.
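
As a rough sketch of these four steps in Python with PyTorch; the model, the random stand-in data, and the hyperparameters are placeholder assumptions, not from the original text:

    import torch
    import torch.nn as nn

    # Placeholder model and optimizer: any model and dataset would follow the same four steps.
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(100):
        # 1. Sample a batch of data (random tensors stand in for real images and labels).
        images = torch.randn(64, 3, 32, 32)
        labels = torch.randint(0, 10, (64,))

        # 2. Forward it through the graph and compute the loss.
        loss = criterion(model(images), labels)

        # 3. Backprop to calculate the gradients.
        optimizer.zero_grad()
        loss.backward()

        # 4. Update the parameters using the gradients.
        optimizer.step()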

What convolutional neural networks can do:

Classification and retrieval (e.g. recommendation)

Detection (which involves both classification and regression) and segmentation

Autonomous driving (a GPU is recommended)

Feature extraction

Pose recognition (locating key points)

Cancer cell recognition, character recognition, logo recognition

Image captioning (letting machines describe the world), CNN + LSTM

Style transfer

How is it implemented?

A convolutional neural network consists of:

[INPUT-CONV-RELU-POOL-FC]

• Input layer
• Convolutional layer
• Activation function
• Pooling layer
• Fully connected layer
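
A minimal sketch of this [INPUT-CONV-RELU-POOL-FC] stack in PyTorch; the specific sizes (6 filters of 5x5, 2x2 pooling, 10 output classes) are illustrative assumptions, not values given in the text:

    import torch
    import torch.nn as nn

    # INPUT: 32x32x3 image -> CONV -> RELU -> POOL -> FC
    cnn = nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5),  # CONV: 6 filters of 5x5x3
        nn.ReLU(),                                                # activation function
        nn.MaxPool2d(kernel_size=2),                              # POOL: 2x2 downsampling
        nn.Flatten(),
        nn.Linear(6 * 14 * 14, 10),                               # FC: fully connected layer
    )

    x = torch.randn(1, 3, 32, 32)   # one 32x32x3 input
    print(cnn(x).shape)             # torch.Size([1, 10])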

So what exactly is convolution?

For now, think of it as a little helper, called a filter, that helps us extract features from the 32x32x3 input.

How does it extract them? First, let's build an intuition for what convolution does, starting with the two-dimensional case:

Divide the 32x32 image into regions, for example many 5x5 regions, and extract one feature value from each 5x5 region.

The extracted feature values are assembled into a matrix.

This extracted matrix is called a feature map.

Filter depth was mentioned briefly above; here is the key point:

The filter has a depth of 3, which must match the input depth of 3.

Why are there 2 extracted feature maps?

Because two filters, Filter1 and Filter2, are used here; each extracts different features, and the two feature maps are independent of each other.

For example:

With six filters F1, F2, F3, F4, F5, F6,

six feature maps are obtained; stacking the 6 feature maps together gives the output of this convolution.

Multilayer convolution:

Put simply, the feature map produced by one convolution is used as the input to the next convolution, which extracts features from it again.

A 32x32x3 input convolved with six 5x5x3 filters gives a 28x28x6 feature map; that 28x28x6 feature map convolved with ten 5x5x6 filters then gives a 24x24x10 feature map.
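
A quick shape check of this two-layer example in PyTorch (the filter weights are randomly initialized; only the output shapes matter here):

    import torch
    import torch.nn as nn

    conv1 = nn.Conv2d(3, 6, kernel_size=5)   # six 5x5x3 filters
    conv2 = nn.Conv2d(6, 10, kernel_size=5)  # ten 5x5x6 filters

    x = torch.randn(1, 3, 32, 32)            # 32x32x3 input
    f1 = conv1(x)
    f2 = conv2(f1)
    print(f1.shape)  # torch.Size([1, 6, 28, 28])  -> the 28x28x6 feature map
    print(f2.shape)  # torch.Size([1, 10, 24, 24]) -> the 24x24x10 feature map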

What happens with the results of this repeated convolution?

input → convolution → features → convolution → features → convolution → features

Each step refines the features further, and the final features are then used for the classification or regression task.

The specific computation of convolution (feature extraction):

W0 and the blue region of X compute an inner product (multiply corresponding positions, then sum):

F1, depth slice 1: 0×1 + 0×1 + 0×1 + 0×(-1) + 1×(-1) + 1×0 + 0×(-1) + 1×1 + 1×0 = 0

F1, depth slice 2: 0×(-1) + 0×(-1) + 0×1 + 0×(-1) + 0×1 + 1×0 + 0×(-1) + 2×1 + 2×0 = 2

F1, depth slice 3: 0×1 + 0×0 + 0×(-1) + 0×0 + 2×0 + 2×0 + 0×1 + 0×(-1) + 0×(-1) = 0

Then, following the neural network score function f(x, W) = Wx + b,

with b = 1 here,

the output value is the sum of the three slice results plus b: 0 + 2 + 0 + 1 = 3.

In the rightmost green output matrix, the value at row 1, column 1 is therefore 3.

Step two: compute the feature value at row 1, column 2 in the same way, which gives -5.

Step three: likewise, compute the feature value at row 1, column 3, which gives -4.

The same filter then moves down 2 rows on the input, and the computation continues in the same way to produce row 2 of the feature matrix.

Moving down another 2 rows gives row 3 of the feature matrix.

In the same way, the feature matrix for F2 (W1) is obtained, which is the second green matrix in the figure.
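
A NumPy sketch of this sliding computation for a single filter. The input and weight values below are arbitrary placeholders rather than the numbers from the figure, but the per-slice inner products, the added bias, and the stride-2 sliding follow the steps described above:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.integers(0, 3, size=(3, 7, 7))    # input volume of depth 3 (placeholder values standing in for the padded input)
    w0 = rng.integers(-1, 2, size=(3, 3, 3))  # one filter W0 of size 3x3x3 (placeholder values)
    b = 1                                     # bias term b
    stride = 2

    out = np.zeros((3, 3), dtype=int)         # output feature map: (7 - 3) / 2 + 1 = 3
    for i in range(3):
        for j in range(3):
            region = x[:, i * stride:i * stride + 3, j * stride:j * stride + 3]
            # inner product on each depth slice, summed over all 3 slices, plus the bias
            out[i, j] = int(np.sum(region * w0)) + b
    print(out)                                # each entry is one feature value, like the 3 computed above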

Parameter analysis of the convolution kernel:

Why move 2 cells at a time? Why slide in this way at all?

The sliding step size is called the stride.

If the input is a 7x7 matrix, the filter is 3x3, and stride = 1 (slide 1 cell at a time),

then the resulting matrix is 5x5.

If stride = 2 (slide 2 cells at a time) with the same 3x3 filter,

the resulting matrix is 3x3.

As can be seen, the larger the stride, the smaller the resulting feature map.

We would like a small stride, since that gives a larger feature matrix and therefore more features,

but efficiency and precision trade off against each other, so the stride should be neither too large nor too small.
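
These sizes follow the standard formula output = (input - filter) / stride + 1; here is a small sketch checking both cases above:

    def output_size(n, f, stride):
        """Spatial output size for an n x n input and f x f filter, no padding."""
        assert (n - f) % stride == 0, "the filter does not tile the input evenly with this stride"
        return (n - f) // stride + 1

    print(output_size(7, 3, 1))   # 5 -> a 5x5 feature map
    print(output_size(7, 3, 2))   # 3 -> a 3x3 feature map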

Pad

When the convolution extracts feature values, a "+pad 1" term appears; what is that?

Start again from the 7x7 input with a 3x3 convolution kernel and stride = 2.

Among the windows ①, ②, ③: element ① is used 1 time, while ② and ③ are each used 2 times.

That is, elements on the edge are used less often than elements in the middle,

so ② and ③ contribute more to the final result than ①. How can ① (an edge element) be made to contribute more to the final result?

In fact, the original image is only 5x5, like the yellow region in the figure.

Padding means adding a ring of 0s around the edge of the original input image,

which increases how often the edge elements are used.

Why add 0s? Because this ring is not part of our original input; it only exists to help us make better use of the edge points,

and the 0s themselves contribute nothing to the network.

Input = 7x7

Filter = 3x3

Pad = 1

Stride = 1

Output = 7x7
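
With padding, the usual formula becomes output = (input - filter + 2 × pad) / stride + 1, which reproduces the 7x7 output above; a small sketch:

    import numpy as np

    def output_size_padded(n, f, stride, pad):
        """Spatial output size with a ring of `pad` zeros added on each side."""
        return (n - f + 2 * pad) // stride + 1

    print(output_size_padded(7, 3, 1, 1))   # 7 -> the 7x7 input stays 7x7

    x = np.ones((7, 7))
    print(np.pad(x, 1).shape)               # (9, 9): the ring of zeros around the 7x7 input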

One more point: all the filters in a convolutional layer must be the same size.

Parameter sharing:

Imagine extracting features by convolution from an input matrix of 32x32x3,

using 5x5 filters, 10 filters in total, with stride = 1 and pad = 2. The resulting feature matrix is then

a 32x32x10 matrix.

The input has 32x32x3 = 3,072 values.

Each filter has 5x5x3 = 75 weights, so the 10 filters together have 750 weight parameters.

If there were no sharing, i.e. every one of the 32x32x10 output units had its own 5x5x3 weights, the layer would need about 768,000 weight parameters,

which would be terrible for both computation and efficiency.

If instead the parameters are shared (identical) across the output layer,

so that every cell of the 32x32 output applies the same 5x5x3 weights to its region of the 32x32 input, then only 5x5x3x10 = 750 weight parameters are required.

Adding the 10 bias parameters b, only 760 parameters are needed in total.

Personal understanding: the filter does not change during the entire extraction process, so its weights W stay fixed; every point of the feature map is therefore extracted with the same weight parameters.
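
A quick check of this count in PyTorch, assuming the same setup as above (ten 5x5 filters on a depth-3 input): the layer holds 750 shared weights plus 10 biases, 760 parameters in total:

    import torch.nn as nn

    # Ten 5x5 filters over a depth-3 input, stride 1, pad 2 (as in the example above).
    conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=2)
    print(conv.weight.shape)                          # torch.Size([10, 3, 5, 5]) -> 750 shared weights
    print(conv.bias.shape)                            # torch.Size([10])          -> 10 biases
    print(sum(p.numel() for p in conv.parameters()))  # 760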

To summarize:

What does the convolutional layer mainly do? Put plainly, it performs feature extraction: it uses its weight parameters to extract features from the input and produce the feature matrix.

