Steps to train a network: (a Chinese author teaching Chinese readers, so why write a bunch of English?)
1. Sample a batch of data (sampling)
2. Forward it through the graph to get the loss (forward propagation, producing the loss value)
3. Backprop to calculate the gradients (backward propagation computes the gradients)
4. Update the parameters using the gradients
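The four steps above can be sketched as a minimal training loop. This is a toy NumPy example, not the method from any particular framework: a linear model stands in for "the graph", and the data, batch size, and learning rate are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X_all = rng.normal(size=(100, 3))            # toy dataset
y_all = X_all @ np.array([1.0, -2.0, 0.5])   # targets from a known linear map

w = np.zeros(3)   # parameters of a toy linear model (stand-in for the graph)
lr = 0.1          # learning rate (illustrative)

for step in range(200):
    # 1. sample a batch of data
    idx = rng.integers(0, len(X_all), size=16)
    X, y = X_all[idx], y_all[idx]
    # 2. forward it through the "graph", get the loss (mean squared error)
    pred = X @ w
    loss = np.mean((pred - y) ** 2)
    # 3. backprop to calculate the gradients (derived by hand for MSE here)
    grad = 2 * X.T @ (pred - y) / len(y)
    # 4. update the parameters using the gradients
    w -= lr * grad

print(np.round(w, 2))   # close to the true weights [1., -2., 0.5]
```

With noiseless targets the loop recovers the true weights; real training replaces the hand-derived gradient with automatic differentiation.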
What convolutional neural networks can do:
Classification and retrieval (as in recommendation)
Detection (involves both classification and regression) and segmentation
Autonomous driving (GPU recommended)
Feature extraction
Pose recognition (localization of key points)
Recognition: characters, signs and icons, cancer cells
Image captioning (letting machines describe the world): CNN+LSTM
Style transfer
How is it implemented?
Convolutional neural networks consist of:
[INPUT-CONV-RELU-POOL-FC]
- Input layer
- Convolutional layer
- Activation function
- Pooling layer
- Fully connected layer
What the hell is convolution?
For now, think of a little helper called a filter that extracts features from the 32x32x3 input.
How does it extract them? First, let's build intuition for what convolution does, starting in two dimensions:
Divide the 32x32 input into regions, e.g. many 5x5 regions; each 5x5 region is distilled into one feature value.
The extracted feature values together form a matrix.
What is extracted is called a feature map.
Earlier we glossed over the filter's depth; here is the key point:
The filter has a depth of 3, which must match the input depth of 3.
Why are there 2 feature maps in the output?
Because two filters, Filter1 and Filter2, are used here; each extracts different features, and the two feature maps are independent of each other.
For example, like this:
With six filters F1, F2, F3, F4, F5, F6,
we obtain a 6-layer feature map; stacking the 6 feature maps together gives the output of the convolution.
Multilayer convolution:
Put bluntly, the feature map produced by one convolution is taken as the input of the next convolution, which extracts features from it again.
A 32x32x3 input convolved with six 5x5x3 filters yields a 28x28x6 feature map; that 28x28x6 feature map convolved with ten 5x5x6 filters then yields a 24x24x10 feature map.
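The shape bookkeeping above follows the usual output-size formula (N - F) / stride + 1 (no padding in this example); a quick check:

```python
def conv_output_size(n, f, stride=1, pad=0):
    # spatial output size of a convolution over an n x n input
    # with an f x f filter: (N + 2*pad - F) / stride + 1
    return (n + 2 * pad - f) // stride + 1

h1 = conv_output_size(32, 5)   # six 5x5x3 filters  -> 28x28x6
h2 = conv_output_size(h1, 5)   # ten 5x5x6 filters  -> 24x24x10
print(h1, h2)                  # 28 24
```

The depth of the output is always the number of filters, while the spatial size shrinks by the formula.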
What do we do with the features extracted by convolution?
input → convolution → features → convolution → features → convolution → features
This amounts to refining the features step by step; the final features are fed to the classification or regression task.
The concrete calculation of convolution (feature extraction):
W0 and the blue region of X take an inner product (multiply corresponding positions, then add):
F1 channel 1 = 0x1 + 0x1 + 0x1 + 0x(-1) + 1x(-1) + 1x0 + 0x(-1) + 1x1 + 1x0 = 0
F1 channel 2 = 0x(-1) + 0x(-1) + 0x1 + 0x(-1) + 0x1 + 1x0 + 0x(-1) + 2x1 + 2x0 = 2
F1 channel 3 = 0x1 + 0x0 + 0x(-1) + 0x0 + 2x0 + 2x0 + 0x1 + 0x(-1) + 0x(-1) = 0
So, following the neural network score function f(x, W) = Wx + b,
where b = 1 here,
the output score is channel1 + channel2 + channel3 + b = 0 + 2 + 0 + 1 = 3.
The rightmost green matrix thus has the value 3 in row 1, column 1.
Step 2: compute the feature value of row 1, column 2 the same way, obtaining -5.
Step 3: likewise compute row 1, column 3, obtaining -4.
The same filter then moves down 2 rows on the input and the calculation continues by the same procedure, producing row 2 of the feature matrix.
Moving down another 2 rows gives row 3 of the feature matrix.
Similarly, filter F2 (with weights W1) produces its own feature matrix, the green matrix below the first.
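The worked inner product above can be verified in NumPy. The window of X and the slices of W0 below are read off directly from the products listed in the text:

```python
import numpy as np

# The 3x3 window of each input channel and the matching slice of filter W0,
# taken from the per-channel products in the worked example above.
x_win = np.array([
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],   # channel 1 window of X
    [[0, 0, 0], [0, 0, 1], [0, 2, 2]],   # channel 2 window of X
    [[0, 0, 0], [0, 2, 2], [0, 0, 0]],   # channel 3 window of X
])
w0 = np.array([
    [[1, 1, 1], [-1, -1, 0], [-1, 1, 0]],    # W0, channel 1
    [[-1, -1, 1], [-1, 1, 0], [-1, 1, 0]],   # W0, channel 2
    [[1, 0, -1], [0, 0, 0], [1, -1, -1]],    # W0, channel 3
])
b = 1

per_channel = (x_win * w0).sum(axis=(1, 2))  # inner product per channel
score = per_channel.sum() + b                # 0 + 2 + 0 + 1
print(per_channel.tolist(), int(score))      # [0, 2, 0] 3
```

Sliding this computation across every window position fills in the whole feature matrix.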
Parameters of the convolution kernel:
Why move 2 cells at a time? Why slide this way at all?
The sliding step size is called the stride.
If the input is a 7x7 matrix, stride = 1 (sliding 1 cell at a time), and the filter is 3x3,
then the final matrix is 5x5.
If stride = 2 (sliding 2 cells at a time) with a 3x3 filter,
the final matrix is 3x3.
As you can see, the larger the stride, the smaller the resulting feature map.
We would like the stride to be small, since the feature matrix we get is then larger and carries more features;
however, efficiency and precision trade off against each other, so the stride can be neither too large nor too small.
Pad
When extracting features by convolution, there is a "+pad 1"; what is that?
Start from convolving a 7x7 input with a 3x3 kernel at stride = 2.
Among the windows ①, ②, ③: element ① is used 1 time, ② is used 2 times, and ③ is used 2 times.
That is, elements on the edge are used less often than elements in the middle,
so ② and ③ contribute more to the final result than ①. How do we make ① (an edge element) contribute more to the final result?
In fact, the original image is only 5x5, like the yellow region.
Pad means adding a ring of 0s around the edge of the original input image,
which increases how often the edge elements are used.
Why add 0s? Because this ring is not part of our original input; it only helps us make use of the edge points,
and these 0s contribute nothing to the network themselves.
Input = 7x7
Filter = 3x3
Pad = 1
Stride = 1
Output = 7x7
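The stride and pad examples all come from one formula, Output = (N + 2*pad - F) / stride + 1; a small sketch:

```python
def conv_output_size(n, f, stride=1, pad=0):
    # Output = (N + 2*pad - F) / stride + 1
    assert (n + 2 * pad - f) % stride == 0, "filter must tile the input evenly"
    return (n + 2 * pad - f) // stride + 1

print(conv_output_size(7, 3, stride=1))          # 5: 7x7 input, 3x3 filter
print(conv_output_size(7, 3, stride=2))          # 3: larger stride, smaller map
print(conv_output_size(7, 3, stride=1, pad=1))   # 7: pad = 1 preserves the size
```

The assert catches stride settings where the filter would fall off the edge of the input.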
Note from the example: all the filters in one layer must be the same size.
Parameter sharing:
Imagine extracting convolutional features from a 32x32x3 input,
with 5x5 filters, 10 filters in total, stride = 1, pad = 2. The feature matrix is then
a 32x32x10 matrix.
The input has 32x32x3 = 3,072 values,
and each filter has 5x5x3 = 75 weight parameters, so 750 weights across the 10 filters.
If every output position had its own weights, as in a full connection between the input and the convolutional layer, there would be about 760,000 weight parameters,
which is terrible for both computation and efficiency.
If instead the parameters are shared (equal) across the output layer,
then every cell of the 32x32 output uses the same 5x5x3 weights on its region of the 32x32x3 input, so only 5x5x3 x 10 = 750 weight parameters are required.
Adding the 10 bias parameters b, only 760 parameters are needed in total.
My understanding: the filter is unchanged throughout the extraction process, so the filter's weights W do not change; every point of the feature matrix is therefore extracted with the same weight parameters.
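The parameter counting above can be checked with a few lines (numbers from this example only):

```python
# Example above: 32x32x3 input, ten 5x5 filters, stride 1, pad 2.
f, depth, n_filters = 5, 3, 10
weights_per_filter = f * f * depth               # 75, shared over all positions
total_weights = weights_per_filter * n_filters   # 750
total_params = total_weights + n_filters         # + 10 biases -> 760

out = (32 + 2 * 2 - 5) // 1 + 1                  # spatial output size: 32
# without sharing, each of the 32x32x10 outputs would own 75 weights:
no_sharing = out * out * n_filters * weights_per_filter
print(total_params, no_sharing)                  # 760 768000
```

Parameter sharing cuts roughly 768,000 weights down to 760, which is the whole point of the convolutional layer.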
To summarize:
What does the convolutional layer mainly do? Put plainly, it extracts features: the filters' weight parameters extract the features of the input, producing the feature matrix.
005 - Convolutional Neural Networks, 01 - The Convolutional Layer