Deep Convolutional Neural Network Learning Notes (I)


1. The essence of the convolution operation

The input volume is made up of many slices along the depth direction. Each slice contains many neurons, and the weights of those neurons take the form of a convolution kernel: a square filter (such as 3x3). Each neuron corresponds to a local region of the image and extracts the features of that region. If the neurons within a slice share parameters, then effectively a single kernel is applied to every local region (much like image filtering).

A local region can be called a block. If every block is stretched into a column vector (since a neuron is defined as the inner product of an input vector and a weight vector, Y = W0X0 + W1X1 + ... + WNXN), these column vectors together form a data matrix of local-region data. Stretching each neuron's weights into a row vector likewise yields a parameter matrix (with parameter sharing, the number of rows equals the number of slices, i.e., the number of filters). The dot product of the parameter matrix with the data matrix then gives the convolution result: every filter taking the inner product with every local region. This result only needs to be reshaped to the desired output size. This process also explains why a neuron's parameters can be stored in a filter, and why the layer is called a convolution layer.
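The matrix view described above is commonly called im2col. Below is a minimal NumPy sketch of the idea (the function name and shapes are illustrative, not from the original text): each local block becomes a column, each filter becomes a row, and the convolution collapses into one matrix product.

```python
import numpy as np

def im2col(x, f, stride):
    """Unroll every f-by-f block of a (C, H, W) volume into a column.

    Returns a matrix of shape (C*f*f, n_blocks); each column is one
    local region, so the whole convolution becomes a single dot product.
    """
    C, H, W = x.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    cols = np.empty((C * f * f, out_h * out_w))
    idx = 0
    for i in range(0, H - f + 1, stride):
        for j in range(0, W - f + 1, stride):
            cols[:, idx] = x[:, i:i + f, j:j + f].ravel()
            idx += 1
    return cols

# Convolution as a matrix product: one row of weights per filter.
x = np.random.randn(3, 8, 8)           # input volume with C = 3 slices
w = np.random.randn(4, 3 * 3 * 3)      # 4 filters, each stretched into a row
out = w @ im2col(x, f=3, stride=1)     # shape (4, 36)
out = out.reshape(4, 6, 6)             # reshape to the desired output size
```

Reshaping at the end recovers the 4 output slices of size 6x6, matching the (W - F + 2P)/S + 1 formula in the next section.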

2. Output image size after convolution

Assume the input image size is W, the filter size is F, the stride is S (how far the filter moves each step), and the padding is P (used to pad the borders of the input image, generally with zeros). The image size after the convolution layer is then (W - F + 2P)/S + 1.
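The formula can be checked with a few lines of Python (the helper name is illustrative):

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial size after a conv layer: (W - F + 2P) / S + 1.

    Raises if the filter does not tile the padded input evenly.
    """
    num = W - F + 2 * P
    if num % S != 0:
        raise ValueError("filter does not fit evenly; adjust padding or stride")
    return num // S + 1

# A 7x7 input with a 3x3 filter, stride 1, no padding -> 5x5 output.
print(conv_output_size(7, 3))             # 5
# With "same" padding P = (F - 1)/2, the spatial size is preserved.
print(conv_output_size(7, 3, S=1, P=1))   # 7
```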

3. Pooling

The pooling layer reduces the parameter count and computational cost, and helps prevent overfitting, by shrinking the spatial size of the intermediate feature maps (downsampling; the depth of the volume is unchanged). Pooling acts on each depth slice independently, typically using the max operation (the maximum value in a local region represents that region), i.e., max pooling. The spatial extent of the pooling window should not be too large, or too much structural information is lost; the usual settings are F = 3, S = 2 or F = 2, S = 2. Some architectures avoid pooling altogether and instead increase the stride in the convolution layers to reduce the image size.
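A minimal NumPy sketch of max pooling as described above, applied independently to each depth slice (the function name is illustrative):

```python
import numpy as np

def max_pool(x, f=2, stride=2):
    """Max pooling over f-by-f windows, one depth slice at a time."""
    C, H, W = x.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    out = np.empty((C, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + f, j * stride:j * stride + f]
            out[:, i, j] = patch.max(axis=(1, 2))   # one max per slice
    return out

x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(max_pool(x))   # the max of each 2x2 block: 5, 7, 13, 15
```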

4. Fully connected

A neuron acts on the entire slice; that is, the filter size is exactly the size of the slice, so each filter outputs a single value. With N filters, the output is a vector of length N. The output of the final fully connected layer is generally the class-score vector (class scores).
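In code, this reduces to flattening the input volume and taking one inner product per filter; a short NumPy sketch (shapes are illustrative):

```python
import numpy as np

# A fully connected layer is a convolution whose filter spans the whole
# input volume: flatten the volume, then take one inner product per filter.
x = np.random.randn(64, 7, 7)          # final conv feature map
W = np.random.randn(10, 64 * 7 * 7)    # 10 "filters", each the size of the input
scores = W @ x.ravel()                 # class-score vector of length 10
print(scores.shape)                    # (10,)
```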

5. Structure of the network

The general structure of the network is:

INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC

INPUT: input layer

CONV: convolution layer

RELU: activation function

POOL: pooling layer; the question mark indicates the layer is optional

FC: fully connected layer

N >= 0 (typically N <= 3), M >= 0, K >= 0 (usually K < 3)
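The pattern above can be expanded mechanically; a small illustrative helper (not from the original text) makes the repetition counts concrete:

```python
def cnn_pattern(n, m, k, use_pool=True):
    """Expand INPUT -> [[CONV -> RELU]*n -> POOL?]*m -> [FC -> RELU]*k -> FC."""
    layers = ["INPUT"]
    for _ in range(m):
        layers += ["CONV", "RELU"] * n   # n conv/activation pairs per group
        if use_pool:
            layers.append("POOL")        # the optional pooling layer
    layers += ["FC", "RELU"] * k
    layers.append("FC")                  # final layer producing class scores
    return layers

print(cnn_pattern(n=2, m=1, k=1))
# ['INPUT', 'CONV', 'RELU', 'CONV', 'RELU', 'POOL', 'FC', 'RELU', 'FC']
```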

6. Other

(1) As far as possible, use several stacked convolution layers with small filters instead of a single layer with a large filter.

A stack of small-filter convolution layers covers the same observable region (receptive field) as one large-filter layer, but the stack can express more abstract features and extracts better representations.

In addition, the stack introduces fewer parameters. For example, use three 3x3 convolution layers instead of one 7x7 convolution layer. Assuming the input volume has C channels (and each layer preserves C channels), the stack uses 3 x (C x (3 x 3 x C)) = 27C^2 parameters, while the single 7x7 layer uses C x (7 x 7 x C) = 49C^2, so the stack clearly introduces fewer parameters.
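The two parameter counts can be verified directly (the helper names are illustrative):

```python
def stacked_3x3_params(C, layers=3):
    """Each 3x3 layer keeps C channels: C filters of size 3*3*C per layer."""
    return layers * C * (3 * 3 * C)

def single_7x7_params(C):
    """One layer of C filters, each of size 7*7*C."""
    return C * (7 * 7 * C)

C = 64
print(stacked_3x3_params(C))   # 27 * C^2 = 110592
print(single_7x7_params(C))    # 49 * C^2 = 200704
```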

(2) Why use padding?

The advantage of padding is that the image size stays the same before and after the convolution, and information at the boundary is preserved. The usual padding size is P = (F - 1)/2, where F is the filter size. Without padding, the image shrinks after every convolution layer and boundary information is quickly lost.

(3) Why is the stride generally set to 1?

A stride of 1 performs better in practice; all of the downsampling work is handed to the pooling layers.

(4) The input layer size should generally be divisible by 2 many times, such as 32 (CIFAR-10), 64, 96 (STL-10), 224 (common ImageNet ConvNets), 384 and 512.

(5) Try to use smaller filters (3x3, or up to 5x5); if a larger filter (such as 7x7) is used, it is usually only in the first convolution layer.

(6) Sometimes, because of parameter and memory constraints, the first convolution layer uses a larger filter (7x7) with stride 2 (see ZF Net), or an 11x11 filter with stride 4 (see AlexNet).

