Deep Convolutional Neural Network Learning Notes (I)


1. The essence of the convolution operation

The input volume is made up of many slices along the depth direction. Each slice contains many neurons, and the weights of those neurons take the form of a convolution kernel: a square filter (such as 3x3). Each neuron corresponds to a local region of the image and extracts the features of that region. If the neurons within a slice share parameters, then effectively a single kernel is applied to every local region (much like image filtering).

A local region can be called a block. If every block is stretched into a column vector (since a neuron is defined as the inner product of an input vector and a weight vector, Y = W0X0 + W1X1 + ... + WNXN), these column vectors together form a data matrix of local-region data. Stretching each neuron's weights into a row vector likewise yields a parameter matrix (with parameter sharing, the number of rows equals the number of slices, i.e., the number of filters). The dot product of the parameter matrix with the data matrix then gives the convolution result: every filter taking the inner product with every local region. This result only needs to be reshaped to the desired output size. This process also explains why a neuron's parameters can be stored in a filter, and why the layer is called a convolution layer.
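The matrix view described above is commonly called im2col. Below is a minimal NumPy sketch of the idea (the function name and shapes are illustrative, not from the original text): each local block becomes a column, each filter becomes a row, and the convolution collapses into one matrix product.

```python
import numpy as np

def im2col(x, f, stride):
    """Unroll every f-by-f block of a (C, H, W) volume into a column.

    Returns a matrix of shape (C*f*f, n_blocks); each column is one
    local region, so the whole convolution becomes a single dot product.
    """
    C, H, W = x.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    cols = np.empty((C * f * f, out_h * out_w))
    idx = 0
    for i in range(0, H - f + 1, stride):
        for j in range(0, W - f + 1, stride):
            cols[:, idx] = x[:, i:i + f, j:j + f].ravel()
            idx += 1
    return cols

# Convolution as a matrix product: one row of weights per filter.
x = np.random.randn(3, 8, 8)           # input volume with C = 3 slices
w = np.random.randn(4, 3 * 3 * 3)      # 4 filters, each stretched into a row
out = w @ im2col(x, f=3, stride=1)     # shape (4, 36)
out = out.reshape(4, 6, 6)             # reshape to the desired output size
```

Reshaping at the end recovers the 4 output slices of size 6x6, matching the (W - F + 2P)/S + 1 formula in the next section.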

2. Output image size after convolution

Assume the input image size is W, the filter size is F, the stride is S (how far the filter moves each step), and the padding is P (used to pad the borders of the input image, generally with zeros). The image size after the convolution layer is then (W - F + 2P)/S + 1.
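The formula can be checked with a few lines of Python (the helper name is illustrative):

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial size after a conv layer: (W - F + 2P) / S + 1.

    Raises if the filter does not tile the padded input evenly.
    """
    num = W - F + 2 * P
    if num % S != 0:
        raise ValueError("filter does not fit evenly; adjust padding or stride")
    return num // S + 1

# A 7x7 input with a 3x3 filter, stride 1, no padding -> 5x5 output.
print(conv_output_size(7, 3))             # 5
# With "same" padding P = (F - 1)/2, the spatial size is preserved.
print(conv_output_size(7, 3, S=1, P=1))   # 7
```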

3. Pooling

The pooling layer reduces the parameter count and computational cost, and helps prevent overfitting, by shrinking the spatial size of the intermediate feature maps (downsampling; the depth of the volume is unchanged). Pooling acts on each depth slice independently, typically using the max operation (the maximum value in a local region represents that region), i.e., max pooling. The spatial extent of the pooling window should not be too large, or too much structural information is lost; the usual settings are F = 3, S = 2 or F = 2, S = 2. Some architectures avoid pooling altogether and instead increase the stride in the convolution layers to reduce the image size.
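A minimal NumPy sketch of max pooling as described above, applied independently to each depth slice (the function name is illustrative):

```python
import numpy as np

def max_pool(x, f=2, stride=2):
    """Max pooling over f-by-f windows, one depth slice at a time."""
    C, H, W = x.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    out = np.empty((C, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + f, j * stride:j * stride + f]
            out[:, i, j] = patch.max(axis=(1, 2))   # one max per slice
    return out

x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(max_pool(x))   # the max of each 2x2 block: 5, 7, 13, 15
```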

4. Fully connected

A neuron acts on the entire slice; that is, the filter size is exactly the size of the slice, so each filter outputs a single value. With N filters, the output is a vector of length N. The output of the final fully connected layer is generally the class-score vector (class scores).
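In code, this reduces to flattening the input volume and taking one inner product per filter; a short NumPy sketch (shapes are illustrative):

```python
import numpy as np

# A fully connected layer is a convolution whose filter spans the whole
# input volume: flatten the volume, then take one inner product per filter.
x = np.random.randn(64, 7, 7)          # final conv feature map
W = np.random.randn(10, 64 * 7 * 7)    # 10 "filters", each the size of the input
scores = W @ x.ravel()                 # class-score vector of length 10
print(scores.shape)                    # (10,)
```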

5. Structure of the network

The general structure of the network is:

INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC

INPUT: input layer

CONV: convolution layer

RELU: activation function

POOL: pooling layer; the question mark indicates the layer is optional

FC: fully connected layer

N >= 0 (typically N <= 3), M >= 0, K >= 0 (usually K < 3)
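The pattern above can be expanded mechanically; a small illustrative helper (not from the original text) makes the repetition counts concrete:

```python
def cnn_pattern(n, m, k, use_pool=True):
    """Expand INPUT -> [[CONV -> RELU]*n -> POOL?]*m -> [FC -> RELU]*k -> FC."""
    layers = ["INPUT"]
    for _ in range(m):
        layers += ["CONV", "RELU"] * n   # n conv/activation pairs per group
        if use_pool:
            layers.append("POOL")        # the optional pooling layer
    layers += ["FC", "RELU"] * k
    layers.append("FC")                  # final layer producing class scores
    return layers

print(cnn_pattern(n=2, m=1, k=1))
# ['INPUT', 'CONV', 'RELU', 'CONV', 'RELU', 'POOL', 'FC', 'RELU', 'FC']
```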

6. Other

(1) As far as possible, use several stacked convolution layers with small filters instead of a single layer with a large filter.

A stack of small-filter convolution layers covers the same observable region (receptive field) as one large-filter layer, but the stack can express more abstract features and extracts better representations.

In addition, the stack introduces fewer parameters. For example, use three 3x3 convolution layers instead of one 7x7 convolution layer. Assuming the input volume has C channels (and each layer preserves C channels), the stack uses 3 x (C x (3 x 3 x C)) = 27C^2 parameters, while the single 7x7 layer uses C x (7 x 7 x C) = 49C^2, so the stack clearly introduces fewer parameters.
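The two parameter counts can be verified directly (the helper names are illustrative):

```python
def stacked_3x3_params(C, layers=3):
    """Each 3x3 layer keeps C channels: C filters of size 3*3*C per layer."""
    return layers * C * (3 * 3 * C)

def single_7x7_params(C):
    """One layer of C filters, each of size 7*7*C."""
    return C * (7 * 7 * C)

C = 64
print(stacked_3x3_params(C))   # 27 * C^2 = 110592
print(single_7x7_params(C))    # 49 * C^2 = 200704
```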

(2) Why use padding?

The advantage of padding is that the image size stays the same before and after the convolution, and information at the boundary is preserved. The usual padding size is P = (F - 1)/2, where F is the filter size. Without padding, the image shrinks after every convolution layer and boundary information is quickly lost.

(3) Why is the stride generally set to 1?

A stride of 1 performs better in practice; all of the downsampling work is handed to the pooling layers.

(4) The input layer size should generally be divisible by 2 many times, such as 32 (CIFAR-10), 64, 96 (STL-10), 224 (common ImageNet ConvNets), 384 and 512.

(5) Try to use smaller filters (3x3, or up to 5x5); if a larger filter (such as 7x7) is used, it is usually only in the first convolution layer.

(6) Sometimes, because of parameter and memory constraints, the first convolution layer uses a larger filter (7x7) with stride 2 (see ZF Net), or an 11x11 filter with stride 4 (see AlexNet).

