[OpenCV] Convolutional Neural Network


REF: Convolutional neural networks (CNNs), starting from LeNet-5

A summary of the topics covered in this article:

1. Fundamentals

An MLP (multilayer perceptron) is a feedforward neural network in which every pair of adjacent layers is fully connected.

The activation function is typically sigmoid-shaped: either the tanh function or the logistic function.
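As a concrete illustration, here is a minimal NumPy sketch of a small MLP's forward pass, with tanh hidden units and a logistic output (the layer sizes are made up for the example):

    import numpy as np

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Made-up sizes: 4 inputs -> 8 hidden units -> 1 output.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
    W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

    x = rng.normal(size=4)            # one input vector
    h = np.tanh(W1 @ x + b1)          # hidden layer: fully connected + tanh
    y = logistic(W2 @ h + b2)         # output layer: fully connected + logistic
    print(y)                          # a value in (0, 1)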

In 1998, Yann LeCun introduced LeNet-5 in the paper "Gradient-Based Learning Applied to Document Recognition" and achieved very good results on handwritten character recognition. The structure of LeNet-5 is as follows (a code sketch follows the list):

    • Input: the input image, 32*32 pixels;
    • C1: 5*5 convolution kernels generate 6 feature maps; (6*5*5 + 6) = 156 parameters in total;
    • S2: each 2*2 block of pixels is summed, multiplied by a coefficient, and given a bias; 2*6 = 12 parameters in total;
    • C3: 5*5 convolution kernels generate 16 feature maps, each obtained by convolving several of the S2 feature maps, as shown in Table 1 of the paper;
    • S4: the same operation as S2; 16*2 = 32 parameters in total;
    • C5: fully connected with S4; (5*5*16*120 + 120) = 48,120 parameters in total;
    • F6: fully connected with C5; (120*84 + 84) = 10,164 parameters in total;
    • Output: fully connected with F6.
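As a rough modern approximation, here is a PyTorch sketch of this structure (PyTorch is my assumption, not the article's; note that it replaces the paper's trainable subsampling in S2/S4 with plain average pooling, and connects C3 fully rather than per Table 1):

    import torch
    import torch.nn as nn

    class LeNet5(nn.Module):
        """Approximation of LeNet-5: S2/S4 become plain average pooling and
        C3 is fully connected across input maps, unlike the original paper."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),     # C1: 32x32 -> 6 x 28x28
                nn.Tanh(),
                nn.AvgPool2d(2),                    # S2: -> 6 x 14x14
                nn.Conv2d(6, 16, kernel_size=5),    # C3: -> 16 x 10x10
                nn.Tanh(),
                nn.AvgPool2d(2),                    # S4: -> 16 x 5x5
                nn.Conv2d(16, 120, kernel_size=5),  # C5: -> 120 x 1x1
            )
            self.classifier = nn.Sequential(
                nn.Tanh(),
                nn.Flatten(),
                nn.Linear(120, 84),                 # F6
                nn.Tanh(),
                nn.Linear(84, num_classes),         # Output
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    x = torch.randn(1, 1, 32, 32)      # one 32x32 grayscale image
    print(LeNet5()(x).shape)           # torch.Size([1, 10])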

A detailed analysis of each layer begins below.

Convolution:

The C1/C3/C5 layers all use the convolution operation. Anyone who has studied digital image processing will be familiar with image convolution: in essence, a weight template slides over the image and computes a weighted sum over each area it covers.

In the original figure, a yellow 3*3 convolution kernel moves from the upper-left corner of a 5*5 image, stepping right or down, computing a weighted sum over the area it covers at each position. The final result is of size (5-3+1) * (5-3+1) and is called a feature map. A sketch follows.
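A minimal NumPy sketch of this sliding weighted sum (following CNN convention, the kernel is not flipped, so strictly speaking this is cross-correlation):

    import numpy as np

    def conv2d_valid(image, kernel):
        """'Valid' convolution: slide the kernel over the image and take a
        weighted sum at each position (no padding, stride 1)."""
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 input image
    kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging template
    print(conv2d_valid(image, kernel).shape)           # (3, 3) = (5-3+1, 5-3+1)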

C1: The C1 layer of LeNet-5 convolves the 32*32 input image with six 5*5 convolution kernels; each kernel produces a (32-5+1) * (32-5+1) feature map, giving 6 feature maps in total.

C3: The C3 layer is a little more complicated. C3 has 16 feature maps in total, and the inputs of each are selected according to Table 1. For example, C3's feature map 0 is generated from feature maps 0, 1, and 2 of S2: three convolution kernels are convolved with S2's maps 0, 1, and 2 to produce three temporary feature maps, and these three are added together to give C3's map 0 (see the sketch after this paragraph). Constructing C3 this way has two advantages: it reduces the number of parameters compared with a full connection, and since each feature map receives different inputs, the maps can complement one another.
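A sketch of how C3's map 0 could be computed (the values are made up and conv2d_valid is the illustrative helper from above; the connection pattern is from Table 1 of the paper):

    import numpy as np

    def conv2d_valid(image, kernel):
        kh, kw = kernel.shape
        return np.array([[np.sum(image[i:i+kh, j:j+kw] * kernel)
                          for j in range(image.shape[1] - kw + 1)]
                         for i in range(image.shape[0] - kh + 1)])

    rng = np.random.default_rng(0)
    s2 = rng.normal(size=(6, 14, 14))      # hypothetical S2 output: 6 maps, 14x14

    # C3 map 0 is fed by S2 maps 0, 1, 2 (per Table 1), one 5x5 kernel each.
    kernels = rng.normal(size=(3, 5, 5))
    bias = 0.1
    c3_map0 = sum(conv2d_valid(s2[m], kernels[k])
                  for k, m in enumerate([0, 1, 2])) + bias
    print(c3_map0.shape)                   # (10, 10) = (14-5+1, 14-5+1)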

C5: The C5 layer is fully connected: each C5 feature map is the sum of the convolution results over all of S4's feature maps. Since S4's feature maps are 5*5 and the kernels are also 5*5, each convolution result is a 1*1 matrix.

In addition, every feature map computed in C1/C3/C5 has a bias added at the end of the computation.

Pooling:

Pooling serves two main purposes: it reduces the number of parameters, and it gives the model better translation invariance.

Pooling is similar to convolution, except that the areas covered by a convolution kernel overlap while pooled areas do not. So when S2/S4 use a 2*2 pooling template, the width and height of the feature map are each reduced to half their original size. A sketch follows.
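A minimal NumPy sketch of non-overlapping 2*2 pooling (both variants discussed later in this article are shown):

    import numpy as np

    def pool2x2(fmap, mode="max"):
        """Non-overlapping 2x2 pooling: halves both width and height."""
        H, W = fmap.shape
        blocks = fmap[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2)
        return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

    fmap = np.arange(16, dtype=float).reshape(4, 4)
    print(pool2x2(fmap, "max").shape)       # (2, 2): half the original size
    print(pool2x2(fmap, "avg"))             # averages over each 2x2 block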

The UFLDL tutorial [3] has a helpful illustration of pooling.

Output layer:

The output layer and F6 are each fully connected to the layer before them, so the overall C5-F6-output structure can be regarded as a multilayer perceptron.

So LeNet-5 is actually built from three different structures: convolution, pooling, and a multilayer perceptron. Combinations of these three structures also make up the majority of convolutional neural networks.

Convolution:

At present, almost all publicly released convolutional models use the fully connected structure: each feature map of layer m is the sum of convolutions over all feature maps of the previous layer (layer m-1). In actual use, however, attention must be paid to the model's parameter count, since growth in the number of parameters greatly increases the amount of computation. A worked count follows.
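A quick worked count with assumed sizes: if layer m-1 has N feature maps, layer m has M feature maps, and the kernels are k*k, the fully connected structure needs M * (N*k*k + 1) parameters:

    # Parameter count of a fully connected convolutional layer:
    # each of the M output maps needs one k x k kernel per input map, plus a bias.
    N, M, k = 16, 32, 5                  # example sizes (made up)
    params = M * (N * k * k + 1)
    print(params)                        # 12832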

Pooling:

At present, average pooling or max pooling is used: the average or the maximum of the values within each pooled area of the previous layer's feature map.

Output layer:

The results of the last layer can be passed to a classifier (such as logistic regression) for classification.
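For instance, a softmax (multinomial logistic regression) classifier over the last layer's features might look like this (a NumPy sketch with made-up sizes):

    import numpy as np

    def softmax(z):
        z = z - z.max()                  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    rng = np.random.default_rng(0)
    features = rng.normal(size=84)       # e.g. the 84 activations of F6
    W = rng.normal(size=(10, 84))        # 10 classes
    b = np.zeros(10)

    probs = softmax(W @ features + b)    # class probabilities
    print(probs.argmax(), probs.sum())   # predicted class, 1.0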

References:

[1] Yann LeCun et al., "Gradient-Based Learning Applied to Document Recognition," 1998.

[2] Theano Deep Learning Tutorial.

[3] Stanford UFLDL tutorial: http://deeplearning.stanford.edu/wiki/index.php/ufldl%e6%95%99%e7%a8%8b

2. Miscellaneous Questions

(1) Are the convolution kernels learned or predefined?

Training the whole network is mainly a matter of learning the convolution kernels.

(2) What are the parameters of a convolutional layer?

If a convolutional layer outputs 4 feature maps, then it has 4 convolution kernels.
The user specifies the kernel size, i.e. kernel_width and kernel_height, and num_output, the number of output feature maps. One more number is actually tied to the kernels: the number of input channels of the layer.
In general, four numbers determine the convolution kernels of a layer: num_output, num_channel, kernel_height, kernel_width.
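In Caffe-style terms, the weights of such a layer form a 4D tensor, and those four numbers fix the parameter count (a sketch with assumed sizes):

    import numpy as np

    num_output, num_channel = 4, 3       # 4 output maps, 3 input channels
    kernel_height, kernel_width = 5, 5

    # One 3D kernel per output feature map: shape (num_channel, kH, kW).
    weights = np.zeros((num_output, num_channel, kernel_height, kernel_width))
    biases = np.zeros(num_output)
    print(weights.size + biases.size)    # 4*3*5*5 + 4 = 304 parameters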


(3) What is a channel? Why does an RGB image yield a single feature map per convolution kernel?

The principle is that the "2D" convolution is actually 3D (the kernel's dimensions are kernel_height * kernel_width * input_channel). Because the third dimension is exactly equal to the number of input channels, the output collapses along that dimension, leaving a flat two-dimensional feature map; that is why it is still called a 2D convolution.

Another way to understand it: a convolution kernel has shape kernel_height * kernel_width and consists of input_channel layers. Convolving it with the input image works as follows: do a 2D convolution of the kernel's first layer with the input's first channel, then of the second layer with the second channel, ..., and finally of the last layer with the last channel. This yields input_channel intermediate feature maps; adding them up element-wise gives a single feature map, which is the result of the convolution (see the sketch below).
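A NumPy sketch of this per-channel convolve-then-add process (sizes made up; conv2d_valid is the illustrative helper used earlier):

    import numpy as np

    def conv2d_valid(image, kernel):
        kh, kw = kernel.shape
        return np.array([[np.sum(image[i:i+kh, j:j+kw] * kernel)
                          for j in range(image.shape[1] - kw + 1)]
                         for i in range(image.shape[0] - kh + 1)])

    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(3, 8, 8))     # 3-channel (RGB) input, 8x8
    kernel = rng.normal(size=(3, 5, 5))  # one kernel: input_channel layers of 5x5

    # Convolve each kernel layer with the matching channel, then add up:
    fmap = sum(conv2d_valid(rgb[c], kernel[c]) for c in range(3))
    print(fmap.shape)                    # (4, 4): a single 2D feature map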

Parameter sharing in convolutional layers:

The kernel's parameters are shared: the same weights are applied at every position of the input, which is why a convolutional layer needs far fewer parameters than a fully connected one.
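The saving is easy to see with assumed numbers: fully connecting a 32*32 input to a 28*28 output needs one weight per input-output pair, while a shared 5*5 kernel needs only 25 weights plus a bias:

    # Parameter sharing: one 5x5 kernel is reused at every image position.
    in_pixels = 32 * 32
    out_pixels = 28 * 28                 # (32-5+1) * (32-5+1)

    fully_connected = in_pixels * out_pixels + out_pixels   # 803,600 parameters
    shared_kernel = 5 * 5 + 1                               # 26 parameters
    print(fully_connected, shared_kernel)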
