A simple implementation of the convolutional neural network algorithm


Objective

From first understanding convolutional neural networks to implementing one took me about a month, and there are still places I do not understand thoroughly. CNN is genuinely difficult: one or two blog posts or papers are not enough to understand it, and I mainly had to study it on my own; see the recommended reading list in the references at the end. The current implementation works well on the MNIST dataset, but there are still some bugs. Since I have been busy lately, I am writing this summary first and will keep optimizing it later.

The convolutional neural network (CNN) is an important deep learning algorithm that has shown remarkable results in many applications. [1] compares several algorithms on document character recognition and concludes that CNN outperforms all the others; CNN has achieved the best results in handwriting recognition. [2] applies CNN to face-based gender recognition, also with very good results. Some time ago I used a BP neural network to recognize digits in mobile phone photos; the accuracy was decent, close to 98%, but its performance on Chinese character recognition was poor, so I wanted to try a convolutional neural network instead.

1. CNN's overall network structure

A convolutional neural network is an improvement on the BP neural network, and it is similar to BP in that it computes output values by forward propagation and adjusts weights and biases by backward propagation. The biggest difference between a CNN and a standard BP network is that the neural units in adjacent layers of a CNN are not fully connected but partially connected: the receptive field of a neuron covers only part of the units in the layer above, rather than being connected to all of them as in BP. CNN rests on three important structural ideas:

  • Local receptive fields

  • Weight sharing

  • Spatial or temporal subsampling

Local receptive fields allow the network to discover local features of the data, such as a corner of a picture or a segment of an arc; such local features are the basis of animal vision [3]. In BP, by contrast, all pixels are treated as an unordered jumble of points, and the relationships between them are never exploited.

Each layer of a CNN consists of multiple maps, and each map consists of many neural units. All the neural units of the same map share one convolution kernel (i.e., one set of weights). A convolution kernel often represents a feature; for example, if a kernel represents a segment of an arc, then sliding that kernel over the whole picture will produce large convolution values in regions that are likely to contain an arc. Note that convolution kernels are really just weights: we do not compute a convolution in the signal-processing sense, but rather match a fixed-size weight matrix against the image. Because this operation resembles convolution, the network is called a convolutional neural network. In fact, BP can be regarded as a special convolutional neural network whose kernel is the full weight matrix of a layer, i.e., whose receptive field is the entire image. The weight-sharing strategy reduces the number of parameters that need to be trained, which gives the trained model stronger generalization ability.
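To make this concrete, here is a minimal sketch in Python/NumPy (the article itself uses Matlab's DeepLearnToolbox; this standalone example is mine) of sliding one shared kernel over a map. For simplicity it slides the kernel without flipping it, which is how most CNN write-ups present "convolution":

```python
import numpy as np

def conv2d_valid(x, k):
    """Slide one shared kernel k over map x ('valid' region only).

    Every output unit reuses the same weights in k -- the weight
    sharing described above -- and each unit sees only a local
    patch of x -- the local receptive field.
    """
    n, m = x.shape
    kh, kw = k.shape
    out = np.zeros((n - kh + 1, m - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# A 4*4 map and a 2*2 kernel give a (4-2+1)*(4-2+1) = 3*3 feature map,
# the same toy sizes used in section 3.2 below.
x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(conv2d_valid(x, k).shape)  # (3, 3)
```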

The purpose of subsampling is mainly to blur the exact location of a feature: once a feature has been found, its precise position is unimportant; we only need its position relative to other features. Take the digit "8": once we find an "o" in the upper part, we do not need to know exactly where it sits in the image; knowing there is another "o" below it is enough to recognize an "8", because whether the "8" appears on the left or the right of the picture does not affect our understanding of it. This strategy of blurring exact locations lets the network recognize images that are shifted or distorted.

These three features give a CNN strong robustness to distortions of the input data in space (mainly for image data) and in time (mainly for time-series data; see TDNN). A CNN generally alternates convolution layers and sampling layers: one convolution layer is followed by a sampling layer, the sampling layer by another convolution layer, and so on. The convolution layers extract features, which are then combined into more abstract features and finally into a description of the image object. A CNN may also end with fully connected layers, which behave exactly like BP. The following is an example of a convolutional neural network:

Figure 1 (image source)

This is the basic idea of a convolutional neural network, but there are multiple concrete implementations. I refer to the Matlab deep learning toolbox DeepLearnToolbox. The biggest difference between the CNN implemented there and others is that its sampling layers have no weights and biases; they simply downsample the convolution layer. The test dataset of this toolbox is MNIST, where each image is 28*28, and it implements the following CNN:

Figure 2
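The structure in Figure 2 can be written down as a layer list, in the spirit of DeepLearnToolbox's Matlab configuration; this Python rendering and its dictionary keys are my own sketch, not the toolbox's exact code:

```python
# 28*28 input -> 6 maps C1 (5*5 kernels) -> 2*2 mean pooling S2
# -> 12 maps C3 (5*5 kernels) -> 2*2 mean pooling S4 -> output layer
layers = [
    {"type": "input"},
    {"type": "conv", "outputmaps": 6,  "kernelsize": 5},
    {"type": "pool", "scale": 2},
    {"type": "conv", "outputmaps": 12, "kernelsize": 5},
    {"type": "pool", "scale": 2},
]
```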

2. Network initialization

Initializing the CNN mainly means initializing the convolution kernels (weights) and biases of the convolution layers and of the output layer. DeepLearnToolbox initializes the convolution kernels and output weights randomly, and initializes all biases to 0.
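A minimal sketch of that initialization in Python/NumPy; the uniform range below is illustrative (the toolbox scales its random kernels by the layer's fan-in and fan-out), and the function name is mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_conv_layer(in_maps, out_maps, k):
    """Random kernels, zero biases.

    ASSUMPTION: kernels drawn uniformly from [-0.5, 0.5]; the
    toolbox's exact scaling differs, so treat this as a sketch.
    """
    kernels = rng.uniform(-0.5, 0.5, size=(in_maps, out_maps, k, k))
    biases = np.zeros(out_maps)
    return kernels, biases

k1, b1 = init_conv_layer(1, 6, 5)   # C1: 1 input map -> 6 maps, 5*5 kernels
k3, b3 = init_conv_layer(6, 12, 5)  # C3: 6 maps -> 12 maps (6*12 kernels)
```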

3. Forward propagation

In the forward pass, the input layer, the convolution layers, the sampling layers and the output layer are each computed differently.

3.1 Input layer: the input layer has no input value and only one output vector, whose size is the size of the picture, i.e., a 28*28 matrix;

3.2 Convolution layer: the input of a convolution layer comes either from the input layer or from a sampling layer, as shown in the red section of the figure above. Every map of a convolution layer has kernels of the same size; in the toolbox they are 5*5 kernels. For simplicity, the example below uses a 2*2 kernel and a 4*4 feature map in the layer above: sliding this kernel over the picture one step at a time yields a (4-2+1)*(4-2+1) = 3*3 feature map. In the toolbox implementation, each map of a convolution layer is connected to all the maps of the previous layer, as between S2 and C3 in the figure above, so C3 has 6*12 convolution kernels in total. Each feature map of a convolution layer is obtained by convolving one kernel per map of the previous layer, summing the results element-wise, adding the corresponding bias, and applying the sigmoid function. Note also that the number of maps in a convolution layer is specified at network initialization, while the size of its maps is determined by the kernel size and the size of the previous layer's maps: if the previous layer's maps are n*n and the kernel is k*k, the maps of this layer are (n-k+1)*(n-k+1), e.g., the 24*24 maps above, since 24 = 28-5+1. Stanford's deep learning tutorial describes the computation of convolutional feature extraction in detail.

Figure 3
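A minimal sketch of that per-map computation in Python/NumPy (scipy's 'valid' convolution stands in for Matlab's conv2; the function is my own illustration of the rule above, not the toolbox's code):

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer_forward(in_maps, kernels, biases):
    """Forward pass of one convolution layer.

    in_maps: list of n*n arrays, the maps of the previous layer
    kernels: kernels[i][j] is the k*k kernel from input map i to
             output map j (so 6*12 kernels between S2 and C3)
    biases:  one bias per output map
    """
    n = in_maps[0].shape[0]
    out_maps = []
    for j in range(len(biases)):
        k = kernels[0][j].shape[0]
        z = np.zeros((n - k + 1, n - k + 1))  # (n-k+1)*(n-k+1) map
        for i, a in enumerate(in_maps):
            # convolve each previous-layer map with its own kernel, then sum
            z += convolve2d(a, kernels[i][j], mode="valid")
        out_maps.append(sigmoid(z + biases[j]))  # add bias, apply sigmoid
    return out_maps
```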

3.3 Sampling layer (subsampling, pooling): the sampling layer downsamples the maps of the previous layer by aggregating statistics over adjacent small regions of size scale*scale. Some implementations take the maximum of each small region; the toolbox implementation takes the mean of each 2*2 region. Note that the windows of a convolution layer overlap, while the pooling windows do not. Inside the toolbox the pooling is also computed with a convolution (conv2(a, k, 'valid')) whose kernel is 2*2 with every element equal to 1/4; the overlapping parts of the convolution result are then discarded, that is:

Figure 4
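In Python/NumPy the same trick looks like this: convolve with a 2*2 kernel whose elements are all 1/4, then keep every second row and column so that only the non-overlapping windows remain (a sketch of the idea, not the toolbox's code):

```python
import numpy as np
from scipy.signal import convolve2d

def mean_pool(a, scale=2):
    """Mean pooling implemented as a convolution plus subsampling."""
    k = np.ones((scale, scale)) / (scale * scale)
    # 'valid' convolution computes the mean of EVERY window,
    # including the overlapping ones...
    full = convolve2d(a, k, mode="valid")
    # ...and slicing with step `scale` discards the overlaps.
    return full[::scale, ::scale]

a = np.arange(16, dtype=float).reshape(4, 4)
print(mean_pool(a))  # 2*2 map: the mean of each non-overlapping 2*2 block
```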

4. Backward propagation: adjusting the weights

The backward pass is the most complex part of a CNN. Although the basic idea is the same as in BP from a macro point of view, namely minimizing the residual error to adjust the weights and biases, the network structure of a CNN is not as uniform as BP's, so different structures must be handled differently, and weight sharing makes the residuals harder to compute. Many papers [1][5] and articles [4] describe this in detail, but I found that some details remain unclear, especially the residual calculation for the sampling layer, so I will spell it out here.
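Since the sampling-layer residual is the step singled out above, here is a hedged sketch of how the residual of a convolution layer can be recovered from the 2*2 mean-pooling layer above it (Kronecker-product upsampling; the function and its name are my own illustration, and it assumes the sigmoid activation used in section 3.2):

```python
import numpy as np

def upsample_delta(delta_pool, out_conv, scale=2):
    """Propagate residuals back through a mean-pooling layer.

    Each pooled unit averaged scale*scale convolution units, so its
    residual is spread back evenly over that block (the division by
    scale^2), then multiplied by the sigmoid derivative o*(1-o) of
    the convolution layer's output.
    """
    spread = np.kron(delta_pool, np.ones((scale, scale))) / (scale * scale)
    return spread * out_conv * (1.0 - out_conv)

d = np.array([[1.0, 2.0], [3.0, 4.0]])  # residuals on a 2*2 pooled map
o = np.full((4, 4), 0.5)                # conv-layer outputs (toy values)
print(upsample_delta(d, o))             # 4*4 residual map
```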
