Content
- Overview
- The Character Recognition System LeNet-5
- A Simplified LeNet-5 System
- Implementing a Convolutional Neural Network
Deep neural networks have achieved unprecedented success in fields such as speech recognition and image recognition. I first encountered neural networks many years ago; this series of articles records some of my experiences in learning about deep neural networks.
This second chapter covers the classic convolutional neural network. I am not going to describe the biological motivation of convolutional neural networks in detail, since there are plenty of tutorials on the web for that. The focus here is on the mathematical computation involved, that is, on what you need to know to implement one yourself.
1. Overview
Think back to the BP (backpropagation) neural network. In a BP network, the nodes of each layer form a linear, one-dimensional arrangement, and adjacent layers are fully connected. Now suppose the connections between adjacent layers are no longer full but only local: this is the simplest one-dimensional convolutional network. Extending this idea to two dimensions gives the convolutional neural network seen in most references. See the details:
Left: a fully connected network. If the image is 1000x1000 pixels and there are 1 million hidden-layer neurons, each connected to every pixel of the image, then there are 1000x1000x1000000 = 10^12 connections, i.e. 10^12 weight parameters.
Right: a locally connected network, where each node is connected only to a 10x10 window at the corresponding location in the layer above. The 1 million hidden-layer neurons then need only 10^6 x 100 = 10^8 parameters. The number of weight connections is reduced by four orders of magnitude.
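The parameter counts above are simple arithmetic; here is a back-of-envelope check (the variable names are mine, chosen for illustration only):

```python
# Compare weight counts for a 1000x1000 image with 10^6 hidden units.
image_pixels = 1000 * 1000
hidden_units = 1_000_000

# Fully connected: every hidden unit sees every pixel.
fully_connected = image_pixels * hidden_units      # 10^12 weights

# Locally connected: every hidden unit sees only a 10x10 window.
locally_connected = hidden_units * 10 * 10         # 10^8 weights

# With weight sharing (next section), one feature map reuses a
# single 10x10 kernel everywhere: just 100 weights per map.
shared_per_map = 10 * 10

print(fully_connected, locally_connected, shared_per_map)
```

Local connectivity alone removes four orders of magnitude; weight sharing removes six more.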
Following the forward pass of a BP network, we can easily compute the output of any network node. For example, the net input of the node marked in red equals the sum, over all red connections, of the product of each upstream neuron's value and the weight on the corresponding red line. Many books call this computation convolution.
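The sliding sum-of-products just described can be written in a few lines. This is a minimal sketch of a "valid" convolution forward pass (no padding, no kernel flipping); the function name is mine:

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take sums of products
    at every position where the kernel fits entirely inside."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            s = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    s += image[y + dy][x + dx] * kernel[dy][dx]
            out[y][x] = s
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]  # each output = top-left + bottom-right of a 2x2 window
print(conv2d_valid(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

A 3x3 image with a 2x2 kernel yields a 2x2 output, matching the window-count argument used for LeNet-5 below.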
In fact, in digital filtering the filter coefficients are usually symmetric; otherwise, a true convolution requires flipping the kernel by 180 degrees before multiplying and accumulating. Do the weights of the network above satisfy such symmetry? I think the answer is no! So calling the operation above "convolution" is clearly a stretch (strictly, it is cross-correlation), but that does not matter much; it is only a name. It can, however, cause some misunderstanding for people coming from signal processing when they first meet convolutional neural networks.
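To make the distinction concrete, here is a small sketch (helper name is mine) of the 180-degree flip that textbook convolution requires and CNNs skip. For a symmetric kernel the flip changes nothing, which is exactly why the naming only matters for asymmetric weights:

```python
def flip180(kernel):
    """Rotate a 2-D kernel by 180 degrees: reverse rows, then each row."""
    return [row[::-1] for row in kernel[::-1]]

asym = [[1, 2],
        [3, 4]]
sym = [[0, 1, 0],
       [1, 2, 1],
       [0, 1, 0]]

print(flip180(asym))        # [[4, 3], [2, 1]] -- flipping changes the kernel
print(flip180(sym) == sym)  # True -- for symmetric filters the two agree
```

With learned, generally asymmetric CNN weights, the unflipped operation is what is actually computed.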
Another feature of convolutional neural networks is weight sharing. Taking the right-hand side of the figure as an example, weight sharing does not mean that all connections drawn as red lines carry one identical weight; rather, every node in the same feature map reuses the same small set of kernel weights at different positions. This point is easy for beginners to misunderstand.
Described above is only a single-layer structure. Building on such convolutional layers, Yann LeCun and others at the former AT&T Shannon Lab built LeNet-5, a character recognition system based on convolutional neural networks. The system was used in the 1990s to read handwritten digits for banks.
2. The Character Recognition System LeNet-5
In classical pattern recognition, features are usually extracted in advance. After extracting many features, one analyzes them, keeps the most representative ones, and removes those that are irrelevant to classification or correlated with each other. However, this kind of feature extraction depends heavily on human experience and subjective judgment; the choice of features has a great impact on classification performance, and even the order in which features are extracted can affect the final results. At the same time, the quality of image preprocessing also affects the extracted features. So, can feature extraction itself be made an adaptive, self-learning process, in which machine learning finds the features with the best classification performance?
Each unit of a hidden convolutional layer extracts local features of the image and maps them onto a plane, a feature map. The feature mapping uses the sigmoid function as the activation function of the convolutional network, which gives the feature maps a degree of shift invariance. Each neuron is connected to a local receptive field in the previous layer. Note, as said before, that it is not each neuron's local connection weights that are all equal; rather, the neurons within one and the same feature map share the same weights, which gives a degree of shift and rotation invariance. Each feature extraction (convolution) layer is followed by a subsampling layer that computes local averages, a second round of feature extraction. This distinctive twice-repeated extraction structure gives the network high tolerance to distortions of the input samples. In other words, through local receptive fields, shared weights, and subsampling, a convolutional neural network is robust to shifts, scaling, and distortions of the image.
Below, the LeNet-5 deep convolutional network for character recognition mentioned above is explained step by step.
1. The input image is 32x32 and the local sliding window is 5x5. Since the image border is not padded, the sliding window has 28x28 distinct positions, so the size of each C1 feature map is 28x28. There are 6 different C1 feature maps, and within each map all positions share the same weights.
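The 28x28 figure comes from the standard sliding-window count. A tiny helper (mine, not from the LeNet-5 code) makes the formula explicit and reusable for the later layers:

```python
def conv_output_size(input_size, window, stride=1, padding=0):
    """Number of distinct window positions along one dimension."""
    return (input_size + 2 * padding - window) // stride + 1

print(conv_output_size(32, 5))            # 28 -> the 28x28 C1 maps
print(conv_output_size(29, 5, stride=2))  # 13 -> used by the simplified net below
```

The same formula with stride 2 explains the 13x13 maps of the simplified LeNet-5 discussed later.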
2. The S2 layer is a subsampling layer. Simply put, 4 points are sampled down to 1 point, as a weighted average of the 4 numbers. In the LeNet-5 system, however, subsampling is more complicated, because the weighting coefficients also need to be learned, which obviously increases the complexity of the model. In Stanford's deep learning tutorial, this process is called pooling.
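The plain "4 points to 1" case is just 2x2 average pooling. A minimal sketch, assuming fixed coefficients of 1/4 (whereas LeNet-5 proper learns the coefficients); the function name is mine:

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling: each 2x2 block becomes one output value."""
    h, w = len(fmap), len(fmap[0])
    return [[(fmap[y][x] + fmap[y][x + 1]
              + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(avg_pool_2x2(fmap))  # [[3.5, 5.5], [11.5, 13.5]]
```

A 28x28 C1 map pooled this way gives the 14x14 S2 map.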
3. Reasoning as for the C1 layer, it is easy to see that each C3 feature map is 10x10. However, the C3 layer now consists of 16 of these 10x10 maps! Imagine that if the S2 layer had only 1 map, then obtaining C3 from S2 would be exactly like obtaining C1 from the input layer. Since the S2 layer consists of multiple maps, we simply need to combine these maps according to some scheme. The LeNet-5 system gives the specific combination rules in a table:
In a nutshell: for the No. 0 feature map of the C3 layer, each node is connected to 5x5 windows of the No. 0, No. 1, and No. 2 feature maps of the S2 layer, i.e. to a total of 3 x 5x5 nodes. As before, all nodes within one C3 feature map share the same weights, and so on for the other maps.
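Combining several input maps into one output map just means summing the per-map convolutions (plus a bias). A sketch under illustrative names, not taken from the LeNet-5 source:

```python
def conv2d_valid(image, kernel):
    """'Valid' sum-of-products convolution, as in the earlier sketch."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(kh) for dx in range(kw))
             for x in range(iw - kw + 1)]
            for y in range(ih - kh + 1)]

def combine_maps(input_maps, kernels, bias=0.0):
    """One output feature map = bias + element-wise sum of the
    convolutions of the selected input maps, each with its own kernel."""
    partial = [conv2d_valid(m, k) for m, k in zip(input_maps, kernels)]
    h, w = len(partial[0]), len(partial[0][0])
    return [[bias + sum(p[y][x] for p in partial) for x in range(w)]
            for y in range(h)]

# e.g. C3 map 0 draws on S2 maps 0, 1 and 2 (tiny toy maps here):
s2 = [[[1, 2], [3, 4]]] * 3   # three 2x2 "maps"
ks = [[[1]], [[1]], [[1]]]    # 1x1 kernels for brevity
print(combine_maps(s2, ks))   # [[3.0, 6.0], [9.0, 12.0]]
```

In real LeNet-5 each C3 map selects its own subset of S2 maps according to the combination table.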
4. The S4 layer subsamples the C3 layer, in the same way as described above. In the later layers the number of nodes per layer is relatively small and the layers are fully connected; this part is straightforward and is not repeated here.
3. A Simplified LeNet-5 System
The simplified LeNet-5 system merges each subsampling layer into the preceding convolutional layer. This avoids learning the extra parameters of the subsampling layers while preserving robustness to image shifts and distortions. The network structure diagram is as follows:
The simplified system, counting the input layer, has only a 5-layer structure, whereas the original LeNet-5, without even counting the input layer, is already a 7-layer network. The merged subsampling is very simple to implement: just take the node value at the first position of each block (equivalently, use a convolution stride of 2).
1. Input layer. An MNIST handwritten digit image is 28x28; it is expanded to 29x29 by padding with zeros. The input layer therefore has 29x29 = 841 nodes.
2. First layer. Consists of 6 different feature maps, each 13x13. Since the convolution window is 5x5 and subsampling is built in, the size 13x13 follows directly. This layer therefore has a total of 6x13x13 = 1014 neuron nodes. Each feature map has 5x5+1 = 26 weights to train (including a bias), for a total of 6x26 = 156 distinct weights and 1014x26 = 26,364 connections.
3. Second layer. Consists of 50 different feature maps, each 5x5. Again, with a 5x5 convolution window plus built-in subsampling, the size 5x5 follows directly. Since the previous layer consists of multiple feature maps, how are they combined to form the nodes of the next layer's feature maps? The simplified LeNet-5 system combines all of the previous layer's feature maps, i.e. it uses the pattern of the last column of the original LeNet-5 combination table. This layer therefore has a total of 5x5x50 = 1250 neuron nodes, (5x5+1)x6x50 = 7,800 weights, and 1250x26 = 32,500 connections.
4. Third layer. This layer's nodes form a one-dimensional linear arrangement and are fully connected to the previous layer. With the number of nodes set to 100, there are 100x(1250+1) = 125,100 distinct weights, and the same number of connections.
5. Fourth layer. This is the output layer of the network; to recognize the digits 0-9 it has 10 nodes. The layer is fully connected to the previous one, so there are 10x(100+1) = 1,010 weights, with the same number of connections.
4. Implementing a Convolutional Neural Network
Plenty of convolutional neural network source code can be downloaded online, in Matlab as well as C++. What do you need to pay attention to when programming one yourself? Since convolutional neural networks use the same training algorithm as BP networks, an existing BP network implementation can be reused. The open-source neural network library FANN, for example, can be exploited; it uses a number of code optimization techniques and comes in three versions: double precision, single precision, and fixed point.
The classical BP network has a one-dimensional arrangement of nodes, while a convolutional neural network has a two-dimensional structure. So the idea is to map each layer of the convolutional network into a one-dimensional node arrangement according to a fixed order and rule, create a multi-layer backpropagation network over that arrangement, and then learn the network parameters with the ordinary BP training algorithm. For predicting new samples in a real environment, the same forward pass as in the BP algorithm is used.
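The 2D-to-1D mapping just described amounts to assigning every (feature map, row, column) position a fixed index in a flat node array. A sketch with hypothetical helper names (not from FANN or any specific library):

```python
def node_index(map_idx, row, col, map_h, map_w):
    """Flatten a (feature map, row, col) position into a 1-D node index,
    laying out maps one after another in row-major order."""
    return map_idx * map_h * map_w + row * map_w + col

# First layer of the simplified network: 6 maps of 13x13 -> 1014 nodes.
n_maps, h, w = 6, 13, 13
print(n_maps * h * w)               # 1014 total flat nodes
print(node_index(0, 0, 0, h, w))    # 0    (first node of map 0)
print(node_index(5, 12, 12, h, w))  # 1013 (last node of map 5)
```

Once every layer is flattened this way, the local connections and shared weights can be expressed as a sparse connection table over flat indices, and a generic BP trainer can run unchanged.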
For specific details, you can also refer to an open-source implementation on the web: http://www.codeproject.com/Articles/16650/neural-network-for-recognition-of-handwritten-digi
Note: that code has an obvious bug in the way it creates the CNN. Interested readers can compare it against the structural description of the simplified LeNet-5 above and find where the problem lies.
References:
- http://blog.csdn.net/celerychen2009/article/details/8973218
- http://blog.sina.com.cn/s/blog_890c6aa30100yaqi.html