With this post we move properly into the field of deep learning. Within deep learning, the algorithms that have proven themselves so far are deep convolutional networks (CNN) and recurrent networks (RNN), which have achieved great success in image recognition, video recognition, and speech recognition; it is precisely these successes that have fueled the current deep learning boom. On the research side, the hottest topics are generative architectures such as autoencoders, RBMs, and DBNs. Papers in these areas are plentiful, but heavyweight applications have not yet emerged, and whether they will succeed remains uncertain. Still, there are early signs that these directions are worth looking forward to: autoencoders in image and video search, RBMs for processing unstructured data, and DBNs as a bridge between the two major schools of artificial intelligence, connectionism and symbolism, all have huge prospects, and there is reason to expect heavyweight results from them. In follow-up posts we will introduce and implement these networks, refactor the Theano implementation code, and gradually add practical application examples. We will mainly apply these algorithms to startup data: from the records of tens of thousands of startups and their investment and financing rounds, we hope to find out which companies are more likely to receive investment, and which are more likely to be invested in by particular investors.
Okay, let's get to the point. What we are going to look at today is the convolutional network (CNN), one of the most successful application areas of deep learning, used mainly in image and video recognition.
Before discussing convolutional networks (CNN), let's start from common sense and consider how to improve the accuracy of image recognition. Suppose, for example, that we need the network to recognize the capital letter A; the training sample we give the network might look like this:
But the image we actually need to recognize may well look like this:
From experience, if the letter can be moved to the center of the field of view, recognition becomes much easier, which helps improve the recognition rate.
Similarly, if we can rescale the image to a standard size, the recognition rate improves accordingly.
Real-world objects also look different when viewed from different angles, and even in letter recognition the letter may appear rotated:
If the image can be rotated back, the recognition rate improves greatly.
In practical image recognition these methods are usually applied in combination: the elements in an image undergo a series of translations, rotations, and scalings so as to obtain a standard image resembling the training samples, which is why traditional image recognition preprocesses the image to achieve this. With neural network image recognition, we hope the network itself can handle these transformations automatically; in academic terms, we want translation, rotation, and scaling invariance, and the convolutional network (CNN) is a framework proposed to solve exactly this problem.
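To make the traditional preprocessing step concrete, here is a minimal sketch using scipy.ndimage that applies translation, rotation, and scaling to a grayscale image; the shift, angle, and zoom values, and the toy "letter" image, are hypothetical placeholders and not part of the original post.

    # A minimal sketch of traditional preprocessing: translate, rotate, and
    # rescale an input image toward a "standard" pose before recognition.
    import numpy as np
    from scipy import ndimage

    def normalize_image(img, shift_yx=(0, 0), angle_deg=0.0, zoom_factor=1.0):
        """Apply translation, rotation, and scaling to a 2-D grayscale image."""
        out = ndimage.shift(img, shift_yx)                    # translation
        out = ndimage.rotate(out, angle_deg, reshape=False)   # rotation
        out = ndimage.zoom(out, zoom_factor)                  # scaling
        return out

    # Example: re-center and de-rotate an off-center letter-like blob
    img = np.zeros((28, 28))
    img[5:15, 2:8] = 1.0
    standard = normalize_image(img, shift_yx=(4, 9), angle_deg=-10.0, zoom_factor=1.0)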
So how do we give a neural network the transformation invariance we want? The rise of neural networks is, to a large extent, an application of bionics to artificial intelligence: we use artificial neuron models and their connections to imitate the human brain and solve complex problems that conventional methods cannot. For image recognition, neural network researchers likewise hope to improve accuracy by simulating the processing mechanism of the brain's visual cortex.
According to the studies of Hubel and Wiesel on the cat visual cortex, cells in the visual cortex form local receptive fields and are each responsible for processing only part of the image signal, that is, local spatial information such as edge detection. The visual cortex also contains two kinds of cells: simple cells, which mainly detect basic information such as image edges, and complex cells, which have positional invariance and can recognize more advanced image features.
Based on these findings, researchers proposed the convolutional neural network (CNN) model, which has two main features: the first is sparse connectivity between layers, and the second is shared connection weights.
Sparse inter-layer connectivity mainly simulates the receptive fields of the brain's visual cortex, with the two different cell types, simple cells and complex cells, handling local detail and global spatial invariance respectively. First we divide the image pixels into 3*3 regions; for the input layer at l=1, each group of 9 pixels is connected to 9 input-layer neurons, and those 9 neurons are connected to only one neuron at l=2, as shown below (note that because we are drawing a two-dimensional figure, only the three neurons facing us are shown). A small sketch of this connectivity follows the figure:
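Here is a small numpy sketch of this sparse (local) connectivity, under the simplifying assumptions used above: non-overlapping 3*3 regions, a 9*9 input, and random untrained weights. It is only an illustration, not the Theano implementation we will build later.

    # Sparse connectivity: each unit sees only a local patch of the layer below.
    import numpy as np

    rng = np.random.RandomState(0)
    image = rng.rand(9, 9)                      # a 9x9 input image

    # Layer 1: each unit is connected only to one 3x3 patch of the image.
    w1 = rng.randn(3, 3, 3, 3)                  # one 3x3 weight block per unit in a 3x3 grid
    layer1 = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            patch = image[3*i:3*i+3, 3*j:3*j+3]
            layer1[i, j] = np.tanh(np.sum(w1[i, j] * patch))

    # Layer 2: a single unit connected only to the 3x3 grid of layer-1 units,
    # so its effective receptive field covers the whole 9x9 image.
    w2 = rng.randn(3, 3)
    layer2 = np.tanh(np.sum(w2 * layer1))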
As shown in the figure, each neuron in the upper layer connects only to a 3*3 block of neurons in the layer below, so the receptive field of the topmost neuron grows to 9*9. This structure represents the original image information hierarchically: the lower-layer neurons are mainly responsible for detecting edges and other basic features, and the higher the layer, the more abstract the features it recognizes, until the top layer expresses the categories we want to distinguish. This is in fact similar to the pyramid model in traditional digital image processing and solves the same kind of problem.
The second feature of a convolutional network is shared connection weights:
In the figure, connections drawn in the same color share the same connection weight. Therefore, in our convolutional neural network (CNN) implementation, the original algorithm has to be modified to account for this: the derivative with respect to a single weight becomes the sum of the derivatives over all connections that share that weight (three in the figure). This will be covered in detail in the next post; a toy numeric sketch of the idea is shown below.
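As an illustration of that modification, the following numpy sketch applies one shared 1x3 kernel at every position of a 1-D input and accumulates the gradient of each shared weight as a sum over all the positions where it is used. The input, kernel, and loss here are made-up placeholders.

    # Weight sharing: one kernel reused at every position, gradients summed.
    import numpy as np

    x = np.array([0.2, 0.5, 0.1, 0.7, 0.3])    # 1-D input signal
    w = np.array([0.4, -0.1, 0.6])             # one shared 1x3 kernel

    # Forward pass: every output position reuses the same w.
    out = np.array([np.dot(w, x[i:i+3]) for i in range(len(x) - 2)])

    # Toy loss: sum of outputs, so d(loss)/d(out[i]) = 1 for every position.
    dloss_dout = np.ones_like(out)

    # Backward pass: the gradient of each shared weight is the sum of the
    # gradients contributed by all positions that used it.
    dloss_dw = np.zeros_like(w)
    for i in range(len(out)):
        dloss_dw += dloss_dout[i] * x[i:i+3]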
By sharing weights, the convolutional neural network (CNN) can recognize an object in an image independently of the object's spatial position, that is, it realizes the rotation, translation, and scaling invariance mentioned at the beginning of this article, and so provides a very good solution to the recognition difficulties caused by changes in object position, size, and viewing angle. At the same time, weight sharing greatly reduces the number of network parameters and therefore greatly improves the learning efficiency of the network, which is why it has become a de facto standard of convolutional neural networks (CNN). A rough comparison of parameter counts is sketched below.
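As a back-of-the-envelope illustration of the parameter savings, assume a 28*28 input, a 24*24 hidden layer, and a single shared 5*5 kernel; these sizes are illustrative and not taken from the figures above.

    # Parameter count: fully connected layer vs. one shared convolution kernel.
    fully_connected_params = (28 * 28) * (24 * 24)     # every input to every hidden unit
    conv_shared_params = 5 * 5                         # one shared 5x5 kernel
    print(fully_connected_params, conv_shared_params)  # 451584 vs 25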
Having covered the fundamentals of convolutional neural networks (CNN), the natural next step is to implement our own CNN on a platform such as Theano, which we will discuss in the next post.