A New Idea for Convolutional Neural Networks

Recently I have been studying convolutional neural networks (CNNs), hoping to improve on them and come up with something new. I have read a lot of papers and written a review of deep learning and CNNs, and along the way I have gained some new understanding that I would like to share.
In fact, the CNN is not a new algorithm. It was proposed as early as the 1980s, but hardware at the time lacked the computing power, so it saw little practical use beyond recognizing handwritten digits on bank checks. In 2006, Geoffrey Hinton, a leading authority on deep learning, published a paper in Science demonstrating the potential of deep architectures for feature extraction, which set off a wave of research into deep structures. As a deep architecture that already existed and had some practical track record, the CNN returned to people's attention. By then hardware had also taken a qualitative leap: faster chips, GPUs, and the open-source Caffe framework all became available, and CNNs took off.
Looking over the CNN literature published so far, most of it focuses on applications. Not many CNN papers appear in domestic (Chinese) journals, and generally only from 2012 onwards; the four top domestic journals have carried few CNN articles, and I do not know whether authors are unwilling to submit there or whether those journals are hesitant about the topic. Still, there are plenty of academic theses on CNNs, so many universities, advisors, and students are working on them. Abroad, CNN papers are comparatively more numerous, covering image recognition, speech recognition, and more; unlike domestic work, the foreign literature puts more effort into theory, whereas domestic work generally applies CNNs directly, an old method on new problems, and the results are good. As an important member of the deep learning family, the CNN really is powerful.
I have been thinking about how to improve the traditional CNN. Looking at others' work, the improvements go in one of two directions: changes to the structure, or changes to the training algorithm. Current CNN improvements basically follow this framework.
First, structural improvements
The traditional CNN is essentially a stack of mappings, as shown in the figure.
It is called traditional mainly because it places no strict requirements on the input format, the convolution kernels, the cascade structure, or the initialization method: it uses the most primitive convolution kernels with random initialization. Of course, it is precisely because it is traditional and primitive that there is room for improvement. Let me describe some of the more successful improvements.
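For reference, here is a minimal sketch of such a classic "stack of mappings" (convolution, nonlinearity, pooling, repeated, then a classifier layer). The layer sizes are arbitrary assumptions, and PyTorch is used only for illustration.

```python
import torch
import torch.nn as nn

# A LeNet-style "stack of mappings": conv -> nonlinearity -> pool, repeated,
# followed by a fully connected classifier. Kernels start from random init.
traditional_cnn = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # mapping layer 1
    nn.Tanh(),                        # traditional nonlinearity (sigmoid/tanh)
    nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5),  # mapping layer 2
    nn.Tanh(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),        # classifier on the final feature map
)

x = torch.randn(1, 1, 28, 28)         # a dummy 28x28 grayscale image
print(traditional_cnn(x).shape)       # torch.Size([1, 10])
```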
1. Work on the network input. The traditional CNN feeds the image directly into the network as data. From the point of view of sparse representation theory, "the pixels themselves are the most redundant representation of an image," yet we usually still want to do some preprocessing. After all, machine vision is not as good as the human eye: when a person looks at something, all kinds of pattern classification have already been quietly accomplished, whereas in research we generally study one or a few specific pattern classification problems. Since the problem is specific, it makes sense to tailor the input to it. It is like recognizing brush strokes on white paper: there is no need to operate on the whole sheet, which would preserve all the information but be far too slow; conversely, if conditions are ideal, a simple threshold may already do, losing some information but keeping the important part, running fast, and giving acceptable accuracy. Hence the need for image preprocessing. Clearly, not every problem is best served by feeding raw images straight in; preprocessing such as color-channel decomposition, building a scale pyramid, or extracting features (Gabor, SIFT, PCA, etc.) can all help, depending on the problem. For example, in a saliency detection task someone first segmented the image into superpixels and then fed the superpixels into the network as new input over three parallel channels, as illustrated in the figure; a minimal preprocessing sketch follows.
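As a concrete illustration (not the method from that paper), here is a minimal sketch of superpixel-based preprocessing, assuming scikit-image is available; the segment count and compactness values are arbitrary choices.

```python
import numpy as np
from skimage import data
from skimage.segmentation import slic

# Load a sample RGB image and segment it into superpixels.
image = data.astronaut()                       # H x W x 3, uint8
segments = slic(image, n_segments=200, compactness=10)

# Replace every pixel with the mean color of its superpixel, producing a
# simplified 3-channel input that could be fed to a CNN instead of raw pixels.
superpixel_input = np.zeros_like(image, dtype=float)
for label in np.unique(segments):
    mask = segments == label
    superpixel_input[mask] = image[mask].mean(axis=0)

print(superpixel_input.shape)                  # same shape as the original image
```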
2. Work on feature fusion. The traditional CNN maps the image layer by layer until the final feature extraction result. Colloquially, it is like sifting millet through a sieve: sieve after sieve, and what comes out at the end is the essence. But the material sifted out in the middle is certainly not garbage; it also carries information and has a certain power to represent the image. So why not put those intermediate mapping results to use, so that the resulting feature is more expressive? Someone working on face recognition had this idea and implemented it, as shown in the figure.
He reduced the mapping results of each layer with PCA and then combined them, with good results. A minimal sketch is given below.
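A minimal sketch of this kind of multi-layer feature fusion, assuming NumPy and scikit-learn; the per-layer feature maps here are random stand-ins for whatever the network actually produces.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_layer_features(layer_outputs, n_components=64):
    """Reduce each layer's (n_samples, n_features) output with PCA and
    concatenate the reduced features into one descriptor per sample."""
    reduced = []
    for feats in layer_outputs:
        k = min(n_components, feats.shape[0], feats.shape[1])
        reduced.append(PCA(n_components=k).fit_transform(feats))
    return np.hstack(reduced)

# Toy example: three "layers" of flattened feature maps for 100 samples.
rng = np.random.default_rng(0)
layer_outputs = [rng.normal(size=(100, d)) for d in (512, 256, 128)]
fused = fuse_layer_features(layer_outputs)
print(fused.shape)   # (100, 192): 64 components from each of the three layers
```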
3. Impose constraints on the convolution kernels. As mentioned above, the traditional CNN simply uses ordinary, randomly initialized convolution kernels. So one naturally wonders: could those kernels be replaced with Gabor kernels? Would wavelet kernels work? A sparse mapping matrix is also possible, although then the network could hardly still be called a convolutional neural network; it would be more of a deep Gabor convolutional network. The point is that hardly anyone has done this yet, so it may be worth pursuing later. Someone has already replaced the convolution kernels with weighted PCA matrices to build a deep-feature face convolutional neural network, structured as in the figure.
This looks a bit complicated, but in essence the image is divided into blocks, each small block is sent into the deep network for mapping, the mapping kernels are weighted PCA matrices, and the mapping results of each layer are then aggregated through a codebook to form the final feature representation. Building a problem-specific mapping like this is theoretically reasonable. For example, GIST features used to work wonders for scene classification, so one might replace the convolution kernels with GIST-like kernels (in practice similar to Gabor kernels) to obtain a deep GIST convolutional network for scene classification; there might well be better results, but in science the experiment decides. This kind of kernel improvement is already quite different from the traditional CNN; it mainly borrows the abstract idea of a deep structure, which I think is the essence of deep learning. A minimal sketch of learning PCA filters from image patches appears below.
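A minimal sketch of deriving convolution kernels from PCA on image patches (in the spirit of PCA-filter networks, not a reproduction of the work cited above), assuming NumPy, SciPy, and scikit-learn; patch size, filter count, and patch sampling are arbitrary choices.

```python
import numpy as np
from scipy.signal import convolve2d
from sklearn.feature_extraction.image import extract_patches_2d

def learn_pca_filters(images, patch_size=7, n_filters=8, seed=0):
    """Collect patches from the training images, remove the patch mean, and
    use the leading principal directions as convolution kernels."""
    patches = np.vstack([
        extract_patches_2d(img, (patch_size, patch_size),
                           max_patches=200, random_state=seed).reshape(-1, patch_size**2)
        for img in images
    ])
    patches -= patches.mean(axis=0)
    # Right singular vectors = eigenvectors of the patch covariance matrix.
    _, _, vt = np.linalg.svd(patches, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch_size, patch_size)

rng = np.random.default_rng(0)
images = [rng.normal(size=(32, 32)) for _ in range(10)]   # toy grayscale images
filters = learn_pca_filters(images)

# Convolve one image with the learned PCA kernels (one feature map per kernel).
feature_maps = np.stack([convolve2d(images[0], f, mode='valid') for f in filters])
print(feature_maps.shape)   # (8, 26, 26)
```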
4. Combine with other classifiers. A convolutional neural network can be regarded as a feature extractor and a classifier combined: the layer-by-layer mapping resembles a feature extraction process, with different layers extracting features at different levels, while mapping the final features onto a few labels gives it the classification function. Personally, I am more inclined to treat the CNN as a feature extraction tool. And since it is a feature extractor, it is worth pairing it with a good classifier, such as an SVM or a sparse-representation classifier; I believe the combination can achieve good results, yet not many people do this, and I do not know why. A minimal sketch of such a pairing follows.
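A minimal sketch of the CNN-as-feature-extractor idea, assuming scikit-learn; extract_cnn_features is a hypothetical placeholder for running images through a trained network and taking the activations of its last feature layer.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def extract_cnn_features(images):
    """Hypothetical placeholder: run images through a trained CNN and return
    the flattened activations of its last feature layer."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(images), 256))   # stand-in features

train_images, train_labels = list(range(200)), np.repeat([0, 1], 100)
test_images = list(range(50))

# Train a linear SVM on top of the (frozen) CNN features.
clf = make_pipeline(StandardScaler(), LinearSVC())
clf.fit(extract_cnn_features(train_images), train_labels)
predictions = clf.predict(extract_cnn_features(test_images))
print(predictions.shape)   # (50,)
```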
Second, improvements to the training algorithm
When it comes to improving the algorithm, more theory is involved and the difficulty is greater. Existing improvements are mainly embodied in two aspects: first, changing the nonlinear mapping function; second, training the network without supervision.
1. Improving the nonlinear mapping function
After each mapping layer of a CNN, the result is passed through a nonlinear function, mainly to adjust the range of the mapping output. The traditional CNN generally uses the sigmoid function or the hyperbolic tangent (tanh). Later, with the rise of sparse representation, people found that sparse things tend to work better, so we would like the convolution layer's output to be as sparse as possible, closer to the human visual response. One of the most successful improvements to the nonlinear function is the rectified linear unit (ReLU): if the convolution produces a value less than 0, it is set to 0; otherwise the value is kept unchanged. The approach is, as they say, simple and crude, but the results become very sparse, and the experiments speak for themselves. A minimal numeric sketch follows.
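A minimal NumPy sketch of ReLU versus tanh on the same activations, just to show how the hard zeroing produces exact sparsity; the simulated inputs are arbitrary.

```python
import numpy as np

def relu(x):
    # max(0, x): negative responses are zeroed, positive ones pass through.
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
pre_activation = rng.normal(size=10_000)      # simulated convolution outputs

relu_out = relu(pre_activation)
tanh_out = np.tanh(pre_activation)

# ReLU gives exact zeros (true sparsity); tanh only squashes values near zero.
print("ReLU zero fraction:", np.mean(relu_out == 0.0))   # roughly 0.5 here
print("tanh zero fraction:", np.mean(tanh_out == 0.0))   # essentially 0
```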
2. Unsupervised training algorithms
Making the training algorithm unsupervised is in fact a very important improvement of the CNN, and the reason is simple: deep learning needs huge amounts of data, and labeling massive data is no trivial job, let alone for abstract annotations such as expression or beauty. Only a few unsupervised CNN schemes have been relatively successful so far. The most representative should be the sparse filtering algorithm proposed by J. Ngiam et al. in 2011: a feature distribution matrix is constructed, a sparse optimization problem is solved along the feature direction of that matrix, each sample's features are normalized by the L2 norm, and the result exhibits population sparsity, lifetime sparsity, and high dispersal; the authors also point out that cascading such sample distribution matrices in multiple stages yields an unsupervised deep learning model. In plain terms it is a bit like an extension of sparse representation: the convolution kernel is replaced with a sparse dictionary and the original BP algorithm is discarded, and since it no longer relies on BP it can naturally be trained without supervision. One or two sentences here cannot explain sparse filtering clearly, so I recommend two papers: one is the authors' original, the other an application of it; both can be found through Google. A minimal sketch of the sparse filtering objective appears after the references.
(1) Ngiam, J., Koh, P. W., Chen, Z., Bhaskar, S., Ng, A. Y. Sparse Filtering. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems, 2011: 1125-1133.
(2) Zhen Dong, Mingtao Pei, Yang He, Ting Liu, Yanmei Dong, Yunde Jia. Vehicle Type Classification Using Unsupervised Convolutional Neural Network. In: Pattern Recognition (ICPR), 22nd International Conference on, 2014: 172-177.
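A minimal NumPy sketch of the sparse filtering objective from reference (1); it only evaluates the objective (the soft-absolute epsilon and matrix sizes are assumptions), with the actual weights found by a general-purpose optimizer.

```python
import numpy as np

def sparse_filtering_objective(W, X, eps=1e-8):
    """Sparse filtering objective (Ngiam et al., 2011).
    W: (n_features, n_inputs) weights, X: (n_inputs, n_examples) data."""
    F = np.sqrt((W @ X) ** 2 + eps)                         # soft-absolute activations
    F = F / np.sqrt((F ** 2).sum(axis=1, keepdims=True))    # normalize each feature (row) across examples
    F = F / np.sqrt((F ** 2).sum(axis=0, keepdims=True))    # normalize each example (column) across features
    return np.abs(F).sum()                                  # L1 penalty on the normalized features

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 1000))    # e.g. 8x8 patches, 1000 examples
W = rng.normal(size=(32, 64))      # 32 features to learn
print(sparse_filtering_objective(W, X))
# In practice W is obtained by minimizing this objective with an off-the-shelf
# optimizer such as L-BFGS; stacking learned layers gives a deep model.
```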
The CNN is the most widely used network model in deep learning, and one of its most influential applications should be the DeepID face recognition algorithm proposed by Prof. Xiaogang Wang's team at The Chinese University of Hong Kong; across its three generations the algorithm has reached accuracy above 99%, which is indeed impressive. The above is my understanding of CNNs after one month; where I am wrong, you are welcome to correct me and discuss. In addition, since this is a blog post, many algorithms are not given their original references; if you need the relevant literature, please leave me a message.