PCANet: A Simple Deep Learning Baseline for Image Classification? (Chinese Translation)

Last Update:2018-08-06 Source: Internet

Author: User



I. Abstract

In this paper, we present a very simple deep learning framework for image classification that relies on only a few basic data processing components: 1) cascaded principal component analysis (PCA), 2) binary hashing, and 3) block-wise histograms. In the proposed framework, PCA is used to learn multistage filter banks, which are followed by binary hashing and block histograms for indexing and pooling. The framework is therefore called PCANet, and it is easy to design and learn. For comparison and better understanding, we also introduce and study two simple variations of PCANet: RandNet and LDANet. They share the same topology as PCANet, but the RandNet filters are randomly initialized, and the LDANet filters are learned by linear discriminant analysis (LDA). We have tested these networks on a variety of benchmark visual datasets for different tasks, including LFW for face verification; the Extended Yale B, AR, and FERET datasets for face recognition; and MNIST for handwritten digit recognition. Surprisingly, for all tasks, this seemingly naive PCANet model is on par with state-of-the-art features, whether they are carefully hand-crafted or carefully learned by deep neural networks (DNNs). Even more surprisingly, the model sets new records on MNIST and on many classification tasks on the Extended Yale B, AR, and FERET datasets. Experiments on other public datasets also show the potential of PCANet as a simple but highly competitive baseline for texture classification and object recognition.

Keywords: convolutional neural networks, deep learning, PCANet, RandNet, LDANet, face recognition, handwritten digit recognition, object classification.

II. Introduction

Image classification based on visual content is a very challenging task, largely because of the large amount of variability within each class of images, which arises from different lighting conditions, misalignment, deformations, and occlusions. Numerous efforts have been made to counter this variability, for example by manually designing low-level features for the classification task at hand. Typical examples are Gabor features and local binary patterns (LBP) for texture and face classification, and SIFT and HOG features for object recognition. Although such hand-crafted low-level features handle specific tasks and conditions well, their ability to generalize is limited, and new problems often require new features to be designed.

Learning features from the data is considered a good way to overcome the limitations of hand-crafted features. The most typical example is feature learning with deep neural networks (DNNs), which have recently attracted great attention. The basic idea of deep learning is to build a multilayer network that learns several levels of representation, in the hope that higher-level features express more abstract semantics of the data and are therefore more robust. One key and relatively successful deep learning framework at present is the convolutional network. A deep convolutional network consists of multiple trainable stages followed by a supervised classifier; each stage contains three sub-layers, namely a convolutional layer, a nonlinear processing layer, and a downsampling (pooling) layer.

To learn the filters of a convolutional network, a variety of techniques have been proposed, such as restricted Boltzmann machines (RBMs), regularized autoencoders, and their variants. Such a network is generally trained by stochastic gradient descent. However, training a convolutional network that classifies well requires considerable experience and a variety of tuning skills.

Many variations of convolutional network architectures have been proposed for different visual recognition tasks, with remarkable results. Arguably, however, the first instance with a clear mathematical justification is the wavelet scattering network (ScatNet). It avoids the learning step altogether by replacing the convolutional filters with wavelet operators. This simple change nevertheless allows ScatNet to match or surpass convolutional networks and deep neural networks in handwritten digit recognition and texture recognition. It performs poorly in face recognition, however, and has difficulty coping with illumination changes and occlusion.

A. Motivation and Objectives

Our initial motivation was to try to resolve the apparent discrepancy between the convolutional network and the wavelet scattering network. We wanted to achieve two goals. First, the deep learning network we design should be simple in structure and easy to train and adapt to different data and tasks. Second, it should provide a basic baseline for further study and application of deep learning.

To achieve these goals, we use the most basic PCA filters as the convolutional filter bank, binary hashing in the nonlinear layer, and block-wise histograms of the binary codes in the pooling layer. The output of the pooling layer is the final feature extracted by the whole network. In view of these components, we name this simple deep learning architecture PCANet. Figure 1 illustrates the feature extraction process of a two-stage PCANet.

A notable property of PCANet is the simplicity of its structure: each stage consists only of a PCA projection, and binary hashing and block histograms appear only at the final output layer. This seems to challenge conventional deep learning models such as the convolutional network and the wavelet scattering network. Nevertheless, extensive experiments show that this extremely simple deep model performs well on many different types of datasets.

A network closely related to PCANet is the two-stage oriented PCA (OPCA), which was first applied to audio processing. There is a significant difference between the two: OPCA has no hashing and no local block histograms in its output layer. Because it takes the noise covariance matrix into account, OPCA gains additional robustness to noise and distortion. PCANet can also incorporate this merit of OPCA to improve its robustness to noise. Finally, we study some extensions of PCANet, in which the convolutional filters are learned by linear discriminant analysis (LDANet) or randomly initialized (RandNet). Throughout the paper, we compare PCANet with existing deep learning models (convolutional networks, the wavelet scattering network, and others) in extensive experiments, in the hope of gaining a better understanding of PCANet.

B. Contributions

Although our initial goal was to build a simple deep network that could serve as a baseline for comparing deep learning models, the experimental results surprised us: this basic PCANet is already competitive with, and sometimes superior to, relatively mature deep learning models on major benchmark tasks such as face recognition, handwritten digit recognition, and texture classification. Taking face recognition with a single training sample per person as an example, it achieves 99.58% accuracy on the Extended Yale B dataset, a 95% recognition rate on the illumination subset of the AR dataset, and 97.25% accuracy on the FERET dataset, with 95.84% and 94.02% on the Dup-1 and Dup-2 subsets, respectively. On the LFW dataset, the model achieves 86.28% accuracy in the unsupervised face verification setting. The model also performs well on MNIST. These experiments provide strong evidence that PCANet can learn robust features suitable for classification.

The PCANet model contains hardly any new deep learning techniques, and our study so far is largely empirical. Nevertheless, this simple model has shown considerable value for deep learning and visual recognition, and it conveys two important messages. On the one hand, PCANet can serve as a simple but highly competitive baseline against which deep models can be judged. On the other hand, the success of PCANet is largely due to its hierarchical feature learning structure. More importantly, because PCANet consists only of a cascaded linear map followed by binary hashing and block histograms, its effectiveness is amenable to mathematical analysis and justification. Such analysis would help us gain deeper insight into the theoretical foundations of deep architectures, which is urgently needed at present.

III. Cascaded Linear Network (PCANet)

A. Structure of the PCANet

Suppose that we have N training images, each of size m*n, and that the filter size at every layer is set to k1*k2. Figure 2 shows a typical PCANet model, in which only the PCA filters need to be learned from the training images. Below we describe each component of this structure in detail.

1) First layer (PCA)

For each pixel, we take a k1*k2 patch around it (patches are sampled pixel by pixel, so they fully overlap). We then vectorize all the patches and collect them as the representation of the i-th image, where each patch vector has k1*k2 elements.

The next step is to remove the mean from the sampled patches: for each patch, its mean is subtracted from every one of its elements, so that the patch has zero mean.

The other images in the training set are treated in the same way, and the mean-removed patches of all N images are placed side by side as the columns of the training patch matrix X.
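The patch sampling and mean removal described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `mean_removed_patches` and `patch_matrix` are hypothetical, the border is zero-padded so that one patch is centred on every pixel, and k1 and k2 are assumed to be odd.

```python
import numpy as np


def mean_removed_patches(image, k1, k2):
    """Return all k1 x k2 patches of `image` (stride 1) as mean-removed columns."""
    m, n = image.shape
    # Assumption: zero-pad the border so one patch is centred on every pixel
    # (k1 and k2 are taken to be odd).
    padded = np.pad(image, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    # windows[r, c] is the k1 x k2 patch centred on pixel (r, c).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k1, k2))
    patches = windows.reshape(m * n, k1 * k2).T  # shape (k1*k2, m*n)
    # Subtract each patch's own mean from all of its elements.
    return patches - patches.mean(axis=0, keepdims=True)


def patch_matrix(images, k1, k2):
    """Stack the mean-removed patches of all images side by side."""
    return np.hstack([mean_removed_patches(img, k1, k2) for img in images])


rng = np.random.default_rng(0)
images = [rng.random((8, 6)) for _ in range(3)]  # toy data: N = 3, m = 8, n = 6
X = patch_matrix(images, k1=3, k2=3)
print(X.shape)  # (9, 144): k1*k2 rows, N*m*n columns
```

Each column of `X` is one vectorized patch, so the matrix has k1*k2 rows and one column per pixel of every training image.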

Assume that the number of filters in the i-th layer is Li. PCA minimizes the reconstruction error within a family of orthonormal filters: it looks for a matrix V with L1 orthonormal columns that minimizes the squared Frobenius norm of X - V V^T X.

The solution to this problem is classical principal component analysis: the columns of V are the L1 principal eigenvectors of X X^T, the covariance matrix of X up to a constant factor. Each eigenvector, which has k1*k2 elements, is reshaped into a k1*k2 matrix to give one PCA filter.

In other words, the eigenvectors corresponding to the L1 largest eigenvalues of the covariance matrix of X form the filter bank of the first layer. These principal components retain the main information in the zero-mean training patches. As with DNNs and ScatNet, we can also extract more abstract features by increasing the number of network layers.
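As a rough sketch of this step, the filters can be obtained from an eigendecomposition of X X^T. The function name `pca_filters` is hypothetical, and random data stands in for a real patch matrix; the reshape assumes patches were vectorized row by row.

```python
import numpy as np


def pca_filters(X, k1, k2, num_filters):
    """Learn `num_filters` PCA filters of size k1 x k2 from a patch matrix X.

    X has shape (k1*k2, num_patches); its columns are mean-removed patches.
    """
    # X X^T is the patch covariance matrix up to a constant factor.
    scatter = X @ X.T
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    top = np.argsort(eigvals)[::-1][:num_filters]
    # Each leading eigenvector (length k1*k2) is reshaped into one filter.
    return eigvecs[:, top].T.reshape(num_filters, k1, k2)


rng = np.random.default_rng(0)
X = rng.standard_normal((25, 500))      # toy stand-in for real patches
X -= X.mean(axis=0, keepdims=True)      # remove each patch's mean
W1 = pca_filters(X, k1=5, k2=5, num_filters=8)
print(W1.shape)  # (8, 5, 5)
```

The vectorized filters are orthonormal by construction, which matches the constraint in the minimization problem.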

2) Second layer (PCA)

The second layer follows almost the same procedure as the first. First, the output of the first layer is computed by convolving each training image with each of the L1 first-layer PCA filters.

Note that the image is zero-padded at its boundary before the convolution, so that each output has the same size as the original image (an unpadded convolution would give a smaller result). As in the first layer, we then sample all the overlapping patches of each input matrix of the second layer (that is, each output of the first layer), vectorize them, and remove the mean from each patch.

Doing this for every input matrix and concatenating the results gives the patch matrix of the second layer.

Similarly, the PCA filters of the second layer are obtained from the eigenvectors corresponding to the L2 largest eigenvalues of the covariance matrix of this patch matrix.

Since the first layer has L1 filters, it produces L1 output matrices for each image. In the second layer, each of these outputs is convolved with the L2 second-layer filters and therefore produces L2 outputs. In total, a two-stage PCANet produces L1*L2 output feature matrices for each sample.

As can be seen, the first and second layers are structurally very similar, so it is easy to extend PCANet into a deeper network with more layers.
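The two-stage filtering can be sketched as below, assuming the two filter banks have already been learned (random arrays stand in for them here). The names `filter_same` and `two_stage_outputs` are hypothetical. The response is computed as the inner product of each zero-padded patch with the filter, which is cross-correlation; flip the filter first if strict convolution is wanted.

```python
import numpy as np


def filter_same(image, kernel):
    """Filter `image` with `kernel`, zero-padding so the output keeps the input size."""
    k1, k2 = kernel.shape  # assumed odd
    padded = np.pad(image, ((k1 // 2, k1 // 2), (k2 // 2, k2 // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k1, k2))
    # Inner product of every patch with the kernel (cross-correlation).
    return np.einsum("ijkl,kl->ij", windows, kernel)


def two_stage_outputs(image, filters1, filters2):
    """Return an array of shape (L1, L2, m, n) holding all second-stage outputs."""
    outputs = []
    for w1 in filters1:
        stage1 = filter_same(image, w1)  # one of the L1 first-stage maps
        outputs.append([filter_same(stage1, w2) for w2 in filters2])
    return np.array(outputs)


rng = np.random.default_rng(0)
image = rng.random((10, 12))
filters1 = rng.standard_normal((4, 3, 3))  # stand-ins for learned filters, L1 = 4
filters2 = rng.standard_normal((8, 3, 3))  # stand-ins for learned filters, L2 = 8
maps = two_stage_outputs(image, filters1, filters2)
print(maps.shape)  # (4, 8, 10, 12)
```

Every output map has the same m*n size as the input image, and there are L1*L2 of them per image.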

3) Output layer (hashing and histograms)

For each output matrix of the first layer, the second layer produces L2 outputs. We binarize these outputs with a step function H(.), so that they contain only ones and zeros, and then treat the L2 binary values at each pixel position as an L2-bit code, where the number of bits equals the number of second-layer filters. Weighting the l-th binary map by 2^(l-1) and summing over l = 1, ..., L2 converts the L2 maps into a single integer-valued image.

The function H(.) here is similar to a unit step function: its value is one for positive inputs and zero otherwise. After this processing, every pixel is encoded as an integer in the range [0, 2^L2 - 1], that is, [0, 255] when L2 = 8. The order and weights assigned to the L2 outputs do not matter, because we treat each code as a distinct word.
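A minimal sketch of the hashing step, with a hypothetical function name `binary_hash` and a tiny hand-made input:

```python
import numpy as np


def binary_hash(stage2_maps):
    """Turn L2 real-valued maps of shape (L2, m, n) into one integer-coded map.

    The l-th map (l = 1, ..., L2) is binarized with a step function and
    weighted by 2^(l-1), so every pixel gets a code in [0, 2^L2 - 1].
    """
    num_maps = stage2_maps.shape[0]
    bits = (stage2_maps > 0).astype(np.int64)   # H(.): 1 if positive, else 0
    weights = 2 ** np.arange(num_maps)          # 1, 2, 4, ...
    return np.tensordot(weights, bits, axes=1)  # weighted sum over the L2 maps


maps = np.stack([
    np.full((2, 2), 0.5),    # positive -> bit 1, weight 1
    np.full((2, 2), -1.0),   # negative -> bit 0, weight 2
    np.full((2, 2), 2.0),    # positive -> bit 1, weight 4
])
codes = binary_hash(maps)
print(codes)  # every pixel gets the code 1 + 0 + 4 = 5
```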

For each output matrix of the first layer, the corresponding hashed image is partitioned into B blocks. We compute the histogram of the codes in each block and concatenate the histograms of all the blocks into one vector. Concatenating these vectors over all L1 hashed images finally gives the block-wise histogram feature of the input image.

The blocks used for the histograms can be overlapping or non-overlapping, depending on the application. The experimental results show that non-overlapping blocks are suitable for face recognition, whereas overlapping blocks are appropriate for handwritten digit recognition, texture recognition, object recognition, and so on. In addition, the histogram gives the features extracted by PCANet a degree of invariance to transformations such as translation, as in hand-crafted features like the scale-invariant feature transform (SIFT).
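The block-wise histogram of one hashed image can be sketched as follows. The function name `block_histogram` and its `stride` parameter are assumptions made for illustration: a stride equal to the block size gives non-overlapping blocks, and a smaller stride gives overlapping ones.

```python
import numpy as np


def block_histogram(hashed, block, num_bins, stride=None):
    """Concatenate the histograms of all blocks of one hashed map.

    hashed: 2-D array of integer codes in [0, num_bins - 1].
    block: (height, width) of a block. `stride` defaults to the block size,
    which gives non-overlapping blocks; a smaller stride gives overlap.
    """
    b1, b2 = block
    s1, s2 = stride if stride is not None else block
    m, n = hashed.shape
    hists = []
    for r in range(0, m - b1 + 1, s1):
        for c in range(0, n - b2 + 1, s2):
            codes = hashed[r:r + b1, c:c + b2].ravel()
            hists.append(np.bincount(codes, minlength=num_bins))
    return np.concatenate(hists)


hashed = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
feature = block_histogram(hashed, block=(2, 2), num_bins=4)
print(feature)  # [4 0 0 0 0 4 0 0 0 0 4 0 0 0 0 4]
```

For a full feature vector, the same function would be applied to each of the L1 hashed images of a sample, with `num_bins` set to 2^L2, and the results concatenated.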

Note: The above covers the basic description and principles of PCANet from the original paper. The remaining content (performance analysis, experimental results, and so on) is not translated here; please refer to the original paper for it.

Ref: 50039573


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
