Introduction
Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite its popularity, these nonlinear SVMs have a complexity of O(n^2) to O(n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images.
The cost of nonlinear SVMs is huge.
In this paper we develop an extension of the SPM method, by generalizing vector quantization (VQ) to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes.
In recent years the bag-of-features (BOF) model has been extremely popular in image categorization. The method treats an image as a collection of unordered appearance descriptors extracted from local patches, quantizes them into discrete "visual words", and then computes a compact histogram representation for semantic image classification.
The method partitions an image into 2^l x 2^l segments at different scales l = 0, 1, 2, computes the BOF histogram within each of the 21 segments, and finally concatenates all the histograms to form a vector representation of the image. In the case where only the scale l = 0 is used, SPM reduces to BOF.
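The pyramid construction above can be sketched in NumPy as follows. This is an illustrative version, not the paper's code: the function name and the convention of descriptor coordinates normalized to [0, 1) are my own, and the per-level weights of the original SPM kernel are omitted.

```python
import numpy as np

def spm_histogram(codes, positions, n_words, levels=(0, 1, 2)):
    """Concatenate BOF histograms over a spatial pyramid.

    codes:     (M,) visual-word index of each local descriptor
    positions: (M, 2) descriptor (x, y) coordinates, normalized to [0, 1)
    n_words:   codebook size
    Levels 0, 1, 2 give 1 + 4 + 16 = 21 cells.
    """
    parts = []
    for l in levels:
        cells = 2 ** l                      # cells per axis at this level
        # cell index of each descriptor at this level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                parts.append(np.bincount(codes[mask], minlength=n_words))
    return np.concatenate(parts)            # length 21 * n_words for levels 0..2
```

With only `levels=(0,)` this reduces to the plain BOF histogram, matching the note above.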
Replacing VQ with sparse coding
Furthermore, unlike the original SPM that performs spatial pooling by computing histograms, our approach, called ScSPM, uses max spatial pooling, which is more robust to local spatial translations and more biologically plausible.
Use max pooling to replace histogram (average) pooling
After sparse coding, even a simple linear classifier achieves good results.
Despite such popularity, SPM has to run together with nonlinear kernels, such as the intersection kernel and the chi-square kernel, in order to achieve good performance, which requires intensive computation and large storage.
Intersection kernel, chi-square kernel
Linear SPM using SIFT sparse codes
VQ
In the training phase the codebook V is learned; in the coding (test) phase the coefficients U are computed with V held fixed.
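In the paper's notation, with descriptors X = [x_1, ..., x_M]^T and codebook V (rows v_k), the VQ objective is:

```latex
\min_{U,V}\ \sum_{m=1}^{M} \| x_m - u_m V \|^2
\quad \text{s.t.}\ \mathrm{Card}(u_m) = 1,\ |u_m| = 1,\ u_m \succeq 0
```

The cardinality constraint forces each code u_m to have exactly one nonzero entry (hard assignment to the nearest visual word). Training optimizes over both U and V; coding fixes the learned V and solves for U only.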
Sparse coding adds a sparsity constraint (an L1 penalty on the codes) to the loss function.
As with VQ, the training phase learns the codebook, but here the codebook is over-complete and the coding phase produces sparse coefficients.
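Concretely, sparse coding relaxes the VQ cardinality constraint into an L1 penalty:

```latex
\min_{U,V}\ \sum_{m=1}^{M} \| x_m - u_m V \|^2 + \lambda |u_m|
\quad \text{s.t.}\ \| v_k \| \le 1,\ \forall k = 1, \dots, K
```

Here |u_m| is the L1 norm that enforces sparsity, and the unit-norm constraint on the (over-complete) codebook rows prevents the trivial solution of scaling V up and U down.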
Advantages: lower reconstruction error; the captured image features are more salient; image patches are known to be sparse signals.
Note: the sparse coding here is local, in that each descriptor is represented by only a few bases.
Therefore, VQ with hard assignment (hard voting) incurs a large quantization error; even with a nonlinear SVM the improvement is limited, and the computational cost is high.
In this work, we defined the pooling function F as a max pooling function on the absolute sparse codes.
This max pooling is said to have a biological basis, and it is more robust.
Similar to the construction of histograms in SPM, we apply the max pooling equation on a spatial pyramid constructed for an image.
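Putting the two pieces together, the ScSPM image feature max-pools the absolute sparse codes within each pyramid cell and concatenates the results. An illustrative sketch, with the same assumed [0, 1)-normalized coordinate convention as above and without the SPM level weights:

```python
import numpy as np

def scspm_feature(U, positions, levels=(0, 1, 2)):
    """ScSPM feature: max pooling of absolute sparse codes per pyramid
    cell, concatenated over all 21 cells at levels 0..2.

    U:         (M, K) sparse codes of the image's M descriptors
    positions: (M, 2) descriptor (x, y) coordinates in [0, 1)
    """
    K = U.shape[1]
    A = np.abs(U)
    parts = []
    for l in levels:
        cells = 2 ** l
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                # empty cells pool to the zero vector
                parts.append(A[mask].max(axis=0) if mask.any() else np.zeros(K))
    return np.concatenate(parts)
```

The resulting vector (length 21K) is what the linear SVM below consumes.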
Why it works:
This success is largely due to three factors: (1) SC has much less quantization error than VQ; (2) it is well known that image patches are sparse in nature, and thus sparse coding is especially suitable for image data; (3) the statistics computed by max pooling are more salient and robust to local translations.
Implementation
1. Sparse coding
Going back to the SC loss function: it is convex in U when V is fixed, and convex in V when U is fixed, but non-convex when both are free. The conventional solution is therefore to alternate, fixing one variable and solving for the other. The feature-sign search algorithm used in the paper makes the coding step faster.
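The alternation can be sketched as follows. This is a deliberately simplified stand-in: the paper uses feature-sign search for the U step and a Lagrange dual for the V step, whereas this sketch uses ISTA (iterative soft-thresholding) for U and a projected gradient step for V; all names and the learning rate are my own choices.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code_ista(X, V, lam, n_iter=200):
    """Solve min_U ||X - U V||^2 + lam * |U|_1 with V fixed, via ISTA
    (a simple substitute for the paper's feature-sign search).
    X: (M, D) descriptors, V: (K, D) codebook; returns U: (M, K)."""
    L = 2 * np.linalg.norm(V @ V.T, 2) + 1e-8   # Lipschitz constant of the gradient
    U = np.zeros((X.shape[0], V.shape[0]))
    for _ in range(n_iter):
        grad = 2 * (U @ V - X) @ V.T
        U = soft_threshold(U - grad / L, lam / L)
    return U

def learn_codebook(X, K, lam=0.1, n_outer=20, seed=0):
    """Alternate: fix V, solve the convex problem for U; fix U, take a
    projected gradient step on V keeping each row in the unit ball."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((K, X.shape[1]))
    V /= np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1.0)
    for _ in range(n_outer):
        U = sparse_code_ista(X, V, lam)
        V = V - 0.01 * (2 * U.T @ (U @ V - X))
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        V = V / np.maximum(norms, 1.0)          # project: ||v_k|| <= 1
    return V, sparse_code_ista(X, V, lam)
```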
Once the codebook V has been learned offline, the sparse coefficients U for new descriptors can be computed online, in real time.
2. Multi-class linear SVM
Trained with L-BFGS (the squared hinge loss used in the paper is differentiable, so quasi-Newton optimization applies).
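A minimal sketch of this step: a one-vs-all linear SVM with the differentiable squared hinge loss, minimized with SciPy's L-BFGS. Function names, the regularization value, and the absence of a bias term are my own simplifications.

```python
import numpy as np
from scipy.optimize import minimize

def train_ova_svm(X, y, n_classes, lam=1e-2):
    """One-vs-all linear SVM with the squared hinge loss
    sum max(0, 1 - t * w.x)^2 + lam * ||w||^2, minimized by L-BFGS.
    X: (N, D) ScSPM features, y: (N,) labels in {0..n_classes-1}."""
    N, D = X.shape
    W = np.zeros((n_classes, D))
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)        # +/-1 targets for class c

        def obj(w):
            hinge = np.maximum(1.0 - t * (X @ w), 0.0)
            loss = np.sum(hinge ** 2) + lam * w @ w
            grad = -2 * X.T @ (t * hinge) + 2 * lam * w
            return loss, grad

        W[c] = minimize(obj, np.zeros(D), jac=True, method="L-BFGS-B").x
    return W

def predict(W, X):
    # highest one-vs-all score wins
    return np.argmax(X @ W.T, axis=1)
```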
Linear spatial pyramid Matching Using Sparse Coding for Image Classification