Sparse Representation of Images: A Summary of ScSPM and LLC
Sparse Coding Series:
- (i) Spatial Pyramid Matching (SPM) summary
- (ii) Sparse representation of images: a summary of ScSPM and LLC
- (iii) Understanding sparse coding
- (iv) Sparse models and structured sparse models
---------------------------------------------------------------------------
The previous article covered SPM. This post summarizes ScSPM and LLC together, since both are improvements on SPM. These techniques are ways of describing features: they do not invent new low-level features (all of them extract SIFT, HOG, RGB histograms, etc.), and they do not use new classifiers (an SVM is still used for the final image classification). Their focus is on how to turn SIFT or HOG descriptors into an image-level representation (see Figure 1). From BoW to BoW+SPM, it is exactly this step that is being improved. At this point one might be confused: aren't SIFT and HOG themselves feature extraction? They already describe the image, so why do we need BoW and all the methods discussed later? The question is fair. SIFT and HOG are indeed already extracted features; write them as x. BoW+SPM is then a description of the feature x, turning it into φ(x), which amounts to a deeper model. A very similar idea is the kernel in SVM: k = φ(x)ᵀφ(x), where x is the input feature and φ(x) adds a layer of abstraction on top of it (although with a kernel function we never define φ(x) explicitly). As Kai Yu (of Baidu) summarized in his CVPR 2012 tutorial [5]: deeper models are preferred; naturally, adding a deeper layer of abstraction gives better results. For the same reason, deep learning has caught fire.
Borrowing again some pictures from Kai Yu's CVPR 2012 tutorial:
Figure 1
The work of SPM, ScSPM, and LLC also focuses on the feature-design step rather than the machine-learning step. It is worth noting that here we are still designing features, whereas deep learning designs feature learners.
The overall flow of BoW+SPM is shown in Figure 2:
Figure 2
The whole feature-extraction process is to extract low-level features (SIFT, HOG, etc.) and then, through coding and pooling, obtain the final feature representation.
- Coding: nonlinearly map the data into another feature space
- Pooling: aggregate the codes to obtain a histogram
SIFT and HOG are themselves the result of a coding+pooling process, so BoW+SPM amounts to two layers of coding+pooling. In that sense, features such as SIFT and SURF were proposed to find a better first-layer coding+pooling scheme, while SPM, ScSPM, and LLC were proposed to find a better second-layer coding+pooling scheme. The better coding scheme proposed by ScSPM and LLC is sparse coding. A sketch of the second-layer step is given below.
Figure 3
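To make the pipeline concrete, here is a minimal sketch of the second-layer coding+pooling in plain BoW, assuming SIFT-like descriptors and a codebook that were computed elsewhere (e.g. by k-means); all names and sizes are illustrative, not the papers' code. SPM would simply repeat the pooling step per spatial cell and concatenate the histograms.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 128))        # M SIFT-like descriptors from one image
codebook = rng.standard_normal((256, 128))  # K visual words (assumed learned by k-means)

# Coding (hard-VQ): assign each descriptor to its nearest visual word,
# using the expansion ||x - v||^2 = ||x||^2 - 2 x.v + ||v||^2.
d2 = (X**2).sum(1)[:, None] - 2 * X @ codebook.T + (codebook**2).sum(1)[None, :]
assignments = d2.argmin(axis=1)

# Pooling: a normalized histogram over the visual words (plain BoW;
# SPM repeats this per spatial pyramid cell and concatenates).
hist = np.bincount(assignments, minlength=codebook.shape[0]).astype(float)
hist /= hist.sum()
```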
Before summarizing ScSPM, a few more words. Why go from SPM to ScSPM? One reason is to find better coding and better pooling to improve accuracy; a second reason is to improve speed. Whose speed? Not the speed of coding+pooling, but the speed of the classifier. The features designed in SPM are linear, and in the experiments the authors pair them with a nonlinear SVM (Mercer kernels). Compared with a linear SVM, a nonlinear SVM is slower in both training and testing. To see why, look at the dual form of the SVM decision function:
$$f(\mathbf{z}) = \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{z}, \mathbf{z}_i) + b \qquad (1)$$
If the kernel is linear, i.e. $K(\mathbf{z}, \mathbf{z}_i) = \mathbf{z}^T \mathbf{z}_i$, the decision function can be rewritten as:
$$f(\mathbf{z}) = \Big(\sum_{i=1}^{n} \alpha_i y_i \mathbf{z}_i\Big)^{T}\mathbf{z} + b = \mathbf{w}^{T}\mathbf{z} + b \qquad (2)$$
Comparing the two, and setting aside the complexity of training and storage: in (1), each test sample requires evaluating $K(\mathbf{z}, \mathbf{z}_i)$ for every support vector, so the testing time complexity is O(n); in (2), $\mathbf{w}^{T}$ can be precomputed, so the testing time complexity per sample is O(1). In addition, linear classifiers scale better.
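As a quick numerical check of this point, the following sketch shows that with a linear kernel the expansion in (1) collapses to the single dot product in (2); all quantities (support vectors, coefficients, bias) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 128
support_vectors = rng.standard_normal((n, d))
coef = rng.standard_normal(n)          # plays the role of alpha_i * y_i (illustrative)
b = 0.3
z = rng.standard_normal(d)             # one test sample

f_kernel = coef @ (support_vectors @ z) + b   # evaluates K(z, z_i) for all i: O(n) per sample
w = coef @ support_vectors                    # precomputed once
f_linear = w @ z + b                          # O(1) in n per sample
assert np.isclose(f_kernel, f_linear)
```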
Therefore, it would be best if the feature obtained after coding+pooling could be fed directly to a linear classifier. Whether a nonlinear feature + linear SVM can match or beat a linear feature + nonlinear SVM thus became the research focus of ScSPM and LLC.
SPM uses hard vector quantization (hard-VQ) in its coding step, meaning that each descriptor is projected onto exactly one term of the dictionary. This causes large reconstruction error (worse reconstruction, large quantization errors); as a result, two very similar descriptors can become very dissimilar after coding. ScSPM relaxes this constraint: it argues that a descriptor may be projected onto a few dictionary terms rather than just one. Its objective function therefore becomes:
$$\min_{U,V} \sum_{m=1}^{M} \left\|\mathbf{x}_m - \mathbf{u}_m V\right\|^2 + \lambda \left\|\mathbf{u}_m\right\|_1 \quad \text{s.t. } \|\mathbf{v}_k\| \le 1, \ \forall k \qquad (3)$$
where M is the number of descriptors and $\mathbf{u}_m$ is the vector of projection coefficients of the m-th descriptor on the dictionary V.
Sparsity is obtained by constraining the projection coefficients with the L1 norm. The resulting optimization problem is the lasso (least absolute shrinkage and selection operator); it has no closed-form solution, so solving it is inevitably slow. For more on the L1 norm and the lasso, see here.
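Here is a minimal sketch of the coding step in (3) for a single descriptor, assuming the dictionary V has already been learned; scikit-learn's Lasso is used only to illustrate the L1-regularized reconstruction (it scales the data-fit term by 1/(2D), so its alpha is not exactly the λ in (3)), and the sizes and regularization weight are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
K, D = 256, 128                                  # codebook size, descriptor dimension
V = rng.standard_normal((K, D))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # enforce ||v_k|| <= 1
x = rng.standard_normal(D)                       # one SIFT-like descriptor

# min_u ||x - u V||^2 + lambda * ||u||_1  (columns of V.T are the dictionary atoms)
lasso = Lasso(alpha=0.15, fit_intercept=False, max_iter=5000)
lasso.fit(V.T, x)
u = lasso.coef_                                  # sparse code: only a few nonzeros
print("non-zero coefficients:", np.count_nonzero(u))
```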
Why is sparse coding good? Mainly for the following reasons:
1) As already mentioned, its reconstruction performance is good; [2]
2) Sparsity helps capture the salient patterns of the descriptors; [2]
3) Research on image statistics shows that image patches are sparse signals; [2]
4) Research on biological vision shows that the sparsity of signals helps learning; [4]
5) Sparse features are more likely to be linearly separable. [2]
In short, "sparse coding is a better building block".
After coding, the pooling used by ScSPM is max pooling: $z_j = \max_i u_{ij}$, in contrast to SPM's average pooling: $z_j = \frac{1}{M}\sum_i u_{ij}$. Note that average pooling is a linear feature representation while max pooling is nonlinear; this is how I understand the linear vs. nonlinear features mentioned above. (Update 13.08.11: while writing "Understanding sparse coding" today I realized this is not quite right. It is not only the pooling function that is linear; the codes U obtained by VQ coding are also linear with respect to X.)
In the experiments the authors find that max pooling outperforms average pooling, because max pooling is relatively robust to local spatial variations. Hard-VQ, however, does not go well with max pooling, because every element of its U is either 0 or 1. A small sketch of the two pooling rules follows.
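A minimal sketch contrasting the two pooling rules on a toy code matrix U (M descriptors × K codewords); the values are random and purely illustrative.

```python
import numpy as np

U = np.abs(np.random.default_rng(1).standard_normal((500, 256)))  # toy codes
z_avg = U.mean(axis=0)   # SPM-style average pooling: z_j = (1/M) * sum_i u_ij
z_max = U.max(axis=0)    # ScSPM-style max pooling:   z_j = max_i u_ij
```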
Another interesting experimental result is that ScSPM performs better as the codebook size grows, whereas for SPM the codebook size has little effect on the results. As for why, I do not understand.
LLC and ScSPM are very similar; both exploit sparsity. It is worth noting that hard-VQ is in fact also a kind of sparse coding, just one with a relatively large reconstruction error. LLC's improvement over ScSPM is the introduction of locality. To make this easy to describe, let me borrow a figure from the paper:
Figure 4
The figure explains the issue very well. VQ needs no further comment; the interesting comparison is between SC and LLC. LLC introduces a locality constraint: not only must the codes be sparse, the non-zero coefficients must also be assigned to dictionary terms that are similar to the descriptor. The authors explain in [4] why locality matters:
1) the first-order approximation of a nonlinear function requires the codes to be local;
2) locality guarantees sparsity, but sparsity does not guarantee locality;
3) sparse coding helps learning only when the codes have a local character.
In short, "locality is more essential than sparsity".
The objective function of the LLC is:
$$\min_{C} \sum_{i=1}^{N} \left\|\mathbf{x}_i - B\mathbf{c}_i\right\|^2 + \lambda \left\|\mathbf{d}_i \odot \mathbf{c}_i\right\|^2 \quad \text{s.t. } \mathbf{1}^{T}\mathbf{c}_i = 1, \ \forall i \qquad (4)$$
Like (3), (4) splits into two parts at the plus sign: minimizing the term before the plus sign reduces the quantization error (learning the dictionary and determining the projection coefficients), while the term after the plus sign expresses the model's assumptions (including regularization of the parameters). The problem has a closed-form solution, and there is also a fast approximation algorithm, so LLC is faster than ScSPM.
$\mathbf{d}_i$ measures the distance of $\mathbf{x}_i$ to each dictionary term; clearly its role is to shrink the coefficients of the terms that are far from $\mathbf{x}_i$. A sketch of the fast approximation is given below.
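The sketch below follows the fast approximated LLC coding described in [3]: keep only the k nearest codebook atoms and solve a small constrained least-squares problem on them. The parameter values (k, beta) and the exact regularization of the local covariance are my own illustrative choices, not necessarily the authors' settings.

```python
import numpy as np

def llc_approx_code(x, B, k=5, beta=1e-4):
    """Encode descriptor x (D,) over codebook B (K, D) with k-NN locality."""
    # 1) locality: keep only the k codebook atoms nearest to x
    idx = np.argsort(np.linalg.norm(B - x, axis=1))[:k]
    Bk = B[idx]                                          # (k, D)

    # 2) solve the small constrained least squares on the shifted local covariance
    z = Bk - x
    C = z @ z.T
    C += beta * np.trace(C) * np.eye(k)                  # regularize for stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                         # enforce 1^T c_i = 1

    # 3) scatter back into a full, mostly-zero code vector
    c = np.zeros(B.shape[0])
    c[idx] = w
    return c

# toy usage
rng = np.random.default_rng(0)
B = rng.standard_normal((256, 128))
x = rng.standard_normal(128)
c = llc_approx_code(x, B)
print("non-zero entries:", np.count_nonzero(c))          # at most k
```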
The biggest advantage of locality is that similar descriptors end up sharing similar dictionary terms, so the correlation between their codes is preserved. SC, in order to minimize reconstruction error, may select quite different terms for similar descriptors, so it cannot guarantee this smoothness. Hard-VQ is even worse in this respect.
In the experiments, max pooling + L2 normalization is used, as sketched below.
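A minimal sketch of that final step, where C is a toy matrix of LLC codes (M descriptors × K codewords) with made-up values:

```python
import numpy as np

C = np.abs(np.random.default_rng(2).standard_normal((500, 256)))  # toy LLC codes
z = C.max(axis=0)                  # max pooling over the descriptors
z /= np.linalg.norm(z) + 1e-12     # L2-normalize the pooled feature
```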
To finish, let me end this post by borrowing one more summary table from the first author of ScSPM (once again ending by borrowing someone else's figure).
References:
[1] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR 2006.
[2] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. Linear spatial pyramid matching using sparse coding for image classification. CVPR 2009.
[3] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, and Thomas Huang. Locality-constrained linear coding for image classification. CVPR 2010.
[4] Kai Yu, Tong Zhang, and Yihong Gong. Nonlinear learning using local coordinate coding. NIPS 2009.
[5] Kai Yu. CVPR 2012 tutorial on deep learning: Sparse coding.
-----------------------
jiang1st2010
Please indicate the source when referencing: http://blog.csdn.net/jwh_bupt/article/details/9837555