Why move from SPM to ScSPM? One reason is to improve performance by searching for better coding and better pooling. The second reason is speed. The speed in question is not that of coding and pooling, but that of the classifier. SPM is designed as a linear feature, yet in their experiments the authors paired it with a nonlinear SVM (using Mercer kernels). Compared with a linear SVM, a nonlinear SVM is slower in both training and testing. To see why, look at the dual form of the SVM decision function:
(1)  $f(z) = \sum_{i=1}^{n} \alpha_i K(z, z_i) + b$
If the kernel is linear, K(z, z_i) = z^T z_i, the SVM decision function can be rewritten as:
(2)  $f(z) = \left( \sum_{i=1}^{n} \alpha_i z_i \right)^T z + b = w^T z + b$
Setting aside the cost of training and storage, compare the two formulas at test time. Formula (1) has to evaluate K(z, z_i) once for each of its n terms for every test sample, so the time complexity of testing is O(n). In formula (2), w^T can be computed in advance, so the cost of each test is O(1) with respect to n. In addition, a linear classifier scales better.
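The difference between the two decision functions can be checked numerically. The following is a minimal sketch, assuming random stand-in values for the support vectors, dual coefficients, and bias; the names `decide_dual`, `decide_linear`, and `linear_kernel` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16                    # hypothetical sizes: n training samples, d dimensions
Z = rng.normal(size=(n, d))       # training samples z_i (random stand-ins)
alpha = rng.normal(size=n)        # dual coefficients alpha_i (labels folded in)
b = 0.5                           # bias term
z = rng.normal(size=d)            # one test sample


def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return float(u @ v)


def decide_dual(z, Z, alpha, b, kernel):
    """Form (1): one kernel evaluation per training sample, O(n) per test."""
    return sum(a * kernel(z, zi) for a, zi in zip(alpha, Z)) + b


# Form (2): w = sum_i alpha_i z_i is computed once, before any test sample arrives.
w = alpha @ Z


def decide_linear(z, w, b):
    """Form (2): a single dot product, independent of n."""
    return float(w @ z) + b


f_dual = decide_dual(z, Z, alpha, b, linear_kernel)
f_lin = decide_linear(z, w, b)
print(f_dual, f_lin)              # the two values agree up to rounding
```

With a nonlinear kernel, `w` cannot be precomputed this way, which is why the kernel sum has to be repeated for every test sample.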
Therefore, the ideal outcome of coding + pooling is a linearly separable feature description. Whether a nonlinear feature + linear SVM can match or even beat a linear feature + nonlinear SVM is thus the focus of the ScSPM and LLC research.
ScSPM
SPM uses hard VQ in the coding step, that is, each descriptor can be assigned to only one term in the dictionary. This causes a large reconstruction (quantization) error, so similar descriptors can end up very different after coding. ScSPM removes this constraint and lets a descriptor be represented by several dictionary terms, not just one. The objective function therefore becomes:
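(3)  $\min_{U,V} \sum_{m=1}^{M} \| x_m - u_m V \|^2 + \lambda |u_m| \quad \text{s.t. } \| v_k \| \le 1, \ \forall k$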
M is the number of descriptors, and u_m is the coefficient vector of the m-th descriptor over the dictionary V.
The coefficients are penalized with the L1 norm to enforce sparsity. This problem is known as the Lasso (least absolute shrinkage and selection operator). It yields sparse solutions but has no closed-form solution, so solving it is slow.
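To make the contrast between hard VQ and an L1-penalized code concrete, here is a minimal sketch for a single descriptor with a fixed dictionary. It assumes the per-descriptor objective ||x - uV||^2 + lam * |u|_1 and solves it with ISTA (iterative soft-thresholding), which is swapped in here as a simple illustrative solver and is not claimed to be the one used by the ScSPM authors. The function names, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(a, t):
    """Proximal operator of t * |.|_1: shrink every entry toward zero by t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def sparse_code_ista(x, V, lam=0.1, n_iter=200):
    """Approximately minimize ||x - u V||^2 + lam * |u|_1 over u.

    x: descriptor of shape (D,); V: dictionary of shape (K, D), one codeword per row.
    """
    u = np.zeros(V.shape[0])
    # Lipschitz constant of the gradient of the squared-error term.
    L = 2.0 * np.linalg.norm(V, 2) ** 2
    for _ in range(n_iter):
        grad = -2.0 * (x - u @ V) @ V.T
        u = soft_threshold(u - grad / L, lam / L)
    return u


def hard_vq(x, V):
    """Hard VQ: a one-hot code selecting the single nearest codeword."""
    u = np.zeros(V.shape[0])
    u[np.argmin(np.sum((V - x) ** 2, axis=1))] = 1.0
    return u


def objective(x, V, u, lam):
    """Value of the assumed per-descriptor objective."""
    return float(np.sum((x - u @ V) ** 2) + lam * np.sum(np.abs(u)))


# Toy data (assumption): random unit-norm codewords and a descriptor mixing two of them.
rng = np.random.default_rng(0)
V = rng.normal(size=(32, 8))
V /= np.linalg.norm(V, axis=1, keepdims=True)
x = 0.7 * V[3] + 0.3 * V[10]

u_sc = sparse_code_ista(x, V, lam=0.05)
u_vq = hard_vq(x, V)
print("nonzeros in hard VQ code:", np.count_nonzero(u_vq))
print("nonzeros in sparse code: ", np.count_nonzero(u_sc))
```

The iterative loop is the point of the example: unlike hard VQ, which is a single nearest-neighbor lookup, the L1-penalized code has to be found by repeated updates.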
There are several reasons why Sparse Coding is good:
1) it gives the low reconstruction error mentioned above; [2]
2) sparsity helps capture the salient patterns of descriptors; [2]
3) image statistics research shows that image patches are sparse signals; [2]
4) research on biological visual systems shows that sparse features of signals help learning; [4]
5) sparse features are more linearly separable. [2]
In short, "Sparse Coding is a better building block".
After coding, ScSPM pools with max pooling: z_j = max_i u_ij. SPM, in contrast, uses average pooling: z_j = (1/M) * Σ_i u_ij. Average pooling is a linear feature representation, while max pooling is nonlinear.
In their experiments, the authors found that max pooling works better than average pooling, because max pooling is robust to local spatial variations. Hard VQ is not combined with max pooling, because with hard VQ every element of U is either 0 or 1.
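The two pooling rules, and the claim that one is linear and the other is not, can be sketched as follows. The code matrices are made-up toy values, and the coefficients are assumed to be non-negative (otherwise `np.abs` would be applied before taking the maximum).

```python
import numpy as np


def average_pool(U):
    """z_j = (1/M) * sum_i u_ij, for a code matrix U of shape (M, K)."""
    return U.mean(axis=0)


def max_pool(U):
    """z_j = max_i u_ij (assumes non-negative coefficients)."""
    return U.max(axis=0)


# Two toy sparse code matrices (M = 2 descriptors, K = 3 codewords).
U_a = np.array([[0.9, 0.1, 0.0],
                [0.0, 0.8, 0.0]])
U_b = np.array([[0.2, 0.0, 0.7],
                [0.5, 0.3, 0.0]])

# Average pooling is linear: pooling a sum equals the sum of the pooled vectors.
avg_is_linear = np.allclose(average_pool(U_a + U_b),
                            average_pool(U_a) + average_pool(U_b))

# Max pooling is not: the same identity fails on this example.
max_is_linear = np.allclose(max_pool(U_a + U_b),
                            max_pool(U_a) + max_pool(U_b))

# Hard-VQ codes are one-hot rows, so max pooling only records presence,
# while average pooling keeps the histogram of assignments.
U_vq = np.array([[1, 0, 0],
                 [1, 0, 0],
                 [0, 1, 0],
                 [1, 0, 0]], dtype=float)
vq_avg = average_pool(U_vq)
vq_max = max_pool(U_vq)
print(avg_is_linear, max_is_linear, vq_avg, vq_max)
```

On the one-hot codes, max pooling collapses to a binary "was this codeword used at all" vector, which is why it adds little for hard VQ.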
Another interesting experimental result is that ScSPM performs better with larger codebooks. In contrast, codebook size has little impact on SPM's results.
LLC
LLC is similar to ScSPM and also relies on sparsity. It is worth noting that hard VQ is itself a sparse coding, only one with a large reconstruction error. LLC improves on ScSPM by introducing locality. For ease of explanation, here is a figure borrowed from the paper:
Figure (4)
This figure explains the problem very well. VQ needs no further comment; the focus is on the relationship between SC and LLC. LLC introduces a locality constraint: the code must not only be sparse, but its nonzero coefficients should also be assigned to dictionary terms that are close to the descriptor. As explained in [4], locality is very important because:
1) the first-order approximation of nonlinear functions requires the codes to be local;
2) locality ensures sparsity of the codes, but sparsity does not ensure locality;
3) sparse coding helps learning only when the codes are local.
In short, "locality is more essential than sparsity".
The objective function of LLC is:
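(4)  $\min_{C} \sum_{i=1}^{N} \| x_i - B c_i \|^2 + \lambda \| d_i \odot c_i \|^2 \quad \text{s.t. } \mathbf{1}^T c_i = 1, \ \forall i$
where B is the dictionary, c_i is the code of descriptor x_i, and ⊙ denotes element-wise multiplication.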
Like (3), (4) splits into two parts at the plus sign: the term before the plus sign is minimized to reduce the quantization error (learning the dictionary and determining the projection coefficients); the term after the plus sign is the prior constraint (a regularizer on the coefficients). Unlike (3), this problem has a closed-form solution, and a fast approximation algorithm is also available, so LLC is faster than ScSPM.
d_i holds the distances from x_i to each dictionary term. Its purpose is to shrink the coefficients of dictionary terms that are far from x_i.
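The fast approximation can be sketched in a few lines. The version below is an assumption-laden illustration rather than the authors' code: it replaces the distance-weighted penalty with a hard selection of the `knn` nearest codewords and then solves a small constrained least-squares problem in closed form. The function name `llc_code` and the parameters `knn` and `reg` are hypothetical.

```python
import numpy as np


def llc_code(x, B, knn=5, reg=1e-4):
    """Approximate locality-constrained code for one descriptor.

    x: descriptor of shape (D,); B: codebook of shape (K, D), one codeword per row.
    Returns a length-K code that is nonzero only on the knn nearest codewords
    and whose entries sum to 1.
    """
    # Locality: keep only the knn codewords closest to x.
    dist2 = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(dist2)[:knn]

    # Closed-form solve of min ||x - w^T B_idx||^2 subject to sum(w) = 1.
    Zc = B[idx] - x                      # shift the neighbors so x is the origin
    C = Zc @ Zc.T                        # local Gram matrix, shape (knn, knn)
    C += (reg * np.trace(C) + 1e-12) * np.eye(knn)   # regularize for stability
    w = np.linalg.solve(C, np.ones(knn))
    w /= w.sum()                         # enforce the sum-to-one constraint

    c = np.zeros(B.shape[0])
    c[idx] = w
    return c


# Toy data (assumption): a random codebook and a random descriptor.
rng = np.random.default_rng(0)
B = rng.normal(size=(64, 8))
x = rng.normal(size=8)

c = llc_code(x, B, knn=5)
print("nonzero entries:", np.flatnonzero(c))
print("sum of code:", c.sum())
```

Each descriptor costs one nearest-neighbor search plus one small linear solve, with no iterative optimization, which is where the speed advantage over an L1 solver comes from.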
Summary
ScSPM mainly improves on the hard voting used by SPM.
ScSPM and LLC can both be understood as feature encoding methods: given a new feature vector, how should it be represented with the previously clustered codebook? VQ is hard voting (the nearest-neighbor rule picks the single closest codeword), ScSPM is soft voting (a combination of several codewords under a sparsity constraint), and LLC is soft voting under a locality constraint.
From: http://blog.csdn.net/jwh_bupt/article/details/9837555