Image Sparse Coding Representation


Note: This article draws on the CVPR paper "Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification", as well as "Image Classification by Non-negative Sparse Coding, Low-rank and Sparse Decomposition" and "Image Visual Feature Extraction and Application Based on Sparse Coding".

These study notes reflect my own understanding; if anything is wrong, please point it out so we can improve together!

After extracting the SIFT features of all training images, each image must be visually encoded. The purpose of visual feature coding is to select and transform the original feature vectors so as to obtain the most expressive and discriminative visual feature vectors of the image, which the computer can then process more efficiently. The most common encoding method is vector quantization; another visual coding method, sparse coding, can represent the image better.

1. Vector quantization

The basic idea of vector quantization is to look for the nearest neighbor of the target vector in the base-vector space and then use the index of that base vector to represent the original target vector:

$$k = \arg\min_i \| x - d_i \|_2^2$$

where x is a SIFT feature vector, d_i is the i-th cluster center in the base-vector space, and k is the index of the base vector to which the feature vector x is mapped.

In the process of vector quantization the basic steps are as follows (a minimal MATLAB sketch is given after the list):
1) Normalize all training samples;
2) Cluster the training samples to obtain a number of class centers, which together form the base-vector space;
3) For each target vector, find its nearest neighbor among the class centers.
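
As an illustration of these three steps, here is a minimal sketch; the variable names feats and numWords are illustrative, and kmeans requires the Statistics and Machine Learning Toolbox.

feats = feats ./ sqrt(sum(feats.^2, 1));   % 1) normalize each SIFT column vector
[~, C] = kmeans(feats', numWords);         % 2) cluster; C is numWords x 128
D = C';                                    %    keep the class centers as columns
x = feats(:, 1);                           % one target vector
[~, k] = min(sum((D - x).^2, 1));          % 3) index of its nearest base vector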

In the BOW model, clustering first produces the visual keywords, and then each SIFT feature is encoded by vector quantization. For an image described by SIFT features, the counts of features assigned to each visual keyword, collected into the histogram

$$h = [h_1, h_2, \dots, h_K], \qquad h_k = \#\{\, n : x_n \text{ is mapped to } d_k \,\},$$

constitute the encoding of this image.
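
A minimal sketch of this histogram step, assuming idx is a vector holding the nearest-center index (between 1 and K) of every SIFT descriptor of one image:

h = accumarray(idx(:), 1, [K, 1]);   % count how often each visual keyword occurs
h = h / sum(h);                      % normalize the histogram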
The advantage of vector quantization is that the computation is simple and the data compression rate is high; the disadvantage is the large quantization loss, which makes it hard to meet the requirements of some applications.

2. Sparse coding

The essence of sparse coding is that a target vector can be linearly fitted by a small number of base vectors, with some redundancy in the base-vector space. The difference from vector quantization is that in vector quantization every target vector is represented by exactly one base vector; in other words, the constraint of vector quantization is too strict, which causes a large reconstruction error. The difference can be expressed as follows:

$$\min_{U,V} \sum_{n} \| x_n - V u_n \|_2^2 \quad \text{s.t. each } u_n \text{ has exactly one non-zero entry, equal to } 1 \qquad \text{(vector quantization)}$$

$$\min_{U,V} \sum_{n} \| x_n - V u_n \|_2^2 + \lambda \sum_{n} \| u_n \|_1 \qquad \text{(sparse coding)} \qquad (1)$$

where the columns of V are the base vectors and u_n is the coefficient vector of the feature x_n.
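To see the difference concretely, the following sketch codes a single descriptor x against a dictionary V with unit-norm columns. MATLAB's lasso (Statistics and Machine Learning Toolbox) is used here only as a stand-in for the feature-sign solver that appears later, and the regularization value 0.15 is purely illustrative.

[~, k] = min(sum((V - x).^2, 1));     % VQ: pick the single nearest base vector
u_vq = zeros(size(V, 2), 1);
u_vq(k) = 1;                          % exactly one non-zero coefficient
u_sc = lasso(V, x, 'Lambda', 0.15, 'Standardize', false);
nnz(u_sc)                             % SC: several non-zero coefficients
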
Sparse coding for images is generally divided into two processes:

The first is the training of the base vectors, also known as dictionary learning. In this process a large number of training samples are used, and a set of redundant base vectors is obtained by unsupervised learning. These base vectors usually reflect primitives that capture the essential characteristics of the training samples, such as edges and corner points; experiments show that this dictionary-learning process resembles the way information is processed in the human visual cortex.

With a norm constraint added on the base vectors, the optimization problem (1) becomes a constrained least-squares problem, namely:

$$\min_{U,V} \sum_{n} \| x_n - V u_n \|_2^2 + \lambda \sum_{n} \| u_n \|_1 \quad \text{s.t.} \; \| v_k \|_2^2 \le 1, \; k = 1, \dots, K$$

For a given X, this optimization problem is solved by alternating: one variable is held fixed while the other is trained, and the two steps are iterated.

The second is the solving (linear-fitting) process. Any target vector x_n can be formed as a linear combination of several entries of the dictionary V; depending on the constraints, different fitting coefficients u_n are obtained, and these coefficient vectors are then used to represent the image features.

At this point V is known and the sparse code U of the image features X is sought, so the problem becomes:

$$\min_{U} \sum_{n} \| x_n - V u_n \|_2^2 + \lambda \sum_{n} \| u_n \|_1$$

with V fixed, which decomposes into an independent L1-regularized least-squares problem for each x_n.
3. The experimental process

1. A random feature vector is extracted from each image to form the initial training sample X (128*2600):
X = zeros(128, 2600);                         % one SIFT descriptor per training image
for ii = 1:2600
    fpath = training.path{ii};
    load(fpath);                              % loads feaSet, the SIFT features of image ii
    num_fea = size(feaSet.feaArr, 2);
    rndidx = randperm(num_fea);
    X(:, ii) = feaSet.feaArr(:, rndidx(ii));  % take one randomly chosen descriptor
end

2. Learning the visual dictionary V
a. The initial visual dictionary V is generated by a random function: a random 128*300 matrix is produced and used as the initial visual dictionary.

V = rand(128, 300) - 0.5;                    % randomly generate the initial visual dictionary V
V = V - repmat(mean(V, 1), size(V, 1), 1);   % make each column zero-mean
V = V * diag(1 ./ sqrt(sum(V .* V)));        % normalize each column to unit L2 norm

b. Using the newly obtained V, the sparse codes U of the samples X are computed:

U = L1QP_FeatureSign_Set(X, V, lambda);

Given the training samples X and the visual dictionary V, this function solves for the sparse codes:

$$\min_{U} \| X - V U \|_F^2 + \lambda \sum_{i,j} | u_{ij} |$$

i.e. problem (1) with V held fixed.
c. Using the newly obtained U, V is trained in turn:

V = l2ls_learn_basis_dual(X, U, pars.VAR_basis);

Given the training samples X and their sparse codes U, this function learns V under the condition that each base vector has bounded norm (the bound c is supplied through pars.VAR_basis):

$$\min_{V} \| X - V U \|_F^2 \quad \text{s.t.} \; \| v_k \|_2^2 \le c, \; k = 1, \dots, 300$$

a least-squares problem with quadratic constraints, solved through its Lagrange dual.
d. Steps b and c are iterated alternately (sketched below), finally yielding the visual dictionary V and the sparse codes U of the training samples X.
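
Putting steps a-d together, the alternation can be sketched as follows; the values of lambda and the iteration count are illustrative, not the ones used in the original experiment.

lambda = 0.15;
for iter = 1:50
    U = L1QP_FeatureSign_Set(X, V, lambda);           % step b: codes with V fixed
    V = l2ls_learn_basis_dual(X, U, pars.VAR_basis);  % step c: dictionary with U fixed
end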

3. For each image, the resulting visual dictionary V is applied to obtain its sparse code U.

4. The following code (through sc_approx_pooling) pools the sparse codes of each image into a 21*300 matrix U, where u_ij is the pooled response of the SIFT features in the i-th spatial block to the j-th dictionary entry; this matrix is then flattened into a 6300-dimensional vector (21 blocks * 300 dictionary entries) that represents the image.
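
The pooling can be pictured with the following rough sketch (this is not the actual sc_approx_pooling implementation); codes is assumed to hold the 300 x M sparse codes of one image's descriptors and blockIdx the pyramid block (1..21) that each descriptor falls into.

pooled = zeros(21, 300);                    % 21 pyramid blocks x 300 dictionary entries
for b = 1:21
    members = (blockIdx == b);              % descriptors falling in block b
    if any(members)
        pooled(b, :) = max(abs(codes(:, members)), [], 2)';  % max pooling per entry
    end
end
imageFeature = pooled(:);                   % flattened 6300-dimensional representation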

sc_fea = zeros(6300, 2600);        % sparse encodings of all training images
sc_label = zeros(2600, 1);

for iter1 = 1:2600
    fpath = database.path{iter1};
    load(fpath);                   % loads feaSet, the SIFT features of this image
    % Each pyramid level of each image yields a 300*n sparse-encoding matrix; the
    % pooled codes of all levels are concatenated with their weights into the
    % final sparse encoding that represents this image.
    sc_fea(:, iter1) = sc_approx_pooling(feaSet, V, pyramid, gamma);
    sc_label(iter1) = database.label(iter1);
end

To summarize: sparse coding first trains on the SIFT features of all images to obtain the base vectors, i.e. the visual keywords V. Then, for each image, the coefficient vector u of every feature point is computed; u contains several non-zero coefficients that fit several base vectors, and together these vectors form the coefficient encoding U of the image. Multiplying the base vectors V by U approximately reconstructs the image features X. Finally, the method (sc_approx_pooling) processes the sparse representation U of each image to obtain a fixed-length vector, whose dimension is determined by the number of visual keywords (here 21 * 300 = 6300), to represent this image.



