Why BoW and LSH Are Still Used in Image Retrieval


At the end of last year, in a blog post, I used the ANN framework to explain the BoW model [1] and compared it with hashing methods such as LSH [2], concluding that BoW is essentially a learned hash function. Earlier last year, I briefly introduced sparse-coding models such as LLC [3], and the relevant papers almost unanimously conclude that these sparse representations consistently outperform BoW in image recognition. That left me with two questions:

1. If BoW beats LSH in retrieval, why not replace LSH with BoW everywhere?
2. If newer methods such as ScSPM and LLC are uniformly better than BoW, can these sparse models replace BoW for representing image features?


After a little thought, answers to both questions gradually took shape. In this post I will try to explain the Bag-of-Words model and why LSH still has a place in retrieval problems.

I. Review of LSH

The LSH method itself has been introduced in many articles (see here and here). The main idea is to apply multiple random projections to all points in the feature space (equivalent to randomly partitioning that space): the closer two points are, the more likely their projections take the same value. Each projection usually yields one binary value (0 or 1), so after N random projections a point x_i is mapped to an N-dimensional binary vector q_i, which is x_i's LSH code.
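To make this concrete, here is a minimal sketch of the random-hyperplane flavor of LSH (the feature dimension and number of bits below are arbitrary illustrative choices, not values from the papers):

```python
import numpy as np

def lsh_encode(X, n_bits=32, seed=0):
    """Map feature vectors X (m x d) to n_bits-bit binary codes:
    bit j of a point is 1 iff the point lies on the positive side
    of the j-th random hyperplane."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))  # one random projection per bit
    return (X @ W > 0).astype(np.uint8)            # (m, n_bits) matrix of 0/1

# Nearby points fall on the same side of most hyperplanes,
# so their codes differ in only a few bits:
x = np.random.rand(1, 128)
x_near = x + 0.01 * np.random.randn(1, 128)
print((lsh_encode(x) != lsh_encode(x_near)).sum())  # small Hamming distance
```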
The problem is that LSH projects randomly (see Figure 1). As mentioned in the earlier post, this randomness fails to exploit the actual distribution of the samples, so N must be very large to achieve good results. In [2] the authors therefore learn the LSH projection functions (using BoostSSC and RBMs); the effect is shown in Figure 3. Learned LSH discriminates better with fewer projection functions. This resembles what BoW does (both learn a partition of the original feature space), except that BoW's partition of the feature space is non-linear (see Figure 2) while LSH's is linear.


[Figure 1: random LSH projections partitioning the feature space]


[Figure 2: BoW's non-linear partition of the feature space]


[Figure 3: discrimination of learned LSH (BoostSSC/RBM) versus random LSH]


II. LSH vs. BoW: what features are encoded?
(In what follows I will not distinguish between LSH learned with BoostSSC and with RBMs.) LSH is generally applied to an image's global features, such as GIST, HOG, or HSV color histograms. In other words, LSH encodes one feature into another feature, which has something of a dimensionality-reduction flavor: after N projections, the feature is reduced to an N-bit binary code.
BoW, by contrast, generally encodes an image's local features, such as SIFT or MSER. BoW aggregates a set of features (local features) into a single feature (a global feature). This aggregation property is the biggest difference between BoW and LSH.
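As a sketch of that aggregation step (assuming a codebook already learned offline, e.g. by k-means; all sizes below are illustrative):

```python
import numpy as np

def bow_encode(descriptors, codebook):
    """Aggregate a set of local descriptors (n x d), e.g. SIFT, into a
    single global histogram over a codebook of k visual words (k x d):
    hard-assign each descriptor to its nearest codeword, then count."""
    # squared Euclidean distance from every descriptor to every codeword
    d2 = (descriptors ** 2).sum(1)[:, None] \
         + (codebook ** 2).sum(1)[None, :] \
         - 2.0 * descriptors @ codebook.T
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalized global feature

codebook = np.random.rand(1000, 128)    # stand-in for a learned vocabulary
descriptors = np.random.rand(300, 128)  # stand-in for one image's SIFT set
print(bow_encode(descriptors, codebook).shape)  # (1000,): one bin per word
```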

III. LSH vs. BoW: how retrieval and ranking differ
First, LSH. Suppose two samples $x_1$ and $y_1$ have been LSH-encoded as $q_1$ and $q_2$. The distance between the two samples can then be computed as

$$d(q_1, q_2) = \sum_{i=1}^{N} \left| q_1^{(i)} - q_2^{(i)} \right| \qquad (1)$$

i.e. the Hamming distance between the two LSH codes. Now suppose we have a dataset whose images are denoted $d_i$, and a query image denoted $q$. Assuming the query and all dataset images have already been LSH-encoded, there are two ways to retrieve images:
a) Build a hash table with each $d_i$'s code as its key, so every $d_i$ hangs under a unique key. Encode the query, then look it up in the table: any code differing from the query's in no more than $D$ bits is considered a neighbor, and the images stored under those keys are returned. This is extremely fast (almost no computation); the drawback is that the hash table gets very large, with $2^N$ slots.
b) If $N$ is greater than 30 (at which point the hash table in (a) becomes too large), exhaustive search is usually used instead: compute the Hamming distance from $q$ to every $d_i$ by Eq. (1) and sort. Since the codes are binary this is still very fast (12M images can be ranked in under a second); a sketch follows below.
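A sketch of option (b), exhaustive Hamming ranking over 0/1 codes via Eq. (1) (in a production system the bits would be packed into machine words and compared with XOR plus popcount; sizes here are illustrative):

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank all database codes by Hamming distance to the query.
    query_code: (n_bits,) array of 0/1; db_codes: (m, n_bits) of 0/1."""
    dists = (db_codes != query_code).sum(axis=1)  # Eq. (1) for every d_i
    return np.argsort(dists, kind="stable"), dists

db_codes = (np.random.rand(1_000_000, 64) > 0.5).astype(np.uint8)
query = db_codes[42].copy()
query[:3] ^= 1                        # perturb 3 bits of a known entry
ranking, dists = hamming_rank(query, db_codes)
print(ranking[0], dists[ranking[0]])  # index 42 should come first, at distance 3
```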

Next, how BoW does retrieval (which everyone is familiar with). Assuming each image's global feature vector has been obtained through BoW, the similarity of two vectors is usually measured by a histogram distance between them, and results are ranked accordingly. Because BoW features are sparse, an inverted index can be used to speed up retrieval, as sketched below.
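A minimal inverted-index sketch (illustrative code, not from the original post): only images sharing at least one visual word with the query are ever touched, which is what makes sparse BoW vectors fast to search:

```python
from collections import defaultdict

def build_index(bow_vectors):
    """word id -> list of (image id, weight), kept only for non-zero bins."""
    index = defaultdict(list)
    for img_id, hist in enumerate(bow_vectors):
        for word_id, w in enumerate(hist):
            if w > 0:
                index[word_id].append((img_id, w))
    return index

def search(query_hist, index):
    """Accumulate dot-product similarity via posting lists, then sort."""
    scores = defaultdict(float)
    for word_id, qw in enumerate(query_hist):
        if qw > 0:
            for img_id, w in index.get(word_id, []):
                scores[img_id] += qw * w
    return sorted(scores.items(), key=lambda kv: -kv[1])

index = build_index([[0, 2, 1], [1, 0, 0], [0, 1, 0]])  # 3 tiny BoW vectors
print(search([0, 1, 1], index))  # only images 0 and 2 share words with the query
```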

IV. Can BoW replace LSH?

BoW is a mapping from a set of features to one feature. You might ask: when the "set of features" contains just one feature (i.e. a global feature), can't BoW also be used to encode global features? This does not work well, because in that case BoW is no longer equivalent to LSH. Why? An image yields only one GIST vector; after BoW encoding, the whole vector has the value 1 in a single bin and 0 everywhere else. As a result, the similarity between two images is either 0 or 1. Imagine a real image retrieval system in which every similarity score is either 0 or 1: all "similar" pictures score exactly 1, the scores are binary, and it is almost impossible to measure degrees of similarity. So BoW is better suited to use with local features. Admittedly, LSH's indexing method (a) is similar: images whose codes collide in the hash table are retrieved with no graded notion of similarity. True, but there is still a difference: an LSH code is never as extreme as BoW's one-hot vector (a single 1 with all other entries 0), so the distance computed by Eq. (1) can still reflect the original similarity of the two features. For comparing global features, then, LSH remains more useful.
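The degenerate case is easy to demonstrate in a few lines (illustrative numbers, standalone sketch):

```python
import numpy as np

codebook = np.random.rand(1000, 128)  # stand-in visual vocabulary

def bow_on_global(feat):
    """BoW applied to a single global vector yields a one-hot histogram."""
    hist = np.zeros(len(codebook))
    hist[((codebook - feat) ** 2).sum(axis=1).argmin()] = 1.0
    return hist

a, b = np.random.rand(128), np.random.rand(128)
# Two one-hot vectors either hit the same bin or not, so any histogram
# similarity collapses to exactly 0 or 1 -- no graded ranking is possible:
print(bow_on_global(a) @ bow_on_global(b))
```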

V. Can LSH replace BoW?

When BoW handles local features, it effectively performs a point-to-point match between two images. If we laid all possible LSH codes out as the bins of a one-dimensional vector, I think that, to some extent, this would achieve a BoW-like effect.
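One way to make this reading concrete (my interpretation, not code from the post): treat each of the 2^N possible LSH codes as a "visual word" and histogram the codes of an image's local descriptors:

```python
import numpy as np

def lsh_bow(descriptors, n_bits=10, seed=0):
    """BoW-like aggregation where the 2**n_bits possible LSH codes play
    the role of visual words: LSH-encode each local descriptor, turn its
    bits into an integer code, and histogram the codes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((descriptors.shape[1], n_bits))
    bits = (descriptors @ W > 0).astype(np.int64)            # (n, n_bits) of 0/1
    codes = bits @ (2 ** np.arange(n_bits, dtype=np.int64))  # bits -> integer code
    hist = np.bincount(codes, minlength=2 ** n_bits).astype(float)
    return hist / max(hist.sum(), 1.0)

print(lsh_bow(np.random.rand(300, 128)).shape)  # (1024,): one bin per code
```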

VI. Can LLC replace BoW?

Not entirely. Although LLC outperforms BoW on recognition problems, BoW codebooks can be trained to be very large (up to 1,000,000 dimensions) thanks to HKM [4] and AKM [5]. LLC and other learned encodings are not so lucky; reaching a few tens of thousands of dimensions is already a stretch. Although BoW performs worse at equal dimensionality, its advantage shows at the million-dimension scale. That is why, in retrieval problems, BoW remains so popular.


----------------------------

References:

[1] Video Google: A Text Retrieval Approach to Object Matching in Videos

[2] Small Codes and Large Image Databases for Recognition

[3] Locality-constrained Linear Coding for Image Classification

[4] Scalable Recognition with a Vocabulary Tree

[5] Object Retrieval with Large Vocabularies and Fast Spatial Matching

jiang1st2010. Please credit the source when reproducing: http://blog.csdn.net/jwh_bupt/article/details/27713453
