Bag of Features (BOF) Image retrieval algorithm


1. First, use the SURF algorithm to extract feature points and descriptors for every image in the image library.
2. Run k-means over all the feature descriptors in the library to produce the cluster centers (the visual words).
3. Generate a BOF vector for each image: assign each of the image's feature points to its nearest cluster center, then count how many points fall to each center. The resulting frequency histogram is the initial, unweighted BOF.
4. Weight the frequency histogram with TF-IDF to obtain the final BOF. (Each cluster center contributes differently to describing an image. For example, the leading digit of a supermarket barcode is always 6, so it carries no information for distinguishing products and its weight should be reduced.)
5. Apply steps 3-4 to the incoming query image to obtain the query's BOF vector.
6. Compute the angle between the query's BOF vector and the BOF vector of each image in the library; the match is the image with the smallest angle (largest cosine similarity).
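The six steps above can be sketched end to end with toy data. A real system would extract SURF or SIFT descriptors (e.g. with OpenCV); here random 8-D vectors stand in for descriptors, and the tiny k-means and all variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain k-means (step 2): returns the k cluster centers (visual words)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bof_histogram(desc, centers):
    """Step 3: assign each descriptor to its nearest center and count."""
    d = np.linalg.norm(desc[:, None, :] - centers[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)

# toy "descriptors" for 6 library images (real SURF descriptors are 64-D)
library = [rng.normal(loc=i, size=(200, 8)) for i in range(6)]
centers = kmeans(np.vstack(library), k=10)
hists = np.array([bof_histogram(d, centers) for d in library])

# step 4: tf-idf weighting down-weights words present in many images
tf = hists / hists.sum(axis=1, keepdims=True)
df = (hists > 0).sum(axis=0)
idf = np.log(len(hists) / np.maximum(df, 1))
bof = tf * idf

# steps 5-6: smallest angle == largest cosine similarity
query_desc = library[0] + rng.normal(scale=0.05, size=library[0].shape)
query = bof_histogram(query_desc, centers)
q = (query / query.sum()) * idf
cos = bof @ q / (np.linalg.norm(bof, axis=1) * np.linalg.norm(q) + 1e-12)
best = int(cos.argmax())   # index of the best-matching library image
```

Note that the BOF vector has a fixed length (the dictionary size, 10 here) no matter how many feature points an image contains, which is exactly the property motivated in the next paragraph.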
LSH makes fast image search possible: it answers high-dimensional feature queries with a probabilistic guarantee. But when the author experimented with image retrieval using LSH over SIFT features, each image involved hundreds of features, so querying a single image meant running hundreds of individual feature lookups. Even after filtering the query image's feature points down by 50%, the number of lookups per query was still far from small. So: is there a way to represent all the feature vectors of an image with a single vector of fixed dimension, where that dimension does not depend on how many feature points the image has? The method described in this article solves exactly this problem, although it was not originally invented for it.

The bag-of-words model originates in text classification. For a given text it ignores word order, grammar, and syntax, treating the text simply as a collection of words: each word is assumed to occur independently, without depending on the presence of other words or on the words that preceded it.

An image can likewise be regarded as a document, with its local regions or local features playing the role of the words that make up the image: similar regions, or similar features, are treated as the same word. In this way, the methods of text retrieval and classification can be carried over to image classification and retrieval.

Accelerating bag-of-features SIFT algorithm for 3D Model retrieval
The bag-of-features model is the bag-of-words method of text retrieval carried over to images: each image is described as an unordered set of local patches/keypoints and their features. The local features are clustered with an algorithm such as k-means, and each cluster center is treated as a visual word in a dictionary, the counterpart of a word in text retrieval; representing a feature by the codeword of its cluster center can be seen as a quantization step. All the visual words together form the visual vocabulary, corresponding to a codebook (the set of codewords), and the number of words in the dictionary reflects the dictionary's size. Each feature in an image is mapped to a word of the visual vocabulary by computing feature distances, and the occurrences of each visual word are then counted, so the image can be described as a histogram vector whose dimension equals the dictionary size: its bag-of-features.
Bag of Features Codebook Generation by self-organisation
Bag-of-features is most often used for image classification and object recognition. Under the idea above, bag-of-features vectors are extracted from a training set and fed to a supervised learner (such as an SVM) to train a classification model for the object or scene. For a test image, local features are extracted, the distance from each local feature to every codeword in the dictionary is computed, and the nearest codeword is chosen to represent the feature; a histogram is then built by counting how many features belong to each codeword, giving the test image's bag-of-features vector, which the trained model classifies.
Classification Process
1. Local feature extraction: obtain patches by segmentation, dense or random sampling, keypoint detection, or stable/salient region detection, and compute a descriptor for each patch.

Among these, the SIFT descriptor is the most popular.

2. Build the visual dictionary: cluster the local descriptors, and let the cluster centers form the visual vocabulary.

3. Generate the codebook representation, i.e. construct the bag-of-features vector; this is the projection of the local features onto the dictionary.

4. Train an SVM classification model on the BOF features of the training set, and use it to predict the BOF feature of a test image.
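A minimal sketch of step 4, assuming the BOF histograms have already been computed. A tiny linear SVM trained by Pegasos-style subgradient descent stands in here for a library implementation (e.g. scikit-learn's LinearSVC); the toy data and all names are illustrative:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the hinge loss.
    Labels y must be in {-1, +1}; returns weight vector w and bias b."""
    rng = np.random.default_rng(1)
    w, b, t = np.zeros(X.shape[1]), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)           # shrink from the L2 regularizer
            if margin < 1:                 # hinge-loss subgradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# toy BOF histograms for two object classes over a 6-word vocabulary:
# class +1 uses mostly words 0-1, class -1 mostly words 3-4
rng = np.random.default_rng(2)
pos = rng.poisson([8, 7, 1, 1, 0, 0], size=(20, 6)).astype(float)
neg = rng.poisson([0, 1, 1, 8, 7, 2], size=(20, 6)).astype(float)
X = np.vstack([pos, neg])
y = np.array([1] * 20 + [-1] * 20)

w, b = train_linear_svm(X, y)
acc = (np.sign(X @ w + b) == y).mean()   # training accuracy of the model
```

In practice the BOF histograms would first be TF-IDF-weighted or L1-normalized, and a kernel SVM (e.g. with a histogram intersection kernel, discussed later) often works better than a linear one.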

Retrieval Process
The first use of bag-of-words in computer vision appeared in work by Andrew Zisserman[6] on video scene search, which proposed representing image information by projecting keypoints onto a bag of words. Later researchers came to call this method bag-of-features and applied it to image classification, object recognition, and image retrieval. Building on bag-of-features, Zisserman further borrowed the TF-IDF (term frequency-inverse document frequency) model from text retrieval to weight the bag-of-features vector. With that in place, the inverted-index techniques of text search engines can be used to index the images and carry out retrieval efficiently.

Hamming embedding and weak geometric consistency for large scale image search
The retrieval process is not fundamentally different from the classification process; the differences lie mostly in the details:
1. Local feature extraction;
2. Construct the visual dictionary;
3. Generate the initial BOF features;
4. Introduce TF-IDF weights:
TF-IDF is a weighting technique commonly used in information retrieval; in text retrieval it evaluates how important a word is to one document within a document collection. A word's importance grows in proportion to how often it appears in the document, but falls in proportion to how often it appears across the collection. The idea behind TF: if a keyword appears with high frequency in an article, it characterizes the article's content, and if it rarely appears in other articles, it contributes strongly to distinguishing categories. The idea behind IDF: the fewer the documents in the collection that contain word A, the larger the IDF and the better word A separates categories.
Term frequency (TF) is the number of occurrences of a given word in a document. For example, TF = 0.030 (3/100) means the word 'A' appears 3 times in a document of 100 words.
Inverse document frequency (IDF) describes how discriminative a particular word is: if a word is found in many documents, it separates documents poorly and is given a small weight, and vice versa. For example, IDF = 13.287 (log₂(10,000,000/1,000)) means that 1,000 out of 10,000,000 documents contain the word 'A'.
The final TF-IDF weight is the product of the term frequency and the inverse document frequency.
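The two worked figures above combine as follows (note that the quoted IDF value of 13.287 corresponds to a base-2 logarithm):

```python
import math

# term frequency: word 'A' occurs 3 times in a 100-word document
tf = 3 / 100                          # 0.030

# inverse document frequency: 1,000 of 10,000,000 documents contain 'A'
idf = math.log2(10_000_000 / 1_000)   # ≈ 13.287

# the final weight is their product
tfidf = tf * idf                      # ≈ 0.399
```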
5. Generate the TF-IDF-weighted BOF feature for the query image in the same way;
6. Query: similarity is measured by cosine distance; as for the indexing method, the author has not yet studied it and welcomes pointers from readers.
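Although the author leaves indexing open, the standard approach borrowed from text retrieval is an inverted file: map each visual word to the images whose BOF vector contains it, so only candidate images sharing words with the query need to be scored. A minimal sketch with made-up weighted BOF vectors:

```python
import numpy as np
from collections import defaultdict

def build_inverted_index(bof_vectors):
    """Map each visual word id -> list of (image id, tf-idf weight)."""
    index = defaultdict(list)
    for img_id, vec in enumerate(bof_vectors):
        for word, weight in enumerate(vec):
            if weight > 0:
                index[word].append((img_id, weight))
    return index

def query_index(index, q, n_images):
    """Accumulate dot-product scores only over images sharing words with q."""
    scores = np.zeros(n_images)
    for word, qw in enumerate(q):
        if qw > 0:
            for img_id, w in index[word]:
                scores[img_id] += qw * w
    return scores

# toy tf-idf-weighted BOF vectors: 4 images, 5-word vocabulary
bof = np.array([[0.5, 0.0, 0.2, 0.0, 0.0],
                [0.0, 0.4, 0.0, 0.3, 0.0],
                [0.1, 0.0, 0.0, 0.0, 0.6],
                [0.0, 0.0, 0.5, 0.0, 0.1]])
index = build_inverted_index(bof)
q = np.array([0.4, 0.0, 0.3, 0.0, 0.0])
scores = query_index(index, q, len(bof))
best = int(scores.argmax())   # image 0 shares the most weighted words with q
```

Dividing each accumulated score by the norms of the two vectors turns the dot product into the cosine similarity of step 6; since BOF vectors are sparse, the inverted file skips most of the library for each query.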
Issues
1. K-means clustering, besides the problems of choosing K and the initial cluster centers, suffers on massive data: the huge input matrix overflows memory and is inefficient. One workaround is to cluster only a training subset drawn from the large image collection, then classify the remaining gallery images automatically with a naive Bayes classifier. Moreover, since image crawlers keep adding to the background image set, the cost of re-clustering is considerable.
2. Choosing the dictionary size is also a problem. If the dictionary is too large, the words lose generality, become sensitive to noise, and raise the computational cost; above all, the image projections become very high-dimensional. If the dictionary is too small, the words discriminate poorly and cannot express the differences between similar targets.
3. A similarity measure is needed to assign image features to the corresponding codebook words, which raises the choice among the linear kernel, the chi-square distance kernel, the histogram intersection kernel, and so on.
4. By representing an image as an unordered collection of local features, the bag-of-features method loses all information about the spatial layout of those features, which limits its descriptive power. To address this, Schmid[2] proposed a spatial-pyramid extension of bag-of-features.
5. Jégou[7] proposed VLAD (vector of locally aggregated descriptors). Like BOF, the method first builds a codebook of k visual words; but unlike BOF, which assigns each local descriptor to its nearest visual word and merely counts, VLAD computes, for each visual word c_i, the component-wise differences between c_i and the local descriptors assigned to it, and aggregates these residuals into the final representation.
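The VLAD aggregation just described can be sketched as follows (toy random data; real descriptors would be e.g. 128-D SIFT, and the codebook would come from k-means as before):

```python
import numpy as np

def vlad(descriptors, centers):
    """VLAD: for each visual word c_i, sum the residuals (x - c_i) of the
    descriptors assigned to it, concatenate, and L2-normalize."""
    k, d = centers.shape
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)          # nearest visual word per descriptor
    v = np.zeros((k, d))
    for i in range(k):
        if np.any(assign == i):
            v[i] = (descriptors[assign == i] - centers[i]).sum(axis=0)
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

rng = np.random.default_rng(3)
centers = rng.normal(size=(4, 8))   # k=4 visual words, 8-D descriptors
desc = rng.normal(size=(100, 8))    # toy local descriptors of one image
v = vlad(desc, centers)             # fixed length k*d = 32, regardless of
                                    # how many descriptors the image has
```

Compared with a BOF histogram of length k, the VLAD vector has length k·d and retains first-order information about where the descriptors sit relative to each center, which is why it is more discriminative at small codebook sizes.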
Resources
Bag-of-words classifiers (Matlab)
Bag of Words / Bag of Features MATLAB source code
Pyramid BOW+SVM MATLAB demo for image classification

