BOF, VLAD and Fisher Vector: the three musketeers of image retrieval


I have started to organize the image retrieval material I have accumulated over the past two or three years, to make next year's graduation project easier. Below is a summary of image retrieval experiments, including my own results; I will keep it synchronized with my Image-retrieval repository on GitHub.

Image Retrieval Based on SIFT Local Features

The BOF (bag of features) model built on SIFT local features is well suited to same-object retrieval. Below are some experiments I ran on the Oxford Buildings dataset (5,063 images). The tables list the vocabulary (cluster) size and whether SIFT or RootSIFT is used; how RootSIFT is computed can be read about in the paper "Object retrieval with large vocabularies and fast spatial matching". Spatial verification is applied at the re-ranking stage to eliminate mismatched SIFT correspondences, using RANSAC or a RANSAC-like method; for details see the post "SIFT (ASIFT) Matching with RANSAC". Retrieval accuracy is measured by mean Average Precision (mAP); how it is computed is described in posts on information retrieval evaluation metrics. Note that the reported query time is from a single run, not averaged over repeated queries, and that it includes both the query itself and the mAP computation.

Metrics with a 100k vocabulary. The vocabulary size is set to 100k with RootSIFT, and mAP is measured at different re-ranking depths; the query times are only indicative. Note that with a re-ranking depth of 1, the result is the same as the mAP without re-ranking.
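For reference, RootSIFT is obtained from a SIFT descriptor by L1-normalizing it and then taking the element-wise square root, so that Euclidean distance on the result corresponds to the Hellinger kernel on the original descriptors. A minimal numpy sketch (the toy descriptors below are made up and shorter than real 128-D SIFT):

```python
import numpy as np

def rootsift(descriptors, eps=1e-7):
    """Convert SIFT descriptors (n x d) to RootSIFT:
    L1-normalize each row, then take the element-wise square root."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    l1 = np.abs(descriptors).sum(axis=1, keepdims=True) + eps
    return np.sqrt(descriptors / l1)

# Toy example: two fake 8-D "descriptors" instead of real 128-D SIFT.
d = np.array([[1.0, 3.0, 0.0, 0.0, 2.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0]])
r = rootsift(d)
# Each RootSIFT row has (approximately) unit L2 norm,
# since sum(sqrt(x / ||x||_1)**2) = sum(x) / ||x||_1 = 1.
```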

| Vocabulary | Feature  | Re-rank depth | mAP    | Query time, 55 queries (s) |
|------------|----------|---------------|--------|----------------------------|
| 100k       | RootSIFT | none          | 62.46% | 5.429707   |
| 100k       | RootSIFT | 20            | 66.42% | 20.8538    |
| 100k       | RootSIFT | –             | 68.25% | 21.673585  |
| 100k       | RootSIFT | –             | 69.27% | 23.300404  |
| 100k       | RootSIFT | –             | 69.83% | 23.719468  |
| 100k       | RootSIFT | –             | 72.48% | 24.180888  |
| 100k       | RootSIFT | –             | 75.56% | 31.165427  |
| 100k       | RootSIFT | –             | 78.85% | 46.064313  |
| 100k       | RootSIFT | 1000          | 79.93% | 70.192928  |
| 100k       | RootSIFT | 2000          | 80.75% | 110.999173 |
| 100k       | RootSIFT | 3000          | 80.92% | 145.799017 |
| 100k       | RootSIFT | 4000          | 80.97% | 176.786657 |
| 100k       | RootSIFT | 5063          | 80.96% | 207.201570 |

As the table above shows, re-ranking brings a clear improvement in mAP, but a greater re-ranking depth is not always better: from a depth of 1000 onward the gains become very small, while the query time keeps growing.

Metrics with a 500k vocabulary. Same statistics as above, with the vocabulary size increased to 500k.
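For completeness, the mAP used in these tables follows the standard definition: for one query, average precision (AP) averages the precision measured at each rank where a relevant image appears; mAP averages AP over all queries. A small sketch, assuming every relevant image appears somewhere in the ranked list:

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 flags,
    one per returned image in rank order (1 = relevant).
    Assumes all relevant images appear in the list."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_query_relevance):
    aps = [average_precision(r) for r in per_query_relevance]
    return sum(aps) / len(aps)

# Two toy queries: relevant items at ranks (1, 3) and (2,).
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 0, 0]]))  # ≈ 0.667
```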

| Vocabulary | Feature  | Re-rank depth | mAP    | Query time, 55 queries (s) |
|------------|----------|---------------|--------|----------------------------|
| 500k       | RootSIFT | none          | 74.82% | 5.345534   |
| 500k       | RootSIFT | 20            | 77.77% | 21.6467    |
| 500k       | RootSIFT | –             | 79.06% | 21.615220  |
| 500k       | RootSIFT | –             | 79.86% | 23.453462  |
| 500k       | RootSIFT | –             | 80.54% | 23.588034  |
| 500k       | RootSIFT | –             | 82.18% | 24.942057  |
| 500k       | RootSIFT | –             | 83.35% | 30.585792  |
| 500k       | RootSIFT | –             | 84.89% | 41.023239  |
| 500k       | RootSIFT | 1000          | 85.52% | 54.836481  |
| 500k       | RootSIFT | 2000          | 85.73% | 67.173112  |
| 500k       | RootSIFT | 3000          | 85.77% | 80.634803  |
| 500k       | RootSIFT | 5063          | 85.76% | 103.606303 |

Metrics with a 1M vocabulary. Here I set the vocabulary size to 1M (already very large); compared with the previous runs, the clustering stage took much longer, more than 20 hours on the server.
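The vocabulary whose clustering takes those 20+ hours is used online only for assignment: each SIFT descriptor is mapped to its nearest visual word, and the image becomes a histogram over words (typically tf-idf weighted afterwards). A hedged numpy sketch of that hard-assignment step, with a random vocabulary standing in for a real k-means result:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Hard-assign each descriptor (n x d) to its nearest visual word
    (k x d) and return a k-bin count histogram."""
    # Squared Euclidean distance between every descriptor and every word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest-word index per descriptor
    return np.bincount(words, minlength=len(vocabulary)).astype(float)

rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 8))   # stand-in for a k-means vocabulary
desc = rng.normal(size=(20, 8))   # stand-in for SIFT descriptors
h = bow_histogram(desc, vocab)
# h has one bin per visual word, and h.sum() equals the descriptor count.
```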

| Vocabulary | Feature  | Re-rank depth | mAP    | Query time, 55 queries (s) |
|------------|----------|---------------|--------|----------------------------|
| 1M         | RootSIFT | none          | 77.64% | 5.513093  |
| 1M         | RootSIFT | 20            | 80%    | 18.864077 |
| 1M         | RootSIFT | –             | 80.81% | 18.948402 |
| 1M         | RootSIFT | –             | 81.44% | 21.543470 |
| 1M         | RootSIFT | –             | 82.02% | 23.290658 |
| 1M         | RootSIFT | 100           | 83.32% | 25.396074 |
| 1M         | RootSIFT | –             | 84.47% | 31.414361 |
| 1M         | RootSIFT | –             | 85.25% | 39.314887 |
| 1M         | RootSIFT | 1000          | 85.51% | 46.913126 |
| 1M         | RootSIFT | 2000          | 85.55% | 58.102913 |
| 1M         | RootSIFT | 3000          | 85.55% | 68.756579 |
| 1M         | RootSIFT | 4000          | 85.55% | 77.051332 |
| 1M         | RootSIFT | 5063          | 85.55% | 85.428169 |

As you can see, compared with 500k, none of the re-ranked results improve any further. To see more clearly how the vocabulary size affects precision, I plotted the three sets of statistics as curves, shown below:

As the figure shows, within a certain range and at the same re-ranking depth, a larger vocabulary gives a higher mAP; beyond that range, mAP no longer improves noticeably. For example, from a re-ranking depth of 500 onward, 500k and 1M give almost the same precision. This tells us that a larger vocabulary is not always better: a reasonable size should be chosen through experiments, to avoid excessive computation and storage. Likewise, a greater re-ranking depth is not always better; it is more reasonable to pick a depth where the curve starts to flatten. Here, a good choice is a 500k vocabulary with a re-ranking depth of 500.

Similarly, I plotted the query times for the 100k, 500k and 1M vocabularies; note that the time on the vertical axis is the total time of the 55 query images averaged:

The query-time statistics in the figure above look odd: as the vocabulary grows, the query time should become longer, yet it actually gets shorter. This is probably because many people were using the server, so the environment was changing rather than controlled; the times here are only a reference and do not reflect the theoretical trend.

Specific Scenario Applications

The analysis above is rather theoretical, and I also hoped to test these methods in real scenarios. Below I describe, as fully as possible, the practical cases I have encountered (mostly tests done to help others). Although some of the material has been deleted, I still hope to get the chance to fill it all in.

Application Scenario 0: Product Search

"Commodity" is a very generic word, as long as the things to trade, can become a commodity, but specific to the image search application here, I have done, to the "merchandise" several categories, currently done and more understanding, mainly clothes, shoes, book cover. In front of two kinds of items, I personally tested, book cover search, with the company to do this kind of product telephone chat. Expand them separately below. Application Scenario 1: Cinema image Search

This project targets search over photos taken in cinemas, at a scale of roughly 100,000 images, with the whole system to run embedded. For the retrieval algorithm, photos taken in a cinema show large rotation, viewpoint and lighting changes (often quite dark), so a method based on SIFT local features is a good fit. I therefore ran the method above on the image library the partner sent over; the search results were quite good and met the partner's accuracy requirements.

The current difficulties are: 1. in the port to C++, each module has basically taken shape, but some core modules still need a lot of debugging time; 2. embedding into hardware raises many problems that never come up on a server, such as search time, storage space, and how to rewrite everything in pure C for the hardware.

Application Scenario 2: App Icon Deduplication

A gentleman ran into this problem and emailed me about it: roughly, using image search to deduplicate app icons. I am not entirely clear on the exact task, but I ran a test for him. After looking at his app icon library, I felt the most basic features, such as color, should be enough, because identical app icons differ only in resolution or in a small badge added to the icon.

Overall the differences should be very small, and color alone could probably solve it well, but I did not test that plan; instead I directly reused the ready-made framework above. I did so because this is again a same-object search, which SIFT is very well suited to. One possible problem is that app icons have very low resolution, which may leave few SIFT keypoints to detect and hurt search accuracy. The experiments bore this out: some icons yielded only a handful of SIFT features, yet the final results were still good. My guess is that color features might give even better results. Below is part of the visualized search results (only two examples; there are too many to show one per query):

For the figure above, if the gallery grew larger, using color features might become somewhat harder. For the same-style icons in the second row, let's look at the following:

For this case, color features would struggle: the color histogram of the green "Where's Daddy" icons further back differs greatly from that of the query image. From this point of view, color does not seem to be a good feature for app icon search; still, all the queries should be run and the mean retrieval precision examined before drawing a conclusion.

Application Scenario 3: Face Search
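To make the color-feature alternative discussed above concrete, here is a minimal sketch of per-channel color histograms compared with histogram intersection (1.0 means identical color distributions); the toy images are random arrays, not real app icons:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histogram of an H x W x 3 uint8 image,
    L1-normalized so that images of different sizes are comparable."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1]."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(1)
icon = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
# An image compared with itself has intersection 1.0;
# images with disjoint color distributions score 0.0.
score = intersection(color_histogram(icon), color_histogram(icon))
```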

(To be supplemented.)

Application Scenario 4: Mold Image Search

This is a task I tested recently; it requires not only same-object search but also similar-object search. The other party was in early development. After chatting with their algorithm engineers, I felt they lacked experience and accumulation in methods for this kind of task; in any case, I provided them a feasible solution for free and verified it accordingly. Back to the two requirements: first, the identical mold must be retrievable; second, similar molds should also be retrieved. The ideal retrieval method would return the identical mold in the top results, followed by mold images that look similar to the query. Here are some sample images from the library:

As you can see, the molds in the gallery vary considerably. For the same-mold requirement, my thought was that the bag-of-words model above should work well, since SIFT resists rotation, lighting and scale changes, so I ran some tests with the framework above. Here is one query result from the test:

From the result above, the retrieved images are quite satisfactory; the image in row 2, column 1 is ranked slightly low, but I don't think that affects usability, and the subsequent results are almost all small items of this kind. Let's look at the results for another query image:

Compared with the belt-buckle query above, this result looks worse. The main reason is that the library they sent contains belt-buckle rings and other classes whose appearance differs greatly, so the retrieved results look rather poor.

Having analyzed same-object retrieval, let us come back and summarize the bag-of-words model. The SIFT-based bag-of-words model can partially handle similar-object search, but it is better suited to same-object search. To solve similar-object search well, a CNN is a better fit for this kind of task. This can be seen by comparing the classification accuracy of the bag-of-words model and CNN models: CNNs classify far more accurately, which shows that they capture the features that distinguish different classes while largely preserving what samples of the same class have in common. In addition, because the backgrounds of these images are very clean, CNN-based similar search should work well. Since I did not have enough samples on hand, I tested directly with a model pretrained on ImageNet, verified with my earlier cnn-for-image-retrieval code. Below are two example queries; query image 1 is shown in the following figure:
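Once features have been extracted by the pretrained network, the retrieval in cnn-for-image-retrieval reduces to nearest-neighbor search under cosine similarity. A sketch with synthetic stand-in feature vectors (real ones would come from, e.g., a pooled convolutional layer):

```python
import numpy as np

def rank_by_cosine(query_feat, db_feats):
    """Return database indices sorted by cosine similarity to the query.
    query_feat: (d,) vector; db_feats: (n, d) matrix of image features."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity to each image
    return np.argsort(-sims), sims   # best match first

# Synthetic stand-ins for pooled CNN activations.
db = np.array([[1.0, 0.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.0, 0.0, 1.0]])
order, sims = rank_by_cosine(np.array([1.0, 0.05, 0.0]), db)
# order[0] is the index of the most similar database image.
```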

The query results obtained are shown in the figure below:

You can see that everything retrieved is similar to the query. A very interesting phenomenon is that a model trained on ImageNet achieves such good results on this rather special image library. My guess is that since ImageNet contains so many categories, it may well include similar samples, or that a model trained on ImageNet generalizes well enough to unseen classes of natural images. Later I ran tests with other images as well, with good results. Of course, this still uses the ImageNet model; to get better results on one's own library, one still needs to train on that library. But stepping back, at least in terms of method this is completely feasible, and the precision can be optimized further; in the early stage, turning the framework into a usable system matters most.

In addition, I originally thought that, since two requirements had to be met and there was no single method that satisfied both, the interface could have two buttons backed by two methods: one button for same-object search and one for similar-object search. On further thought, this scheme can be optimized: first search for similar images, then search for the identical one within the similar set. The preceding similar search acts as a filtering or classification step, so the returned images are all similar, with the identical ones ranked first.

MSER
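The optimized scheme in the previous paragraph can be sketched as a generic two-stage re-ranking: shortlist by a similarity score, then move the best same-object matches to the front. The score dictionaries below are made-up placeholders for the CNN and bag-of-words scores:

```python
def similar_then_same(similar_scores, same_scores, top_k):
    """Take the top_k images by similarity score, then move the ones that
    score highest on the same-object measure to the front of the shortlist."""
    shortlist = sorted(similar_scores, key=similar_scores.get, reverse=True)[:top_k]
    return sorted(shortlist, key=lambda i: same_scores.get(i, 0.0), reverse=True)

# Hypothetical scores: "b" looks less similar overall but is the identical item.
similar = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
same = {"b": 0.95, "a": 0.2, "c": 0.1}
print(similar_then_same(similar, same, top_k=3))  # → ['b', 'a', 'c']
```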

MSER yields elliptical regions; combined with SIFT, it can remove many useless keypoints (see the VLFeat MSER example here). MSER can also be used to filter text regions, as in "Robust Text Detection in Natural Scenes and Web Images". Related concepts and methods: flood fill, saliency.

SIFT Feature Point Matching

SiftGPU: SIFT on the GPU, which works on nVidia, ATI and Intel cards.

Fisher Vector

| Number of words | SIFT reduced from 128-D to 64-D | mAP    |
|-----------------|---------------------------------|--------|
| 256             | yes                             | 42.7%  |
| 512             | yes                             | 52.27% |
| 1024            | yes                             | 56.26% |
| 2048            | yes                             | 58.68% |
| 4096            | yes                             | 62.37% |
| 8192            | yes                             | 65.43% |
| 10000           | yes                             | 66.88% |
| 20000           | yes                             | 69.61% |
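For reference, the Fisher Vector encodes a set of local descriptors as the gradient of a GMM log-likelihood with respect to the mixture's means and standard deviations, usually followed by power (signed square root) and L2 normalization. A hedged numpy sketch, with random data standing in for real SIFT descriptors and a fitted GMM:

```python
import numpy as np

def fisher_vector(x, w, mu, sigma):
    """Fisher Vector of descriptors x (n x d) under a diagonal-covariance
    GMM with weights w (k,), means mu (k x d), std devs sigma (k x d).
    Returns the 2*k*d gradient vector w.r.t. means and std devs,
    power- (signed-sqrt) and L2-normalized."""
    n, d = x.shape
    # Soft assignments gamma (n x k) via log-domain Gaussian responsibilities.
    log_p = (-0.5 * (((x[:, None, :] - mu[None]) / sigma[None]) ** 2).sum(2)
             - np.log(sigma).sum(1)[None] - 0.5 * d * np.log(2 * np.pi)
             + np.log(w)[None])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradients w.r.t. means and standard deviations.
    diff = (x[:, None, :] - mu[None]) / sigma[None]                 # n x k x d
    g_mu = (gamma[..., None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_sig = (gamma[..., None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                          # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                        # L2 normalization

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 4))       # fake 4-D descriptors
fv = fisher_vector(x, np.array([0.5, 0.5]),
                   rng.normal(size=(2, 4)), np.ones((2, 4)))
# fv has dimension 2 * k * d = 16 and unit L2 norm.
```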

Fisher Vector with 512 words and SIFT reduced from 128 to 64 dimensions gives an Oxford Buildings mAP of 52.27%; with L2 normalization but without the max normalization, mAP is 43.43%. With 1024 words and the same reduction, mAP is 56.26%; without the max normalization, 47.06%.

From: http://yongyuan.name/blog/bof-vlad-fv.html
